Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks
NASA Astrophysics Data System (ADS)
Sun, Wei; Chang, K. C.
2005-05-01
Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only the approximate methods such as stochastic sampling could be used to provide a solution given any time constraint. There are several simulation methods currently available. They include logic sampling (the first proposed stochastic method for Bayesian networks, the likelihood weighting algorithm) the most commonly used simulation method because of its simplicity and efficiency, the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, then we propose an improved importance sampling algorithm called linear Gaussian importance sampling algorithm for general hybrid model (LGIS). LGIS is aimed for hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses linear function and Gaussian additive noise to approximate the true conditional probability distribution for continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal important function from the previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as Junction tree (JT) and likelihood weighting (LW) shows that LGIS-GHM is very promising.
Roh, Min K; Gillespie, Dan T; Petzold, Linda R
2010-11-07
The weighted stochastic simulation algorithm (wSSA) was developed by Kuwahara and Mura [J. Chem. Phys. 129, 165101 (2008)] to efficiently estimate the probabilities of rare events in discrete stochastic systems. The wSSA uses importance sampling to enhance the statistical accuracy in the estimation of the probability of the rare event. The original algorithm biases the reaction selection step with a fixed importance sampling parameter. In this paper, we introduce a novel method where the biasing parameter is state-dependent. The new method features improved accuracy, efficiency, and robustness.
An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals
ERIC Educational Resources Information Center
Verhelst, Norman D.
2008-01-01
Uniform sampling of binary matrices with fixed margins is known as a difficult problem. Two classes of algorithms to sample from a distribution not too different from the uniform are studied in the literature: importance sampling and Markov chain Monte Carlo (MCMC). Existing MCMC algorithms converge slowly, require a long burn-in period and yield…
Annealed Importance Sampling Reversible Jump MCMC algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios; Andrieu, Christophe
2013-03-20
It will soon be 20 years since reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms have been proposed. They have significantly extended the scope of Markov chain Monte Carlo simulation methods, offering the promise to be able to routinely tackle transdimensional sampling problems, as encountered in Bayesian model selection problems for example, in a principled and flexible fashion. Their practical efficient implementation, however, still remains a challenge. A particular difficulty encountered in practice is in the choice of the dimension matching variables (both their nature and their distribution) and the reversible transformations which allow one to define the one-to-one mappingsmore » underpinning the design of these algorithms. Indeed, even seemingly sensible choices can lead to algorithms with very poor performance. The focus of this paper is the development and performance evaluation of a method, annealed importance sampling RJ-MCMC (aisRJ), which addresses this problem by mitigating the sensitivity of RJ-MCMC algorithms to the aforementioned poor design. As we shall see the algorithm can be understood as being an “exact approximation” of an idealized MCMC algorithm that would sample from the model probabilities directly in a model selection set-up. Such an idealized algorithm may have good theoretical convergence properties, but typically cannot be implemented, and our algorithms can approximate the performance of such idealized algorithms to an arbitrary degree while not introducing any bias for any degree of approximation. Our approach combines the dimension matching ideas of RJ-MCMC with annealed importance sampling and its Markov chain Monte Carlo implementation. We illustrate the performance of the algorithm with numerical simulations which indicate that, although the approach may at first appear computationally involved, it is in fact competitive.« less
NASA Astrophysics Data System (ADS)
Shayanfar, Mohsen Ali; Barkhordari, Mohammad Ali; Roudak, Mohammad Amin
2017-06-01
Monte Carlo simulation (MCS) is a useful tool for computation of probability of failure in reliability analysis. However, the large number of required random samples makes it time-consuming. Response surface method (RSM) is another common method in reliability analysis. Although RSM is widely used for its simplicity, it cannot be trusted in highly nonlinear problems due to its linear nature. In this paper, a new efficient algorithm, employing the combination of importance sampling, as a class of MCS, and RSM is proposed. In the proposed algorithm, analysis starts with importance sampling concepts and using a represented two-step updating rule of design point. This part finishes after a small number of samples are generated. Then RSM starts to work using Bucher experimental design, with the last design point and a represented effective length as the center point and radius of Bucher's approach, respectively. Through illustrative numerical examples, simplicity and efficiency of the proposed algorithm and the effectiveness of the represented rules are shown.
Iterative Importance Sampling Algorithms for Parameter Estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.
In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less
An adaptive importance sampling algorithm for Bayesian inversion with multimodal distributions
Li, Weixuan; Lin, Guang
2015-03-21
Parametric uncertainties are encountered in the simulations of many physical systems, and may be reduced by an inverse modeling procedure that calibrates the simulation results to observations on the real system being simulated. Following Bayes’ rule, a general approach for inverse modeling problems is to sample from the posterior distribution of the uncertain model parameters given the observations. However, the large number of repetitive forward simulations required in the sampling process could pose a prohibitive computational burden. This difficulty is particularly challenging when the posterior is multimodal. We present in this paper an adaptive importance sampling algorithm to tackle thesemore » challenges. Two essential ingredients of the algorithm are: 1) a Gaussian mixture (GM) model adaptively constructed as the proposal distribution to approximate the possibly multimodal target posterior, and 2) a mixture of polynomial chaos (PC) expansions, built according to the GM proposal, as a surrogate model to alleviate the computational burden caused by computational-demanding forward model evaluations. In three illustrative examples, the proposed adaptive importance sampling algorithm demonstrates its capabilities of automatically finding a GM proposal with an appropriate number of modes for the specific problem under study, and obtaining a sample accurately and efficiently representing the posterior with limited number of forward simulations.« less
An adaptive importance sampling algorithm for Bayesian inversion with multimodal distributions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Weixuan; Lin, Guang, E-mail: guanglin@purdue.edu
2015-08-01
Parametric uncertainties are encountered in the simulations of many physical systems, and may be reduced by an inverse modeling procedure that calibrates the simulation results to observations on the real system being simulated. Following Bayes' rule, a general approach for inverse modeling problems is to sample from the posterior distribution of the uncertain model parameters given the observations. However, the large number of repetitive forward simulations required in the sampling process could pose a prohibitive computational burden. This difficulty is particularly challenging when the posterior is multimodal. We present in this paper an adaptive importance sampling algorithm to tackle thesemore » challenges. Two essential ingredients of the algorithm are: 1) a Gaussian mixture (GM) model adaptively constructed as the proposal distribution to approximate the possibly multimodal target posterior, and 2) a mixture of polynomial chaos (PC) expansions, built according to the GM proposal, as a surrogate model to alleviate the computational burden caused by computational-demanding forward model evaluations. In three illustrative examples, the proposed adaptive importance sampling algorithm demonstrates its capabilities of automatically finding a GM proposal with an appropriate number of modes for the specific problem under study, and obtaining a sample accurately and efficiently representing the posterior with limited number of forward simulations.« less
Screen Space Ambient Occlusion Based Multiple Importance Sampling for Real-Time Rendering
NASA Astrophysics Data System (ADS)
Zerari, Abd El Mouméne; Babahenini, Mohamed Chaouki
2018-03-01
We propose a new approximation technique for accelerating the Global Illumination algorithm for real-time rendering. The proposed approach is based on the Screen-Space Ambient Occlusion (SSAO) method, which approximates the global illumination for large, fully dynamic scenes at interactive frame rates. Current algorithms that are based on the SSAO method suffer from difficulties due to the large number of samples that are required. In this paper, we propose an improvement to the SSAO technique by integrating it with a Multiple Importance Sampling technique that combines a stratified sampling method with an importance sampling method, with the objective of reducing the number of samples. Experimental evaluation demonstrates that our technique can produce high-quality images in real time and is significantly faster than traditional techniques.
ELF: An Extended-Lagrangian Free Energy Calculation Module for Multiple Molecular Dynamics Engines.
Chen, Haochuan; Fu, Haohao; Shao, Xueguang; Chipot, Christophe; Cai, Wensheng
2018-06-18
Extended adaptive biasing force (eABF), a collective variable (CV)-based importance-sampling algorithm, has proven to be very robust and efficient compared with the original ABF algorithm. Its implementation in Colvars, a software addition to molecular dynamics (MD) engines, is, however, currently limited to NAMD and LAMMPS. To broaden the scope of eABF and its variants, like its generalized form (egABF), and make them available to other MD engines, e.g., GROMACS, AMBER, CP2K, and openMM, we present a PLUMED-based implementation, called extended-Lagrangian free energy calculation (ELF). This implementation can be used as a stand-alone gradient estimator for other CV-based sampling algorithms, such as temperature-accelerated MD (TAMD) and extended-Lagrangian metadynamics (MtD). ELF provides the end user with a convenient framework to help select the best-suited importance-sampling algorithm for a given application without any commitment to a particular MD engine.
Convergence and Efficiency of Adaptive Importance Sampling Techniques with Partial Biasing
NASA Astrophysics Data System (ADS)
Fort, G.; Jourdain, B.; Lelièvre, T.; Stoltz, G.
2018-04-01
We propose a new Monte Carlo method to efficiently sample a multimodal distribution (known up to a normalization constant). We consider a generalization of the discrete-time Self Healing Umbrella Sampling method, which can also be seen as a generalization of well-tempered metadynamics. The dynamics is based on an adaptive importance technique. The importance function relies on the weights (namely the relative probabilities) of disjoint sets which form a partition of the space. These weights are unknown but are learnt on the fly yielding an adaptive algorithm. In the context of computational statistical physics, the logarithm of these weights is, up to an additive constant, the free-energy, and the discrete valued function defining the partition is called the collective variable. The algorithm falls into the general class of Wang-Landau type methods, and is a generalization of the original Self Healing Umbrella Sampling method in two ways: (i) the updating strategy leads to a larger penalization strength of already visited sets in order to escape more quickly from metastable states, and (ii) the target distribution is biased using only a fraction of the free-energy, in order to increase the effective sample size and reduce the variance of importance sampling estimators. We prove the convergence of the algorithm and analyze numerically its efficiency on a toy example.
Network Sampling and Classification:An Investigation of Network Model Representations
Airoldi, Edoardo M.; Bai, Xue; Carley, Kathleen M.
2011-01-01
Methods for generating a random sample of networks with desired properties are important tools for the analysis of social, biological, and information networks. Algorithm-based approaches to sampling networks have received a great deal of attention in recent literature. Most of these algorithms are based on simple intuitions that associate the full features of connectivity patterns with specific values of only one or two network metrics. Substantive conclusions are crucially dependent on this association holding true. However, the extent to which this simple intuition holds true is not yet known. In this paper, we examine the association between the connectivity patterns that a network sampling algorithm aims to generate and the connectivity patterns of the generated networks, measured by an existing set of popular network metrics. We find that different network sampling algorithms can yield networks with similar connectivity patterns. We also find that the alternative algorithms for the same connectivity pattern can yield networks with different connectivity patterns. We argue that conclusions based on simulated network studies must focus on the full features of the connectivity patterns of a network instead of on the limited set of network metrics for a specific network type. This fact has important implications for network data analysis: for instance, implications related to the way significance is currently assessed. PMID:21666773
Recursive algorithms for phylogenetic tree counting.
Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J
2013-10-28
In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting of resolutions of a constraint tree assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
Linear Algebra and Sequential Importance Sampling for Network Reliability
2011-12-01
first test case is an Erdős- Renyi graph with 100 vertices and 150 edges. Figure 1 depicts the relative variance of the three Algorithms: Algorithm TOP...e va ria nc e Figure 1: Relative variance of various algorithms on Erdős Renyi graph, 100 vertices 250 edges. Key: Solid = TOP-DOWN algorithm
Cao, Youfang; Liang, Jie
2013-01-01
Critical events that occur rarely in biological processes are of great importance, but are challenging to study using Monte Carlo simulation. By introducing biases to reaction selection and reaction rates, weighted stochastic simulation algorithms based on importance sampling allow rare events to be sampled more effectively. However, existing methods do not address the important issue of barrier crossing, which often arises from multistable networks and systems with complex probability landscape. In addition, the proliferation of parameters and the associated computing cost pose significant problems. Here we introduce a general theoretical framework for obtaining optimized biases in sampling individual reactions for estimating probabilities of rare events. We further describe a practical algorithm called adaptively biased sequential importance sampling (ABSIS) method for efficient probability estimation. By adopting a look-ahead strategy and by enumerating short paths from the current state, we estimate the reaction-specific and state-specific forward and backward moving probabilities of the system, which are then used to bias reaction selections. The ABSIS algorithm can automatically detect barrier-crossing regions, and can adjust bias adaptively at different steps of the sampling process, with bias determined by the outcome of exhaustively generated short paths. In addition, there are only two bias parameters to be determined, regardless of the number of the reactions and the complexity of the network. We have applied the ABSIS method to four biochemical networks: the birth-death process, the reversible isomerization, the bistable Schlögl model, and the enzymatic futile cycle model. For comparison, we have also applied the finite buffer discrete chemical master equation (dCME) method recently developed to obtain exact numerical solutions of the underlying discrete chemical master equations of these problems. This allows us to assess sampling results objectively by comparing simulation results with true answers. Overall, ABSIS can accurately and efficiently estimate rare event probabilities for all examples, often with smaller variance than other importance sampling algorithms. The ABSIS method is general and can be applied to study rare events of other stochastic networks with complex probability landscape. PMID:23862966
NASA Astrophysics Data System (ADS)
Cao, Youfang; Liang, Jie
2013-07-01
Critical events that occur rarely in biological processes are of great importance, but are challenging to study using Monte Carlo simulation. By introducing biases to reaction selection and reaction rates, weighted stochastic simulation algorithms based on importance sampling allow rare events to be sampled more effectively. However, existing methods do not address the important issue of barrier crossing, which often arises from multistable networks and systems with complex probability landscape. In addition, the proliferation of parameters and the associated computing cost pose significant problems. Here we introduce a general theoretical framework for obtaining optimized biases in sampling individual reactions for estimating probabilities of rare events. We further describe a practical algorithm called adaptively biased sequential importance sampling (ABSIS) method for efficient probability estimation. By adopting a look-ahead strategy and by enumerating short paths from the current state, we estimate the reaction-specific and state-specific forward and backward moving probabilities of the system, which are then used to bias reaction selections. The ABSIS algorithm can automatically detect barrier-crossing regions, and can adjust bias adaptively at different steps of the sampling process, with bias determined by the outcome of exhaustively generated short paths. In addition, there are only two bias parameters to be determined, regardless of the number of the reactions and the complexity of the network. We have applied the ABSIS method to four biochemical networks: the birth-death process, the reversible isomerization, the bistable Schlögl model, and the enzymatic futile cycle model. For comparison, we have also applied the finite buffer discrete chemical master equation (dCME) method recently developed to obtain exact numerical solutions of the underlying discrete chemical master equations of these problems. This allows us to assess sampling results objectively by comparing simulation results with true answers. Overall, ABSIS can accurately and efficiently estimate rare event probabilities for all examples, often with smaller variance than other importance sampling algorithms. The ABSIS method is general and can be applied to study rare events of other stochastic networks with complex probability landscape.
Cao, Youfang; Liang, Jie
2013-07-14
Critical events that occur rarely in biological processes are of great importance, but are challenging to study using Monte Carlo simulation. By introducing biases to reaction selection and reaction rates, weighted stochastic simulation algorithms based on importance sampling allow rare events to be sampled more effectively. However, existing methods do not address the important issue of barrier crossing, which often arises from multistable networks and systems with complex probability landscape. In addition, the proliferation of parameters and the associated computing cost pose significant problems. Here we introduce a general theoretical framework for obtaining optimized biases in sampling individual reactions for estimating probabilities of rare events. We further describe a practical algorithm called adaptively biased sequential importance sampling (ABSIS) method for efficient probability estimation. By adopting a look-ahead strategy and by enumerating short paths from the current state, we estimate the reaction-specific and state-specific forward and backward moving probabilities of the system, which are then used to bias reaction selections. The ABSIS algorithm can automatically detect barrier-crossing regions, and can adjust bias adaptively at different steps of the sampling process, with bias determined by the outcome of exhaustively generated short paths. In addition, there are only two bias parameters to be determined, regardless of the number of the reactions and the complexity of the network. We have applied the ABSIS method to four biochemical networks: the birth-death process, the reversible isomerization, the bistable Schlögl model, and the enzymatic futile cycle model. For comparison, we have also applied the finite buffer discrete chemical master equation (dCME) method recently developed to obtain exact numerical solutions of the underlying discrete chemical master equations of these problems. This allows us to assess sampling results objectively by comparing simulation results with true answers. Overall, ABSIS can accurately and efficiently estimate rare event probabilities for all examples, often with smaller variance than other importance sampling algorithms. The ABSIS method is general and can be applied to study rare events of other stochastic networks with complex probability landscape.
An improved algorithm for evaluating trellis phase codes
NASA Technical Reports Server (NTRS)
Mulligan, M. G.; Wilson, S. G.
1982-01-01
A method is described for evaluating the minimum distance parameters of trellis phase codes, including CPFSK, partial response FM, and more importantly, coded CPM (continuous phase modulation) schemes. The algorithm provides dramatically faster execution times and lesser memory requirements than previous algorithms. Results of sample calculations and timing comparisons are included.
An improved algorithm for evaluating trellis phase codes
NASA Technical Reports Server (NTRS)
Mulligan, M. G.; Wilson, S. G.
1984-01-01
A method is described for evaluating the minimum distance parameters of trellis phase codes, including CPFSK, partial response FM, and more importantly, coded CPM (continuous phase modulation) schemes. The algorithm provides dramatically faster execution times and lesser memory requirements than previous algorithms. Results of sample calculations and timing comparisons are included.
Active Learning Using Hint Information.
Li, Chun-Liang; Ferng, Chun-Sung; Lin, Hsuan-Tien
2015-08-01
The abundance of real-world data and limited labeling budget calls for active learning, an important learning paradigm for reducing human labeling efforts. Many recently developed active learning algorithms consider both uncertainty and representativeness when making querying decisions. However, exploiting representativeness with uncertainty concurrently usually requires tackling sophisticated and challenging learning tasks, such as clustering. In this letter, we propose a new active learning framework, called hinted sampling, which takes both uncertainty and representativeness into account in a simpler way. We design a novel active learning algorithm within the hinted sampling framework with an extended support vector machine. Experimental results validate that the novel active learning algorithm can result in a better and more stable performance than that achieved by state-of-the-art algorithms. We also show that the hinted sampling framework allows improving another active learning algorithm designed from the transductive support vector machine.
Cui, Zaixu; Gong, Gaolang
2018-06-02
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.
Wu, Ke; Edwards, Andrea; Fan, Wei; Gao, Jing; Zhang, Kun
2014-04-01
Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date with many interesting algorithms developed. However, only a few approaches reported in literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups with each sub-classifier (i.e. a single classifier or an ensemble) weighed by its discriminative power and stable level. The un-even class distribution, on the other hand, is typically battled by the sub-classifier built in a specific feature group with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
Estimating rare events in biochemical systems using conditional sampling.
Sundar, V S
2017-01-28
The paper focuses on development of variance reduction strategies to estimate rare events in biochemical systems. Obtaining this probability using brute force Monte Carlo simulations in conjunction with the stochastic simulation algorithm (Gillespie's method) is computationally prohibitive. To circumvent this, important sampling tools such as the weighted stochastic simulation algorithm and the doubly weighted stochastic simulation algorithm have been proposed. However, these strategies require an additional step of determining the important region to sample from, which is not straightforward for most of the problems. In this paper, we apply the subset simulation method, developed as a variance reduction tool in the context of structural engineering, to the problem of rare event estimation in biochemical systems. The main idea is that the rare event probability is expressed as a product of more frequent conditional probabilities. These conditional probabilities are estimated with high accuracy using Monte Carlo simulations, specifically the Markov chain Monte Carlo method with the modified Metropolis-Hastings algorithm. Generating sample realizations of the state vector using the stochastic simulation algorithm is viewed as mapping the discrete-state continuous-time random process to the standard normal random variable vector. This viewpoint opens up the possibility of applying more sophisticated and efficient sampling schemes developed elsewhere to problems in stochastic chemical kinetics. The results obtained using the subset simulation method are compared with existing variance reduction strategies for a few benchmark problems, and a satisfactory improvement in computational time is demonstrated.
A recurrence-weighted prediction algorithm for musical analysis
NASA Astrophysics Data System (ADS)
Colucci, Renato; Leguizamon Cucunuba, Juan Sebastián; Lloyd, Simon
2018-03-01
Forecasting the future behaviour of a system using past data is an important topic. In this article we apply nonlinear time series analysis in the context of music, and present new algorithms for extending a sample of music, while maintaining characteristics similar to the original piece. By using ideas from ergodic theory, we adapt the classical prediction method of Lorenz analogues so as to take into account recurrence times, and demonstrate with examples, how the new algorithm can produce predictions with a high degree of similarity to the original sample.
An intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces.
Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying
2013-09-01
Poisson disk sampling has excellent spatial and spectral properties, and plays an important role in a variety of visual computing. Although many promising algorithms have been proposed for multidimensional sampling in euclidean space, very few studies have been reported with regard to the problem of generating Poisson disks on surfaces due to the complicated nature of the surface. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. In sharp contrast to the conventional parallel approaches, our method neither partitions the given surface into small patches nor uses any spatial data structure to maintain the voids in the sampling domain. Instead, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. Our algorithm guarantees that the generated Poisson disks are uniformly and randomly distributed without bias. It is worth noting that our method is intrinsic and independent of the embedding space. This intrinsic feature allows us to generate Poisson disk patterns on arbitrary surfaces in IR(n). To our knowledge, this is the first intrinsic, parallel, and accurate algorithm for surface Poisson disk sampling. Furthermore, by manipulating the spatially varying density function, we can obtain adaptive sampling easily.
Monte Carlo sampling of Wigner functions and surface hopping quantum dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kube, Susanna; Lasser, Caroline; Weber, Marcus
2009-04-01
The article addresses the achievable accuracy for a Monte Carlo sampling of Wigner functions in combination with a surface hopping algorithm for non-adiabatic quantum dynamics. The approximation of Wigner functions is realized by an adaption of the Metropolis algorithm for real-valued functions with disconnected support. The integration, which is necessary for computing values of the Wigner function, uses importance sampling with a Gaussian weight function. The numerical experiments agree with theoretical considerations and show an error of 2-3%.
NASA Astrophysics Data System (ADS)
Hu, Zixi; Yao, Zhewei; Li, Jinglai
2017-03-01
Many scientific and engineering problems require to perform Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrary slow under the mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimensional independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, were proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally we provide numerical examples to demonstrate the performance of the proposed method.
A novel algorithm for Bluetooth ECG.
Pandya, Utpal T; Desai, Uday B
2012-11-01
In wireless transmission of ECG, data latency will be significant when battery power level and data transmission distance are not maintained. In applications like home monitoring or personalized care, to overcome the joint effect of previous issues of wireless transmission and other ECG measurement noises, a novel filtering strategy is required. Here, a novel algorithm, identified as peak rejection adaptive sampling modified moving average (PRASMMA) algorithm for wireless ECG is introduced. This algorithm first removes error in bit pattern of received data if occurred in wireless transmission and then removes baseline drift. Afterward, a modified moving average is implemented except in the region of each QRS complexes. The algorithm also sets its filtering parameters according to different sampling rate selected for acquisition of signals. To demonstrate the work, a prototyped Bluetooth-based ECG module is used to capture ECG with different sampling rate and in different position of patient. This module transmits ECG wirelessly to Bluetooth-enabled devices where the PRASMMA algorithm is applied on captured ECG. The performance of PRASMMA algorithm is compared with moving average and S-Golay algorithms visually as well as numerically. The results show that the PRASMMA algorithm can significantly improve the ECG reconstruction by efficiently removing the noise and its use can be extended to any parameters where peaks are importance for diagnostic purpose.
Compressively sampled MR image reconstruction using generalized thresholding iterative algorithm
NASA Astrophysics Data System (ADS)
Elahi, Sana; kaleem, Muhammad; Omer, Hammad
2018-01-01
Compressed sensing (CS) is an emerging area of interest in Magnetic Resonance Imaging (MRI). CS is used for the reconstruction of the images from a very limited number of samples in k-space. This significantly reduces the MRI data acquisition time. One important requirement for signal recovery in CS is the use of an appropriate non-linear reconstruction algorithm. It is a challenging task to choose a reconstruction algorithm that would accurately reconstruct the MR images from the under-sampled k-space data. Various algorithms have been used to solve the system of non-linear equations for better image quality and reconstruction speed in CS. In the recent past, iterative soft thresholding algorithm (ISTA) has been introduced in CS-MRI. This algorithm directly cancels the incoherent artifacts produced because of the undersampling in k -space. This paper introduces an improved iterative algorithm based on p -thresholding technique for CS-MRI image reconstruction. The use of p -thresholding function promotes sparsity in the image which is a key factor for CS based image reconstruction. The p -thresholding based iterative algorithm is a modification of ISTA, and minimizes non-convex functions. It has been shown that the proposed p -thresholding iterative algorithm can be used effectively to recover fully sampled image from the under-sampled data in MRI. The performance of the proposed method is verified using simulated and actual MRI data taken at St. Mary's Hospital, London. The quality of the reconstructed images is measured in terms of peak signal-to-noise ratio (PSNR), artifact power (AP), and structural similarity index measure (SSIM). The proposed approach shows improved performance when compared to other iterative algorithms based on log thresholding, soft thresholding and hard thresholding techniques at different reduction factors.
An efficient quantum algorithm for spectral estimation
NASA Astrophysics Data System (ADS)
Steffens, Adrian; Rebentrost, Patrick; Marvian, Iman; Eisert, Jens; Lloyd, Seth
2017-03-01
We develop an efficient quantum implementation of an important signal processing algorithm for line spectral estimation: the matrix pencil method, which determines the frequencies and damping factors of signals consisting of finite sums of exponentially damped sinusoids. Our algorithm provides a quantum speedup in a natural regime where the sampling rate is much higher than the number of sinusoid components. Along the way, we develop techniques that are expected to be useful for other quantum algorithms as well—consecutive phase estimations to efficiently make products of asymmetric low rank matrices classically accessible and an alternative method to efficiently exponentiate non-Hermitian matrices. Our algorithm features an efficient quantum-classical division of labor: the time-critical steps are implemented in quantum superposition, while an interjacent step, requiring much fewer parameters, can operate classically. We show that frequencies and damping factors can be obtained in time logarithmic in the number of sampling points, exponentially faster than known classical algorithms.
Adaptive Sampling-Based Information Collection for Wireless Body Area Networks.
Xu, Xiaobin; Zhao, Fang; Wang, Wendong; Tian, Hui
2016-08-31
To collect important health information, WBAN applications typically sense data at a high frequency. However, limited by the quality of wireless link, the uploading of sensed data has an upper frequency. To reduce upload frequency, most of the existing WBAN data collection approaches collect data with a tolerable error. These approaches can guarantee precision of the collected data, but they are not able to ensure that the upload frequency is within the upper frequency. Some traditional sampling based approaches can control upload frequency directly, however, they usually have a high loss of information. Since the core task of WBAN applications is to collect health information, this paper aims to collect optimized information under the limitation of upload frequency. The importance of sensed data is defined according to information theory for the first time. Information-aware adaptive sampling is proposed to collect uniformly distributed data. Then we propose Adaptive Sampling-based Information Collection (ASIC) which consists of two algorithms. An adaptive sampling probability algorithm is proposed to compute sampling probabilities of different sensed values. A multiple uniform sampling algorithm provides uniform samplings for values in different intervals. Experiments based on a real dataset show that the proposed approach has higher performance in terms of data coverage and information quantity. The parameter analysis shows the optimized parameter settings and the discussion shows the underlying reason of high performance in the proposed approach.
Adaptive Sampling-Based Information Collection for Wireless Body Area Networks
Xu, Xiaobin; Zhao, Fang; Wang, Wendong; Tian, Hui
2016-01-01
To collect important health information, WBAN applications typically sense data at a high frequency. However, limited by the quality of wireless link, the uploading of sensed data has an upper frequency. To reduce upload frequency, most of the existing WBAN data collection approaches collect data with a tolerable error. These approaches can guarantee precision of the collected data, but they are not able to ensure that the upload frequency is within the upper frequency. Some traditional sampling based approaches can control upload frequency directly, however, they usually have a high loss of information. Since the core task of WBAN applications is to collect health information, this paper aims to collect optimized information under the limitation of upload frequency. The importance of sensed data is defined according to information theory for the first time. Information-aware adaptive sampling is proposed to collect uniformly distributed data. Then we propose Adaptive Sampling-based Information Collection (ASIC) which consists of two algorithms. An adaptive sampling probability algorithm is proposed to compute sampling probabilities of different sensed values. A multiple uniform sampling algorithm provides uniform samplings for values in different intervals. Experiments based on a real dataset show that the proposed approach has higher performance in terms of data coverage and information quantity. The parameter analysis shows the optimized parameter settings and the discussion shows the underlying reason of high performance in the proposed approach. PMID:27589758
A Novel Binarization Algorithm for Ballistics Firearm Identification
NASA Astrophysics Data System (ADS)
Li, Dongguang
The identification of ballistics specimens from imaging systems is of paramount importance in criminal investigation. Binarization plays a key role in preprocess of recognizing cartridges in the ballistic imaging systems. Unfortunately, it is very difficult to get the satisfactory binary image using existing binary algorithms. In this paper, we utilize the global and local thresholds to enhance the image binarization. Importantly, we present a novel criterion for effectively detecting edges in the images. Comprehensive experiments have been conducted over sample ballistic images. The empirical results demonstrate the proposed method can provide a better solution than existing binary algorithms.
A hierarchical exact accelerated stochastic simulation algorithm
NASA Astrophysics Data System (ADS)
Orendorff, David; Mjolsness, Eric
2012-12-01
A new algorithm, "HiER-leap" (hierarchical exact reaction-leaping), is derived which improves on the computational properties of the ER-leap algorithm for exact accelerated simulation of stochastic chemical kinetics. Unlike ER-leap, HiER-leap utilizes a hierarchical or divide-and-conquer organization of reaction channels into tightly coupled "blocks" and is thereby able to speed up systems with many reaction channels. Like ER-leap, HiER-leap is based on the use of upper and lower bounds on the reaction propensities to define a rejection sampling algorithm with inexpensive early rejection and acceptance steps. But in HiER-leap, large portions of intra-block sampling may be done in parallel. An accept/reject step is used to synchronize across blocks. This method scales well when many reaction channels are present and has desirable asymptotic properties. The algorithm is exact, parallelizable and achieves a significant speedup over the stochastic simulation algorithm and ER-leap on certain problems. This algorithm offers a potentially important step towards efficient in silico modeling of entire organisms.
QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms.
Zwartjes, Ardjan; Havinga, Paul J M; Smit, Gerard J M; Hurink, Johann L
2016-10-01
In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network life time. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning based algorithms using sampled data. An important issue, however, is the training phase of these learning based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning on the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.
Boosting association rule mining in large datasets via Gibbs sampling.
Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua
2016-05-03
Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also a general rule importance measure is proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm.
Monte Carlo sampling in diffusive dynamical systems
NASA Astrophysics Data System (ADS)
Tapias, Diego; Sanders, David P.; Altmann, Eduardo G.
2018-05-01
We introduce a Monte Carlo algorithm to efficiently compute transport properties of chaotic dynamical systems. Our method exploits the importance sampling technique that favors trajectories in the tail of the distribution of displacements, where deviations from a diffusive process are most prominent. We search for initial conditions using a proposal that correlates states in the Markov chain constructed via a Metropolis-Hastings algorithm. We show that our method outperforms the direct sampling method and also Metropolis-Hastings methods with alternative proposals. We test our general method through numerical simulations in 1D (box-map) and 2D (Lorentz gas) systems.
Fiuzy, Mohammad; Haddadnia, Javad; Mollania, Nasrin; Hashemian, Maryam; Hassanpour, Kazem
2012-01-01
Accurate Diagnosis of Breast Cancer is of prime importance. Fine Needle Aspiration test or "FNA", which has been used for several years in Europe, is a simple, inexpensive, noninvasive and accurate technique for detecting breast cancer. Expending the suitable features of the Fine Needle Aspiration results is the most important diagnostic problem in early stages of breast cancer. In this study, we introduced a new algorithm that can detect breast cancer based on combining artificial intelligent system and Fine Needle Aspiration (FNA). We studied the Features of Wisconsin Data Base Cancer which contained about 569 FNA test samples (212 patient samples (malignant) and 357 healthy samples (benign)). In this research, we combined Artificial Intelligence Approaches, such as Evolutionary Algorithm (EA) with Genetic Algorithm (GA), and also used Exact Classifier Systems (here by Fuzzy C-Means (FCM)) to separate malignant from benign samples. Furthermore, we examined artificial Neural Networks (NN) to identify the model and structure. This research proposed a new algorithm for an accurate diagnosis of breast cancer. According to Wisconsin Data Base Cancer (WDBC) data base, 62.75% of samples were benign, and 37.25% were malignant. After applying the proposed algorithm, we achieved high detection accuracy of about "96.579%" on 205 patients who were diagnosed as having breast cancer. It was found that the method had 93% sensitivity, 73% specialty, 65% positive predictive value, and 95% negative predictive value, respectively. If done by experts, Fine Needle Aspiration (FNA) can be a reliable replacement for open biopsy in palpable breast masses. Evaluation of FNA samples during aspiration can decrease insufficient samples. FNA can be the first line of diagnosis in women with breast masses, at least in deprived regions, and may increase health standards and clinical supervision of patients. Such a smart, economical, non-invasive, rapid and accurate system can be introduced as a useful diagnostic system for comprehensive treatment of breast cancer. Another advantage of this method is the possibility of diagnosing breast abnormalities. If done by experts, FNA can be a reliable replacement for open biopsy in palpable breast masses. Evaluation of FNA samples during aspiration can decrease insufficient samples.
NASA Astrophysics Data System (ADS)
Ruf, B.; Erdnuess, B.; Weinmann, M.
2017-08-01
With the emergence of small consumer Unmanned Aerial Vehicles (UAVs), the importance and interest of image-based depth estimation and model generation from aerial images has greatly increased in the photogrammetric society. In our work, we focus on algorithms that allow an online image-based dense depth estimation from video sequences, which enables the direct and live structural analysis of the depicted scene. Therefore, we use a multi-view plane-sweep algorithm with a semi-global matching (SGM) optimization which is parallelized for general purpose computation on a GPU (GPGPU), reaching sufficient performance to keep up with the key-frames of input sequences. One important aspect to reach good performance is the way to sample the scene space, creating plane hypotheses. A small step size between consecutive planes, which is needed to reconstruct details in the near vicinity of the camera may lead to ambiguities in distant regions, due to the perspective projection of the camera. Furthermore, an equidistant sampling with a small step size produces a large number of plane hypotheses, leading to high computational effort. To overcome these problems, we present a novel methodology to directly determine the sampling points of plane-sweep algorithms in image space. The use of the perspective invariant cross-ratio allows us to derive the location of the sampling planes directly from the image data. With this, we efficiently sample the scene space, achieving higher sampling density in areas which are close to the camera and a lower density in distant regions. We evaluate our approach on a synthetic benchmark dataset for quantitative evaluation and on a real-image dataset consisting of aerial imagery. The experiments reveal that an inverse sampling achieves equal and better results than a linear sampling, with less sampling points and thus less runtime. Our algorithm allows an online computation of depth maps for subsequences of five frames, provided that the relative poses between all frames are given.
A Locality-Constrained and Label Embedding Dictionary Learning Algorithm for Image Classification.
Zhengming Li; Zhihui Lai; Yong Xu; Jian Yang; Zhang, David
2017-02-01
Locality and label information of training samples play an important role in image classification. However, previous dictionary learning algorithms do not take the locality and label information of atoms into account together in the learning process, and thus their performance is limited. In this paper, a discriminative dictionary learning algorithm, called the locality-constrained and label embedding dictionary learning (LCLE-DL) algorithm, was proposed for image classification. First, the locality information was preserved using the graph Laplacian matrix of the learned dictionary instead of the conventional one derived from the training samples. Then, the label embedding term was constructed using the label information of atoms instead of the classification error term, which contained discriminating information of the learned dictionary. The optimal coding coefficients derived by the locality-based and label-based reconstruction were effective for image classification. Experimental results demonstrated that the LCLE-DL algorithm can achieve better performance than some state-of-the-art algorithms.
Algorithms for extraction of structural attitudes from 3D outcrop models
NASA Astrophysics Data System (ADS)
Duelis Viana, Camila; Endlein, Arthur; Ademar da Cruz Campanha, Ginaldo; Henrique Grohmann, Carlos
2016-05-01
The acquisition of geological attitudes on rock cuts using traditional field compass survey can be a time consuming, dangerous, or even impossible task depending on the conditions and location of outcrops. The importance of this type of data in rock-mass classifications and structural geology has led to the development of new techniques, in which the application of photogrammetric 3D digital models has had an increasing use. In this paper we present two algorithms for extraction of attitudes of geological discontinuities from virtual outcrop models: ply2atti and scanline, implemented with the Python programming language. The ply2atti algorithm allows for the virtual sampling of planar discontinuities appearing on the 3D model as individual exposed surfaces, while the scanline algorithm allows the sampling of discontinuities (surfaces and traces) along a virtual scanline. Application to digital models of a simplified test setup and a rock cut demonstrated a good correlation between the surveys undertaken using traditional field compass reading and virtual sampling on 3D digital models.
Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun
2017-03-23
The accuracy of any 3D-QSAR, Pharmacophore and 3D-similarity based chemometric target fishing models are highly dependent on a reasonable sample of active conformations. Since a number of diverse conformational sampling algorithm exist, which exhaustively generate enough conformers, however model building methods relies on explicit number of common conformers. In this work, we have attempted to make clustering algorithms, which could find reasonable number of representative conformer ensembles automatically with asymmetric dissimilarity matrix generated from openeye tool kit. RMSD was the important descriptor (variable) of each column of the N × N matrix considered as N variables describing the relationship (network) between the conformer (in a row) and the other N conformers. This approach used to evaluate the performance of the well-known clustering algorithms by comparison in terms of generating representative conformer ensembles and test them over different matrix transformation functions considering the stability. In the network, the representative conformer group could be resampled for four kinds of algorithms with implicit parameters. The directed dissimilarity matrix becomes the only input to the clustering algorithms. Dunn index, Davies-Bouldin index, Eta-squared values and omega-squared values were used to evaluate the clustering algorithms with respect to the compactness and the explanatory power. The evaluation includes the reduction (abstraction) rate of the data, correlation between the sizes of the population and the samples, the computational complexity and the memory usage as well. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original values within 1.13 s per sample at the most. The clustering methods are simple and practical as they are fast and do not ask for any explicit parameters. RCDTC presented the maximum Dunn and omega-squared values of the four algorithms in addition to consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent over different transformation functions. Moreover, the clustering method can also be applied to molecular dynamics sampling simulation results.
Real-time algorithm for acoustic imaging with a microphone array.
Huang, Xun
2009-05-01
Acoustic phased array has become an important testing tool in aeroacoustic research, where the conventional beamforming algorithm has been adopted as a classical processing technique. The computation however has to be performed off-line due to the expensive cost. An innovative algorithm with real-time capability is proposed in this work. The algorithm is similar to a classical observer in the time domain while extended for the array processing to the frequency domain. The observer-based algorithm is beneficial mainly for its capability of operating over sampling blocks recursively. The expensive experimental time can therefore be reduced extensively since any defect in a testing can be corrected instantaneously.
A Fast Algorithm of Convex Hull Vertices Selection for Online Classification.
Ding, Shuguang; Nie, Xiangli; Qiao, Hong; Zhang, Bo
2018-04-01
Reducing samples through convex hull vertices selection (CHVS) within each class is an important and effective method for online classification problems, since the classifier can be trained rapidly with the selected samples. However, the process of CHVS is NP-hard. In this paper, we propose a fast algorithm to select the convex hull vertices, based on the convex hull decomposition and the property of projection. In the proposed algorithm, the quadratic minimization problem of computing the distance between a point and a convex hull is converted into a linear equation problem with a low computational complexity. When the data dimension is high, an approximate, instead of exact, convex hull is allowed to be selected by setting an appropriate termination condition in order to delete more nonimportant samples. In addition, the impact of outliers is also considered, and the proposed algorithm is improved by deleting the outliers in the initial procedure. Furthermore, a dimension convention technique via the kernel trick is used to deal with nonlinearly separable problems. An upper bound is theoretically proved for the difference between the support vector machines based on the approximate convex hull vertices selected and all the training samples. Experimental results on both synthetic and real data sets show the effectiveness and validity of the proposed algorithm.
Randomized algorithms for high quality treatment planning in volumetric modulated arc therapy
NASA Astrophysics Data System (ADS)
Yang, Yu; Dong, Bin; Wen, Zaiwen
2017-02-01
In recent years, volumetric modulated arc therapy (VMAT) has been becoming a more and more important radiation technique widely used in clinical application for cancer treatment. One of the key problems in VMAT is treatment plan optimization, which is complicated due to the constraints imposed by the involved equipments. In this paper, we consider a model with four major constraints: the bound on the beam intensity, an upper bound on the rate of the change of the beam intensity, the moving speed of leaves of the multi-leaf collimator (MLC) and its directional-convexity. We solve the model by a two-stage algorithm: performing minimization with respect to the shapes of the aperture and the beam intensities alternatively. Specifically, the shapes of the aperture are obtained by a greedy algorithm whose performance is enhanced by random sampling in the leaf pairs with a decremental rate. The beam intensity is optimized using a gradient projection method with non-monotonic line search. We further improve the proposed algorithm by an incremental random importance sampling of the voxels to reduce the computational cost of the energy functional. Numerical simulations on two clinical cancer date sets demonstrate that our method is highly competitive to the state-of-the-art algorithms in terms of both computational time and quality of treatment planning.
[Purity Detection Model Update of Maize Seeds Based on Active Learning].
Tang, Jin-ya; Huang, Min; Zhu, Qi-bing
2015-08-01
Seed purity reflects the degree of seed varieties in typical consistent characteristics, so it is great important to improve the reliability and accuracy of seed purity detection to guarantee the quality of seeds. Hyperspectral imaging can reflect the internal and external characteristics of seeds at the same time, which has been widely used in nondestructive detection of agricultural products. The essence of nondestructive detection of agricultural products using hyperspectral imaging technique is to establish the mathematical model between the spectral information and the quality of agricultural products. Since the spectral information is easily affected by the sample growth environment, the stability and generalization of model would weaken when the test samples harvested from different origin and year. Active learning algorithm was investigated to add representative samples to expand the sample space for the original model, so as to implement the rapid update of the model's ability. Random selection (RS) and Kennard-Stone algorithm (KS) were performed to compare the model update effect with active learning algorithm. The experimental results indicated that in the division of different proportion of sample set (1:1, 3:1, 4:1), the updated purity detection model for maize seeds from 2010 year which was added 40 samples selected by active learning algorithm from 2011 year increased the prediction accuracy for 2011 new samples from 47%, 33.75%, 49% to 98.89%, 98.33%, 98.33%. For the updated purity detection model of 2011 year, its prediction accuracy for 2010 new samples increased by 50.83%, 54.58%, 53.75% to 94.57%, 94.02%, 94.57% after adding 56 new samples from 2010 year. Meanwhile the effect of model updated by active learning algorithm was better than that of RS and KS. Therefore, the update for purity detection model of maize seeds is feasible by active learning algorithm.
Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi
2017-11-02
Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
NASA Astrophysics Data System (ADS)
Castelletti, Davide; Demir, Begüm; Bruzzone, Lorenzo
2014-10-01
This paper presents a novel semisupervised learning (SSL) technique defined in the context of ɛ-insensitive support vector regression (SVR) to estimate biophysical parameters from remotely sensed images. The proposed SSL method aims to mitigate the problems of small-sized biased training sets without collecting any additional samples with reference measures. This is achieved on the basis of two consecutive steps. The first step is devoted to inject additional priors information in the learning phase of the SVR in order to adapt the importance of each training sample according to distribution of the unlabeled samples. To this end, a weight is initially associated to each training sample based on a novel strategy that defines higher weights for the samples located in the high density regions of the feature space while giving reduced weights to those that fall into the low density regions of the feature space. Then, in order to exploit different weights for training samples in the learning phase of the SVR, we introduce a weighted SVR (WSVR) algorithm. The second step is devoted to jointly exploit labeled and informative unlabeled samples for further improving the definition of the WSVR learning function. To this end, the most informative unlabeled samples that have an expected accurate target values are initially selected according to a novel strategy that relies on the distribution of the unlabeled samples in the feature space and on the WSVR function estimated at the first step. Then, we introduce a restructured WSVR algorithm that jointly uses labeled and unlabeled samples in the learning phase of the WSVR algorithm and tunes their importance by different values of regularization parameters. Experimental results obtained for the estimation of single-tree stem volume show the effectiveness of the proposed SSL method.
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au
In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of amore » random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.« less
Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch
2017-06-06
An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptors selection, model development and predictions for test set samples using applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RNASAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cells libraries highlighting interesting dependencies of PV properties on MO compositions.
Fiuzy, Mohammad; Haddadnia, Javad; Mollania, Nasrin; Hashemian, Maryam; Hassanpour, Kazem
2012-01-01
Background Accurate Diagnosis of Breast Cancer is of prime importance. Fine Needle Aspiration test or "FNA”, which has been used for several years in Europe, is a simple, inexpensive, noninvasive and accurate technique for detecting breast cancer. Expending the suitable features of the Fine Needle Aspiration results is the most important diagnostic problem in early stages of breast cancer. In this study, we introduced a new algorithm that can detect breast cancer based on combining artificial intelligent system and Fine Needle Aspiration (FNA). Methods We studied the Features of Wisconsin Data Base Cancer which contained about 569 FNA test samples (212 patient samples (malignant) and 357 healthy samples (benign)). In this research, we combined Artificial Intelligence Approaches, such as Evolutionary Algorithm (EA) with Genetic Algorithm (GA), and also used Exact Classifier Systems (here by Fuzzy C-Means (FCM)) to separate malignant from benign samples. Furthermore, we examined artificial Neural Networks (NN) to identify the model and structure. This research proposed a new algorithm for an accurate diagnosis of breast cancer. Results According to Wisconsin Data Base Cancer (WDBC) data base, 62.75% of samples were benign, and 37.25% were malignant. After applying the proposed algorithm, we achieved high detection accuracy of about "96.579%” on 205 patients who were diagnosed as having breast cancer. It was found that the method had 93% sensitivity, 73% specialty, 65% positive predictive value, and 95% negative predictive value, respectively. If done by experts, Fine Needle Aspiration (FNA) can be a reliable replacement for open biopsy in palpable breast masses. Evaluation of FNA samples during aspiration can decrease insufficient samples. FNA can be the first line of diagnosis in women with breast masses, at least in deprived regions, and may increase health standards and clinical supervision of patients. Conclusion Such a smart, economical, non-invasive, rapid and accurate system can be introduced as a useful diagnostic system for comprehensive treatment of breast cancer. Another advantage of this method is the possibility of diagnosing breast abnormalities. If done by experts, FNA can be a reliable replacement for open biopsy in palpable breast masses. Evaluation of FNA samples during aspiration can decrease insufficient samples. PMID:25352966
Alici, Ibrahim Onur; Yılmaz Demirci, Nilgün; Yılmaz, Aydın; Karakaya, Jale; Özaydın, Esra
2016-09-01
There are several papers on the sonographic features of mediastinal lymph nodes affected by several diseases, but none gives the importance and clinical utility of the features. In order to find out which lymph node should be sampled in a particular nodal station during endobronchial ultrasound, we investigated the diagnostic performances of certain sonographic features and proposed an algorithmic approach. We retrospectively analyzed 1051 lymph nodes and randomly assigned them into a preliminary experimental and a secondary study group. The diagnostic performances of the sonographic features (gray scale, echogeneity, shape, size, margin, presence of necrosis, presence of calcification and absence of central hilar structure) were calculated, and an algorithm for lymph node sampling was obtained with decision tree analysis in the experimental group. Later, a modified algorithm was applied to the patients in the study group to give the accuracy. The demographic characteristics of the patients were not statistically significant between the primary and the secondary groups. All of the features were discriminative between malignant and benign diseases. The modified algorithm sensitivity, specificity, and positive and negative predictive values and diagnostic accuracy for detecting metastatic lymph nodes were 100%, 51.2%, 50.6%, 100% and 67.5%, respectively. In this retrospective analysis, the standardized sonographic classification system and the proposed algorithm performed well in choosing the node that should be sampled in a particular station during endobronchial ultrasound. © 2015 John Wiley & Sons Ltd.
High performance transcription factor-DNA docking with GPU computing
2012-01-01
Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
Stochastic subset selection for learning with kernel machines.
Rhinelander, Jason; Liu, Xiaoping P
2012-06-01
Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.
Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.
Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to insure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgrpah Mining (FSM), and Community Detection (CD). Wemore » explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual phenomena, as the sample size decreases. We present the empirical results to show that the performance degradation follows a logistic function.« less
Advisory Algorithm for Scheduling Open Sectors, Operating Positions, and Workstations
NASA Technical Reports Server (NTRS)
Bloem, Michael; Drew, Michael; Lai, Chok Fung; Bilimoria, Karl D.
2012-01-01
Air traffic controller supervisors configure available sector, operating position, and work-station resources to safely and efficiently control air traffic in a region of airspace. In this paper, an algorithm for assisting supervisors with this task is described and demonstrated on two sample problem instances. The algorithm produces configuration schedule advisories that minimize a cost. The cost is a weighted sum of two competing costs: one penalizing mismatches between configurations and predicted air traffic demand and another penalizing the effort associated with changing configurations. The problem considered by the algorithm is a shortest path problem that is solved with a dynamic programming value iteration algorithm. The cost function contains numerous parameters. Default values for most of these are suggested based on descriptions of air traffic control procedures and subject-matter expert feedback. The parameter determining the relative importance of the two competing costs is tuned by comparing historical configurations with corresponding algorithm advisories. Two sample problem instances for which appropriate configuration advisories are obvious were designed to illustrate characteristics of the algorithm. Results demonstrate how the algorithm suggests advisories that appropriately utilize changes in airspace configurations and changes in the number of operating positions allocated to each open sector. The results also demonstrate how the advisories suggest appropriate times for configuration changes.
Development and Evaluation of Algorithms for Breath Alcohol Screening.
Ljungblad, Jonas; Hök, Bertil; Ekström, Mikael
2016-04-01
Breath alcohol screening is important for traffic safety, access control and other areas of health promotion. A family of sensor devices useful for these purposes is being developed and evaluated. This paper is focusing on algorithms for the determination of breath alcohol concentration in diluted breath samples using carbon dioxide to compensate for the dilution. The examined algorithms make use of signal averaging, weighting and personalization to reduce estimation errors. Evaluation has been performed by using data from a previously conducted human study. It is concluded that these features in combination will significantly reduce the random error compared to the signal averaging algorithm taken alone.
A survey of the state-of-the-art and focused research in range systems
NASA Technical Reports Server (NTRS)
Kung, Yao; Balakrishnan, A. V.
1988-01-01
In this one-year renewal of NASA Contract No. 2-304, basic research, development, and implementation in the areas of modern estimation algorithms and digital communication systems have been performed. In the first area, basic study on the conversion of general classes of practical signal processing algorithms into systolic array algorithms is considered, producing four publications. Also studied were the finite word length effects and convergence rates of lattice algorithms, producing two publications. In the second area of study, the use of efficient importance sampling simulation technique for the evaluation of digital communication system performances were studied, producing two publications.
Quantum speedup of Monte Carlo methods.
Montanaro, Ashley
2015-09-08
Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently.
Quantum speedup of Monte Carlo methods
Montanaro, Ashley
2015-01-01
Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently. PMID:26528079
Kundu, Anupam; Sabhapandit, Sanjib; Dhar, Abhishek
2011-03-01
We present an algorithm for finding the probabilities of rare events in nonequilibrium processes. The algorithm consists of evolving the system with a modified dynamics for which the required event occurs more frequently. By keeping track of the relative weight of phase-space trajectories generated by the modified and the original dynamics one can obtain the required probabilities. The algorithm is tested on two model systems of steady-state particle and heat transport where we find a huge improvement from direct simulation methods.
An Intrinsic Algorithm for Parallel Poisson Disk Sampling on Arbitrary Surfaces.
Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying
2013-03-08
Poisson disk sampling plays an important role in a variety of visual computing, due to its useful statistical property in distribution and the absence of aliasing artifacts. While many effective techniques have been proposed to generate Poisson disk distribution in Euclidean space, relatively few work has been reported to the surface counterpart. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. We propose a new technique for parallelizing the dart throwing. Rather than the conventional approaches that explicitly partition the spatial domain to generate the samples in parallel, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. It is worth noting that our algorithm is accurate as the generated Poisson disks are uniformly and randomly distributed without bias. Our method is intrinsic in that all the computations are based on the intrinsic metric and are independent of the embedding space. This intrinsic feature allows us to generate Poisson disk distributions on arbitrary surfaces. Furthermore, by manipulating the spatially varying density function, we can obtain adaptive sampling easily.
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
Shannon, Casey P; Chen, Virginia; Takhar, Mandeep; Hollander, Zsuzsanna; Balshaw, Robert; McManus, Bruce M; Tebbutt, Scott J; Sin, Don D; Ng, Raymond T
2016-11-14
Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules from whole transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance behavior is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how the outputs of these computational methods are sensitive to the input sample set, or stability. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically-relevant cohort, using two different gene network module discovery algorithms. The stability of modules increased as sample size increased and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permutated gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases. The SABRE procedure and proposed stability criterion may provide guidance when designing systems biology studies in complex human disease and tissues.
Entropy-Based Search Algorithm for Experimental Design
NASA Astrophysics Data System (ADS)
Malakar, N. K.; Knuth, K. H.
2011-03-01
The scientific method relies on the iterated processes of inference and inquiry. The inference phase consists of selecting the most probable models based on the available data; whereas the inquiry phase consists of using what is known about the models to select the most relevant experiment. Optimizing inquiry involves searching the parameterized space of experiments to select the experiment that promises, on average, to be maximally informative. In the case where it is important to learn about each of the model parameters, the relevance of an experiment is quantified by Shannon entropy of the distribution of experimental outcomes predicted by a probable set of models. If the set of potential experiments is described by many parameters, we must search this high-dimensional entropy space. Brute force search methods will be slow and computationally expensive. We present an entropy-based search algorithm, called nested entropy sampling, to select the most informative experiment for efficient experimental design. This algorithm is inspired by Skilling's nested sampling algorithm used in inference and borrows the concept of a rising threshold while a set of experiment samples are maintained. We demonstrate that this algorithm not only selects highly relevant experiments, but also is more efficient than brute force search. Such entropic search techniques promise to greatly benefit autonomous experimental design.
An Adaptive Defect Weighted Sampling Algorithm to Design Pseudoknotted RNA Secondary Structures
Zandi, Kasra; Butler, Gregory; Kharma, Nawwaf
2016-01-01
Computational design of RNA sequences that fold into targeted secondary structures has many applications in biomedicine, nanotechnology and synthetic biology. An RNA molecule is made of different types of secondary structure elements and an important RNA element named pseudoknot plays a key role in stabilizing the functional form of the molecule. However, due to the computational complexities associated with characterizing pseudoknotted RNA structures, most of the existing RNA sequence designer algorithms generally ignore this important structural element and therefore limit their applications. In this paper we present a new algorithm to design RNA sequences for pseudoknotted secondary structures. We use NUPACK as the folding algorithm to compute the equilibrium characteristics of the pseudoknotted RNAs, and describe a new adaptive defect weighted sampling algorithm named Enzymer to design low ensemble defect RNA sequences for targeted secondary structures including pseudoknots. We used a biological data set of 201 pseudoknotted structures from the Pseudobase library to benchmark the performance of our algorithm. We compared the quality characteristics of the RNA sequences we designed by Enzymer with the results obtained from the state of the art MODENA and antaRNA. Our results show our method succeeds more frequently than MODENA and antaRNA do, and generates sequences that have lower ensemble defect, lower probability defect and higher thermostability. Finally by using Enzymer and by constraining the design to a naturally occurring and highly conserved Hammerhead motif, we designed 8 sequences for a pseudoknotted cis-acting Hammerhead ribozyme. Enzymer is available for download at https://bitbucket.org/casraz/enzymer. PMID:27499762
Twenty-four cases of imported zika virus infections diagnosed by molecular methods.
Alejo-Cancho, Izaskun; Torner, Nuria; Oliveira, Inés; Martínez, Ana; Muñoz, José; Jane, Mireia; Gascón, Joaquim; Requena-Méndez, Ana; Vilella, Anna; Marcos, M Ángeles; Pinazo, María Jesús; Gonzalo, Verónica; Rodriguez, Natalia; Martínez, Miguel J
2016-10-01
Zika virus is an emerging flavivirus widely spreading through Latin America. Molecular diagnosis of the infection can be performed using serum, urine and saliva samples, although a well-defined diagnostic algorithm is not yet established. We describe a series of 24 cases of imported zika virus infection into Catalonia (northeastern Spain). Based on our findings, testing of paired serum and urine samples is recommended. Copyright © 2016 Elsevier Inc. All rights reserved.
Skull removal in MR images using a modified artificial bee colony optimization algorithm.
Taherdangkoo, Mohammad
2014-01-01
Removal of the skull from brain Magnetic Resonance (MR) images is an important preprocessing step required for other image analysis techniques such as brain tissue segmentation. In this paper, we propose a new algorithm based on the Artificial Bee Colony (ABC) optimization algorithm to remove the skull region from brain MR images. We modify the ABC algorithm using a different strategy for initializing the coordinates of scout bees and their direction of search. Moreover, we impose an additional constraint to the ABC algorithm to avoid the creation of discontinuous regions. We found that our algorithm successfully removed all bony skull from a sample of de-identified MR brain images acquired from different model scanners. The obtained results of the proposed algorithm compared with those of previously introduced well known optimization algorithms such as Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) demonstrate the superior results and computational performance of our algorithm, suggesting its potential for clinical applications.
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
Sampling solution traces for the problem of sorting permutations by signed reversals
2012-01-01
Background Traditional algorithms to solve the problem of sorting by signed reversals output just one optimal solution while the space of all optimal solutions can be huge. A so-called trace represents a group of solutions which share the same set of reversals that must be applied to sort the original permutation following a partial ordering. By using traces, we therefore can represent the set of optimal solutions in a more compact way. Algorithms for enumerating the complete set of traces of solutions were developed. However, due to their exponential complexity, their practical use is limited to small permutations. A partial enumeration of traces is a sampling of the complete set of traces and can be an alternative for the study of distinct evolutionary scenarios of big permutations. Ideally, the sampling should be done uniformly from the space of all optimal solutions. This is however conjectured to be ♯P-complete. Results We propose and evaluate three algorithms for producing a sampling of the complete set of traces that instead can be shown in practice to preserve some of the characteristics of the space of all solutions. The first algorithm (RA) performs the construction of traces through a random selection of reversals on the list of optimal 1-sequences. The second algorithm (DFALT) consists in a slight modification of an algorithm that performs the complete enumeration of traces. Finally, the third algorithm (SWA) is based on a sliding window strategy to improve the enumeration of traces. All proposed algorithms were able to enumerate traces for permutations with up to 200 elements. Conclusions We analysed the distribution of the enumerated traces with respect to their height and average reversal length. Various works indicate that the reversal length can be an important aspect in genome rearrangements. The algorithms RA and SWA show a tendency to lose traces with high average reversal length. Such traces are however rare, and qualitatively our results show that, for testable-sized permutations, the algorithms DFALT and SWA produce distributions which approximate the reversal length distributions observed with a complete enumeration of the set of traces. PMID:22704580
Neural Network and Nearest Neighbor Algorithms for Enhancing Sampling of Molecular Dynamics.
Galvelis, Raimondas; Sugita, Yuji
2017-06-13
The free energy calculations of complex chemical and biological systems with molecular dynamics (MD) are inefficient due to multiple local minima separated by high-energy barriers. The minima can be escaped using an enhanced sampling method such as metadynamics, which apply bias (i.e., importance sampling) along a set of collective variables (CV), but the maximum number of CVs (or dimensions) is severely limited. We propose a high-dimensional bias potential method (NN2B) based on two machine learning algorithms: the nearest neighbor density estimator (NNDE) and the artificial neural network (ANN) for the bias potential approximation. The bias potential is constructed iteratively from short biased MD simulations accounting for correlation among CVs. Our method is capable of achieving ergodic sampling and calculating free energy of polypeptides with up to 8-dimensional bias potential.
Waldispühl, Jérôme; Ponty, Yann
2011-11-01
The analysis of the relationship between sequences and structures (i.e., how mutations affect structures and reciprocally how structures influence mutations) is essential to decipher the principles driving molecular evolution, to infer the origins of genetic diseases, and to develop bioengineering applications such as the design of artificial molecules. Because their structures can be predicted from the sequence data only, RNA molecules provide a good framework to study this sequence-structure relationship. We recently introduced a suite of algorithms called RNAmutants which allows a complete exploration of RNA sequence-structure maps in polynomial time and space. Formally, RNAmutants takes an input sequence (or seed) to compute the Boltzmann-weighted ensembles of mutants with exactly k mutations, and sample mutations from these ensembles. However, this approach suffers from major limitations. Indeed, since the Boltzmann probabilities of the mutations depend of the free energy of the structures, RNAmutants has difficulties to sample mutant sequences with low G+C-contents. In this article, we introduce an unbiased adaptive sampling algorithm that enables RNAmutants to sample regions of the mutational landscape poorly covered by classical algorithms. We applied these methods to sample mutations with low G+C-contents. These adaptive sampling techniques can be easily adapted to explore other regions of the sequence and structural landscapes which are difficult to sample. Importantly, these algorithms come at a minimal computational cost. We demonstrate the insights offered by these techniques on studies of complete RNA sequence structures maps of sizes up to 40 nucleotides. Our results indicate that the G+C-content has a strong influence on the size and shape of the evolutionary accessible sequence and structural spaces. In particular, we show that low G+C-contents favor the apparition of internal loops and thus possibly the synthesis of tertiary structure motifs. On the other hand, high G+C-contents significantly reduce the size of the evolutionary accessible mutational landscapes.
MontePython 3: Parameter inference code for cosmology
NASA Astrophysics Data System (ADS)
Brinckmann, Thejs; Lesgourgues, Julien; Audren, Benjamin; Benabed, Karim; Prunet, Simon
2018-05-01
MontePython 3 provides numerous ways to explore parameter space using Monte Carlo Markov Chain (MCMC) sampling, including Metropolis-Hastings, Nested Sampling, Cosmo Hammer, and a Fisher sampling method. This improved version of the Monte Python (ascl:1307.002) parameter inference code for cosmology offers new ingredients that improve the performance of Metropolis-Hastings sampling, speeding up convergence and offering significant time improvement in difficult runs. Additional likelihoods and plotting options are available, as are post-processing algorithms such as Importance Sampling and Adding Derived Parameter.
Monte Carlo simulation of a photodisintegration of 3 H experiment in Geant4
NASA Astrophysics Data System (ADS)
Gray, Isaiah
2013-10-01
An upcoming experiment involving photodisintegration of 3 H at the High Intensity Gamma-Ray Source facility at Duke University has been simulated in the software package Geant4. CAD models of silicon detectors and wire chambers were imported from Autodesk Inventor using the program FastRad and the Geant4 GDML importer. Sensitive detectors were associated with the appropriate logical volumes in the exported GDML file so that changes in detector geometry will be easily manifested in the simulation. Probability distribution functions for the energy and direction of outgoing protons were generated using numerical tables from previous theory, and energies and directions were sampled from these distributions using a rejection sampling algorithm. The simulation will be a useful tool to optimize detector geometry, estimate background rates, and test data analysis algorithms. This work was supported by the Triangle Universities Nuclear Laboratory REU program at Duke University.
NASA Astrophysics Data System (ADS)
Wang, Ershen; Jia, Chaoying; Tong, Gang; Qu, Pingping; Lan, Xiaoyu; Pang, Tao
2018-03-01
The receiver autonomous integrity monitoring (RAIM) is one of the most important parts in an avionic navigation system. Two problems need to be addressed to improve this system, namely, the degeneracy phenomenon and lack of samples for the standard particle filter (PF). However, the number of samples cannot adequately express the real distribution of the probability density function (i.e., sample impoverishment). This study presents a GPS receiver autonomous integrity monitoring (RAIM) method based on a chaos particle swarm optimization particle filter (CPSO-PF) algorithm with a log likelihood ratio. The chaos sequence generates a set of chaotic variables, which are mapped to the interval of optimization variables to improve particle quality. This chaos perturbation overcomes the potential for the search to become trapped in a local optimum in the particle swarm optimization (PSO) algorithm. Test statistics are configured based on a likelihood ratio, and satellite fault detection is then conducted by checking the consistency between the state estimate of the main PF and those of the auxiliary PFs. Based on GPS data, the experimental results demonstrate that the proposed algorithm can effectively detect and isolate satellite faults under conditions of non-Gaussian measurement noise. Moreover, the performance of the proposed novel method is better than that of RAIM based on the PF or PSO-PF algorithm.
Finite element model updating using the shadow hybrid Monte Carlo technique
NASA Astrophysics Data System (ADS)
Boulkaibet, I.; Mthembu, L.; Marwala, T.; Friswell, M. I.; Adhikari, S.
2015-02-01
Recent research in the field of finite element model updating (FEM) advocates the adoption of Bayesian analysis techniques to dealing with the uncertainties associated with these models. However, Bayesian formulations require the evaluation of the Posterior Distribution Function which may not be available in analytical form. This is the case in FEM updating. In such cases sampling methods can provide good approximations of the Posterior distribution when implemented in the Bayesian context. Markov Chain Monte Carlo (MCMC) algorithms are the most popular sampling tools used to sample probability distributions. However, the efficiency of these algorithms is affected by the complexity of the systems (the size of the parameter space). The Hybrid Monte Carlo (HMC) offers a very important MCMC approach to dealing with higher-dimensional complex problems. The HMC uses the molecular dynamics (MD) steps as the global Monte Carlo (MC) moves to reach areas of high probability where the gradient of the log-density of the Posterior acts as a guide during the search process. However, the acceptance rate of HMC is sensitive to the system size as well as the time step used to evaluate the MD trajectory. To overcome this limitation we propose the use of the Shadow Hybrid Monte Carlo (SHMC) algorithm. The SHMC algorithm is a modified version of the Hybrid Monte Carlo (HMC) and designed to improve sampling for large-system sizes and time steps. This is done by sampling from a modified Hamiltonian function instead of the normal Hamiltonian function. In this paper, the efficiency and accuracy of the SHMC method is tested on the updating of two real structures; an unsymmetrical H-shaped beam structure and a GARTEUR SM-AG19 structure and is compared to the application of the HMC algorithm on the same structures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gongzhang, R.; Xiao, B.; Lardner, T.
2014-02-18
This paper presents a robust frequency diversity based algorithm for clutter reduction in ultrasonic A-scan waveforms. The performance of conventional spectral-temporal techniques like Split Spectrum Processing (SSP) is highly dependent on the parameter selection, especially when the signal to noise ratio (SNR) is low. Although spatial beamforming offers noise reduction with less sensitivity to parameter variation, phased array techniques are not always available. The proposed algorithm first selects an ascending series of frequency bands. A signal is reconstructed for each selected band in which a defect is present when all frequency components are in uniform sign. Combining all reconstructed signalsmore » through averaging gives a probability profile of potential defect position. To facilitate data collection and validate the proposed algorithm, Full Matrix Capture is applied on the austenitic steel and high nickel alloy (HNA) samples with 5MHz transducer arrays. When processing A-scan signals with unrefined parameters, the proposed algorithm enhances SNR by 20dB for both samples and consequently, defects are more visible in B-scan images created from the large amount of A-scan traces. Importantly, the proposed algorithm is considered robust, while SSP is shown to fail on the austenitic steel data and achieves less SNR enhancement on the HNA data.« less
NASA Astrophysics Data System (ADS)
Balasubramanian, Priya S.; Guo, Jiaqi; Yao, Xinwen; Qu, Dovina; Lu, Helen H.; Hendon, Christine P.
2017-02-01
The directionality of collagen fibers across the anterior cruciate ligament (ACL) as well as the insertion of this key ligament into bone are important for understanding the mechanical integrity and functionality of this complex tissue. Quantitative analysis of three-dimensional fiber directionality is of particular interest due to the physiological, mechanical, and biological heterogeneity inherent across the ACL-to-bone junction, the behavior of the ligament under mechanical stress, and the usefulness of this information in designing tissue engineered grafts. We have developed an algorithm to characterize Optical Coherence Tomography (OCT) image volumes of the ACL. We present an automated algorithm for measuring ligamentous fiber angles, and extracting attenuation and backscattering coefficients of ligament, interface, and bone regions within mature and immature bovine ACL insertion samples. Future directions include translating this algorithm for real time processing to allow three-dimensional volumetric analysis within dynamically moving samples.
DSP Synthesis Algorithm for Generating Florida Scrub Jay Calls
NASA Technical Reports Server (NTRS)
Lane, John; Pittman, Tyler
2017-01-01
A prototype digital signal processing (DSP) algorithm has been developed to approximate Florida scrub jay calls. The Florida scrub jay (Aphelocoma coerulescens), believed to have been in existence for 2 million years, living only in Florida, has a complicated social system that is evident by examining the spectrograms of its calls. Audio data was acquired at the Helen and Allan Cruickshank Sanctuary, Rockledge, Florida during the 2016 mating season using three digital recorders sampling at 44.1 kHz. The synthesis algorithm is a first step at developing a robust identification and call analysis algorithm. Since the Florida scrub jay is severely threatened by loss of habitat, it is important to develop effective methods to monitor their threatened population using autonomous means.
Quantized Spectral Compressed Sensing: Cramer–Rao Bounds and Recovery Algorithms
NASA Astrophysics Data System (ADS)
Fu, Haoyu; Chi, Yuejie
2018-06-01
Efficient estimation of wideband spectrum is of great importance for applications such as cognitive radio. Recently, sub-Nyquist sampling schemes based on compressed sensing have been proposed to greatly reduce the sampling rate. However, the important issue of quantization has not been fully addressed, particularly for high-resolution spectrum and parameter estimation. In this paper, we aim to recover spectrally-sparse signals and the corresponding parameters, such as frequency and amplitudes, from heavy quantizations of their noisy complex-valued random linear measurements, e.g. only the quadrant information. We first characterize the Cramer-Rao bound under Gaussian noise, which highlights the trade-off between sample complexity and bit depth under different signal-to-noise ratios for a fixed budget of bits. Next, we propose a new algorithm based on atomic norm soft thresholding for signal recovery, which is equivalent to proximal mapping of properly designed surrogate signals with respect to the atomic norm that motivates spectral sparsity. The proposed algorithm can be applied to both the single measurement vector case, as well as the multiple measurement vector case. It is shown that under the Gaussian measurement model, the spectral signals can be reconstructed accurately with high probability, as soon as the number of quantized measurements exceeds the order of K log n, where K is the level of spectral sparsity and $n$ is the signal dimension. Finally, numerical simulations are provided to validate the proposed approaches.
NASA Astrophysics Data System (ADS)
Williams, Arnold C.; Pachowicz, Peter W.
2004-09-01
Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms are of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently over trained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision Laboratory and Electron Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results on mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.
A Segment-Based Trajectory Similarity Measure in the Urban Transportation Systems.
Mao, Yingchi; Zhong, Haishi; Xiao, Xianjian; Li, Xiaofang
2017-03-06
With the rapid spread of built-in GPS handheld smart devices, the trajectory data from GPS sensors has grown explosively. Trajectory data has spatio-temporal characteristics and rich information. Using trajectory data processing techniques can mine the patterns of human activities and the moving patterns of vehicles in the intelligent transportation systems. A trajectory similarity measure is one of the most important issues in trajectory data mining (clustering, classification, frequent pattern mining, etc.). Unfortunately, the main similarity measure algorithms with the trajectory data have been found to be inaccurate, highly sensitive of sampling methods, and have low robustness for the noise data. To solve the above problems, three distances and their corresponding computation methods are proposed in this paper. The point-segment distance can decrease the sensitivity of the point sampling methods. The prediction distance optimizes the temporal distance with the features of trajectory data. The segment-segment distance introduces the trajectory shape factor into the similarity measurement to improve the accuracy. The three kinds of distance are integrated with the traditional dynamic time warping algorithm (DTW) algorithm to propose a new segment-based dynamic time warping algorithm (SDTW). The experimental results show that the SDTW algorithm can exhibit about 57%, 86%, and 31% better accuracy than the longest common subsequence algorithm (LCSS), and edit distance on real sequence algorithm (EDR) , and DTW, respectively, and that the sensitivity to the noise data is lower than that those algorithms.
Zhao, Xiaowei; Ning, Qiao; Chai, Haiting; Ma, Zhiqiang
2015-06-07
As a widespread type of protein post-translational modifications (PTMs), succinylation plays an important role in regulating protein conformation, function and physicochemical properties. Compared with the labor-intensive and time-consuming experimental approaches, computational predictions of succinylation sites are much desirable due to their convenient and fast speed. Currently, numerous computational models have been developed to identify PTMs sites through various types of two-class machine learning algorithms. These methods require both positive and negative samples for training. However, designation of the negative samples of PTMs was difficult and if it is not properly done can affect the performance of computational models dramatically. So that in this work, we implemented the first application of positive samples only learning (PSoL) algorithm to succinylation sites prediction problem, which was a special class of semi-supervised machine learning that used positive samples and unlabeled samples to train the model. Meanwhile, we proposed a novel succinylation sites computational predictor called SucPred (succinylation site predictor) by using multiple feature encoding schemes. Promising results were obtained by the SucPred predictor with an accuracy of 88.65% using 5-fold cross validation on the training dataset and an accuracy of 84.40% on the independent testing dataset, which demonstrated that the positive samples only learning algorithm presented here was particularly useful for identification of protein succinylation sites. Besides, the positive samples only learning algorithm can be applied to build predictors for other types of PTMs sites with ease. A web server for predicting succinylation sites was developed and was freely accessible at http://59.73.198.144:8088/SucPred/. Copyright © 2015 Elsevier Ltd. All rights reserved.
Spectral gap optimization of order parameters for sampling complex molecular systems
Tiwary, Pratyush; Berne, B. J.
2016-01-01
In modern-day simulations of many-body systems, much of the computational complexity is shifted to the identification of slowly changing molecular order parameters called collective variables (CVs) or reaction coordinates. A vast array of enhanced-sampling methods are based on the identification and biasing of these low-dimensional order parameters, whose fluctuations are important in driving rare events of interest. Here, we describe a new algorithm for finding optimal low-dimensional CVs for use in enhanced-sampling biasing methods like umbrella sampling, metadynamics, and related methods, when limited prior static and dynamic information is known about the system, and a much larger set of candidate CVs is specified. The algorithm involves estimating the best combination of these candidate CVs, as quantified by a maximum path entropy estimate of the spectral gap for dynamics viewed as a function of that CV. The algorithm is called spectral gap optimization of order parameters (SGOOP). Through multiple practical examples, we show how this postprocessing procedure can lead to optimization of CV and several orders of magnitude improvement in the convergence of the free energy calculated through metadynamics, essentially giving the ability to extract useful information even from unsuccessful metadynamics runs. PMID:26929365
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
Random Numbers and Monte Carlo Methods
NASA Astrophysics Data System (ADS)
Scherer, Philipp O. J.
Many-body problems often involve the calculation of integrals of very high dimension which cannot be treated by standard methods. For the calculation of thermodynamic averages Monte Carlo methods are very useful which sample the integration volume at randomly chosen points. After summarizing some basic statistics, we discuss algorithms for the generation of pseudo-random numbers with given probability distribution which are essential for all Monte Carlo methods. We show how the efficiency of Monte Carlo integration can be improved by sampling preferentially the important configurations. Finally the famous Metropolis algorithm is applied to classical many-particle systems. Computer experiments visualize the central limit theorem and apply the Metropolis method to the traveling salesman problem.
Leaf Area Index Estimation Using Chinese GF-1 Wide Field View Data in an Agriculture Region.
Wei, Xiangqin; Gu, Xingfa; Meng, Qingyan; Yu, Tao; Zhou, Xiang; Wei, Zheng; Jia, Kun; Wang, Chunmei
2017-07-08
Leaf area index (LAI) is an important vegetation parameter that characterizes leaf density and canopy structure, and plays an important role in global change study, land surface process simulation and agriculture monitoring. The wide field view (WFV) sensor on board the Chinese GF-1 satellite can acquire multi-spectral data with decametric spatial resolution, high temporal resolution and wide coverage, which are valuable data sources for dynamic monitoring of LAI. Therefore, an automatic LAI estimation algorithm for GF-1 WFV data was developed based on the radiative transfer model and LAI estimation accuracy of the developed algorithm was assessed in an agriculture region with maize as the dominated crop type. The radiative transfer model was firstly used to simulate the physical relationship between canopy reflectance and LAI under different soil and vegetation conditions, and then the training sample dataset was formed. Then, neural networks (NNs) were used to develop the LAI estimation algorithm using the training sample dataset. Green, red and near-infrared band reflectances of GF-1 WFV data were used as the input variables of the NNs, as well as the corresponding LAI was the output variable. The validation results using field LAI measurements in the agriculture region indicated that the LAI estimation algorithm could achieve satisfactory results (such as R² = 0.818, RMSE = 0.50). In addition, the developed LAI estimation algorithm had potential to operationally generate LAI datasets using GF-1 WFV land surface reflectance data, which could provide high spatial and temporal resolution LAI data for agriculture, ecosystem and environmental management researches.
Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.
Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdağ, Doruk; Kaya, Kamer; Çatalyürek, Ümit V
2016-01-01
Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is the query-based search since gene co-expressions may indicate a shared role in a biological process. Although there exist promising query-driven search methods adapting clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms which simultaneously cluster both the items and their features are useful while analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.
Simulating realistic predator signatures in quantitative fatty acid signature analysis
Bromaghin, Jeffrey F.
2015-01-01
Diet estimation is an important field within quantitative ecology, providing critical insights into many aspects of ecology and community dynamics. Quantitative fatty acid signature analysis (QFASA) is a prominent method of diet estimation, particularly for marine mammal and bird species. Investigators using QFASA commonly use computer simulation to evaluate statistical characteristics of diet estimators for the populations they study. Similar computer simulations have been used to explore and compare the performance of different variations of the original QFASA diet estimator. In both cases, computer simulations involve bootstrap sampling prey signature data to construct pseudo-predator signatures with known properties. However, bootstrap sample sizes have been selected arbitrarily and pseudo-predator signatures therefore may not have realistic properties. I develop an algorithm to objectively establish bootstrap sample sizes that generates pseudo-predator signatures with realistic properties, thereby enhancing the utility of computer simulation for assessing QFASA estimator performance. The algorithm also appears to be computationally efficient, resulting in bootstrap sample sizes that are smaller than those commonly used. I illustrate the algorithm with an example using data from Chukchi Sea polar bears (Ursus maritimus) and their marine mammal prey. The concepts underlying the approach may have value in other areas of quantitative ecology in which bootstrap samples are post-processed prior to their use.
Huang, Tao; Li, Xiao-yu; Xu, Meng-ling; Jin, Rui; Ku, Jing; Xu, Sen-miao; Wu, Zhen-zhong
2015-01-01
The quality of potato is directly related to their edible value and industrial value. Hollow heart of potato, as a physiological disease occurred inside the tuber, is difficult to be detected. This paper put forward a non-destructive detection method by using semi-transmission hyperspectral imaging with support vector machine (SVM) to detect hollow heart of potato. Compared to reflection and transmission hyperspectral image, semi-transmission hyperspectral image can get clearer image which contains the internal quality information of agricultural products. In this study, 224 potato samples (149 normal samples and 75 hollow samples) were selected as the research object, and semi-transmission hyperspectral image acquisition system was constructed to acquire the hyperspectral images (390-1 040 nn) of the potato samples, and then the average spectrum of region of interest were extracted for spectral characteristics analysis. Normalize was used to preprocess the original spectrum, and prediction model were developed based on SVM using all wave bands, the accurate recognition rate of test set is only 87. 5%. In order to simplify the model competitive.adaptive reweighed sampling algorithm (CARS) and successive projection algorithm (SPA) were utilized to select important variables from the all 520 spectral variables and 8 variables were selected (454, 601, 639, 664, 748, 827, 874 and 936 nm). 94. 64% of the accurate recognition rate of test set was obtained by using the 8 variables to develop SVM model. Parameter optimization algorithms, including artificial fish swarm algorithm (AFSA), genetic algorithm (GA) and grid search algorithm, were used to optimize the SVM model parameters: penalty parameter c and kernel parameter g. After comparative analysis, AFSA, a new bionic optimization algorithm based on the foraging behavior of fish swarm, was proved to get the optimal model parameter (c=10. 659 1, g=0. 349 7), and the recognition accuracy of 10% were obtained for the AFSA-SVM model. The results indicate that combining the semi-transmission hyperspectral imaging technology with CARS-SPA and AFSA-SVM can accurately detect hollow heart of potato, and also provide technical support for rapid non-destructive detecting of hollow heart of potato.
Baldassano, Steven N; Brinkmann, Benjamin H; Ung, Hoameng; Blevins, Tyler; Conrad, Erin C; Leyde, Kent; Cook, Mark J; Khambhati, Ankit N; Wagenaar, Joost B; Worrell, Gregory A; Litt, Brian
2017-06-01
There exist significant clinical and basic research needs for accurate, automated seizure detection algorithms. These algorithms have translational potential in responsive neurostimulation devices and in automatic parsing of continuous intracranial electroencephalography data. An important barrier to developing accurate, validated algorithms for seizure detection is limited access to high-quality, expertly annotated seizure data from prolonged recordings. To overcome this, we hosted a kaggle.com competition to crowdsource the development of seizure detection algorithms using intracranial electroencephalography from canines and humans with epilepsy. The top three performing algorithms from the contest were then validated on out-of-sample patient data including standard clinical data and continuous ambulatory human data obtained over several years using the implantable NeuroVista seizure advisory system. Two hundred teams of data scientists from all over the world participated in the kaggle.com competition. The top performing teams submitted highly accurate algorithms with consistent performance in the out-of-sample validation study. The performance of these seizure detection algorithms, achieved using freely available code and data, sets a new reproducible benchmark for personalized seizure detection. We have also shared a 'plug and play' pipeline to allow other researchers to easily use these algorithms on their own datasets. The success of this competition demonstrates how sharing code and high quality data results in the creation of powerful translational tools with significant potential to impact patient care. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Adaptive control of turbulence intensity is accelerated by frugal flow sampling.
Quinn, Daniel B; van Halder, Yous; Lentink, David
2017-11-01
The aerodynamic performance of vehicles and animals, as well as the productivity of turbines and energy harvesters, depends on the turbulence intensity of the incoming flow. Previous studies have pointed at the potential benefits of active closed-loop turbulence control. However, it is unclear what the minimal sensory and algorithmic requirements are for realizing this control. Here we show that very low-bandwidth anemometers record sufficient information for an adaptive control algorithm to converge quickly. Our online Newton-Raphson algorithm tunes the turbulence in a recirculating wind tunnel by taking readings from an anemometer in the test section. After starting at 9% turbulence intensity, the algorithm converges on values ranging from 10% to 45% in less than 12 iterations within 1% accuracy. By down-sampling our measurements, we show that very-low-bandwidth anemometers record sufficient information for convergence. Furthermore, down-sampling accelerates convergence by smoothing gradients in turbulence intensity. Our results explain why low-bandwidth anemometers in engineering and mechanoreceptors in biology may be sufficient for adaptive control of turbulence intensity. Finally, our analysis suggests that, if certain turbulent eddy sizes are more important to control than others, frugal adaptive control schemes can be particularly computationally effective for improving performance. © 2017 The Author(s).
NASA Astrophysics Data System (ADS)
Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi
2014-03-01
A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment of schizophrenia. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to access the diagnostic value of functional brain networks, discovered from resting state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using resting-state functional language network. Furthermore, here the classification of schizophrenia was regarded as a sample selection problem where a sparse subset of samples was chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinic diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm incorporating resting-state functional language network achieved 83.6% leaveone- out accuracy on resting-state fMRI data of 27 schizophrenia patients and 28 normal controls. In contrast with KNearest- Neighbor (KNN), Support Vector Machine (SVM) and l1-norm, our method yielded better classification performance. Moreover, our results suggested that a dysfunction of resting-state functional language network plays an important role in the clinic diagnosis of schizophrenia.
K-Nearest Neighbor Algorithm Optimization in Text Categorization
NASA Astrophysics Data System (ADS)
Chen, Shufeng
2018-01-01
K-Nearest Neighbor (KNN) classification algorithm is one of the simplest methods of data mining. It has been widely used in classification, regression and pattern recognition. The traditional KNN method has some shortcomings such as large amount of sample computation and strong dependence on the sample library capacity. In this paper, a method of representative sample optimization based on CURE algorithm is proposed. On the basis of this, presenting a quick algorithm QKNN (Quick k-nearest neighbor) to find the nearest k neighbor samples, which greatly reduces the similarity calculation. The experimental results show that this algorithm can effectively reduce the number of samples and speed up the search for the k nearest neighbor samples to improve the performance of the algorithm.
Learning Time-Varying Coverage Functions
Du, Nan; Liang, Yingyu; Balcan, Maria-Florina; Song, Le
2015-01-01
Coverage functions are an important class of discrete functions that capture the law of diminishing returns arising naturally from applications in social network analysis, machine learning, and algorithmic game theory. In this paper, we propose a new problem of learning time-varying coverage functions, and develop a novel parametrization of these functions using random features. Based on the connection between time-varying coverage functions and counting processes, we also propose an efficient parameter learning algorithm based on likelihood maximization, and provide a sample complexity analysis. We applied our algorithm to the influence function estimation problem in information diffusion in social networks, and show that with few assumptions about the diffusion processes, our algorithm is able to estimate influence significantly more accurately than existing approaches on both synthetic and real world data. PMID:25960624
Learning Time-Varying Coverage Functions.
Du, Nan; Liang, Yingyu; Balcan, Maria-Florina; Song, Le
2014-12-08
Coverage functions are an important class of discrete functions that capture the law of diminishing returns arising naturally from applications in social network analysis, machine learning, and algorithmic game theory. In this paper, we propose a new problem of learning time-varying coverage functions, and develop a novel parametrization of these functions using random features. Based on the connection between time-varying coverage functions and counting processes, we also propose an efficient parameter learning algorithm based on likelihood maximization, and provide a sample complexity analysis. We applied our algorithm to the influence function estimation problem in information diffusion in social networks, and show that with few assumptions about the diffusion processes, our algorithm is able to estimate influence significantly more accurately than existing approaches on both synthetic and real world data.
Optimizing Tissue Sampling for the Diagnosis, Subtyping, and Molecular Analysis of Lung Cancer
Ofiara, Linda Marie; Navasakulpong, Asma; Beaudoin, Stephane; Gonzalez, Anne Valerie
2014-01-01
Lung cancer has entered the era of personalized therapy with histologic subclassification and the presence of molecular biomarkers becoming increasingly important in therapeutic algorithms. At the same time, biopsy specimens are becoming increasingly smaller as diagnostic algorithms seek to establish diagnosis and stage with the least invasive techniques. Here, we review techniques used in the diagnosis of lung cancer including bronchoscopy, ultrasound-guided bronchoscopy, transthoracic needle biopsy, and thoracoscopy. In addition to discussing indications and complications, we focus our discussion on diagnostic yields and the feasibility of testing for molecular biomarkers such as epidermal growth factor receptor and anaplastic lymphoma kinase, emphasizing the importance of a sufficient tumor biopsy. PMID:25295226
Building test data from real outbreaks for evaluating detection algorithms.
Texier, Gaetan; Jackson, Michael L; Siwe, Leonel; Meynard, Jean-Baptiste; Deparis, Xavier; Chaudet, Herve
2017-01-01
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method-ITSM, Metropolis-Hasting Random Walk, Metropolis-Hasting Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in an important loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
Building test data from real outbreaks for evaluating detection algorithms
Texier, Gaetan; Jackson, Michael L.; Siwe, Leonel; Meynard, Jean-Baptiste; Deparis, Xavier; Chaudet, Herve
2017-01-01
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method—ITSM, Metropolis-Hasting Random Walk, Metropolis-Hasting Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in an important loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals. PMID:28863159
Aerial surveillance based on hierarchical object classification for ground target detection
NASA Astrophysics Data System (ADS)
Vázquez-Cervantes, Alberto; García-Huerta, Juan-Manuel; Hernández-Díaz, Teresa; Soto-Cajiga, J. A.; Jiménez-Hernández, Hugo
2015-03-01
Unmanned aerial vehicles have turned important in surveillance application due to the flexibility and ability to inspect and displace in different regions of interest. The instrumentation and autonomy of these vehicles have been increased; i.e. the camera sensor is now integrated. Mounted cameras allow flexibility to monitor several regions of interest, displacing and changing the camera view. A well common task performed by this kind of vehicles correspond to object localization and tracking. This work presents a hierarchical novel algorithm to detect and locate objects. The algorithm is based on a detection-by-example approach; this is, the target evidence is provided at the beginning of the vehicle's route. Afterwards, the vehicle inspects the scenario, detecting all similar objects through UTM-GPS coordinate references. Detection process consists on a sampling information process of the target object. Sampling process encode in a hierarchical tree with different sampling's densities. Coding space correspond to a huge binary space dimension. Properties such as independence and associative operators are defined in this space to construct a relation between the target object and a set of selected features. Different densities of sampling are used to discriminate from general to particular features that correspond to the target. The hierarchy is used as a way to adapt the complexity of the algorithm due to optimized battery duty cycle of the aerial device. Finally, this approach is tested in several outdoors scenarios, proving that the hierarchical algorithm works efficiently under several conditions.
Miao, Qiguang; Cao, Ying; Xia, Ge; Gong, Maoguo; Liu, Jiachen; Song, Jianfeng
2016-11-01
AdaBoost has attracted much attention in the machine learning community because of its excellent performance in combining weak classifiers into strong classifiers. However, AdaBoost tends to overfit to the noisy data in many applications. Accordingly, improving the antinoise ability of AdaBoost plays an important role in many applications. The sensitiveness to the noisy data of AdaBoost stems from the exponential loss function, which puts unrestricted penalties to the misclassified samples with very large margins. In this paper, we propose two boosting algorithms, referred to as RBoost1 and RBoost2, which are more robust to the noisy data compared with AdaBoost. RBoost1 and RBoost2 optimize a nonconvex loss function of the classification margin. Because the penalties to the misclassified samples are restricted to an amount less than one, RBoost1 and RBoost2 do not overfocus on the samples that are always misclassified by the previous base learners. Besides the loss function, at each boosting iteration, RBoost1 and RBoost2 use numerically stable ways to compute the base learners. These two improvements contribute to the robustness of the proposed algorithms to the noisy training and testing samples. Experimental results on the synthetic Gaussian data set, the UCI data sets, and a real malware behavior data set illustrate that the proposed RBoost1 and RBoost2 algorithms perform better when the training data sets contain noisy data.
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low accuracy of prediction in the positive samples and poor overall classification effects caused by unbalanced sample data of MicroRNA (miRNA) target, we proposes a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling based on the ensemble learning algorithm. The algorithm adopts SVM as learning algorithm and AdaBoost as integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with robust sample weights smoothing mechanism so as to avoid over-learning. Finally, the prediction of miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through the voting mechanism. The experiment revealed that the SVM-IUSW, compared with other algorithms on unbalanced dataset collection, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of miRNA target classifier.
Evaluation of a treatment-based classification algorithm for low back pain: a cross-sectional study.
Stanton, Tasha R; Fritz, Julie M; Hancock, Mark J; Latimer, Jane; Maher, Christopher G; Wand, Benedict M; Parent, Eric C
2011-04-01
Several studies have investigated criteria for classifying patients with low back pain (LBP) into treatment-based subgroups. A comprehensive algorithm was created to translate these criteria into a clinical decision-making guide. This study investigated the translation of the individual subgroup criteria into a comprehensive algorithm by studying the prevalence of patients meeting the criteria for each treatment subgroup and the reliability of the classification. This was a cross-sectional, observational study. Two hundred fifty patients with acute or subacute LBP were recruited from the United States and Australia to participate in the study. Trained physical therapists performed standardized assessments on all participants. The researchers used these findings to classify participants into subgroups. Thirty-one participants were reassessed to determine interrater reliability of the algorithm decision. Based on individual subgroup criteria, 25.2% (95% confidence interval [CI]=19.8%-30.6%) of the participants did not meet the criteria for any subgroup, 49.6% (95% CI=43.4%-55.8%) of the participants met the criteria for only one subgroup, and 25.2% (95% CI=19.8%-30.6%) of the participants met the criteria for more than one subgroup. The most common combination of subgroups was manipulation + specific exercise (68.4% of the participants who met the criteria for 2 subgroups). Reliability of the algorithm decision was moderate (kappa=0.52, 95% CI=0.27-0.77, percentage of agreement=67%). Due to a relatively small patient sample, reliability estimates are somewhat imprecise. These findings provide important clinical data to guide future research and revisions to the algorithm. The finding that 25% of the participants met the criteria for more than one subgroup has important implications for the sequencing of treatments in the algorithm. Likewise, the finding that 25% of the participants did not meet the criteria for any subgroup provides important information regarding potential revisions to the algorithm's bottom table (which guides unclear classifications). Reliability of the algorithm is sufficient for clinical use.
Numerical computation of solar neutrino flux attenuated by the MSW mechanism
NASA Astrophysics Data System (ADS)
Kim, Jai Sam; Chae, Yoon Sang; Kim, Jung Dae
1999-07-01
We compute the survival probability of an electron neutrino in its flight through the solar core experiencing the Mikheyev-Smirnov-Wolfenstein effect with all three neutrino species considered. We adopted a hybrid method that uses an accurate approximation formula in the non-resonance region and numerical integration in the non-adiabatic resonance region. The key of our algorithm is to use the importance sampling method for sampling the neutrino creation energy and position and to find the optimum radii to start and stop numerical integration. We further developed a parallel algorithm for a message passing parallel computer. By using an idea of job token, we have developed a dynamical load balancing mechanism which is effective under any irregular load distributions
Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.
Newberg, Lee A
2008-08-15
A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++-code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.
Evaluation of ultrasonic array imaging algorithms for inspection of a coarse grained material
NASA Astrophysics Data System (ADS)
Van Pamel, A.; Lowe, M. J. S.; Brett, C. R.
2014-02-01
Improving the ultrasound inspection capability for coarse grain metals remains of longstanding interest to industry and the NDE research community and is expected to become increasingly important for next generation power plants. A test sample of coarse grained Inconel 625 which is representative of future power plant components has been manufactured to test the detectability of different inspection techniques. Conventional ultrasonic A, B, and C-scans showed the sample to be extraordinarily difficult to inspect due to its scattering behaviour. However, in recent years, array probes and Full Matrix Capture (FMC) imaging algorithms, which extract the maximum amount of information possible, have unlocked exciting possibilities for improvements. This article proposes a robust methodology to evaluate the detection performance of imaging algorithms, applying this to three FMC imaging algorithms; Total Focusing Method (TFM), Phase Coherent Imaging (PCI), and Decomposition of the Time Reversal Operator with Multiple Scattering (DORT MSF). The methodology considers the statistics of detection, presenting the detection performance as Probability of Detection (POD) and probability of False Alarm (PFA). The data is captured in pulse-echo mode using 64 element array probes at centre frequencies of 1MHz and 5MHz. All three algorithms are shown to perform very similarly when comparing their flaw detection capabilities on this particular case.
The mGA1.0: A common LISP implementation of a messy genetic algorithm
NASA Technical Reports Server (NTRS)
Goldberg, David E.; Kerzic, Travis
1990-01-01
Genetic algorithms (GAs) are finding increased application in difficult search, optimization, and machine learning problems in science and engineering. Increasing demands are being placed on algorithm performance, and the remaining challenges of genetic algorithm theory and practice are becoming increasingly unavoidable. Perhaps the most difficult of these challenges is the so-called linkage problem. Messy GAs were created to overcome the linkage problem of simple genetic algorithms by combining variable-length strings, gene expression, messy operators, and a nonhomogeneous phasing of evolutionary processing. Results on a number of difficult deceptive test functions are encouraging with the mGA always finding global optima in a polynomial number of function evaluations. Theoretical and empirical studies are continuing, and a first version of a messy GA is ready for testing by others. A Common LISP implementation called mGA1.0 is documented and related to the basic principles and operators developed by Goldberg et. al. (1989, 1990). Although the code was prepared with care, it is not a general-purpose code, only a research version. Important data structures and global variations are described. Thereafter brief function descriptions are given, and sample input data are presented together with sample program output. A source listing with comments is also included.
Chen, Nan; Majda, Andrew J
2017-12-05
Solving the Fokker-Planck equation for high-dimensional complex dynamical systems is an important issue. Recently, the authors developed efficient statistically accurate algorithms for solving the Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures, which contain many strong non-Gaussian features such as intermittency and fat-tailed probability density functions (PDFs). The algorithms involve a hybrid strategy with a small number of samples [Formula: see text], where a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious Gaussian kernel density estimation in the remaining low-dimensional subspace. In this article, two effective strategies are developed and incorporated into these algorithms. The first strategy involves a judicious block decomposition of the conditional covariance matrix such that the evolutions of different blocks have no interactions, which allows an extremely efficient parallel computation due to the small size of each individual block. The second strategy exploits statistical symmetry for a further reduction of [Formula: see text] The resulting algorithms can efficiently solve the Fokker-Planck equation with strongly non-Gaussian PDFs in much higher dimensions even with orders in the millions and thus beat the curse of dimension. The algorithms are applied to a [Formula: see text]-dimensional stochastic coupled FitzHugh-Nagumo model for excitable media. An accurate recovery of both the transient and equilibrium non-Gaussian PDFs requires only [Formula: see text] samples! In addition, the block decomposition facilitates the algorithms to efficiently capture the distinct non-Gaussian features at different locations in a [Formula: see text]-dimensional two-layer inhomogeneous Lorenz 96 model, using only [Formula: see text] samples. Copyright © 2017 the Author(s). Published by PNAS.
Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud
2018-01-01
Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
A sampling algorithm for segregation analysis
Tier, Bruce; Henshall, John
2001-01-01
Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Monte Carlo Markov chain (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or found then its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated. PMID:11742631
An evolution based biosensor receptor DNA sequence generation algorithm.
Kim, Eungyeong; Lee, Malrey; Gatton, Thomas M; Lee, Jaewan; Zang, Yupeng
2010-01-01
A biosensor is composed of a bioreceptor, an associated recognition molecule, and a signal transducer that can selectively detect target substances for analysis. DNA based biosensors utilize receptor molecules that allow hybridization with the target analyte. However, most DNA biosensor research uses oligonucleotides as the target analytes and does not address the potential problems of real samples. The identification of recognition molecules suitable for real target analyte samples is an important step towards further development of DNA biosensors. This study examines the characteristics of DNA used as bioreceptors and proposes a hybrid evolution-based DNA sequence generating algorithm, based on DNA computing, to identify suitable DNA bioreceptor recognition molecules for stable hybridization with real target substances. The Traveling Salesman Problem (TSP) approach is applied in the proposed algorithm to evaluate the safety and fitness of the generated DNA sequences. This approach improves efficiency and stability for enhanced and variable-length DNA sequence generation and allows extension to generation of variable-length DNA sequences with diverse receptor recognition requirements.
Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme
2013-07-01
The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Svejkosky, Joseph
The spectral signatures of vehicles in hyperspectral imagery exhibit temporal variations due to the preponderance of surfaces with material properties that display non-Lambertian bi-directional reflectance distribution functions (BRDFs). These temporal variations are caused by changing illumination conditions, changing sun-target-sensor geometry, changing road surface properties, and changing vehicle orientations. To quantify these variations and determine their relative importance in a sub-pixel vehicle reacquisition and tracking scenario, a hyperspectral vehicle BRDF sampling experiment was conducted in which four vehicles were rotated at different orientations and imaged over a six-hour period. The hyperspectral imagery was calibrated using novel in-scene methods and converted to reflectance imagery. The resulting BRDF sampled time-series imagery showed a strong vehicle level BRDF dependence on vehicle shape in off-nadir imaging scenarios and a strong dependence on vehicle color in simulated nadir imaging scenarios. The imagery also exhibited spectral features characteristic of sampling the BRDF of non-Lambertian targets, which were subsequently verified with simulations. In addition, the imagery demonstrated that the illumination contribution from vehicle adjacent horizontal surfaces significantly altered the shape and magnitude of the vehicle reflectance spectrum. The results of the BRDF sampling experiment illustrate the need for a target vehicle BRDF model and detection scheme that incorporates non-Lambertian BRDFs. A new detection algorithm called Eigenvector Loading Regression (ELR) is proposed that learns a hyperspectral vehicle BRDF from a series of BRDF measurements using regression in a lower dimensional space and then applies the learned BRDF to make test spectrum predictions. In cases of non-Lambertian vehicle BRDF, this detection methodology performs favorably when compared to subspace detections algorithms and graph-based detection algorithms that do not account for the target BRDF. The algorithms are compared using a test environment in which observed spectral reflectance signatures from the BRDF sampling experiment are implanted into aerial hyperspectral imagery that contain large quantities of vehicles.
Improved pulse laser ranging algorithm based on high speed sampling
NASA Astrophysics Data System (ADS)
Gao, Xuan-yi; Qian, Rui-hai; Zhang, Yan-mei; Li, Huan; Guo, Hai-chao; He, Shi-jie; Guo, Xiao-kang
2016-10-01
Narrow pulse laser ranging achieves long-range target detection using laser pulse with low divergent beams. Pulse laser ranging is widely used in military, industrial, civil, engineering and transportation field. In this paper, an improved narrow pulse laser ranging algorithm is studied based on the high speed sampling. Firstly, theoretical simulation models have been built and analyzed including the laser emission and pulse laser ranging algorithm. An improved pulse ranging algorithm is developed. This new algorithm combines the matched filter algorithm and the constant fraction discrimination (CFD) algorithm. After the algorithm simulation, a laser ranging hardware system is set up to implement the improved algorithm. The laser ranging hardware system includes a laser diode, a laser detector and a high sample rate data logging circuit. Subsequently, using Verilog HDL language, the improved algorithm is implemented in the FPGA chip based on fusion of the matched filter algorithm and the CFD algorithm. Finally, the laser ranging experiment is carried out to test the improved algorithm ranging performance comparing to the matched filter algorithm and the CFD algorithm using the laser ranging hardware system. The test analysis result demonstrates that the laser ranging hardware system realized the high speed processing and high speed sampling data transmission. The algorithm analysis result presents that the improved algorithm achieves 0.3m distance ranging precision. The improved algorithm analysis result meets the expected effect, which is consistent with the theoretical simulation.
Generating probabilistic Boolean networks from a prescribed transition probability matrix.
Ching, W-K; Chen, X; Tsing, N-K
2009-11-01
Probabilistic Boolean networks (PBNs) have received much attention in modeling genetic regulatory networks. A PBN can be regarded as a Markov chain process and is characterised by a transition probability matrix. In this study, the authors propose efficient algorithms for constructing a PBN when its transition probability matrix is given. The complexities of the algorithms are also analysed. This is an interesting inverse problem in network inference using steady-state data. The problem is important as most microarray data sets are assumed to be obtained from sampling the steady-state.
Space resection model calculation based on Random Sample Consensus algorithm
NASA Astrophysics Data System (ADS)
Liu, Xinzhu; Kang, Zhizhong
2016-03-01
Resection has been one of the most important content in photogrammetry. It aims at the position and attitude information of camera at the shooting point. However in some cases, the observed values for calculating are with gross errors. This paper presents a robust algorithm that using RANSAC method with DLT model can effectually avoiding the difficulties to determine initial values when using co-linear equation. The results also show that our strategies can exclude crude handicap and lead to an accurate and efficient way to gain elements of exterior orientation.
GARLIC: GAmma Reconstruction at a LInear Collider experiment
NASA Astrophysics Data System (ADS)
Jeans, D.; Brient, J.-C.; Reinhard, M.
2012-06-01
The precise measurement of hadronic jet energy is crucial to maximise the physics reach of a future Linear Collider. An important ingredient required to achieve this is the efficient identification of photons within hadronic showers. One configuration of the ILD detector concept employs a highly granular silicon-tungsten sampling calorimeter to identify and measure photons, and the GARLIC algorithm described in this paper has been developed to identify photons in such a calorimeter. We describe the algorithm and characterise its performance using events fully simulated in a model of the ILD detector.
Data Mining Methods for Recommender Systems
NASA Astrophysics Data System (ADS)
Amatriain, Xavier; Jaimes*, Alejandro; Oliver, Nuria; Pujol, Josep M.
In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.
Nangia, Shikha; Jasper, Ahren W; Miller, Thomas F; Truhlar, Donald G
2004-02-22
The most widely used algorithm for Monte Carlo sampling of electronic transitions in trajectory surface hopping (TSH) calculations is the so-called anteater algorithm, which is inefficient for sampling low-probability nonadiabatic events. We present a new sampling scheme (called the army ants algorithm) for carrying out TSH calculations that is applicable to systems with any strength of coupling. The army ants algorithm is a form of rare event sampling whose efficiency is controlled by an input parameter. By choosing a suitable value of the input parameter the army ants algorithm can be reduced to the anteater algorithm (which is efficient for strongly coupled cases), and by optimizing the parameter the army ants algorithm may be efficiently applied to systems with low-probability events. To demonstrate the efficiency of the army ants algorithm, we performed atom-diatom scattering calculations on a model system involving weakly coupled electronic states. Fully converged quantum mechanical calculations were performed, and the probabilities for nonadiabatic reaction and nonreactive deexcitation (quenching) were found to be on the order of 10(-8). For such low-probability events the anteater sampling scheme requires a large number of trajectories ( approximately 10(10)) to obtain good statistics and converged semiclassical results. In contrast by using the new army ants algorithm converged results were obtained by running 10(5) trajectories. Furthermore, the results were found to be in excellent agreement with the quantum mechanical results. Sampling errors were estimated using the bootstrap method, which is validated for use with the army ants algorithm. (c) 2004 American Institute of Physics.
A randomised approach for NARX model identification based on a multivariate Bernoulli distribution
NASA Astrophysics Data System (ADS)
Bianchi, F.; Falsone, A.; Prandini, M.; Piroddi, L.
2017-04-01
The identification of polynomial NARX models is typically performed by incremental model building techniques. These methods assess the importance of each regressor based on the evaluation of partial individual models, which may ultimately lead to erroneous model selections. A more robust assessment of the significance of a specific model term can be obtained by considering ensembles of models, as done by the RaMSS algorithm. In that context, the identification task is formulated in a probabilistic fashion and a Bernoulli distribution is employed to represent the probability that a regressor belongs to the target model. Then, samples of the model distribution are collected to gather reliable information to update it, until convergence to a specific model. The basic RaMSS algorithm employs multiple independent univariate Bernoulli distributions associated to the different candidate model terms, thus overlooking the correlations between different terms, which are typically important in the selection process. Here, a multivariate Bernoulli distribution is employed, in which the sampling of a given term is conditioned by the sampling of the others. The added complexity inherent in considering the regressor correlation properties is more than compensated by the achievable improvements in terms of accuracy of the model selection process.
Liu, Chang; Wang, Guofeng; Xie, Qinglu; Zhang, Yanchao
2014-01-01
Effective fault classification of rolling element bearings provides an important basis for ensuring safe operation of rotating machinery. In this paper, a novel vibration sensor-based fault diagnosis method using an Ellipsoid-ARTMAP network (EAM) and a differential evolution (DE) algorithm is proposed. The original features are firstly extracted from vibration signals based on wavelet packet decomposition. Then, a minimum-redundancy maximum-relevancy algorithm is introduced to select the most prominent features so as to decrease feature dimensions. Finally, a DE-based EAM (DE-EAM) classifier is constructed to realize the fault diagnosis. The major characteristic of EAM is that the sample distribution of each category is realized by using a hyper-ellipsoid node and smoothing operation algorithm. Therefore, it can depict the decision boundary of disperse samples accurately and effectively avoid over-fitting phenomena. To optimize EAM network parameters, the DE algorithm is presented and two objectives, including both classification accuracy and nodes number, are simultaneously introduced as the fitness functions. Meanwhile, an exponential criterion is proposed to realize final selection of the optimal parameters. To prove the effectiveness of the proposed method, the vibration signals of four types of rolling element bearings under different loads were collected. Moreover, to improve the robustness of the classifier evaluation, a two-fold cross validation scheme is adopted and the order of feature samples is randomly arranged ten times within each fold. The results show that DE-EAM classifier can recognize the fault categories of the rolling element bearings reliably and accurately. PMID:24936949
Cost-effective analysis of different algorithms for the diagnosis of hepatitis C virus infection.
Barreto, A M E C; Takei, K; E C, Sabino; Bellesa, M A O; Salles, N A; Barreto, C C; Nishiya, A S; Chamone, D F
2008-02-01
We compared the cost-benefit of two algorithms, recently proposed by the Centers for Disease Control and Prevention, USA, with the conventional one, the most appropriate for the diagnosis of hepatitis C virus (HCV) infection in the Brazilian population. Serum samples were obtained from 517 ELISA-positive or -inconclusive blood donors who had returned to Fundação Pró-Sangue/Hemocentro de São Paulo to confirm previous results. Algorithm A was based on signal-to-cut-off (s/co) ratio of ELISA anti-HCV samples that show s/co ratio > or =95% concordance with immunoblot (IB) positivity. For algorithm B, reflex nucleic acid amplification testing by PCR was required for ELISA-positive or -inconclusive samples and IB for PCR-negative samples. For algorithm C, all positive or inconclusive ELISA samples were submitted to IB. We observed a similar rate of positive results with the three algorithms: 287, 287, and 285 for A, B, and C, respectively, and 283 were concordant with one another. Indeterminate results from algorithms A and C were elucidated by PCR (expanded algorithm) which detected two more positive samples. The estimated cost of algorithms A and B was US$21,299.39 and US$32,397.40, respectively, which were 43.5 and 14.0% more economic than C (US$37,673.79). The cost can vary according to the technique used. We conclude that both algorithms A and B are suitable for diagnosing HCV infection in the Brazilian population. Furthermore, algorithm A is the more practical and economical one since it requires supplemental tests for only 54% of the samples. Algorithm B provides early information about the presence of viremia.
Visual Tracking via Sparse and Local Linear Coding.
Wang, Guofeng; Qin, Xueying; Zhong, Fan; Liu, Yue; Li, Hongbo; Peng, Qunsheng; Yang, Ming-Hsuan
2015-11-01
The state search is an important component of any object tracking algorithm. Numerous algorithms have been proposed, but stochastic sampling methods (e.g., particle filters) are arguably one of the most effective approaches. However, the discretization of the state space complicates the search for the precise object location. In this paper, we propose a novel tracking algorithm that extends the state space of particle observations from discrete to continuous. The solution is determined accurately via iterative linear coding between two convex hulls. The algorithm is modeled by an optimal function, which can be efficiently solved by either convex sparse coding or locality constrained linear coding. The algorithm is also very flexible and can be combined with many generic object representations. Thus, we first use sparse representation to achieve an efficient searching mechanism of the algorithm and demonstrate its accuracy. Next, two other object representation models, i.e., least soft-threshold squares and adaptive structural local sparse appearance, are implemented with improved accuracy to demonstrate the flexibility of our algorithm. Qualitative and quantitative experimental results demonstrate that the proposed tracking algorithm performs favorably against the state-of-the-art methods in dynamic scenes.
A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.
Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong
2015-12-01
Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when the graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which leads to the increase of data clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on the cost function, a class of novel graph regularized NMF algorithms is developed, which results in a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that in the learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of cost functions and decomposition data are presented to show the convergence features of the proposed update rules. Basis images, reconstructed images, and clustering results are utilized to present the efficiency of the new algorithms. Last, the clustering accuracies of different algorithms are also investigated, which shows that the proposed algorithms can achieve state-of-the-art performance in applications of image clustering.
Wang, Junxiao; Wang, Xiaorui; Zhou, Shenglu; Wu, Shaohua; Zhu, Yan; Lu, Chunfeng
2016-01-01
With China’s rapid economic development, the reduction in arable land has emerged as one of the most prominent problems in the nation. The long-term dynamic monitoring of arable land quality is important for protecting arable land resources. An efficient practice is to select optimal sample points while obtaining accurate predictions. To this end, the selection of effective points from a dense set of soil sample points is an urgent problem. In this study, data were collected from Donghai County, Jiangsu Province, China. The number and layout of soil sample points are optimized by considering the spatial variations in soil properties and by using an improved simulated annealing (SA) algorithm. The conclusions are as follows: (1) Optimization results in the retention of more sample points in the moderate- and high-variation partitions of the study area; (2) The number of optimal sample points obtained with the improved SA algorithm is markedly reduced, while the accuracy of the predicted soil properties is improved by approximately 5% compared with the raw data; (3) With regard to the monitoring of arable land quality, a dense distribution of sample points is needed to monitor the granularity. PMID:27706051
Rare Event Simulation in Radiation Transport
NASA Astrophysics Data System (ADS)
Kollman, Craig
This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is with overwhelming probability equal to zero. These problems often have high dimensional state spaces and irregular geometries so that analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded the probability of any one particular particle getting through is very small. Because of the large number of particles involved, even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiplied by the likelihood ratio between the true and simulated probabilities so as to keep our estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities are chosen. It is shown that a zero variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive "learning" algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give, with probability one, a sequence of estimates converging exponentially fast to the true solution. In the final chapter, an attempt to generalize this algorithm to a continuous state space is made. This involves partitioning the space into a finite number of cells. There is a tradeoff between additional computation per iteration and variance reduction per iteration that arises in determining the optimal grid size. All versions of this algorithm can be thought of as a compromise between deterministic and Monte Carlo methods, capturing advantages of both techniques.
On Bayesian Testing of Additive Conjoint Measurement Axioms Using Synthetic Likelihood
ERIC Educational Resources Information Center
Karabatsos, George
2017-01-01
This article introduces a Bayesian method for testing the axioms of additive conjoint measurement. The method is based on an importance sampling algorithm that performs likelihood-free, approximate Bayesian inference using a synthetic likelihood to overcome the analytical intractability of this testing problem. This new method improves upon…
Selected-node stochastic simulation algorithm
NASA Astrophysics Data System (ADS)
Duso, Lorenzo; Zechner, Christoph
2018-04-01
Stochastic simulations of biochemical networks are of vital importance for understanding complex dynamics in cells and tissues. However, existing methods to perform such simulations are associated with computational difficulties and addressing those remains a daunting challenge to the present. Here we introduce the selected-node stochastic simulation algorithm (snSSA), which allows us to exclusively simulate an arbitrary, selected subset of molecular species of a possibly large and complex reaction network. The algorithm is based on an analytical elimination of chemical species, thereby avoiding explicit simulation of the associated chemical events. These species are instead described continuously in terms of statistical moments derived from a stochastic filtering equation, resulting in a substantial speedup when compared to Gillespie's stochastic simulation algorithm (SSA). Moreover, we show that statistics obtained via snSSA profit from a variance reduction, which can significantly lower the number of Monte Carlo samples needed to achieve a certain performance. We demonstrate the algorithm using several biological case studies for which the simulation time could be reduced by orders of magnitude.
Assessing the external validity of algorithms to estimate EQ-5D-3L from the WOMAC.
Kiadaliri, Aliasghar A; Englund, Martin
2016-10-04
The use of mapping algorithms have been suggested as a solution to predict health utilities when no preference-based measure is included in the study. However, validity and predictive performance of these algorithms are highly variable and hence assessing the accuracy and validity of algorithms before use them in a new setting is of importance. The aim of the current study was to assess the predictive accuracy of three mapping algorithms to estimate the EQ-5D-3L from the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) among Swedish people with knee disorders. Two of these algorithms developed using ordinary least squares (OLS) models and one developed using mixture model. The data from 1078 subjects mean (SD) age 69.4 (7.2) years with frequent knee pain and/or knee osteoarthritis from the Malmö Osteoarthritis study in Sweden were used. The algorithms' performance was assessed using mean error, mean absolute error, and root mean squared error. Two types of prediction were estimated for mixture model: weighted average (WA), and conditional on estimated component (CEC). The overall mean was overpredicted by an OLS model and underpredicted by two other algorithms (P < 0.001). All predictions but the CEC predictions of mixture model had a narrower range than the observed scores (22 to 90 %). All algorithms suffered from overprediction for severe health states and underprediction for mild health states with lesser extent for mixture model. While the mixture model outperformed OLS models at the extremes of the EQ-5D-3D distribution, it underperformed around the center of the distribution. While algorithm based on mixture model reflected the distribution of EQ-5D-3L data more accurately compared with OLS models, all algorithms suffered from systematic bias. This calls for caution in applying these mapping algorithms in a new setting particularly in samples with milder knee problems than original sample. Assessing the impact of the choice of these algorithms on cost-effectiveness studies through sensitivity analysis is recommended.
Small convolution kernels for high-fidelity image restoration
NASA Technical Reports Server (NTRS)
Reichenbach, Stephen E.; Park, Stephen K.
1991-01-01
An algorithm is developed for computing the mean-square-optimal values for small, image-restoration kernels. The algorithm is based on a comprehensive, end-to-end imaging system model that accounts for the important components of the imaging process: the statistics of the scene, the point-spread function of the image-gathering device, sampling effects, noise, and display reconstruction. Subject to constraints on the spatial support of the kernel, the algorithm generates the kernel values that restore the image with maximum fidelity, that is, the kernel minimizes the expected mean-square restoration error. The algorithm is consistent with the derivation of the spatially unconstrained Wiener filter, but leads to a small, spatially constrained kernel that, unlike the unconstrained filter, can be efficiently implemented by convolution. Simulation experiments demonstrate that for a wide range of imaging systems these small kernels can restore images with fidelity comparable to images restored with the unconstrained Wiener filter.
NASA Astrophysics Data System (ADS)
Qian, Kun; Zhou, Huixin; Rong, Shenghui; Wang, Bingjian; Cheng, Kuanhong
2017-05-01
Infrared small target tracking plays an important role in applications including military reconnaissance, early warning and terminal guidance. In this paper, an effective algorithm based on the Singular Value Decomposition (SVD) and the improved Kernelized Correlation Filter (KCF) is presented for infrared small target tracking. Firstly, the super performance of the SVD-based algorithm is that it takes advantage of the target's global information and obtains a background estimation of an infrared image. A dim target is enhanced by subtracting the corresponding estimated background with update from the original image. Secondly, the KCF algorithm is combined with Gaussian Curvature Filter (GCF) to eliminate the excursion problem. The GCF technology is adopted to preserve the edge and eliminate the noise of the base sample in the KCF algorithm, helping to calculate the classifier parameter for a small target. At last, the target position is estimated with a response map, which is obtained via the kernelized classifier. Experimental results demonstrate that the presented algorithm performs favorably in terms of efficiency and accuracy, compared with several state-of-the-art algorithms.
Space Weather Activities of IONOLAB Group: TEC Mapping
NASA Astrophysics Data System (ADS)
Arikan, F.; Yilmaz, A.; Arikan, O.; Sayin, I.; Gurun, M.; Akdogan, K. E.; Yildirim, S. A.
2009-04-01
Being a key player in Space Weather, ionospheric variability affects the performance of both communication and navigation systems. To improve the performance of these systems, ionosphere has to be monitored. Total Electron Content (TEC), line integral of the electron density along a ray path, is an important parameter to investigate the ionospheric variability. A cost-effective way of obtaining TEC is by using dual-frequency GPS receivers. Since these measurements are sparse in space, accurate and robust interpolation techniques are needed to interpolate (or map) the TEC distribution for a given region in space. However, the TEC data derived from GPS measurements contain measurement noise, model and computational errors. Thus, it is necessary to analyze the interpolation performance of the techniques on synthetic data sets that can represent various ionospheric states. By this way, interpolation performance of the techniques can be compared over many parameters that can be controlled to represent the desired ionospheric states. In this study, Multiquadrics, Inverse Distance Weighting (IDW), Cubic Splines, Ordinary and Universal Kriging, Random Field Priors (RFP), Multi-Layer Perceptron Neural Network (MLP-NN), and Radial Basis Function Neural Network (RBF-NN) are employed as the spatial interpolation algorithms. These mapping techniques are initially tried on synthetic TEC surfaces for parameter and coefficient optimization and determination of error bounds. Interpolation performance of these methods are compared on synthetic TEC surfaces over the parameters of sampling pattern, number of samples, the variability of the surface and the trend type in the TEC surfaces. By examining the performance of the interpolation methods, it is observed that both Kriging, RFP and NN have important advantages and possible disadvantages depending on the given constraints. It is also observed that the determining parameter in the error performance is the trend in the Ionosphere. Optimization of the algorithms in terms of their performance parameters (like the choice of the semivariogram function for Kriging algorithms and the hidden layer and neuron numbers for MLP-NN) mostly depend on the behavior of the ionosphere at that given time instant for the desired region. The sampling pattern and number of samples are the other important parameters that may contribute to the higher errors in reconstruction. For example, for all of the above listed algorithms, hexagonal regular sampling of the ionosphere provides the lowest reconstruction error and the performance significantly degrades as the samples in the region become sparse and clustered. The optimized models and coefficients are applied to regional GPS-TEC mapping using the IONOLAB-TEC data (www.ionolab.org). Both Kriging combined with Kalman Filter and dynamic modeling of NN are also implemented as first trials of TEC and space weather predictions.
Vectorized Rebinning Algorithm for Fast Data Down-Sampling
NASA Technical Reports Server (NTRS)
Dean, Bruce; Aronstein, David; Smith, Jeffrey
2013-01-01
A vectorized rebinning (down-sampling) algorithm, applicable to N-dimensional data sets, has been developed that offers a significant reduction in computer run time when compared to conventional rebinning algorithms. For clarity, a two-dimensional version of the algorithm is discussed to illustrate some specific details of the algorithm content, and using the language of image processing, 2D data will be referred to as "images," and each value in an image as a "pixel." The new approach is fully vectorized, i.e., the down-sampling procedure is done as a single step over all image rows, and then as a single step over all image columns. Data rebinning (or down-sampling) is a procedure that uses a discretely sampled N-dimensional data set to create a representation of the same data, but with fewer discrete samples. Such data down-sampling is fundamental to digital signal processing, e.g., for data compression applications.
Detecting chaos in irregularly sampled time series.
Kulp, C W
2013-09-01
Recently, Wiebe and Virgin [Chaos 22, 013136 (2012)] developed an algorithm which detects chaos by analyzing a time series' power spectrum which is computed using the Discrete Fourier Transform (DFT). Their algorithm, like other time series characterization algorithms, requires that the time series be regularly sampled. Real-world data, however, are often irregularly sampled, thus, making the detection of chaotic behavior difficult or impossible with those methods. In this paper, a characterization algorithm is presented, which effectively detects chaos in irregularly sampled time series. The work presented here is a modification of Wiebe and Virgin's algorithm and uses the Lomb-Scargle Periodogram (LSP) to compute a series' power spectrum instead of the DFT. The DFT is not appropriate for irregularly sampled time series. However, the LSP is capable of computing the frequency content of irregularly sampled data. Furthermore, a new method of analyzing the power spectrum is developed, which can be useful for differentiating between chaotic and non-chaotic behavior. The new characterization algorithm is successfully applied to irregularly sampled data generated by a model as well as data consisting of observations of variable stars.
Van Pamel, Anton; Brett, Colin R; Lowe, Michael J S
2014-12-01
Improving the ultrasound inspection capability for coarse-grained metals remains of longstanding interest and is expected to become increasingly important for next-generation electricity power plants. Conventional ultrasonic A-, B-, and C-scans have been found to suffer from strong background noise caused by grain scattering, which can severely limit the detection of defects. However, in recent years, array probes and full matrix capture (FMC) imaging algorithms have unlocked exciting possibilities for improvements. To improve and compare these algorithms, we must rely on robust methodologies to quantify their performance. This article proposes such a methodology to evaluate the detection performance of imaging algorithms. For illustration, the methodology is applied to some example data using three FMC imaging algorithms; total focusing method (TFM), phase-coherent imaging (PCI), and decomposition of the time-reversal operator with multiple scattering filter (DORT MSF). However, it is important to note that this is solely to illustrate the methodology; this article does not attempt the broader investigation of different cases that would be needed to compare the performance of these algorithms in general. The methodology considers the statistics of detection, presenting the detection performance as probability of detection (POD) and probability of false alarm (PFA). A test sample of coarse-grained nickel super alloy, manufactured to represent materials used for future power plant components and containing some simple artificial defects, is used to illustrate the method on the candidate algorithms. The data are captured in pulse-echo mode using 64-element array probes at center frequencies of 1 and 5 MHz. In this particular case, it turns out that all three algorithms are shown to perform very similarly when comparing their flaw detection capabilities.
Dias, Philipe A; Dunkel, Thiemo; Fajado, Diego A S; Gallegos, Erika de León; Denecke, Martin; Wiedemann, Philipp; Schneider, Fabio K; Suhr, Hajo
2016-06-11
In the activated sludge process, problems of filamentous bulking and foaming can occur due to overgrowth of certain filamentous bacteria. Nowadays, these microorganisms are typically monitored by means of light microscopy, commonly combined with staining techniques. As drawbacks, these methods are susceptible to human errors, subjectivity and limited by the use of discontinuous microscopy. The in situ microscope appears as a suitable tool for continuous monitoring of filamentous bacteria, providing real-time examination, automated analysis and eliminating sampling, preparation and transport of samples. In this context, a proper image processing algorithm is proposed for automated recognition and measurement of filamentous objects. This work introduces a method for real-time evaluation of images without any staining, phase-contrast or dilution techniques, differently from studies present in the literature. Moreover, we introduce an algorithm which estimates the total extended filament length based on geodesic distance calculation. For a period of twelve months, samples from an industrial activated sludge plant were weekly collected and imaged without any prior conditioning, replicating real environment conditions. Trends of filament growth rate-the most important parameter for decision making-are correctly identified. For reference images whose filaments were marked by specialists, the algorithm correctly recognized 72 % of the filaments pixels, with a false positive rate of at most 14 %. An average execution time of 0.7 s per image was achieved. Experiments have shown that the designed algorithm provided a suitable quantification of filaments when compared with human perception and standard methods. The algorithm's average execution time proved its suitability for being optimally mapped into a computational architecture to provide real-time monitoring.
The generalization ability of online SVM classification based on Markov sampling.
Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang
2015-03-01
In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish the bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present the numerical studies on the learning ability of online SVM classification based on Markov sampling for benchmark repository. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling as the size of training samples is larger.
New multirate sampled-data control law structure and synthesis algorithm
NASA Technical Reports Server (NTRS)
Berg, Martin C.; Mason, Gregory S.; Yang, Gen-Sheng
1992-01-01
A new multirate sampled-data control law structure is defined and a new parameter-optimization-based synthesis algorithm for that structure is introduced. The synthesis algorithm can be applied to multirate, multiple-input/multiple-output, sampled-data control laws having a prescribed dynamic order and structure, and a priori specified sampling/update rates for all sensors, processor states, and control inputs. The synthesis algorithm is applied to design two-input, two-output tip position controllers of various dynamic orders for a sixth-order, two-link robot arm model.
Peng, Bo; Wang, Yuqi; Hall, Timothy J; Jiang, Jingfeng
2017-01-01
Our primary objective of this work was to extend a previously published 2D coupled sub-sample tracking algorithm for 3D speckle tracking in the framework of ultrasound breast strain elastography. In order to overcome heavy computational cost, we investigated the use of a graphic processing unit (GPU) to accelerate the 3D coupled sub-sample speckle tracking method. The performance of the proposed GPU implementation was tested using a tissue-mimicking (TM) phantom and in vivo breast ultrasound data. The performance of this 3D sub-sample tracking algorithm was compared with the conventional 3D quadratic sub-sample estimation algorithm. On the basis of these evaluations, we concluded that the GPU implementation of this 3D sub-sample estimation algorithm can provide high-quality strain data (i.e. high correlation between the pre- and the motion-compensated post-deformation RF echo data and high contrast-to-noise ratio strain images), as compared to the conventional 3D quadratic sub-sample algorithm. Using the GPU implementation of the 3D speckle tracking algorithm, volumetric strain data can be achieved relatively fast (approximately 20 seconds per volume [2.5 cm × 2.5 cm × 2.5 cm]). PMID:28166493
Samanipour, Saer; Baz-Lomba, Jose A; Alygizakis, Nikiforos A; Reid, Malcolm J; Thomaidis, Nikolaos S; Thomas, Kevin V
2017-06-09
LC-HR-QTOF-MS recently has become a commonly used approach for the analysis of complex samples. However, identification of small organic molecules in complex samples with the highest level of confidence is a challenging task. Here we report on the implementation of a two stage algorithm for LC-HR-QTOF-MS datasets. We compared the performances of the two stage algorithm, implemented via NIVA_MZ_Analyzer™, with two commonly used approaches (i.e. feature detection and XIC peak picking, implemented via UNIFI by Waters and TASQ by Bruker, respectively) for the suspect analysis of four influent wastewater samples. We first evaluated the cross platform compatibility of LC-HR-QTOF-MS datasets generated via instruments from two different manufacturers (i.e. Waters and Bruker). Our data showed that with an appropriate spectral weighting function the spectra recorded by the two tested instruments are comparable for our analytes. As a consequence, we were able to perform full spectral comparison between the data generated via the two studied instruments. Four extracts of wastewater influent were analyzed for 89 analytes, thus 356 detection cases. The analytes were divided into 158 detection cases of artificial suspect analytes (i.e. verified by target analysis) and 198 true suspects. The two stage algorithm resulted in a zero rate of false positive detection, based on the artificial suspect analytes while producing a rate of false negative detection of 0.12. For the conventional approaches, the rates of false positive detection varied between 0.06 for UNIFI and 0.15 for TASQ. The rates of false negative detection for these methods ranged between 0.07 for TASQ and 0.09 for UNIFI. The effect of background signal complexity on the two stage algorithm was evaluated through the generation of a synthetic signal. We further discuss the boundaries of applicability of the two stage algorithm. The importance of background knowledge and experience in evaluating the reliability of results during the suspect screening was evaluated. Copyright © 2017 Elsevier B.V. All rights reserved.
Prediction of carbonate rock type from NMR responses using data mining techniques
NASA Astrophysics Data System (ADS)
Gonçalves, Eduardo Corrêa; da Silva, Pablo Nascimento; Silveira, Carla Semiramis; Carneiro, Giovanna; Domingues, Ana Beatriz; Moss, Adam; Pritchard, Tim; Plastino, Alexandre; Azeredo, Rodrigo Bagueira de Vasconcellos
2017-05-01
Recent studies have indicated that the accurate identification of carbonate rock types in a reservoir can be employed as a preliminary step to enhance the effectiveness of petrophysical property modeling. Furthermore, rock typing activity has been shown to be of key importance in several steps of formation evaluation, such as the study of sedimentary series, reservoir zonation and well-to-well correlation. In this paper, a methodology based exclusively on the analysis of 1H-NMR (Nuclear Magnetic Resonance) relaxation responses - using data mining algorithms - is evaluated to perform the automatic classification of carbonate samples according to their rock type. We analyze the effectiveness of six different classification algorithms (k-NN, Naïve Bayes, C4.5, Random Forest, SMO and Multilayer Perceptron) and two data preprocessing strategies (discretization and feature selection). The dataset used in this evaluation is formed by 78 1H-NMR T2 distributions of fully brine-saturated rock samples from six different rock type classes. The experiments reveal that the combination of preprocessing strategies with classification algorithms is able to achieve a prediction accuracy of 97.4%.
NASA Astrophysics Data System (ADS)
Khan, Faisal; Enzmann, Frieder; Kersten, Michael
2016-03-01
Image processing of X-ray-computed polychromatic cone-beam micro-tomography (μXCT) data of geological samples mainly involves artefact reduction and phase segmentation. For the former, the main beam-hardening (BH) artefact is removed by applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. A Matlab code for this approach is provided in the Appendix. The final BH-corrected image is extracted from the residual data or from the difference between the surface elevation values and the original grey-scale values. For the segmentation, we propose a novel least-squares support vector machine (LS-SVM, an algorithm for pixel-based multi-phase classification) approach. A receiver operating characteristic (ROC) analysis was performed on BH-corrected and uncorrected samples to show that BH correction is in fact an important prerequisite for accurate multi-phase classification. The combination of the two approaches was thus used to classify successfully three different more or less complex multi-phase rock core samples.
Shanks, Leslie; Siddiqui, M Ruby; Kliescikova, Jarmila; Pearce, Neil; Ariti, Cono; Muluneh, Libsework; Pirou, Erwan; Ritmeijer, Koert; Masiga, Johnson; Abebe, Almaz
2015-02-03
In Ethiopia a tiebreaker algorithm using 3 rapid diagnostic tests (RDTs) in series is used to diagnose HIV. Discordant results between the first 2 RDTs are resolved by a third 'tiebreaker' RDT. Médecins Sans Frontières uses an alternate serial algorithm of 2 RDTs followed by a confirmation test for all double positive RDT results. The primary objective was to compare the performance of the tiebreaker algorithm with a serial algorithm, and to evaluate the addition of a confirmation test to both algorithms. A secondary objective looked at the positive predictive value (PPV) of weakly reactive test lines. The study was conducted in two HIV testing sites in Ethiopia. Study participants were recruited sequentially until 200 positive samples were reached. Each sample was re-tested in the laboratory on the 3 RDTs and on a simple to use confirmation test, the Orgenics Immunocomb Combfirm® (OIC). The gold standard test was the Western Blot, with indeterminate results resolved by PCR testing. 2620 subjects were included with a HIV prevalence of 7.7%. Each of the 3 RDTs had an individual specificity of at least 99%. The serial algorithm with 2 RDTs had a single false positive result (1 out of 204) to give a PPV of 99.5% (95% CI 97.3%-100%). The tiebreaker algorithm resulted in 16 false positive results (PPV 92.7%, 95% CI: 88.4%-95.8%). Adding the OIC confirmation test to either algorithm eliminated the false positives. All the false positives had at least one weakly reactive test line in the algorithm. The PPV of weakly reacting RDTs was significantly lower than those with strongly positive test lines. The risk of false positive HIV diagnosis in a tiebreaker algorithm is significant. We recommend abandoning the tie-breaker algorithm in favour of WHO recommended serial or parallel algorithms, interpreting weakly reactive test lines as indeterminate results requiring further testing except in the setting of blood transfusion, and most importantly, adding a confirmation test to the RDT algorithm. It is now time to focus research efforts on how best to translate this knowledge into practice at the field level. Clinical Trial registration #: NCT01716299.
Maintaining and Enhancing Diversity of Sampled Protein Conformations in Robotics-Inspired Methods.
Abella, Jayvee R; Moll, Mark; Kavraki, Lydia E
2018-01-01
The ability to efficiently sample structurally diverse protein conformations allows one to gain a high-level view of a protein's energy landscape. Algorithms from robot motion planning have been used for conformational sampling, and several of these algorithms promote diversity by keeping track of "coverage" in conformational space based on the local sampling density. However, large proteins present special challenges. In particular, larger systems require running many concurrent instances of these algorithms, but these algorithms can quickly become memory intensive because they typically keep previously sampled conformations in memory to maintain coverage estimates. In addition, robotics-inspired algorithms depend on defining useful perturbation strategies for exploring the conformational space, which is a difficult task for large proteins because such systems are typically more constrained and exhibit complex motions. In this article, we introduce two methodologies for maintaining and enhancing diversity in robotics-inspired conformational sampling. The first method addresses algorithms based on coverage estimates and leverages the use of a low-dimensional projection to define a global coverage grid that maintains coverage across concurrent runs of sampling. The second method is an automatic definition of a perturbation strategy through readily available flexibility information derived from B-factors, secondary structure, and rigidity analysis. Our results show a significant increase in the diversity of the conformations sampled for proteins consisting of up to 500 residues when applied to a specific robotics-inspired algorithm for conformational sampling. The methodologies presented in this article may be vital components for the scalability of robotics-inspired approaches.
An algorithmic approach to crustal deformation analysis
NASA Technical Reports Server (NTRS)
Iz, Huseyin Baki
1987-01-01
In recent years the analysis of crustal deformation measurements has become important as a result of current improvements in geodetic methods and an increasing amount of theoretical and observational data provided by several earth sciences. A first-generation data analysis algorithm which combines a priori information with current geodetic measurements was proposed. Relevant methods which can be used in the algorithm were discussed. Prior information is the unifying feature of this algorithm. Some of the problems which may arise through the use of a priori information in the analysis were indicated and preventive measures were demonstrated. The first step in the algorithm is the optimal design of deformation networks. The second step in the algorithm identifies the descriptive model of the deformation field. The final step in the algorithm is the improved estimation of deformation parameters. Although deformation parameters are estimated in the process of model discrimination, they can further be improved by the use of a priori information about them. According to the proposed algorithm this information must first be tested against the estimates calculated using the sample data only. Null-hypothesis testing procedures were developed for this purpose. Six different estimators which employ a priori information were examined. Emphasis was put on the case when the prior information is wrong and analytical expressions for possible improvements under incompatible prior information were derived.
Droplet Image Super Resolution Based on Sparse Representation and Kernel Regression
NASA Astrophysics Data System (ADS)
Zou, Zhenzhen; Luo, Xinghong; Yu, Qiang
2018-02-01
Microgravity and containerless conditions, which are produced via electrostatic levitation combined with a drop tube, are important when studying the intrinsic properties of new metastable materials. Generally, temperature and image sensors can be used to measure the changes of sample temperature, morphology and volume. Then, the specific heat, surface tension, viscosity changes and sample density can be obtained. Considering that the falling speed of the material sample droplet is approximately 31.3 m/s when it reaches the bottom of a 50-meter-high drop tube, a high-speed camera with a collection rate of up to 106 frames/s is required to image the falling droplet. However, at the high-speed mode, very few pixels, approximately 48-120, will be obtained in each exposure time, which results in low image quality. Super-resolution image reconstruction is an algorithm that provides finer details than the sampling grid of a given imaging device by increasing the number of pixels per unit area in the image. In this work, we demonstrate the application of single image-resolution reconstruction in the microgravity and electrostatic levitation for the first time. Here, using the image super-resolution method based on sparse representation, a low-resolution droplet image can be reconstructed. Employed Yang's related dictionary model, high- and low-resolution image patches were combined with dictionary training, and high- and low-resolution-related dictionaries were obtained. The online double-sparse dictionary training algorithm was used in the study of related dictionaries and overcome the shortcomings of the traditional training algorithm with small image patch. During the stage of image reconstruction, the algorithm of kernel regression is added, which effectively overcomes the shortcomings of the Yang image's edge blurs.
Droplet Image Super Resolution Based on Sparse Representation and Kernel Regression
NASA Astrophysics Data System (ADS)
Zou, Zhenzhen; Luo, Xinghong; Yu, Qiang
2018-05-01
Microgravity and containerless conditions, which are produced via electrostatic levitation combined with a drop tube, are important when studying the intrinsic properties of new metastable materials. Generally, temperature and image sensors can be used to measure the changes of sample temperature, morphology and volume. Then, the specific heat, surface tension, viscosity changes and sample density can be obtained. Considering that the falling speed of the material sample droplet is approximately 31.3 m/s when it reaches the bottom of a 50-meter-high drop tube, a high-speed camera with a collection rate of up to 106 frames/s is required to image the falling droplet. However, at the high-speed mode, very few pixels, approximately 48-120, will be obtained in each exposure time, which results in low image quality. Super-resolution image reconstruction is an algorithm that provides finer details than the sampling grid of a given imaging device by increasing the number of pixels per unit area in the image. In this work, we demonstrate the application of single image-resolution reconstruction in the microgravity and electrostatic levitation for the first time. Here, using the image super-resolution method based on sparse representation, a low-resolution droplet image can be reconstructed. Employed Yang's related dictionary model, high- and low-resolution image patches were combined with dictionary training, and high- and low-resolution-related dictionaries were obtained. The online double-sparse dictionary training algorithm was used in the study of related dictionaries and overcome the shortcomings of the traditional training algorithm with small image patch. During the stage of image reconstruction, the algorithm of kernel regression is added, which effectively overcomes the shortcomings of the Yang image's edge blurs.
Guided filter and convolutional network based tracking for infrared dim moving target
NASA Astrophysics Data System (ADS)
Qian, Kun; Zhou, Huixin; Qin, Hanlin; Rong, Shenghui; Zhao, Dong; Du, Juan
2017-09-01
The dim moving target usually submerges in strong noise, and its motion observability is debased by numerous false alarms for low signal-to-noise ratio. A tracking algorithm that integrates the Guided Image Filter (GIF) and the Convolutional neural network (CNN) into the particle filter framework is presented to cope with the uncertainty of dim targets. First, the initial target template is treated as a guidance to filter incoming templates depending on similarities between the guidance and candidate templates. The GIF algorithm utilizes the structure in the guidance and performs as an edge-preserving smoothing operator. Therefore, the guidance helps to preserve the detail of valuable templates and makes inaccurate ones blurry, alleviating the tracking deviation effectively. Besides, the two-layer CNN method is adopted to obtain a powerful appearance representation. Subsequently, a Bayesian classifier is trained with these discriminative yet strong features. Moreover, an adaptive learning factor is introduced to prevent the update of classifier's parameters when a target undergoes sever background. At last, classifier responses of particles are utilized to generate particle importance weights and a re-sample procedure preserves samples according to the weight. In the predication stage, a 2-order transition model considers the target velocity to estimate current position. Experimental results demonstrate that the presented algorithm outperforms several relative algorithms in the accuracy.
Color image analysis technique for measuring of fat in meat: an application for the meat industry
NASA Astrophysics Data System (ADS)
Ballerini, Lucia; Hogberg, Anders; Lundstrom, Kerstin; Borgefors, Gunilla
2001-04-01
Intramuscular fat content in meat influences some important meat quality characteristics. The aim of the present study was to develop and apply image processing techniques to quantify intramuscular fat content in beefs together with the visual appearance of fat in meat (marbling). Color images of M. longissimus dorsi meat samples with a variability of intramuscular fat content and marbling were captured. Image analysis software was specially developed for the interpretation of these images. In particular, a segmentation algorithm (i.e. classification of different substances: fat, muscle and connective tissue) was optimized in order to obtain a proper classification and perform subsequent analysis. Segmentation of muscle from fat was achieved based on their characteristics in the 3D color space, and on the intrinsic fuzzy nature of these structures. The method is fully automatic and it combines a fuzzy clustering algorithm, the Fuzzy c-Means Algorithm, with a Genetic Algorithm. The percentages of various colors (i.e. substances) within the sample are then determined; the number, size distribution, and spatial distributions of the extracted fat flecks are measured. Measurements are correlated with chemical and sensory properties. Results so far show that advanced image analysis is useful for quantify the visual appearance of meat.
Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph
2016-02-26
Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified.
Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph
2016-01-01
Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified. PMID:26927169
Abnormal global and local event detection in compressive sensing domain
NASA Astrophysics Data System (ADS)
Wang, Tian; Qiao, Meina; Chen, Jie; Wang, Chuanyun; Zhang, Wenjia; Snoussi, Hichem
2018-05-01
Abnormal event detection, also known as anomaly detection, is one challenging task in security video surveillance. It is important to develop effective and robust movement representation models for global and local abnormal event detection to fight against factors such as occlusion and illumination change. In this paper, a new algorithm is proposed. It can locate the abnormal events on one frame, and detect the global abnormal frame. The proposed algorithm employs a sparse measurement matrix designed to represent the movement feature based on optical flow efficiently. Then, the abnormal detection mission is constructed as a one-class classification task via merely learning from the training normal samples. Experiments demonstrate that our algorithm performs well on the benchmark abnormal detection datasets against state-of-the-art methods.
Error analysis of stochastic gradient descent ranking.
Chen, Hong; Tang, Yi; Li, Luoqing; Yuan, Yuan; Li, Xuelong; Tang, Yuanyan
2013-06-01
Ranking is always an important task in machine learning and information retrieval, e.g., collaborative filtering, recommender systems, drug discovery, etc. A kernel-based stochastic gradient descent algorithm with the least squares loss is proposed for ranking in this paper. The implementation of this algorithm is simple, and an expression of the solution is derived via a sampling operator and an integral operator. An explicit convergence rate for leaning a ranking function is given in terms of the suitable choices of the step size and the regularization parameter. The analysis technique used here is capacity independent and is novel in error analysis of ranking learning. Experimental results on real-world data have shown the effectiveness of the proposed algorithm in ranking tasks, which verifies the theoretical analysis in ranking error.
Optimal Budget Allocation for Sample Average Approximation
2011-06-01
an optimization algorithm applied to the sample average problem. We examine the convergence rate of the estimator as the computing budget tends to...regime for the optimization algorithm . 1 Introduction Sample average approximation (SAA) is a frequently used approach to solving stochastic programs...appealing due to its simplicity and the fact that a large number of standard optimization algorithms are often available to optimize the resulting sample
Automated Rock Identification for Future Mars Exploration Missions
NASA Technical Reports Server (NTRS)
Gulick, V. C.; Morris, R. L.; Gazis, P.; Bishop, J. L.; Alena, R.; Hart, S. D.; Horton, A.
2003-01-01
A key task for human or robotic explorers on the surface of Mars is choosing which particular rock or mineral samples should be selected for more intensive study. The usual challenges of such a task are compounded by the lack of sensory input available to a suited astronaut or the limited downlink bandwidth available to a rover. Additional challenges facing a human mission include limited surface time and the similarities in appearance of important minerals (e.g. carbonates, silicates, salts). Yet the choice of which sample to collect is critical. To address this challenge we are developing science analysis algorithms to interface with a Geologist's Field Assistant (GFA) device that will allow robotic or human remote explorers to better sense and explore their surroundings during limited surface excursions. We aim for our algorithms to interpret spectral and imaging data obtained by various sensors. The algorithms, for example, will identify key minerals, rocks, and sediments from mid-IR, Raman, and visible/near-IR spectra as well as from high resolution and microscopic images to help interpret data and to provide high-level advice to the remote explorer. A top-level system will consider multiple inputs from raw sensor data output by imagers and spectrometers (visible/near-IR, mid-IR, and Raman) as well as human opinion to identify rock and mineral samples.
NASA Astrophysics Data System (ADS)
Rajaona, Harizo; Septier, François; Armand, Patrick; Delignon, Yves; Olry, Christophe; Albergel, Armand; Moussafir, Jacques
2015-12-01
In the eventuality of an accidental or intentional atmospheric release, the reconstruction of the source term using measurements from a set of sensors is an important and challenging inverse problem. A rapid and accurate estimation of the source allows faster and more efficient action for first-response teams, in addition to providing better damage assessment. This paper presents a Bayesian probabilistic approach to estimate the location and the temporal emission profile of a pointwise source. The release rate is evaluated analytically by using a Gaussian assumption on its prior distribution, and is enhanced with a positivity constraint to improve the estimation. The source location is obtained by the means of an advanced iterative Monte-Carlo technique called Adaptive Multiple Importance Sampling (AMIS), which uses a recycling process at each iteration to accelerate its convergence. The proposed methodology is tested using synthetic and real concentration data in the framework of the Fusion Field Trials 2007 (FFT-07) experiment. The quality of the obtained results is comparable to those coming from the Markov Chain Monte Carlo (MCMC) algorithm, a popular Bayesian method used for source estimation. Moreover, the adaptive processing of the AMIS provides a better sampling efficiency by reusing all the generated samples.
Moura, Lidia M V R; Price, Maggie; Cole, Andrew J; Hoch, Daniel B; Hsu, John
2017-04-01
To evaluate published algorithms for the identification of epilepsy cases in medical claims data using a unique linked dataset with both clinical and claims data. Using data from a large, regional health delivery system, we identified all patients contributing biologic samples to the health system's Biobank (n = 36K). We identified all subjects with at least one diagnosis potentially consistent with epilepsy, for example, epilepsy, convulsions, syncope, or collapse, between 2014 and 2015, or who were seen at the epilepsy clinic (n = 1,217), plus a random sample of subjects with neither claims nor clinic visits (n = 435); we then performed a medical chart review in a random subsample of 1,377 to assess the epilepsy diagnosis status. Using the chart review as the reference standard, we evaluated the test characteristics of six published algorithms. The best-performing algorithm used diagnostic and prescription drug data (sensitivity = 70%, 95% confidence interval [CI] 66-73%; specificity = 77%, 95% CI 73-81%; and area under the curve [AUC] = 0.73, 95%CI 0.71-0.76) when applied to patients age 18 years or older. Restricting the sample to adults aged 18-64 years resulted in a mild improvement in accuracy (AUC = 0.75,95%CI 0.73-0.78). Adding information about current antiepileptic drug use to the algorithm increased test performance (AUC = 0.78, 95%CI 0.76-0.80). Other algorithms varied in their included data types and performed worse. Current approaches for identifying patients with epilepsy in insurance claims have important limitations when applied to the general population. Approaches incorporating a range of information, for example, diagnoses, treatments, and site of care/specialty of physician, improve the performance of identification and could be useful in epilepsy studies using large datasets. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.
Research on Abnormal Detection Based on Improved Combination of K - means and SVDD
NASA Astrophysics Data System (ADS)
Hao, Xiaohong; Zhang, Xiaofeng
2018-01-01
In order to improve the efficiency of network intrusion detection and reduce the false alarm rate, this paper proposes an anomaly detection algorithm based on improved K-means and SVDD. The algorithm first uses the improved K-means algorithm to cluster the training samples of each class, so that each class is independent and compact in class; Then, according to the training samples, the SVDD algorithm is used to construct the minimum superspheres. The subordinate relationship of the samples is determined by calculating the distance of the minimum superspheres constructed by SVDD. If the test sample is less than the center of the hypersphere, the test sample belongs to this class, otherwise it does not belong to this class, after several comparisons, the final test of the effective detection of the test sample.In this paper, we use KDD CUP99 data set to simulate the proposed anomaly detection algorithm. The results show that the algorithm has high detection rate and low false alarm rate, which is an effective network security protection method.
Image reconstructions from super-sampled data sets with resolution modeling in PET imaging.
Li, Yusheng; Matej, Samuel; Metzler, Scott D
2014-12-01
Spatial resolution in positron emission tomography (PET) is still a limiting factor in many imaging applications. To improve the spatial resolution for an existing scanner with fixed crystal sizes, mechanical movements such as scanner wobbling and object shifting have been considered for PET systems. Multiple acquisitions from different positions can provide complementary information and increased spatial sampling. The objective of this paper is to explore an efficient and useful reconstruction framework to reconstruct super-resolution images from super-sampled low-resolution data sets. The authors introduce a super-sampling data acquisition model based on the physical processes with tomographic, downsampling, and shifting matrices as its building blocks. Based on the model, we extend the MLEM and Landweber algorithms to reconstruct images from super-sampled data sets. The authors also derive a backprojection-filtration-like (BPF-like) method for the super-sampling reconstruction. Furthermore, they explore variant methods for super-sampling reconstructions: the separate super-sampling resolution-modeling reconstruction and the reconstruction without downsampling to further improve image quality at the cost of more computation. The authors use simulated reconstruction of a resolution phantom to evaluate the three types of algorithms with different super-samplings at different count levels. Contrast recovery coefficient (CRC) versus background variability, as an image-quality metric, is calculated at each iteration for all reconstructions. The authors observe that all three algorithms can significantly and consistently achieve increased CRCs at fixed background variability and reduce background artifacts with super-sampled data sets at the same count levels. For the same super-sampled data sets, the MLEM method achieves better image quality than the Landweber method, which in turn achieves better image quality than the BPF-like method. The authors also demonstrate that the reconstructions from super-sampled data sets using a fine system matrix yield improved image quality compared to the reconstructions using a coarse system matrix. Super-sampling reconstructions with different count levels showed that the more spatial-resolution improvement can be obtained with higher count at a larger iteration number. The authors developed a super-sampling reconstruction framework that can reconstruct super-resolution images using the super-sampling data sets simultaneously with known acquisition motion. The super-sampling PET acquisition using the proposed algorithms provides an effective and economic way to improve image quality for PET imaging, which has an important implication in preclinical and clinical region-of-interest PET imaging applications.
Equilibrium Sampling in Biomolecular Simulation
2015-01-01
Equilibrium sampling of biomolecules remains an unmet challenge after more than 30 years of atomistic simulation. Efforts to enhance sampling capability, which are reviewed here, range from the development of new algorithms to parallelization to novel uses of hardware. Special focus is placed on classifying algorithms — most of which are underpinned by a few key ideas — in order to understand their fundamental strengths and limitations. Although algorithms have proliferated, progress resulting from novel hardware use appears to be more clear-cut than from algorithms alone, partly due to the lack of widely used sampling measures. PMID:21370970
Active learning for semi-supervised clustering based on locally linear propagation reconstruction.
Chang, Chin-Chun; Lin, Po-Yi
2015-03-01
The success of semi-supervised clustering relies on the effectiveness of side information. To get effective side information, a new active learner learning pairwise constraints known as must-link and cannot-link constraints is proposed in this paper. Three novel techniques are developed for learning effective pairwise constraints. The first technique is used to identify samples less important to cluster structures. This technique makes use of a kernel version of locally linear embedding for manifold learning. Samples neither important to locally linear propagation reconstructions of other samples nor on flat patches in the learned manifold are regarded as unimportant samples. The second is a novel criterion for query selection. This criterion considers not only the importance of a sample to expanding the space coverage of the learned samples but also the expected number of queries needed to learn the sample. To facilitate semi-supervised clustering, the third technique yields inferred must-links for passing information about flat patches in the learned manifold to semi-supervised clustering algorithms. Experimental results have shown that the learned pairwise constraints can capture the underlying cluster structures and proven the feasibility of the proposed approach. Copyright © 2014 Elsevier Ltd. All rights reserved.
Biomedical sensor design using analog compressed sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza; Krishnan, Sridhar
2015-05-01
The main drawback of current healthcare systems is the location-specific nature of the system due to the use of fixed/wired biomedical sensors. Since biomedical sensors are usually driven by a battery, power consumption is the most important factor determining the life of a biomedical sensor. They are also restricted by size, cost, and transmission capacity. Therefore, it is important to reduce the load of sampling by merging the sampling and compression steps to reduce the storage usage, transmission times, and power consumption in order to expand the current healthcare systems to Wireless Healthcare Systems (WHSs). In this work, we present an implementation of a low-power biomedical sensor using analog Compressed Sensing (CS) framework for sparse biomedical signals that addresses both the energy and telemetry bandwidth constraints of wearable and wireless Body-Area Networks (BANs). This architecture enables continuous data acquisition and compression of biomedical signals that are suitable for a variety of diagnostic and treatment purposes. At the transmitter side, an analog-CS framework is applied at the sensing step before Analog to Digital Converter (ADC) in order to generate the compressed version of the input analog bio-signal. At the receiver side, a reconstruction algorithm based on Restricted Isometry Property (RIP) condition is applied in order to reconstruct the original bio-signals form the compressed bio-signals with high probability and enough accuracy. We examine the proposed algorithm with healthy and neuropathy surface Electromyography (sEMG) signals. The proposed algorithm achieves a good level for Average Recognition Rate (ARR) at 93% and reconstruction accuracy at 98.9%. In addition, The proposed architecture reduces total computation time from 32 to 11.5 seconds at sampling-rate=29 % of Nyquist rate, Percentage Residual Difference (PRD)=26 %, Root Mean Squared Error (RMSE)=3 %.
Sampling ARG of multiple populations under complex configurations of subdivision and admixture.
Carrieri, Anna Paola; Utro, Filippo; Parida, Laxmi
2016-04-01
Simulating complex evolution scenarios of multiple populations is an important task for answering many basic questions relating to population genomics. Apart from the population samples, the underlying Ancestral Recombinations Graph (ARG) is an additional important means in hypothesis checking and reconstruction studies. Furthermore, complex simulations require a plethora of interdependent parameters making even the scenario-specification highly non-trivial. We present an algorithm SimRA that simulates generic multiple population evolution model with admixture. It is based on random graphs that improve dramatically in time and space requirements of the classical algorithm of single populations.Using the underlying random graphs model, we also derive closed forms of expected values of the ARG characteristics i.e., height of the graph, number of recombinations, number of mutations and population diversity in terms of its defining parameters. This is crucial in aiding the user to specify meaningful parameters for the complex scenario simulations, not through trial-and-error based on raw compute power but intelligent parameter estimation. To the best of our knowledge this is the first time closed form expressions have been computed for the ARG properties. We show that the expected values closely match the empirical values through simulations.Finally, we demonstrate that SimRA produces the ARG in compact forms without compromising any accuracy. We demonstrate the compactness and accuracy through extensive experiments. SimRA (Simulation based on Random graph Algorithms) source, executable, user manual and sample input-output sets are available for downloading at: https://github.com/ComputationalGenomics/SimRA CONTACT: : parida@us.ibm.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.
Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang
2018-01-01
The strategy of Divide-and-Conquer (D&C) is one of the frequently used programming patterns to design efficient algorithms in computer science, which has been parallelized on shared memory systems and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, by following the generic paradigm proposed by Tzeng and Owens, we provide a new and publicly available GPU implementation of the famous D&C algorithm, QuickHull, to give a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective in this paper is to present a sample GPU implementation of a classical D&C algorithm to help interested readers to develop their own efficient GPU implementations with fewer efforts.
Emad, Amin; Milenkovic, Olgica
2014-01-01
We introduce a novel algorithm for inference of causal gene interactions, termed CaSPIAN (Causal Subspace Pursuit for Inference and Analysis of Networks), which is based on coupling compressive sensing and Granger causality techniques. The core of the approach is to discover sparse linear dependencies between shifted time series of gene expressions using a sequential list-version of the subspace pursuit reconstruction algorithm and to estimate the direction of gene interactions via Granger-type elimination. The method is conceptually simple and computationally efficient, and it allows for dealing with noisy measurements. Its performance as a stand-alone platform without biological side-information was tested on simulated networks, on the synthetic IRMA network in Saccharomyces cerevisiae, and on data pertaining to the human HeLa cell network and the SOS network in E. coli. The results produced by CaSPIAN are compared to the results of several related algorithms, demonstrating significant improvements in inference accuracy of documented interactions. These findings highlight the importance of Granger causality techniques for reducing the number of false-positives, as well as the influence of noise and sampling period on the accuracy of the estimates. In addition, the performance of the method was tested in conjunction with biological side information of the form of sparse “scaffold networks”, to which new edges were added using available RNA-seq or microarray data. These biological priors aid in increasing the sensitivity and precision of the algorithm in the small sample regime. PMID:24622336
Performance evaluation of an importance sampling technique in a Jackson network
NASA Astrophysics Data System (ADS)
brahim Mahdipour, E.; Masoud Rahmani, Amir; Setayeshi, Saeed
2014-03-01
Importance sampling is a technique that is commonly used to speed up Monte Carlo simulation of rare events. However, little is known regarding the design of efficient importance sampling algorithms in the context of queueing networks. The standard approach, which simulates the system using an a priori fixed change of measure suggested by large deviation analysis, has been shown to fail in even the simplest network settings. Estimating probabilities associated with rare events has been a topic of great importance in queueing theory, and in applied probability at large. In this article, we analyse the performance of an importance sampling estimator for a rare event probability in a Jackson network. This article carries out strict deadlines to a two-node Jackson network with feedback whose arrival and service rates are modulated by an exogenous finite state Markov process. We have estimated the probability of network blocking for various sets of parameters, and also the probability of missing the deadline of customers for different loads and deadlines. We have finally shown that the probability of total population overflow may be affected by various deadline values, service rates and arrival rates.
Mutation Clusters from Cancer Exome.
Kakushadze, Zura; Yu, Willie
2017-08-15
We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.
Mutation Clusters from Cancer Exome
Kakushadze, Zura; Yu, Willie
2017-01-01
We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development. PMID:28809811
Analytical Algorithms to Quantify the Uncertainty in Remaining Useful Life Prediction
NASA Technical Reports Server (NTRS)
Sankararaman, Shankar; Saxena, Abhinav; Daigle, Matthew; Goebel, Kai
2013-01-01
This paper investigates the use of analytical algorithms to quantify the uncertainty in the remaining useful life (RUL) estimate of components used in aerospace applications. The prediction of RUL is affected by several sources of uncertainty and it is important to systematically quantify their combined effect by computing the uncertainty in the RUL prediction in order to aid risk assessment, risk mitigation, and decisionmaking. While sampling-based algorithms have been conventionally used for quantifying the uncertainty in RUL, analytical algorithms are computationally cheaper and sometimes, are better suited for online decision-making. While exact analytical algorithms are available only for certain special cases (for e.g., linear models with Gaussian variables), effective approximations can be made using the the first-order second moment method (FOSM), the first-order reliability method (FORM), and the inverse first-order reliability method (Inverse FORM). These methods can be used not only to calculate the entire probability distribution of RUL but also to obtain probability bounds on RUL. This paper explains these three methods in detail and illustrates them using the state-space model of a lithium-ion battery.
A solution quality assessment method for swarm intelligence optimization algorithms.
Zhang, Zhaojun; Wang, Gai-Ge; Zou, Kuansheng; Zhang, Jianhua
2014-01-01
Nowadays, swarm intelligence optimization has become an important optimization tool and wildly used in many fields of application. In contrast to many successful applications, the theoretical foundation is rather weak. Therefore, there are still many problems to be solved. One problem is how to quantify the performance of algorithm in finite time, that is, how to evaluate the solution quality got by algorithm for practical problems. It greatly limits the application in practical problems. A solution quality assessment method for intelligent optimization is proposed in this paper. It is an experimental analysis method based on the analysis of search space and characteristic of algorithm itself. Instead of "value performance," the "ordinal performance" is used as evaluation criteria in this method. The feasible solutions were clustered according to distance to divide solution samples into several parts. Then, solution space and "good enough" set can be decomposed based on the clustering results. Last, using relative knowledge of statistics, the evaluation result can be got. To validate the proposed method, some intelligent algorithms such as ant colony optimization (ACO), particle swarm optimization (PSO), and artificial fish swarm algorithm (AFS) were taken to solve traveling salesman problem. Computational results indicate the feasibility of proposed method.
A physics-based algorithm for the estimation of bearing spall width using vibrations
NASA Astrophysics Data System (ADS)
Kogan, G.; Klein, R.; Bortman, J.
2018-05-01
Evaluation of the damage severity in a mechanical system is required for the assessment of its remaining useful life. In rotating machines, bearings are crucial components. Hence, the estimation of the size of spalls in bearings is important for prognostics of the remaining useful life. Recently, this topic has been extensively studied and many of the methods used for the estimation of spall size are based on the analysis of vibrations. A new tool is proposed in the current study for the estimation of the spall width on the outer ring raceway of a rolling element bearing. The understanding and analysis of the dynamics of the rolling element-spall interaction enabled the development of a generic and autonomous algorithm. The algorithm is generic in the sense that it does not require any human interference to make adjustments for each case. All of the algorithm's parameters are defined by analytical expressions describing the dynamics of the system. The required conditions, such as sampling rate, spall width and depth, defining the feasible region of such algorithms, are analyzed in the paper. The algorithm performance was demonstrated with experimental data for different spall widths.
Improved argument-FFT frequency offset estimation for QPSK coherent optical Systems
NASA Astrophysics Data System (ADS)
Han, Jilong; Li, Wei; Yuan, Zhilin; Li, Haitao; Huang, Liyan; Hu, Qianggao
2016-02-01
A frequency offset estimation (FOE) algorithm based on fast Fourier transform (FFT) of the signal's argument is investigated, which does not require removing the modulated data phase. In this paper, we analyze the flaw of the argument-FFT algorithm and propose a combined FOE algorithm, in which the absolute of frequency offset (FO) is accurately calculated by argument-FFT algorithm with a relatively large number of samples and the sign of FO is determined by FFT-based interpolation discrete Fourier transformation (DFT) algorithm with a relatively small number of samples. Compared with the previous algorithms based on argument-FFT, the proposed one has low complexity and can still effectively work with a relatively less number of samples.
System and method for resolving gamma-ray spectra
Gentile, Charles A.; Perry, Jason; Langish, Stephen W.; Silber, Kenneth; Davis, William M.; Mastrovito, Dana
2010-05-04
A system for identifying radionuclide emissions is described. The system includes at least one processor for processing output signals from a radionuclide detecting device, at least one training algorithm run by the at least one processor for analyzing data derived from at least one set of known sample data from the output signals, at least one classification algorithm derived from the training algorithm for classifying unknown sample data, wherein the at least one training algorithm analyzes the at least one sample data set to derive at least one rule used by said classification algorithm for identifying at least one radionuclide emission detected by the detecting device.
Accelerated Optical Projection Tomography Applied to In Vivo Imaging of Zebrafish
Correia, Teresa; Yin, Jun; Ramel, Marie-Christine; Andrews, Natalie; Katan, Matilda; Bugeon, Laurence; Dallman, Margaret J.; McGinty, James; Frankel, Paul; French, Paul M. W.; Arridge, Simon
2015-01-01
Optical projection tomography (OPT) provides a non-invasive 3-D imaging modality that can be applied to longitudinal studies of live disease models, including in zebrafish. Current limitations include the requirement of a minimum number of angular projections for reconstruction of reasonable OPT images using filtered back projection (FBP), which is typically several hundred, leading to acquisition times of several minutes. It is highly desirable to decrease the number of required angular projections to decrease both the total acquisition time and the light dose to the sample. This is particularly important to enable longitudinal studies, which involve measurements of the same fish at different time points. In this work, we demonstrate that the use of an iterative algorithm to reconstruct sparsely sampled OPT data sets can provide useful 3-D images with 50 or fewer projections, thereby significantly decreasing the minimum acquisition time and light dose while maintaining image quality. A transgenic zebrafish embryo with fluorescent labelling of the vasculature was imaged to acquire densely sampled (800 projections) and under-sampled data sets of transmitted and fluorescence projection images. The under-sampled OPT data sets were reconstructed using an iterative total variation-based image reconstruction algorithm and compared against FBP reconstructions of the densely sampled data sets. To illustrate the potential for quantitative analysis following rapid OPT data acquisition, a Hessian-based method was applied to automatically segment the reconstructed images to select the vasculature network. Results showed that 3-D images of the zebrafish embryo and its vasculature of sufficient visual quality for quantitative analysis can be reconstructed using the iterative algorithm from only 32 projections—achieving up to 28 times improvement in imaging speed and leading to total acquisition times of a few seconds. PMID:26308086
Robust online tracking via adaptive samples selection with saliency detection
NASA Astrophysics Data System (ADS)
Yan, Jia; Chen, Xi; Zhu, QiuPing
2013-12-01
Online tracking has shown to be successful in tracking of previously unknown objects. However, there are two important factors which lead to drift problem of online tracking, the one is how to select the exact labeled samples even when the target locations are inaccurate, and the other is how to handle the confusors which have similar features with the target. In this article, we propose a robust online tracking algorithm with adaptive samples selection based on saliency detection to overcome the drift problem. To deal with the problem of degrading the classifiers using mis-aligned samples, we introduce the saliency detection method to our tracking problem. Saliency maps and the strong classifiers are combined to extract the most correct positive samples. Our approach employs a simple yet saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using the random patches as the negative samples, we propose a reasonable selection criterion, in which both the saliency confidence and similarity are considered with the benefits that confusors in the surrounding background are incorporated into the classifiers update process before the drift occurs. The tracking task is formulated as a binary classification via online boosting framework. Experiment results in several challenging video sequences demonstrate the accuracy and stability of our tracker.
NASA Astrophysics Data System (ADS)
Walicka, A.; Jóźków, G.; Borkowski, A.
2018-05-01
The fluvial transport is an important aspect of hydrological and geomorphologic studies. The knowledge about the movement parameters of different-size fractions is essential in many applications, such as the exploration of the watercourse changes, the calculation of the river bed parameters or the investigation of the frequency and the nature of the weather events. Traditional techniques used for the fluvial transport investigations do not provide any information about the long-term horizontal movement of the rocks. This information can be gained by means of terrestrial laser scanning (TLS). However, this is a complex issue consisting of several stages of data processing. In this study the methodology for individual rocks segmentation from TLS point cloud has been proposed, which is the first step for the semi-automatic algorithm for movement detection of individual rocks. The proposed algorithm is executed in two steps. Firstly, the point cloud is classified as rocks or background using only geometrical information. Secondly, the DBSCAN algorithm is executed iteratively on points classified as rocks until only one stone is detected in each segment. The number of rocks in each segment is determined using principal component analysis (PCA) and simple derivative method for peak detection. As a result, several segments that correspond to individual rocks are formed. Numerical tests were executed on two test samples. The results of the semi-automatic segmentation were compared to results acquired by manual segmentation. The proposed methodology enabled to successfully segment 76 % and 72 % of rocks in the test sample 1 and test sample 2, respectively.
Automatic cortical thickness analysis on rodent brain
NASA Astrophysics Data System (ADS)
Lee, Joohwi; Ehlers, Cindy; Crews, Fulton; Niethammer, Marc; Budin, Francois; Paniagua, Beatriz; Sulik, Kathy; Johns, Josephine; Styner, Martin; Oguz, Ipek
2011-03-01
Localized difference in the cortex is one of the most useful morphometric traits in human and animal brain studies. There are many tools and methods already developed to automatically measure and analyze cortical thickness for the human brain. However, these tools cannot be directly applied to rodent brains due to the different scales; even adult rodent brains are 50 to 100 times smaller than humans. This paper describes an algorithm for automatically measuring the cortical thickness of mouse and rat brains. The algorithm consists of three steps: segmentation, thickness measurement, and statistical analysis among experimental groups. The segmentation step provides the neocortex separation from other brain structures and thus is a preprocessing step for the thickness measurement. In the thickness measurement step, the thickness is computed by solving a Laplacian PDE and a transport equation. The Laplacian PDE first creates streamlines as an analogy of cortical columns; the transport equation computes the length of the streamlines. The result is stored as a thickness map over the neocortex surface. For the statistical analysis, it is important to sample thickness at corresponding points. This is achieved by the particle correspondence algorithm which minimizes entropy between dynamically moving sample points called particles. Since the computational cost of the correspondence algorithm may limit the number of corresponding points, we use thin-plate spline based interpolation to increase the number of corresponding sample points. As a driving application, we measured the thickness difference to assess the effects of adolescent intermittent ethanol exposure that persist into adulthood and performed t-test between the control and exposed rat groups. We found significantly differing regions in both hemispheres.
A robust data scaling algorithm to improve classification accuracies in biomedical data.
Cao, Xi Hang; Stojkovic, Ivan; Obradovic, Zoran
2016-09-09
Machine learning models have been adapted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy. To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks with different variable types and cover a wide range of applications. The resultant performance in terms of area under the receiver operation characteristic curve (AUROC) and percentage of correct classification showed that models learned using data scaled by the GL algorithm outperform the ones using data scaled by the Min-max and the Z-score algorithm, which are the most commonly used data scaling algorithms. The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show models learned from data scaled by the GL algorithm have higher accuracy compared to the commonly used data scaling algorithms.
Adaptive Metropolis Sampling with Product Distributions
NASA Technical Reports Server (NTRS)
Wolpert, David H.; Lee, Chiu Fan
2005-01-01
The Metropolis-Hastings (MH) algorithm is a way to sample a provided target distribution pi(z). It works by repeatedly sampling a separate proposal distribution T(x,x') to generate a random walk {x(t)}. We consider a modification of the MH algorithm in which T is dynamically updated during the walk. The update at time t uses the {x(t' less than t)} to estimate the product distribution that has the least Kullback-Leibler distance to pi. That estimate is the information-theoretically optimal mean-field approximation to pi. We demonstrate through computer experiments that our algorithm produces samples that are superior to those of the conventional MH algorithm.
NASA Technical Reports Server (NTRS)
George, William K.; Rae, William J.; Woodward, Scott H.
1991-01-01
The importance of frequency response considerations in the use of thin-film gages for unsteady heat transfer measurements in transient facilities is considered, and methods for evaluating it are proposed. A departure frequency response function is introduced and illustrated by an existing analog circuit. A Fresnel integral temperature which possesses the essential features of the film temperature in transient facilities is introduced and is used to evaluate two numerical algorithms. Finally, criteria are proposed for the use of finite-difference algorithms for the calculation of the unsteady heat flux from a sampled temperature signal.
Efficient field-theoretic simulation of polymer solutions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villet, Michael C.; Fredrickson, Glenn H., E-mail: ghf@mrl.ucsb.edu; Department of Materials, University of California, Santa Barbara, California 93106
2014-12-14
We present several developments that facilitate the efficient field-theoretic simulation of polymers by complex Langevin sampling. A regularization scheme using finite Gaussian excluded volume interactions is used to derive a polymer solution model that appears free of ultraviolet divergences and hence is well-suited for lattice-discretized field theoretic simulation. We show that such models can exhibit ultraviolet sensitivity, a numerical pathology that dramatically increases sampling error in the continuum lattice limit, and further show that this pathology can be eliminated by appropriate model reformulation by variable transformation. We present an exponential time differencing algorithm for integrating complex Langevin equations for fieldmore » theoretic simulation, and show that the algorithm exhibits excellent accuracy and stability properties for our regularized polymer model. These developments collectively enable substantially more efficient field-theoretic simulation of polymers, and illustrate the importance of simultaneously addressing analytical and numerical pathologies when implementing such computations.« less
Understanding the Lomb–Scargle Periodogram
NASA Astrophysics Data System (ADS)
VanderPlas, Jacob T.
2018-05-01
The Lomb–Scargle periodogram is a well-known algorithm for detecting and characterizing periodic signals in unevenly sampled data. This paper presents a conceptual introduction to the Lomb–Scargle periodogram and important practical considerations for its use. Rather than a rigorous mathematical treatment, the goal of this paper is to build intuition about what assumptions are implicit in the use of the Lomb–Scargle periodogram and related estimators of periodicity, so as to motivate important practical considerations required in its proper application and interpretation.
Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.
Jin, Ick Hoon; Yuan, Ying; Liang, Faming
2013-10-01
Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.
In situ microscopy for online monitoring of cell concentration in Pichia pastoris cultivations.
Marquard, D; Enders, A; Roth, G; Rinas, U; Scheper, T; Lindner, P
2016-09-20
In situ Microscopy (ISM) is an optical non-invasive technique to monitor cells in bioprocesses in real-time. Pichia pastoris is one of the most promising protein expression systems. This yeast combines fast growth on simple media and important eukaryotic features such as glycosylation. In this work, the ISM technology was applied to Pichia pastoris cultivations for online monitoring of the cell concentration during cultivation. Different ISM settings were tested. The acquired images were analyzed with two image processing algorithms. In seven cultivations the cell concentration was monitored by the applied algorithms and offline samples were taken to determine optical density (OD) and dry cell mass (DCM). Cell concentrations up to 74g/L dry cell mass could be analyzed via the ISM. Depending on the algorithm and the ISM settings, an accuracy between 0.3 % and 12 % was achieved. The overall results show that for a robust measurement a combination of the two described algorithms is required. Copyright © 2016 Elsevier B.V. All rights reserved.
Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C
2005-10-01
In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples as well as its non-linearity. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machine (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method to select parameters of the aforementioned algorithm implemented with Gaussian kernel SVMs as better alternatives to the common practice of selecting the apparently best parameters by using a genetic algorithm to search for a couple of optimal parameter. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative hereditary breast cancer and acute leukaemia datasets. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
Data-driven process decomposition and robust online distributed modelling for large-scale processes
NASA Astrophysics Data System (ADS)
Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou
2018-02-01
With the increasing attention of networked control, system decomposition and distributed models show significant importance in the implementation of model-based control strategy. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm was proposed for large-scale chemical processes. The key controlled variables are first partitioned by affinity propagation clustering algorithm into several clusters. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and its controlled variables. Process decomposition is then realised after the screening of input and output variables. When the system decomposition is finished, the online subsystem modelling can be carried out by recursively block-wise renewing the samples. The proposed algorithm was applied in the Tennessee Eastman process and the validity was verified.
Stochastic Inversion of 2D Magnetotelluric Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong
2010-07-01
The algorithm is developed to invert 2D magnetotelluric (MT) data based on sharp boundary parametrization using a Bayesian framework. Within the algorithm, we consider the locations and the resistivity of regions formed by the interfaces are as unknowns. We use a parallel, adaptive finite-element algorithm to forward simulate frequency-domain MT responses of 2D conductivity structure. Those unknown parameters are spatially correlated and are described by a geostatistical model. The joint posterior probability distribution function is explored by Markov Chain Monte Carlo (MCMC) sampling methods. The developed stochastic model is effective for estimating the interface locations and resistivity. Most importantly, itmore » provides details uncertainty information on each unknown parameter. Hardware requirements: PC, Supercomputer, Multi-platform, Workstation; Software requirements C and Fortan; Operation Systems/version is Linux/Unix or Windows« less
Novel search algorithms for a mid-infrared spectral library of cotton contaminants.
Loudermilk, J Brian; Himmelsbach, David S; Barton, Franklin E; de Haseth, James A
2008-06-01
During harvest, a variety of plant based contaminants are collected along with cotton lint. The USDA previously created a mid-infrared, attenuated total reflection (ATR), Fourier transform infrared (FT-IR) spectral library of cotton contaminants for contaminant identification as the contaminants have negative impacts on yarn quality. This library has shown impressive identification rates for extremely similar cellulose based contaminants in cases where the library was representative of the samples searched. When spectra of contaminant samples from crops grown in different geographic locations, seasons, and conditions and measured with a different spectrometer and accessories were searched, identification rates for standard search algorithms decreased significantly. Six standard algorithms were examined: dot product, correlation, sum of absolute values of differences, sum of the square root of the absolute values of differences, sum of absolute values of differences of derivatives, and sum of squared differences of derivatives. Four categories of contaminants derived from cotton plants were considered: leaf, stem, seed coat, and hull. Experiments revealed that the performance of the standard search algorithms depended upon the category of sample being searched and that different algorithms provided complementary information about sample identity. These results indicated that choosing a single standard algorithm to search the library was not possible. Three voting scheme algorithms based on result frequency, result rank, category frequency, or a combination of these factors for the results returned by the standard algorithms were developed and tested for their capability to overcome the unpredictability of the standard algorithms' performances. The group voting scheme search was based on the number of spectra from each category of samples represented in the library returned in the top ten results of the standard algorithms. This group algorithm was able to identify correctly as many test spectra as the best standard algorithm without relying on human choice to select a standard algorithm to perform the searches.
Abramyan, Tigran M; Snyder, James A; Thyparambil, Aby A; Stuart, Steven J; Latour, Robert A
2016-08-05
Clustering methods have been widely used to group together similar conformational states from molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed-state bioactivity. This study presents cluster analysis methods that are specifically designed for systems where both molecular orientation and conformation are important, and the methods are demonstrated using test cases of adsorbed proteins for validation. Additionally, because cluster analysis can be a very subjective process, an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm to be applied to analyze a given dataset is presented. The method is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
Abstract To overcome the problems of significant difference among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, net signal analysis method(NAS) was firstly used to obtain the net analyte signal of the calibration samples and unknown samples, then the Euclidean distance between net analyte signal of the sample and net analyte signal of calibration samples was calculated and utilized as similarity index. According to the defined similarity index, the local calibration sets were individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration sets for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to global PLS regression method and conventional local regression algorithm based on spectral Euclidean distance.
NASA Astrophysics Data System (ADS)
Dimov, I.; Georgieva, R.; Todorov, V.; Ostromsky, Tz.
2017-10-01
Reliability of large-scale mathematical models is an important issue when such models are used to support decision makers. Sensitivity analysis of model outputs to variation or natural uncertainties of model inputs is crucial for improving the reliability of mathematical models. A comprehensive experimental study of Monte Carlo algorithms based on Sobol sequences for multidimensional numerical integration has been done. A comparison with Latin hypercube sampling and a particular quasi-Monte Carlo lattice rule based on generalized Fibonacci numbers has been presented. The algorithms have been successfully applied to compute global Sobol sensitivity measures corresponding to the influence of several input parameters (six chemical reactions rates and four different groups of pollutants) on the concentrations of important air pollutants. The concentration values have been generated by the Unified Danish Eulerian Model. The sensitivity study has been done for the areas of several European cities with different geographical locations. The numerical tests show that the stochastic algorithms under consideration are efficient for multidimensional integration and especially for computing small by value sensitivity indices. It is a crucial element since even small indices may be important to be estimated in order to achieve a more accurate distribution of inputs influence and a more reliable interpretation of the mathematical model results.
NASA Astrophysics Data System (ADS)
Nasution, A. B.; Efendi, S.; Suwilo, S.
2018-04-01
The amount of data inserted in the form of audio samples that use 8 bits with LSB algorithm, affect the value of PSNR which resulted in changes in image quality of the insertion (fidelity). So in this research will be inserted audio samples using 5 bits with MLSB algorithm to reduce the number of data insertion where previously the audio sample will be compressed with Arithmetic Coding algorithm to reduce file size. In this research will also be encryption using Triple DES algorithm to better secure audio samples. The result of this research is the value of PSNR more than 50dB so it can be concluded that the image quality is still good because the value of PSNR has exceeded 40dB.
NASA Astrophysics Data System (ADS)
Zurek, Sebastian; Guzik, Przemyslaw; Pawlak, Sebastian; Kosmider, Marcin; Piskorski, Jaroslaw
2012-12-01
We explore the relation between correlation dimension, approximate entropy and sample entropy parameters, which are commonly used in nonlinear systems analysis. Using theoretical considerations we identify the points which are shared by all these complexity algorithms and show explicitly that the above parameters are intimately connected and mutually interdependent. A new geometrical interpretation of sample entropy and correlation dimension is provided and the consequences for the interpretation of sample entropy, its relative consistency and some of the algorithms for parameter selection for this quantity are discussed. To get an exact algorithmic relation between the three parameters we construct a very fast algorithm for simultaneous calculations of the above, which uses the full time series as the source of templates, rather than the usual 10%. This algorithm can be used in medical applications of complexity theory, as it can calculate all three parameters for a realistic recording of 104 points within minutes with the use of an average notebook computer.
Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei
2018-02-01
Diagnosis of Parkinson's disease (PD) based on speech data has been proved to be an effective way in recent years. However, current researches just care about the feature extraction and classifier design, and do not consider the instance selection. Former research by authors showed that the instance selection can lead to improvement on classification accuracy. However, no attention is paid on the relationship between speech sample and feature until now. Therefore, a new diagnosis algorithm of PD is proposed in this paper by simultaneously selecting speech sample and feature based on relevant feature weighting algorithm and multiple kernel method, so as to find their synergy effects, thereby improving classification accuracy. Experimental results showed that this proposed algorithm obtained apparent improvement on classification accuracy. It can obtain mean classification accuracy of 82.5%, which was 30.5% higher than the relevant algorithm. Besides, the proposed algorithm detected the synergy effects of speech sample and feature, which is valuable for speech marker extraction.
Wang, San-Yuan; Kuo, Ching-Hua; Tseng, Yufeng J
2015-03-03
Able to detect known and unknown metabolites, untargeted metabolomics has shown great potential in identifying novel biomarkers. However, elucidating all possible liquid chromatography/time-of-flight mass spectrometry (LC/TOF-MS) ion signals in a complex biological sample remains challenging since many ions are not the products of metabolites. Methods of reducing ions not related to metabolites or simply directly detecting metabolite related (pure) ions are important. In this work, we describe PITracer, a novel algorithm that accurately detects the pure ions of a LC/TOF-MS profile to extract pure ion chromatograms and detect chromatographic peaks. PITracer estimates the relative mass difference tolerance of ions and calibrates the mass over charge (m/z) values for peak detection algorithms with an additional option to further mass correction with respect to a user-specified metabolite. PITracer was evaluated using two data sets containing 373 human metabolite standards, including 5 saturated standards considered to be split peaks resultant from huge m/z fluctuation, and 12 urine samples spiked with 50 forensic drugs of varying concentrations. Analysis of these data sets show that PITracer correctly outperformed existing state-of-art algorithm and extracted the pure ion chromatograms of the 5 saturated standards without generating split peaks and detected the forensic drugs with high recall, precision, and F-score and small mass error.
Wang, Tongtong; Xiao, Zhiqiang; Liu, Zhigang
2017-01-01
Leaf area index (LAI) is an important biophysical parameter and the retrieval of LAI from remote sensing data is the only feasible method for generating LAI products at regional and global scales. However, most LAI retrieval methods use satellite observations at a specific time to retrieve LAI. Because of the impacts of clouds and aerosols, the LAI products generated by these methods are spatially incomplete and temporally discontinuous, and thus they cannot meet the needs of practical applications. To generate high-quality LAI products, four machine learning algorithms, including back-propagation neutral network (BPNN), radial basis function networks (RBFNs), general regression neutral networks (GRNNs), and multi-output support vector regression (MSVR) are proposed to retrieve LAI from time-series Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance data in this study and performance of these machine learning algorithms is evaluated. The results demonstrated that GRNNs, RBFNs, and MSVR exhibited low sensitivity to training sample size, whereas BPNN had high sensitivity. The four algorithms performed slightly better with red, near infrared (NIR), and short wave infrared (SWIR) bands than red and NIR bands, and the results were significantly better than those obtained using single band reflectance data (red or NIR). Regardless of band composition, GRNNs performed better than the other three methods. Among the four algorithms, BPNN required the least training time, whereas MSVR needed the most for any sample size. PMID:28045443
Wang, Tongtong; Xiao, Zhiqiang; Liu, Zhigang
2017-01-01
Leaf area index (LAI) is an important biophysical parameter and the retrieval of LAI from remote sensing data is the only feasible method for generating LAI products at regional and global scales. However, most LAI retrieval methods use satellite observations at a specific time to retrieve LAI. Because of the impacts of clouds and aerosols, the LAI products generated by these methods are spatially incomplete and temporally discontinuous, and thus they cannot meet the needs of practical applications. To generate high-quality LAI products, four machine learning algorithms, including back-propagation neutral network (BPNN), radial basis function networks (RBFNs), general regression neutral networks (GRNNs), and multi-output support vector regression (MSVR) are proposed to retrieve LAI from time-series Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance data in this study and performance of these machine learning algorithms is evaluated. The results demonstrated that GRNNs, RBFNs, and MSVR exhibited low sensitivity to training sample size, whereas BPNN had high sensitivity. The four algorithms performed slightly better with red, near infrared (NIR), and short wave infrared (SWIR) bands than red and NIR bands, and the results were significantly better than those obtained using single band reflectance data (red or NIR). Regardless of band composition, GRNNs performed better than the other three methods. Among the four algorithms, BPNN required the least training time, whereas MSVR needed the most for any sample size.
Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.
Maximova, Tatiana; Plaku, Erion; Shehu, Amarda
2016-07-07
Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to address the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new venues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
Air-to-Air Missile Vector Scoring
2012-03-22
SIR sampling-importance resampling . . . . . . . . . . . . . . 53 EPF extended particle filter . . . . . . . . . . . . . . . . . . . . 54 UPF unscented...particle filter ( EPF ) or a unscented particle fil- ter (UPF) [20]. The basic concept is to apply a bank of N EKF or UKF filters to move particles from...Merwe, Doucet, Freitas and Wan provide a comprehensive discussion on the EPF and UPF, including algorithms for implementation [20]. 2Result based on
Measurement of glucose concentration by image processing of thin film slides
NASA Astrophysics Data System (ADS)
Piramanayagam, Sankaranaryanan; Saber, Eli; Heavner, David
2012-02-01
Measurement of glucose concentration is important for diagnosis and treatment of diabetes mellitus and other medical conditions. This paper describes a novel image-processing based approach for measuring glucose concentration. A fluid drop (patient sample) is placed on a thin film slide. Glucose, present in the sample, reacts with reagents on the slide to produce a color dye. The color intensity of the dye formed varies with glucose at different concentration levels. Current methods use spectrophotometry to determine the glucose level of the sample. Our proposed algorithm uses an image of the slide, captured at a specific wavelength, to automatically determine glucose concentration. The algorithm consists of two phases: training and testing. Training datasets consist of images at different concentration levels. The dye-occupied image region is first segmented using a Hough based technique and then an intensity based feature is calculated from the segmented region. Subsequently, a mathematical model that describes a relationship between the generated feature values and the given concentrations is obtained. During testing, the dye region of a test slide image is segmented followed by feature extraction. These two initial steps are similar to those done in training. However, in the final step, the algorithm uses the model (feature vs. concentration) obtained from the training and feature generated from test image to predict the unknown concentration. The performance of the image-based analysis was compared with that of a standard glucose analyzer.
Remote sensing estimation of colored dissolved organic matter (CDOM) in optically shallow waters
NASA Astrophysics Data System (ADS)
Li, Jiwei; Yu, Qian; Tian, Yong Q.; Becker, Brian L.
2017-06-01
It is not well understood how bottom reflectance of optically shallow waters affects the algorithm performance of colored dissolved organic matters (CDOM) retrieval. This study proposes a new algorithm that considers bottom reflectance in estimating CDOM absorption from optically shallow inland or coastal waters. The field sampling was conducted during four research cruises within the Saginaw River, Kawkawlin River and Saginaw Bay of Lake Huron. A stratified field sampling campaign collected water samples, determined the depth at each sampling location and measured optical properties. The sampled CDOM absorption at 440 nm broadly ranged from 0.12 to 8.46 m-1. Field sample analysis revealed that bottom reflectance does significantly change water apparent optical properties. We developed a CDOM retrieval algorithm (Shallow water Bio-Optical Properties algorithm, SBOP) that effectively reduces uncertainty by considering bottom reflectance in shallow waters. By incorporating the bottom contribution in upwelling radiances, the SBOP algorithm was able to explain 74% of the variance of CDOM values (RMSE = 0.22 and R2 = 0.74). The bottom effect index (BEI) was introduced to efficiently separate optically shallow and optically deep waters. Based on the BEI, an adaptive approach was proposed that references the amount of bottom effect in order to identify the most suitable algorithm (optically shallow water algorithm [SBOP] or optically deep water algorithm [QAA-CDOM]) to improve CDOM estimation (RMSE = 0.22 and R2 = 0.81). Our results potentially help to advance the capability of remote sensing in monitoring carbon pools at the land-water interface.
NASA Astrophysics Data System (ADS)
Stramska, Malgorzata; Yngve Børsheim, Knut; Białogrodzka, Jagoda; Cieszyńska, Agata; Ficek, Dariusz; Wereszka, Marzena
2016-04-01
There is still a limited knowledge about suspended and dissolved matter fluxes transported from coastal regions into the open sea regions in the Arctic. The land/sea interface is environmentally important and sensitive to climate change. Important biogeochemical material entering the oceans (including carbon) passes through this interface, but too little is known about the efficiency of this transport. Our goal in the NORDFLUX program is to improve quantitative understanding of the environmental feedbacks involved in these processes through an interdisciplinary study with innovative in situ observations. Completed work includes two in situ experiments in the Norwegian fiord (Porsangerfjorden) in the summers of 2014 and 2015. Experiments used research boat for collection of water samples and in situ bio-optical data, an autonomous glider, mooring with T S sensors, and a high frequency radar system. We have used these data to derive spatial maps of water temperature, salinity, surface currents, chlorophyll fluorescence, dissolved organic matter (DOM) fluorescence, and inherent optical properties (IOPs) of the water. The interpretation of these data in terms of suspended matter concentration and composition is possible by in situ 'calibrations' using water samples from discrete hydrographic stations. Total suspended matter (TSM), particulate carbon (POC and PIC), and dissolved organic carbon (DOC) concentrations together with measured water currents will allow us to estimate reservoirs and fluxes. Concentrations and fluxes will be related to physical conditions and meteorological data. An important aspect of this project is the work on regional ocean color algorithms. Global ocean color (OC) algorithms currently used by NASA do not perform sufficiently well in coastal Case 2 waters. Our data sets will allow us to derive such local algorithms. We will then use these algorithms for interpretation of OC data in terms of TSM concentrations and composition and DOC. After deriving these algorithms, we will analyze historical satellite imagery to assess multiyear trends in concentrations of various water components. This work is still in progress, specific and more detailed results are presented as posters during this meeting. This work was funded by the Norway Grants (NCBR contract No. 201985, project NORDFLUX). Partial support for MS comes from the Institute of Oceanology (IO PAN).
NASA Astrophysics Data System (ADS)
Salman, S. S.; Abbas, W. A.
2018-05-01
The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.
High density DNA microarrays: algorithms and biomedical applications.
Liu, Wei-Min
2004-08-01
DNA microarrays are devices capable of detecting the identity and abundance of numerous DNA or RNA segments in samples. They are used for analyzing gene expressions, identifying genetic markers and detecting mutations on a genomic scale. The fundamental chemical mechanism of DNA microarrays is the hybridization between probes and targets due to the hydrogen bonds of nucleotide base pairing. Since the cross hybridization is inevitable, and probes or targets may form undesirable secondary or tertiary structures, the microarray data contain noise and depend on experimental conditions. It is crucial to apply proper statistical algorithms to obtain useful signals from noisy data. After we obtained the signals of a large amount of probes, we need to derive the biomedical information such as the existence of a transcript in a cell, the difference of expression levels of a gene in multiple samples, and the type of a genetic marker. Furthermore, after the expression levels of thousands of genes or the genotypes of thousands of single nucleotide polymorphisms are determined, it is usually important to find a small number of genes or markers that are related to a disease, individual reactions to drugs, or other phenotypes. All these applications need careful data analyses and reliable algorithms.
NASA Technical Reports Server (NTRS)
Morris, Richard; Anderson, R.; Clegg, S. M.; Bell, J. F., III
2010-01-01
Laser-induced breakdown spectroscopy (LIBS) uses pulses of laser light to ablate a material from the surface of a sample and produce an expanding plasma. The optical emission from the plasma produces a spectrum which can be used to classify target materials and estimate their composition. The ChemCam instrument on the Mars Science Laboratory (MSL) mission will use LIBS to rapidly analyze targets remotely, allowing more resource- and time-intensive in-situ analyses to be reserved for targets of particular interest. ChemCam will also be used to analyze samples that are not reachable by the rover's in-situ instruments. Due to these tactical and scientific roles, it is important that ChemCam-derived sample compositions are as accurate as possible. We have compared the results of partial least squares (PLS), multilayer perceptron (MLP) artificial neural networks (ANNs), and cascade correlation (CC) ANNs to determine which technique yields better estimates of quantitative element abundances in rock and mineral samples. The number of hidden nodes in the MLP ANNs was optimized using a genetic algorithm. The influence of two data preprocessing techniques were also investigated: genetic algorithm feature selection and averaging the spectra for each training sample prior to training the PLS and ANN algorithms. We used a ChemCam-like laboratory stand-off LIBS system to collect spectra of 30 pressed powder geostandards and a diverse suite of 196 geologic slab samples of known bulk composition. We tested the performance of PLS and ANNs on a subset of these samples, choosing to focus on silicate rocks and minerals with a loss on ignition of less than 2 percent. This resulted in a set of 22 pressed powder geostandards and 80 geologic samples. Four of the geostandards were used as a validation set and 18 were used as the training set for the algorithms. We found that PLS typically resulted in the lowest average absolute error in its predictions, but that the optimized MLP ANN and the CC ANN often gave results comparable to PLS. Averaging the spectra for each training sample and/or using feature selection to choose a small subset of wavelengths to use for predictions gave mixed results, with degraded performance in some cases and similar or slightly improved performance in other cases. However, training time was significantly reduced for both PLS and ANN methods by implementing feature selection, making this a potentially appealing method for initial, rapid-turn-around analyses necessary for Chemcam's tactical role on MSL. Choice of training samples has a strong influence on the accuracy of predictions. We are currently investigating the use of clustering algorithms (e.g. k-means, neural gas, etc.) to identify training sets that are spectrally similar to the unknown samples that are being predicted, and therefore result in improved predictions
Kirwan, Jennifer A; Weber, Ralf J M; Broadhurst, David I; Viant, Mark R
2014-01-01
Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises of twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra and the quality of this correction assessed independently using the repeatedly-measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. PMID:25977770
Detection of Soil Nitrogen Using Near Infrared Sensors Based on Soil Pretreatment and Algorithms
Nie, Pengcheng; Dong, Tao; He, Yong; Qu, Fangfang
2017-01-01
Soil nitrogen content is one of the important growth nutrient parameters of crops. It is a prerequisite for scientific fertilization to accurately grasp soil nutrient information in precision agriculture. The information about nutrients such as nitrogen in the soil can be obtained quickly by using a near-infrared sensor. The data can be analyzed in the detection process, which is nondestructive and non-polluting. In order to investigate the effect of soil pretreatment on nitrogen content by near infrared sensor, 16 nitrogen concentrations were mixed with soil and the soil samples were divided into three groups with different pretreatment. The first group of soil samples with strict pretreatment were dried, ground, sieved and pressed. The second group of soil samples were dried and ground. The third group of soil samples were simply dried. Three linear different modeling methods are used to analyze the spectrum, including partial least squares (PLS), uninformative variable elimination (UVE), competitive adaptive reweighted algorithm (CARS). The model of nonlinear partial least squares which supports vector machine (LS-SVM) is also used to analyze the soil reflectance spectrum. The results show that the soil samples with strict pretreatment have the best accuracy in predicting nitrogen content by near-infrared sensor, and the pretreatment method is suitable for practical application. PMID:28492480
Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes
Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin
2013-01-01
One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of the state-of-the-art learning machines including the Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method namely the Kernel Matrix Gene Selection (KMGS) is proposed. The rationale of the method, which roots in the fundamental ideas of the SVM algorithm, is described. The notion of ''the separability of a sample'' which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS) which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110
Distribution-Preserving Stratified Sampling for Learning Problems.
Cervellera, Cristiano; Maccio, Danilo
2017-06-09
The need for extracting a small sample from a large amount of real data, possibly streaming, arises routinely in learning problems, e.g., for storage, to cope with computational limitations, obtain good training/test/validation sets, and select minibatches for stochastic gradient neural network training. Unless we have reasons to select the samples in an active way dictated by the specific task and/or model at hand, it is important that the distribution of the selected points is as similar as possible to the original data. This is obvious for unsupervised learning problems, where the goal is to gain insights on the distribution of the data, but it is also relevant for supervised problems, where the theory explains how the training set distribution influences the generalization error. In this paper, we analyze the technique of stratified sampling from the point of view of distances between probabilities. This allows us to introduce an algorithm, based on recursive binary partition of the input space, aimed at obtaining samples that are distributed as much as possible as the original data. A theoretical analysis is proposed, proving the (greedy) optimality of the procedure together with explicit error bounds. An adaptive version of the algorithm is also introduced to cope with streaming data. Simulation tests on various data sets and different learning tasks are also provided.
NASA Technical Reports Server (NTRS)
Chang, H.
1976-01-01
A computer program using Lemke, Salkin and Spielberg's Set Covering Algorithm (SCA) to optimize a traffic model problem in the Scheduling Algorithm for Mission Planning and Logistics Evaluation (SAMPLE) was documented. SCA forms a submodule of SAMPLE and provides for input and output, subroutines, and an interactive feature for performing the optimization and arranging the results in a readily understandable form for output.
Robust algorithm for aligning two-dimensional chromatograms.
Gros, Jonas; Nabi, Deedar; Dimitriou-Christidis, Petros; Rutler, Rebecca; Arey, J Samuel
2012-11-06
Comprehensive two-dimensional gas chromatography (GC × GC) chromatograms typically exhibit run-to-run retention time variability. Chromatogram alignment is often a desirable step prior to further analysis of the data, for example, in studies of environmental forensics or weathering of complex mixtures. We present a new algorithm for aligning whole GC × GC chromatograms. This technique is based on alignment points that have locations indicated by the user both in a target chromatogram and in a reference chromatogram. We applied the algorithm to two sets of samples. First, we aligned the chromatograms of twelve compositionally distinct oil spill samples, all analyzed using the same instrument parameters. Second, we applied the algorithm to two compositionally distinct wastewater extracts analyzed using two different instrument temperature programs, thus involving larger retention time shifts than the first sample set. For both sample sets, the new algorithm performed favorably compared to two other available alignment algorithms: that of Pierce, K. M.; Wood, Lianna F.; Wright, B. W.; Synovec, R. E. Anal. Chem.2005, 77, 7735-7743 and 2-D COW from Zhang, D.; Huang, X.; Regnier, F. E.; Zhang, M. Anal. Chem.2008, 80, 2664-2671. The new algorithm achieves the best matches of retention times for test analytes, avoids some artifacts which result from the other alignment algorithms, and incurs the least modification of quantitative signal information.
Lee, Hun Joo; Han, Eunyoung; Lee, Jaesin; Chung, Heesun; Min, Sung-Gi
2016-11-01
The aim of this study is to improve resolution of impurity peaks using a newly devised normalization algorithm for multi-internal standards (ISs) and to describe a visual peak selection system (VPSS) for efficient support of impurity profiling. Drug trafficking routes, location of manufacture, or synthetic route can be identified from impurities in seized drugs. In the analysis of impurities, different chromatogram profiles are obtained from gas chromatography and used to examine similarities between drug samples. The data processing method using relative retention time (RRT) calculated by a single internal standard is not preferred when many internal standards are used and many chromatographic peaks present because of the risk of overlapping between peaks and difficulty in classifying impurities. In this study, impurities in methamphetamine (MA) were extracted by liquid-liquid extraction (LLE) method using ethylacetate containing 4 internal standards and analyzed by gas chromatography-flame ionization detection (GC-FID). The newly developed VPSS consists of an input module, a conversion module, and a detection module. The input module imports chromatograms collected from GC and performs preprocessing, which is converted with a normalization algorithm in the conversion module, and finally the detection module detects the impurities in MA samples using a visualized zoning user interface. The normalization algorithm in the conversion module was used to convert the raw data from GC-FID. The VPSS with the built-in normalization algorithm can effectively detect different impurities in samples even in complex matrices and has high resolution keeping the time sequence of chromatographic peaks the same as that of the RRT method. The system can widen a full range of chromatograms so that the peaks of impurities were better aligned for easy separation and classification. The resolution, accuracy, and speed of impurity profiling showed remarkable improvement. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Rauscher, Sarah; Neale, Chris; Pomès, Régis
2009-10-13
Generalized-ensemble algorithms in temperature space have become popular tools to enhance conformational sampling in biomolecular simulations. A random walk in temperature leads to a corresponding random walk in potential energy, which can be used to cross over energetic barriers and overcome the problem of quasi-nonergodicity. In this paper, we introduce two novel methods: simulated tempering distributed replica sampling (STDR) and virtual replica exchange (VREX). These methods are designed to address the practical issues inherent in the replica exchange (RE), simulated tempering (ST), and serial replica exchange (SREM) algorithms. RE requires a large, dedicated, and homogeneous cluster of CPUs to function efficiently when applied to complex systems. ST and SREM both have the drawback of requiring extensive initial simulations, possibly adaptive, for the calculation of weight factors or potential energy distribution functions. STDR and VREX alleviate the need for lengthy initial simulations, and for synchronization and extensive communication between replicas. Both methods are therefore suitable for distributed or heterogeneous computing platforms. We perform an objective comparison of all five algorithms in terms of both implementation issues and sampling efficiency. We use disordered peptides in explicit water as test systems, for a total simulation time of over 42 μs. Efficiency is defined in terms of both structural convergence and temperature diffusion, and we show that these definitions of efficiency are in fact correlated. Importantly, we find that ST-based methods exhibit faster temperature diffusion and correspondingly faster convergence of structural properties compared to RE-based methods. Within the RE-based methods, VREX is superior to both SREM and RE. On the basis of our observations, we conclude that ST is ideal for simple systems, while STDR is well-suited for complex systems.
Importance Sampling of Word Patterns in DNA and Protein Sequences
Chan, Hock Peng; Chen, Louis H.Y.
2010-01-01
Abstract Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position-specific weight matrices (PSWMs), and co-occurrences of pairs of motifs. PMID:21128856
An implementation of differential evolution algorithm for inversion of geoelectrical data
NASA Astrophysics Data System (ADS)
Balkaya, Çağlayan
2013-11-01
Differential evolution (DE), a population-based evolutionary algorithm (EA) has been implemented to invert self-potential (SP) and vertical electrical sounding (VES) data sets. The algorithm uses three operators including mutation, crossover and selection similar to genetic algorithm (GA). Mutation is the most important operator for the success of DE. Three commonly used mutation strategies including DE/best/1 (strategy 1), DE/rand/1 (strategy 2) and DE/rand-to-best/1 (strategy 3) were applied together with a binomial type crossover. Evolution cycle of DE was realized without boundary constraints. For the test studies performed with SP data, in addition to both noise-free and noisy synthetic data sets two field data sets observed over the sulfide ore body in the Malachite mine (Colorado) and over the ore bodies in the Neem-Ka Thana cooper belt (India) were considered. VES test studies were carried out using synthetically produced resistivity data representing a three-layered earth model and a field data set example from Gökçeada (Turkey), which displays a seawater infiltration problem. Mutation strategies mentioned above were also extensively tested on both synthetic and field data sets in consideration. Of these, strategy 1 was found to be the most effective strategy for the parameter estimation by providing less computational cost together with a good accuracy. The solutions obtained by DE for the synthetic cases of SP were quite consistent with particle swarm optimization (PSO) which is a more widely used population-based optimization algorithm than DE in geophysics. Estimated parameters of SP and VES data were also compared with those obtained from Metropolis-Hastings (M-H) sampling algorithm based on simulated annealing (SA) without cooling to clarify uncertainties in the solutions. Comparison to the M-H algorithm shows that DE performs a fast approximate posterior sampling for the case of low-dimensional inverse geophysical problems.
NASA Astrophysics Data System (ADS)
Chang, Bingguo; Chen, Xiaofei
2018-05-01
Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor gives the liver indicators and suggests the patient's condition according to the description of ultrasound report. With the rapid increase in the amount of data of ultrasound report, the workload of professional physician to manually distinguish ultrasound results significantly increases. In this paper, we use the spectral clustering method to cluster analysis of the description of the ultrasound report, and automatically generate the ultrasonic diagnostic diagnosis by machine learning. 110 groups ultrasound examination report of chronic liver disease were selected as test samples in this experiment, and the results were validated by spectral clustering and compared with k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of k-means clustering algorithm, which provides a powerful ultrasound-assisted diagnosis for patients with chronic liver disease.
A partially reflecting random walk on spheres algorithm for electrical impedance tomography
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maire, Sylvain, E-mail: maire@univ-tln.fr; Simon, Martin, E-mail: simon@math.uni-mainz.de
2015-12-15
In this work, we develop a probabilistic estimator for the voltage-to-current map arising in electrical impedance tomography. This novel so-called partially reflecting random walk on spheres estimator enables Monte Carlo methods to compute the voltage-to-current map in an embarrassingly parallel manner, which is an important issue with regard to the corresponding inverse problem. Our method uses the well-known random walk on spheres algorithm inside subdomains where the diffusion coefficient is constant and employs replacement techniques motivated by finite difference discretization to deal with both mixed boundary conditions and interface transmission conditions. We analyze the global bias and the variance ofmore » the new estimator both theoretically and experimentally. Subsequently, the variance of the new estimator is considerably reduced via a novel control variate conditional sampling technique which yields a highly efficient hybrid forward solver coupling probabilistic and deterministic algorithms.« less
Component Pin Recognition Using Algorithms Based on Machine Learning
NASA Astrophysics Data System (ADS)
Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang
2018-04-01
The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.
Evaluating nodes importance in complex network based on PageRank algorithm
NASA Astrophysics Data System (ADS)
Li, Kai; He, Yongfeng
2018-04-01
To evaluate the important nodes in the complex network, and aim at the problems existing in the traditional PageRank algorithm, we propose a modified PageRank algorithm. The algorithm has convergence for the evaluation of the importance of the suspended nodes and the nodes with a directed loop network. The simulation example shows the effectiveness of the modified algorithm for the evaluation of the complexity of the complex network nodes.
A real time QRS detection using delay-coordinate mapping for the microcontroller implementation.
Lee, Jeong-Whan; Kim, Kyeong-Seop; Lee, Bongsoo; Lee, Byungchae; Lee, Myoung-Ho
2002-01-01
In this article, we propose a new algorithm using the characteristics of reconstructed phase portraits by delay-coordinate mapping utilizing lag rotundity for a real-time detection of QRS complexes in ECG signals. In reconstructing phase portrait the mapping parameters, time delay, and mapping dimension play important roles in shaping of portraits drawn in a new dimensional space. Experimentally, the optimal mapping time delay for detection of QRS complexes turned out to be 20 ms. To explore the meaning of this time delay and the proper mapping dimension, we applied a fill factor, mutual information, and autocorrelation function algorithm that were generally used to analyze the chaotic characteristics of sampled signals. From these results, we could find the fact that the performance of our proposed algorithms relied mainly on the geometrical property such as an area of the reconstructed phase portrait. For the real application, we applied our algorithm for designing a small cardiac event recorder. This system was to record patients' ECG and R-R intervals for 1 h to investigate HRV characteristics of the patients who had vasovagal syncope symptom and for the evaluation, we implemented our algorithm in C language and applied to MIT/BIH arrhythmia database of 48 subjects. Our proposed algorithm achieved a 99.58% detection rate of QRS complexes.
On the improvement of blood sample collection at clinical laboratories
2014-01-01
Background Blood samples are usually collected daily from different collection points, such hospitals and health centers, and transported to a core laboratory for testing. This paper presents a project to improve the collection routes of two of the largest clinical laboratories in Spain. These routes must be designed in a cost-efficient manner while satisfying two important constraints: (i) two-hour time windows between collection and delivery, and (ii) vehicle capacity. Methods A heuristic method based on a genetic algorithm has been designed to solve the problem of blood sample collection. The user enters the following information for each collection point: postal address, average collecting time, and average demand (in thermal containers). After implementing the algorithm using C programming, this is run and, in few seconds, it obtains optimal (or near-optimal) collection routes that specify the collection sequence for each vehicle. Different scenarios using various types of vehicles have been considered. Unless new collection points are added or problem parameters are changed substantially, routes need to be designed only once. Results The two laboratories in this study previously planned routes manually for 43 and 74 collection points, respectively. These routes were covered by an external carrier company. With the implementation of this algorithm, the number of routes could be reduced from ten to seven in one laboratory and from twelve to nine in the other, which represents significant annual savings in transportation costs. Conclusions The algorithm presented can be easily implemented in other laboratories that face this type of problem, and it is particularly interesting and useful as the number of collection points increases. The method designs blood collection routes with reduced costs that meet the time and capacity constraints of the problem. PMID:24406140
Computational model for vitamin D deficiency using hair mineral analysis.
Hassanien, Aboul Ella; Tharwat, Alaa; Own, Hala S
2017-10-01
Vitamin D deficiency is prevalent in the Arabian Gulf region, especially among women. Recent studies show that the vitamin D deficiency is associated with a mineral status of a patient. Therefore, it is important to assess the mineral status of the patient to reveal the hidden mineral imbalance associated with vitamin D deficiency. A well-known test such as the red blood cells is fairly expensive, invasive, and less informative. On the other hand, a hair mineral analysis can be considered an accurate, excellent, highly informative tool to measure mineral imbalance associated with vitamin D deficiency. In this study, 118 apparently healthy Kuwaiti women were assessed for their mineral levels and vitamin D status by a hair mineral analysis (HMA). This information was used to build a computerized model that would predict vitamin D deficiency based on its association with the levels and ratios of minerals. The first phase of the proposed model introduces a novel hybrid optimization algorithm, which can be considered as an improvement of Bat Algorithm (BA) to select the most discriminative features. The improvement includes using the mutation process of Genetic Algorithm (GA) to update the positions of bats with the aim of speeding up convergence; thus, making the algorithm more feasible for wider ranges of real-world applications. Due to the imbalanced class distribution in our dataset, in the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling, and Synthetic Minority Oversampling Technique are used to solve the problem of imbalanced datasets. In the third phase, an AdaBoost ensemble classifier is used to predicting the vitamin D deficiency. The results showed that the proposed model achieved good results to detect the deficiency in vitamin D. Copyright © 2017 Elsevier Ltd. All rights reserved.
Jiang, Weiqin; Shen, Yifei; Ding, Yongfeng; Ye, Chuyu; Zheng, Yi; Zhao, Peng; Liu, Lulu; Tong, Zhou; Zhou, Linfu; Sun, Shuo; Zhang, Xingchen; Teng, Lisong; Timko, Michael P; Fan, Longjiang; Fang, Weijia
2018-01-15
Synchronous multifocal tumors are common in the hepatobiliary and pancreatic system but because of similarities in their histological features, oncologists have difficulty in identifying their precise tissue clonal origin through routine histopathological methods. To address this problem and assist in more precise diagnosis, we developed a computational approach for tissue origin diagnosis based on naive Bayes algorithm (TOD-Bayes) using ubiquitous RNA-Seq data. Massive tissue-specific RNA-Seq data sets were first obtained from The Cancer Genome Atlas (TCGA) and ∼1,000 feature genes were used to train and validate the TOD-Bayes algorithm. The accuracy of the model was >95% based on tenfold cross validation by the data from TCGA. A total of 18 clinical cancer samples (including six negative controls) with definitive tissue origin were subsequently used for external validation and 17 of the 18 samples were classified correctly in our study (94.4%). Furthermore, we included as cases studies seven tumor samples, taken from two individuals who suffered from synchronous multifocal tumors across tissues, where the efforts to make a definitive primary cancer diagnosis by traditional diagnostic methods had failed. Using our TOD-Bayes analysis, the two clinical test cases were successfully diagnosed as pancreatic cancer (PC) and cholangiocarcinoma (CC), respectively, in agreement with their clinical outcomes. Based on our findings, we believe that the TOD-Bayes algorithm is a powerful novel methodology to accurately identify the tissue origin of synchronous multifocal tumors of unknown primary cancers using RNA-Seq data and an important step toward more precision-based medicine in cancer diagnosis and treatment. © 2017 UICC.
Final Report: Sampling-Based Algorithms for Estimating Structure in Big Data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matulef, Kevin Michael
The purpose of this project was to develop sampling-based algorithms to discover hidden struc- ture in massive data sets. Inferring structure in large data sets is an increasingly common task in many critical national security applications. These data sets come from myriad sources, such as network traffic, sensor data, and data generated by large-scale simulations. They are often so large that traditional data mining techniques are time consuming or even infeasible. To address this problem, we focus on a class of algorithms that do not compute an exact answer, but instead use sampling to compute an approximate answer using fewermore » resources. The particular class of algorithms that we focus on are streaming algorithms , so called because they are designed to handle high-throughput streams of data. Streaming algorithms have only a small amount of working storage - much less than the size of the full data stream - so they must necessarily use sampling to approximate the correct answer. We present two results: * A streaming algorithm called HyperHeadTail , that estimates the degree distribution of a graph (i.e., the distribution of the number of connections for each node in a network). The degree distribution is a fundamental graph property, but prior work on estimating the degree distribution in a streaming setting was impractical for many real-world application. We improve upon prior work by developing an algorithm that can handle streams with repeated edges, and graph structures that evolve over time. * An algorithm for the task of maintaining a weighted subsample of items in a stream, when the items must be sampled according to their weight, and the weights are dynamically changing. To our knowledge, this is the first such algorithm designed for dynamically evolving weights. We expect it may be useful as a building block for other streaming algorithms on dynamic data sets.« less
Stable orthogonal local discriminant embedding for linear dimensionality reduction.
Gao, Quanxue; Ma, Jingjie; Zhang, Hailin; Gao, Xinbo; Liu, Yamin
2013-07-01
Manifold learning is widely used in machine learning and pattern recognition. However, manifold learning only considers the similarity of samples belonging to the same class and ignores the within-class variation of data, which will impair the generalization and stableness of the algorithms. For this purpose, we construct an adjacency graph to model the intraclass variation that characterizes the most important properties, such as diversity of patterns, and then incorporate the diversity into the discriminant objective function for linear dimensionality reduction. Finally, we introduce the orthogonal constraint for the basis vectors and propose an orthogonal algorithm called stable orthogonal local discriminate embedding. Experimental results on several standard image databases demonstrate the effectiveness of the proposed dimensionality reduction approach.
Determination of surface stress by Seasat-SASS - A case study with JASIN data
NASA Technical Reports Server (NTRS)
Liu, W. T.; Large, W. G.
1981-01-01
The values of sea surface stress determined with the dissipation method and those determined with a surface-layer model from observations on F.S. Meteor during the Joint Air-Sea Interaction (JASIN) Experiment are compared with the backscatter coefficients measured by the scatterometer SASS on the satellite Seasat. This study demonstrates that SASS can be used to determine surface stress directly as well as wind speed. The quality of the surface observations used in the calibration of the retrieval algorithms, however, is important. This sample of measurements disagrees with the predictions by the existing wind retrieval algorithm under non-neutral conditions and the discrepancies depend on atmospheric stability.
Fundamentals and Recent Developments in Approximate Bayesian Computation
Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka
2017-01-01
Abstract Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.] PMID:28175922
Real time ray tracing based on shader
NASA Astrophysics Data System (ADS)
Gui, JiangHeng; Li, Min
2017-07-01
Ray tracing is a rendering algorithm for generating an image through tracing lights into an image plane, it can simulate complicate optical phenomenon like refraction, depth of field and motion blur. Compared with rasterization, ray tracing can achieve more realistic rendering result, however with greater computational cost, simple scene rendering can consume tons of time. With the GPU's performance improvement and the advent of programmable rendering pipeline, complicated algorithm can also be implemented directly on shader. So, this paper proposes a new method that implement ray tracing directly on fragment shader, mainly include: surface intersection, importance sampling and progressive rendering. With the help of GPU's powerful throughput capability, it can implement real time rendering of simple scene.
Ouyang, Qin; Zhao, Jiewen; Chen, Quansheng
2015-01-01
The non-sugar solids (NSS) content is one of the most important nutrition indicators of Chinese rice wine. This study proposed a rapid method for the measurement of NSS content in Chinese rice wine using near infrared (NIR) spectroscopy. We also systemically studied the efficient spectral variables selection algorithms that have to go through modeling. A new algorithm of synergy interval partial least square with competitive adaptive reweighted sampling (Si-CARS-PLS) was proposed for modeling. The performance of the final model was back-evaluated using root mean square error of calibration (RMSEC) and correlation coefficient (Rc) in calibration set and similarly tested by mean square error of prediction (RMSEP) and correlation coefficient (Rp) in prediction set. The optimum model by Si-CARS-PLS algorithm was achieved when 7 PLS factors and 18 variables were included, and the results were as follows: Rc=0.95 and RMSEC=1.12 in the calibration set, Rp=0.95 and RMSEP=1.22 in the prediction set. In addition, Si-CARS-PLS algorithm showed its superiority when compared with the commonly used algorithms in multivariate calibration. This work demonstrated that NIR spectroscopy technique combined with a suitable multivariate calibration algorithm has a high potential in rapid measurement of NSS content in Chinese rice wine. Copyright © 2015 Elsevier B.V. All rights reserved.
Disruption Warning Database Development and Exploratory Machine Learning Studies on Alcator C-Mod
NASA Astrophysics Data System (ADS)
Montes, Kevin; Rea, Cristina; Granetz, Robert
2017-10-01
A database of about 1800 shots from the 2015 campaign on the Alcator C-Mod tokamak is assembled, including disruptive and non-disruptive discharges. The database consists of 40 relevant plasma parameters with data taken from 160k time slices. In order to investigate the possibility of developing a robust disruption prediction algorithm that is tokamak-independent, we focused machine learning studies on a subset of dimensionless parameters such as βp, n /nG , etc. The Random Forests machine learning algorithm provides insight on the available data set by ranking the relative importance of the input features. Its application on the C-Mod database, however, reveals that virtually no one parameter has more importance than any other, and that its classification algorithm has a low rate of successfully predicted samples, as well as poor false positive and false negative rates. Comparing the analysis of this algorithm on the C-Mod database with its application to a similar database on DIII-D, we conclude that disruption prediction may not be feasible on C-Mod. This conclusion is supported by empirical observations that most C-Mod disruptions are caused by radiative collapse due to molybdenum from the first wall, which happens on just a 1-2ms timescale. Supported by the US Dept. of Energy under DE-FC02-99ER54512 and DE-FC02-04ER54698.
Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin
2016-01-01
Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes.
Aorigele; Zeng, Weiming; Hong, Xiaomin
2016-01-01
Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes. PMID:27579323
Quantum algorithm for association rules mining
NASA Astrophysics Data System (ADS)
Yu, Chao-Hua; Gao, Fei; Wang, Qing-Le; Wen, Qiao-Yan
2016-10-01
Association rules mining (ARM) is one of the most important problems in knowledge discovery and data mining. Given a transaction database that has a large number of transactions and items, the task of ARM is to acquire consumption habits of customers by discovering the relationships between itemsets (sets of items). In this paper, we address ARM in the quantum settings and propose a quantum algorithm for the key part of ARM, finding frequent itemsets from the candidate itemsets and acquiring their supports. Specifically, for the case in which there are Mf(k ) frequent k -itemsets in the Mc(k ) candidate k -itemsets (Mf(k )≤Mc(k ) ), our algorithm can efficiently mine these frequent k -itemsets and estimate their supports by using parallel amplitude estimation and amplitude amplification with complexity O (k/√{Mc(k )Mf(k ) } ɛ ) , where ɛ is the error for estimating the supports. Compared with the classical counterpart, i.e., the classical sampling-based algorithm, whose complexity is O (k/Mc(k ) ɛ2) , our quantum algorithm quadratically improves the dependence on both ɛ and Mc(k ) in the best case when Mf(k )≪Mc(k ) and on ɛ alone in the worst case when Mf(k )≈Mc(k ) .
Distribution majorization of corner points by reinforcement learning for moving object detection
NASA Astrophysics Data System (ADS)
Wu, Hao; Yu, Hao; Zhou, Dongxiang; Cheng, Yongqiang
2018-04-01
Corner points play an important role in moving object detection, especially in the case of free-moving camera. Corner points provide more accurate information than other pixels and reduce the computation which is unnecessary. Previous works only use intensity information to locate the corner points, however, the information that former and the last frames provided also can be used. We utilize the information to focus on more valuable area and ignore the invaluable area. The proposed algorithm is based on reinforcement learning, which regards the detection of corner points as a Markov process. In the Markov model, the video to be detected is regarded as environment, the selections of blocks for one corner point are regarded as actions and the performance of detection is regarded as state. Corner points are assigned to be the blocks which are seperated from original whole image. Experimentally, we select a conventional method which uses marching and Random Sample Consensus algorithm to obtain objects as the main framework and utilize our algorithm to improve the result. The comparison between the conventional method and the same one with our algorithm show that our algorithm reduce 70% of the false detection.
Fast Ss-Ilm a Computationally Efficient Algorithm to Discover Socially Important Locations
NASA Astrophysics Data System (ADS)
Dokuz, A. S.; Celik, M.
2017-11-01
Socially important locations are places which are frequently visited by social media users in their social media lifetime. Discovering socially important locations provide several valuable information about user behaviours on social media networking sites. However, discovering socially important locations are challenging due to data volume and dimensions, spatial and temporal calculations, location sparseness in social media datasets, and inefficiency of current algorithms. In the literature, several studies are conducted to discover important locations, however, the proposed approaches do not work in computationally efficient manner. In this study, we propose Fast SS-ILM algorithm by modifying the algorithm of SS-ILM to mine socially important locations efficiently. Experimental results show that proposed Fast SS-ILM algorithm decreases execution time of socially important locations discovery process up to 20 %.
Small-Noise Analysis and Symmetrization of Implicit Monte Carlo Samplers
Goodman, Jonathan; Lin, Kevin K.; Morzfeld, Matthias
2015-07-06
Implicit samplers are algorithms for producing independent, weighted samples from multivariate probability distributions. These are often applied in Bayesian data assimilation algorithms. We use Laplace asymptotic expansions to analyze two implicit samplers in the small noise regime. Our analysis suggests a symmetrization of the algorithms that leads to improved implicit sampling schemes at a relatively small additional cost. Here, computational experiments confirm the theory and show that symmetrization is effective for small noise sampling problems.
Online clustering algorithms for radar emitter classification.
Liu, Jun; Lee, Jim P Y; Senior; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max
2005-08-01
Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
Classical boson sampling algorithms with superior performance to near-term experiments
NASA Astrophysics Data System (ADS)
Neville, Alex; Sparrow, Chris; Clifford, Raphaël; Johnston, Eric; Birchall, Patrick M.; Montanaro, Ashley; Laing, Anthony
2017-12-01
It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling is a rudimentary quantum algorithm tailored to the platform of linear optics, which has sparked interest as a rapid way to demonstrate such quantum supremacy. Photon statistics are governed by intractable matrix functions, which suggests that sampling from the distribution obtained by injecting photons into a linear optical network could be solved more quickly by a photonic experiment than by a classical computer. The apparently low resource requirements for large boson sampling experiments have raised expectations of a near-term demonstration of quantum supremacy by boson sampling. Here we present classical boson sampling algorithms and theoretical analyses of prospects for scaling boson sampling experiments, showing that near-term quantum supremacy via boson sampling is unlikely. Our classical algorithm, based on Metropolised independence sampling, allowed the boson sampling problem to be solved for 30 photons with standard computing hardware. Compared to current experiments, a demonstration of quantum supremacy over a successful implementation of these classical methods on a supercomputer would require the number of photons and experimental components to increase by orders of magnitude, while tackling exponentially scaling photon loss.
The Even-Rho and Even-Epsilon Algorithms for Accelerating Convergence of a Numerical Sequence
1981-12-01
equal, leading to zero or very small divisors. Computer programs implementing these algorithms are given along with sample output. An appreciable amount...calculation of the array of Shank’s transforms or, -A equivalently, of the related Padd Table. The :other, the even-rho algorithm, is closely related...leading to zero or very small divisors. Computer pro- grams implementing these algorithms are given along with sample output. An appreciable amount or
NASA Astrophysics Data System (ADS)
Zhang, Leihong; Liang, Dong; Li, Bei; Kang, Yi; Pan, Zilan; Zhang, Dawei; Gao, Xiumin; Ma, Xiuhua
2016-07-01
On the basis of analyzing the cosine light field with determined analytic expression and the pseudo-inverse method, the object is illuminated by a presetting light field with a determined discrete Fourier transform measurement matrix, and the object image is reconstructed by the pseudo-inverse method. The analytic expression of the algorithm of computational ghost imaging based on discrete Fourier transform measurement matrix is deduced theoretically, and compared with the algorithm of compressive computational ghost imaging based on random measurement matrix. The reconstruction process and the reconstruction error are analyzed. On this basis, the simulation is done to verify the theoretical analysis. When the sampling measurement number is similar to the number of object pixel, the rank of discrete Fourier transform matrix is the same as the one of the random measurement matrix, the PSNR of the reconstruction image of FGI algorithm and PGI algorithm are similar, the reconstruction error of the traditional CGI algorithm is lower than that of reconstruction image based on FGI algorithm and PGI algorithm. As the decreasing of the number of sampling measurement, the PSNR of reconstruction image based on FGI algorithm decreases slowly, and the PSNR of reconstruction image based on PGI algorithm and CGI algorithm decreases sharply. The reconstruction time of FGI algorithm is lower than that of other algorithms and is not affected by the number of sampling measurement. The FGI algorithm can effectively filter out the random white noise through a low-pass filter and realize the reconstruction denoising which has a higher denoising capability than that of the CGI algorithm. The FGI algorithm can improve the reconstruction accuracy and the reconstruction speed of computational ghost imaging.
An Improved Nested Sampling Algorithm for Model Selection and Assessment
NASA Astrophysics Data System (ADS)
Zeng, X.; Ye, M.; Wu, J.; WANG, D.
2017-12-01
Multimodel strategy is a general approach for treating model structure uncertainty in recent researches. The unknown groundwater system is represented by several plausible conceptual models. Each alternative conceptual model is attached with a weight which represents the possibility of this model. In Bayesian framework, the posterior model weight is computed as the product of model prior weight and marginal likelihood (or termed as model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. Nested sampling estimator (NSE) is a new proposed algorithm for marginal likelihood estimation. The implementation of NSE comprises searching the parameters' space from low likelihood area to high likelihood area gradually, and this evolution is finished iteratively via local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of local sampling procedure. Currently, Metropolis-Hasting (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood function. For improving the performance of NSE, it could be feasible to integrate more efficient and elaborated sampling algorithm - DREAMzs into the local sampling. In addition, in order to overcome the computation burden problem of large quantity of repeating model executions in marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build the surrogates for original groundwater model.
Shu, Tongxin; Xia, Min; Chen, Jiahong; Silva, Clarence de
2017-11-05
Power management is crucial in the monitoring of a remote environment, especially when long-term monitoring is needed. Renewable energy sources such as solar and wind may be harvested to sustain a monitoring system. However, without proper power management, equipment within the monitoring system may become nonfunctional and, as a consequence, the data or events captured during the monitoring process will become inaccurate as well. This paper develops and applies a novel adaptive sampling algorithm for power management in the automated monitoring of the quality of water in an extensive and remote aquatic environment. Based on the data collected on line using sensor nodes, a data-driven adaptive sampling algorithm (DDASA) is developed for improving the power efficiency while ensuring the accuracy of sampled data. The developed algorithm is evaluated using two distinct key parameters, which are dissolved oxygen (DO) and turbidity. It is found that by dynamically changing the sampling frequency, the battery lifetime can be effectively prolonged while maintaining a required level of sampling accuracy. According to the simulation results, compared to a fixed sampling rate, approximately 30.66% of the battery energy can be saved for three months of continuous water quality monitoring. Using the same dataset to compare with a traditional adaptive sampling algorithm (ASA), while achieving around the same Normalized Mean Error (NME), DDASA is superior in saving 5.31% more battery energy.
Shu, Tongxin; Xia, Min; Chen, Jiahong; de Silva, Clarence
2017-01-01
Power management is crucial in the monitoring of a remote environment, especially when long-term monitoring is needed. Renewable energy sources such as solar and wind may be harvested to sustain a monitoring system. However, without proper power management, equipment within the monitoring system may become nonfunctional and, as a consequence, the data or events captured during the monitoring process will become inaccurate as well. This paper develops and applies a novel adaptive sampling algorithm for power management in the automated monitoring of the quality of water in an extensive and remote aquatic environment. Based on the data collected on line using sensor nodes, a data-driven adaptive sampling algorithm (DDASA) is developed for improving the power efficiency while ensuring the accuracy of sampled data. The developed algorithm is evaluated using two distinct key parameters, which are dissolved oxygen (DO) and turbidity. It is found that by dynamically changing the sampling frequency, the battery lifetime can be effectively prolonged while maintaining a required level of sampling accuracy. According to the simulation results, compared to a fixed sampling rate, approximately 30.66% of the battery energy can be saved for three months of continuous water quality monitoring. Using the same dataset to compare with a traditional adaptive sampling algorithm (ASA), while achieving around the same Normalized Mean Error (NME), DDASA is superior in saving 5.31% more battery energy. PMID:29113087
NASA Technical Reports Server (NTRS)
Taylor, Robert P.; Luck, Rogelio
1995-01-01
The view factors which are used in diffuse-gray radiation enclosure calculations are often computed by approximate numerical integrations. These approximately calculated view factors will usually not satisfy the important physical constraints of reciprocity and closure. In this paper several view-factor rectification algorithms are reviewed and a rectification algorithm based on a least-squares numerical filtering scheme is proposed with both weighted and unweighted classes. A Monte-Carlo investigation is undertaken to study the propagation of view-factor and surface-area uncertainties into the heat transfer results of the diffuse-gray enclosure calculations. It is found that the weighted least-squares algorithm is vastly superior to the other rectification schemes for the reduction of the heat-flux sensitivities to view-factor uncertainties. In a sample problem, which has proven to be very sensitive to uncertainties in view factor, the heat transfer calculations with weighted least-squares rectified view factors are very good with an original view-factor matrix computed to only one-digit accuracy. All of the algorithms had roughly equivalent effects on the reduction in sensitivity to area uncertainty in this case study.
Jiang, Yanhua; Xiong, Guangming; Chen, Huiyan; Lee, Dah-Jye
2014-01-01
This paper presents a monocular visual odometry algorithm that incorporates a wheeled vehicle model for ground vehicles. The main innovation of this algorithm is to use the single-track bicycle model to interpret the relationship between the yaw rate and side slip angle, which are the two most important parameters that describe the motion of a wheeled vehicle. Additionally, the pitch angle is also considered since the planar-motion hypothesis often fails due to the dynamic characteristics of wheel suspensions and tires in real-world environments. Linearization is used to calculate a closed-form solution of the motion parameters that works as a hypothesis generator in a RAndom SAmple Consensus (RANSAC) scheme to reduce the complexity in solving equations involving trigonometric. All inliers found are used to refine the winner solution through minimizing the reprojection error. Finally, the algorithm is applied to real-time on-board visual localization applications. Its performance is evaluated by comparing against the state-of-the-art monocular visual odometry methods using both synthetic data and publicly available datasets over several kilometers in dynamic outdoor environments. PMID:25256109
NASA Astrophysics Data System (ADS)
Valdes, Gilmer; Solberg, Timothy D.; Heskel, Marina; Ungar, Lyle; Simone, Charles B., II
2016-08-01
To develop a patient-specific ‘big data’ clinical decision tool to predict pneumonitis in stage I non-small cell lung cancer (NSCLC) patients after stereotactic body radiation therapy (SBRT). 61 features were recorded for 201 consecutive patients with stage I NSCLC treated with SBRT, in whom 8 (4.0%) developed radiation pneumonitis. Pneumonitis thresholds were found for each feature individually using decision stumps. The performance of three different algorithms (Decision Trees, Random Forests, RUSBoost) was evaluated. Learning curves were developed and the training error analyzed and compared to the testing error in order to evaluate the factors needed to obtain a cross-validated error smaller than 0.1. These included the addition of new features, increasing the complexity of the algorithm and enlarging the sample size and number of events. In the univariate analysis, the most important feature selected was the diffusion capacity of the lung for carbon monoxide (DLCO adj%). On multivariate analysis, the three most important features selected were the dose to 15 cc of the heart, dose to 4 cc of the trachea or bronchus, and race. Higher accuracy could be achieved if the RUSBoost algorithm was used with regularization. To predict radiation pneumonitis within an error smaller than 10%, we estimate that a sample size of 800 patients is required. Clinically relevant thresholds that put patients at risk of developing radiation pneumonitis were determined in a cohort of 201 stage I NSCLC patients treated with SBRT. The consistency of these thresholds can provide radiation oncologists with an estimate of their reliability and may inform treatment planning and patient counseling. The accuracy of the classification is limited by the number of patients in the study and not by the features gathered or the complexity of the algorithm.
Efficient Classical Algorithm for Boson Sampling with Partially Distinguishable Photons
NASA Astrophysics Data System (ADS)
Renema, J. J.; Menssen, A.; Clements, W. R.; Triginer, G.; Kolthammer, W. S.; Walmsley, I. A.
2018-06-01
We demonstrate how boson sampling with photons of partial distinguishability can be expressed in terms of interference of fewer photons. We use this observation to propose a classical algorithm to simulate the output of a boson sampler fed with photons of partial distinguishability. We find conditions for which this algorithm is efficient, which gives a lower limit on the required indistinguishability to demonstrate a quantum advantage. Under these conditions, adding more photons only polynomially increases the computational cost to simulate a boson sampling experiment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jakeman, John D.; Narayan, Akil; Zhou, Tao
We propose an algorithm for recovering sparse orthogonal polynomial expansions via collocation. A standard sampling approach for recovering sparse polynomials uses Monte Carlo sampling, from the density of orthogonality, which results in poor function recovery when the polynomial degree is high. Our proposed approach aims to mitigate this limitation by sampling with respect to the weighted equilibrium measure of the parametric domain and subsequently solves a preconditionedmore » $$\\ell^1$$-minimization problem, where the weights of the diagonal preconditioning matrix are given by evaluations of the Christoffel function. Our algorithm can be applied to a wide class of orthogonal polynomial families on bounded and unbounded domains, including all classical families. We present theoretical analysis to motivate the algorithm and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest. In conclusion, numerical examples are also provided to demonstrate that our proposed algorithm leads to comparable or improved accuracy even when compared with Legendre- and Hermite-specific algorithms.« less
Jakeman, John D.; Narayan, Akil; Zhou, Tao
2017-06-22
We propose an algorithm for recovering sparse orthogonal polynomial expansions via collocation. A standard sampling approach for recovering sparse polynomials uses Monte Carlo sampling, from the density of orthogonality, which results in poor function recovery when the polynomial degree is high. Our proposed approach aims to mitigate this limitation by sampling with respect to the weighted equilibrium measure of the parametric domain and subsequently solves a preconditionedmore » $$\\ell^1$$-minimization problem, where the weights of the diagonal preconditioning matrix are given by evaluations of the Christoffel function. Our algorithm can be applied to a wide class of orthogonal polynomial families on bounded and unbounded domains, including all classical families. We present theoretical analysis to motivate the algorithm and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest. In conclusion, numerical examples are also provided to demonstrate that our proposed algorithm leads to comparable or improved accuracy even when compared with Legendre- and Hermite-specific algorithms.« less
Yu, Qiang; Wei, Dingbang; Huo, Hongwei
2018-06-18
Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
Simple-random-sampling-based multiclass text classification algorithm.
Liu, Wuying; Wang, Lin; Yi, Mianzhu
2014-01-01
Multiclass text classification (MTC) is a challenging issue and the corresponding MTC algorithms can be used in many applications. The space-time overhead of the algorithms must be concerned about the era of big data. Through the investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token level memory to store labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. The experimental results on the TanCorp data set show that SRSMTC algorithm can achieve the state-of-the-art performance at greatly reduced space-time requirements.
Backfitting in Smoothing Spline Anova, with Application to Historical Global Temperature Data
NASA Astrophysics Data System (ADS)
Luo, Zhen
In the attempt to estimate the temperature history of the earth using the surface observations, various biases can exist. An important source of bias is the incompleteness of sampling over both time and space. There have been a few methods proposed to deal with this problem. Although they can correct some biases resulting from incomplete sampling, they have ignored some other significant biases. In this dissertation, a smoothing spline ANOVA approach which is a multivariate function estimation method is proposed to deal simultaneously with various biases resulting from incomplete sampling. Besides that, an advantage of this method is that we can get various components of the estimated temperature history with a limited amount of information stored. This method can also be used for detecting erroneous observations in the data base. The method is illustrated through an example of modeling winter surface air temperature as a function of year and location. Extension to more complicated models are discussed. The linear system associated with the smoothing spline ANOVA estimates is too large to be solved by full matrix decomposition methods. A computational procedure combining the backfitting (Gauss-Seidel) algorithm and the iterative imputation algorithm is proposed. This procedure takes advantage of the tensor product structure in the data to make the computation feasible in an environment of limited memory. Various related issues are discussed, e.g., the computation of confidence intervals and the techniques to speed up the convergence of the backfitting algorithm such as collapsing and successive over-relaxation.
Optimal structure and parameter learning of Ising models
Lokhov, Andrey; Vuffray, Marc Denis; Misra, Sidhant; ...
2018-03-16
Reconstruction of the structure and parameters of an Ising model from binary samples is a problem of practical importance in a variety of disciplines, ranging from statistical physics and computational biology to image processing and machine learning. The focus of the research community shifted toward developing universal reconstruction algorithms that are both computationally efficient and require the minimal amount of expensive data. Here, we introduce a new method, interaction screening, which accurately estimates model parameters using local optimization problems. The algorithm provably achieves perfect graph structure recovery with an information-theoretically optimal number of samples, notably in the low-temperature regime, whichmore » is known to be the hardest for learning. Here, the efficacy of interaction screening is assessed through extensive numerical tests on synthetic Ising models of various topologies with different types of interactions, as well as on real data produced by a D-Wave quantum computer. Finally, this study shows that the interaction screening method is an exact, tractable, and optimal technique that universally solves the inverse Ising problem.« less
Gene Selection and Cancer Classification: A Rough Sets Based Approach
NASA Astrophysics Data System (ADS)
Sun, Lijun; Miao, Duoqian; Zhang, Hongyun
Indentification of informative gene subsets responsible for discerning between available samples of gene expression data is an important task in bioinformatics. Reducts, from rough sets theory, corresponding to a minimal set of essential genes for discerning samples, is an efficient tool for gene selection. Due to the compuational complexty of the existing reduct algoritms, feature ranking is usually used to narrow down gene space as the first step and top ranked genes are selected . In this paper,we define a novel certierion based on the expression level difference btween classes and contribution to classification of the gene for scoring genes and present a algorithm for generating all possible reduct from informative genes.The algorithm takes the whole attribute sets into account and find short reduct with a significant reduction in computational complexity. An exploration of this approach on benchmark gene expression data sets demonstrates that this approach is successful for selecting high discriminative genes and the classification accuracy is impressive.
Optimal structure and parameter learning of Ising models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lokhov, Andrey; Vuffray, Marc Denis; Misra, Sidhant
Reconstruction of the structure and parameters of an Ising model from binary samples is a problem of practical importance in a variety of disciplines, ranging from statistical physics and computational biology to image processing and machine learning. The focus of the research community shifted toward developing universal reconstruction algorithms that are both computationally efficient and require the minimal amount of expensive data. Here, we introduce a new method, interaction screening, which accurately estimates model parameters using local optimization problems. The algorithm provably achieves perfect graph structure recovery with an information-theoretically optimal number of samples, notably in the low-temperature regime, whichmore » is known to be the hardest for learning. Here, the efficacy of interaction screening is assessed through extensive numerical tests on synthetic Ising models of various topologies with different types of interactions, as well as on real data produced by a D-Wave quantum computer. Finally, this study shows that the interaction screening method is an exact, tractable, and optimal technique that universally solves the inverse Ising problem.« less
Han, Wenhua; Shen, Xiaohui; Xu, Jun; Wang, Ping; Tian, Guiyun; Wu, Zhengyang
2014-01-01
Magnetic flux leakage (MFL) inspection is one of the most important and sensitive nondestructive testing approaches. For online MFL inspection of a long-range railway track or oil pipeline, a fast and effective defect profile estimating method based on a multi-power affine projection algorithm (MAPA) is proposed, where the depth of a sampling point is related with not only the MFL signals before it, but also the ones after it, and all of the sampling points related to one point appear as serials or multi-power. Defect profile estimation has two steps: regulating a weight vector in an MAPA filter and estimating a defect profile with the MAPA filter. Both simulation and experimental data are used to test the performance of the proposed method. The results demonstrate that the proposed method exhibits high speed while maintaining the estimated profiles clearly close to the desired ones in a noisy environment, thereby meeting the demand of accurate online inspection. PMID:25192314
Li, Zhi; Zhang, Zhao-hui; Zhao, Xiao-yan; Su, Hai-xia; Yan, Fang
2012-04-01
Extracting absorption spectrum in THz band is one of the important aspects in THz applications. Sample's absorption coefficient has a complex nonlinear relationship with its thickness. However, as it is not convenient to measure the thickness directly, absorption spectrum is usually determined incorrectly. Based on the method proposed by Duvillaret which was used to precisely determine the thickness of LiNbO3, the approach to measuring the absorption coefficient spectra of glutamine and histidine in frequency range from 0.3 to 2.6 THz(1 THz = 10(12) Hz) was improved in this paper. In order to validate the correctness of this absorption spectrum, we designed a series of experiments to compare the linearity of absorption coefficient belonging to one kind amino acid in different concentrations. The results indicate that as agreed by Lambert-Beer's Law, absorption coefficient spectrum of amino acid from the improved algorithm performs better linearity with its concentration than that from the common algorithm, which can be the basis of quantitative analysis in further researches.
Han, Wenhua; Shen, Xiaohui; Xu, Jun; Wang, Ping; Tian, Guiyun; Wu, Zhengyang
2014-09-04
Magnetic flux leakage (MFL) inspection is one of the most important and sensitive nondestructive testing approaches. For online MFL inspection of a long-range railway track or oil pipeline, a fast and effective defect profile estimating method based on a multi-power affine projection algorithm (MAPA) is proposed, where the depth of a sampling point is related with not only the MFL signals before it, but also the ones after it, and all of the sampling points related to one point appear as serials or multi-power. Defect profile estimation has two steps: regulating a weight vector in an MAPA filter and estimating a defect profile with the MAPA filter. Both simulation and experimental data are used to test the performance of the proposed method. The results demonstrate that the proposed method exhibits high speed while maintaining the estimated profiles clearly close to the desired ones in a noisy environment, thereby meeting the demand of accurate online inspection.
Rare event simulation in radiation transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kollman, Craig
1993-10-01
This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is with overwhelming probability equal to zero. These problems often have high dimensional state spaces and irregular geometries so that analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded the probability of any one particular particle getting through is very small. Because of the large number of particles involved,more » even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiple by the likelihood ratio between the true and simulated probabilities so as to keep the estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities are chosen. It is shown that a zero variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive ``learning`` algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give with probability one, a sequence of estimates converging exponentially fast to the true solution.« less
NASA Astrophysics Data System (ADS)
Nazemi, A.; Zaerpour, M.
2016-12-01
Current paradigm for assessing the vulnerability of water resource systems to changing streamflow conditions often involves a cascade application of climate and hydrological models to project the future states of streamflow regime, entering to a given water resource system. It is widely warned, however, that the overall uncertainty in this "top-down" modeling enterprise can be large due to the limitations in representing natural and anthropogenic processes that affect future streamflow variability and change. To address this, various types of stress-tests are suggested to assess the vulnerability of water resources systems under a wide range of possible changes in streamflow conditions. The scope of such "bottom-up" assessments can go well beyond top-down projections and therefore provide a basis for monitoring different response modes, under which water resource systems become vulnerable. Despite methodological differences, all bottom-up assessments are equipped with a systematic sampling procedure, with which different possibilities for future climate and/or streamflow conditions can be realized. Regardless of recent developments, currently available streamflow sampling algorithms are still limited, particularly in regional contexts, for which accurate representation of spatiotemporal dependencies in streamflow regime are of major importance. In this presentation, we introduce a new development that enables handling temporal and spatial dependencies in regional streamflow regimes through a unified stochastic reconstruction algorithm. We demonstrate the application of this algorithm accross various Canadian regions. By considering a real-world regional water resources system, we show how the new multi-site reconstruction algorithm can extend the practical utility of bottom-up vulnerability assessment and improve quantifying the associated risk in natural and anthropogenic water systems under unknown future conditions.
al3c: high-performance software for parameter inference using Approximate Bayesian Computation.
Stram, Alexander H; Marjoram, Paul; Chen, Gary K
2015-11-01
The development of Approximate Bayesian Computation (ABC) algorithms for parameter inference which are both computationally efficient and scalable in parallel computing environments is an important area of research. Monte Carlo rejection sampling, a fundamental component of ABC algorithms, is trivial to distribute over multiple processors but is inherently inefficient. While development of algorithms such as ABC Sequential Monte Carlo (ABC-SMC) help address the inherent inefficiencies of rejection sampling, such approaches are not as easily scaled on multiple processors. As a result, current Bayesian inference software offerings that use ABC-SMC lack the ability to scale in parallel computing environments. We present al3c, a C++ framework for implementing ABC-SMC in parallel. By requiring only that users define essential functions such as the simulation model and prior distribution function, al3c abstracts the user from both the complexities of parallel programming and the details of the ABC-SMC algorithm. By using the al3c framework, the user is able to scale the ABC-SMC algorithm in parallel computing environments for his or her specific application, with minimal programming overhead. al3c is offered as a static binary for Linux and OS-X computing environments. The user completes an XML configuration file and C++ plug-in template for the specific application, which are used by al3c to obtain the desired results. Users can download the static binaries, source code, reference documentation and examples (including those in this article) by visiting https://github.com/ahstram/al3c. astram@usc.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Higher-order time integration of Coulomb collisions in a plasma using Langevin equations
Dimits, A. M.; Cohen, B. I.; Caflisch, R. E.; ...
2013-02-08
The extension of Langevin-equation Monte-Carlo algorithms for Coulomb collisions from the conventional Euler-Maruyama time integration to the next higher order of accuracy, the Milstein scheme, has been developed, implemented, and tested. This extension proceeds via a formulation of the angular scattering directly as stochastic differential equations in the two fixed-frame spherical-coordinate velocity variables. Results from the numerical implementation show the expected improvement [O(Δt) vs. O(Δt 1/2)] in the strong convergence rate both for the speed |v| and angular components of the scattering. An important result is that this improved convergence is achieved for the angular component of the scattering ifmore » and only if the “area-integral” terms in the Milstein scheme are included. The resulting Milstein scheme is of value as a step towards algorithms with both improved accuracy and efficiency. These include both algorithms with improved convergence in the averages (weak convergence) and multi-time-level schemes. The latter have been shown to give a greatly reduced cost for a given overall error level when compared with conventional Monte-Carlo schemes, and their performance is improved considerably when the Milstein algorithm is used for the underlying time advance versus the Euler-Maruyama algorithm. A new method for sampling the area integrals is given which is a simplification of an earlier direct method and which retains high accuracy. Lastly, this method, while being useful in its own right because of its relative simplicity, is also expected to considerably reduce the computational requirements for the direct conditional sampling of the area integrals that is needed for adaptive strong integration.« less
Development and validation of an online interactive, multimedia wound care algorithms program.
Beitz, Janice M; van Rijswijk, Lia
2012-01-01
To provide education based on evidence-based and validated wound care algorithms we designed and implemented an interactive, Web-based learning program for teaching wound care. A mixed methods quantitative pilot study design with qualitative components was used to test and ascertain the ease of use, validity, and reliability of the online program. A convenience sample of 56 RN wound experts (formally educated, certified in wound care, or both) participated. The interactive, online program consists of a user introduction, interactive assessment of 15 acute and chronic wound photos, user feedback about the percentage correct, partially correct, or incorrect algorithm and dressing choices and a user survey. After giving consent, participants accessed the online program, provided answers to the demographic survey, and completed the assessment module and photographic test, along with a posttest survey. The construct validity of the online interactive program was strong. Eighty-five percent (85%) of algorithm and 87% of dressing choices were fully correct even though some programming design issues were identified. Online study results were consistently better than previously conducted comparable paper-pencil study results. Using a 5-point Likert-type scale, participants rated the program's value and ease of use as 3.88 (valuable to very valuable) and 3.97 (easy to very easy), respectively. Similarly the research process was described qualitatively as "enjoyable" and "exciting." This digital program was well received indicating its "perceived benefits" for nonexpert users, which may help reduce barriers to implementing safe, evidence-based care. Ongoing research using larger sample sizes may help refine the program or algorithms while identifying clinician educational needs. Initial design imperfections and programming problems identified also underscored the importance of testing all paper and Web-based programs designed to educate health care professionals or guide patient care.
Improving image quality in laboratory x-ray phase-contrast imaging
NASA Astrophysics Data System (ADS)
De Marco, F.; Marschner, M.; Birnbacher, L.; Viermetz, M.; Noël, P.; Herzen, J.; Pfeiffer, F.
2017-03-01
Grating-based X-ray phase-contrast (gbPC) is known to provide significant benefits for biomedical imaging. To investigate these benefits, a high-sensitivity gbPC micro-CT setup for small (≍ 5 cm) biological samples has been constructed. Unfortunately, high differential-phase sensitivity leads to an increased magnitude of data processing artifacts, limiting the quality of tomographic reconstructions. Most importantly, processing of phase-stepping data with incorrect stepping positions can introduce artifacts resembling Moiré fringes to the projections. Additionally, the focal spot size of the X-ray source limits resolution of tomograms. Here we present a set of algorithms to minimize artifacts, increase resolution and improve visual impression of projections and tomograms from the examined setup. We assessed two algorithms for artifact reduction: Firstly, a correction algorithm exploiting correlations of the artifacts and differential-phase data was developed and tested. Artifacts were reliably removed without compromising image data. Secondly, we implemented a new algorithm for flatfield selection, which was shown to exclude flat-fields with strong artifacts. Both procedures successfully improved image quality of projections and tomograms. Deconvolution of all projections of a CT scan can minimize blurring introduced by the finite size of the X-ray source focal spot. Application of the Richardson-Lucy deconvolution algorithm to gbPC-CT projections resulted in an improved resolution of phase-contrast tomograms. Additionally, we found that nearest-neighbor interpolation of projections can improve the visual impression of very small features in phase-contrast tomograms. In conclusion, we achieved an increase in image resolution and quality for the investigated setup, which may lead to an improved detection of very small sample features, thereby maximizing the setup's utility.
Tang, Jie; Nett, Brian E; Chen, Guang-Hong
2009-10-07
Of all available reconstruction methods, statistical iterative reconstruction algorithms appear particularly promising since they enable accurate physical noise modeling. The newly developed compressive sampling/compressed sensing (CS) algorithm has shown the potential to accurately reconstruct images from highly undersampled data. The CS algorithm can be implemented in the statistical reconstruction framework as well. In this study, we compared the performance of two standard statistical reconstruction algorithms (penalized weighted least squares and q-GGMRF) to the CS algorithm. In assessing the image quality using these iterative reconstructions, it is critical to utilize realistic background anatomy as the reconstruction results are object dependent. A cadaver head was scanned on a Varian Trilogy system at different dose levels. Several figures of merit including the relative root mean square error and a quality factor which accounts for the noise performance and the spatial resolution were introduced to objectively evaluate reconstruction performance. A comparison is presented between the three algorithms for a constant undersampling factor comparing different algorithms at several dose levels. To facilitate this comparison, the original CS method was formulated in the framework of the statistical image reconstruction algorithms. Important conclusions of the measurements from our studies are that (1) for realistic neuro-anatomy, over 100 projections are required to avoid streak artifacts in the reconstructed images even with CS reconstruction, (2) regardless of the algorithm employed, it is beneficial to distribute the total dose to more views as long as each view remains quantum noise limited and (3) the total variation-based CS method is not appropriate for very low dose levels because while it can mitigate streaking artifacts, the images exhibit patchy behavior, which is potentially harmful for medical diagnosis.
An improved target velocity sampling algorithm for free gas elastic scattering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romano, Paul K.; Walsh, Jonathan A.
We present an improved algorithm for sampling the target velocity when simulating elastic scattering in a Monte Carlo neutron transport code that correctly accounts for the energy dependence of the scattering cross section. The algorithm samples the relative velocity directly, thereby avoiding a potentially inefficient rejection step based on the ratio of cross sections. Here, we have shown that this algorithm requires only one rejection step, whereas other methods of similar accuracy require two rejection steps. The method was verified against stochastic and deterministic reference results for upscattering percentages in 238U. Simulations of a light water reactor pin cell problemmore » demonstrate that using this algorithm results in a 3% or less penalty in performance when compared with an approximate method that is used in most production Monte Carlo codes« less
An improved target velocity sampling algorithm for free gas elastic scattering
Romano, Paul K.; Walsh, Jonathan A.
2018-02-03
We present an improved algorithm for sampling the target velocity when simulating elastic scattering in a Monte Carlo neutron transport code that correctly accounts for the energy dependence of the scattering cross section. The algorithm samples the relative velocity directly, thereby avoiding a potentially inefficient rejection step based on the ratio of cross sections. Here, we have shown that this algorithm requires only one rejection step, whereas other methods of similar accuracy require two rejection steps. The method was verified against stochastic and deterministic reference results for upscattering percentages in 238U. Simulations of a light water reactor pin cell problemmore » demonstrate that using this algorithm results in a 3% or less penalty in performance when compared with an approximate method that is used in most production Monte Carlo codes« less
New prior sampling methods for nested sampling - Development and testing
NASA Astrophysics Data System (ADS)
Stokes, Barrie; Tuyl, Frank; Hudson, Irene
2017-06-01
Nested Sampling is a powerful algorithm for fitting models to data in the Bayesian setting, introduced by Skilling [1]. The nested sampling algorithm proceeds by carrying out a series of compressive steps, involving successively nested iso-likelihood boundaries, starting with the full prior distribution of the problem parameters. The "central problem" of nested sampling is to draw at each step a sample from the prior distribution whose likelihood is greater than the current likelihood threshold, i.e., a sample falling inside the current likelihood-restricted region. For both flat and informative priors this ultimately requires uniform sampling restricted to the likelihood-restricted region. We present two new methods of carrying out this sampling step, and illustrate their use with the lighthouse problem [2], a bivariate likelihood used by Gregory [3] and a trivariate Gaussian mixture likelihood. All the algorithm development and testing reported here has been done with Mathematica® [4].
Mori, Yoshiharu; Okumura, Hisashi
2015-12-05
Simulated tempering (ST) is a useful method to enhance sampling of molecular simulations. When ST is used, the Metropolis algorithm, which satisfies the detailed balance condition, is usually applied to calculate the transition probability. Recently, an alternative method that satisfies the global balance condition instead of the detailed balance condition has been proposed by Suwa and Todo. In this study, ST method with the Suwa-Todo algorithm is proposed. Molecular dynamics simulations with ST are performed with three algorithms (the Metropolis, heat bath, and Suwa-Todo algorithms) to calculate the transition probability. Among the three algorithms, the Suwa-Todo algorithm yields the highest acceptance ratio and the shortest autocorrelation time. These suggest that sampling by a ST simulation with the Suwa-Todo algorithm is most efficient. In addition, because the acceptance ratio of the Suwa-Todo algorithm is higher than that of the Metropolis algorithm, the number of temperature states can be reduced by 25% for the Suwa-Todo algorithm when compared with the Metropolis algorithm. © 2015 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Jiang, Kaili; Zhu, Jun; Tang, Bin
2017-12-01
Periodic nonuniform sampling occurs in many applications, and the Nyquist folding receiver (NYFR) is an efficient, low complexity, and broadband spectrum sensing architecture. In this paper, we first derive that the radio frequency (RF) sample clock function of NYFR is periodic nonuniform. Then, the classical results of periodic nonuniform sampling are applied to NYFR. We extend the spectral reconstruction algorithm of time series decomposed model to the subsampling case by using the spectrum characteristics of NYFR. The subsampling case is common for broadband spectrum surveillance. Finally, we take example for a LFM signal under large bandwidth to verify the proposed algorithm and compare the spectral reconstruction algorithm with orthogonal matching pursuit (OMP) algorithm.
Driven-dissipative quantum Monte Carlo method for open quantum systems
NASA Astrophysics Data System (ADS)
Nagy, Alexandra; Savona, Vincenzo
2018-05-01
We develop a real-time full configuration-interaction quantum Monte Carlo approach to model driven-dissipative open quantum systems with Markovian system-bath coupling. The method enables stochastic sampling of the Liouville-von Neumann time evolution of the density matrix thanks to a massively parallel algorithm, thus providing estimates of observables on the nonequilibrium steady state. We present the underlying theory and introduce an initiator technique and importance sampling to reduce the statistical error. Finally, we demonstrate the efficiency of our approach by applying it to the driven-dissipative two-dimensional X Y Z spin-1/2 model on a lattice.
Hyperspectral analysis of clay minerals
NASA Astrophysics Data System (ADS)
Janaki Rama Suresh, G.; Sreenivas, K.; Sivasamy, R.
2014-11-01
A study was carried out by collecting soil samples from parts of Gwalior and Shivpuri district, Madhya Pradesh in order to assess the dominant clay mineral of these soils using hyperspectral data, as 0.4 to 2.5 μm spectral range provides abundant and unique information about many important earth-surface minerals. Understanding the spectral response along with the soil chemical properties can provide important clues for retrieval of mineralogical soil properties. The soil samples were collected based on stratified random sampling approach and dominant clay minerals were identified through XRD analysis. The absorption feature parameters like depth, width, area and asymmetry of the absorption peaks were derived from spectral profile of soil samples through DISPEC tool. The derived absorption feature parameters were used as inputs for modelling the dominant soil clay mineral present in the unknown samples using Random forest approach which resulted in kappa accuracy of 0.795. Besides, an attempt was made to classify the Hyperion data using Spectral Angle Mapper (SAM) algorithm with an overall accuracy of 68.43 %. Results showed that kaolinite was the dominant mineral present in the soils followed by montmorillonite in the study area.
Symmetry compression method for discovering network motifs.
Wang, Jianxin; Huang, Yuannan; Wu, Fang-Xiang; Pan, Yi
2012-01-01
Discovering network motifs could provide a significant insight into systems biology. Interestingly, many biological networks have been found to have a high degree of symmetry (automorphism), which is inherent in biological network topologies. The symmetry due to the large number of basic symmetric subgraphs (BSSs) causes a certain redundant calculation in discovering network motifs. Therefore, we compress all basic symmetric subgraphs before extracting compressed subgraphs and propose an efficient decompression algorithm to decompress all compressed subgraphs without loss of any information. In contrast to previous approaches, the novel Symmetry Compression method for Motif Detection, named as SCMD, eliminates most redundant calculations caused by widespread symmetry of biological networks. We use SCMD to improve three notable exact algorithms and two efficient sampling algorithms. Results of all exact algorithms with SCMD are the same as those of the original algorithms, since SCMD is a lossless method. The sampling results show that the use of SCMD almost does not affect the quality of sampling results. For highly symmetric networks, we find that SCMD used in both exact and sampling algorithms can help get a remarkable speedup. Furthermore, SCMD enables us to find larger motifs in biological networks with notable symmetry than previously possible.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
Dawson, Kevin J.; Belkhir, Khalid
2009-01-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, - the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we can not visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree, - or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
Fall 2014 SEI Research Review Probabilistic Analysis of Time Sensitive Systems
2014-10-28
Osmosis SMC Tool Osmosis is a tool for Statistical Model Checking (SMC) with Semantic Importance Sampling. • Input model is written in subset of C...ASSERT() statements in model indicate conditions that must hold. • Input probability distributions defined by the user. • Osmosis returns the...on: – Target relative error, or – Set number of simulations Osmosis Main Algorithm 1 http://dreal.cs.cmu.edu/ (?⃑?): Indicator
Fantin, Yuri S.; Neverov, Alexey D.; Favorov, Alexander V.; Alvarez-Figueroa, Maria V.; Braslavskaya, Svetlana I.; Gordukova, Maria A.; Karandashova, Inga V.; Kuleshov, Konstantin V.; Myznikova, Anna I.; Polishchuk, Maya S.; Reshetov, Denis A.; Voiciehovskaya, Yana A.; Mironov, Andrei A.; Chulanov, Vladimir P.
2013-01-01
Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3–14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing. PMID:23382983
SPECT reconstruction with nonuniform attenuation from highly under-sampled projection data
NASA Astrophysics Data System (ADS)
Li, Cuifen; Wen, Junhai; Zhang, Kangping; Shi, Donghao; Dong, Haixiang; Li, Wenxiao; Liang, Zhengrong
2012-03-01
Single photon emission computed tomography (SPECT) is an important nuclear medicine imaging technique and has been using in clinical diagnoses. The SPECT image can reflect not only organizational structure but also functional activities of human body, therefore diseases can be found much earlier. In SPECT, the reconstruction is based on the measurement of gamma photons emitted by the radiotracer. The number of gamma photons detected is proportional to the dose of radiopharmaceutical, but the dose is limited because of patient safety. There is an upper limit in the number of gamma photons that can be detected per unit time, so it takes a long time to acquire SPECT projection data. Sometimes we just can obtain highly under-sampled projection data because of the limit of the scanning time or imaging hardware. How to reconstruct an image using highly under-sampled projection data is an interesting problem. One method is to minimize the total variation (TV) of the reconstructed image during the iterative reconstruction. In this work, we developed an OSEM-TV SPECT reconstruction algorithm, which could reconstruct the image from highly under-sampled projection data with non-uniform attenuation. Simulation results demonstrate that the OSEM-TV algorithm performs well in SPECT reconstruction with non-uniform attenuation.
NASA Astrophysics Data System (ADS)
Azimi, Ehsan; Behrad, Alireza; Ghaznavi-Ghoushchi, Mohammad Bagher; Shanbehzadeh, Jamshid
2016-11-01
The projective model is an important mapping function for the calculation of global transformation between two images. However, its hardware implementation is challenging because of a large number of coefficients with different required precisions for fixed point representation. A VLSI hardware architecture is proposed for the calculation of a global projective model between input and reference images and refining false matches using random sample consensus (RANSAC) algorithm. To make the hardware implementation feasible, it is proved that the calculation of the projective model can be divided into four submodels comprising two translations, an affine model and a simpler projective mapping. This approach makes the hardware implementation feasible and considerably reduces the required number of bits for fixed point representation of model coefficients and intermediate variables. The proposed hardware architecture for the calculation of a global projective model using the RANSAC algorithm was implemented using Verilog hardware description language and the functionality of the design was validated through several experiments. The proposed architecture was synthesized by using an application-specific integrated circuit digital design flow utilizing 180-nm CMOS technology as well as a Virtex-6 field programmable gate array. Experimental results confirm the efficiency of the proposed hardware architecture in comparison with software implementation.
NASA Astrophysics Data System (ADS)
Khan, F.; Enzmann, F.; Kersten, M.
2015-12-01
In X-ray computed microtomography (μXCT) image processing is the most important operation prior to image analysis. Such processing mainly involves artefact reduction and image segmentation. We propose a new two-stage post-reconstruction procedure of an image of a geological rock core obtained by polychromatic cone-beam μXCT technology. In the first stage, the beam-hardening (BH) is removed applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. The final BH-corrected image is extracted from the residual data, or the difference between the surface elevation values and the original grey-scale values. For the second stage, we propose using a least square support vector machine (a non-linear classifier algorithm) to segment the BH-corrected data as a pixel-based multi-classification task. A combination of the two approaches was used to classify a complex multi-mineral rock sample. The Matlab code for this approach is provided in the Appendix. A minor drawback is that the proposed segmentation algorithm may become computationally demanding in the case of a high dimensional training data set.
NASA Astrophysics Data System (ADS)
Huang, Shaohua; Wang, Lan; Chen, Weiwei; Lin, Duo; Huang, Lingling; Wu, Shanshan; Feng, Shangyuan; Chen, Rong
2014-09-01
A surface-enhanced Raman spectroscopy (SERS) approach was utilized for urine biochemical analysis with the aim to develop a label-free and non-invasive optical diagnostic method for esophagus cancer detection. SERS spectrums were acquired from 31 normal urine samples and 47 malignant esophagus cancer (EC) urine samples. Tentative assignments of urine SERS bands demonstrated esophagus cancer specific changes, including an increase in the relative amounts of urea and a decrease in the percentage of uric acid in the urine of normal compared with EC. The empirical algorithm integrated with linear discriminant analysis (LDA) were employed to identify some important urine SERS bands for differentiation between healthy subjects and EC urine. The empirical diagnostic approach based on the ratio of the SERS peak intensity at 527 to 1002 cm-1 and 725 to 1002 cm-1 coupled with LDA yielded a diagnostic sensitivity of 72.3% and specificity of 96.8%, respectively. The area under the receive operating characteristic (ROC) curve was 0.954, which further evaluate the performance of the diagnostic algorithm based on the ratio of the SERS peak intensity combined with LDA analysis. This work demonstrated that the urine SERS spectra associated with empirical algorithm has potential for noninvasive diagnosis of esophagus cancer.
Enhanced algorithms for stochastic programming
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishna, Alamuru S.
1993-09-01
In this dissertation, we present some of the recent advances made in solving two-stage stochastic linear programming problems of large size and complexity. Decomposition and sampling are two fundamental components of techniques to solve stochastic optimization problems. We describe improvements to the current techniques in both these areas. We studied different ways of using importance sampling techniques in the context of Stochastic programming, by varying the choice of approximation functions used in this method. We have concluded that approximating the recourse function by a computationally inexpensive piecewise-linear function is highly efficient. This reduced the problem from finding the mean ofmore » a computationally expensive functions to finding that of a computationally inexpensive function. Then we implemented various variance reduction techniques to estimate the mean of a piecewise-linear function. This method achieved similar variance reductions in orders of magnitude less time than, when we directly applied variance-reduction techniques directly on the given problem. In solving a stochastic linear program, the expected value problem is usually solved before a stochastic solution and also to speed-up the algorithm by making use of the information obtained from the solution of the expected value problem. We have devised a new decomposition scheme to improve the convergence of this algorithm.« less
Lantieri, Francesca; Malacarne, Michela; Gimelli, Stefania; Santamaria, Giuseppe; Coviello, Domenico; Ceccherini, Isabella
2017-01-01
The presence of false positive and false negative results in the Array Comparative Genomic Hybridization (aCGH) design is poorly addressed in literature reports. We took advantage of a custom aCGH recently carried out to analyze its design performance, the use of several Agilent aberrations detection algorithms, and the presence of false results. Our study provides a confirmation that the high density design does not generate more noise than standard designs and, might reach a good resolution. We noticed a not negligible presence of false negative and false positive results in the imbalances call performed by the Agilent software. The Aberration Detection Method 2 (ADM-2) algorithm with a threshold of 6 performed quite well, and the array design proved to be reliable, provided that some additional filters are applied, such as considering only intervals with average absolute log2ratio above 0.3. We also propose an additional filter that takes into account the proportion of probes with log2ratio exceeding suggestive values for gain or loss. In addition, the quality of samples was confirmed to be a crucial parameter. Finally, this work raises the importance of evaluating the samples profiles by eye and the necessity of validating the imbalances detected. PMID:28287439
Baek, Hyun Jae; Shin, JaeWook; Jin, Gunwoo; Cho, Jaegeol
2017-10-24
Photoplethysmographic signals are useful for heart rate variability analysis in practical ambulatory applications. While reducing the sampling rate of signals is an important consideration for modern wearable devices that enable 24/7 continuous monitoring, there have not been many studies that have investigated how to compensate the low timing resolution of low-sampling-rate signals for accurate heart rate variability analysis. In this study, we utilized the parabola approximation method and measured it against the conventional cubic spline interpolation method for the time, frequency, and nonlinear domain variables of heart rate variability. For each parameter, the intra-class correlation, standard error of measurement, Bland-Altman 95% limits of agreement and root mean squared relative error were presented. Also, elapsed time taken to compute each interpolation algorithm was investigated. The results indicated that parabola approximation is a simple, fast, and accurate algorithm-based method for compensating the low timing resolution of pulse beat intervals. In addition, the method showed comparable performance with the conventional cubic spline interpolation method. Even though the absolute value of the heart rate variability variables calculated using a signal sampled at 20 Hz were not exactly matched with those calculated using a reference signal sampled at 250 Hz, the parabola approximation method remains a good interpolation method for assessing trends in HRV measurements for low-power wearable applications.
Test Generation Algorithm for Fault Detection of Analog Circuits Based on Extreme Learning Machine
Zhou, Jingyu; Tian, Shulin; Yang, Chenglin; Ren, Xuelong
2014-01-01
This paper proposes a novel test generation algorithm based on extreme learning machine (ELM), and such algorithm is cost-effective and low-risk for analog device under test (DUT). This method uses test patterns derived from the test generation algorithm to stimulate DUT, and then samples output responses of the DUT for fault classification and detection. The novel ELM-based test generation algorithm proposed in this paper contains mainly three aspects of innovation. Firstly, this algorithm saves time efficiently by classifying response space with ELM. Secondly, this algorithm can avoid reduced test precision efficiently in case of reduction of the number of impulse-response samples. Thirdly, a new process of test signal generator and a test structure in test generation algorithm are presented, and both of them are very simple. Finally, the abovementioned improvement and functioning are confirmed in experiments. PMID:25610458
Van Herpe, Tom; De Brabanter, Jos; Beullens, Martine; De Moor, Bart; Van den Berghe, Greet
2008-01-01
Introduction Blood glucose (BG) control performed by intensive care unit (ICU) nurses is becoming standard practice for critically ill patients. New (semi-automated) 'BG control' algorithms (or 'insulin titration' algorithms) are under development, but these require stringent validation before they can replace the currently used algorithms. Existing methods for objectively comparing different insulin titration algorithms show weaknesses. In the current study, a new approach for appropriately assessing the adequacy of different algorithms is proposed. Methods Two ICU patient populations (with different baseline characteristics) were studied, both treated with a similar 'nurse-driven' insulin titration algorithm targeting BG levels of 80 to 110 mg/dl. A new method for objectively evaluating BG deviations from normoglycemia was founded on a smooth penalty function. Next, the performance of this new evaluation tool was compared with the current standard assessment methods, on an individual as well as a population basis. Finally, the impact of four selected parameters (the average BG sampling frequency, the duration of algorithm application, the severity of disease, and the type of illness) on the performance of an insulin titration algorithm was determined by multiple regression analysis. Results The glycemic penalty index (GPI) was proposed as a tool for assessing the overall glycemic control behavior in ICU patients. The GPI of a patient is the average of all penalties that are individually assigned to each measured BG value based on the optimized smooth penalty function. The computation of this index returns a number between 0 (no penalty) and 100 (the highest penalty). For some patients, the assessment of the BG control behavior using the traditional standard evaluation methods was different from the evaluation with GPI. Two parameters were found to have a significant impact on GPI: the BG sampling frequency and the duration of algorithm application. A higher BG sampling frequency and a longer algorithm application duration resulted in an apparently better performance, as indicated by a lower GPI. Conclusion The GPI is an alternative method for evaluating the performance of BG control algorithms. The blood glucose sampling frequency and the duration of algorithm application should be similar when comparing algorithms. PMID:18302732
Optical identity authentication technique based on compressive ghost imaging with QR code
NASA Astrophysics Data System (ADS)
Wenjie, Zhan; Leihong, Zhang; Xi, Zeng; Yi, Kang
2018-04-01
With the rapid development of computer technology, information security has attracted more and more attention. It is not only related to the information and property security of individuals and enterprises, but also to the security and social stability of a country. Identity authentication is the first line of defense in information security. In authentication systems, response time and security are the most important factors. An optical authentication technology based on compressive ghost imaging with QR codes is proposed in this paper. The scheme can be authenticated with a small number of samples. Therefore, the response time of the algorithm is short. At the same time, the algorithm can resist certain noise attacks, so it offers good security.
Crystal Structure Predictions Using Adaptive Genetic Algorithm and Motif Search methods
NASA Astrophysics Data System (ADS)
Ho, K. M.; Wang, C. Z.; Zhao, X.; Wu, S.; Lyu, X.; Zhu, Z.; Nguyen, M. C.; Umemoto, K.; Wentzcovitch, R. M. M.
2017-12-01
Material informatics is a new initiative which has attracted a lot of attention in recent scientific research. The basic strategy is to construct comprehensive data sets and use machine learning to solve a wide variety of problems in material design and discovery. In pursuit of this goal, a key element is the quality and completeness of the databases used. Recent advance in the development of crystal structure prediction algorithms has made it a complementary and more efficient approach to explore the structure/phase space in materials using computers. In this talk, we discuss the importance of the structural motifs and motif-networks in crystal structure predictions. Correspondingly, powerful methods are developed to improve the sampling of the low-energy structure landscape.
A surrogate accelerated multicanonical Monte Carlo method for uncertainty quantification
NASA Astrophysics Data System (ADS)
Wu, Keyi; Li, Jinglai
2016-09-01
In this work we consider a class of uncertainty quantification problems where the system performance or reliability is characterized by a scalar parameter y. The performance parameter y is random due to the presence of various sources of uncertainty in the system, and our goal is to estimate the probability density function (PDF) of y. We propose to use the multicanonical Monte Carlo (MMC) method, a special type of adaptive importance sampling algorithms, to compute the PDF of interest. Moreover, we develop an adaptive algorithm to construct local Gaussian process surrogates to further accelerate the MMC iterations. With numerical examples we demonstrate that the proposed method can achieve several orders of magnitudes of speedup over the standard Monte Carlo methods.
An algorithm for U-Pb isotope dilution data reduction and uncertainty propagation
NASA Astrophysics Data System (ADS)
McLean, N. M.; Bowring, J. F.; Bowring, S. A.
2011-06-01
High-precision U-Pb geochronology by isotope dilution-thermal ionization mass spectrometry is integral to a variety of Earth science disciplines, but its ultimate resolving power is quantified by the uncertainties of calculated U-Pb dates. As analytical techniques have advanced, formerly small sources of uncertainty are increasingly important, and thus previous simplifications for data reduction and uncertainty propagation are no longer valid. Although notable previous efforts have treated propagation of correlated uncertainties for the U-Pb system, the equations, uncertainties, and correlations have been limited in number and subject to simplification during propagation through intermediary calculations. We derive and present a transparent U-Pb data reduction algorithm that transforms raw isotopic data and measured or assumed laboratory parameters into the isotopic ratios and dates geochronologists interpret without making assumptions about the relative size of sample components. To propagate uncertainties and their correlations, we describe, in detail, a linear algebraic algorithm that incorporates all input uncertainties and correlations without limiting or simplifying covariance terms to propagate them though intermediate calculations. Finally, a weighted mean algorithm is presented that utilizes matrix elements from the uncertainty propagation algorithm to propagate random and systematic uncertainties for data comparison between other U-Pb labs and other geochronometers. The linear uncertainty propagation algorithms are verified with Monte Carlo simulations of several typical analyses. We propose that our algorithms be considered by the community for implementation to improve the collaborative science envisioned by the EARTHTIME initiative.
Retention time alignment of LC/MS data by a divide-and-conquer algorithm.
Zhang, Zhongqi
2012-04-01
Liquid chromatography-mass spectrometry (LC/MS) has become the method of choice for characterizing complex mixtures. These analyses often involve quantitative comparison of components in multiple samples. To achieve automated sample comparison, the components of interest must be detected and identified, and their retention times aligned and peak areas calculated. This article describes a simple pairwise iterative retention time alignment algorithm, based on the divide-and-conquer approach, for alignment of ion features detected in LC/MS experiments. In this iterative algorithm, ion features in the sample run are first aligned with features in the reference run by applying a single constant shift of retention time. The sample chromatogram is then divided into two shorter chromatograms, which are aligned to the reference chromatogram the same way. Each shorter chromatogram is further divided into even shorter chromatograms. This process continues until each chromatogram is sufficiently narrow so that ion features within it have a similar retention time shift. In six pairwise LC/MS alignment examples containing a total of 6507 confirmed true corresponding feature pairs with retention time shifts up to five peak widths, the algorithm successfully aligned these features with an error rate of 0.2%. The alignment algorithm is demonstrated to be fast, robust, fully automatic, and superior to other algorithms. After alignment and gap-filling of detected ion features, their abundances can be tabulated for direct comparison between samples.
Godsey, Brian; Heiser, Diane; Civin, Curt
2012-01-01
MicroRNAs (miRs) are known to play an important role in mRNA regulation, often by binding to complementary sequences in "target" mRNAs. Recently, several methods have been developed by which existing sequence-based target predictions can be combined with miR and mRNA expression data to infer true miR-mRNA targeting relationships. It has been shown that the combination of these two approaches gives more reliable results than either by itself. While a few such algorithms give excellent results, none fully addresses expression data sets with a natural ordering of the samples. If the samples in an experiment can be ordered or partially ordered by their expected similarity to one another, such as for time-series or studies of development processes, stages, or types, (e.g. cell type, disease, growth, aging), there are unique opportunities to infer miR-mRNA interactions that may be specific to the underlying processes, and existing methods do not exploit this. We propose an algorithm which specifically addresses [partially] ordered expression data and takes advantage of sample similarities based on the ordering structure. This is done within a Bayesian framework which specifies posterior distributions and therefore statistical significance for each model parameter and latent variable. We apply our model to a previously published expression data set of paired miR and mRNA arrays in five partially ordered conditions, with biological replicates, related to multiple myeloma, and we show how considering potential orderings can improve the inference of miR-mRNA interactions, as measured by existing knowledge about the involved transcripts.
Sampling, feasibility, and priors in data assimilation
Tu, Xuemin; Morzfeld, Matthias; Miller, Robert N.; ...
2016-03-01
Importance sampling algorithms are discussed in detail, with an emphasis on implicit sampling, and applied to data assimilation via particle filters. Implicit sampling makes it possible to use the data to find high-probability samples at relatively low cost, making the assimilation more efficient. A new analysis of the feasibility of data assimilation is presented, showing in detail why feasibility depends on the Frobenius norm of the covariance matrix of the noise and not on the number of variables. A discussion of the convergence of particular particle filters follows. A major open problem in numerical data assimilation is the determination ofmore » appropriate priors, a progress report on recent work on this problem is given. The analysis highlights the need for a careful attention both to the data and to the physics in data assimilation problems.« less
A novel directional asymmetric sampling search algorithm for fast block-matching motion estimation
NASA Astrophysics Data System (ADS)
Li, Yue-e.; Wang, Qiang
2011-11-01
This paper proposes a novel directional asymmetric sampling search (DASS) algorithm for video compression. Making full use of the error information (block distortions) of the search patterns, eight different direction search patterns are designed for various situations. The strategy of local sampling search is employed for the search of big-motion vector. In order to further speed up the search, early termination strategy is adopted in procedure of DASS. Compared to conventional fast algorithms, the proposed method has the most satisfactory PSNR values for all test sequences.
Plane-Based Sampling for Ray Casting Algorithm in Sequential Medical Images
Lin, Lili; Chen, Shengyong; Shao, Yan; Gu, Zichun
2013-01-01
This paper proposes a plane-based sampling method to improve the traditional Ray Casting Algorithm (RCA) for the fast reconstruction of a three-dimensional biomedical model from sequential images. In the novel method, the optical properties of all sampling points depend on the intersection points when a ray travels through an equidistant parallel plan cluster of the volume dataset. The results show that the method improves the rendering speed at over three times compared with the conventional algorithm and the image quality is well guaranteed. PMID:23424608
A method for feature selection of APT samples based on entropy
NASA Astrophysics Data System (ADS)
Du, Zhenyu; Li, Yihong; Hu, Jinsong
2018-05-01
By studying the known APT attack events deeply, this paper propose a feature selection method of APT sample and a logic expression generation algorithm IOCG (Indicator of Compromise Generate). The algorithm can automatically generate machine readable IOCs (Indicator of Compromise), to solve the existing IOCs logical relationship is fixed, the number of logical items unchanged, large scale and cannot generate a sample of the limitations of the expression. At the same time, it can reduce the redundancy and useless APT sample processing time consumption, and improve the sharing rate of information analysis, and actively respond to complex and volatile APT attack situation. The samples were divided into experimental set and training set, and then the algorithm was used to generate the logical expression of the training set with the IOC_ Aware plug-in. The contrast expression itself was different from the detection result. The experimental results show that the algorithm is effective and can improve the detection effect.
Janson, Lucas; Schmerling, Edward; Clark, Ashley; Pavone, Marco
2015-01-01
In this paper we present a novel probabilistic sampling-based motion planning algorithm called the Fast Marching Tree algorithm (FMT*). The algorithm is specifically aimed at solving complex motion planning problems in high-dimensional configuration spaces. This algorithm is proven to be asymptotically optimal and is shown to converge to an optimal solution faster than its state-of-the-art counterparts, chiefly PRM* and RRT*. The FMT* algorithm performs a “lazy” dynamic programming recursion on a predetermined number of probabilistically-drawn samples to grow a tree of paths, which moves steadily outward in cost-to-arrive space. As such, this algorithm combines features of both single-query algorithms (chiefly RRT) and multiple-query algorithms (chiefly PRM), and is reminiscent of the Fast Marching Method for the solution of Eikonal equations. As a departure from previous analysis approaches that are based on the notion of almost sure convergence, the FMT* algorithm is analyzed under the notion of convergence in probability: the extra mathematical flexibility of this approach allows for convergence rate bounds—the first in the field of optimal sampling-based motion planning. Specifically, for a certain selection of tuning parameters and configuration spaces, we obtain a convergence rate bound of order O(n−1/d+ρ), where n is the number of sampled points, d is the dimension of the configuration space, and ρ is an arbitrarily small constant. We go on to demonstrate asymptotic optimality for a number of variations on FMT*, namely when the configuration space is sampled non-uniformly, when the cost is not arc length, and when connections are made based on the number of nearest neighbors instead of a fixed connection radius. Numerical experiments over a range of dimensions and obstacle configurations confirm our the-oretical and heuristic arguments by showing that FMT*, for a given execution time, returns substantially better solutions than either PRM* or RRT*, especially in high-dimensional configuration spaces and in scenarios where collision-checking is expensive. PMID:27003958
Sampling from complex networks using distributed learning automata
NASA Astrophysics Data System (ADS)
Rezvanian, Alireza; Rahmati, Mohammad; Meybodi, Mohammad Reza
2014-02-01
A complex network provides a framework for modeling many real-world phenomena in the form of a network. In general, a complex network is considered as a graph of real world phenomena such as biological networks, ecological networks, technological networks, information networks and particularly social networks. Recently, major studies are reported for the characterization of social networks due to a growing trend in analysis of online social networks as dynamic complex large-scale graphs. Due to the large scale and limited access of real networks, the network model is characterized using an appropriate part of a network by sampling approaches. In this paper, a new sampling algorithm based on distributed learning automata has been proposed for sampling from complex networks. In the proposed algorithm, a set of distributed learning automata cooperate with each other in order to take appropriate samples from the given network. To investigate the performance of the proposed algorithm, several simulation experiments are conducted on well-known complex networks. Experimental results are compared with several sampling methods in terms of different measures. The experimental results demonstrate the superiority of the proposed algorithm over the others.
Chen, Jun; Quan, Wenting; Cui, Tingwei
2015-01-01
In this study, two sample semi-analytical algorithms and one new unified multi-band semi-analytical algorithm (UMSA) for estimating chlorophyll-a (Chla) concentration were constructed by specifying optimal wavelengths. The three sample semi-analytical algorithms, including the three-band semi-analytical algorithm (TSA), four-band semi-analytical algorithm (FSA), and UMSA algorithm, were calibrated and validated by the dataset collected in the Yellow River Estuary between September 1 and 10, 2009. By comparing of the accuracy of assessment of TSA, FSA, and UMSA algorithms, it was found that the UMSA algorithm had a superior performance in comparison with the two other algorithms, TSA and FSA. Using the UMSA algorithm in retrieving Chla concentration in the Yellow River Estuary decreased by 25.54% NRMSE (normalized root mean square error) when compared with the FSA algorithm, and 29.66% NRMSE in comparison with the TSA algorithm. These are very significant improvements upon previous methods. Additionally, the study revealed that the TSA and FSA algorithms are merely more specific forms of the UMSA algorithm. Owing to the special form of the UMSA algorithm, if the same bands were used for both the TSA and UMSA algorithms or FSA and UMSA algorithms, the UMSA algorithm would theoretically produce superior results in comparison with the TSA and FSA algorithms. Thus, good results may also be produced if the UMSA algorithm were to be applied for predicting Chla concentration for datasets of Gitelson et al. (2008) and Le et al. (2009).
Adaptive Sampling for Urban Air Quality through Participatory Sensing
Zeng, Yuanyuan; Xiang, Kai
2017-01-01
Air pollution is one of the major problems of the modern world. The popularization and powerful functions of smartphone applications enable people to participate in urban sensing to better know about the air problems surrounding them. Data sampling is one of the most important problems that affect the sensing performance. In this paper, we propose an Adaptive Sampling Scheme for Urban Air Quality (AS-air) through participatory sensing. Firstly, we propose to find the pattern rules of air quality according to the historical data contributed by participants based on Apriori algorithm. Based on it, we predict the on-line air quality and use it to accelerate the learning process to choose and adapt the sampling parameter based on Q-learning. The evaluation results show that AS-air provides an energy-efficient sampling strategy, which is adaptive toward the varied outside air environment with good sampling efficiency. PMID:29099766
Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads
Rosen, Gail L.; Polikar, Robi; Caseiro, Diamantino A.; ...
2011-01-01
High-throughput sequencing technologies enable metagenome profiling, simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but, composition-based methods, have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for theirmore » ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate theperformance of several algorithms on a real acid mine drainage dataset.« less
IEEE 802.15.4 ZigBee-Based Time-of-Arrival Estimation for Wireless Sensor Networks.
Cheon, Jeonghyeon; Hwang, Hyunsu; Kim, Dongsun; Jung, Yunho
2016-02-05
Precise time-of-arrival (TOA) estimation is one of the most important techniques in RF-based positioning systems that use wireless sensor networks (WSNs). Because the accuracy of TOA estimation is proportional to the RF signal bandwidth, using broad bandwidth is the most fundamental approach for achieving higher accuracy. Hence, ultra-wide-band (UWB) systems with a bandwidth of 500 MHz are commonly used. However, wireless systems with broad bandwidth suffer from the disadvantages of high complexity and high power consumption. Therefore, it is difficult to employ such systems in various WSN applications. In this paper, we present a precise time-of-arrival (TOA) estimation algorithm using an IEEE 802.15.4 ZigBee system with a narrow bandwidth of 2 MHz. In order to overcome the lack of bandwidth, the proposed algorithm estimates the fractional TOA within the sampling interval. Simulation results show that the proposed TOA estimation algorithm provides an accuracy of 0.5 m at a signal-to-noise ratio (SNR) of 8 dB and achieves an SNR gain of 5 dB as compared with the existing algorithm. In addition, experimental results indicate that the proposed algorithm provides accurate TOA estimation in a real indoor environment.
Separation of pulsar signals from noise using supervised machine learning algorithms
NASA Astrophysics Data System (ADS)
Bethapudi, S.; Desai, S.
2018-04-01
We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for the cross-validation of the SPINN-based machine learning engine, obtained from the reprocessing of the HTRU-S survey data (Morello et al., 2014). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with high-class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean for both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum Relevance approach to report algorithm-agnostic feature ranking. From these methods, we find that the signal to noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in Morello et al. (2014), for the same recall value.
Learning topic models by belief propagation.
Zeng, Jia; Cheung, William K; Liu, Jiming
2013-05-01
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which attracts worldwide interest and touches on many important applications in text mining, computer vision and computational biology. This paper represents the collapsed LDA as a factor graph, which enables the classic loopy belief propagation (BP) algorithm for approximate inference and parameter estimation. Although two commonly used approximate inference methods, such as variational Bayes (VB) and collapsed Gibbs sampling (GS), have gained great success in learning LDA, the proposed BP is competitive in both speed and accuracy, as validated by encouraging experimental results on four large-scale document datasets. Furthermore, the BP algorithm has the potential to become a generic scheme for learning variants of LDA-based topic models in the collapsed space. To this end, we show how to learn two typical variants of LDA-based topic models, such as author-topic models (ATM) and relational topic models (RTM), using BP based on the factor graph representations.
Xiong, Zheng; He, Yinyan; Hattrick-Simpers, Jason R; Hu, Jianjun
2017-03-13
The creation of composition-processing-structure relationships currently represents a key bottleneck for data analysis for high-throughput experimental (HTE) material studies. Here we propose an automated phase diagram attribution algorithm for HTE data analysis that uses a graph-based segmentation algorithm and Delaunay tessellation to create a crystal phase diagram from high throughput libraries of X-ray diffraction (XRD) patterns. We also propose the sample-pair based objective evaluation measures for the phase diagram prediction problem. Our approach was validated using 278 diffraction patterns from a Fe-Ga-Pd composition spread sample with a prediction precision of 0.934 and a Matthews Correlation Coefficient score of 0.823. The algorithm was then applied to the open Ni-Mn-Al thin-film composition spread sample to obtain the first predicted phase diagram mapping for that sample.
Mobile robot motion estimation using Hough transform
NASA Astrophysics Data System (ADS)
Aldoshkin, D. N.; Yamskikh, T. N.; Tsarev, R. Yu
2018-05-01
This paper proposes an algorithm for estimation of mobile robot motion. The geometry of surrounding space is described with range scans (samples of distance measurements) taken by the mobile robot’s range sensors. A similar sample of space geometry in any arbitrary preceding moment of time or the environment map can be used as a reference. The suggested algorithm is invariant to isotropic scaling of samples or map that allows using samples measured in different units and maps made at different scales. The algorithm is based on Hough transform: it maps from measurement space to a straight-line parameters space. In the straight-line parameters, space the problems of estimating rotation, scaling and translation are solved separately breaking down a problem of estimating mobile robot localization into three smaller independent problems. The specific feature of the algorithm presented is its robustness to noise and outliers inherited from Hough transform. The prototype of the system of mobile robot orientation is described.
Rough sets and Laplacian score based cost-sensitive feature selection
Yu, Shenglong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of “good” features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms. PMID:29912884
Rough sets and Laplacian score based cost-sensitive feature selection.
Yu, Shenglong; Zhao, Hong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
NASA Astrophysics Data System (ADS)
Mogollón, José M.; Dale, Andrew W.; Jensen, Jørn B.; Schlüter, Michael; Regnier, Pierre
2013-08-01
Estimating the amount of methane in the seafloor globally as well as the flux of methane from sediments toward the ocean-atmosphere system are important considerations in both geological and climate sciences. Nevertheless, global estimates of methane inventories and rates of methane production and consumption through anaerobic oxidation in marine sediments are very poorly constrained. Tools for regionally assessing methane formation and consumption rates would greatly increase our understanding of the spatial heterogeneity of the methane cycle as well as help constrain the global methane budget. In this article, an algorithm for calculating methane consumption rates in the inner shelf is applied to the gas-rich sediments of the Belt Seas and The Sound (North Sea-Baltic Sea transition). It is based on the depth of free gas determined by hydroacoustic techniques and the local methane solubility concentration. Due to the continuous nature of shipboard hydroacoustic measurements, this algorithm captures spatial heterogeneities in methane fluxes better than geochemical analyses of point sources such as observational/sampling stations. The sensibility of the algorithm with respect to the resolution of the free gas depth measurements (2 m vs. 50 cm) is proven of minor importance (a discrepancy of <10%) for a small part of the study area. The algorithm-derived anaerobic methane oxidation rates compare well with previous measured and modeling studies. Finally, regional results reveal that contemporary anaerobic methane oxidation in worldwide inner-shelf sediments may be an order of magnitude lower (ca. 0.24 Tmol year-1) than previous estimates (4.6 Tmol year-1). These algorithms ultimately help improve regional estimates of anaerobic oxidation of methane rates.
Zhan, Xue-yan; Zhao, Na; Lin, Zhao-zhou; Wu, Zhi-sheng; Yuan, Rui-juan; Qiao, Yan-jiang
2014-12-01
The appropriate algorithm for calibration set selection was one of the key technologies for a good NIR quantitative model. There are different algorithms for calibration set selection, such as Random Sampling (RS) algorithm, Conventional Selection (CS) algorithm, Kennard-Stone(KS) algorithm and Sample set Portioning based on joint x-y distance (SPXY) algorithm, et al. However, there lack systematic comparisons between two algorithms of the above algorithms. The NIR quantitative models to determine the asiaticoside content in Centella total glucosides were established in the present paper, of which 7 indexes were classified and selected, and the effects of CS algorithm, KS algorithm and SPXY algorithm for calibration set selection on the accuracy and robustness of NIR quantitative models were investigated. The accuracy indexes of NIR quantitative models with calibration set selected by SPXY algorithm were significantly different from that with calibration set selected by CS algorithm or KS algorithm, while the robustness indexes, such as RMSECV and |RMSEP-RMSEC|, were not significantly different. Therefore, SPXY algorithm for calibration set selection could improve the predicative accuracy of NIR quantitative models to determine asiaticoside content in Centella total glucosides, and have no significant effect on the robustness of the models, which provides a reference to determine the appropriate algorithm for calibration set selection when NIR quantitative models are established for the solid system of traditional Chinese medcine.
Zhang, ZhiZhuo; Chang, Cheng Wei; Hugo, Willy; Cheung, Edwin; Sung, Wing-Kin
2013-03-01
Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveals potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
Frequency-domain beamformers using conjugate gradient techniques for speech enhancement.
Zhao, Shengkui; Jones, Douglas L; Khoo, Suiyang; Man, Zhihong
2014-09-01
A multiple-iteration constrained conjugate gradient (MICCG) algorithm and a single-iteration constrained conjugate gradient (SICCG) algorithm are proposed to realize the widely used frequency-domain minimum-variance-distortionless-response (MVDR) beamformers and the resulting algorithms are applied to speech enhancement. The algorithms are derived based on the Lagrange method and the conjugate gradient techniques. The implementations of the algorithms avoid any form of explicit or implicit autocorrelation matrix inversion. Theoretical analysis establishes formal convergence of the algorithms. Specifically, the MICCG algorithm is developed based on a block adaptation approach and it generates a finite sequence of estimates that converge to the MVDR solution. For limited data records, the estimates of the MICCG algorithm are better than the conventional estimators and equivalent to the auxiliary vector algorithms. The SICCG algorithm is developed based on a continuous adaptation approach with a sample-by-sample updating procedure and the estimates asymptotically converge to the MVDR solution. An illustrative example using synthetic data from a uniform linear array is studied and an evaluation on real data recorded by an acoustic vector sensor array is demonstrated. Performance of the MICCG algorithm and the SICCG algorithm are compared with the state-of-the-art approaches.
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets
Wernisch, Lorenz
2017-01-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.
Gabasova, Evelina; Reid, John; Wernisch, Lorenz
2017-10-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
Online selective kernel-based temporal difference learning.
Chen, Xingguo; Gao, Yang; Wang, Ruili
2013-12-01
In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernel-based value function. A new sparsification method (i.e., a kernel distance-based online sparsification method) is proposed based on selective ensemble learning, which is computationally less complex compared with other sparsification methods. With the proposed sparsification method, the sparsified dictionary of samples is constructed online by checking if a sample needs to be added to the sparsified dictionary. In addition, based on local validity, a selective kernel-based value function is proposed to select the best samples from the sample dictionary for the selective kernel-based value function approximator. The parameters of the selective kernel-based value function are iteratively updated by using the temporal difference (TD) learning algorithm combined with the gradient descent technique. The complexity of the online sparsification procedure in the OSKTD algorithm is O(n). In addition, two typical experiments (Maze and Mountain Car) are used to compare with both traditional and up-to-date O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of our proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and converges faster than both traditional and up-to-date algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time compared with other sparsification methods, gets a better local optima than the traditional algorithms, and converges much faster than the up-to-date algorithms. In addition, OSKTD can reach a competitive ultimate optima compared with the up-to-date algorithms.
An algorithm for extraction of periodic signals from sparse, irregularly sampled data
NASA Technical Reports Server (NTRS)
Wilcox, J. Z.
1994-01-01
Temporal gaps in discrete sampling sequences produce spurious Fourier components at the intermodulation frequencies of an oscillatory signal and the temporal gaps, thus significantly complicating spectral analysis of such sparsely sampled data. A new fast Fourier transform (FFT)-based algorithm has been developed, suitable for spectral analysis of sparsely sampled data with a relatively small number of oscillatory components buried in background noise. The algorithm's principal idea has its origin in the so-called 'clean' algorithm used to sharpen images of scenes corrupted by atmospheric and sensor aperture effects. It identifies as the signal's 'true' frequency that oscillatory component which, when passed through the same sampling sequence as the original data, produces a Fourier image that is the best match to the original Fourier space. The algorithm has generally met with succession trials with simulated data with a low signal-to-noise ratio, including those of a type similar to hourly residuals for Earth orientation parameters extracted from VLBI data. For eight oscillatory components in the diurnal and semidiurnal bands, all components with an amplitude-noise ratio greater than 0.2 were successfully extracted for all sequences and duty cycles (greater than 0.1) tested; the amplitude-noise ratios of the extracted signals were as low as 0.05 for high duty cycles and long sampling sequences. When, in addition to these high frequencies, strong low-frequency components are present in the data, the low-frequency components are generally eliminated first, by employing a version of the algorithm that searches for non-integer multiples of the discrete FET minimum frequency.
NASA Astrophysics Data System (ADS)
Wang, Xiaowei; Li, Huiping; Li, Zhichao
2018-04-01
The interfacial heat transfer coefficient (IHTC) is one of the most important thermal physical parameters which have significant effects on the calculation accuracy of physical fields in the numerical simulation. In this study, the artificial fish swarm algorithm (AFSA) was used to evaluate the IHTC between the heated sample and the quenchant in a one-dimensional heat conduction problem. AFSA is a global optimization method. In order to speed up the convergence speed, a hybrid method which is the combination of AFSA and normal distribution method (ZAFSA) was presented. The IHTC evaluated by ZAFSA were compared with those attained by AFSA and the advanced-retreat method and golden section method. The results show that the reasonable IHTC is obtained by using ZAFSA, the convergence of hybrid method is well. The algorithm based on ZAFSA can not only accelerate the convergence speed, but also reduce the numerical oscillation in the evaluation of IHTC.
Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric
2013-06-01
Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph--a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer's disease (AD) and reveal findings that could lead to advancements in AD research.
Infrared vehicle recognition using unsupervised feature learning based on K-feature
NASA Astrophysics Data System (ADS)
Lin, Jin; Tan, Yihua; Xia, Haijiao; Tian, Jinwen
2018-02-01
Subject to the complex battlefield environment, it is difficult to establish a complete knowledge base in practical application of vehicle recognition algorithms. The infrared vehicle recognition is always difficult and challenging, which plays an important role in remote sensing. In this paper we propose a new unsupervised feature learning method based on K-feature to recognize vehicle in infrared images. First, we use the target detection algorithm which is based on the saliency to detect the initial image. Then, the unsupervised feature learning based on K-feature, which is generated by Kmeans clustering algorithm that extracted features by learning a visual dictionary from a large number of samples without label, is calculated to suppress the false alarm and improve the accuracy. Finally, the vehicle target recognition image is finished by some post-processing. Large numbers of experiments demonstrate that the proposed method has satisfy recognition effectiveness and robustness for vehicle recognition in infrared images under complex backgrounds, and it also improve the reliability of it.
Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric
2014-01-01
Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph (DAG)—a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer’s disease (AD) and reveal findings that could lead to advancements in AD research. PMID:22665720
A review on quantum search algorithms
NASA Astrophysics Data System (ADS)
Giri, Pulak Ranjan; Korepin, Vladimir E.
2017-12-01
The use of superposition of states in quantum computation, known as quantum parallelism, has significant advantage in terms of speed over the classical computation. It is evident from the early invented quantum algorithms such as Deutsch's algorithm, Deutsch-Jozsa algorithm and its variation as Bernstein-Vazirani algorithm, Simon algorithm, Shor's algorithms, etc. Quantum parallelism also significantly speeds up the database search algorithm, which is important in computer science because it comes as a subroutine in many important algorithms. Quantum database search of Grover achieves the task of finding the target element in an unsorted database in a time quadratically faster than the classical computer. We review Grover's quantum search algorithms for a singe and multiple target elements in a database. The partial search algorithm of Grover and Radhakrishnan and its optimization by Korepin called GRK algorithm are also discussed.
Automatic crack detection method for loaded coal in vibration failure process
Li, Chengwu
2017-01-01
In the coal mining process, the destabilization of loaded coal mass is a prerequisite for coal and rock dynamic disaster, and surface cracks of the coal and rock mass are important indicators, reflecting the current state of the coal body. The detection of surface cracks in the coal body plays an important role in coal mine safety monitoring. In this paper, a method for detecting the surface cracks of loaded coal by a vibration failure process is proposed based on the characteristics of the surface cracks of coal and support vector machine (SVM). A large number of cracked images are obtained by establishing a vibration-induced failure test system and industrial camera. Histogram equalization and a hysteresis threshold algorithm were used to reduce the noise and emphasize the crack; then, 600 images and regions, including cracks and non-cracks, were manually labelled. In the crack feature extraction stage, eight features of the cracks are extracted to distinguish cracks from other objects. Finally, a crack identification model with an accuracy over 95% was trained by inputting the labelled sample images into the SVM classifier. The experimental results show that the proposed algorithm has a higher accuracy than the conventional algorithm and can effectively identify cracks on the surface of the coal and rock mass automatically. PMID:28973032
Automatic crack detection method for loaded coal in vibration failure process.
Li, Chengwu; Ai, Dihao
2017-01-01
In the coal mining process, the destabilization of loaded coal mass is a prerequisite for coal and rock dynamic disaster, and surface cracks of the coal and rock mass are important indicators, reflecting the current state of the coal body. The detection of surface cracks in the coal body plays an important role in coal mine safety monitoring. In this paper, a method for detecting the surface cracks of loaded coal by a vibration failure process is proposed based on the characteristics of the surface cracks of coal and support vector machine (SVM). A large number of cracked images are obtained by establishing a vibration-induced failure test system and industrial camera. Histogram equalization and a hysteresis threshold algorithm were used to reduce the noise and emphasize the crack; then, 600 images and regions, including cracks and non-cracks, were manually labelled. In the crack feature extraction stage, eight features of the cracks are extracted to distinguish cracks from other objects. Finally, a crack identification model with an accuracy over 95% was trained by inputting the labelled sample images into the SVM classifier. The experimental results show that the proposed algorithm has a higher accuracy than the conventional algorithm and can effectively identify cracks on the surface of the coal and rock mass automatically.
NASA Astrophysics Data System (ADS)
Wang, Z.
2015-12-01
For decades, distributed and lumped hydrological models have furthered our understanding of hydrological system. The development of hydrological simulation in large scale and high precision elaborated the spatial descriptions and hydrological behaviors. Meanwhile, the new trend is also followed by the increment of model complexity and number of parameters, which brings new challenges of uncertainty quantification. Generalized Likelihood Uncertainty Estimation (GLUE) has been widely used in uncertainty analysis for hydrological models referring to Monte Carlo method coupled with Bayesian estimation. However, the stochastic sampling method of prior parameters adopted by GLUE appears inefficient, especially in high dimensional parameter space. The heuristic optimization algorithms utilizing iterative evolution show better convergence speed and optimality-searching performance. In light of the features of heuristic optimization algorithms, this study adopted genetic algorithm, differential evolution, shuffled complex evolving algorithm to search the parameter space and obtain the parameter sets of large likelihoods. Based on the multi-algorithm sampling, hydrological model uncertainty analysis is conducted by the typical GLUE framework. To demonstrate the superiority of the new method, two hydrological models of different complexity are examined. The results shows the adaptive method tends to be efficient in sampling and effective in uncertainty analysis, providing an alternative path for uncertainty quantilization.
Williams, Robert C; Elston, Robert C; Kumar, Pankaj; Knowler, William C; Abboud, Hanna E; Adler, Sharon; Bowden, Donald W; Divers, Jasmin; Freedman, Barry I; Igo, Robert P; Ipp, Eli; Iyengar, Sudha K; Kimmel, Paul L; Klag, Michael J; Kohn, Orly; Langefeld, Carl D; Leehey, David J; Nelson, Robert G; Nicholas, Susanne B; Pahl, Madeleine V; Parekh, Rulan S; Rotter, Jerome I; Schelling, Jeffrey R; Sedor, John R; Shah, Vallabh O; Smith, Michael W; Taylor, Kent D; Thameem, Farook; Thornley-Brown, Denyse; Winkler, Cheryl A; Guo, Xiuqing; Zager, Phillip; Hanson, Robert L
2016-05-04
The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), European Americans, Mexican Americans, African Americans, and American Indians are part of a genome- wide association study in which population structure might be particularly important. We therefore decided to study in detail one component of this, individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one the three contrasts among ancestral populations: Europeans (HAPMAP, CEU), Africans (HAPMAP, YRI and LWK), and Native Americans (full heritage Pima Indians). We estimate IGA and present an algorithm for their standard errors, compare IGA to principal components, emphasize the importance of balancing information in the ancestry informative markers (AIMs), and test the association of IGA with diabetic nephropathy in the combined sample. A fixed parental allele maximum likelihood algorithm was applied to the FIND to estimate IGA in four samples: 869 American Indians; 1385 African Americans; 1451 Mexican Americans; and 826 European Americans. When the information in the AIMs is unbalanced, the estimates are incorrect with large error. Individual genetic admixture is highly correlated with principle components for capturing population structure. It takes ~700 SNPs to reduce the average standard error of individual admixture below 0.01. When the samples are combined, the resulting population structure creates associations between IGA and diabetic nephropathy. The identified set of AIMs, which include American Indian parental allele frequencies, may be particularly useful for estimating genetic admixture in populations from the Americas. Failure to balance information in maximum likelihood, poly-ancestry models creates biased estimates of individual admixture with large error. This also occurs when estimating IGA using the Bayesian clustering method as implemented in the program STRUCTURE. Odds ratios for the associations of IGA with disease are consistent with what is known about the incidence and prevalence of diabetic nephropathy in these populations.
NASA Astrophysics Data System (ADS)
Ayyad, Yassid; Mittig, Wolfgang; Bazin, Daniel; Beceiro-Novo, Saul; Cortesi, Marco
2018-02-01
The three-dimensional reconstruction of particle tracks in a time projection chamber is a challenging task that requires advanced classification and fitting algorithms. In this work, we have developed and implemented a novel algorithm based on the Random Sample Consensus Model (RANSAC). The RANSAC is used to classify tracks including pile-up, to remove uncorrelated noise hits, as well as to reconstruct the vertex of the reaction. The algorithm, developed within the Active Target Time Projection Chamber (AT-TPC) framework, was tested and validated by analyzing the 4He+4He reaction. Results, performance and quality of the proposed algorithm are presented and discussed in detail.
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most of the existing research works adopt single-clustering algorithms to perform tumor clustering is from biomolecular data that lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named as HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results. The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
New method for detection of gastric cancer by hyperspectral imaging: a pilot study
NASA Astrophysics Data System (ADS)
Kiyotoki, Shu; Nishikawa, Jun; Okamoto, Takeshi; Hamabe, Kouichi; Saito, Mari; Goto, Atsushi; Fujita, Yusuke; Hamamoto, Yoshihiko; Takeuchi, Yusuke; Satori, Shin; Sakaida, Isao
2013-02-01
We developed a new, easy, and objective method to detect gastric cancer using hyperspectral imaging (HSI) technology combining spectroscopy and imaging A total of 16 gastroduodenal tumors removed by endoscopic resection or surgery from 14 patients at Yamaguchi University Hospital, Japan, were recorded using a hyperspectral camera (HSC) equipped with HSI technology Corrected spectral reflectance was obtained from 10 samples of normal mucosa and 10 samples of tumors for each case The 16 cases were divided into eight training cases (160 training samples) and eight test cases (160 test samples) We established a diagnostic algorithm with training samples and evaluated it with test samples Diagnostic capability of the algorithm for each tumor was validated, and enhancement of tumors by image processing using the HSC was evaluated The diagnostic algorithm used the 726-nm wavelength, with a cutoff point established from training samples The sensitivity, specificity, and accuracy rates of the algorithm's diagnostic capability in the test samples were 78.8% (63/80), 92.5% (74/80), and 85.6% (137/160), respectively Tumors in HSC images of 13 (81.3%) cases were well enhanced by image processing Differences in spectral reflectance between tumors and normal mucosa suggested that tumors can be clearly distinguished from background mucosa with HSI technology.
Inglis, Stephen; Melko, Roger G
2013-01-01
We implement a Wang-Landau sampling technique in quantum Monte Carlo (QMC) simulations for the purpose of calculating the Rényi entanglement entropies and associated mutual information. The algorithm converges an estimate for an analog to the density of states for stochastic series expansion QMC, allowing a direct calculation of Rényi entropies without explicit thermodynamic integration. We benchmark results for the mutual information on two-dimensional (2D) isotropic and anisotropic Heisenberg models, a 2D transverse field Ising model, and a three-dimensional Heisenberg model, confirming a critical scaling of the mutual information in cases with a finite-temperature transition. We discuss the benefits and limitations of broad sampling techniques compared to standard importance sampling methods.
Enhanced sampling of glutamate receptor ligand-binding domains.
Lau, Albert Y
2018-04-14
The majority of excitatory synaptic transmission in the central nervous system is mediated by ionotropic glutamate receptors (iGluRs). These membrane-bound protein assemblies consist of modular domains that can be genetically isolated and expressed, which has resulted in a plethora of crystal structures of individual domains in different conformations bound to different ligands. These structures have presented opportunities for molecular dynamics (MD) simulation studies. To examine the free energies that govern molecular behavior, simulation strategies and algorithms have been developed, collectively called enhanced sampling methods This review focuses on the use of enhanced sampling MD simulations of isolated iGluR ligand-binding domains to characterize thermodynamic properties important to receptor function. Copyright © 2018 Elsevier B.V. All rights reserved.
Edge-oriented dual-dictionary guided enrichment (EDGE) for MRI-CT image reconstruction.
Li, Liang; Wang, Bigong; Wang, Ge
2016-01-01
In this paper, we formulate the joint/simultaneous X-ray CT and MRI image reconstruction. In particular, a novel algorithm is proposed for MRI image reconstruction from highly under-sampled MRI data and CT images. It consists of two steps. First, a training dataset is generated from a series of well-registered MRI and CT images on the same patients. Then, an initial MRI image of a patient can be reconstructed via edge-oriented dual-dictionary guided enrichment (EDGE) based on the training dataset and a CT image of the patient. Second, an MRI image is reconstructed using the dictionary learning (DL) algorithm from highly under-sampled k-space data and the initial MRI image. Our algorithm can establish a one-to-one correspondence between the two imaging modalities, and obtain a good initial MRI estimation. Both noise-free and noisy simulation studies were performed to evaluate and validate the proposed algorithm. The results with different under-sampling factors show that the proposed algorithm performed significantly better than those reconstructed using the DL algorithm from MRI data alone.
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
Wei, Kun; Ren, Bingyin
2018-02-13
In a future intelligent factory, a robotic manipulator must work efficiently and safely in a Human-Robot collaborative and dynamic unstructured environment. Autonomous path planning is the most important issue which must be resolved first in the process of improving robotic manipulator intelligence. Among the path-planning methods, the Rapidly Exploring Random Tree (RRT) algorithm based on random sampling has been widely applied in dynamic path planning for a high-dimensional robotic manipulator, especially in a complex environment because of its probability completeness, perfect expansion, and fast exploring speed over other planning methods. However, the existing RRT algorithm has a limitation in path planning for a robotic manipulator in a dynamic unstructured environment. Therefore, an autonomous obstacle avoidance dynamic path-planning method for a robotic manipulator based on an improved RRT algorithm, called Smoothly RRT (S-RRT), is proposed. This method that targets a directional node extends and can increase the sampling speed and efficiency of RRT dramatically. A path optimization strategy based on the maximum curvature constraint is presented to generate a smooth and curved continuous executable path for a robotic manipulator. Finally, the correctness, effectiveness, and practicability of the proposed method are demonstrated and validated via a MATLAB static simulation and a Robot Operating System (ROS) dynamic simulation environment as well as a real autonomous obstacle avoidance experiment in a dynamic unstructured environment for a robotic manipulator. The proposed method not only provides great practical engineering significance for a robotic manipulator's obstacle avoidance in an intelligent factory, but also theoretical reference value for other type of robots' path planning.
A linear programming model for protein inference problem in shotgun proteomics.
Huang, Ting; He, Zengyou
2012-11-15
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named as ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. zyhe@dlut.edu.cn. Supplementary data are available at Bioinformatics Online.
Rinderknecht, Mike D; Ranzani, Raffaele; Popp, Werner L; Lambercy, Olivier; Gassert, Roger
2018-05-10
Psychophysical procedures are applied in various fields to assess sensory thresholds. During experiments, sampled psychometric functions are usually assumed to be stationary. However, perception can be altered, for example by loss of attention to the presentation of stimuli, leading to biased data, which results in poor threshold estimates. The few existing approaches attempting to identify non-stationarities either detect only whether there was a change in perception, or are not suitable for experiments with a relatively small number of trials (e.g., [Formula: see text] 300). We present a method to detect inattention periods on a trial-by-trial basis with the aim of improving threshold estimates in psychophysical experiments using the adaptive sampling procedure Parameter Estimation by Sequential Testing (PEST). The performance of the algorithm was evaluated in computer simulations modeling inattention, and tested in a behavioral experiment on proprioceptive difference threshold assessment in 20 stroke patients, a population where attention deficits are likely to be present. Simulations showed that estimation errors could be reduced by up to 77% for inattentive subjects, even in sequences with less than 100 trials. In the behavioral data, inattention was detected in 14% of assessments, and applying the proposed algorithm resulted in reduced test-retest variability in 73% of these corrected assessments pairs. The novel algorithm complements existing approaches and, besides being applicable post hoc, could also be used online to prevent collection of biased data. This could have important implications in assessment practice by shortening experiments and improving estimates, especially for clinical settings.
Importance sampling with imperfect cloning for the computation of generalized Lyapunov exponents
NASA Astrophysics Data System (ADS)
Anteneodo, Celia; Camargo, Sabrina; Vallejos, Raúl O.
2017-12-01
We revisit the numerical calculation of generalized Lyapunov exponents, L (q ) , in deterministic dynamical systems. The standard method consists of adding noise to the dynamics in order to use importance sampling algorithms. Then L (q ) is obtained by taking the limit noise-amplitude → 0 after the calculation. We focus on a particular method that involves periodic cloning and pruning of a set of trajectories. However, instead of considering a noisy dynamics, we implement an imperfect (noisy) cloning. This alternative method is compared with the standard one and, when possible, with analytical results. As a workbench we use the asymmetric tent map, the standard map, and a system of coupled symplectic maps. The general conclusion of this study is that the imperfect-cloning method performs as well as the standard one, with the advantage of preserving the deterministic dynamics.
Schoenberg, Mike R; Lange, Rael T; Saklofske, Donald H; Suarez, Mariann; Brickell, Tracey A
2008-12-01
Determination of neuropsychological impairment involves contrasting obtained performances with a comparison standard, which is often an estimate of premorbid IQ. M. R. Schoenberg, R. T. Lange, T. A. Brickell, and D. H. Saklofske (2007) proposed the Child Premorbid Intelligence Estimate (CPIE) to predict premorbid Full Scale IQ (FSIQ) using the Wechsler Intelligence Scale for Children-4th Edition (WISC-IV; Wechsler, 2003). The CPIE includes 12 algorithms to predict FSIQ, 1 using demographic variables and 11 algorithms combining WISC-IV subtest raw scores with demographic variables. The CPIE was applied to a sample of children with acquired traumatic brain injury (TBI sample; n = 40) and a healthy demographically matched sample (n = 40). Paired-samples t tests found estimated premorbid FSIQ differed from obtained FSIQ when applied to the TBI sample (ps
Gehrig, Nicolas; Dragotti, Pier Luigi
2009-03-01
In this paper, we study the sampling and the distributed compression of the data acquired by a camera sensor network. The effective design of these sampling and compression schemes requires, however, the understanding of the structure of the acquired data. To this end, we show that the a priori knowledge of the configuration of the camera sensor network can lead to an effective estimation of such structure and to the design of effective distributed compression algorithms. For idealized scenarios, we derive the fundamental performance bounds of a camera sensor network and clarify the connection between sampling and distributed compression. We then present a distributed compression algorithm that takes advantage of the structure of the data and that outperforms independent compression algorithms on real multiview images.
An improved initialization center k-means clustering algorithm based on distance and density
NASA Astrophysics Data System (ADS)
Duan, Yanling; Liu, Qun; Xia, Shuyin
2018-04-01
Aiming at the problem of the random initial clustering center of k means algorithm that the clustering results are influenced by outlier data sample and are unstable in multiple clustering, a method of central point initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average of distance is used to represent the sample density, and the data sample with the larger distance and the higher density are selected as the initial clustering centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility of the algorithm and the practicality, the experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
Doble, Brett; Lorgelly, Paula
2016-04-01
To determine the external validity of existing mapping algorithms for predicting EQ-5D-3L utility values from EORTC QLQ-C30 responses and to establish their generalizability in different types of cancer. A main analysis (pooled) sample of 3560 observations (1727 patients) and two disease severity patient samples (496 and 93 patients) with repeated observations over time from Cancer 2015 were used to validate the existing algorithms. Errors were calculated between observed and predicted EQ-5D-3L utility values using a single pooled sample and ten pooled tumour type-specific samples. Predictive accuracy was assessed using mean absolute error (MAE) and standardized root-mean-squared error (RMSE). The association between observed and predicted EQ-5D utility values and other covariates across the distribution was tested using quantile regression. Quality-adjusted life years (QALYs) were calculated using observed and predicted values to test responsiveness. Ten 'preferred' mapping algorithms were identified. Two algorithms estimated via response mapping and ordinary least-squares regression using dummy variables performed well on number of validation criteria, including accurate prediction of the best and worst QLQ-C30 health states, predicted values within the EQ-5D tariff range, relatively small MAEs and RMSEs, and minimal differences between estimated QALYs. Comparison of predictive accuracy across ten tumour type-specific samples highlighted that algorithms are relatively insensitive to grouping by tumour type and affected more by differences in disease severity. Two of the 'preferred' mapping algorithms suggest more accurate predictions, but limitations exist. We recommend extensive scenario analyses if mapped utilities are used in cost-utility analyses.
The fatigue life prediction of aluminium alloy using genetic algorithm and neural network
NASA Astrophysics Data System (ADS)
Susmikanti, Mike
2013-09-01
The behavior of the fatigue life of the industrial materials is very important. In many cases, the material with experiencing fatigue life cannot be avoided, however, there are many ways to control their behavior. Many investigations of the fatigue life phenomena of alloys have been done, but it is high cost and times consuming computation. This paper report the modeling and simulation approaches to predict the fatigue life behavior of Aluminum Alloys and resolves some problems of computation. First, the simulation using genetic algorithm was utilized to optimize the load to obtain the stress values. These results can be used to provide N-cycle fatigue life of the material. Furthermore, the experimental data was applied as input data in the neural network learning, while the samples data were applied for testing of the training data. Finally, the multilayer perceptron algorithm is applied to predict whether the given data sets in accordance with the fatigue life of the alloy. To achieve rapid convergence, the Levenberg-Marquardt algorithm was also employed. The simulations results shows that the fatigue behaviors of aluminum under pressure can be predicted. In addition, implementation of neural networks successfully identified a model for material fatigue life.
NASA Astrophysics Data System (ADS)
Ramírez-López, A.; Romero-Romo, M. A.; Muñoz-Negron, D.; López-Ramírez, S.; Escarela-Pérez, R.; Duran-Valencia, C.
2012-10-01
Computational models are developed to create grain structures using mathematical algorithms based on the chaos theory such as cellular automaton, geometrical models, fractals, and stochastic methods. Because of the chaotic nature of grain structures, some of the most popular routines are based on the Monte Carlo method, statistical distributions, and random walk methods, which can be easily programmed and included in nested loops. Nevertheless, grain structures are not well defined as the results of computational errors and numerical inconsistencies on mathematical methods. Due to the finite definition of numbers or the numerical restrictions during the simulation of solidification, damaged images appear on the screen. These images must be repaired to obtain a good measurement of grain geometrical properties. Some mathematical algorithms were developed to repair, measure, and characterize grain structures obtained from cellular automata in the present work. An appropriate measurement of grain size and the corrected identification of interfaces and length are very important topics in materials science because they are the representation and validation of mathematical models with real samples. As a result, the developed algorithms are tested and proved to be appropriate and efficient to eliminate the errors and characterize the grain structures.
The pearls of using real-world evidence to discover social groups
NASA Astrophysics Data System (ADS)
Cardillo, Raymond A.; Salerno, John J.
2005-03-01
In previous work, we introduced a new paradigm called Uni-Party Data Community Generation (UDCG) and a new methodology to discover social groups (a.k.a., community models) called Link Discovery based on Correlation Analysis (LDCA). We further advanced this work by experimenting with a corpus of evidence obtained from a Ponzi scheme investigation. That work identified several UDCG algorithms, developed what we called "Importance Measures" to compare the accuracy of the algorithms based on ground truth, and presented a Concept of Operations (CONOPS) that criminal investigators could use to discover social groups. However, that work used a rather small random sample of manually edited documents because the evidence contained far too many OCR and other extraction errors. Deferring the evidence extraction errors allowed us to continue experimenting with UDCG algorithms, but only used a small fraction of the available evidence. In attempt to discover techniques that are more practical in the near-term, our most recent work focuses on being able to use an entire corpus of real-world evidence to discover social groups. This paper discusses the complications of extracting evidence, suggests a method of performing name resolution, presents a new UDCG algorithm, and discusses our future direction in this area.
Automatic microseismic event picking via unsupervised machine learning
NASA Astrophysics Data System (ADS)
Chen, Yangkang
2018-01-01
Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from the sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, removing sufficient amount of noise, and second analysed by arrival pickers. To conquer the noise issue in arrival picking for weak microseismic or earthquake event, I leverage the machine learning techniques to help recognizing seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithm on large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for such purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
Kernel-based discriminant feature extraction using a representative dataset
NASA Astrophysics Data System (ADS)
Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.
2002-07-01
Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there has been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified by both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.
Maharlou, Hamidreza; Niakan Kalhori, Sharareh R; Shahbazi, Shahrbanoo; Ravangard, Ramin
2018-04-01
Accurate prediction of patients' length of stay is highly important. This study compared the performance of artificial neural network and adaptive neuro-fuzzy system algorithms to predict patients' length of stay in intensive care units (ICU) after cardiac surgery. A cross-sectional, analytical, and applied study was conducted. The required data were collected from 311 cardiac patients admitted to intensive care units after surgery at three hospitals of Shiraz, Iran, through a non-random convenience sampling method during the second quarter of 2016. Following the initial processing of influential factors, models were created and evaluated. The results showed that the adaptive neuro-fuzzy algorithm (with mean squared error [MSE] = 7 and R = 0.88) resulted in the creation of a more precise model than the artificial neural network (with MSE = 21 and R = 0.60). The adaptive neuro-fuzzy algorithm produces a more accurate model as it applies both the capabilities of a neural network architecture and experts' knowledge as a hybrid algorithm. It identifies nonlinear components, yielding remarkable results for prediction the length of stay, which is a useful calculation output to support ICU management, enabling higher quality of administration and cost reduction.
Mouse EEG spike detection based on the adapted continuous wavelet transform
NASA Astrophysics Data System (ADS)
Tieng, Quang M.; Kharatishvili, Irina; Chen, Min; Reutens, David C.
2016-04-01
Objective. Electroencephalography (EEG) is an important tool in the diagnosis of epilepsy. Interictal spikes on EEG are used to monitor the development of epilepsy and the effects of drug therapy. EEG recordings are generally long and the data voluminous. Thus developing a sensitive and reliable automated algorithm for analyzing EEG data is necessary. Approach. A new algorithm for detecting and classifying interictal spikes in mouse EEG recordings is proposed, based on the adapted continuous wavelet transform (CWT). The construction of the adapted mother wavelet is founded on a template obtained from a sample comprising the first few minutes of an EEG data set. Main Result. The algorithm was tested with EEG data from a mouse model of epilepsy and experimental results showed that the algorithm could distinguish EEG spikes from other transient waveforms with a high degree of sensitivity and specificity. Significance. Differing from existing approaches, the proposed approach combines wavelet denoising, to isolate transient signals, with adapted CWT-based template matching, to detect true interictal spikes. Using the adapted wavelet constructed from a predefined template, the adapted CWT is calculated on small EEG segments to fit dynamical changes in the EEG recording.
Near-Field Diffraction Imaging from Multiple Detection Planes
NASA Astrophysics Data System (ADS)
Loetgering, L.; Golembusch, M.; Hammoud, R.; Wilhein, T.
2017-06-01
We present diffraction imaging results obtained from multiple near-field diffraction constraints. An iterative phase retrieval algorithm was implemented that uses data redundancy achieved by measuring near-field diffraction intensities at various sample-detector distances. The procedure allows for reconstructing the exit surface wave of a sample within a multiple constraint satisfaction framework neither making use of a priori knowledge as enforced in coherent diffraction imaging (CDI) nor exact scanning grid knowledge as required in ptychography. We also investigate the potential of the presented technique to deal with polychromatic radiation as important for potential application in diffraction imaging by means of tabletop EUV and X-ray sources.
Productive Information Foraging
NASA Technical Reports Server (NTRS)
Furlong, P. Michael; Dille, Michael
2016-01-01
This paper presents a new algorithm for autonomous on-line exploration in unknown environments. The objective of the algorithm is to free robot scientists from extensive preliminary site investigation while still being able to collect meaningful data. We simulate a common form of exploration task for an autonomous robot involving sampling the environment at various locations and compare performance with a simpler existing algorithm that is also denied global information. The result of the experiment shows that the new algorithm has a statistically significant improvement in performance with a significant effect size for a range of costs for taking sampling actions.
EM in high-dimensional spaces.
Draper, Bruce A; Elliott, Daniel L; Hayes, Jeremy; Baek, Kyungim
2005-06-01
This paper considers fitting a mixture of Gaussians model to high-dimensional data in scenarios where there are fewer data samples than feature dimensions. Issues that arise when using principal component analysis (PCA) to represent Gaussian distributions inside Expectation-Maximization (EM) are addressed, and a practical algorithm results. Unlike other algorithms that have been proposed, this algorithm does not try to compress the data to fit low-dimensional models. Instead, it models Gaussian distributions in the (N - 1)-dimensional space spanned by the N data samples. We are able to show that this algorithm converges on data sets where low-dimensional techniques do not.
Wearable EEG via lossless compression.
Dufort, Guillermo; Favaro, Federico; Lecumberry, Federico; Martin, Alvaro; Oliver, Juan P; Oreggioni, Julian; Ramirez, Ignacio; Seroussi, Gadiel; Steinfeld, Leonardo
2016-08-01
This work presents a wearable multi-channel EEG recording system featuring a lossless compression algorithm. The algorithm, based in a previously reported algorithm by the authors, exploits the existing temporal correlation between samples at different sampling times, and the spatial correlation between different electrodes across the scalp. The low-power platform is able to compress, by a factor between 2.3 and 3.6, up to 300sps from 64 channels with a power consumption of 176μW/ch. The performance of the algorithm compares favorably with the best compression rates reported up to date in the literature.
SMI adaptive antenna arrays for weak interfering signals. [Sample Matrix Inversion
NASA Technical Reports Server (NTRS)
Gupta, Inder J.
1986-01-01
The performance of adaptive antenna arrays in the presence of weak interfering signals (below thermal noise) is studied. It is shown that a conventional adaptive antenna array sample matrix inversion (SMI) algorithm is unable to suppress such interfering signals. To overcome this problem, the SMI algorithm is modified. In the modified algorithm, the covariance matrix is redefined such that the effect of thermal noise on the weights of adaptive arrays is reduced. Thus, the weights are dictated by relatively weak signals. It is shown that the modified algorithm provides the desired interference protection.
ERIC Educational Resources Information Center
de Bildt, Annelies; Sytema, Sjoerd; Zander, Eric; Bölte, Sven; Sturm, Harald; Yirmiya, Nurit; Yaari, Maya; Charman, Tony; Salomone, Erica; LeCouteur, Ann; Green, Jonathan; Bedia, Ricardo Canal; Primo, Patricia García; van Daalen, Emma; de Jonge, Maretha V.; Guðmundsdóttir, Emilía; Jóhannsdóttir, Sigurrós; Raleva, Marija; Boskovska, Meri; Rogé, Bernadette; Baduel, Sophie; Moilanen, Irma; Yliherva, Anneli; Buitelaar, Jan; Oosterling, Iris J.
2015-01-01
The current study aimed to investigate the Autism Diagnostic Interview-Revised (ADI-R) algorithms for toddlers and young preschoolers (Kim and Lord, "J Autism Dev Disord" 42(1):82-93, 2012) in a non-US sample from ten sites in nine countries (n = 1,104). The construct validity indicated a good fit of the algorithms. The diagnostic…
NASA Technical Reports Server (NTRS)
Hoffman, Matthew J.; Eluszkiewicz, Janusz; Weisenstein, Deborah; Uymin, Gennady; Moncet, Jean-Luc
2012-01-01
Motivated by the needs of Mars data assimilation. particularly quantification of measurement errors and generation of averaging kernels. we have evaluated atmospheric temperature retrievals from Mars Global Surveyor (MGS) Thermal Emission Spectrometer (TES) radiances. Multiple sets of retrievals have been considered in this study; (1) retrievals available from the Planetary Data System (PDS), (2) retrievals based on variants of the retrieval algorithm used to generate the PDS retrievals, and (3) retrievals produced using the Mars 1-Dimensional Retrieval (M1R) algorithm based on the Optimal Spectral Sampling (OSS ) forward model. The retrieved temperature profiles are compared to the MGS Radio Science (RS) temperature profiles. For the samples tested, the M1R temperature profiles can be made to agree within 2 K with the RS temperature profiles, but only after tuning the prior and error statistics. Use of a global prior that does not take into account the seasonal dependence leads errors of up 6 K. In polar samples. errors relative to the RS temperature profiles are even larger. In these samples, the PDS temperature profiles also exhibit a poor fit with RS temperatures. This fit is worse than reported in previous studies, indicating that the lack of fit is due to a bias correction to TES radiances implemented after 2004. To explain the differences between the PDS and Ml R temperatures, the algorithms are compared directly, with the OSS forward model inserted into the PDS algorithm. Factors such as the filtering parameter, the use of linear versus nonlinear constrained inversion, and the choice of the forward model, are found to contribute heavily to the differences in the temperature profiles retrieved in the polar regions, resulting in uncertainties of up to 6 K. Even outside the poles, changes in the a priori statistics result in different profile shapes which all fit the radiances within the specified error. The importance of the a priori statistics prevents reliable global retrievals based a single a priori and strongly implies that a robust science analysis must instead rely on retrievals employing localized a priori information, for example from an ensemble based data assimilation system such as the Local Ensemble Transform Kalman Filter (LETKF).
NASA Astrophysics Data System (ADS)
Ma, Tianren; Xia, Zhengyou
2017-05-01
Currently, with the rapid development of information technology, the electronic media for social communication is becoming more and more popular. Discovery of communities is a very effective way to understand the properties of complex networks. However, traditional community detection algorithms consider the structural characteristics of a social organization only, with more information about nodes and edges wasted. In the meanwhile, these algorithms do not consider each node on its merits. Label propagation algorithm (LPA) is a near linear time algorithm which aims to find the community in the network. It attracts many scholars owing to its high efficiency. In recent years, there are more improved algorithms that were put forward based on LPA. In this paper, an improved LPA based on random walk and node importance (NILPA) is proposed. Firstly, a list of node importance is obtained through calculation. The nodes in the network are sorted in descending order of importance. On the basis of random walk, a matrix is constructed to measure the similarity of nodes and it avoids the random choice in the LPA. Secondly, a new metric IAS (importance and similarity) is calculated by node importance and similarity matrix, which we can use to avoid the random selection in the original LPA and improve the algorithm stability. Finally, a test in real-world and synthetic networks is given. The result shows that this algorithm has better performance than existing methods in finding community structure.
Reference Gene Validation for RT-qPCR, a Note on Different Available Software Packages
De Spiegelaere, Ward; Dern-Wieloch, Jutta; Weigel, Roswitha; Schumacher, Valérie; Schorle, Hubert; Nettersheim, Daniel; Bergmann, Martin; Brehm, Ralph; Kliesch, Sabine; Vandekerckhove, Linos; Fink, Cornelia
2015-01-01
Background An appropriate normalization strategy is crucial for data analysis from real time reverse transcription polymerase chain reactions (RT-qPCR). It is widely supported to identify and validate stable reference genes, since no single biological gene is stably expressed between cell types or within cells under different conditions. Different algorithms exist to validate optimal reference genes for normalization. Applying human cells, we here compare the three main methods to the online available RefFinder tool that integrates these algorithms along with R-based software packages which include the NormFinder and GeNorm algorithms. Results 14 candidate reference genes were assessed by RT-qPCR in two sample sets, i.e. a set of samples of human testicular tissue containing carcinoma in situ (CIS), and a set of samples from the human adult Sertoli cell line (FS1) either cultured alone or in co-culture with the seminoma like cell line (TCam-2) or with equine bone marrow derived mesenchymal stem cells (eBM-MSC). Expression stabilities of the reference genes were evaluated using geNorm, NormFinder, and BestKeeper. Similar results were obtained by the three approaches for the most and least stably expressed genes. The R-based packages NormqPCR, SLqPCR and the NormFinder for R script gave identical gene rankings. Interestingly, different outputs were obtained between the original software packages and the RefFinder tool, which is based on raw Cq values for input. When the raw data were reanalysed assuming 100% efficiency for all genes, then the outputs of the original software packages were similar to the RefFinder software, indicating that RefFinder outputs may be biased because PCR efficiencies are not taken into account. Conclusions This report shows that assay efficiency is an important parameter for reference gene validation. New software tools that incorporate these algorithms should be carefully validated prior to use. PMID:25825906
Reference gene validation for RT-qPCR, a note on different available software packages.
De Spiegelaere, Ward; Dern-Wieloch, Jutta; Weigel, Roswitha; Schumacher, Valérie; Schorle, Hubert; Nettersheim, Daniel; Bergmann, Martin; Brehm, Ralph; Kliesch, Sabine; Vandekerckhove, Linos; Fink, Cornelia
2015-01-01
An appropriate normalization strategy is crucial for data analysis from real time reverse transcription polymerase chain reactions (RT-qPCR). It is widely supported to identify and validate stable reference genes, since no single biological gene is stably expressed between cell types or within cells under different conditions. Different algorithms exist to validate optimal reference genes for normalization. Applying human cells, we here compare the three main methods to the online available RefFinder tool that integrates these algorithms along with R-based software packages which include the NormFinder and GeNorm algorithms. 14 candidate reference genes were assessed by RT-qPCR in two sample sets, i.e. a set of samples of human testicular tissue containing carcinoma in situ (CIS), and a set of samples from the human adult Sertoli cell line (FS1) either cultured alone or in co-culture with the seminoma like cell line (TCam-2) or with equine bone marrow derived mesenchymal stem cells (eBM-MSC). Expression stabilities of the reference genes were evaluated using geNorm, NormFinder, and BestKeeper. Similar results were obtained by the three approaches for the most and least stably expressed genes. The R-based packages NormqPCR, SLqPCR and the NormFinder for R script gave identical gene rankings. Interestingly, different outputs were obtained between the original software packages and the RefFinder tool, which is based on raw Cq values for input. When the raw data were reanalysed assuming 100% efficiency for all genes, then the outputs of the original software packages were similar to the RefFinder software, indicating that RefFinder outputs may be biased because PCR efficiencies are not taken into account. This report shows that assay efficiency is an important parameter for reference gene validation. New software tools that incorporate these algorithms should be carefully validated prior to use.
Using machine learning to accelerate sampling-based inversion
NASA Astrophysics Data System (ADS)
Valentine, A. P.; Sambridge, M.
2017-12-01
In most cases, a complete solution to a geophysical inverse problem (including robust understanding of the uncertainties associated with the result) requires a sampling-based approach. However, the computational burden is high, and proves intractable for many problems of interest. There is therefore considerable value in developing techniques that can accelerate sampling procedures.The main computational cost lies in evaluation of the forward operator (e.g. calculation of synthetic seismograms) for each candidate model. Modern machine learning techniques-such as Gaussian Processes-offer a route for constructing a computationally-cheap approximation to this calculation, which can replace the accurate solution during sampling. Importantly, the accuracy of the approximation can be refined as inversion proceeds, to ensure high-quality results.In this presentation, we describe and demonstrate this approach-which can be seen as an extension of popular current methods, such as the Neighbourhood Algorithm, and bridges the gap between prior- and posterior-sampling frameworks.
Fast converging minimum probability of error neural network receivers for DS-CDMA communications.
Matyjas, John D; Psaromiligkos, Ioannis N; Batalama, Stella N; Medley, Michael J
2004-03-01
We consider a multilayer perceptron neural network (NN) receiver architecture for the recovery of the information bits of a direct-sequence code-division-multiple-access (DS-CDMA) user. We develop a fast converging adaptive training algorithm that minimizes the bit-error rate (BER) at the output of the receiver. The adaptive algorithm has three key features: i) it incorporates the BER, i.e., the ultimate performance evaluation measure, directly into the learning process, ii) it utilizes constraints that are derived from the properties of the optimum single-user decision boundary for additive white Gaussian noise (AWGN) multiple-access channels, and iii) it embeds importance sampling (IS) principles directly into the receiver optimization process. Simulation studies illustrate the BER performance of the proposed scheme.
Recent developments of the NESSUS probabilistic structural analysis computer program
NASA Technical Reports Server (NTRS)
Millwater, H.; Wu, Y.-T.; Torng, T.; Thacker, B.; Riha, D.; Leung, C. P.
1992-01-01
The NESSUS probabilistic structural analysis computer program combines state-of-the-art probabilistic algorithms with general purpose structural analysis methods to compute the probabilistic response and the reliability of engineering structures. Uncertainty in loading, material properties, geometry, boundary conditions and initial conditions can be simulated. The structural analysis methods include nonlinear finite element and boundary element methods. Several probabilistic algorithms are available such as the advanced mean value method and the adaptive importance sampling method. The scope of the code has recently been expanded to include probabilistic life and fatigue prediction of structures in terms of component and system reliability and risk analysis of structures considering cost of failure. The code is currently being extended to structural reliability considering progressive crack propagation. Several examples are presented to demonstrate the new capabilities.
Automated Speech Rate Measurement in Dysarthria.
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
2015-06-01
In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China.
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-11-12
The increase and the complexity of data caused by the uncertain environment is today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006-2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality.
Consistency of Global Modis Aerosol Optical Depths over Ocean on Terra and Aqua Ceres SSF Datasets
NASA Technical Reports Server (NTRS)
Ignatov, Alexander; Minnis, Patrick; Miller, Walter F.; Wielicki, Bruce A.; Remer, Lorraine
2006-01-01
Aerosol retrievals over ocean from the Moderate Resolution Imaging Spectroradiometer (MODIS) onboard Terra and Aqua platforms are available from the Clouds and the Earth's Radiant Energy System (CERES) Single Scanner Footprint (SSF) datasets generated at NASA Langley Research Center (LaRC). Two aerosol products are reported side-by-side. The primary M product is generated by sub-setting and remapping the multi-spectral (0.47-2.1 micrometer) MODIS produced oceanic aerosol (MOD04/MYD04 for Terra/Aqua) onto CERES footprints. M*D04 processing uses cloud screening and aerosol algorithms developed by the MODIS science team. The secondary AVHRR-like A product is generated in only two MODIS bands 1 and 6 (on Aqua, bands 1 and 7). The A processing uses the CERES cloud screening algorithm, and NOAA/NESDIS glint identification, and single-channel aerosol retrieval algorithms. The M and A products have been documented elsewhere and preliminarily compared using 2 weeks of global Terra CERES SSF Edition 1A data in which the M product was based on MOD04 collection 3. In this study, the comparisons between the M and A aerosol optical depths (AOD) in MODIS band 1 (0.64 micrometers), tau(sub 1M) and tau(sub 1A) are re-examined using 9 days of global CERES SSF Terra Edition 2A and Aqua Edition 1B data from 13 - 21 October 2002, and extended to include cross-platform comparisons. The M and A products on the new CERES SSF release are generated using the same aerosol algorithms as before, but with different preprocessing and sampling procedures, lending themselves to a simple sensitivity check to non-aerosol factors. Both tau(sub 1M) and tau(sub 1A) generally compare well across platforms. However, the M product shows some differences, which increase with ambient cloud amount and towards the solar side of the orbit. Three types of comparisons conducted in this study - cross-platform, cross-product, and cross-release confirm the previously made observation that the major area for improvement in the current aerosol processing lies in a more formalized and standardized sampling (and most importantly, cloud screening) whereas optimization of the aerosol algorithm is deemed to be an important yet less critical element.
CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haraldsdóttir, Hulda S.; Cousins, Ben; Thiele, Ines
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. Wemore » apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks.« less
CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models
Haraldsdóttir, Hulda S.; Cousins, Ben; Thiele, Ines; ...
2017-01-31
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. Wemore » apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks.« less
A clustering algorithm for sample data based on environmental pollution characteristics
NASA Astrophysics Data System (ADS)
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
ERIC Educational Resources Information Center
Nussbaum, Francis, Jr.
1988-01-01
Presents an algorithm for solving problems related to multiple allelic frequencies in populations at equilibrium. Considers sample problems and provides their solution using this tabular algorithm. (CW)
Li, Zong-Tao; Wu, Tie-Jun; Lin, Can-Long; Ma, Long-Hua
2011-01-01
A new generalized optimum strapdown algorithm with coning and sculling compensation is presented, in which the position, velocity and attitude updating operations are carried out based on the single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects. Different from existing algorithms, the updating rates of the coning and sculling compensations are unrelated with the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples. When the output sampling rate of inertial sensors remains constant, this algorithm allows increasing the updating rate of the coning and sculling compensation, yet with more numbers of gyro incremental angle and accelerometer incremental velocity in order to improve the accuracy of system. Then, in order to implement the new strapdown algorithm in a single FPGA chip, the parallelization of the algorithm is designed and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the Xilinx ISE 12.3 software platform and the FPGA device XC6VLX550T hardware platform on the basis of some fighter data. It is shown that this parallel strapdown algorithm on the FPGA platform can greatly decrease the execution time of algorithm to meet the real-time and high precision requirements of system on the high dynamic environment, relative to the existing implemented on the DSP platform. PMID:22164058
Vizentin-Bugoni, Jeferson; Maruyama, Pietro K; Debastiani, Vanderlei J; Duarte, L da S; Dalsgaard, Bo; Sazima, Marlies
2016-01-01
Virtually all empirical ecological interaction networks to some extent suffer from undersampling. However, how limitations imposed by sampling incompleteness affect our understanding of ecological networks is still poorly explored, which may hinder further advances in the field. Here, we use a plant-hummingbird network with unprecedented sampling effort (2716 h of focal observations) from the Atlantic Rainforest in Brazil, to investigate how sampling effort affects the description of network structure (i.e. widely used network metrics) and the relative importance of distinct processes (i.e. species abundances vs. traits) in determining the frequency of pairwise interactions. By dividing the network into time slices representing a gradient of sampling effort, we show that quantitative metrics, such as interaction evenness, specialization (H2 '), weighted nestedness (wNODF) and modularity (Q; QuanBiMo algorithm) were less biased by sampling incompleteness than binary metrics. Furthermore, the significance of some network metrics changed along the sampling effort gradient. Nevertheless, the higher importance of traits in structuring the network was apparent even with small sampling effort. Our results (i) warn against using very poorly sampled networks as this may bias our understanding of networks, both their patterns and structuring processes, (ii) encourage the use of quantitative metrics little influenced by sampling when performing spatio-temporal comparisons and (iii) indicate that in networks strongly constrained by species traits, such as plant-hummingbird networks, even small sampling is sufficient to detect their relative importance for the frequencies of interactions. Finally, we argue that similar effects of sampling are expected for other highly specialized subnetworks. © 2015 The Authors. Journal of Animal Ecology © 2015 British Ecological Society.
Schoenberg, Mike R; Lange, Rael T; Saklofske, Donald H
2007-11-01
Establishing a comparison standard in neuropsychological assessment is crucial to determining change in function. There is no available method to estimate premorbid intellectual functioning for the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV). The WISC-IV provided normative data for both American and Canadian children aged 6 to 16 years old. This study developed regression algorithms as a proposed method to estimate full-scale intelligence quotient (FSIQ) for the Canadian WISC-IV. Participants were the Canadian WISC-IV standardization sample (n = 1,100). The sample was randomly divided into two groups (development and validation groups). The development group was used to generate regression algorithms; 1 algorithm only included demographics, and 11 combined demographic variables with WISC-IV subtest raw scores. The algorithms accounted for 18% to 70% of the variance in FSIQ (standard error of estimate, SEE = 8.6 to 14.2). Estimated FSIQ significantly correlated with actual FSIQ (r = .30 to .80), and the majority of individual FSIQ estimates were within +/-10 points of actual FSIQ. The demographic-only algorithm was less accurate than algorithms combining demographic variables with subtest raw scores. The current algorithms yielded accurate estimates of current FSIQ for Canadian individuals aged 6-16 years old. The potential application of the algorithms to estimate premorbid FSIQ is reviewed. While promising, clinical validation of the algorithms in a sample of children and/or adolescents with known neurological dysfunction is needed to establish these algorithms as a premorbid estimation procedure.
NASA Astrophysics Data System (ADS)
Shen, Lin; Xie, Liangxu; Yang, Mingjun
2017-04-01
Conformational sampling under rugged energy landscape is always a challenge in computer simulations. The recently developed integrated tempering sampling, together with its selective variant (SITS), emerges to be a powerful tool in exploring the free energy landscape or functional motions of various systems. The estimation of weighting factors constitutes a critical step in these methods and requires accurate calculation of partition function ratio between different thermodynamic states. In this work, we propose a new adaptive update algorithm to compute the weighting factors based on the weighted histogram analysis method (WHAM). The adaptive-WHAM algorithm with SITS is then applied to study the thermodynamic properties of several representative peptide systems solvated in an explicit water box. The performance of the new algorithm is validated in simulations of these solvated peptide systems. We anticipate more applications of this coupled optimisation and production algorithm to other complicated systems such as the biochemical reactions in solution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jakeman, John D.; Narayan, Akil; Zhou, Tao
We propose an algorithm for recovering sparse orthogonal polynomial expansions via collocation. A standard sampling approach for recovering sparse polynomials uses Monte Carlo sampling, from the density of orthogonality, which results in poor function recovery when the polynomial degree is high. Our proposed approach aims to mitigate this limitation by sampling with respect to the weighted equilibrium measure of the parametric domain and subsequently solves a preconditionedmore » $$\\ell^1$$-minimization problem, where the weights of the diagonal preconditioning matrix are given by evaluations of the Christoffel function. Our algorithm can be applied to a wide class of orthogonal polynomial families on bounded and unbounded domains, including all classical families. We present theoretical analysis to motivate the algorithm and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest. In conclusion, numerical examples are also provided to demonstrate that our proposed algorithm leads to comparable or improved accuracy even when compared with Legendre- and Hermite-specific algorithms.« less
A high speed implementation of the random decrement algorithm
NASA Technical Reports Server (NTRS)
Kiraly, L. J.
1982-01-01
The algorithm is useful for measuring net system damping levels in stochastic processes and for the development of equivalent linearized system response models. The algorithm works by summing together all subrecords which occur after predefined threshold level is crossed. The random decrement signature is normally developed by scanning stored data and adding subrecords together. The high speed implementation of the random decrement algorithm exploits the digital character of sampled data and uses fixed record lengths of 2(n) samples to greatly speed up the process. The contributions to the random decrement signature of each data point was calculated only once and in the same sequence as the data were taken. A hardware implementation of the algorithm using random logic is diagrammed and the process is shown to be limited only by the record size and the threshold crossing frequency of the sampled data. With a hardware cycle time of 200 ns and 1024 point signature, a threshold crossing frequency of 5000 Hertz can be processed and a stably averaged signature presented in real time.
Fast and Robust STEM Reconstruction in Complex Environments Using Terrestrial Laser Scanning
NASA Astrophysics Data System (ADS)
Wang, D.; Hollaus, M.; Puttonen, E.; Pfeifer, N.
2016-06-01
Terrestrial Laser Scanning (TLS) is an effective tool in forest research and management. However, accurate estimation of tree parameters still remains challenging in complex forests. In this paper, we present a novel algorithm for stem modeling in complex environments. This method does not require accurate delineation of stem points from the original point cloud. The stem reconstruction features a self-adaptive cylinder growing scheme. This algorithm is tested for a landslide region in the federal state of Vorarlberg, Austria. The algorithm results are compared with field reference data, which show that our algorithm is able to accurately retrieve the diameter at breast height (DBH) with a root mean square error (RMSE) of ~1.9 cm. This algorithm is further facilitated by applying an advanced sampling technique. Different sampling rates are applied and tested. It is found that a sampling rate of 7.5% is already able to retain the stem fitting quality and simultaneously reduce the computation time significantly by ~88%.
Billing code algorithms to identify cases of peripheral artery disease from administrative data
Fan, Jin; Arruda-Olson, Adelaide M; Leibson, Cynthia L; Smith, Carin; Liu, Guanghui; Bailey, Kent R; Kullo, Iftikhar J
2013-01-01
Objective To construct and validate billing code algorithms for identifying patients with peripheral arterial disease (PAD). Methods We extracted all encounters and line item details including PAD-related billing codes at Mayo Clinic Rochester, Minnesota, between July 1, 1997 and June 30, 2008; 22 712 patients evaluated in the vascular laboratory were divided into training and validation sets. Multiple logistic regression analysis was used to create an integer code score from the training dataset, and this was tested in the validation set. We applied a model-based code algorithm to patients evaluated in the vascular laboratory and compared this with a simpler algorithm (presence of at least one of the ICD-9 PAD codes 440.20–440.29). We also applied both algorithms to a community-based sample (n=4420), followed by a manual review. Results The logistic regression model performed well in both training and validation datasets (c statistic=0.91). In patients evaluated in the vascular laboratory, the model-based code algorithm provided better negative predictive value. The simpler algorithm was reasonably accurate for identification of PAD status, with lesser sensitivity and greater specificity. In the community-based sample, the sensitivity (38.7% vs 68.0%) of the simpler algorithm was much lower, whereas the specificity (92.0% vs 87.6%) was higher than the model-based algorithm. Conclusions A model-based billing code algorithm had reasonable accuracy in identifying PAD cases from the community, and in patients referred to the non-invasive vascular laboratory. The simpler algorithm had reasonable accuracy for identification of PAD in patients referred to the vascular laboratory but was significantly less sensitive in a community-based sample. PMID:24166724
Exploring the Energy Landscapes of Protein Folding Simulations with Bayesian Computation
Burkoff, Nikolas S.; Várnai, Csilla; Wells, Stephen A.; Wild, David L.
2012-01-01
Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. PMID:22385859
Exploring the energy landscapes of protein folding simulations with Bayesian computation.
Burkoff, Nikolas S; Várnai, Csilla; Wells, Stephen A; Wild, David L
2012-02-22
Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Lee, Unseok; Chang, Sungyul; Putra, Gian Anantrio; Kim, Hyoungseok; Kim, Dong Hwan
2018-01-01
A high-throughput plant phenotyping system automatically observes and grows many plant samples. Many plant sample images are acquired by the system to determine the characteristics of the plants (populations). Stable image acquisition and processing is very important to accurately determine the characteristics. However, hardware for acquiring plant images rapidly and stably, while minimizing plant stress, is lacking. Moreover, most software cannot adequately handle large-scale plant imaging. To address these problems, we developed a new, automated, high-throughput plant phenotyping system using simple and robust hardware, and an automated plant-imaging-analysis pipeline consisting of machine-learning-based plant segmentation. Our hardware acquires images reliably and quickly and minimizes plant stress. Furthermore, the images are processed automatically. In particular, large-scale plant-image datasets can be segmented precisely using a classifier developed using a superpixel-based machine-learning algorithm (Random Forest), and variations in plant parameters (such as area) over time can be assessed using the segmented images. We performed comparative evaluations to identify an appropriate learning algorithm for our proposed system, and tested three robust learning algorithms. We developed not only an automatic analysis pipeline but also a convenient means of plant-growth analysis that provides a learning data interface and visualization of plant growth trends. Thus, our system allows end-users such as plant biologists to analyze plant growth via large-scale plant image data easily.
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu
2009-01-01
Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
Lukashin, A V; Fuchs, R
2001-05-01
Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm guarantees to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search of the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places greater than 90% genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
NASA Astrophysics Data System (ADS)
More, Anupreeta; Verma, Aprajita; Marshall, Philip J.; More, Surhud; Baeten, Elisabeth; Wilcox, Julianne; Macmillan, Christine; Cornen, Claude; Kapadia, Amit; Parrish, Michael; Snyder, Chris; Davis, Christopher P.; Gavazzi, Raphael; Lintott, Chris J.; Simpson, Robert; Miller, David; Smith, Arfon M.; Paget, Edward; Saha, Prasenjit; Küng, Rafael; Collett, Thomas E.
2016-01-01
We report the discovery of 29 promising (and 59 total) new lens candidates from the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) based on about 11 million classifications performed by citizen scientists as part of the first SPACE WARPS lens search. The goal of the blind lens search was to identify lens candidates missed by robots (the RINGFINDER on galaxy scales and ARCFINDER on group/cluster scales) which had been previously used to mine the CFHTLS for lenses. We compare some properties of the samples detected by these algorithms to the SPACE WARPS sample and find them to be broadly similar. The image separation distribution calculated from the SPACE WARPS sample shows that previous constraints on the average density profile of lens galaxies are robust. SPACE WARPS recovers about 65 per cent of known lenses, while the new candidates show a richer variety compared to those found by the two robots. This detection rate could be increased to 80 per cent by only using classifications performed by expert volunteers (albeit at the cost of a lower purity), indicating that the training and performance calibration of the citizen scientists is very important for the success of SPACE WARPS. In this work we present the SIMCT pipeline, used for generating in situ a sample of realistic simulated lensed images. This training sample, along with the false positives identified during the search, has a legacy value for testing future lens-finding algorithms. We make the pipeline and the training set publicly available.
Godoy-Caballero, María del Pilar; Culzoni, María Julia; Galeano-Díaz, Teresa; Acedo-Valenzuela, María Isabel
2013-02-06
This paper presents the development of a non-aqueous capillary electrophoresis method coupled to UV detection combined with multivariate curve resolution-alternating least-squares (MCR-ALS) to carry out the resolution and quantitation of a mixture of six phenolic acids in virgin olive oil samples. p-Coumaric, caffeic, ferulic, 3,4-dihydroxyphenylacetic, vanillic and 4-hydroxyphenilacetic acids have been the analytes under study. All of them present different absorption spectra and overlapped time profiles with the olive oil matrix interferences and between them. The modeling strategy involves the building of a single MCR-ALS model composed of matrices augmented in the temporal mode, namely spectra remain invariant while time profiles may change from sample to sample. So MCR-ALS was used to cope with the coeluting interferences, on accounting the second order advantage inherent to this algorithm which, in addition, is able to handle data sets deviating from trilinearity, like the data herein analyzed. The method was firstly applied to resolve standard mixtures of the analytes randomly prepared in 1-propanol and, secondly, in real virgin olive oil samples, getting recovery values near to 100% in all cases. The importance and novelty of this methodology relies on the combination of non-aqueous capillary electrophoresis second-order data and MCR-ALS algorithm which allows performing the resolution of these compounds simplifying the previous sample pretreatment stages. Copyright © 2012 Elsevier B.V. All rights reserved.
Event Processing and Variable Part of Sample Period Determining in Combined Systems Using GA
NASA Astrophysics Data System (ADS)
Strémy, Maximilián; Závacký, Pavol; Jedlička, Martin
2011-01-01
This article deals with combined dynamic systems and usage of modern techniques in dealing with these systems, focusing particularly on sampling period design, cyclic processing tasks and related processing algorithms in the combined event management systems using genetic algorithms.
Reduced Order Model Basis Vector Generation: Generates Basis Vectors fro ROMs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arrighi, Bill
2016-03-03
libROM is a library that implements order reduction via singular value decomposition (SVD) of sampled state vectors. It implements 2 parallel, incremental SVD algorithms and one serial, non-incremental algorithm. It also provides a mechanism for adaptive sampling of basis vectors.
DALMATIAN: An Algorithm for Automatic Cell Detection and Counting in 3D.
Shuvaev, Sergey A; Lazutkin, Alexander A; Kedrov, Alexander V; Anokhin, Konstantin V; Enikolopov, Grigori N; Koulakov, Alexei A
2017-01-01
Current 3D imaging methods, including optical projection tomography, light-sheet microscopy, block-face imaging, and serial two photon tomography enable visualization of large samples of biological tissue. Large volumes of data obtained at high resolution require development of automatic image processing techniques, such as algorithms for automatic cell detection or, more generally, point-like object detection. Current approaches to automated cell detection suffer from difficulties originating from detection of particular cell types, cell populations of different brightness, non-uniformly stained, and overlapping cells. In this study, we present a set of algorithms for robust automatic cell detection in 3D. Our algorithms are suitable for, but not limited to, whole brain regions and individual brain sections. We used watershed procedure to split regional maxima representing overlapping cells. We developed a bootstrap Gaussian fit procedure to evaluate the statistical significance of detected cells. We compared cell detection quality of our algorithm and other software using 42 samples, representing 6 staining and imaging techniques. The results provided by our algorithm matched manual expert quantification with signal-to-noise dependent confidence, including samples with cells of different brightness, non-uniformly stained, and overlapping cells for whole brain regions and individual tissue sections. Our algorithm provided the best cell detection quality among tested free and commercial software.
Multirate sampled-data yaw-damper and modal suppression system design
NASA Technical Reports Server (NTRS)
Berg, Martin C.; Mason, Gregory S.
1990-01-01
A multirate control law synthesized algorithm based on an infinite-time quadratic cost function, was developed along with a method for analyzing the robustness of multirate systems. A generalized multirate sampled-data control law structure (GMCLS) was introduced. A new infinite-time-based parameter optimization multirate sampled-data control law synthesis method and solution algorithm were developed. A singular-value-based method for determining gain and phase margins for multirate systems was also developed. The finite-time-based parameter optimization multirate sampled-data control law synthesis algorithm originally intended to be applied to the aircraft problem was instead demonstrated by application to a simpler problem involving the control of the tip position of a two-link robot arm. The GMCLS, the infinite-time-based parameter optimization multirate control law synthesis method and solution algorithm, and the singular-value based method for determining gain and phase margins were all demonstrated by application to the aircraft control problem originally proposed for this project.
An Asymptotically-Optimal Sampling-Based Algorithm for Bi-directional Motion Planning
Starek, Joseph A.; Gomez, Javier V.; Schmerling, Edward; Janson, Lucas; Moreno, Luis; Pavone, Marco
2015-01-01
Bi-directional search is a widely used strategy to increase the success and convergence rates of sampling-based motion planning algorithms. Yet, few results are available that merge both bi-directional search and asymptotic optimality into existing optimal planners, such as PRM*, RRT*, and FMT*. The objective of this paper is to fill this gap. Specifically, this paper presents a bi-directional, sampling-based, asymptotically-optimal algorithm named Bi-directional FMT* (BFMT*) that extends the Fast Marching Tree (FMT*) algorithm to bidirectional search while preserving its key properties, chiefly lazy search and asymptotic optimality through convergence in probability. BFMT* performs a two-source, lazy dynamic programming recursion over a set of randomly-drawn samples, correspondingly generating two search trees: one in cost-to-come space from the initial configuration and another in cost-to-go space from the goal configuration. Numerical experiments illustrate the advantages of BFMT* over its unidirectional counterpart, as well as a number of other state-of-the-art planners. PMID:27004130
Pedigree data analysis with crossover interference.
Browning, Sharon
2003-01-01
We propose a new method for calculating probabilities for pedigree genetic data that incorporates crossover interference using the chi-square models. Applications include relationship inference, genetic map construction, and linkage analysis. The method is based on importance sampling of unobserved inheritance patterns conditional on the observed genotype data and takes advantage of fast algorithms for no-interference models while using reweighting to allow for interference. We show that the method is effective for arbitrarily many markers with small pedigrees. PMID:12930760
Chen, Bin; Peng, Xiuming; Xie, Tiansheng; Jin, Changzhong; Liu, Fumin; Wu, Nanping
2017-07-01
Currently, there are three algorithms for screening of syphilis: traditional algorithm, reverse algorithm and European Centre for Disease Prevention and Control (ECDC) algorithm. To date, there is not a generally recognized diagnostic algorithm. When syphilis meets HIV, the situation is even more complex. To evaluate their screening performance and impact on the seroprevalence of syphilis in HIV-infected individuals, we conducted a cross-sectional study included 865 serum samples from HIV-infected patients in a tertiary hospital. Every sample (one per patient) was tested with toluidine red unheated serum test (TRUST), T. pallidum particle agglutination assay (TPPA), and Treponema pallidum enzyme immunoassay (TP-EIA) according to the manufacturer's instructions. The results of syphilis serological testing were interpreted following different algorithms respectively. We directly compared the traditional syphilis screening algorithm with the reverse syphilis screening algorithm in this unique population. The reverse algorithm achieved remarkable higher seroprevalence of syphilis than the traditional algorithm (24.9% vs. 14.2%, p < 0.0001). Compared to the reverse algorithm, the traditional algorithm also had a missed serodiagnosis rate of 42.8%. The total percentages of agreement and corresponding kappa values of tradition and ECDC algorithm compared with those of reverse algorithm were as follows: 89.4%,0.668; 99.8%, 0.994. There was a very good strength of agreement between the reverse and the ECDC algorithm. Our results supported the reverse (or ECDC) algorithm in screening of syphilis in HIV-infected populations. In addition, our study demonstrated that screening of HIV-populations using different algorithms may result in a statistically different seroprevalence of syphilis.
Digital pile-up rejection for plutonium experiments with solution-grown stilbene
NASA Astrophysics Data System (ADS)
Bourne, M. M.; Clarke, S. D.; Paff, M.; DiFulvio, A.; Norsworthy, M.; Pozzi, S. A.
2017-01-01
A solution-grown stilbene detector was used in several experiments with plutonium samples including plutonium oxide, mixed oxide, and plutonium metal samples. Neutrons from different reactions and plutonium isotopes are accompanied by numerous gamma rays especially by the 59-keV gamma ray of 241Am. Identifying neutrons correctly is important for nuclear nonproliferation applications and makes neutron/gamma discrimination and pile-up rejection necessary. Each experimental dataset is presented with and without pile-up filtering using a previously developed algorithm. The experiments were simulated using MCNPX-PoliMi, a Monte Carlo code designed to accurately model scintillation detector response. Collision output from MCNPX-PoliMi was processed using the specialized MPPost post-processing code to convert neutron energy depositions event-by-event into light pulses. The model was compared to experimental data after pulse-shape discrimination identified waveforms as gamma ray or neutron interactions. We show that the use of the digital pile-up rejection algorithm allows for accurate neutron counting with stilbene to within 2% even when not using lead shielding.
A Modular Low-Complexity ECG Delineation Algorithm for Real-Time Embedded Systems.
Bote, Jose Manuel; Recas, Joaquin; Rincon, Francisco; Atienza, David; Hermida, Roman
2018-03-01
This work presents a new modular and low-complexity algorithm for the delineation of the different ECG waves (QRS, P and T peaks, onsets, and end). Involving a reduced number of operations per second and having a small memory footprint, this algorithm is intended to perform real-time delineation on resource-constrained embedded systems. The modular design allows the algorithm to automatically adjust the delineation quality in runtime to a wide range of modes and sampling rates, from a ultralow-power mode when no arrhythmia is detected, in which the ECG is sampled at low frequency, to a complete high-accuracy delineation mode, in which the ECG is sampled at high frequency and all the ECG fiducial points are detected, in the case of arrhythmia. The delineation algorithm has been adjusted using the QT database, providing very high sensitivity and positive predictivity, and validated with the MIT database. The errors in the delineation of all the fiducial points are below the tolerances given by the Common Standards for Electrocardiography Committee in the high-accuracy mode, except for the P wave onset, for which the algorithm is above the agreed tolerances by only a fraction of the sample duration. The computational load for the ultralow-power 8-MHz TI MSP430 series microcontroller ranges from 0.2% to 8.5% according to the mode used.
Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning
Xu, Shengzhi; Cheng, Xiang; Li, Zhengyi; Xiong, Li
2016-01-01
In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found, in differentially private FSM, the amount of required noise is proportionate to the number of candidate sequences. If we could effectively reduce the number of unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, which is referred to as PFS2. The core of our algorithm is to utilize sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of such private estimations, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy. PMID:26973430
Yin, Weiwei; Garimalla, Swetha; Moreno, Alberto; Galinski, Mary R; Styczynski, Mark P
2015-08-28
There are increasing efforts to bring high-throughput systems biology techniques to bear on complex animal model systems, often with a goal of learning about underlying regulatory network structures (e.g., gene regulatory networks). However, complex animal model systems typically have significant limitations on cohort sizes, number of samples, and the ability to perform follow-up and validation experiments. These constraints are particularly problematic for many current network learning approaches, which require large numbers of samples and may predict many more regulatory relationships than actually exist. Here, we test the idea that by leveraging the accuracy and efficiency of classifiers, we can construct high-quality networks that capture important interactions between variables in datasets with few samples. We start from a previously-developed tree-like Bayesian classifier and generalize its network learning approach to allow for arbitrary depth and complexity of tree-like networks. Using four diverse sample networks, we demonstrate that this approach performs consistently better at low sample sizes than the Sparse Candidate Algorithm, a representative approach for comparison because it is known to generate Bayesian networks with high positive predictive value. We develop and demonstrate a resampling-based approach to enable the identification of a viable root for the learned tree-like network, important for cases where the root of a network is not known a priori. We also develop and demonstrate an integrated resampling-based approach to the reduction of variable space for the learning of the network. Finally, we demonstrate the utility of this approach via the analysis of a transcriptional dataset of a malaria challenge in a non-human primate model system, Macaca mulatta, suggesting the potential to capture indicators of the earliest stages of cellular differentiation during leukopoiesis. We demonstrate that by starting from effective and efficient approaches for creating classifiers, we can identify interesting tree-like network structures with significant ability to capture the relationships in the training data. This approach represents a promising strategy for inferring networks with high positive predictive value under the constraint of small numbers of samples, meeting a need that will only continue to grow as more high-throughput studies are applied to complex model systems.
Slice sampling technique in Bayesian extreme of gold price modelling
NASA Astrophysics Data System (ADS)
Rostami, Mohammad; Adam, Mohd Bakri; Ibrahim, Noor Akma; Yahya, Mohamed Hisham
2013-09-01
In this paper, a simulation study of Bayesian extreme values by using Markov Chain Monte Carlo via slice sampling algorithm is implemented. We compared the accuracy of slice sampling with other methods for a Gumbel model. This study revealed that slice sampling algorithm offers more accurate and closer estimates with less RMSE than other methods . Finally we successfully employed this procedure to estimate the parameters of Malaysia extreme gold price from 2000 to 2011.
Static versus dynamic sampling for data mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
John, G.H.; Langley, P.
1996-12-31
As data warehouses grow to the point where one hundred gigabytes is considered small, the computational efficiency of data-mining algorithms on large databases becomes increasingly important. Using a sample from the database can speed up the datamining process, but this is only acceptable if it does not reduce the quality of the mined knowledge. To this end, we introduce the {open_quotes}Probably Close Enough{close_quotes} criterion to describe the desired properties of a sample. Sampling usually refers to the use of static statistical tests to decide whether a sample is sufficiently similar to the large database, in the absence of any knowledgemore » of the tools the data miner intends to use. We discuss dynamic sampling methods, which take into account the mining tool being used and can thus give better samples. We describe dynamic schemes that observe a mining tool`s performance on training samples of increasing size and use these results to determine when a sample is sufficiently large. We evaluate these sampling methods on data from the UCI repository and conclude that dynamic sampling is preferable.« less
Mapping chemicals in air using an environmental CAT scanning system: evaluation of algorithms
NASA Astrophysics Data System (ADS)
Samanta, A.; Todd, L. A.
A new technique is being developed which creates near real-time maps of chemical concentrations in air for environmental and occupational environmental applications. This technique, we call Environmental CAT Scanning, combines the real-time measuring technique of open-path Fourier transform infrared spectroscopy with the mapping capabilitites of computed tomography to produce two-dimensional concentration maps. With this system, a network of open-path measurements is obtained over an area; measurements are then processed using a tomographic algorithm to reconstruct the concentrations. This research focussed on the process of evaluating and selecting appropriate reconstruction algorithms, for use in the field, by using test concentration data from both computer simultation and laboratory chamber studies. Four algorithms were tested using three types of data: (1) experimental open-path data from studies that used a prototype opne-path Fourier transform/computed tomography system in an exposure chamber; (2) synthetic open-path data generated from maps created by kriging point samples taken in the chamber studies (in 1), and; (3) synthetic open-path data generated using a chemical dispersion model to create time seires maps. The iterative algorithms used to reconstruct the concentration data were: Algebraic Reconstruction Technique without Weights (ART1), Algebraic Reconstruction Technique with Weights (ARTW), Maximum Likelihood with Expectation Maximization (MLEM) and Multiplicative Algebraic Reconstruction Technique (MART). Maps were evaluated quantitatively and qualitatively. In general, MART and MLEM performed best, followed by ARTW and ART1. However, algorithm performance varied under different contaminant scenarios. This study showed the importance of using a variety of maps, particulary those generated using dispersion models. The time series maps provided a more rigorous test of the algorithms and allowed distinctions to be made among the algorithms. A comprehensive evaluation of algorithms, for the environmental application of tomography, requires the use of a battery of test concentration data before field implementation, which models reality and tests the limits of the algorithms.
Tensor Rank Preserving Discriminant Analysis for Facial Recognition.
Tao, Dapeng; Guo, Yanan; Li, Yaotang; Gao, Xinbo
2017-10-12
Facial recognition, one of the basic topics in computer vision and pattern recognition, has received substantial attention in recent years. However, for those traditional facial recognition algorithms, the facial images are reshaped to a long vector, thereby losing part of the original spatial constraints of each pixel. In this paper, a new tensor-based feature extraction algorithm termed tensor rank preserving discriminant analysis (TRPDA) for facial image recognition is proposed; the proposed method involves two stages: in the first stage, the low-dimensional tensor subspace of the original input tensor samples was obtained; in the second stage, discriminative locality alignment was utilized to obtain the ultimate vector feature representation for subsequent facial recognition. On the one hand, the proposed TRPDA algorithm fully utilizes the natural structure of the input samples, and it applies an optimization criterion that can directly handle the tensor spectral analysis problem, thereby decreasing the computation cost compared those traditional tensor-based feature selection algorithms. On the other hand, the proposed TRPDA algorithm extracts feature by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples. Experiments on the three facial databases are performed here to determine the effectiveness of the proposed TRPDA algorithm.
A complete diet-based algorithm for predicting nonheme iron absorption in adults.
Armah, Seth M; Carriquiry, Alicia; Sullivan, Debra; Cook, James D; Reddy, Manju B
2013-07-01
Many algorithms have been developed in the past few decades to estimate nonheme iron absorption from the diet based on single meal absorption studies. Yet single meal studies exaggerate the effect of diet and other factors on absorption. Here, we propose a new algorithm based on complete diets for estimating nonheme iron absorption. We used data from 4 complete diet studies each with 12-14 participants for a total of 53 individuals (19 men and 34 women) aged 19-38 y. In each study, each participant was observed during three 1-wk periods during which they consumed different diets. The diets were typical, high, or low in meat, tea, calcium, or vitamin C. The total sample size was 159 (53 × 3) observations. We used multiple linear regression to quantify the effect of different factors on iron absorption. Serum ferritin was the most important factor in explaining differences in nonheme iron absorption, whereas the effect of dietary factors was small. When our algorithm was validated with single meal and complete diet data, the respective R(2) values were 0.57 (P < 0.001) and 0.84 (P < 0.0001). The results also suggest that between-person variations explain a large proportion of the differences in nonheme iron absorption. The algorithm based on complete diets we propose is useful for predicting nonheme iron absorption from the diets of different populations.
A seismic coherency method using spectral amplitudes
NASA Astrophysics Data System (ADS)
Sui, Jing-Kun; Zheng, Xiao-Dong; Li, Yan-Dong
2015-09-01
Seismic coherence is used to detect discontinuities in underground media. However, strata with steeply dipping structures often produce false low coherence estimates and thus incorrect discontinuity characterization results. It is important to eliminate or reduce the effect of dipping on coherence estimates. To solve this problem, time-domain dip scanning is typically used to improve estimation of coherence in areas with steeply dipping structures. However, the accuracy of the time-domain estimation of dip is limited by the sampling interval. In contrast, the spectrum amplitude is not affected by the time delays in adjacent seismic traces caused by dipping structures. We propose a coherency algorithm that uses the spectral amplitudes of seismic traces within a predefined analysis window to construct the covariance matrix. The coherency estimates with the proposed algorithm is defined as the ratio between the dominant eigenvalue and the sum of all eigenvalues of the constructed covariance matrix. Thus, we eliminate the effect of dipping structures on coherency estimates. In addition, because different frequency bands of spectral amplitudes are used to estimate coherency, the proposed algorithm has multiscale features. Low frequencies are effective for characterizing large-scale faults, whereas high frequencies are better in characterizing small-scale faults. Application to synthetic and real seismic data show that the proposed algorithm can eliminate the effect of dip and produce better coherence estimates than conventional coherency algorithms in areas with steeply dipping structures.
Nasiri, Jaber; Naghavi, Mohammad Reza; Kayvanjoo, Amir Hossein; Nasiri, Mojtaba; Ebrahimi, Mansour
2015-03-07
For the first time, prediction accuracies of some supervised and unsupervised algorithms were evaluated in an SSR-based DNA fingerprinting study of a pea collection containing 20 cultivars and 57 wild samples. In general, according to the 10 attribute weighting models, the SSR alleles of PEAPHTAP-2 and PSBLOX13.2-1 were the two most important attributes to generate discrimination among eight different species and subspecies of genus Pisum. In addition, K-Medoids unsupervised clustering run on Chi squared dataset exhibited the best prediction accuracy (83.12%), while the lowest accuracy (25.97%) gained as K-Means model ran on FCdb database. Irrespective of some fluctuations, the overall accuracies of tree induction models were significantly high for many algorithms, and the attributes PSBLOX13.2-3 and PEAPHTAP could successfully detach Pisum fulvum accessions and cultivars from the others when two selected decision trees were taken into account. Meanwhile, the other used supervised algorithms exhibited overall reliable accuracies, even though in some rare cases, they gave us low amounts of accuracies. Our results, altogether, demonstrate promising applications of both supervised and unsupervised algorithms to provide suitable data mining tools regarding accurate fingerprinting of different species and subspecies of genus Pisum, as a fundamental priority task in breeding programs of the crop. Copyright © 2015 Elsevier Ltd. All rights reserved.
An efficient genetic algorithm for maximum coverage deployment in wireless sensor networks.
Yoon, Yourim; Kim, Yong-Hyuk
2013-10-01
Sensor networks have a lot of applications such as battlefield surveillance, environmental monitoring, and industrial diagnostics. Coverage is one of the most important performance metrics for sensor networks since it reflects how well a sensor field is monitored. In this paper, we introduce the maximum coverage deployment problem in wireless sensor networks and analyze the properties of the problem and its solution space. Random deployment is the simplest way to deploy sensor nodes but may cause unbalanced deployment and therefore, we need a more intelligent way for sensor deployment. We found that the phenotype space of the problem is a quotient space of the genotype space in a mathematical view. Based on this property, we propose an efficient genetic algorithm using a novel normalization method. A Monte Carlo method is adopted to design an efficient evaluation function, and its computation time is decreased without loss of solution quality using a method that starts from a small number of random samples and gradually increases the number for subsequent generations. The proposed genetic algorithms could be further improved by combining with a well-designed local search. The performance of the proposed genetic algorithm is shown by a comparative experimental study. When compared with random deployment and existing methods, our genetic algorithm was not only about twice faster, but also showed significant performance improvement in quality.
Xiao, Chuan-Le; Chen, Xiao-Zhou; Du, Yang-Li; Sun, Xuesong; Zhang, Gong; He, Qing-Yu
2013-01-04
Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .
Peak-locking centroid bias in Shack-Hartmann wavefront sensing
NASA Astrophysics Data System (ADS)
Anugu, Narsireddy; Garcia, Paulo J. V.; Correia, Carlos M.
2018-05-01
Shack-Hartmann wavefront sensing relies on accurate spot centre measurement. Several algorithms were developed with this aim, mostly focused on precision, i.e. minimizing random errors. In the solar and extended scene community, the importance of the accuracy (bias error due to peak-locking, quantization, or sampling) of the centroid determination was identified and solutions proposed. But these solutions only allow partial bias corrections. To date, no systematic study of the bias error was conducted. This article bridges the gap by quantifying the bias error for different correlation peak-finding algorithms and types of sub-aperture images and by proposing a practical solution to minimize its effects. Four classes of sub-aperture images (point source, elongated laser guide star, crowded field, and solar extended scene) together with five types of peak-finding algorithms (1D parabola, the centre of gravity, Gaussian, 2D quadratic polynomial, and pyramid) are considered, in a variety of signal-to-noise conditions. The best performing peak-finding algorithm depends on the sub-aperture image type, but none is satisfactory to both bias and random errors. A practical solution is proposed that relies on the antisymmetric response of the bias to the sub-pixel position of the true centre. The solution decreases the bias by a factor of ˜7 to values of ≲ 0.02 pix. The computational cost is typically twice of current cross-correlation algorithms.
Sun, Jun; Zhou, Xin; Wu, Xiaohong; Zhang, Xiaodong; Li, Qinglin
2016-02-26
Fast identification of moisture content in tobacco plant leaves plays a key role in the tobacco cultivation industry and benefits the management of tobacco plant in the farm. In order to identify moisture content of tobacco plant leaves in a fast and nondestructive way, a method involving Mahalanobis distance coupled with Monte Carlo cross validation(MD-MCCV) was proposed to eliminate outlier sample in this study. The hyperspectral data of 200 tobacco plant leaf samples of 20 moisture gradients were obtained using FieldSpc(®) 3 spectrometer. Savitzky-Golay smoothing(SG), roughness penalty smoothing(RPS), kernel smoothing(KS) and median smoothing(MS) were used to preprocess the raw spectra. In addition, Mahalanobis distance(MD), Monte Carlo cross validation(MCCV) and Mahalanobis distance coupled to Monte Carlo cross validation(MD-MCCV) were applied to select the outlier sample of the raw spectrum and four smoothing preprocessing spectra. Successive projections algorithm (SPA) was used to extract the most influential wavelengths. Multiple Linear Regression (MLR) was applied to build the prediction models based on preprocessed spectra feature in characteristic wavelengths. The results showed that the preferably four prediction model were MD-MCCV-SG (Rp(2) = 0.8401 and RMSEP = 0.1355), MD-MCCV-RPS (Rp(2) = 0.8030 and RMSEP = 0.1274), MD-MCCV-KS (Rp(2) = 0.8117 and RMSEP = 0.1433), MD-MCCV-MS (Rp(2) = 0.9132 and RMSEP = 0.1162). MD-MCCV algorithm performed best among MD algorithm, MCCV algorithm and the method without sample pretreatment algorithm in the eliminating outlier sample from 20 different moisture gradients of tobacco plant leaves and MD-MCCV can be used to eliminate outlier sample in the spectral preprocessing. Copyright © 2016 Elsevier Inc. All rights reserved.
Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications
NASA Astrophysics Data System (ADS)
Chatzidakis, S.; Chrysikopoulou, S.; Tsoukalas, L. H.
2015-12-01
In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The "muon generator" produces muons with zenith angles in the range 0-90° and energies in the range 1-100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance-Rejection and Metropolis-Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1-60 GeV and zenith angles 0-90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic-polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed "muon generator" is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.
Ahmed, Afaz Uddin; Tariqul Islam, Mohammad; Ismail, Mahamod; Kibria, Salehin; Arshad, Haslina
2014-01-01
An artificial neural network (ANN) and affinity propagation (AP) algorithm based user categorization technique is presented. The proposed algorithm is designed for closed access femtocell network. ANN is used for user classification process and AP algorithm is used to optimize the ANN training process. AP selects the best possible training samples for faster ANN training cycle. The users are distinguished by using the difference of received signal strength in a multielement femtocell device. A previously developed directive microstrip antenna is used to configure the femtocell device. Simulation results show that, for a particular house pattern, the categorization technique without AP algorithm takes 5 indoor users and 10 outdoor users to attain an error-free operation. While integrating AP algorithm with ANN, the system takes 60% less training samples reducing the training time up to 50%. This procedure makes the femtocell more effective for closed access operation. PMID:25133214
Ahmed, Afaz Uddin; Islam, Mohammad Tariqul; Ismail, Mahamod; Kibria, Salehin; Arshad, Haslina
2014-01-01
An artificial neural network (ANN) and affinity propagation (AP) algorithm based user categorization technique is presented. The proposed algorithm is designed for closed access femtocell network. ANN is used for user classification process and AP algorithm is used to optimize the ANN training process. AP selects the best possible training samples for faster ANN training cycle. The users are distinguished by using the difference of received signal strength in a multielement femtocell device. A previously developed directive microstrip antenna is used to configure the femtocell device. Simulation results show that, for a particular house pattern, the categorization technique without AP algorithm takes 5 indoor users and 10 outdoor users to attain an error-free operation. While integrating AP algorithm with ANN, the system takes 60% less training samples reducing the training time up to 50%. This procedure makes the femtocell more effective for closed access operation.
A Novel Hyperspectral Microscopic Imaging System for Evaluating Fresh Degree of Pork.
Xu, Yi; Chen, Quansheng; Liu, Yan; Sun, Xin; Huang, Qiping; Ouyang, Qin; Zhao, Jiewen
2018-04-01
This study proposed a rapid microscopic examination method for pork freshness evaluation by using the self-assembled hyperspectral microscopic imaging (HMI) system with the help of feature extraction algorithm and pattern recognition methods. Pork samples were stored for different days ranging from 0 to 5 days and the freshness of samples was divided into three levels which were determined by total volatile basic nitrogen (TVB-N) content. Meanwhile, hyperspectral microscopic images of samples were acquired by HMI system and processed by the following steps for the further analysis. Firstly, characteristic hyperspectral microscopic images were extracted by using principal component analysis (PCA) and then texture features were selected based on the gray level co-occurrence matrix (GLCM). Next, features data were reduced dimensionality by fisher discriminant analysis (FDA) for further building classification model. Finally, compared with linear discriminant analysis (LDA) model and support vector machine (SVM) model, good back propagation artificial neural network (BP-ANN) model obtained the best freshness classification with a 100 % accuracy rating based on the extracted data. The results confirm that the fabricated HMI system combined with multivariate algorithms has ability to evaluate the fresh degree of pork accurately in the microscopic level, which plays an important role in animal food quality control.
A Novel Hyperspectral Microscopic Imaging System for Evaluating Fresh Degree of Pork
Xu, Yi; Chen, Quansheng; Liu, Yan; Sun, Xin; Huang, Qiping; Ouyang, Qin; Zhao, Jiewen
2018-01-01
Abstract This study proposed a rapid microscopic examination method for pork freshness evaluation by using the self-assembled hyperspectral microscopic imaging (HMI) system with the help of feature extraction algorithm and pattern recognition methods. Pork samples were stored for different days ranging from 0 to 5 days and the freshness of samples was divided into three levels which were determined by total volatile basic nitrogen (TVB-N) content. Meanwhile, hyperspectral microscopic images of samples were acquired by HMI system and processed by the following steps for the further analysis. Firstly, characteristic hyperspectral microscopic images were extracted by using principal component analysis (PCA) and then texture features were selected based on the gray level co-occurrence matrix (GLCM). Next, features data were reduced dimensionality by fisher discriminant analysis (FDA) for further building classification model. Finally, compared with linear discriminant analysis (LDA) model and support vector machine (SVM) model, good back propagation artificial neural network (BP-ANN) model obtained the best freshness classification with a 100 % accuracy rating based on the extracted data. The results confirm that the fabricated HMI system combined with multivariate algorithms has ability to evaluate the fresh degree of pork accurately in the microscopic level, which plays an important role in animal food quality control. PMID:29805285
Pugliese, Cara E; Kenworthy, Lauren; Bal, Vanessa Hus; Wallace, Gregory L; Yerys, Benjamin E; Maddox, Brenna B; White, Susan W; Popal, Haroon; Armour, Anna Chelsea; Miller, Judith; Herrington, John D; Schultz, Robert T; Martin, Alex; Anthony, Laura Gutermuth
2015-12-01
Recent updates have been proposed to the Autism Diagnostic Observation Schedule-2 Module 4 diagnostic algorithm. This new algorithm, however, has not yet been validated in an independent sample without intellectual disability (ID). This multi-site study compared the original and revised algorithms in individuals with ASD without ID. The revised algorithm demonstrated increased sensitivity, but lower specificity in the overall sample. Estimates were highest for females, individuals with a verbal IQ below 85 or above 115, and ages 16 and older. Best practice diagnostic procedures should include the Module 4 in conjunction with other assessment tools. Balancing needs for sensitivity and specificity depending on the purpose of assessment (e.g., clinical vs. research) and demographic characteristics mentioned above will enhance its utility.
NASA Technical Reports Server (NTRS)
Lawton, Pat
2004-01-01
The objective of this work was to support the design of improved IUE NEWSIPS high dispersion extraction algorithms. The purpose of this work was to evaluate use of the Linearized Image (LIHI) file versus the Re-Sampled Image (SIHI) file, evaluate various extraction, and design algorithms for evaluation of IUE High Dispersion spectra. It was concluded the use of the Re-Sampled Image (SIHI) file was acceptable. Since the Gaussian profile worked well for the core and the Lorentzian profile worked well for the wings, the Voigt profile was chosen for use in the extraction algorithm. It was found that the gamma and sigma parameters varied significantly across the detector, so gamma and sigma masks for the SWP detector were developed. Extraction code was written.
Annealed Importance Sampling for Neural Mass Models
Penny, Will; Sengupta, Biswa
2016-01-01
Neural Mass Models provide a compact description of the dynamical activity of cell populations in neocortical regions. Moreover, models of regional activity can be connected together into networks, and inferences made about the strength of connections, using M/EEG data and Bayesian inference. To date, however, Bayesian methods have been largely restricted to the Variational Laplace (VL) algorithm which assumes that the posterior distribution is Gaussian and finds model parameters that are only locally optimal. This paper explores the use of Annealed Importance Sampling (AIS) to address these restrictions. We implement AIS using proposals derived from Langevin Monte Carlo (LMC) which uses local gradient and curvature information for efficient exploration of parameter space. In terms of the estimation of Bayes factors, VL and AIS agree about which model is best but report different degrees of belief. Additionally, AIS finds better model parameters and we find evidence of non-Gaussianity in their posterior distribution. PMID:26942606
Importance-sampling computation of statistical properties of coupled oscillators
NASA Astrophysics Data System (ADS)
Gupta, Shamik; Leitão, Jorge C.; Altmann, Eduardo G.
2017-07-01
We introduce and implement an importance-sampling Monte Carlo algorithm to study systems of globally coupled oscillators. Our computational method efficiently obtains estimates of the tails of the distribution of various measures of dynamical trajectories corresponding to states occurring with (exponentially) small probabilities. We demonstrate the general validity of our results by applying the method to two contrasting cases: the driven-dissipative Kuramoto model, a paradigm in the study of spontaneous synchronization; and the conservative Hamiltonian mean-field model, a prototypical system of long-range interactions. We present results for the distribution of the finite-time Lyapunov exponent and a time-averaged order parameter. Among other features, our results show most notably that the distributions exhibit a vanishing standard deviation but a skewness that is increasing in magnitude with the number of oscillators, implying that nontrivial asymmetries and states yielding rare or atypical values of the observables persist even for a large number of oscillators.
Efficient method of image edge detection based on FSVM
NASA Astrophysics Data System (ADS)
Cai, Aiping; Xiong, Xiaomei
2013-07-01
For efficient object cover edge detection in digital images, this paper studied traditional methods and algorithm based on SVM. It analyzed Canny edge detection algorithm existed some pseudo-edge and poor anti-noise capability. In order to provide a reliable edge extraction method, propose a new detection algorithm based on FSVM. Which contains several steps: first, trains classify sample and gives the different membership function to different samples. Then, a new training sample is formed by increase the punishment some wrong sub-sample, and use the new FSVM classification model for train and test them. Finally the edges are extracted of the object image by using the model. Experimental result shows that good edge detection image will be obtained and adding noise experiments results show that this method has good anti-noise.
Detection of pneumonia using free-text radiology reports in the BioSense system.
Asatryan, Armenak; Benoit, Stephen; Ma, Haobo; English, Roseanne; Elkin, Peter; Tokars, Jerome
2011-01-01
Near real-time disease detection using electronic data sources is a public health priority. Detecting pneumonia is particularly important because it is the manifesting disease of several bioterrorism agents as well as a complication of influenza, including avian and novel H1N1 strains. Text radiology reports are available earlier than physician diagnoses and so could be integral to rapid detection of pneumonia. We performed a pilot study to determine which keywords present in text radiology reports are most highly associated with pneumonia diagnosis. Electronic radiology text reports from 11 hospitals from February 1, 2006 through December 31, 2007 were used. We created a computerized algorithm that searched for selected keywords ("airspace disease", "consolidation", "density", "infiltrate", "opacity", and "pneumonia"), differentiated between clinical history and radiographic findings, and accounted for negations and double negations; this algorithm was tested on a sample of 350 radiology reports. We used the algorithm to study 189,246 chest radiographs, searching for the keywords and determining their association with a final International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis of pneumonia. Performance of the search algorithm in finding keywords, and association of the keywords with a pneumonia diagnosis. In the sample of 350 radiographs, the search algorithm was highly successful in identifying the selected keywords (sensitivity 98.5%, specificity 100%). Analysis of the 189,246 radiographs showed that the keyword "pneumonia" was the strongest predictor of an ICD-9-CM diagnosis of pneumonia (adjusted odds ratio 11.8) while "density" was the weakest (adjusted odds ratio 1.5). In general, the most highly associated keyword present in the report, regardless of whether a less highly associated keyword was also present, was the best predictor of a diagnosis of pneumonia. Empirical methods may assist in finding radiology report keywords that are most highly predictive of a pneumonia diagnosis. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Computationally efficient algorithm for high sampling-frequency operation of active noise control
NASA Astrophysics Data System (ADS)
Rout, Nirmal Kumar; Das, Debi Prasad; Panda, Ganapati
2015-05-01
In high sampling-frequency operation of active noise control (ANC) system the length of the secondary path estimate and the ANC filter are very long. This increases the computational complexity of the conventional filtered-x least mean square (FXLMS) algorithm. To reduce the computational complexity of long order ANC system using FXLMS algorithm, frequency domain block ANC algorithms have been proposed in past. These full block frequency domain ANC algorithms are associated with some disadvantages such as large block delay, quantization error due to computation of large size transforms and implementation difficulties in existing low-end DSP hardware. To overcome these shortcomings, the partitioned block ANC algorithm is newly proposed where the long length filters in ANC are divided into a number of equal partitions and suitably assembled to perform the FXLMS algorithm in the frequency domain. The complexity of this proposed frequency domain partitioned block FXLMS (FPBFXLMS) algorithm is quite reduced compared to the conventional FXLMS algorithm. It is further reduced by merging one fast Fourier transform (FFT)-inverse fast Fourier transform (IFFT) combination to derive the reduced structure FPBFXLMS (RFPBFXLMS) algorithm. Computational complexity analysis for different orders of filter and partition size are presented. Systematic computer simulations are carried out for both the proposed partitioned block ANC algorithms to show its accuracy compared to the time domain FXLMS algorithm.
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-01-01
The increase and the complexity of data caused by the uncertain environment is today’s reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006–2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality. PMID:26569283
2D evaluation of spectral LIBS data derived from heterogeneous materials using cluster algorithm
NASA Astrophysics Data System (ADS)
Gottlieb, C.; Millar, S.; Grothe, S.; Wilsch, G.
2017-08-01
Laser-induced Breakdown Spectroscopy (LIBS) is capable of providing spatially resolved element maps in regard to the chemical composition of the sample. The evaluation of heterogeneous materials is often a challenging task, especially in the case of phase boundaries. In order to determine information about a certain phase of a material, the need for a method that offers an objective evaluation is necessary. This paper will introduce a cluster algorithm in the case of heterogeneous building materials (concrete) to separate the spectral information of non-relevant aggregates and cement matrix. In civil engineering, the information about the quantitative ingress of harmful species like Cl-, Na+ and SO42- is of great interest in the evaluation of the remaining lifetime of structures (Millar et al., 2015; Wilsch et al., 2005). These species trigger different damage processes such as the alkali-silica reaction (ASR) or the chloride-induced corrosion of the reinforcement. Therefore, a discrimination between the different phases, mainly cement matrix and aggregates, is highly important (Weritz et al., 2006). For the 2D evaluation, the expectation-maximization-algorithm (EM algorithm; Ester and Sander, 2000) has been tested for the application presented in this work. The method has been introduced and different figures of merit have been presented according to recommendations given in Haddad et al. (2014). Advantages of this method will be highlighted. After phase separation, non-relevant information can be excluded and only the wanted phase displayed. Using a set of samples with known and unknown composition, the EM-clustering method has been validated regarding to Gustavo González and Ángeles Herrador (2007).
EXhype: A tool for mineral classification using hyperspectral data
NASA Astrophysics Data System (ADS)
Adep, Ramesh Nityanand; shetty, Amba; Ramesh, H.
2017-02-01
Various supervised classification algorithms have been developed to classify earth surface features using hyperspectral data. Each algorithm is modelled based on different human expertises. However, the performance of conventional algorithms is not satisfactory to map especially the minerals in view of their typical spectral responses. This study introduces a new expert system named 'EXhype (Expert system for hyperspectral data classification)' to map minerals. The system incorporates human expertise at several stages of it's implementation: (i) to deal with intra-class variation; (ii) to identify absorption features; (iii) to discriminate spectra by considering absorption features, non-absorption features and by full spectra comparison; and (iv) finally takes a decision based on learning and by emphasizing most important features. It is developed using a knowledge base consisting of an Optimal Spectral Library, Segmented Upper Hull method, Spectral Angle Mapper (SAM) and Artificial Neural Network. The performance of the EXhype is compared with a traditional, most commonly used SAM algorithm using Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data acquired over Cuprite, Nevada, USA. A virtual verification method is used to collect samples information for accuracy assessment. Further, a modified accuracy assessment method is used to get a real users accuracies in cases where only limited or desired classes are considered for classification. With the modified accuracy assessment method, SAM and EXhype yields an overall accuracy of 60.35% and 90.75% and the kappa coefficient of 0.51 and 0.89 respectively. It was also found that the virtual verification method allows to use most desired stratified random sampling method and eliminates all the difficulties associated with it. The experimental results show that EXhype is not only producing better accuracy compared to traditional SAM but, can also rightly classify the minerals. It is proficient in avoiding misclassification between target classes when applied on minerals.
Accelerating IMRT optimization by voxel sampling
NASA Astrophysics Data System (ADS)
Martin, Benjamin C.; Bortfeld, Thomas R.; Castañon, David A.
2007-12-01
This paper presents a new method for accelerating intensity-modulated radiation therapy (IMRT) optimization using voxel sampling. Rather than calculating the dose to the entire patient at each step in the optimization, the dose is only calculated for some randomly selected voxels. Those voxels are then used to calculate estimates of the objective and gradient which are used in a randomized version of a steepest descent algorithm. By selecting different voxels on each step, we are able to find an optimal solution to the full problem. We also present an algorithm to automatically choose the best sampling rate for each structure within the patient during the optimization. Seeking further improvements, we experimented with several other gradient-based optimization algorithms and found that the delta-bar-delta algorithm performs well despite the randomness. Overall, we were able to achieve approximately an order of magnitude speedup on our test case as compared to steepest descent.
Differential sampling for fast frequency acquisition via adaptive extended least squares algorithm
NASA Technical Reports Server (NTRS)
Kumar, Rajendra
1987-01-01
This paper presents a differential signal model along with appropriate sampling techinques for least squares estimation of the frequency and frequency derivatives and possibly the phase and amplitude of a sinusoid received in the presence of noise. The proposed algorithm is recursive in mesurements and thus the computational requirement increases only linearly with the number of measurements. The dimension of the state vector in the proposed algorithm does not depend upon the number of measurements and is quite small, typically around four. This is an advantage when compared to previous algorithms wherein the dimension of the state vector increases monotonically with the product of the frequency uncertainty and the observation period. Such a computational simplification may possibly result in some loss of optimality. However, by applying the sampling techniques of the paper such a possible loss in optimality can made small.
Real time network traffic monitoring for wireless local area networks based on compressed sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza
2017-05-01
A wireless local area network (WLAN) is an important type of wireless networks which connotes different wireless nodes in a local area network. WLANs suffer from important problems such as network load balancing, large amount of energy, and load of sampling. This paper presents a new networking traffic approach based on Compressed Sensing (CS) for improving the quality of WLANs. The proposed architecture allows reducing Data Delay Probability (DDP) to 15%, which is a good record for WLANs. The proposed architecture is increased Data Throughput (DT) to 22 % and Signal to Noise (S/N) ratio to 17 %, which provide a good background for establishing high qualified local area networks. This architecture enables continuous data acquisition and compression of WLAN's signals that are suitable for a variety of other wireless networking applications. At the transmitter side of each wireless node, an analog-CS framework is applied at the sensing step before analog to digital converter in order to generate the compressed version of the input signal. At the receiver side of wireless node, a reconstruction algorithm is applied in order to reconstruct the original signals from the compressed signals with high probability and enough accuracy. The proposed algorithm out-performs existing algorithms by achieving a good level of Quality of Service (QoS). This ability allows reducing 15 % of Bit Error Rate (BER) at each wireless node.
Implementation of Wi-Fi Signal Sampling on an Android Smartphone for Indoor Positioning Systems.
Liu, Hung-Huan; Liu, Chun
2017-12-21
Collecting and maintaining radio fingerprint for wireless indoor positioning systems involves considerable time and labor. We have proposed the quick radio fingerprint collection (QRFC) algorithm which employed the built-in accelerometer of Android smartphones to implement step detection in order to assist in collecting radio fingerprints. In the present study, we divided the algorithm into moving sampling (MS) and stepped MS (SMS), and describe the implementation of both algorithms and their comparison. Technical details and common errors concerning the use of Android smartphones to collect Wi-Fi radio beacons were surveyed and discussed. The results of signal sampling experiments performed in a hallway measuring 54 m in length showed that in terms of the amount of time required to complete collection of access point (AP) signals, static sampling (SS; a traditional procedure for collecting Wi-Fi signals) took at least 2 h, whereas MS and SMS took approximately 150 and 300 s, respectively. Notably, AP signals obtained through MS and SMS were comparable to those obtained through SS in terms of the distribution of received signal strength indicator (RSSI) and positioning accuracy. Therefore, MS and SMS are recommended instead of SS as signal sampling procedures for indoor positioning algorithms.
Implementation of Wi-Fi Signal Sampling on an Android Smartphone for Indoor Positioning Systems
Liu, Chun
2017-01-01
Collecting and maintaining radio fingerprint for wireless indoor positioning systems involves considerable time and labor. We have proposed the quick radio fingerprint collection (QRFC) algorithm which employed the built-in accelerometer of Android smartphones to implement step detection in order to assist in collecting radio fingerprints. In the present study, we divided the algorithm into moving sampling (MS) and stepped MS (SMS), and describe the implementation of both algorithms and their comparison. Technical details and common errors concerning the use of Android smartphones to collect Wi-Fi radio beacons were surveyed and discussed. The results of signal sampling experiments performed in a hallway measuring 54 m in length showed that in terms of the amount of time required to complete collection of access point (AP) signals, static sampling (SS; a traditional procedure for collecting Wi-Fi signals) took at least 2 h, whereas MS and SMS took approximately 150 and 300 s, respectively. Notably, AP signals obtained through MS and SMS were comparable to those obtained through SS in terms of the distribution of received signal strength indicator (RSSI) and positioning accuracy. Therefore, MS and SMS are recommended instead of SS as signal sampling procedures for indoor positioning algorithms. PMID:29267234
Geostatistical modeling of riparian forest microclimate and its implications for sampling
Eskelson, B.N.I.; Anderson, P.D.; Hagar, J.C.; Temesgen, H.
2011-01-01
Predictive models of microclimate under various site conditions in forested headwater stream - riparian areas are poorly developed, and sampling designs for characterizing underlying riparian microclimate gradients are sparse. We used riparian microclimate data collected at eight headwater streams in the Oregon Coast Range to compare ordinary kriging (OK), universal kriging (UK), and kriging with external drift (KED) for point prediction of mean maximum air temperature (Tair). Several topographic and forest structure characteristics were considered as site-specific parameters. Height above stream and distance to stream were the most important covariates in the KED models, which outperformed OK and UK in terms of root mean square error. Sample patterns were optimized based on the kriging variance and the weighted means of shortest distance criterion using the simulated annealing algorithm. The optimized sample patterns outperformed systematic sample patterns in terms of mean kriging variance mainly for small sample sizes. These findings suggest methods for increasing efficiency of microclimate monitoring in riparian areas.
Ding, Pan; Gong, Xue-Qing
2016-05-01
Titanium dioxide (TiO2) is an important metal oxide that has been used in many different applications. TiO2 has also been widely employed as a model system to study basic processes and reactions in surface chemistry and heterogeneous catalysis. In this work, we investigated the (011) surface of rutile TiO2 by focusing on its reconstruction. Density functional theory calculations aided by a genetic algorithm based optimization scheme were performed to extensively sample the potential energy surfaces of reconstructed rutile TiO2 structures that obey (2 × 1) periodicity. A lot of stable surface configurations were located, including the global-minimum configuration that was proposed previously. The wide variety of surface structures determined through the calculations performed in this work provide insight into the relationship between the atomic configuration of a surface and its stability. More importantly, several analytical schemes were proposed and tested to gauge the differences and similarities among various surface structures, aiding the construction of the complete pathway for the reconstruction process.
Progressive sample processing of band selection for hyperspectral imagery
NASA Astrophysics Data System (ADS)
Liu, Keng-Hao; Chien, Hung-Chang; Chen, Shih-Yu
2017-10-01
Band selection (BS) is one of the most important topics in hyperspectral image (HSI) processing. The objective of BS is to find a set of representative bands that can represent the whole image with lower inter-band redundancy. Many types of BS algorithms were proposed in the past. However, most of them can be carried on in an off-line manner. It means that they can only be implemented on the pre-collected data. Those off-line based methods are sometime useless for those applications that are timeliness, particular in disaster prevention and target detection. To tackle this issue, a new concept, called progressive sample processing (PSP), was proposed recently. The PSP is an "on-line" framework where the specific type of algorithm can process the currently collected data during the data transmission under band-interleavedby-sample/pixel (BIS/BIP) protocol. This paper proposes an online BS method that integrates a sparse-based BS into PSP framework, called PSP-BS. In PSP-BS, the BS can be carried out by updating BS result recursively pixel by pixel in the same way that a Kalman filter does for updating data information in a recursive fashion. The sparse regression is solved by orthogonal matching pursuit (OMP) algorithm, and the recursive equations of PSP-BS are derived by using matrix decomposition. The experiments conducted on a real hyperspectral image show that the PSP-BS can progressively output the BS status with very low computing time. The convergence of BS results during the transmission can be quickly achieved by using a rearranged pixel transmission sequence. This significant advantage allows BS to be implemented in a real time manner when the HSI data is transmitted pixel by pixel.
Chaibub Neto, Elias; Bare, J. Christopher; Margolin, Adam A.
2014-01-01
New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often times, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where “omics” features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting and our simulations corroborate well established results concerning the conditions under which each one of these methods is expected to perform best while providing several novel insights. PMID:25289666
NASA Astrophysics Data System (ADS)
Abedini, M. J.; Nasseri, M.; Burn, D. H.
2012-04-01
In any geostatistical study, an important consideration is the choice of an appropriate, repeatable, and objective search strategy that controls the nearby samples to be included in the location-specific estimation procedure. Almost all geostatistical software available in the market puts the onus on the user to supply search strategy parameters in a heuristic manner. These parameters are solely controlled by geographical coordinates that are defined for the entire area under study, and the user has no guidance as to how to choose these parameters. The main thesis of the current study is that the selection of search strategy parameters has to be driven by data—both the spatial coordinates and the sample values—and cannot be chosen beforehand. For this purpose, a genetic-algorithm-based ordinary kriging with moving neighborhood technique is proposed. The search capability of a genetic algorithm is exploited to search the feature space for appropriate, either local or global, search strategy parameters. Radius of circle/sphere and/or radii of standard or rotated ellipse/ellipsoid are considered as the decision variables to be optimized by GA. The superiority of GA-based ordinary kriging is demonstrated through application to the Wolfcamp Aquifer piezometric head data. Assessment of numerical results showed that definition of search strategy parameters based on both geographical coordinates and sample values improves cross-validation statistics when compared with that based on geographical coordinates alone. In the case of a variable search neighborhood for each estimation point, optimization of local search strategy parameters for an elliptical support domain—the orientation of which is dictated by anisotropic axes—via GA was able to capture the dynamics of piezometric head in west Texas/New Mexico in an efficient way.
de Godoy, Luiz Antonio Fonseca; Hantao, Leandro Wang; Pedroso, Marcio Pozzobon; Poppi, Ronei Jesus; Augusto, Fabio
2011-08-05
The use of multivariate curve resolution (MCR) to build multivariate quantitative models using data obtained from comprehensive two-dimensional gas chromatography with flame ionization detection (GC×GC-FID) is presented and evaluated. The MCR algorithm presents some important features, such as second order advantage and the recovery of the instrumental response for each pure component after optimization by an alternating least squares (ALS) procedure. A model to quantify the essential oil of rosemary was built using a calibration set containing only known concentrations of the essential oil and cereal alcohol as solvent. A calibration curve correlating the concentration of the essential oil of rosemary and the instrumental response obtained from the MCR-ALS algorithm was obtained, and this calibration model was applied to predict the concentration of the oil in complex samples (mixtures of the essential oil, pineapple essence and commercial perfume). The values of the root mean square error of prediction (RMSEP) and of the root mean square error of the percentage deviation (RMSPD) obtained were 0.4% (v/v) and 7.2%, respectively. Additionally, a second model was built and used to evaluate the accuracy of the method. A model to quantify the essential oil of lemon grass was built and its concentration was predicted in the validation set and real perfume samples. The RMSEP and RMSPD obtained were 0.5% (v/v) and 6.9%, respectively, and the concentration of the essential oil of lemon grass in perfume agreed to the value informed by the manufacturer. The result indicates that the MCR algorithm is adequate to resolve the target chromatogram from the complex sample and to build multivariate models of GC×GC-FID data. Copyright © 2011 Elsevier B.V. All rights reserved.
Nalbantoglu, Sinem; Abu-Asab, Mones; Tan, Ming; Zhang, Xuemin; Cai, Ling; Amri, Hakima
2016-07-01
Pancreatic ductal adenocarcinoma (PDAC) is one of the rapidly growing forms of pancreatic cancer with a poor prognosis and less than 5% 5-year survival rate. In this study, we characterized the genetic signatures and signaling pathways related to survival from PDAC, using a parsimony phylogenetic algorithm. We applied the parsimony phylogenetic algorithm to analyze the publicly available whole-genome in silico array analysis of a gene expression data set in 25 early-stage human PDAC specimens. We explain here that the parsimony phylogenetics is an evolutionary analytical method that offers important promise to uncover clonal (driver) and nonclonal (passenger) aberrations in complex diseases. In our analysis, parsimony and statistical analyses did not identify significant correlations between survival times and gene expression values. Thus, the survival rankings did not appear to be significantly different between patients for any specific gene (p > 0.05). Also, we did not find correlation between gene expression data and tumor stage in the present data set. While the present analysis was unable to identify in this relatively small sample of patients a molecular signature associated with pancreatic cancer prognosis, we suggest that future research and analyses with the parsimony phylogenetic algorithm in larger patient samples are worthwhile, given the devastating nature of pancreatic cancer and its early diagnosis, and the need for novel data analytic approaches. The future research practices might want to place greater emphasis on phylogenetics as one of the analytical paradigms, as our findings presented here are on the cusp of this shift, especially in the current era of Big Data and innovation policies advocating for greater data sharing and reanalysis.
Kramers-Kronig receiver operable without digital upsampling.
Bo, Tianwai; Kim, Hoon
2018-05-28
The Kramers-Kronig (KK) receiver is capable of retrieving the phase information of optical single-sideband (SSB) signal from the optical intensity when the optical signal satisfies the minimum phase condition. Thus, it is possible to direct-detect the optical SSB signal without suffering from the signal-signal beat interference and linear transmission impairments. However, due to the spectral broadening induced by nonlinear operations in the conventional KK algorithm, it is necessary to employ the digital upsampling at the beginning of the digital signal processing (DSP). The increased number of samples at the DSP would hinder the real-time implementation of this attractive receiver. Hence, we propose a new DSP algorithm for KK receiver operable at 2 samples per symbol. We adopt a couple of mathematical approximations to avoid the use of nonlinear operations such as logarithm and exponential functions. By using the proposed algorithm, we demonstrate the transmission of 112-Gb/s SSB orthogonal frequency-division-multiplexed signal over an 80-km fiber link. The results show that the proposed algorithm operating at 2 samples per symbol exhibits similar performance to the conventional KK one operating at 6 samples per symbol. We also present the error analysis of the proposed algorithm for KK receiver in comparison with the conventional one.
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
NASA Astrophysics Data System (ADS)
Karimi, Hamed; Rosenberg, Gili; Katzgraber, Helmut G.
2017-10-01
We present and apply a general-purpose, multistart algorithm for improving the performance of low-energy samplers used for solving optimization problems. The algorithm iteratively fixes the value of a large portion of the variables to values that have a high probability of being optimal. The resulting problems are smaller and less connected, and samplers tend to give better low-energy samples for these problems. The algorithm is trivially parallelizable since each start in the multistart algorithm is independent, and could be applied to any heuristic solver that can be run multiple times to give a sample. We present results for several classes of hard problems solved using simulated annealing, path-integral quantum Monte Carlo, parallel tempering with isoenergetic cluster moves, and a quantum annealer, and show that the success metrics and the scaling are improved substantially. When combined with this algorithm, the quantum annealer's scaling was substantially improved for native Chimera graph problems. In addition, with this algorithm the scaling of the time to solution of the quantum annealer is comparable to the Hamze-de Freitas-Selby algorithm on the weak-strong cluster problems introduced by Boixo et al. Parallel tempering with isoenergetic cluster moves was able to consistently solve three-dimensional spin glass problems with 8000 variables when combined with our method, whereas without our method it could not solve any.
Zunder, Eli R.; Finck, Rachel; Behbehani, Gregory K.; Amir, El-ad D.; Krishnaswamy, Smita; Gonzalez, Veronica D.; Lorang, Cynthia G.; Bjornson, Zach; Spitzer, Matthew H.; Bodenmiller, Bernd; Fantl, Wendy J.; Pe’er, Dana; Nolan, Garry P.
2015-01-01
SUMMARY Mass-tag cell barcoding (MCB) labels individual cell samples with unique combinatorial barcodes, after which they are pooled for processing and measurement as a single multiplexed sample. The MCB method eliminates variability between samples in antibody staining and instrument sensitivity, reduces antibody consumption, and shortens instrument measurement time. Here, we present an optimized MCB protocol with several improvements over previously described methods. The use of palladium-based labeling reagents expands the number of measurement channels available for mass cytometry and reduces interference with lanthanide-based antibody measurement. An error-detecting combinatorial barcoding scheme allows cell doublets to be identified and removed from the analysis. A debarcoding algorithm that is single cell-based rather than population-based improves the accuracy and efficiency of sample deconvolution. This debarcoding algorithm has been packaged into software that allows rapid and unbiased sample deconvolution. The MCB procedure takes 3–4 h, not including sample acquisition time of ~1 h per million cells. PMID:25612231
NASA Astrophysics Data System (ADS)
Guevara Hidalgo, Esteban; Nemoto, Takahiro; Lecomte, Vivien
Rare trajectories of stochastic systems are important to understand because of their potential impact. However, their properties are by definition difficult to sample directly. Population dynamics provide a numerical tool allowing their study, by means of simulating a large number of copies of the system, which are subjected to a selection rule that favors the rare trajectories of interest. However, such algorithms are plagued by finite simulation time- and finite population size- effects that can render their use delicate. Using the continuous-time cloning algorithm, we analyze the finite-time and finite-size scalings of estimators of the large deviation functions associated to the distribution of the rare trajectories. We use these scalings in order to propose a numerical approach which allows to extract the infinite-time and infinite-size limit of these estimators.
Decorating surfaces with bidirectional texture functions.
Zhou, Kun; Du, Peng; Wang, Lifeng; Matsushita, Yasuyuki; Shi, Jiaoying; Guo, Baining; Shum, Heung-Yeung
2005-01-01
We present a system for decorating arbitrary surfaces with bidirectional texture functions (BTF). Our system generates BTFs in two steps. First, we automatically synthesize a BTF over the target surface from a given BTF sample. Then, we let the user interactively paint BTF patches onto the surface such that the painted patches seamlessly integrate with the background patterns. Our system is based on a patch-based texture synthesis approach known as quilting. We present a graphcut algorithm for BTF synthesis on surfaces and the algorithm works well for a wide variety of BTF samples, including those which present problems for existing algorithms. We also describe a graphcut texture painting algorithm for creating new surface imperfections (e.g., dirt, cracks, scratches) from existing imperfections found in input BTF samples. Using these algorithms, we can decorate surfaces with real-world textures that have spatially-variant reflectance, fine-scale geometry details, and surfaces imperfections. A particularly attractive feature of BTF painting is that it allows us to capture imperfections of real materials and paint them onto geometry models. We demonstrate the effectiveness of our system with examples.
Eller, Leigh A; Eller, Michael A; Ouma, Benson J; Kataaha, Peter; Bagaya, Bernard S; Olemukan, Robert L; Erima, Simon; Kawala, Lilian; de Souza, Mark S; Kibuuka, Hannah; Wabwire-Mangen, Fred; Peel, Sheila A; O'Connell, Robert J; Robb, Merlin L; Michael, Nelson L
2007-10-01
The use of rapid tests for human immunodeficiency virus (HIV) has become standard in HIV testing algorithms employed in resource-limited settings. We report an extensive HIV rapid test validation study conducted among Ugandan blood bank donors at low risk for HIV infection. The operational characteristics of four readily available commercial HIV rapid test kits were first determined with 940 donor samples and were used to select a serial testing algorithm. Uni-Gold Recombigen HIV was used as the screening test, followed by HIV-1/2 STAT-PAK for reactive samples. OraQuick HIV-1 testing was performed if the first two test results were discordant. This algorithm was then tested with 5,252 blood donor samples, and the results were compared to those of enzyme immunoassays (EIAs) and Western blotting. The unadjusted algorithm sensitivity and specificity were 98.6 and 99.9%, respectively. The adjusted sensitivity and specificity were 100 and 99.96%, respectively. This HIV testing algorithm is a suitable alternative to EIAs and Western blotting for Ugandan blood donors.
Note: A pure-sampling quantum Monte Carlo algorithm with independent Metropolis.
Vrbik, Jan; Ospadov, Egor; Rothstein, Stuart M
2016-07-14
Recently, Ospadov and Rothstein published a pure-sampling quantum Monte Carlo algorithm (PSQMC) that features an auxiliary Path Z that connects the midpoints of the current and proposed Paths X and Y, respectively. When sufficiently long, Path Z provides statistical independence of Paths X and Y. Under those conditions, the Metropolis decision used in PSQMC is done without any approximation, i.e., not requiring microscopic reversibility and without having to introduce any G(x → x'; τ) factors into its decision function. This is a unique feature that contrasts with all competing reptation algorithms in the literature. An example illustrates that dependence of Paths X and Y has adverse consequences for pure sampling.
Note: A pure-sampling quantum Monte Carlo algorithm with independent Metropolis
NASA Astrophysics Data System (ADS)
Vrbik, Jan; Ospadov, Egor; Rothstein, Stuart M.
2016-07-01
Recently, Ospadov and Rothstein published a pure-sampling quantum Monte Carlo algorithm (PSQMC) that features an auxiliary Path Z that connects the midpoints of the current and proposed Paths X and Y, respectively. When sufficiently long, Path Z provides statistical independence of Paths X and Y. Under those conditions, the Metropolis decision used in PSQMC is done without any approximation, i.e., not requiring microscopic reversibility and without having to introduce any G(x → x'; τ) factors into its decision function. This is a unique feature that contrasts with all competing reptation algorithms in the literature. An example illustrates that dependence of Paths X and Y has adverse consequences for pure sampling.
NASA Astrophysics Data System (ADS)
Oda, Hirokuni; Xuan, Chuang
2014-10-01
development of pass-through superconducting rock magnetometers (SRM) has greatly promoted collection of paleomagnetic data from continuous long-core samples. The output of pass-through measurement is smoothed and distorted due to convolution of magnetization with the magnetometer sensor response. Although several studies could restore high-resolution paleomagnetic signal through deconvolution of pass-through measurement, difficulties in accurately measuring the magnetometer sensor response have hindered the application of deconvolution. We acquired reliable sensor response of an SRM at the Oregon State University based on repeated measurements of a precisely fabricated magnetic point source. In addition, we present an improved deconvolution algorithm based on Akaike's Bayesian Information Criterion (ABIC) minimization, incorporating new parameters to account for errors in sample measurement position and length. The new algorithm was tested using synthetic data constructed by convolving "true" paleomagnetic signal containing an "excursion" with the sensor response. Realistic noise was added to the synthetic measurement using Monte Carlo method based on measurement noise distribution acquired from 200 repeated measurements of a u-channel sample. Deconvolution of 1000 synthetic measurements with realistic noise closely resembles the "true" magnetization, and successfully restored fine-scale magnetization variations including the "excursion." Our analyses show that inaccuracy in sample measurement position and length significantly affects deconvolution estimation, and can be resolved using the new deconvolution algorithm. Optimized deconvolution of 20 repeated measurements of a u-channel sample yielded highly consistent deconvolution results and estimates of error in sample measurement position and length, demonstrating the reliability of the new deconvolution algorithm for real pass-through measurements.
Distributed autonomous systems: resource management, planning, and control algorithms
NASA Astrophysics Data System (ADS)
Smith, James F., III; Nguyen, ThanhVu H.
2005-05-01
Distributed autonomous systems, i.e., systems that have separated distributed components, each of which, exhibit some degree of autonomy are increasingly providing solutions to naval and other DoD problems. Recently developed control, planning and resource allocation algorithms for two types of distributed autonomous systems will be discussed. The first distributed autonomous system (DAS) to be discussed consists of a collection of unmanned aerial vehicles (UAVs) that are under fuzzy logic control. The UAVs fly and conduct meteorological sampling in a coordinated fashion determined by their fuzzy logic controllers to determine the atmospheric index of refraction. Once in flight no human intervention is required. A fuzzy planning algorithm determines the optimal trajectory, sampling rate and pattern for the UAVs and an interferometer platform while taking into account risk, reliability, priority for sampling in certain regions, fuel limitations, mission cost, and related uncertainties. The real-time fuzzy control algorithm running on each UAV will give the UAV limited autonomy allowing it to change course immediately without consulting with any commander, request other UAVs to help it, alter its sampling pattern and rate when observing interesting phenomena, or to terminate the mission and return to base. The algorithms developed will be compared to a resource manager (RM) developed for another DAS problem related to electronic attack (EA). This RM is based on fuzzy logic and optimized by evolutionary algorithms. It allows a group of dissimilar platforms to use EA resources distributed throughout the group. For both DAS types significant theoretical and simulation results will be presented.
Sample-Based Motion Planning in High-Dimensional and Differentially-Constrained Systems
2010-02-01
Reachable Set . . . 88 6-1 LittleDog Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 6-2 Dog bounding up stairs ...planning algorithm implemented on LittleDog, a quadruped robot . The motion planning algorithm successfully planned bounding trajectories over extremely...a motion planning algorithm implemented on LittleDog, a quadruped robot . The motion planning algorithm successfully planned bounding trajectories
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques by utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression from a large cross-sectional community based population epidemiological study. A multi-staged methodology utilising ML and traditional statistical techniques was performed using the community based population National Health and Nutrition Examination Study (2009-2010) (N = 3,922). A Self-organised Mapping (SOM) ML algorithm, combined with hierarchical clustering, was performed to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was used to then identify the key clusters of participants with higher levels of depression (PHQ-9≥10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Five clusters of participants, based on medical symptoms, were identified to have significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms was found to dominate the relative importance of symptoms within the five key clusters. This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
NASA Astrophysics Data System (ADS)
Sitko, Rafał
2008-11-01
Knowledge of X-ray tube spectral distribution is necessary in theoretical methods of matrix correction, i.e. in both fundamental parameter (FP) methods and theoretical influence coefficient algorithms. Thus, the influence of X-ray tube distribution on the accuracy of the analysis of thin films and bulk samples is presented. The calculations are performed using experimental X-ray tube spectra taken from the literature and theoretical X-ray tube spectra evaluated by three different algorithms proposed by Pella et al. (X-Ray Spectrom. 14 (1985) 125-135), Ebel (X-Ray Spectrom. 28 (1999) 255-266), and Finkelshtein and Pavlova (X-Ray Spectrom. 28 (1999) 27-32). In this study, Fe-Cr-Ni system is selected as an example and the calculations are performed for X-ray tubes commonly applied in X-ray fluorescence analysis (XRF), i.e., Cr, Mo, Rh and W. The influence of X-ray tube spectra on FP analysis is evaluated when quantification is performed using various types of calibration samples. FP analysis of bulk samples is performed using pure-element bulk standards and multielement bulk standards similar to the analyzed material, whereas for FP analysis of thin films, the bulk and thin pure-element standards are used. For the evaluation of the influence of X-ray tube spectra on XRF analysis performed by theoretical influence coefficient methods, two algorithms for bulk samples are selected, i.e. Claisse-Quintin (Can. Spectrosc. 12 (1967) 129-134) and COLA algorithms (G.R. Lachance, Paper Presented at the International Conference on Industrial Inorganic Elemental Analysis, Metz, France, June 3, 1981) and two algorithms (constant and linear coefficients) for thin films recently proposed by Sitko (X-Ray Spectrom. 37 (2008) 265-272).
Michel, Yvonne Anne; Augestad, Liv Ariane; Rand, Kim
2018-04-01
The 15D is a generic preference-based health-related quality-of-life instrument developed in Finland. Values for the 15D instrument are estimated by combining responses to three distinct valuation tasks. The impact of how these tasks are combined is relatively unexplored. To compare 15D valuation studies conducted in Norway and Finland in terms of scores assigned in the valuation tasks and resulting value algorithms, and to discuss the contributions of each task and the algorithm estimation procedure to observed differences. Norwegian and Finnish scores from the three valuation tasks were compared using independent samples t tests and Lin concordance correlation coefficients. Covariance between tasks was assessed using Pearson product-moment correlations. Norwegian and Finnish value algorithms were compared using concordance correlation coefficients, total ranges, and ranges for individual dimensions. Observed differences were assessed using minimal important difference. Mean scores in the main valuation task were strikingly similar between the two countries, whereas the final value algorithms were less similar. The largest differences between Norway and Finland were observed for depression, vision, and mental function. 15D algorithms are a product of combining scores from three valuation tasks by use of methods involving multiplication. This procedure used to combine scores from the three tasks by multiplication serves to amplify variance from each task. From relatively similar responses in Norway and Finland, diverging value algorithms are created. We propose to simplify the 15D algorithm estimation procedure by using only one of the valuation tasks. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Development of an Evolutionary Algorithm for the ab Initio Discovery of Two-Dimensional Materials
NASA Astrophysics Data System (ADS)
Revard, Benjamin Charles
Crystal structure prediction is an important first step on the path toward computational materials design. Increasingly robust methods have become available in recent years for computing many materials properties, but because properties are largely a function of crystal structure, the structure must be known before these methods can be brought to bear. In addition, structure prediction is particularly useful for identifying low-energy structures of subperiodic materials, such as two-dimensional (2D) materials, which may adopt unexpected structures that differ from those of the corresponding bulk phases. Evolutionary algorithms, which are heuristics for global optimization inspired by biological evolution, have proven to be a fruitful approach for tackling the problem of crystal structure prediction. This thesis describes the development of an improved evolutionary algorithm for structure prediction and several applications of the algorithm to predict the structures of novel low-energy 2D materials. The first part of this thesis contains an overview of evolutionary algorithms for crystal structure prediction and presents our implementation, including details of extending the algorithm to search for clusters, wires, and 2D materials, improvements to efficiency when running in parallel, improved composition space sampling, and the ability to search for partial phase diagrams. We then present several applications of the evolutionary algorithm to 2D systems, including InP, the C-Si and Sn-S phase diagrams, and several group-IV dioxides. This thesis makes use of the Cornell graduate school's "papers" option. Chapters 1 and 3 correspond to the first-author publications of Refs. [131] and [132], respectively, and chapter 2 will soon be submitted as a first-author publication. The material in chapter 4 is taken from Ref. [144], in which I share joint first-authorship. In this case I have included only my own contributions.
NASA Astrophysics Data System (ADS)
Lyu, Heng; Wang, Yannan; Jin, Qi; Shi, Lei; Li, Yunmei; Wang, Qiao
2017-10-01
Particulate organic carbon (POC) plays an important role in the carbon cycle in water due to its biological pump process. In the open ocean, algorithms can accurately estimate the surface POC concentration. However, no suitable POC-estimation algorithm based on MERIS bands is available for inland turbid eutrophic water. A total of 228 field samples were collected from Lake Taihu in different seasons between 2013 and 2015. At each site, the optical parameters and water quality were analyzed. Using in situ data, it was found that POC-estimation algorithms developed for the open ocean and coastal waters using remote sensing reflectance were not suitable for inland turbid eutrophic water. The organic suspended matter (OSM) concentration was found to be the best indicator of the POC concentration, and POC has an exponential relationship with the OSM concentration. Through an analysis of the POC concentration and optical parameters, it was found that the absorption peak of total suspended matter (TSM) at 665 nm was the optimum parameter to estimate POC. As a result, MERIS band 7, MERIS band 10 and MERIS band 12 were used to derive the absorption coefficient of TSM at 665 nm, and then, a semi-analytical algorithm was used to estimate the POC concentration for inland turbid eutrophic water. An accuracy assessment showed that the developed semi-analytical algorithm could be successfully applied with a MAPE of 31.82% and RMSE of 2.68 mg/L. The developed algorithm was successfully applied to a MERIS image, and two full-resolution MERIS images, acquired on August 13, 2010, and December 7, 2010, were used to map the POC spatial distribution in Lake Taihu in summer and winter.
Analysis of sequencing and scheduling methods for arrival traffic
NASA Technical Reports Server (NTRS)
Neuman, Frank; Erzberger, Heinz
1990-01-01
The air traffic control subsystem that performs scheduling is discussed. The function of the scheduling algorithms is to plan automatically the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several important scheduling algorithms are described and the statistical performance of the scheduling algorithms is examined. Scheduling brings order to an arrival sequence for aircraft. First-come-first-served scheduling (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the traffic, gaps will remain in the scheduled sequence of aircraft. These gaps are filled, or partially filled, by time-advancing the leading aircraft after a gap while still preserving the FCFS order. Tightly scheduled groups of aircraft remain with a mix of heavy and large aircraft. Separation requirements differ for different types of aircraft trailing each other. Advantage is taken of this fact through mild reordering of the traffic, thus shortening the groups and reducing average delays. Actual delays for different samples with the same statistical parameters vary widely, especially for heavy traffic.
Pan, Yongke; Niu, Wenjia
2017-01-01
Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm, which can easily resolve the out-of-sample problem. Relative works usually focus on the geometric relationships of data points, which are not obvious, to enhance the performance of SDA. Different from these relative works, the regularized graph construction is researched here, which is important in the graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, which is called combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the LR feature space and then the kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor, which can largely outperform other commonly used baselines. PMID:28316616
KIRMES: kernel-based identification of regulatory modules in euchromatic sequences.
Schultheiss, Sebastian J; Busch, Wolfgang; Lohmann, Jan U; Kohlbacher, Oliver; Rätsch, Gunnar
2009-08-15
Understanding transcriptional regulation is one of the main challenges in computational biology. An important problem is the identification of transcription factor (TF) binding sites in promoter regions of potential TF target genes. It is typically approached by position weight matrix-based motif identification algorithms using Gibbs sampling, or heuristics to extend seed oligos. Such algorithms succeed in identifying single, relatively well-conserved binding sites, but tend to fail when it comes to the identification of combinations of several degenerate binding sites, as those often found in cis-regulatory modules. We propose a new algorithm that combines the benefits of existing motif finding with the ones of support vector machines (SVMs) to find degenerate motifs in order to improve the modeling of regulatory modules. In experiments on microarray data from Arabidopsis thaliana, we were able to show that the newly developed strategy significantly improves the recognition of TF targets. The python source code (open source-licensed under GPL), the data for the experiments and a Galaxy-based web service are available at http://www.fml.mpg.de/raetsch/suppl/kirmes/.
A ground truth based comparative study on clustering of gene expression data.
Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue
2008-05-01
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
[Multi-mathematical modelings for compatibility optimization of Jiangzhi granules].
Yang, Ming; Zhang, Li; Ge, Yingli; Lu, Yanliu; Ji, Guang
2011-12-01
To investigate into the method of "multi activity index evaluation and combination optimized of mult-component" for Chinese herbal formulas. According to the scheme of uniform experimental design, efficacy experiment, multi index evaluation, least absolute shrinkage, selection operator (LASSO) modeling, evolutionary optimization algorithm, validation experiment, we optimized the combination of Jiangzhi granules based on the activity indexes of blood serum ALT, ALT, AST, TG, TC, HDL, LDL and TG level of liver tissues, ratio of liver tissue to body. Analytic hierarchy process (AHP) combining with criteria importance through intercriteria correlation (CRITIC) for multi activity index evaluation was more reasonable and objective, it reflected the information of activity index's order and objective sample data. LASSO algorithm modeling could accurately reflect the relationship between different combination of Jiangzhi granule and the activity comprehensive indexes. The optimized combination of Jiangzhi granule showed better values of the activity comprehensive indexed than the original formula after the validation experiment. AHP combining with CRITIC can be used for multi activity index evaluation and LASSO algorithm, it is suitable for combination optimized of Chinese herbal formulas.
Brouwer, Darren H
2013-01-01
An algorithm is presented for solving the structures of silicate network materials such as zeolites or layered silicates from solid-state (29)Si double-quantum NMR data for situations in which the crystallographic space group is not known. The algorithm is explained and illustrated in detail using a hypothetical two-dimensional network structure as a working example. The algorithm involves an atom-by-atom structure building process in which candidate partial structures are evaluated according to their agreement with Si-O-Si connectivity information, symmetry restraints, and fits to (29)Si double quantum NMR curves followed by minimization of a cost function that incorporates connectivity, symmetry, and quality of fit to the double quantum curves. The two-dimensional network material is successfully reconstructed from hypothetical NMR data that can be reasonably expected to be obtained for real samples. This advance in "NMR crystallography" is expected to be important for structure determination of partially ordered silicate materials for which diffraction provides very limited structural information. Copyright © 2013 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swesty, F. Douglas; Myra, Eric S.
It is now generally agreed that multidimensional, multigroup, neutrino-radiation hydrodynamics (RHD) is an indispensable element of any realistic model of stellar-core collapse, core-collapse supernovae, and proto-neutron star instabilities. We have developed a new, two-dimensional, multigroup algorithm that can model neutrino-RHD flows in core-collapse supernovae. Our algorithm uses an approach similar to the ZEUS family of algorithms, originally developed by Stone and Norman. However, this completely new implementation extends that previous work in three significant ways: first, we incorporate multispecies, multigroup RHD in a flux-limited-diffusion approximation. Our approach is capable of modeling pair-coupled neutrino-RHD, and includes effects of Pauli blocking inmore » the collision integrals. Blocking gives rise to nonlinearities in the discretized radiation-transport equations, which we evolve implicitly in time. We employ parallelized Newton-Krylov methods to obtain a solution of these nonlinear, implicit equations. Our second major extension to the ZEUS algorithm is the inclusion of an electron conservation equation that describes the evolution of electron-number density in the hydrodynamic flow. This permits calculating deleptonization of a stellar core. Our third extension modifies the hydrodynamics algorithm to accommodate realistic, complex equations of state, including those having nonconvex behavior. In this paper, we present a description of our complete algorithm, giving sufficient details to allow others to implement, reproduce, and extend our work. Finite-differencing details are presented in appendices. We also discuss implementation of this algorithm on state-of-the-art, parallel-computing architectures. Finally, we present results of verification tests that demonstrate the numerical accuracy of this algorithm on diverse hydrodynamic, gravitational, radiation-transport, and RHD sample problems. We believe our methods to be of general use in a variety of model settings where radiation transport or RHD is important. Extension of this work to three spatial dimensions is straightforward.« less
Digital sorting of complex tissues for cell type-specific gene expression profiles.
Zhong, Yi; Wan, Ying-Wooi; Pang, Kaifang; Chow, Lionel M L; Liu, Zhandong
2013-03-07
Cellular heterogeneity is present in almost all gene expression profiles. However, transcriptome analysis of tissue specimens often ignores the cellular heterogeneity present in these samples. Standard deconvolution algorithms require prior knowledge of the cell type frequencies within a tissue or their in vitro expression profiles. Furthermore, these algorithms tend to report biased estimations. Here, we describe a Digital Sorting Algorithm (DSA) for extracting cell-type specific gene expression profiles from mixed tissue samples that is unbiased and does not require prior knowledge of cell type frequencies. The results suggest that DSA is a specific and sensitivity algorithm in gene expression profile deconvolution and will be useful in studying individual cell types of complex tissues.
Quantum-inspired algorithm for estimating the permanent of positive semidefinite matrices
NASA Astrophysics Data System (ADS)
Chakhmakhchyan, L.; Cerf, N. J.; Garcia-Patron, R.
2017-08-01
We construct a quantum-inspired classical algorithm for computing the permanent of Hermitian positive semidefinite matrices by exploiting a connection between these mathematical structures and the boson sampling model. Specifically, the permanent of a Hermitian positive semidefinite matrix can be expressed in terms of the expected value of a random variable, which stands for a specific photon-counting probability when measuring a linear-optically evolved random multimode coherent state. Our algorithm then approximates the matrix permanent from the corresponding sample mean and is shown to run in polynomial time for various sets of Hermitian positive semidefinite matrices, achieving a precision that improves over known techniques. This work illustrates how quantum optics may benefit algorithm development.
Xiang, Suyun; Wang, Wei; Xia, Jia; Xiang, Bingren; Ouyang, Pingkai
2009-09-01
The stochastic resonance algorithm is applied to the trace analysis of alkyl halides and alkyl benzenes in water samples. Compared to encountering a single signal when applying the algorithm, the optimization of system parameters for a multicomponent is more complex. In this article, the resolution of adjacent chromatographic peaks is first involved in the optimization of parameters. With the optimized parameters, the algorithm gave an ideal output with good resolution as well as enhanced signal-to-noise ratio. Applying the enhanced signals, the method extended the limit of detection and exhibited good linearity, which ensures accurate determination of the multicomponent.
Megchelenbrink, Wout; Huynen, Martijn; Marchiori, Elena
2014-01-01
Constraint-based models of metabolic networks are typically underdetermined, because they contain more reactions than metabolites. Therefore the solutions to this system do not consist of unique flux rates for each reaction, but rather a space of possible flux rates. By uniformly sampling this space, an estimated probability distribution for each reaction's flux in the network can be obtained. However, sampling a high dimensional network is time-consuming. Furthermore, the constraints imposed on the network give rise to an irregularly shaped solution space. Therefore more tailored, efficient sampling methods are needed. We propose an efficient sampling algorithm (called optGpSampler), which implements the Artificial Centering Hit-and-Run algorithm in a different manner than the sampling algorithm implemented in the COBRA Toolbox for metabolic network analysis, here called gpSampler. Results of extensive experiments on different genome-scale metabolic networks show that optGpSampler is up to 40 times faster than gpSampler. Application of existing convergence diagnostics on small network reconstructions indicate that optGpSampler converges roughly ten times faster than gpSampler towards similar sampling distributions. For networks of higher dimension (i.e. containing more than 500 reactions), we observed significantly better convergence of optGpSampler and a large deviation between the samples generated by the two algorithms. optGpSampler for Matlab and Python is available for non-commercial use at: http://cs.ru.nl/~wmegchel/optGpSampler/.
Improving and Evaluating Nested Sampling Algorithm for Marginal Likelihood Estimation
NASA Astrophysics Data System (ADS)
Ye, M.; Zeng, X.; Wu, J.; Wang, D.; Liu, J.
2016-12-01
With the growing impacts of climate change and human activities on the cycle of water resources, an increasing number of researches focus on the quantification of modeling uncertainty. Bayesian model averaging (BMA) provides a popular framework for quantifying conceptual model and parameter uncertainty. The ensemble prediction is generated by combining each plausible model's prediction, and each model is attached with a model weight which is determined by model's prior weight and marginal likelihood. Thus, the estimation of model's marginal likelihood is crucial for reliable and accurate BMA prediction. Nested sampling estimator (NSE) is a new proposed method for marginal likelihood estimation. The process of NSE is accomplished by searching the parameters' space from low likelihood area to high likelihood area gradually, and this evolution is finished iteratively via local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of local sampling procedure. Currently, Metropolis-Hasting (M-H) algorithm is often used for local sampling. However, M-H is not an efficient sampling algorithm for high-dimensional or complicated parameter space. For improving the efficiency of NSE, it could be ideal to incorporate the robust and efficient sampling algorithm - DREAMzs into the local sampling of NSE. The comparison results demonstrated that the improved NSE could improve the efficiency of marginal likelihood estimation significantly. However, both improved and original NSEs suffer from heavy instability. In addition, the heavy computation cost of huge number of model executions is overcome by using an adaptive sparse grid surrogates.
Machine Learning for Big Data: A Study to Understand Limits at Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R.; Del-Castillo-Negrete, Carlos Emilio
This report aims to empirically understand the limits of machine learning when applied to Big Data. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny, evaluation and application for gleaning insights from the data than ever before. Much is expected from algorithms without understanding their limitations at scale while dealing with massive datasets. In that context, we pose and address the following questions How does a machine learning algorithm perform on measuresmore » such as accuracy and execution time with increasing sample size and feature dimensionality? Does training with more samples guarantee better accuracy? How many features to compute for a given problem? Do more features guarantee better accuracy? Do efforts to derive and calculate more features and train on larger samples worth the effort? As problems become more complex and traditional binary classification algorithms are replaced with multi-task, multi-class categorization algorithms do parallel learners perform better? What happens to the accuracy of the learning algorithm when trained to categorize multiple classes within the same feature space? Towards finding answers to these questions, we describe the design of an empirical study and present the results. We conclude with the following observations (i) accuracy of the learning algorithm increases with increasing sample size but saturates at a point, beyond which more samples do not contribute to better accuracy/learning, (ii) the richness of the feature space dictates performance - both accuracy and training time, (iii) increased dimensionality often reflected in better performance (higher accuracy in spite of longer training times) but the improvements are not commensurate the efforts for feature computation and training and (iv) accuracy of the learning algorithms drop significantly with multi-class learners training on the same feature matrix and (v) learning algorithms perform well when categories in labeled data are independent (i.e., no relationship or hierarchy exists among categories).« less
NASA Technical Reports Server (NTRS)
Dupnick, E.; Wiggins, D.
1980-01-01
The scheduling algorithm for mission planning and logistics evaluation (SAMPLE) is presented. Two major subsystems are included: The mission payloads program; and the set covering program. Formats and parameter definitions for the payload data set (payload model), feasible combination file, and traffic model are documented.
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Fu, QiMing
2016-01-01
To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency. PMID:27795704
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning.
Zhong, Shan; Liu, Quan; Fu, QiMing
2016-01-01
To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2 -regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency.
Noise-enhanced convolutional neural networks.
Audhkhasi, Kartik; Osoba, Osonde; Kosko, Bart
2016-06-01
Injecting carefully chosen noise can speed convergence in the backpropagation training of a convolutional neural network (CNN). The Noisy CNN algorithm speeds training on average because the backpropagation algorithm is a special case of the generalized expectation-maximization (EM) algorithm and because such carefully chosen noise always speeds up the EM algorithm on average. The CNN framework gives a practical way to learn and recognize images because backpropagation scales with training data. It has only linear time complexity in the number of training samples. The Noisy CNN algorithm finds a special separating hyperplane in the network's noise space. The hyperplane arises from the likelihood-based positivity condition that noise-boosts the EM algorithm. The hyperplane cuts through a uniform-noise hypercube or Gaussian ball in the noise space depending on the type of noise used. Noise chosen from above the hyperplane speeds training on average. Noise chosen from below slows it on average. The algorithm can inject noise anywhere in the multilayered network. Adding noise to the output neurons reduced the average per-iteration training-set cross entropy by 39% on a standard MNIST image test set of handwritten digits. It also reduced the average per-iteration training-set classification error by 47%. Adding noise to the hidden layers can also reduce these performance measures. The noise benefit is most pronounced for smaller data sets because the largest EM hill-climbing gains tend to occur in the first few iterations. This noise effect can assist random sampling from large data sets because it allows a smaller random sample to give the same or better performance than a noiseless sample gives. Copyright © 2015 Elsevier Ltd. All rights reserved.
Hu, Junguo; Zhou, Jian; Zhou, Guomo; Luo, Yiqi; Xu, Xiaojun; Li, Pingheng; Liang, Junyi
2016-01-01
Soil respiration inherently shows strong spatial variability. It is difficult to obtain an accurate characterization of soil respiration with an insufficient number of monitoring points. However, it is expensive and cumbersome to deploy many sensors. To solve this problem, we proposed employing the Bayesian Maximum Entropy (BME) algorithm, using soil temperature as auxiliary information, to study the spatial distribution of soil respiration. The BME algorithm used the soft data (auxiliary information) effectively to improve the estimation accuracy of the spatiotemporal distribution of soil respiration. Based on the functional relationship between soil temperature and soil respiration, the BME algorithm satisfactorily integrated soil temperature data into said spatial distribution. As a means of comparison, we also applied the Ordinary Kriging (OK) and Co-Kriging (Co-OK) methods. The results indicated that the root mean squared errors (RMSEs) and absolute values of bias for both Day 1 and Day 2 were the lowest for the BME method, thus demonstrating its higher estimation accuracy. Further, we compared the performance of the BME algorithm coupled with auxiliary information, namely soil temperature data, and the OK method without auxiliary information in the same study area for 9, 21, and 37 sampled points. The results showed that the RMSEs for the BME algorithm (0.972 and 1.193) were less than those for the OK method (1.146 and 1.539) when the number of sampled points was 9 and 37, respectively. This indicates that the former method using auxiliary information could reduce the required number of sampling points for studying spatial distribution of soil respiration. Thus, the BME algorithm, coupled with soil temperature data, can not only improve the accuracy of soil respiration spatial interpolation but can also reduce the number of sampling points.
Hu, Junguo; Zhou, Jian; Zhou, Guomo; Luo, Yiqi; Xu, Xiaojun; Li, Pingheng; Liang, Junyi
2016-01-01
Soil respiration inherently shows strong spatial variability. It is difficult to obtain an accurate characterization of soil respiration with an insufficient number of monitoring points. However, it is expensive and cumbersome to deploy many sensors. To solve this problem, we proposed employing the Bayesian Maximum Entropy (BME) algorithm, using soil temperature as auxiliary information, to study the spatial distribution of soil respiration. The BME algorithm used the soft data (auxiliary information) effectively to improve the estimation accuracy of the spatiotemporal distribution of soil respiration. Based on the functional relationship between soil temperature and soil respiration, the BME algorithm satisfactorily integrated soil temperature data into said spatial distribution. As a means of comparison, we also applied the Ordinary Kriging (OK) and Co-Kriging (Co-OK) methods. The results indicated that the root mean squared errors (RMSEs) and absolute values of bias for both Day 1 and Day 2 were the lowest for the BME method, thus demonstrating its higher estimation accuracy. Further, we compared the performance of the BME algorithm coupled with auxiliary information, namely soil temperature data, and the OK method without auxiliary information in the same study area for 9, 21, and 37 sampled points. The results showed that the RMSEs for the BME algorithm (0.972 and 1.193) were less than those for the OK method (1.146 and 1.539) when the number of sampled points was 9 and 37, respectively. This indicates that the former method using auxiliary information could reduce the required number of sampling points for studying spatial distribution of soil respiration. Thus, the BME algorithm, coupled with soil temperature data, can not only improve the accuracy of soil respiration spatial interpolation but can also reduce the number of sampling points. PMID:26807579
De Kesel, Pieter M M; Capiau, Sara; Stove, Veronique V; Lambert, Willy E; Stove, Christophe P
2014-10-01
Although dried blood spot (DBS) sampling is increasingly receiving interest as a potential alternative to traditional blood sampling, the impact of hematocrit (Hct) on DBS results is limiting its final breakthrough in routine bioanalysis. To predict the Hct of a given DBS, potassium (K(+)) proved to be a reliable marker. The aim of this study was to evaluate whether application of an algorithm, based upon predicted Hct or K(+) concentrations as such, allowed correction for the Hct bias. Using validated LC-MS/MS methods, caffeine, chosen as a model compound, was determined in whole blood and corresponding DBS samples with a broad Hct range (0.18-0.47). A reference subset (n = 50) was used to generate an algorithm based on K(+) concentrations in DBS. Application of the developed algorithm on an independent test set (n = 50) alleviated the assay bias, especially at lower Hct values. Before correction, differences between DBS and whole blood concentrations ranged from -29.1 to 21.1%. The mean difference, as obtained by Bland-Altman comparison, was -6.6% (95% confidence interval (CI), -9.7 to -3.4%). After application of the algorithm, differences between corrected and whole blood concentrations lay between -19.9 and 13.9% with a mean difference of -2.1% (95% CI, -4.5 to 0.3%). The same algorithm was applied to a separate compound, paraxanthine, which was determined in 103 samples (Hct range, 0.17-0.47), yielding similar results. In conclusion, a K(+)-based algorithm allows correction for the Hct bias in the quantitative analysis of caffeine and its metabolite paraxanthine.
Andersson, Richard; Larsson, Linnea; Holmqvist, Kenneth; Stridh, Martin; Nyström, Marcus
2017-04-01
Almost all eye-movement researchers use algorithms to parse raw data and detect distinct types of eye movement events, such as fixations, saccades, and pursuit, and then base their results on these. Surprisingly, these algorithms are rarely evaluated. We evaluated the classifications of ten eye-movement event detection algorithms, on data from an SMI HiSpeed 1250 system, and compared them to manual ratings of two human experts. The evaluation focused on fixations, saccades, and post-saccadic oscillations. The evaluation used both event duration parameters, and sample-by-sample comparisons to rank the algorithms. The resulting event durations varied substantially as a function of what algorithm was used. This evaluation differed from previous evaluations by considering a relatively large set of algorithms, multiple events, and data from both static and dynamic stimuli. The main conclusion is that current detectors of only fixations and saccades work reasonably well for static stimuli, but barely better than chance for dynamic stimuli. Differing results across evaluation methods make it difficult to select one winner for fixation detection. For saccade detection, however, the algorithm by Larsson, Nyström and Stridh (IEEE Transaction on Biomedical Engineering, 60(9):2484-2493,2013) outperforms all algorithms in data from both static and dynamic stimuli. The data also show how improperly selected algorithms applied to dynamic data misestimate fixation and saccade properties.
Mishra, Arabinda; Anderson, Adam W; Wu, Xi; Gore, John C; Ding, Zhaohua
2010-08-01
The purpose of this work is to design a neuronal fiber tracking algorithm, which will be more suitable for reconstruction of fibers associated with functionally important regions in the human brain. The functional activations in the brain normally occur in the gray matter regions. Hence the fibers bordering these regions are weakly myelinated, resulting in poor performance of conventional tractography methods to trace the fiber links between them. A lower fractional anisotropy in this region makes it even difficult to track the fibers in the presence of noise. In this work, the authors focused on a stochastic approach to reconstruct these fiber pathways based on a Bayesian regularization framework. To estimate the true fiber direction (propagation vector), the a priori and conditional probability density functions are calculated in advance and are modeled as multivariate normal. The variance of the estimated tensor element vector is associated with the uncertainty due to noise and partial volume averaging (PVA). An adaptive and multiple sampling of the estimated tensor element vector, which is a function of the pre-estimated variance, overcomes the effect of noise and PVA in this work. The algorithm has been rigorously tested using a variety of synthetic data sets. The quantitative comparison of the results to standard algorithms motivated the authors to implement it for in vivo DTI data analysis. The algorithm has been implemented to delineate fibers in two major language pathways (Broca's to SMA and Broca's to Wernicke's) across 12 healthy subjects. Though the mean of standard deviation was marginally bigger than conventional (Euler's) approach [P. J. Basser et al., "In vivo fiber tractography using DT-MRI data," Magn. Reson. Med. 44(4), 625-632 (2000)], the number of extracted fibers in this approach was significantly higher. The authors also compared the performance of the proposed method to Lu's method [Y. Lu et al., "Improved fiber tractography with Bayesian tensor regularization," Neuroimage 31(3), 1061-1074 (2006)] and Friman's stochastic approach [O. Friman et al., "A Bayesian approach for stochastic white matter tractography," IEEE Trans. Med. Imaging 25(8), 965-978 (2006)]. Overall performance of the approach is found to be superior to above two methods, particularly when the signal-to-noise ratio was low. The authors observed that an adaptive sampling of the tensor element vectors, estimated as a function of the variance in a Bayesian framework, can effectively delineate neuronal fibers to analyze the structure-function relationship in human brain. The simulated and in vivo results are in good agreement with the theoretical aspects of the algorithm.
Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun
2017-12-01
Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks to search gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression pattern in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. sunkim.bioinfo@snu.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Pugliese, Cara E.; Kenworthy, Lauren; Bal, Vanessa Hus; Wallace, Gregory L; Yerys, Benjamin E; Maddox, Brenna B.; White, Susan W.; Popal, Haroon; Armour, Anna Chelsea; Miller, Judith; Herrington, John D.; Schultz, Robert T.; Martin, Alex; Anthony, Laura Gutermuth
2015-01-01
Recent updates have been proposed to the Autism Diagnostic Observation Schedule-2 Module 4 diagnostic algorithm. This new algorithm, however, has not yet been validated in an independent sample without intellectual disability (ID). This multi-site study compared the original and revised algorithms in individuals with ASD without ID. The revised algorithm demonstrated increased sensitivity, but lower specificity in the overall sample. Estimates were highest for females, individuals with a verbal IQ below 85 or above 115, and ages 16 and older. Best practice diagnostic procedures should include the Module 4 in conjunction with other assessment tools. Balancing needs for sensitivity and specificity depending on the purpose of assessment (e.g., clinical vs. research) and demographic characteristics mentioned above will enhance its utility. PMID:26385796
Adaptive image coding based on cubic-spline interpolation
NASA Astrophysics Data System (ADS)
Jiang, Jian-Xing; Hong, Shao-Hua; Lin, Tsung-Ching; Wang, Lin; Truong, Trieu-Kien
2014-09-01
It has been investigated that at low bit rates, downsampling prior to coding and upsampling after decoding can achieve better compression performance than standard coding algorithms, e.g., JPEG and H. 264/AVC. However, at high bit rates, the sampling-based schemes generate more distortion. Additionally, the maximum bit rate for the sampling-based scheme to outperform the standard algorithm is image-dependent. In this paper, a practical adaptive image coding algorithm based on the cubic-spline interpolation (CSI) is proposed. This proposed algorithm adaptively selects the image coding method from CSI-based modified JPEG and standard JPEG under a given target bit rate utilizing the so called ρ-domain analysis. The experimental results indicate that compared with the standard JPEG, the proposed algorithm can show better performance at low bit rates and maintain the same performance at high bit rates.
Genetic algorithm parameters tuning for resource-constrained project scheduling problem
NASA Astrophysics Data System (ADS)
Tian, Xingke; Yuan, Shengrui
2018-04-01
Project Scheduling Problem (RCPSP) is a kind of important scheduling problem. To achieve a certain optimal goal such as the shortest duration, the smallest cost, the resource balance and so on, it is required to arrange the start and finish of all tasks under the condition of satisfying project timing constraints and resource constraints. In theory, the problem belongs to the NP-hard problem, and the model is abundant. Many combinatorial optimization problems are special cases of RCPSP, such as job shop scheduling, flow shop scheduling and so on. At present, the genetic algorithm (GA) has been used to deal with the classical RCPSP problem and achieved remarkable results. Vast scholars have also studied the improved genetic algorithm for the RCPSP problem, which makes it to solve the RCPSP problem more efficiently and accurately. However, for the selection of the main parameters of the genetic algorithm, there is no parameter optimization in these studies. Generally, we used the empirical method, but it cannot ensure to meet the optimal parameters. In this paper, the problem was carried out, which is the blind selection of parameters in the process of solving the RCPSP problem. We made sampling analysis, the establishment of proxy model and ultimately solved the optimal parameters.
Bayesian Analysis of High Dimensional Classification
NASA Astrophysics Data System (ADS)
Mukhopadhyay, Subhadeep; Liang, Faming
2009-12-01
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases , there is a lot of interest in searching for sparse model in High Dimensional regression(/classification) setup. we first discuss two common challenges for analyzing high dimensional data. The first one is the curse of dimensionality. The complexity of many existing algorithms scale exponentially with the dimensionality of the space and by virtue of that algorithms soon become computationally intractable and therefore inapplicable in many real applications. secondly, multicollinearities among the predictors which severely slowdown the algorithm. In order to make Bayesian analysis operational in high dimension we propose a novel 'Hierarchical stochastic approximation monte carlo algorithm' (HSAMC), which overcomes the curse of dimensionality, multicollinearity of predictors in high dimension and also it possesses the self-adjusting mechanism to avoid the local minima separated by high energy barriers. Models and methods are illustrated by simulation inspired from from the feild of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high dimensional complex model space.
Fast and Accurate Support Vector Machines on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary --- also known as hyperplane --- which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminatemore » the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively --- potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm--- de facto sequential SVM software --- which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as UCI HIGGS Boson dataset and Offending URL dataset.« less
Space Object Maneuver Detection Algorithms Using TLE Data
NASA Astrophysics Data System (ADS)
Pittelkau, M.
2016-09-01
An important aspect of Space Situational Awareness (SSA) is detection of deliberate and accidental orbit changes of space objects. Although space surveillance systems detect orbit maneuvers within their tracking algorithms, maneuver data are not readily disseminated for general use. However, two-line element (TLE) data is available and can be used to detect maneuvers of space objects. This work is an attempt to improve upon existing TLE-based maneuver detection algorithms. Three adaptive maneuver detection algorithms are developed and evaluated: The first is a fading-memory Kalman filter, which is equivalent to the sliding-window least-squares polynomial fit, but computationally more efficient and adaptive to the noise in the TLE data. The second algorithm is based on a sample cumulative distribution function (CDF) computed from a histogram of the magnitude-squared |V|2 of change-in-velocity vectors (V), which is computed from the TLE data. A maneuver detection threshold is computed from the median estimated from the CDF, or from the CDF and a specified probability of false alarm. The third algorithm is a median filter. The median filter is the simplest of a class of nonlinear filters called order statistics filters, which is within the theory of robust statistics. The output of the median filter is practically insensitive to outliers, or large maneuvers. The median of the |V|2 data is proportional to the variance of the V, so the variance is estimated from the output of the median filter. A maneuver is detected when the input data exceeds a constant times the estimated variance.
An improved filtering algorithm for big read datasets and its application to single-cell assembly.
Wedemeyer, Axel; Kliemann, Lasse; Srivastav, Anand; Schielke, Christian; Reusch, Thorsten B; Rosenstiel, Philip
2017-07-03
For single-cell or metagenomic sequencing projects, it is necessary to sequence with a very high mean coverage in order to make sure that all parts of the sample DNA get covered by the reads produced. This leads to huge datasets with lots of redundant data. A filtering of this data prior to assembly is advisable. Brown et al. (2012) presented the algorithm Diginorm for this purpose, which filters reads based on the abundance of their k-mers. We present Bignorm, a faster and quality-conscious read filtering algorithm. An important new algorithmic feature is the use of phred quality scores together with a detailed analysis of the k-mer counts to decide which reads to keep. We qualify and recommend parameters for our new read filtering algorithm. Guided by these parameters, we remove in terms of median 97.15% of the reads while keeping the mean phred score of the filtered dataset high. Using the SDAdes assembler, we produce assemblies of high quality from these filtered datasets in a fraction of the time needed for an assembly from the datasets filtered with Diginorm. We conclude that read filtering is a practical and efficient method for reducing read data and for speeding up the assembly process. This applies not only for single cell assembly, as shown in this paper, but also to other projects with high mean coverage datasets like metagenomic sequencing projects. Our Bignorm algorithm allows assemblies of competitive quality in comparison to Diginorm, while being much faster. Bignorm is available for download at https://git.informatik.uni-kiel.de/axw/Bignorm .
Local linear discriminant analysis framework using sample neighbors.
Fan, Zizhu; Xu, Yong; Zhang, David
2011-07-01
The linear discriminant analysis (LDA) is a very popular linear feature extraction approach. The algorithms of LDA usually perform well under the following two assumptions. The first assumption is that the global data structure is consistent with the local data structure. The second assumption is that the input data classes are Gaussian distributions. However, in real-world applications, these assumptions are not always satisfied. In this paper, we propose an improved LDA framework, the local LDA (LLDA), which can perform well without needing to satisfy the above two assumptions. Our LLDA framework can effectively capture the local structure of samples. According to different types of local data structure, our LLDA framework incorporates several different forms of linear feature extraction approaches, such as the classical LDA and principal component analysis. The proposed framework includes two LLDA algorithms: a vector-based LLDA algorithm and a matrix-based LLDA (MLLDA) algorithm. MLLDA is directly applicable to image recognition, such as face recognition. Our algorithms need to train only a small portion of the whole training set before testing a sample. They are suitable for learning large-scale databases especially when the input data dimensions are very high and can achieve high classification accuracy. Extensive experiments show that the proposed algorithms can obtain good classification results.
Unsupervised classification of multivariate geostatistical data: Two algorithms
NASA Astrophysics Data System (ADS)
Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques
2015-12-01
With the increasing development of remote sensing platforms and the evolution of sampling facilities in mining and oil industry, spatial datasets are becoming increasingly large, inform a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables in hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
Fast processing of microscopic images using object-based extended depth of field.
Intarapanich, Apichart; Kaewkamnerd, Saowaluck; Pannarut, Montri; Shaw, Philip J; Tongsima, Sissades
2016-12-22
Microscopic analysis requires that foreground objects of interest, e.g. cells, are in focus. In a typical microscopic specimen, the foreground objects may lie on different depths of field necessitating capture of multiple images taken at different focal planes. The extended depth of field (EDoF) technique is a computational method for merging images from different depths of field into a composite image with all foreground objects in focus. Composite images generated by EDoF can be applied in automated image processing and pattern recognition systems. However, current algorithms for EDoF are computationally intensive and impractical, especially for applications such as medical diagnosis where rapid sample turnaround is important. Since foreground objects typically constitute a minor part of an image, the EDoF technique could be made to work much faster if only foreground regions are processed to make the composite image. We propose a novel algorithm called object-based extended depths of field (OEDoF) to address this issue. The OEDoF algorithm consists of four major modules: 1) color conversion, 2) object region identification, 3) good contrast pixel identification and 4) detail merging. First, the algorithm employs color conversion to enhance contrast followed by identification of foreground pixels. A composite image is constructed using only these foreground pixels, which dramatically reduces the computational time. We used 250 images obtained from 45 specimens of confirmed malaria infections to test our proposed algorithm. The resulting composite images with all in-focus objects were produced using the proposed OEDoF algorithm. We measured the performance of OEDoF in terms of image clarity (quality) and processing time. The features of interest selected by the OEDoF algorithm are comparable in quality with equivalent regions in images processed by the state-of-the-art complex wavelet EDoF algorithm; however, OEDoF required four times less processing time. This work presents a modification of the extended depth of field approach for efficiently enhancing microscopic images. This selective object processing scheme used in OEDoF can significantly reduce the overall processing time while maintaining the clarity of important image features. The empirical results from parasite-infected red cell images revealed that our proposed method efficiently and effectively produced in-focus composite images. With the speed improvement of OEDoF, this proposed algorithm is suitable for processing large numbers of microscope images, e.g., as required for medical diagnosis.
A Hybrid Monte Carlo importance sampling of rare events in Turbulence and in Turbulent Models
NASA Astrophysics Data System (ADS)
Margazoglou, Georgios; Biferale, Luca; Grauer, Rainer; Jansen, Karl; Mesterhazy, David; Rosenow, Tillmann; Tripiccione, Raffaele
2017-11-01
Extreme and rare events is a challenging topic in the field of turbulence. Trying to investigate those instances through the use of traditional numerical tools turns to be a notorious task, as they fail to systematically sample the fluctuations around them. On the other hand, we propose that an importance sampling Monte Carlo method can selectively highlight extreme events in remote areas of the phase space and induce their occurrence. We present a brand new computational approach, based on the path integral formulation of stochastic dynamics, and employ an accelerated Hybrid Monte Carlo (HMC) algorithm for this purpose. Through the paradigm of stochastic one-dimensional Burgers' equation, subjected to a random noise that is white-in-time and power-law correlated in Fourier space, we will prove our concept and benchmark our results with standard CFD methods. Furthermore, we will present our first results of constrained sampling around saddle-point instanton configurations (optimal fluctuations). The research leading to these results has received funding from the EU Horizon 2020 research and innovation programme under Grant Agreement No. 642069, and from the EU Seventh Framework Programme (FP7/2007-2013) under ERC Grant Agreement No. 339032.
Energy Efficient GNSS Signal Acquisition Using Singular Value Decomposition (SVD).
Bermúdez Ordoñez, Juan Carlos; Arnaldo Valdés, Rosa María; Gómez Comendador, Fernando
2018-05-16
A significant challenge in global navigation satellite system (GNSS) signal processing is a requirement for a very high sampling rate. The recently-emerging compressed sensing (CS) theory makes processing GNSS signals at a low sampling rate possible if the signal has a sparse representation in a certain space. Based on CS and SVD theories, an algorithm for sampling GNSS signals at a rate much lower than the Nyquist rate and reconstructing the compressed signal is proposed in this research, which is validated after the output from that process still performs signal detection using the standard fast Fourier transform (FFT) parallel frequency space search acquisition. The sparse representation of the GNSS signal is the most important precondition for CS, by constructing a rectangular Toeplitz matrix (TZ) of the transmitted signal, calculating the left singular vectors using SVD from the TZ, to achieve sparse signal representation. Next, obtaining the M-dimensional observation vectors based on the left singular vectors of the SVD, which are equivalent to the sampler operator in standard compressive sensing theory, the signal can be sampled below the Nyquist rate, and can still be reconstructed via ℓ 1 minimization with accuracy using convex optimization. As an added value, there is a GNSS signal acquisition enhancement effect by retaining the useful signal and filtering out noise by projecting the signal into the most significant proper orthogonal modes (PODs) which are the optimal distributions of signal power. The algorithm is validated with real recorded signals, and the results show that the proposed method is effective for sampling, reconstructing intermediate frequency (IF) GNSS signals in the time discrete domain.
Energy Efficient GNSS Signal Acquisition Using Singular Value Decomposition (SVD)
Arnaldo Valdés, Rosa María; Gómez Comendador, Fernando
2018-01-01
A significant challenge in global navigation satellite system (GNSS) signal processing is a requirement for a very high sampling rate. The recently-emerging compressed sensing (CS) theory makes processing GNSS signals at a low sampling rate possible if the signal has a sparse representation in a certain space. Based on CS and SVD theories, an algorithm for sampling GNSS signals at a rate much lower than the Nyquist rate and reconstructing the compressed signal is proposed in this research, which is validated after the output from that process still performs signal detection using the standard fast Fourier transform (FFT) parallel frequency space search acquisition. The sparse representation of the GNSS signal is the most important precondition for CS, by constructing a rectangular Toeplitz matrix (TZ) of the transmitted signal, calculating the left singular vectors using SVD from the TZ, to achieve sparse signal representation. Next, obtaining the M-dimensional observation vectors based on the left singular vectors of the SVD, which are equivalent to the sampler operator in standard compressive sensing theory, the signal can be sampled below the Nyquist rate, and can still be reconstructed via ℓ1 minimization with accuracy using convex optimization. As an added value, there is a GNSS signal acquisition enhancement effect by retaining the useful signal and filtering out noise by projecting the signal into the most significant proper orthogonal modes (PODs) which are the optimal distributions of signal power. The algorithm is validated with real recorded signals, and the results show that the proposed method is effective for sampling, reconstructing intermediate frequency (IF) GNSS signals in the time discrete domain. PMID:29772731
Research on compressive sensing reconstruction algorithm based on total variation model
NASA Astrophysics Data System (ADS)
Gao, Yu-xuan; Sun, Huayan; Zhang, Tinghua; Du, Lin
2017-12-01
Compressed sensing for breakthrough Nyquist sampling theorem provides a strong theoretical , making compressive sampling for image signals be carried out simultaneously. In traditional imaging procedures using compressed sensing theory, not only can it reduces the storage space, but also can reduce the demand for detector resolution greatly. Using the sparsity of image signal, by solving the mathematical model of inverse reconfiguration, realize the super-resolution imaging. Reconstruction algorithm is the most critical part of compression perception, to a large extent determine the accuracy of the reconstruction of the image.The reconstruction algorithm based on the total variation (TV) model is more suitable for the compression reconstruction of the two-dimensional image, and the better edge information can be obtained. In order to verify the performance of the algorithm, Simulation Analysis the reconstruction result in different coding mode of the reconstruction algorithm based on the TV reconstruction algorithm. The reconstruction effect of the reconfigurable algorithm based on TV based on the different coding methods is analyzed to verify the stability of the algorithm. This paper compares and analyzes the typical reconstruction algorithm in the same coding mode. On the basis of the minimum total variation algorithm, the Augmented Lagrangian function term is added and the optimal value is solved by the alternating direction method.Experimental results show that the reconstruction algorithm is compared with the traditional classical algorithm based on TV has great advantages, under the low measurement rate can be quickly and accurately recovers target image.
An Analysis of Light Periods of BL Lac Object S5 0716+714 with the MUSIC Algorithm
NASA Astrophysics Data System (ADS)
Tang, Jie
2012-07-01
The multiple signal classification (MUSIC) algorithm is introduced to the estimation of light periods of BL Lac objects. The principle of the MUSIC algorithm is given, together with a testing on its spectral resolution by using a simulative signal. From a lot of literature, we have collected a large number of effective observational data of the BL Lac object S5 0716+714 in the three optical wavebands V, R, and I from 1994 to 2008. The light periods of S5 0716+714 are obtained by means of the MUSIC algorithm and average periodogram algorithm, respectively. It is found that there exist two major periodic components, one is the period of (3.33±0.08) yr, another is the period of (1.24±0.01) yr. The comparison of the performances of periodicity analysis of two algorithms indicates that the MUSIC algorithm has a smaller requirement on the sample length, as well as a good spectral resolution and anti-noise ability, to improve the accuracy of periodicity analysis in the case of short sample length.
MODIS 3km Aerosol Product: Algorithm and Global Perspective
NASA Technical Reports Server (NTRS)
Remer, L. A.; Mattoo, S.; Levy, R. C.; Munchak, L.
2013-01-01
After more than a decade of producing a nominal 10 km aerosol product based on the dark target method, the MODIS aerosol team will be releasing a nominal 3 km product as part of their Collection 6 release. The new product differs from the original 10 km product only in the manner in which reflectance pixels are ingested, organized and selected by the aerosol algorithm. Overall, the 3 km product closely mirrors the 10 km product. However, the finer resolution product is able to retrieve over ocean closer to islands and coastlines, and is better able to resolve fine aerosol features such as smoke plumes over both ocean and land. In some situations, it provides retrievals over entire regions that the 10 km product barely samples. In situations traditionally difficult for the dark target algorithm, such as over bright or urban surfaces the 3 km product introduces isolated spikes of artificially high aerosol optical depth (AOD) that the 10 km algorithm avoids. Over land, globally, the 3 km product appears to be 0.01 to 0.02 higher than the 10 km product, while over ocean, the 3 km algorithm is retrieving a proportionally greater number of very low aerosol loading situations. Based on collocations with ground-based observations for only six months, expected errors associated with the 3 km land product are determined to be greater than for the 10 km product: 0.05 0.25 AOD. Over ocean, the suggestion is for expected errors to be the same as the 10 km product: 0.03 0.05 AOD. The advantage of the product is on the local scale, which will require continued evaluation not addressed here. Nevertheless, the new 3 km product is expected to provide important information complementary to existing satellite-derived products and become an important tool for the aerosol community.
Sampling-Based Motion Planning Algorithms for Replanning and Spatial Load Balancing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boardman, Beth Leigh
The common theme of this dissertation is sampling-based motion planning with the two key contributions being in the area of replanning and spatial load balancing for robotic systems. Here, we begin by recalling two sampling-based motion planners: the asymptotically optimal rapidly-exploring random tree (RRT*), and the asymptotically optimal probabilistic roadmap (PRM*). We also provide a brief background on collision cones and the Distributed Reactive Collision Avoidance (DRCA) algorithm. The next four chapters detail novel contributions for motion replanning in environments with unexpected static obstacles, for multi-agent collision avoidance, and spatial load balancing. First, we show improved performance of the RRT*more » when using the proposed Grandparent-Connection (GP) or Focused-Refinement (FR) algorithms. Next, the Goal Tree algorithm for replanning with unexpected static obstacles is detailed and proven to be asymptotically optimal. A multi-agent collision avoidance problem in obstacle environments is approached via the RRT*, leading to the novel Sampling-Based Collision Avoidance (SBCA) algorithm. The SBCA algorithm is proven to guarantee collision free trajectories for all of the agents, even when subject to uncertainties in the knowledge of the other agents’ positions and velocities. Given that a solution exists, we prove that livelocks and deadlock will lead to the cost to the goal being decreased. We introduce a new deconfliction maneuver that decreases the cost-to-come at each step. This new maneuver removes the possibility of livelocks and allows a result to be formed that proves convergence to the goal configurations. Finally, we present a limited range Graph-based Spatial Load Balancing (GSLB) algorithm which fairly divides a non-convex space among multiple agents that are subject to differential constraints and have a limited travel distance. The GSLB is proven to converge to a solution when maximizing the area covered by the agents. The analysis for each of the above mentioned algorithms is confirmed in simulations.« less
ERIC Educational Resources Information Center
Pugliese, Cara E.; Kenworthy, Lauren; Bal, Vanessa Hus; Wallace, Gregory L.; Yerys, Benjamin E.; Maddox, Brenna B.; White, Susan W.; Popal, Haroon; Armour, Anna Chelsea; Miller, Judith; Herrington, John D.; Schultz, Robert T.; Martin, Alex; Anthony, Laura Gutermuth
2015-01-01
Recent updates have been proposed to the Autism Diagnostic Observation Schedule-2 Module 4 diagnostic algorithm. This new algorithm, however, has not yet been validated in an independent sample without intellectual disability (ID). This multi-site study compared the original and revised algorithms in individuals with ASD without ID. The revised…
Liu, Wei; Kulin, Merima; Kazaz, Tarik; Shahid, Adnan; Moerman, Ingrid; De Poorter, Eli
2017-09-12
Driven by the fast growth of wireless communication, the trend of sharing spectrum among heterogeneous technologies becomes increasingly dominant. Identifying concurrent technologies is an important step towards efficient spectrum sharing. However, due to the complexity of recognition algorithms and the strict condition of sampling speed, communication systems capable of recognizing signals other than their own type are extremely rare. This work proves that multi-model distribution of the received signal strength indicator (RSSI) is related to the signals' modulation schemes and medium access mechanisms, and RSSI from different technologies may exhibit highly distinctive features. A distinction is made between technologies with a streaming or a non-streaming property, and appropriate feature spaces can be established either by deriving parameters such as packet duration from RSSI or directly using RSSI's probability distribution. An experimental study shows that even RSSI acquired at a sub-Nyquist sampling rate is able to provide sufficient features to differentiate technologies such as Wi-Fi, Long Term Evolution (LTE), Digital Video Broadcasting-Terrestrial (DVB-T) and Bluetooth. The usage of the RSSI distribution-based feature space is illustrated via a sample algorithm. Experimental evaluation indicates that more than 92% accuracy is achieved with the appropriate configuration. As the analysis of RSSI distribution is straightforward and less demanding in terms of system requirements, we believe it is highly valuable for recognition of wideband technologies on constrained devices in the context of dynamic spectrum access.
Liu, Wei; Kulin, Merima; Kazaz, Tarik; De Poorter, Eli
2017-01-01
Driven by the fast growth of wireless communication, the trend of sharing spectrum among heterogeneous technologies becomes increasingly dominant. Identifying concurrent technologies is an important step towards efficient spectrum sharing. However, due to the complexity of recognition algorithms and the strict condition of sampling speed, communication systems capable of recognizing signals other than their own type are extremely rare. This work proves that multi-model distribution of the received signal strength indicator (RSSI) is related to the signals’ modulation schemes and medium access mechanisms, and RSSI from different technologies may exhibit highly distinctive features. A distinction is made between technologies with a streaming or a non-streaming property, and appropriate feature spaces can be established either by deriving parameters such as packet duration from RSSI or directly using RSSI’s probability distribution. An experimental study shows that even RSSI acquired at a sub-Nyquist sampling rate is able to provide sufficient features to differentiate technologies such as Wi-Fi, Long Term Evolution (LTE), Digital Video Broadcasting-Terrestrial (DVB-T) and Bluetooth. The usage of the RSSI distribution-based feature space is illustrated via a sample algorithm. Experimental evaluation indicates that more than 92% accuracy is achieved with the appropriate configuration. As the analysis of RSSI distribution is straightforward and less demanding in terms of system requirements, we believe it is highly valuable for recognition of wideband technologies on constrained devices in the context of dynamic spectrum access. PMID:28895879
Array signal recovery algorithm for a single-RF-channel DBF array
NASA Astrophysics Data System (ADS)
Zhang, Duo; Wu, Wen; Fang, Da Gang
2016-12-01
An array signal recovery algorithm based on sparse signal reconstruction theory is proposed for a single-RF-channel digital beamforming (DBF) array. A single-RF-channel antenna array is a low-cost antenna array in which signals are obtained from all antenna elements by only one microwave digital receiver. The spatially parallel array signals are converted into time-sequence signals, which are then sampled by the system. The proposed algorithm uses these time-sequence samples to recover the original parallel array signals by exploiting the second-order sparse structure of the array signals. Additionally, an optimization method based on the artificial bee colony (ABC) algorithm is proposed to improve the reconstruction performance. Using the proposed algorithm, the motion compensation problem for the single-RF-channel DBF array can be solved effectively, and the angle and Doppler information for the target can be simultaneously estimated. The effectiveness of the proposed algorithms is demonstrated by the results of numerical simulations.
Building on crossvalidation for increasing the quality of geostatistical modeling
Olea, R.A.
2012-01-01
The random function is a mathematical model commonly used in the assessment of uncertainty associated with a spatially correlated attribute that has been partially sampled. There are multiple algorithms for modeling such random functions, all sharing the requirement of specifying various parameters that have critical influence on the results. The importance of finding ways to compare the methods and setting parameters to obtain results that better model uncertainty has increased as these algorithms have grown in number and complexity. Crossvalidation has been used in spatial statistics, mostly in kriging, for the analysis of mean square errors. An appeal of this approach is its ability to work with the same empirical sample available for running the algorithms. This paper goes beyond checking estimates by formulating a function sensitive to conditional bias. Under ideal conditions, such function turns into a straight line, which can be used as a reference for preparing measures of performance. Applied to kriging, deviations from the ideal line provide sensitivity to the semivariogram lacking in crossvalidation of kriging errors and are more sensitive to conditional bias than analyses of errors. In terms of stochastic simulation, in addition to finding better parameters, the deviations allow comparison of the realizations resulting from the applications of different methods. Examples show improvements of about 30% in the deviations and approximately 10% in the square root of mean square errors between reasonable starting modelling and the solutions according to the new criteria. ?? 2011 US Government.
Kinase Pathway Dependence in Primary Human Leukemias Determined by Rapid Inhibitor Screening
Tyner, Jeffrey W.; Yang, Wayne F.; Bankhead, Armand; Fan, Guang; Fletcher, Luke B.; Bryant, Jade; Glover, Jason M.; Chang, Bill H.; Spurgeon, Stephen E.; Fleming, William H.; Kovacsovics, Tibor; Gotlib, Jason R.; Oh, Stephen T.; Deininger, Michael W.; Zwaan, C. Michel; Den Boer, Monique L.; van den Heuvel-Eibrink, Marry M.; O’Hare, Thomas; Druker, Brian J.; Loriaux, Marc M.
2012-01-01
Kinases are dysregulated in most cancer but the frequency of specific kinase mutations is low, indicating a complex etiology in kinase dysregulation. Here we report a strategy to rapidly identify functionally important kinase targets, irrespective of the etiology of kinase pathway dysregulation, ultimately enabling a correlation of patient genetic profiles to clinically effective kinase inhibitors. Our methodology assessed the sensitivity of primary leukemia patient samples to a panel of 66 small-molecule kinase inhibitors over 3 days. Screening of 151 leukemia patient samples revealed a wide diversity of drug sensitivities, with 70% of the clinical specimens exhibiting hypersensitivity to one or more drugs. From this data set, we developed an algorithm to predict kinase pathway dependence based on analysis of inhibitor sensitivity patterns. Applying this algorithm correctly identified pathway dependence in proof-of-principle specimens with known oncogenes, including a rare FLT3 mutation outside regions covered by standard molecular diagnostic tests. Interrogation of all 151 patient specimens with this algorithm identified a diversity of gene targets and signaling pathways that could aid prioritization of deep sequencing data sets, permitting a cumulative analysis to understand kinase pathway dependence within leukemia subsets. In a proof-of-principle case, we showed that in vitro drug sensitivity could predict both a clinical response and the development of drug resistance. Taken together, our results suggested that drug target scores derived from a comprehensive kinase inhibitor panel could predict pathway dependence in cancer cells while simultaneously identifying potential therapeutic options. PMID:23087056
Medical Data Analytics Is Not a Simple Task.
Babič, František; Vadovský, Michal; Paralič, Ján
2018-01-01
Data analytics represents a new chance for medical diagnosis and treatment to make it more effective and successful. This expectation is not so easy to achieve as it may look like at a first glance. The medical experts, doctors or general practitioners have their own vocabulary, they use specific terms and type of speaking. On the other side, data analysts have to understand the task and to select the right algorithms. The applicability of the results depends on the effectiveness of the interactions between those two worlds. This paper presents our experiences with various medical data samples in form of SWOT analysis. We identified the most important input attributes for the target diagnosis or extracted decision rules and analysed their interestingness with cooperating doctors, for most promising new cut-off values or an investigation of possible important relations hidden in data sample. In general, this type of knowledge can be used for clinical decision support, but it has to be evaluated on different samples, conditions and ideally in long-term studies. Sometimes, the interaction needed much more time than we expected at the beginning but our experiences are mostly positive.
The effect of exit beam phase aberrations on parallel beam coherent x-ray reconstructions
NASA Astrophysics Data System (ADS)
Hruszkewycz, S. O.; Harder, R.; Xiao, X.; Fuoss, P. H.
2010-12-01
Diffraction artifacts from imperfect x-ray windows near the sample are an important consideration in the design of coherent x-ray diffraction measurements. In this study, we used simulated and experimental diffraction patterns in two and three dimensions to explore the effect of phase imperfections in a beryllium window (such as a void or inclusion) on the convergence behavior of phasing algorithms and on the ultimate reconstruction. A predictive relationship between beam wavelength, sample size, and window position was derived to explain the dependence of reconstruction quality on beryllium defect size. Defects corresponding to this prediction cause the most damage to the sample exit wave and induce signature error oscillations during phasing that can be used as a fingerprint of experimental x-ray window artifacts. The relationship between x-ray window imperfection size and coherent x-ray diffractive imaging reconstruction quality explored in this work can play an important role in designing high-resolution in situ coherent imaging instrumentation and will help interpret the phasing behavior of coherent diffraction measured in these in situ environments.
The effect of exit beam phase aberrations on parallel beam coherent x-ray reconstructions.
Hruszkewycz, S O; Harder, R; Xiao, X; Fuoss, P H
2010-12-01
Diffraction artifacts from imperfect x-ray windows near the sample are an important consideration in the design of coherent x-ray diffraction measurements. In this study, we used simulated and experimental diffraction patterns in two and three dimensions to explore the effect of phase imperfections in a beryllium window (such as a void or inclusion) on the convergence behavior of phasing algorithms and on the ultimate reconstruction. A predictive relationship between beam wavelength, sample size, and window position was derived to explain the dependence of reconstruction quality on beryllium defect size. Defects corresponding to this prediction cause the most damage to the sample exit wave and induce signature error oscillations during phasing that can be used as a fingerprint of experimental x-ray window artifacts. The relationship between x-ray window imperfection size and coherent x-ray diffractive imaging reconstruction quality explored in this work can play an important role in designing high-resolution in situ coherent imaging instrumentation and will help interpret the phasing behavior of coherent diffraction measured in these in situ environments.
Maraschin, Marcelo; Somensi-Zeggio, Amélia; Oliveira, Simone K; Kuhnen, Shirley; Tomazzoli, Maíra M; Raguzzoni, Josiane C; Zeri, Ana C M; Carreira, Rafael; Correia, Sara; Costa, Christopher; Rocha, Miguel
2016-01-22
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) from southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching ∼90% in the best case), better discriminating samples regarding their collection seasons comparatively to the harvest regions. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.
Wang, Qianqian; Zhao, Jing; Gong, Yong; Hao, Qun; Peng, Zhong
2017-11-20
A hybrid artificial bee colony (ABC) algorithm inspired by the best-so-far solution and bacterial chemotaxis was introduced to optimize the parameters of the five-parameter bidirectional reflectance distribution function (BRDF) model. To verify the performance of the hybrid ABC algorithm, we measured BRDF of three kinds of samples and simulated the undetermined parameters of the five-parameter BRDF model using the hybrid ABC algorithm and the genetic algorithm, respectively. The experimental results demonstrate that the hybrid ABC algorithm outperforms the genetic algorithm in convergence speed, accuracy, and time efficiency under the same conditions.
Assessment of surveys for the management of hospital clinical pharmacy services.
Čufar, Andreja; Mrhar, Aleš; Robnik-Šikonja, Marko
2015-06-01
Survey data sets are important sources of data, and their successful exploitation is of key importance for informed policy decision-making. We present how a survey analysis approach initially developed for customer satisfaction research in marketing can be adapted for an introduction of clinical pharmacy services into a hospital. We use a data mining analytical approach to extract relevant managerial consequences. We evaluate the importance of competences for users of a clinical pharmacy with the OrdEval algorithm and determine their nature according to the users' expectations. For this, we need substantially fewer questions than are required by the Kano approach. From 52 clinical pharmacy activities we were able to identify seven activities with a substantial negative impact (i.e., negative reinforcement) on the overall satisfaction of clinical pharmacy services, and two activities with a strong positive impact (upward reinforcement). Using analysis of individual feature values, we identified six performance, 10 excitement, and one basic clinical pharmacists' activity. We show how the OrdEval algorithm can exploit the information hidden in the ordering of class and attribute values, and their inherent correlation using a small sample of highly relevant respondents. The visualization of the outputs turns out highly useful in our clinical pharmacy research case study. Copyright © 2015 Elsevier B.V. All rights reserved.