Sample records for fmri datasets algorithm

  1. Sequential Dictionary Learning From Correlated Data: Application to fMRI Data Analysis.

    PubMed

    Seghouane, Abd-Krim; Iqbal, Asif

    2017-03-22

    Sequential dictionary learning via the K-SVD algorithm has been revealed as a successful alternative to conventional data driven methods such as independent component analysis (ICA) for functional magnetic resonance imaging (fMRI) data analysis. fMRI datasets are, however, structured data matrices with notions of spatio-temporal correlation and temporal smoothness. This prior information has not been included in the K-SVD algorithm when applied to fMRI data analysis. In this paper we propose three variants of the K-SVD algorithm dedicated to fMRI data analysis by accounting for this prior information. The proposed algorithms differ from the K-SVD in their sparse coding and dictionary update stages. The first two algorithms account for the known correlation structure in the fMRI data by using the squared Q, R-norm instead of the Frobenius norm for matrix approximation. The third and last algorithm accounts for both the known correlation structure in the fMRI data and the temporal smoothness. The temporal smoothness is incorporated in the dictionary update stage via regularization of the dictionary atoms obtained with penalization. The performance of the proposed dictionary learning algorithms is illustrated through simulations and applications on real fMRI data.

  2. Optshrink LR + S: accelerated fMRI reconstruction using non-convex optimal singular value shrinkage.

    PubMed

    Aggarwal, Priya; Shrivastava, Parth; Kabra, Tanay; Gupta, Anubha

    2017-03-01

    This paper presents a new accelerated fMRI reconstruction method, namely the OptShrink LR + S method, which reconstructs undersampled fMRI data using a linear combination of low-rank and sparse components. The low-rank component is estimated using a non-convex optimal singular value shrinkage algorithm, while the sparse component is estimated using convex l1 minimization. The performance of the proposed method is compared with existing state-of-the-art algorithms on a real fMRI dataset. The proposed OptShrink LR + S method yields good qualitative and quantitative results.
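
    The convex l1 step mentioned in this abstract is commonly handled with element-wise soft thresholding, the proximal operator of the l1 norm. The sketch below is a minimal pure-Python illustration of that operator only; the function name `soft_threshold` and the threshold `lam` are illustrative assumptions, and this is not the authors' reconstruction code.

```python
def soft_threshold(values, lam):
    """Element-wise proximal operator of lam * ||x||_1.

    Shrinks every entry toward zero by `lam` and sets entries whose
    magnitude is at most `lam` to exactly zero (this produces sparsity).
    `values` is a flat list of floats; `lam` is a non-negative float.
    """
    shrunk = []
    for v in values:
        if v > lam:
            shrunk.append(v - lam)
        elif v < -lam:
            shrunk.append(v + lam)
        else:
            shrunk.append(0.0)
    return shrunk


# Hypothetical residual entries after subtracting a low-rank estimate.
sparse_part = soft_threshold([3.0, -0.5, -2.0], 1.0)
print(sparse_part)  # [2.0, 0.0, -1.0]
```

    In a low-rank plus sparse scheme, a step of this kind would typically alternate with a singular value shrinkage step on the low-rank part; that alternation is omitted here.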

  3. Joint fMRI analysis and subject clustering using sparse dictionary learning

    NASA Astrophysics Data System (ADS)

    Kim, Seung-Jun; Dontaraju, Krishna K.

    2017-08-01

    Multi-subject fMRI data analysis methods based on sparse dictionary learning are proposed. In addition to identifying the component spatial maps by exploiting the sparsity of the maps, clusters of the subjects are learned by postulating that the fMRI volumes admit a subspace clustering structure. Furthermore, in order to tune the associated hyper-parameters systematically, a cross-validation strategy is developed based on entry-wise sampling of the fMRI dataset. Efficient algorithms for solving the proposed constrained dictionary learning formulations are developed. Numerical tests performed on synthetic fMRI data show promising results and provide insights into the proposed technique.

  4. Multiclass fMRI data decoding and visualization using supervised self-organizing maps.

    PubMed

    Hausfeld, Lars; Valente, Giancarlo; Formisano, Elia

    2014-08-01

    When multivariate pattern decoding is applied to fMRI studies entailing more than two experimental conditions, the most common approach is to transform the multiclass classification problem into a series of binary problems. Furthermore, for decoding analyses, classification accuracy is often the only outcome reported although the topology of activation patterns in the high-dimensional feature space may provide additional insights into underlying brain representations. Here we propose to decode and visualize voxel patterns of fMRI datasets consisting of multiple conditions with a supervised variant of self-organizing maps (SSOMs). Using simulations and real fMRI data, we evaluated the performance of our SSOM-based approach. Specifically, the analysis of simulated fMRI data with varying signal-to-noise and contrast-to-noise ratios suggested that SSOMs perform better than a k-nearest-neighbor classifier for medium and large numbers of features (i.e. 250 to 1000 or more voxels) and similar to support vector machines (SVMs) for small and medium numbers of features (i.e. 100 to 600 voxels). However, for a larger number of features (>800 voxels), SSOMs performed worse than SVMs. When applied to a challenging 3-class fMRI classification problem with datasets collected to examine the neural representation of three human voices at individual speaker level, the SSOM-based algorithm was able to decode speaker identity from auditory cortical activation patterns. Classification performances were similar between SSOMs and other decoding algorithms; however, the ability to visualize decoding models and underlying data topology of SSOMs promotes a more comprehensive understanding of classification outcomes. We further illustrated this visualization ability of SSOMs with a re-analysis of a dataset examining the representation of visual categories in the ventral visual cortex (Haxby et al., 2001).
This analysis showed that SSOMs could retrieve and visualize topography and neighborhood relations of the brain representation of eight visual categories. We conclude that SSOMs are particularly suited for decoding datasets consisting of more than two classes and are optimally combined with approaches that reduce the number of voxels used for classification (e.g. region-of-interest or searchlight approaches). Copyright © 2014. Published by Elsevier Inc.

  5. A Non-Parametric Approach for the Activation Detection of Block Design fMRI Simulated Data Using Self-Organizing Maps and Support Vector Machine.

    PubMed

    Bahrami, Sheyda; Shamsi, Mousa

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is a popular method to probe the functional organization of the brain using hemodynamic responses. In this method, volume images of the entire brain are obtained with a very good spatial resolution and low temporal resolution. However, they always suffer from high dimensionality in the face of classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) to obtain a feature-based classification. A linear kernel SVM is then used for detecting the active areas, while the SOM is used for feature extraction and for labeling the datasets. The SOM has two major advantages: (i) it reduces the dimension of the datasets, lowering computational complexity, and (ii) it is useful for identifying brain regions with small onset differences in hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI datasets and block design inputs in this paper and set the contrast-to-noise ratio (CNR) to 0.6 for the simulated datasets. The simulated fMRI dataset has a contrast of 1-4% in active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.

  6. The dynamic programming high-order Dynamic Bayesian Networks learning for identifying effective connectivity in human brain from fMRI.

    PubMed

    Dang, Shilpa; Chaudhury, Santanu; Lall, Brejesh; Roy, Prasun Kumar

    2017-06-15

    Determination of effective connectivity (EC) among brain regions using fMRI is helpful in understanding the underlying neural mechanisms. Dynamic Bayesian Networks (DBNs) are an appropriate class of probabilistic graphical temporal models that have been used in the past to model EC from fMRI, specifically order-one DBNs. High-order DBNs (HO-DBNs) have still not been explored for fMRI data. A fundamental problem in the structure learning of HO-DBNs is the high computational burden and low accuracy of the existing heuristic search techniques used for EC detection from fMRI. In this paper, we propose using the dynamic programming (DP) principle, together with properties of the scoring function, to reduce the search space for structure learning of HO-DBNs and, finally, to identify EC from fMRI, which has not been done yet to the best of our knowledge. The proposed exact search-and-score learning approach, HO-DBN-DP, is an extension of a technique originally devised for learning a BN's structure from static data (Singh and Moore, 2005). Its effectiveness in structure learning is shown on a synthetic fMRI dataset. The algorithm reaches a globally optimal solution with appreciably lower time complexity than its static counterpart, owing to the integration of these properties. A proof of optimality is provided. The results demonstrate that HO-DBN-DP is more accurate and faster than the structure-learning algorithms currently used for identifying EC from fMRI. On real data, the EC obtained from HO-DBN-DP is more consistent with previous literature than that from the classical Granger Causality method. Hence, the DP algorithm can be employed for reliable EC estimates from experimental fMRI data. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Learning Effective Connectivity Network Structure from fMRI Data Based on Artificial Immune Algorithm

    PubMed Central

    Ji, Junzhong; Liu, Jinduo; Liang, Peipeng; Zhang, Aidong

    2016-01-01

    Many approaches have been designed to extract brain effective connectivity from functional magnetic resonance imaging (fMRI) data. However, few of them can effectively identify the connectivity network structure due to different defects. In this paper, a new algorithm, named AIAEC, is developed to infer the effective connectivity between different brain regions by combining an artificial immune algorithm (AIA) with the Bayes net method. In the proposed algorithm, a brain effective connectivity network is mapped onto an antibody, and four immune operators (a clonal selection operator, a crossover operator, a mutation operator and a suppression operator) are employed to optimize the antibodies; the antibody with the highest K2 score is finally returned as the solution. AIAEC is then tested on Smith’s simulated datasets, and the effect of different factors on AIAEC is evaluated, including the node number, the session length, and other potential confounding factors of the blood oxygen level dependent (BOLD) signal. In contrast to other existing methods, AIAEC achieved the best performance on the majority of the datasets. It was also found that AIAEC could attain a relatively better solution under the influence of many factors, although it was affected differently by each of them. AIAEC is thus demonstrated to be an effective method for detecting brain effective connectivity. PMID:27045295
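
    The abstract ranks candidate networks by their K2 score. As a rough illustration, the sketch below computes the log of the standard K2 metric for one node and one candidate parent set from discrete data, using `math.lgamma` for the factorials. The data layout (a list of integer tuples), the function name `log_k2_node`, and the toy data are assumptions made for this example; this is not the AIAEC implementation.

```python
import math
from collections import defaultdict


def log_k2_node(data, child, parents, arity):
    """Log K2 score contribution of one node given a candidate parent set.

    data    : list of tuples of ints, one tuple per observation
    child   : column index of the node being scored
    parents : list of column indices of its candidate parents
    arity   : number of discrete states of the child (values 0..arity-1)
    """
    # counts[parent_configuration][child_state] = number of observations
    counts = defaultdict(lambda: [0] * arity)
    for row in data:
        key = tuple(row[p] for p in parents)
        counts[key][row[child]] += 1

    score = 0.0
    for n_ijk in counts.values():
        n_ij = sum(n_ijk)
        # log[(r - 1)! / (N_ij + r - 1)!], with n! = gamma(n + 1)
        score += math.lgamma(arity) - math.lgamma(n_ij + arity)
        # log of the product of N_ijk! over the child states
        score += sum(math.lgamma(n + 1) for n in n_ijk)
    return score


# Toy binary data in which column 1 simply copies column 0.
toy = [(0, 0), (1, 1), (0, 0), (1, 1)]
with_parent = log_k2_node(toy, child=1, parents=[0], arity=2)
without_parent = log_k2_node(toy, child=1, parents=[], arity=2)
print(with_parent > without_parent)  # True
```

    The score of a whole network is the sum of these per-node terms, so a search procedure only needs to rescore the nodes whose parent sets change.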

  8. Constructing fMRI connectivity networks: a whole brain functional parcellation method for node definition.

    PubMed

    Maggioni, Eleonora; Tana, Maria Gabriella; Arrigoni, Filippo; Zucca, Claudio; Bianchi, Anna Maria

    2014-05-15

    Functional Magnetic Resonance Imaging (fMRI) is used for exploring brain functionality, and recently it was applied for mapping the brain connection patterns. To give a meaningful neurobiological interpretation to the connectivity network, it is fundamental to properly define the network framework. In particular, the choice of the network nodes may affect the final connectivity results and the consequent interpretation. We introduce a novel method for the intra-subject topological characterization of the nodes of fMRI brain networks, based on a whole brain parcellation scheme. The proposed whole brain parcellation algorithm divides the brain into clusters that are homogeneous from the anatomical and functional point of view, each of which constitutes a node. The functional parcellation described is based on Tononi's cluster index, which measures instantaneous correlation in terms of intrinsic and extrinsic statistical dependencies. The method's performance and reliability were first tested on simulated data, then on a real fMRI dataset acquired on healthy subjects during visual stimulation. Finally, the proposed algorithm was applied to epileptic patients' fMRI data recorded during seizures, to verify its usefulness as a preparatory step for effective connectivity analysis. For each patient, the nodes of the network involved in ictal activity were defined according to the proposed parcellation scheme and Granger Causality Analysis (GCA) was applied to infer effective connectivity. We showed that the algorithm 1) performed well on simulated data, 2) was able to produce reliable inter-subject results and 3) led to a detailed definition of the effective connectivity pattern. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Hand classification of fMRI ICA noise components.

    PubMed

    Griffanti, Ludovica; Douaud, Gwenaëlle; Bijsterbosch, Janine; Evangelisti, Stefania; Alfaro-Almagro, Fidel; Glasser, Matthew F; Duff, Eugene P; Fitzgibbon, Sean; Westphal, Robert; Carone, Davide; Beckmann, Christian F; Smith, Stephen M

    2017-07-01

    We present a practical "how-to" guide to help determine whether single-subject fMRI independent components (ICs) characterise structured noise or not. Manual identification of signal and noise after ICA decomposition is required for efficient data denoising: to train supervised algorithms, to check the results of unsupervised ones or to manually clean the data. In this paper we describe the main spatial and temporal features of ICs and provide general guidelines on how to evaluate these. Examples of signal and noise components are provided from a wide range of datasets (3T data, including examples from the UK Biobank and the Human Connectome Project, and 7T data), together with practical guidelines for their identification. Finally, we discuss how the data quality, data type and preprocessing can influence the characteristics of the ICs and present examples of particularly challenging datasets. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  10. Big Data Approaches for the Analysis of Large-Scale fMRI Data Using Apache Spark and GPU Processing: A Demonstration on Resting-State fMRI Data from the Human Connectome Project

    PubMed Central

    Boubela, Roland N.; Kalcher, Klaudius; Huf, Wolfgang; Našel, Christian; Moser, Ewald

    2016-01-01

    Technologies for scalable analysis of very large datasets have emerged in the domain of internet computing, but are still rarely used in neuroimaging despite the existence of data and research questions in need of efficient computation tools especially in fMRI. In this work, we present software tools for the application of Apache Spark and Graphics Processing Units (GPUs) to neuroimaging datasets, in particular providing distributed file input for 4D NIfTI fMRI datasets in Scala for use in an Apache Spark environment. Examples for using this Big Data platform in graph analysis of fMRI datasets are shown to illustrate how processing pipelines employing it can be developed. With more tools for the convenient integration of neuroimaging file formats and typical processing steps, big data technologies could find wider endorsement in the community, leading to a range of potentially useful applications especially in view of the current collaborative creation of a wealth of large data repositories including thousands of individual fMRI datasets. PMID:26778951

  11. FACET - a "Flexible Artifact Correction and Evaluation Toolbox" for concurrently recorded EEG/fMRI data.

    PubMed

    Glaser, Johann; Beisteiner, Roland; Bauer, Herbert; Fischmeister, Florian Ph S

    2013-11-09

    In concurrent EEG/fMRI recordings, EEG data are impaired by the fMRI gradient artifacts, which exceed the EEG signal by several orders of magnitude. While several algorithms exist to correct the EEG data, these algorithms lack the flexibility to either leave out or add new steps. The open-source MATLAB toolbox FACET presented here is a modular toolbox for the fast and flexible correction and evaluation of imaging artifacts from concurrently recorded EEG datasets. It consists of an Analysis, a Correction and an Evaluation framework allowing the user to choose from different artifact correction methods with various pre- and post-processing steps to form flexible combinations. The quality of the chosen correction approach can then be evaluated and compared to different settings. FACET was evaluated on a dataset provided with the FMRIB plugin for EEGLAB using two different correction approaches: Averaged Artifact Subtraction (AAS, Allen et al., NeuroImage 12(2):230-239, 2000) and the FMRI Artifact Slice Template Removal (FASTR, Niazy et al., NeuroImage 28(3):720-737, 2005). The obtained results were compared to those of the FASTR algorithm implemented in the EEGLAB plugin FMRIB. No differences were found between the FACET implementation of FASTR and the original algorithm across all gradient artifact relevant performance indices. The FACET toolbox not only provides facilities for all three modalities (data analysis, artifact correction, and evaluation and documentation of the results) but also offers an easily extendable framework for development and evaluation of new approaches.

  12. FACET – a “Flexible Artifact Correction and Evaluation Toolbox” for concurrently recorded EEG/fMRI data

    PubMed Central

    2013-01-01

    Background: In concurrent EEG/fMRI recordings, EEG data are impaired by the fMRI gradient artifacts, which exceed the EEG signal by several orders of magnitude. While several algorithms exist to correct the EEG data, these algorithms lack the flexibility to either leave out or add new steps. The open-source MATLAB toolbox FACET presented here is a modular toolbox for the fast and flexible correction and evaluation of imaging artifacts from concurrently recorded EEG datasets. It consists of an Analysis, a Correction and an Evaluation framework allowing the user to choose from different artifact correction methods with various pre- and post-processing steps to form flexible combinations. The quality of the chosen correction approach can then be evaluated and compared to different settings. Results: FACET was evaluated on a dataset provided with the FMRIB plugin for EEGLAB using two different correction approaches: Averaged Artifact Subtraction (AAS, Allen et al., NeuroImage 12(2):230–239, 2000) and the FMRI Artifact Slice Template Removal (FASTR, Niazy et al., NeuroImage 28(3):720–737, 2005). The obtained results were compared to those of the FASTR algorithm implemented in the EEGLAB plugin FMRIB. No differences were found between the FACET implementation of FASTR and the original algorithm across all gradient artifact relevant performance indices. Conclusion: The FACET toolbox not only provides facilities for all three modalities (data analysis, artifact correction, and evaluation and documentation of the results) but also offers an easily extendable framework for development and evaluation of new approaches. PMID:24206927

  13. Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions

    PubMed Central

    Liu, Weidong; Luo, Xi

    2014-01-01

    This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset. PMID:25750463

  14. Sensitivity and specificity considerations for fMRI encoding, decoding, and mapping of auditory cortex at ultra-high field.

    PubMed

    Moerel, Michelle; De Martino, Federico; Kemper, Valentin G; Schmitter, Sebastian; Vu, An T; Uğurbil, Kâmil; Formisano, Elia; Yacoub, Essa

    2018-01-01

    Following rapid technological advances, ultra-high field functional MRI (fMRI) enables exploring correlates of neuronal population activity at an increasing spatial resolution. However, as the fMRI blood-oxygenation-level-dependent (BOLD) contrast is a vascular signal, the spatial specificity of fMRI data is ultimately determined by the characteristics of the underlying vasculature. At 7T, fMRI measurement parameters determine the relative contribution of the macro- and microvasculature to the acquired signal. Here we investigate how these parameters affect relevant high-end fMRI analyses such as encoding, decoding, and submillimeter mapping of voxel preferences in the human auditory cortex. Specifically, we compare a T2*-weighted fMRI dataset, obtained with 2D gradient echo (GE) EPI, to a predominantly T2-weighted dataset obtained with 3D GRASE. We first investigated the decoding accuracy based on two encoding models that represented different hypotheses about auditory cortical processing. This encoding/decoding analysis profited from the large spatial coverage and sensitivity of the T2*-weighted acquisitions, as evidenced by a significantly higher prediction accuracy in the GE-EPI dataset compared to the 3D GRASE dataset for both encoding models. The main disadvantage of the T2*-weighted GE-EPI dataset for encoding/decoding analyses was that the prediction accuracy exhibited cortical depth dependent vascular biases. However, we propose that the comparison of prediction accuracy across the different encoding models may be used as a post processing technique to salvage the spatial interpretability of the GE-EPI cortical depth-dependent prediction accuracy. Second, we explored the mapping of voxel preferences. Large-scale maps of frequency preference (i.e., tonotopy) were similar across datasets, yet the GE-EPI dataset was preferable due to its larger spatial coverage and sensitivity.
However, submillimeter tonotopy maps revealed biases in assigned frequency preference and selectivity for the GE-EPI dataset, but not for the 3D GRASE dataset. Thus, a T2-weighted acquisition is recommended if high specificity in tonotopic maps is required. In conclusion, different fMRI acquisitions were better suited for different analyses. It is therefore critical that any sequence parameter optimization considers the eventual intended fMRI analyses and the nature of the neuroscience questions being asked. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Automatic Denoising of Functional MRI Data: Combining Independent Component Analysis and Hierarchical Fusion of Classifiers

    PubMed Central

    Salimi-Khorshidi, Gholamreza; Douaud, Gwenaëlle; Beckmann, Christian F; Glasser, Matthew F; Griffanti, Ludovica; Smith, Stephen M

    2014-01-01

    Many sources of fluctuation contribute to the fMRI signal, and this makes identifying the effects that are truly related to the underlying neuronal activity difficult. Independent component analysis (ICA), one of the most widely used techniques for the exploratory analysis of fMRI data, has been shown to be a powerful technique for identifying various sources of neuronally-related and artefactual fluctuation in fMRI data (both with the application of external stimuli and with the subject “at rest”). ICA decomposes fMRI data into patterns of activity (a set of spatial maps and their corresponding time series) that are statistically independent and add linearly to explain voxel-wise time series. Given the set of ICA components, if the components representing “signal” (brain activity) can be distinguished from the “noise” components (effects of motion, non-neuronal physiology, scanner artefacts and other nuisance sources), the latter can then be removed from the data, providing an effective cleanup of structured noise. Manual classification of components is labour intensive and requires expertise; hence, a fully automatic noise detection algorithm that can reliably detect various types of noise sources (in both task and resting fMRI) is desirable. In this paper, we introduce FIX (“FMRIB’s ICA-based X-noiseifier”), which provides an automatic solution for denoising fMRI data via accurate classification of ICA components. For each ICA component FIX generates a large number of distinct spatial and temporal features, each describing a different aspect of the data (e.g., what proportion of temporal fluctuations are at high frequencies). The set of features is then fed into a multi-level classifier (built around several different classifiers). Once trained through the hand-classification of a sufficient number of training datasets, the classifier can then automatically classify new datasets.
The noise components can then be subtracted from (or regressed out of) the original data, to provide automated cleanup. On conventional resting-state fMRI (rfMRI) single-run datasets, FIX achieved about 95% overall accuracy. On high-quality rfMRI data from the Human Connectome Project, FIX achieves over 99% classification accuracy, and as a result is being used in the default rfMRI processing pipeline for generating HCP connectomes. FIX is publicly available as a plugin for FSL. PMID:24389422
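
    The cleanup step described above, regressing noise time courses out of the data, can be illustrated for a single voxel and a single noise regressor. The sketch below is a minimal pure-Python example under stated assumptions: both series are already demeaned, only one noise time course is used, and the name `regress_out` is hypothetical. It does not reproduce the FIX or FSL implementation.

```python
def regress_out(signal, noise):
    """Remove the least-squares fit of `noise` from `signal` (no intercept).

    Both inputs are equal-length lists of floats and are assumed to be
    demeaned. The returned residual is orthogonal to `noise`.
    """
    denom = sum(n * n for n in noise)
    if denom == 0:
        # A constant-zero regressor explains nothing; return data unchanged.
        return list(signal)
    beta = sum(s * n for s, n in zip(signal, noise)) / denom
    return [s - beta * n for s, n in zip(signal, noise)]


# Hypothetical voxel time series contaminated by a noise component.
noise_tc = [1.0, 1.0, -1.0, -1.0]
voxel_tc = [1.0, 3.0, -1.0, -3.0]
cleaned = regress_out(voxel_tc, noise_tc)
print(cleaned)  # [-1.0, 1.0, 1.0, -1.0]
```

    With several noise components, the same idea becomes a multiple regression on all noise time courses at once, usually solved with a linear algebra library rather than by hand.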

  16. Task-evoked brain functional magnetic susceptibility mapping by independent component analysis (χICA).

    PubMed

    Chen, Zikuan; Calhoun, Vince D

    2016-03-01

    Conventionally, independent component analysis (ICA) is performed on an fMRI magnitude dataset to analyze brain functional mapping (AICA). By solving the inverse problem of fMRI, we can reconstruct the brain magnetic susceptibility (χ) functional states. Upon the reconstructed χ dataspace, we propose an ICA-based brain functional χ mapping method (χICA) to extract a task-evoked brain functional map. A complex division algorithm is applied to a timeseries of fMRI phase images to extract temporal phase changes (relative to an OFF-state snapshot). A computed inverse MRI (CIMRI) model is used to reconstruct a 4D brain χ response dataset. χICA is implemented by applying a spatial InfoMax ICA algorithm to the reconstructed 4D χ dataspace. With finger-tapping experiments on a 7T system, the χICA-extracted χ-depicted functional map is similar to the SPM-inferred functional χ map, with a spatial correlation of 0.67 ± 0.05. In comparison, the AICA-extracted magnitude-depicted map is correlated with the SPM magnitude map at 0.81 ± 0.05. Understanding the inferiority of χICA to AICA for task-evoked functional mapping is an ongoing research topic. For task-evoked brain functional mapping, we compare the data-driven ICA method with the task-correlated SPM method. In particular, we compare χICA with AICA for extracting task-correlated timecourses and functional maps. χICA can extract a χ-depicted task-evoked brain functional map from a reconstructed χ dataspace without knowledge of brain hemodynamic responses. The χICA-extracted brain functional χ map reveals a bidirectional BOLD response pattern that is unavailable (or different) from AICA. Copyright © 2016 Elsevier B.V. All rights reserved.
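
    The complex division step can be illustrated for a single voxel: dividing the complex signal at time t by the OFF-state reference and taking the angle of the quotient gives the phase change directly in (-π, π], which avoids the wrap-around errors that subtracting two wrapped phase values can produce. The sketch below uses only Python's `cmath`; the function name `phase_change` and the sample values are assumptions for illustration, not the authors' code.

```python
import cmath
import math


def phase_change(z_t, z_ref):
    """Phase of z_t relative to z_ref, computed by complex division.

    z_t and z_ref are complex voxel values (z_ref must be non-zero).
    The result lies in (-pi, pi].
    """
    return cmath.phase(z_t / z_ref)


# Hypothetical voxel: reference phase 3.0 rad, later phase -3.0 rad.
z_ref = cmath.rect(1.0, 3.0)
z_t = cmath.rect(2.0, -3.0)

naive = -3.0 - 3.0                 # subtraction of wrapped phases: -6.0 rad
wrapped = phase_change(z_t, z_ref)  # small positive change, 2*pi - 6 rad
print(naive, wrapped)
```

    The magnitudes of the two samples cancel out of the angle, so only the phase relationship is retained.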

  17. Basis Expansion Approaches for Regularized Sequential Dictionary Learning Algorithms With Enforced Sparsity for fMRI Data Analysis.

    PubMed

    Seghouane, Abd-Krim; Iqbal, Asif

    2017-09-01

    Sequential dictionary learning algorithms have been successfully applied to functional magnetic resonance imaging (fMRI) data analysis. fMRI data sets are, however, structured data matrices with notions of temporal smoothness in the column direction. This prior information, which can be converted into a constraint of smoothness on the learned dictionary atoms, has seldom been included in classical dictionary learning algorithms when applied to fMRI data analysis. In this paper, we tackle this problem by proposing two new sequential dictionary learning algorithms dedicated to fMRI data analysis by accounting for this prior information. These algorithms differ from the existing ones in their dictionary update stage. The steps of this stage are derived as a variant of the power method for computing the SVD. The proposed algorithms generate regularized dictionary atoms via the solution of a left regularized rank-one matrix approximation problem where temporal smoothness is enforced via regularization through basis expansion and sparse basis expansion in the dictionary update stage. Applications on synthetic data experiments and real fMRI data sets illustrating the performance of the proposed algorithms are provided.
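
    The dictionary update is described as a variant of the power method for computing the SVD. For orientation, the sketch below shows the plain, unregularized power iteration for the best rank-one approximation of a small matrix, alternating between the left and right singular vectors. The function name `rank_one_power`, the fixed iteration count, and the starting vector are assumptions; the smoothness regularization and basis expansion proposed in the paper are deliberately left out.

```python
import math


def rank_one_power(E, iters=100):
    """Leading singular triplet (u, s, v) of E by alternating power steps.

    E is a list of rows (lists of floats). Assumes E is non-zero and the
    starting vector is not orthogonal to the leading right singular vector.
    """
    m, n = len(E), len(E[0])
    v = [1.0 / math.sqrt(n)] * n
    u, s = [0.0] * m, 0.0
    for _ in range(iters):
        # u <- E v, normalized (candidate atom, e.g. a time course)
        u = [sum(E[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- E^T u; its norm is the singular value estimate
        w = [sum(E[i][j] * u[i] for i in range(m)) for j in range(n)]
        s = math.sqrt(sum(x * x for x in w))
        v = [x / s for x in w]
    return u, s, v


# Toy residual matrix with singular values 3 and 1.
u, s, v = rank_one_power([[3.0, 0.0], [0.0, 1.0]])
print(round(s, 6))  # 3.0
```

    A regularized variant would typically modify the update of `u` (for example by smoothing it) before normalization; how that is done in the paper is not specified in the abstract.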

  18. Hybrid ICA-Bayesian network approach reveals distinct effective connectivity differences in schizophrenia.

    PubMed

    Kim, D; Burge, J; Lane, T; Pearlson, G D; Kiehl, K A; Calhoun, V D

    2008-10-01

    We utilized a discrete dynamic Bayesian network (dDBN) approach (Burge, J., Lane, T., Link, H., Qiu, S., Clark, V.P., 2007. Discrete dynamic Bayesian network analysis of fMRI data. Hum Brain Mapp.) to determine differences in brain regions between patients with schizophrenia and healthy controls on a measure of effective connectivity, termed the approximate conditional likelihood score (ACL) (Burge, J., Lane, T., 2005. Learning Class-Discriminative Dynamic Bayesian Networks. Proceedings of the International Conference on Machine Learning, Bonn, Germany, pp. 97-104.). The ACL score represents a class-discriminative measure of effective connectivity by measuring the relative likelihood of the correlation between brain regions in one group versus another. The algorithm is capable of finding non-linear relationships between brain regions because it uses discrete rather than continuous values and attempts to model temporal relationships with a first-order Markov and stationary assumption constraint (Papoulis, A., 1991. Probability, random variables, and stochastic processes. McGraw-Hill, New York.). Since Bayesian networks are overly sensitive to noisy data, we introduced an independent component analysis (ICA) filtering approach that attempted to reduce the noise found in fMRI data by unmixing the raw datasets into a set of independent spatial component maps. Components that represented noise were removed and the remaining components reconstructed into the dimensions of the original fMRI datasets. We applied the dDBN algorithm to a group of 35 patients with schizophrenia and 35 matched healthy controls using an ICA filtered and unfiltered approach. We determined that filtering the data significantly improved the magnitude of the ACL score. Patients showed the greatest ACL scores in several regions, most markedly the cerebellar vermis and hemispheres. 
Our findings suggest that schizophrenia patients exhibit weaker connectivity than healthy controls in multiple regions, including bilateral temporal, frontal, and cerebellar regions during an auditory paradigm.

  19. Performance of Blind Source Separation Algorithms for FMRI Analysis using a Group ICA Method

    PubMed Central

    Correa, Nicolle; Adali, Tülay; Calhoun, Vince D.

    2007-01-01

    Independent component analysis (ICA) is a popular blind source separation (BSS) technique that has proven to be promising for the analysis of functional magnetic resonance imaging (fMRI) data. A number of ICA approaches have been used for fMRI data analysis, and even more ICA algorithms exist; however, the impact of using different algorithms on the results is largely unexplored. In this paper, we study the performance of four major classes of algorithms for spatial ICA, namely information maximization, maximization of non-Gaussianity, joint diagonalization of cross-cumulant matrices, and second-order correlation based methods, when they are applied to fMRI data from subjects performing a visuo-motor task. We use a group ICA method to study the variability among different ICA algorithms and propose several analysis techniques to evaluate their performance. We compare how different ICA algorithms estimate activations in expected neuronal areas. The results demonstrate that the ICA algorithms using higher-order statistical information prove to be quite consistent for fMRI data analysis. Infomax, FastICA, and JADE all yield reliable results, each having its strengths in specific areas. EVD, an algorithm using second-order statistics, does not perform reliably for fMRI data. Additionally, for the iterative ICA algorithms, it is important to investigate the variability of the estimates from different runs. We test the consistency of the iterative algorithms, Infomax and FastICA, by running the algorithm a number of times with different initializations and note that they yield consistent results over these multiple runs. Our results greatly improve our confidence in the consistency of ICA for fMRI data analysis. PMID:17540281
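
    One simple way to check run-to-run consistency of an iterative ICA algorithm is to match components between two runs by absolute Pearson correlation, since ICA components are only defined up to sign and ordering. The abstract does not say which consistency measure was used, so the sketch below is an assumed illustration; `pearson`, `best_match_similarity`, and the toy components are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_match_similarity(run_a, run_b):
    """For each component in run_a, the highest |r| with any component in run_b.

    The absolute value handles the sign ambiguity of ICA; taking the
    maximum over run_b handles the arbitrary component order.
    """
    return [max(abs(pearson(a, b)) for b in run_b) for a in run_a]


# Two toy "runs": same components, different order, sign, and scale.
run_a = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
run_b = [[-1.0, 1.0, -1.0, 1.0], [-2.0, -4.0, -6.0, -8.0]]
similarity = best_match_similarity(run_a, run_b)
print([round(r, 6) for r in similarity])  # [1.0, 1.0]
```

    Values close to 1 for every component across many random initializations would indicate stable estimates; this greedy matching can assign one component to several partners, so a one-to-one assignment may be preferable in practice.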

  20. Performance of blind source separation algorithms for fMRI analysis using a group ICA method.

    PubMed

    Correa, Nicolle; Adali, Tülay; Calhoun, Vince D

    2007-06-01

    Independent component analysis (ICA) is a popular blind source separation technique that has proven to be promising for the analysis of functional magnetic resonance imaging (fMRI) data. A number of ICA approaches have been used for fMRI data analysis, and even more ICA algorithms exist; however, the impact of using different algorithms on the results is largely unexplored. In this paper, we study the performance of four major classes of algorithms for spatial ICA, namely, information maximization, maximization of non-Gaussianity, joint diagonalization of cross-cumulant matrices and second-order correlation-based methods, when they are applied to fMRI data from subjects performing a visuo-motor task. We use a group ICA method to study variability among different ICA algorithms, and we propose several analysis techniques to evaluate their performance. We compare how different ICA algorithms estimate activations in expected neuronal areas. The results demonstrate that the ICA algorithms using higher-order statistical information prove to be quite consistent for fMRI data analysis. Infomax, FastICA and joint approximate diagonalization of eigenmatrices (JADE) all yield reliable results, with each having its strengths in specific areas. Eigenvalue decomposition (EVD), an algorithm using second-order statistics, does not perform reliably for fMRI data. Additionally, for iterative ICA algorithms, it is important to investigate the variability of estimates from different runs. We test the consistency of the iterative algorithms Infomax and FastICA by running the algorithm a number of times with different initializations, and we note that they yield consistent results over these multiple runs. Our results greatly improve our confidence in the consistency of ICA for fMRI data analysis.
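    The run-to-run consistency check described above can be sketched as follows. ICA recovers components only up to order, sign and scale, so components from two runs are compared by absolute Pearson correlation. This is a minimal illustration under stated assumptions, not the authors' evaluation procedure: the function names `pearson` and `run_consistency` and the toy component maps are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def run_consistency(run_a, run_b):
    """For each component of run_a, the best absolute correlation with
    any component of run_b (insensitive to order, sign and scale)."""
    return [max(abs(pearson(ca, cb)) for cb in run_b) for ca in run_a]

# Toy spatial maps (hypothetical): run_b holds the same two components as
# run_a, but permuted, rescaled and, for one of them, sign-flipped.
run_a = [[0.0, 1.0, 0.0, 2.0], [1.0, 0.0, 3.0, 0.0]]
run_b = [[-2.0, 0.0, -6.0, 0.0], [0.0, 0.5, 0.0, 1.0]]
scores = run_consistency(run_a, run_b)
```

    Scores near 1 for every component suggest that the two initializations converged to the same decomposition; in practice the maps would come from repeated runs of an iterative algorithm such as Infomax or FastICA.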

  1. Real-time interactive tractography analysis for multimodal brain visualization tool: MultiXplore

    NASA Astrophysics Data System (ADS)

    Bakhshmand, Saeed M.; de Ribaupierre, Sandrine; Eagleson, Roy

    2017-03-01

    Most debilitating neurological disorders can have anatomical origins. Yet unlike other body organs, the anatomy alone cannot easily provide an understanding of brain functionality. In fact, addressing the challenge of linking structural and functional connectivity remains at the frontiers of neuroscience. Aggregating multimodal neuroimaging datasets may be critical for developing theories that span brain functionality, global neuroanatomy and internal microstructures. Functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) are the main techniques employed to investigate the brain under normal and pathological conditions. fMRI records the blood oxygenation level of the grey matter (GM), whereas DTI is able to reveal the underlying structure of the white matter (WM). Brain global activity is assumed to be an integration of GM functional hubs and the WM neural pathways that serve to connect them. In this study we developed and evaluated a two-phase algorithm. This algorithm is employed in a 3D interactive connectivity visualization framework and helps to accelerate clustering of virtual neural pathways. In this paper, we detail an algorithm that makes use of an index-based membership array formed for a whole-brain tractography file and a corresponding parcellated brain atlas. Next, we demonstrate the efficiency of the algorithm by measuring the times required to extract a variety of fiber clusters, which are chosen to represent the full range of output file sizes that the algorithm is likely to generate. The proposed algorithm facilitates real-time visual inspection of neuroimaging data to further discovery of the structure-function relationship of brain networks.

  2. A Novel Feature-Map Based ICA Model for Identifying the Individual, Intra/Inter-Group Brain Networks across Multiple fMRI Datasets.

    PubMed

    Wang, Nizhuan; Chang, Chunqi; Zeng, Weiming; Shi, Yuhu; Yan, Hongjie

    2017-01-01

    Independent component analysis (ICA) has been widely used in functional magnetic resonance imaging (fMRI) data analysis to evaluate functional connectivity of the brain; however, there are still some limitations on ICA simultaneously handling neuroimaging datasets with diverse acquisition parameters, e.g., different repetition time, different scanner, etc. Therefore, it is difficult for the traditional ICA framework to effectively handle ever-increasingly big neuroimaging datasets. In this research, a novel feature-map based ICA framework (FMICA) was proposed to address the aforementioned deficiencies, which aimed at exploring brain functional networks (BFNs) at different scales, e.g., the first level (individual subject level), second level (intragroup level of subjects within a certain dataset) and third level (intergroup level of subjects across different datasets), based only on the feature maps extracted from the fMRI datasets. The FMICA was presented as a hierarchical framework, which effectively made ICA and constrained ICA as a whole to identify the BFNs from the feature maps. The simulated and real experimental results demonstrated that FMICA had the excellent ability to identify the intergroup BFNs and to characterize subject-specific and group-specific difference of BFNs from the independent component feature maps, which sharply reduced the size of fMRI datasets. Compared with traditional ICAs, FMICA as a more generalized framework could efficiently and simultaneously identify the variant BFNs at the subject-specific, intragroup, intragroup-specific and intergroup levels, implying that FMICA was able to handle big neuroimaging datasets in neuroscience research.

  3. ICN_Atlas: Automated description and quantification of functional MRI activation patterns in the framework of intrinsic connectivity networks.

    PubMed

    Kozák, Lajos R; van Graan, Louis André; Chaudhary, Umair J; Szabó, Ádám György; Lemieux, Louis

    2017-12-01

    Generally, the interpretation of functional MRI (fMRI) activation maps continues to rely on assessing their relationship to anatomical structures, mostly in a qualitative and often subjective way. Recently, the existence of persistent and stable brain networks of functional nature has been revealed; in particular these so-called intrinsic connectivity networks (ICNs) appear to link patterns of resting state and task-related state connectivity. These networks provide an opportunity for functionally derived description and interpretation of fMRI maps, which may be especially important in cases where the maps are predominantly task-unrelated, such as studies of spontaneous brain activity, e.g. in the case of seizure-related fMRI maps in epilepsy patients or sleep states. Here we present a new toolbox (ICN_Atlas) aimed at facilitating the interpretation of fMRI data in the context of ICNs. More specifically, the new methodology was designed to describe fMRI maps in a function-oriented, objective and quantitative way using a set of 15 metrics conceived to quantify the degree of 'engagement' of ICNs for any given fMRI-derived statistical map of interest. We demonstrate that the proposed framework provides a highly reliable quantification of fMRI activation maps using a publicly available longitudinal (test-retest) resting-state fMRI dataset. The utility of the ICN_Atlas is also illustrated on a parametric task-modulation fMRI dataset, and on a dataset of a patient who had repeated seizures during resting-state fMRI, confirmed on simultaneously recorded EEG. The proposed ICN_Atlas toolbox is freely available for download at http://icnatlas.com and at http://www.nitrc.org for researchers to use in their fMRI investigations. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Connectome-based predictive modeling of attention: Comparing different functional connectivity features and prediction methods across datasets.

    PubMed

    Yoo, Kwangsun; Rosenberg, Monica D; Hsu, Wei-Ting; Zhang, Sheng; Li, Chiang-Shan R; Scheinost, Dustin; Constable, R Todd; Chun, Marvin M

    2018-02-15

    Connectome-based predictive modeling (CPM; Finn et al., 2015; Shen et al., 2017) was recently developed to predict individual differences in traits and behaviors, including fluid intelligence (Finn et al., 2015) and sustained attention (Rosenberg et al., 2016a), from functional brain connectivity (FC) measured with fMRI. Here, using the CPM framework, we compared the predictive power of three different measures of FC (Pearson's correlation, accordance, and discordance) and two different prediction algorithms (linear and partial least square [PLS] regression) for attention function. Accordance and discordance are recently proposed FC measures that respectively track in-phase synchronization and out-of-phase anti-correlation (Meskaldji et al., 2015). We defined connectome-based models using task-based or resting-state FC data, and tested the effects of (1) functional connectivity measure and (2) feature-selection/prediction algorithm on individualized attention predictions. Models were internally validated in a training dataset using leave-one-subject-out cross-validation, and externally validated with three independent datasets. The training dataset included fMRI data collected while participants performed a sustained attention task and rested (N = 25; Rosenberg et al., 2016a). The validation datasets included: 1) data collected during performance of a stop-signal task and at rest (N = 83, including 19 participants who were administered methylphenidate prior to scanning; Farr et al., 2014a; Rosenberg et al., 2016b), 2) data collected during Attention Network Task performance and rest (N = 41, Rosenberg et al., in press), and 3) resting-state data and ADHD symptom severity from the ADHD-200 Consortium (N = 113; Rosenberg et al., 2016a). 
Models defined using all combinations of functional connectivity measure (Pearson's correlation, accordance, and discordance) and prediction algorithm (linear and PLS regression) predicted attentional abilities, with correlations between predicted and observed measures of attention as high as 0.9 for internal validation, and 0.6 for external validation (all p's < 0.05). Models trained on task data outperformed models trained on rest data. Pearson's correlation and accordance features generally showed a small numerical advantage over discordance features, while PLS regression models were usually better than linear regression models. Overall, in addition to correlation features combined with linear models (Rosenberg et al., 2016a), it is useful to consider accordance features and PLS regression for CPM. Copyright © 2017 Elsevier Inc. All rights reserved.
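    The leave-one-subject-out validation loop described above can be sketched as follows, using the linear-regression variant only. This is a simplified illustration, not the published CPM code: the edge-selection rule (keep edges whose training-set correlation with behavior exceeds a hypothetical `r_thresh`), the function names and the toy data are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
             sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def cpm_loocv(edges, behavior, r_thresh=0.5):
    """Leave-one-subject-out predictions from summed strength of selected edges.

    edges[i][e] is connectivity edge e for subject i (assumed layout).
    """
    n = len(behavior)
    predictions = []
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        y = [behavior[i] for i in train]
        # Feature selection uses training subjects only, to avoid leakage.
        selected = [e for e in range(len(edges[0]))
                    if pearson([edges[i][e] for i in train], y) > r_thresh]
        if not selected:
            predictions.append(sum(y) / len(y))  # fall back to training mean
            continue
        strength = [sum(edges[i][e] for e in selected) for i in train]
        slope, intercept = fit_line(strength, y)
        held_strength = sum(edges[held_out][e] for e in selected)
        predictions.append(slope * held_strength + intercept)
    return predictions

# Toy data (hypothetical): edge 0 tracks behavior exactly, edge 1 is noise.
behavior = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1]
edges = [[2.0 * b + 1.0, e] for b, e in zip(behavior, noise)]
predictions = cpm_loocv(edges, behavior)
```

    Model performance would then be summarized as the correlation between `predictions` and `behavior`, which is the quantity the abstract reports for internal validation.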

  5. Subject order-independent group ICA (SOI-GICA) for functional MRI data analysis.

    PubMed

    Zhang, Han; Zuo, Xi-Nian; Ma, Shuang-Ye; Zang, Yu-Feng; Milham, Michael P; Zhu, Chao-Zhe

    2010-07-15

    Independent component analysis (ICA) is a data-driven approach to study functional magnetic resonance imaging (fMRI) data. In particular, for group analysis of multiple subjects, temporal concatenation group ICA (TC-GICA) is intensively used. However, due to the usually limited computational capability, data reduction with principal component analysis (PCA: a standard preprocessing step of ICA decomposition) is difficult to achieve for a large dataset. To overcome this, TC-GICA employs multiple-stage PCA data reduction. Such multiple-stage PCA data reduction, however, leads to variable outputs due to different subject concatenation orders. Consequently, the ICA algorithm uses the variable multiple-stage PCA outputs and generates variable decompositions. In this study, a rigorous theoretical analysis was conducted to prove the existence of such variability. Simulated and real fMRI experiments were used to demonstrate the subject-order-induced variability of TC-GICA results using multiple PCA data reductions. To solve this problem, we propose a new subject order-independent group ICA (SOI-GICA). Both simulated and real fMRI data experiments demonstrated the high robustness and accuracy of the SOI-GICA results compared to those of traditional TC-GICA. Accordingly, we recommend SOI-GICA for group ICA-based fMRI studies, especially those with large data sets. Copyright 2010 Elsevier Inc. All rights reserved.

  6. Hybrid ICA-Bayesian Network approach reveals distinct effective connectivity differences in schizophrenia

    PubMed Central

    Kim, D.; Burge, J.; Lane, T.; Pearlson, G. D; Kiehl, K. A; Calhoun, V. D.

    2008-01-01

    We utilized a discrete dynamic Bayesian network (dDBN) approach (Burge et al., 2007) to determine differences in brain regions between patients with schizophrenia and healthy controls on a measure of effective connectivity, termed the approximate conditional likelihood score (ACL) (Burge and Lane, 2005). The ACL score represents a class-discriminative measure of effective connectivity by measuring the relative likelihood of the correlation between brain regions in one group versus another. The algorithm is capable of finding non-linear relationships between brain regions because it uses discrete rather than continuous values and attempts to model temporal relationships with a first-order Markov and stationary assumption constraint (Papoulis, 1991). Since Bayesian networks are overly sensitive to noisy data, we introduced an independent component analysis (ICA) filtering approach that attempted to reduce the noise found in fMRI data by unmixing the raw datasets into a set of independent spatial component maps. Components that represented noise were removed and the remaining components reconstructed into the dimensions of the original fMRI datasets. We applied the dDBN algorithm to a group of 35 patients with schizophrenia and 35 matched healthy controls using an ICA filtered and unfiltered approach. We determined that filtering the data significantly improved the magnitude of the ACL score. Patients showed the greatest ACL scores in several regions, most markedly the cerebellar vermis and hemispheres. Our findings suggest that schizophrenia patients exhibit weaker connectivity than healthy controls in multiple regions, including bilateral temporal and frontal cortices, plus cerebellum during an auditory paradigm. PMID:18602482

  7. Using Functional or Structural Magnetic Resonance Images and Personal Characteristic Data to Identify ADHD and Autism

    PubMed Central

    Ghiassian, Sina; Greiner, Russell; Jin, Ping; Brown, Matthew R. G.

    2016-01-01

    A clinical tool that can diagnose psychiatric illness using functional or structural magnetic resonance (MR) brain images has the potential to greatly assist physicians and improve treatment efficacy. Working toward the goal of automated diagnosis, we propose an approach for automated classification of ADHD and autism based on histogram of oriented gradients (HOG) features extracted from MR brain images, as well as personal characteristic data features. We describe a learning algorithm that can produce effective classifiers for ADHD and autism when run on two large public datasets. The algorithm is able to distinguish ADHD from control with hold-out accuracy of 69.6% (over baseline 55.0%) using personal characteristics and structural brain scan features when trained on the ADHD-200 dataset (769 participants in training set, 171 in test set). It is able to distinguish autism from control with hold-out accuracy of 65.0% (over baseline 51.6%) using functional images with personal characteristic data when trained on the Autism Brain Imaging Data Exchange (ABIDE) dataset (889 participants in training set, 222 in test set). These results outperform all previously presented methods on both datasets. To our knowledge, this is the first demonstration of a single automated learning process that can produce classifiers for distinguishing patients vs. controls from brain imaging data with above-chance accuracy on large datasets for two different psychiatric illnesses (ADHD and autism). Working toward clinical applications requires robustness against real-world conditions, including the substantial variability that often exists among data collected at different institutions. It is therefore important that our algorithm was successful with the large ADHD-200 and ABIDE datasets, which include data from hundreds of participants collected at multiple institutions. 
While the resulting classifiers are not yet clinically relevant, this work shows that there is a signal in the (f)MRI data that a learning algorithm is able to find. We anticipate this will lead to yet more accurate classifiers, over these and other psychiatric disorders, working toward the goal of a clinical tool for high accuracy differential diagnosis. PMID:28030565

  8. Brain functional BOLD perturbation modelling for forward fMRI and inverse mapping

    PubMed Central

    Robinson, Jennifer; Calhoun, Vince

    2018-01-01

    Purpose: To computationally separate dynamic brain functional BOLD responses from static background in a brain functional activity for forward fMRI signal analysis and inverse mapping. Methods: A brain functional activity is represented in terms of magnetic source by a perturbation model: χ = χ0 + δχ, with δχ for BOLD magnetic perturbations and χ0 for background. A brain fMRI experiment produces a timeseries of complex-valued images (T2* images), from which we extract the BOLD phase signals (denoted by δP) by a complex division. By solving an inverse problem, we reconstruct the BOLD δχ dataset from the δP dataset, and the brain χ distribution from an (unwrapped) T2* phase image. Given a 4D dataset of task BOLD fMRI, we implement brain functional mapping by temporal correlation analysis. Results: Through a high-field (7T) and high-resolution (0.5mm in plane) task fMRI experiment, we demonstrated in detail the BOLD perturbation model for fMRI phase signal separation (P + δP) and reconstructing intrinsic brain magnetic source (χ and δχ). We also provide a low-field (3T) and low-resolution (2mm) task fMRI experiment in support of single-subject fMRI study. Our experiments show that the δχ-depicted functional map reveals bidirectional BOLD χ perturbations during the task performance. Conclusions: The BOLD perturbation model allows us to separate the fMRI phase signal (by complex division) and to perform inverse mapping for pure BOLD δχ reconstruction for intrinsic functional χ mapping. The full brain χ reconstruction (from unwrapped fMRI phase) provides a new brain tissue image that allows one to scrutinize the brain tissue idiosyncrasy for the pure BOLD δχ response through an automatic function/structure co-localization. PMID:29351339
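    The phase extraction by complex division mentioned above can be illustrated for a single voxel. The sketch assumes that the division is taken against a reference (baseline) image; the abstract does not state which image serves as the divisor, so that choice, the function name `phase_perturbation` and the sample values are hypothetical.

```python
import cmath

def phase_perturbation(signal, reference):
    """Phase of signal / reference, in radians.

    Dividing two complex voxel values cancels the shared background
    phase and leaves only the phase difference between them.
    """
    return cmath.phase(signal / reference)

# One voxel (hypothetical values): background phase 0.30 rad, with a
# task-related perturbation of 0.05 rad on top of it.
reference = cmath.rect(2.0, 0.30)   # magnitude 2.0, phase 0.30 rad
signal = cmath.rect(2.2, 0.35)      # magnitude 2.2, phase 0.35 rad
delta_p = phase_perturbation(signal, reference)
```

    The division avoids subtracting two wrapped phase values directly; the result is itself wrapped to (-π, π], which is harmless when the perturbation is small.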

  9. Individual Brain Charting, a high-resolution fMRI dataset for cognitive mapping.

    PubMed

    Pinho, Ana Luísa; Amadon, Alexis; Ruest, Torsten; Fabre, Murielle; Dohmatob, Elvis; Denghien, Isabelle; Ginisty, Chantal; Becuwe-Desmidt, Séverine; Roger, Séverine; Laurier, Laurence; Joly-Testault, Véronique; Médiouni-Cloarec, Gaëlle; Doublé, Christine; Martins, Bernadette; Pinel, Philippe; Eger, Evelyn; Varoquaux, Gaël; Pallier, Christophe; Dehaene, Stanislas; Hertz-Pannier, Lucie; Thirion, Bertrand

    2018-06-12

    Functional Magnetic Resonance Imaging (fMRI) has furthered brain mapping on perceptual, motor, as well as higher-level cognitive functions. However, to date, no data collection has systematically addressed the functional mapping of cognitive mechanisms at a fine spatial scale. The Individual Brain Charting (IBC) project stands for a high-resolution multi-task fMRI dataset that intends to provide the objective basis toward a comprehensive functional atlas of the human brain. The data refer to a cohort of 12 participants performing many different tasks. The large amount of task-fMRI data on the same subjects yields a precise mapping of the underlying functions, free from both inter-subject and inter-site variability. The present article gives a detailed description of the first release of the IBC dataset. It comprises a dozen tasks, addressing both low- and high-level cognitive functions. This openly available dataset is thus intended to become a reference for cognitive brain mapping.

  10. Automatic cardiac cycle determination directly from EEG-fMRI data by multi-scale peak detection method.

    PubMed

    Wong, Chung-Ki; Luo, Qingfei; Zotev, Vadim; Phillips, Raquel; Chan, Kam Wai Clifford; Bodurka, Jerzy

    2018-03-31

    In simultaneous EEG-fMRI, identification of the period of cardioballistic artifact (BCG) in EEG is required for the artifact removal. Recording the electrocardiogram (ECG) waveform during fMRI is difficult, often causing inaccurate period detection. Since the waveform of the BCG extracted by independent component analysis (ICA) is relatively invariable compared to the ECG waveform, we propose a multiple-scale peak-detection algorithm to determine the BCG cycle directly from the EEG data. The algorithm first extracts the high contrast BCG component from the EEG data by ICA. The BCG cycle is then estimated by band-pass filtering the component around the fundamental frequency identified from its energy spectral density, and the peak of BCG artifact occurrence is selected from each of the estimated cycles. The algorithm is shown to achieve a high accuracy on a large EEG-fMRI dataset. It is also adaptive to various heart rates without the need to adjust the threshold parameters. The cycle detection remains accurate with the scan duration reduced to half a minute. Additionally, the algorithm gives a figure of merit to evaluate the reliability of the detection accuracy. The algorithm is shown to give a higher detection accuracy than the commonly used cycle detection algorithm fmrib_qrsdetect implemented in EEGLAB. The high cycle detection accuracy that our algorithm achieves without using the ECG waveforms makes it possible to create and automate pipelines for processing large EEG-fMRI datasets, and virtually eliminates the need for ECG recordings for BCG artifact removal. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  11. A phenome-wide examination of neural and cognitive function.

    PubMed

    Poldrack, R A; Congdon, E; Triplett, W; Gorgolewski, K J; Karlsgodt, K H; Mumford, J A; Sabb, F W; Freimer, N B; London, E D; Cannon, T D; Bilder, R M

    2016-12-06

    This data descriptor outlines a shared neuroimaging dataset from the UCLA Consortium for Neuropsychiatric Phenomics, which focused on understanding the dimensional structure of memory and cognitive control (response inhibition) functions in both healthy individuals (130 subjects) and individuals with neuropsychiatric disorders including schizophrenia (50 subjects), bipolar disorder (49 subjects), and attention deficit/hyperactivity disorder (43 subjects). The dataset includes an extensive set of task-based fMRI assessments, resting fMRI, structural MRI, and high angular resolution diffusion MRI. The dataset is shared through the OpenfMRI project, and is formatted according to the Brain Imaging Data Structure (BIDS) standard.

  12. Embedded sparse representation of fMRI data via group-wise dictionary optimization

    NASA Astrophysics Data System (ADS)

    Zhu, Dajiang; Lin, Binbin; Faskowitz, Joshua; Ye, Jieping; Thompson, Paul M.

    2016-03-01

    Sparse learning enables dimension reduction and efficient modeling of high dimensional signals and images, but it may need to be tailored to best suit specific applications and datasets. Here we used sparse learning to efficiently represent functional magnetic resonance imaging (fMRI) data from the human brain. We propose a novel embedded sparse representation (ESR), to identify the most consistent dictionary atoms across different brain datasets via an iterative group-wise dictionary optimization procedure. In this framework, we introduced additional criteria to make the learned dictionary atoms more consistent across different subjects. We successfully identified four common dictionary atoms that follow the external task stimuli with very high accuracy. After projecting the corresponding coefficient vectors back into the 3-D brain volume space, the spatial patterns are also consistent with traditional fMRI analysis results. Our framework reveals common features of brain activation in a population, as a new, efficient fMRI analysis method.

  13. Classification of autistic individuals and controls using cross-task characterization of fMRI activity

    PubMed Central

    Chanel, Guillaume; Pichon, Swann; Conty, Laurence; Berthoz, Sylvie; Chevallier, Coralie; Grèzes, Julie

    2015-01-01

    Multivariate pattern analysis (MVPA) has been applied successfully to task-based and resting-based fMRI recordings to investigate which neural markers distinguish individuals with autistic spectrum disorders (ASD) from controls. While most studies have focused on brain connectivity during resting state episodes and regions of interest approaches (ROI), a wealth of task-based fMRI datasets have been acquired in these populations in the last decade. This calls for techniques that can leverage information not only from a single dataset, but from several existing datasets that might share some common features and biomarkers. We propose a fully data-driven (voxel-based) approach that we apply to two different fMRI experiments with social stimuli (faces and bodies). The method, based on Support Vector Machines (SVMs) and Recursive Feature Elimination (RFE), is first trained for each experiment independently and each output is then combined to obtain a final classification output. Second, this RFE output is used to determine which voxels are most often selected for classification to generate maps of significant discriminative activity. Finally, to further explore the clinical validity of the approach, we correlate phenotypic information with obtained classifier scores. The results reveal good classification accuracy (range between 69% and 92.3%). Moreover, we were able to identify discriminative activity patterns pertaining to the social brain without relying on a priori ROI definitions. Finally, social motivation was the only dimension which correlated with classifier scores, suggesting that it is the main dimension captured by the classifiers. Altogether, we believe that the present RFE method proves to be efficient and may help identify relevant biomarkers by taking advantage of acquired task-based fMRI datasets in psychiatric populations. PMID:26793434

  14. Rapid geodesic mapping of brain functional connectivity: implementation of a dedicated co-processor in a field-programmable gate array (FPGA) and application to resting state functional MRI.

    PubMed

    Minati, Ludovico; Cercignani, Mara; Chan, Dennis

    2013-10-01

    Graph theory-based analyses of brain network topology can be used to model the spatiotemporal correlations in neural activity detected through fMRI, and such approaches have wide-ranging potential, from detection of alterations in preclinical Alzheimer's disease through to command identification in brain-machine interfaces. However, due to prohibitive computational costs, graph-based analyses to date have principally focused on measuring connection density rather than mapping the topological architecture in full by exhaustive shortest-path determination. This paper outlines a solution to this problem through parallel implementation of Dijkstra's algorithm in programmable logic. The processor design is optimized for large, sparse graphs and provided in full as synthesizable VHDL code. An acceleration factor between 15 and 18 is obtained on a representative resting-state fMRI dataset, and maps of Euclidean path length reveal the anticipated heterogeneous cortical involvement in long-range integrative processing. These results enable high-resolution geodesic connectivity mapping for resting-state fMRI in patient populations and real-time geodesic mapping to support identification of imagined actions for fMRI-based brain-machine interfaces. Copyright © 2013 IPEM. Published by Elsevier Ltd. All rights reserved.
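    For reference, the single-source shortest-path computation that the paper parallelizes in hardware can be sketched in software as follows. This is a plain sequential Dijkstra on a small weighted graph, not the authors' VHDL design; the adjacency-dictionary layout, node labels and edge weights are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distance from source to every reachable node.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

# Small undirected graph (hypothetical): nodes stand for brain regions,
# weights for connection lengths.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
distances = dijkstra(graph, "A")
```

    Exhaustive mapping repeats this search from every node, which is why the cost becomes prohibitive for large graphs and motivates the parallel co-processor described in the abstract.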

  15. Improved FastICA algorithm in fMRI data analysis using the sparsity property of the sources.

    PubMed

    Ge, Ruiyang; Wang, Yubao; Zhang, Jipeng; Yao, Li; Zhang, Hang; Long, Zhiying

    2016-04-01

    As a blind source separation technique, independent component analysis (ICA) has many applications in functional magnetic resonance imaging (fMRI). Although either temporal or spatial prior information has been introduced into the constrained ICA and semi-blind ICA methods to improve the performance of ICA in fMRI data analysis, certain types of additional prior information, such as sparsity, have seldom been added to the ICA algorithms as constraints. In this study, we proposed a SparseFastICA method by adding the source sparsity as a constraint to the FastICA algorithm to improve the performance of the widely used FastICA. The source sparsity is estimated through a smoothed ℓ0 norm method. We performed experimental tests on both simulated data and real fMRI data to investigate the feasibility and robustness of SparseFastICA and made a performance comparison between SparseFastICA, FastICA and Infomax ICA. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of SparseFastICA for the source separation in fMRI data. Both the simulated and real fMRI experimental results showed that SparseFastICA has better robustness to noise and better spatial detection power than FastICA. Although the spatial detection power of SparseFastICA and Infomax did not show a significant difference, SparseFastICA had faster computation speed than Infomax. More importantly, SparseFastICA outperformed FastICA in robustness and spatial detection power and can be used to identify more accurate brain networks than the FastICA algorithm. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Automatic EEG-assisted retrospective motion correction for fMRI (aE-REMCOR).

    PubMed

    Wong, Chung-Ki; Zotev, Vadim; Misaki, Masaya; Phillips, Raquel; Luo, Qingfei; Bodurka, Jerzy

    2016-04-01

    Head motions during functional magnetic resonance imaging (fMRI) impair fMRI data quality and introduce systematic artifacts that can affect interpretation of fMRI results. Electroencephalography (EEG) recordings performed simultaneously with fMRI provide high-temporal-resolution information about ongoing brain activity as well as head movements. Recently, an EEG-assisted retrospective motion correction (E-REMCOR) method was introduced. E-REMCOR utilizes EEG motion artifacts to correct the effects of head movements in simultaneously acquired fMRI data on a slice-by-slice basis. While E-REMCOR is an efficient motion correction approach, it involves an independent component analysis (ICA) of the EEG data and identification of motion-related ICs. Here we report an automated implementation of E-REMCOR, referred to as aE-REMCOR, which we developed to facilitate the application of E-REMCOR in large-scale EEG-fMRI studies. The aE-REMCOR algorithm, implemented in MATLAB, enables an automated preprocessing of the EEG data, an ICA decomposition, and, importantly, an automatic identification of motion-related ICs. aE-REMCOR has been used to perform retrospective motion correction for 305 fMRI datasets from 16 subjects, who participated in EEG-fMRI experiments conducted on a 3T MRI scanner. Performance of aE-REMCOR has been evaluated based on improvement in temporal signal-to-noise ratio (TSNR) of the fMRI data, as well as correction efficiency defined in terms of spike reduction in fMRI motion parameters. The results show that aE-REMCOR is capable of substantially reducing head motion artifacts in fMRI data. In particular, when there are significant rapid head movements during the scan, a large TSNR improvement and high correction efficiency can be achieved. Depending on a subject's motion, an average TSNR improvement over the brain upon the application of aE-REMCOR can be as high as 27%, with top ten percent of the TSNR improvement values exceeding 55%. 
    The average correction efficiency over the 305 fMRI scans is 18% and the largest achieved efficiency is 71%. The utility of aE-REMCOR on the resting state fMRI connectivity of the default mode network (DMN) is also examined. The motion-induced position-dependent error in the DMN connectivity analysis is shown to be reduced when aE-REMCOR is utilized. These results demonstrate that aE-REMCOR can be conveniently and efficiently used to improve fMRI motion correction in large clinical EEG-fMRI studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
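
    As a rough illustration of the TSNR metric used in the evaluation above, the Python sketch below assumes the common definition of temporal SNR: the voxel-wise mean over time divided by the standard deviation over time. The abstract does not spell out the formula, so the use of the population standard deviation, the function names, and the toy time series are all assumptions and are not part of aE-REMCOR.

```python
from statistics import mean, pstdev


def tsnr(series):
    """Temporal SNR of one voxel: mean over time divided by std over time."""
    sd = pstdev(series)  # population std is an assumption; some tools use the sample std
    if sd == 0:
        raise ValueError("constant time series has undefined TSNR")
    return mean(series) / sd


def tsnr_improvement_percent(before, after):
    """Percent change in TSNR from the uncorrected to the corrected series."""
    t0 = tsnr(before)
    return 100.0 * (tsnr(after) - t0) / t0


# Hypothetical voxel time series (arbitrary units); the spike mimics a rapid head movement.
uncorrected = [100.0, 101.0, 99.0, 120.0, 100.0, 98.0, 102.0, 100.0]
corrected = [100.0, 101.0, 99.0, 103.0, 100.0, 98.0, 102.0, 100.0]

gain = tsnr_improvement_percent(uncorrected, corrected)
print(f"TSNR before: {tsnr(uncorrected):.1f}, after: {tsnr(corrected):.1f}, gain: {gain:.0f}%")
```

    Averaging such per-voxel gains over a brain mask would give a whole-brain figure of the kind quoted in the abstract, although the exact averaging used by the authors is not stated there.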

  17. How does spatial extent of fMRI datasets affect independent component analysis decomposition?

    PubMed

    Aragri, Adriana; Scarabino, Tommaso; Seifritz, Erich; Comani, Silvia; Cirillo, Sossio; Tedeschi, Gioacchino; Esposito, Fabrizio; Di Salle, Francesco

    2006-09-01

    Spatial independent component analysis (sICA) of functional magnetic resonance imaging (fMRI) time series can generate meaningful activation maps and associated descriptive signals, which are useful to evaluate datasets of the entire brain or selected portions of it. Besides computational implications, variations in the input dataset combined with the multivariate nature of ICA may lead to different spatial or temporal readouts of brain activation phenomena. By reducing and increasing a volume of interest (VOI), we applied sICA to different datasets from real activation experiments with multislice acquisition and single or multiple sensory-motor task-induced blood oxygenation level-dependent (BOLD) signal sources with different spatial and temporal structure. Using receiver operating characteristics (ROC) methodology for accuracy evaluation and multiple regression analysis as benchmark, we compared sICA decompositions of reduced and increased VOI fMRI time-series containing auditory, motor and hemifield visual activation occurring separately or simultaneously in time. Both approaches yielded valid results; however, the results of the increased VOI approach were spatially more accurate compared to the results of the decreased VOI approach. This is consistent with the capability of sICA to take advantage of extended samples of statistical observations and suggests that sICA is more powerful with extended rather than reduced VOI datasets to delineate brain activity. (c) 2006 Wiley-Liss, Inc.
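
    The ROC-based accuracy evaluation mentioned above can be sketched generically as follows. This Python example assumes that each voxel has a component score (for instance an sICA z-value) and a binary "active" label taken from a benchmark map such as a thresholded regression result. The function names and toy values are hypothetical, and the sketch is not the authors' evaluation code.

```python
def roc_curve(scores, truth):
    """Return (FPR, TPR) points obtained by thresholding at each distinct score."""
    pos = sum(truth)
    neg = len(truth) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both active and inactive voxels")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, truth) if s >= thr and t)
        fp = sum(1 for s, t in zip(scores, truth) if s >= thr and not t)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical component scores and benchmark labels (1 = active voxel).
scores = [4.0, 3.0, 2.0, 1.0]
truth = [1, 0, 1, 0]
area = auc(roc_curve(scores, truth))
print(f"AUC: {area:.2f}")
```

    Comparing the area obtained for a reduced and an extended volume of interest against the same benchmark would mirror the comparison described in the abstract.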

  18. Towards Tunable Consensus Clustering for Studying Functional Brain Connectivity During Affective Processing.

    PubMed

    Liu, Chao; Abu-Jamous, Basel; Brattico, Elvira; Nandi, Asoke K

    2017-03-01

    In the past decades, neuroimaging of humans has gained a position of status within neuroscience, and data-driven approaches and functional connectivity analyses of functional magnetic resonance imaging (fMRI) data are increasingly favored to depict the complex architecture of human brains. However, the reliability of these findings is jeopardized by too many analysis methods and sometimes too few samples used, which leads to discord among researchers. We propose a tunable consensus clustering paradigm that aims at overcoming the clustering methods selection problem as well as reliability issues in neuroimaging by means of first applying several analysis methods (three in this study) on multiple datasets and then integrating the clustering results. To validate the method, we applied it to a complex fMRI experiment involving affective processing of hundreds of music clips. We found that brain structures related to visual, reward, and auditory processing have intrinsic spatial patterns of coherent neuroactivity during affective processing. The comparisons between the results obtained from our method and those from each individual clustering algorithm demonstrate that our paradigm has notable advantages over traditional single clustering algorithms in being able to evidence robust connectivity patterns even with complex neuroimaging data involving a variety of stimuli and affective evaluations of them. The consensus clustering method is implemented in the R package "UNCLES" available on http://cran.r-project.org/web/packages/UNCLES/index.html .
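
    The idea of integrating results from several clustering methods can be illustrated with a generic co-association scheme. The Python sketch below is not the UNCLES algorithm; it only counts how often two items share a cluster across methods and keeps the pairs that reach a tunable agreement threshold. The function names, threshold values, and label vectors are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n_items = len(partitions[0])
    if any(len(p) != n_items for p in partitions):
        raise ValueError("all partitions must label the same items")
    agree = {}
    for i, j in combinations(range(n_items), 2):
        together = sum(1 for p in partitions if p[i] == p[j])
        agree[(i, j)] = together / len(partitions)
    return agree


def consensus_pairs(partitions, threshold):
    """Pairs co-clustered in at least `threshold` of the partitions."""
    return sorted(pair for pair, frac in co_association(partitions).items()
                  if frac >= threshold)


# Hypothetical cluster labels for five voxels from three clustering methods.
# Label values are arbitrary per method; only co-membership matters.
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
strict = consensus_pairs(partitions, threshold=1.0)    # unanimous agreement
loose = consensus_pairs(partitions, threshold=2 / 3)   # two of three methods agree
print(strict, loose)
```

    Raising the threshold trades coverage for robustness, which is the general sense in which a consensus can be made tunable.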

  19. Classification of fMRI resting-state maps using machine learning techniques: A comparative study

    NASA Astrophysics Data System (ADS)

    Gallos, Ioannis; Siettos, Constantinos

    2017-11-01

    We compare the efficiency of Principal Component Analysis (PCA) and nonlinear manifold learning algorithms (ISOMAP and Diffusion Maps) for classifying brain maps between groups of schizophrenia patients and healthy controls from fMRI scans acquired during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial independent component analysis (ICA) to reduce (a) noise and (b) the spatial-temporal dimensionality of the fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support vector machines (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.

  20. Accounting for Non-Gaussian Sources of Spatial Correlation in Parametric Functional Magnetic Resonance Imaging Paradigms I: Revisiting Cluster-Based Inferences.

    PubMed

    Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Sathian, K

    2018-02-01

    In a recent study, Eklund et al. employed resting-state functional magnetic resonance imaging data as a surrogate for null functional magnetic resonance imaging (fMRI) datasets and posited that cluster-wise family-wise error (FWE) rate-corrected inferences made by using parametric statistical methods in fMRI studies over the past two decades may have been invalid, particularly for cluster defining thresholds less stringent than p < 0.001; this was principally because the spatial autocorrelation functions (sACF) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggested otherwise. Here, we show that accounting for non-Gaussian signal components such as those arising from resting-state neural activity as well as physiological responses and motion artifacts in the null fMRI datasets yields first- and second-level general linear model analysis residuals with nearly uniform and Gaussian sACF. Further comparison with nonparametric permutation tests indicates that cluster-based FWE corrected inferences made with Gaussian spatial noise approximations are valid.

  1. Identifying Autism from Resting-State fMRI Using Long Short-Term Memory Networks.

    PubMed

    Dvornek, Nicha C; Ventola, Pamela; Pelphrey, Kevin A; Duncan, James S

    2017-09-01

    Functional magnetic resonance imaging (fMRI) has helped characterize the pathophysiology of autism spectrum disorders (ASD) and carries promise for producing objective biomarkers for ASD. Recent work has focused on deriving ASD biomarkers from resting-state functional connectivity measures. However, current efforts that have identified ASD with high accuracy were limited to homogeneous, small datasets, while classification results for heterogeneous, multi-site data have shown much lower accuracy. In this paper, we propose the use of recurrent neural networks with long short-term memory (LSTMs) for classification of individuals with ASD and typical controls directly from the resting-state fMRI time-series. We used the entire large, multi-site Autism Brain Imaging Data Exchange (ABIDE) I dataset for training and testing the LSTM models. Under a cross-validation framework, we achieved classification accuracy of 68.5%, which is 9% higher than previously reported methods that used fMRI data from the whole ABIDE cohort. Finally, we presented interpretation of the trained LSTM weights, which highlight potential functional networks and regions that are known to be implicated in ASD.

  2. Identifying Autism from Resting-State fMRI Using Long Short-Term Memory Networks

    PubMed Central

    Dvornek, Nicha C.; Ventola, Pamela; Pelphrey, Kevin A.; Duncan, James S.

    2017-01-01

    Functional magnetic resonance imaging (fMRI) has helped characterize the pathophysiology of autism spectrum disorders (ASD) and carries promise for producing objective biomarkers for ASD. Recent work has focused on deriving ASD biomarkers from resting-state functional connectivity measures. However, current efforts that have identified ASD with high accuracy were limited to homogeneous, small datasets, while classification results for heterogeneous, multi-site data have shown much lower accuracy. In this paper, we propose the use of recurrent neural networks with long short-term memory (LSTMs) for classification of individuals with ASD and typical controls directly from the resting-state fMRI time-series. We used the entire large, multi-site Autism Brain Imaging Data Exchange (ABIDE) I dataset for training and testing the LSTM models. Under a cross-validation framework, we achieved classification accuracy of 68.5%, which is 9% higher than previously reported methods that used fMRI data from the whole ABIDE cohort. Finally, we presented interpretation of the trained LSTM weights, which highlight potential functional networks and regions that are known to be implicated in ASD. PMID:29104967

  3. A wavelet method for modeling and despiking motion artifacts from resting-state fMRI time series.

    PubMed

    Patel, Ameera X; Kundu, Prantik; Rubinov, Mikail; Jones, P Simon; Vértes, Petra E; Ersche, Karen D; Suckling, John; Bullmore, Edward T

    2014-07-15

    The impact of in-scanner head movement on functional magnetic resonance imaging (fMRI) signals has long been established as undesirable. These effects have been traditionally corrected by methods such as linear regression of head movement parameters. However, a number of recent independent studies have demonstrated that these techniques are insufficient to remove motion confounds, and that even small movements can spuriously bias estimates of functional connectivity. Here we propose a new data-driven, spatially-adaptive, wavelet-based method for identifying, modeling, and removing non-stationary events in fMRI time series, caused by head movement, without the need for data scrubbing. This method involves the addition of just one extra step, the Wavelet Despike, in standard pre-processing pipelines. With this method, we demonstrate robust removal of a range of different motion artifacts and motion-related biases including distance-dependent connectivity artifacts, at a group and single-subject level, using a range of previously published and new diagnostic measures. The Wavelet Despike is able to accommodate the substantial spatial and temporal heterogeneity of motion artifacts and can consequently remove a range of high and low frequency artifacts from fMRI time series, that may be linearly or non-linearly related to physical movements. Our methods are demonstrated by the analysis of three cohorts of resting-state fMRI data, including two high-motion datasets: a previously published dataset on children (N=22) and a new dataset on adults with stimulant drug dependence (N=40). We conclude that there is a real risk of motion-related bias in connectivity analysis of fMRI data, but that this risk is generally manageable, by effective time series denoising strategies designed to attenuate synchronized signal transients induced by abrupt head movements. The Wavelet Despiking software described in this article is freely available for download at www.brainwavelet.org. 
    Copyright © 2014. Published by Elsevier Inc.

  4. A wavelet method for modeling and despiking motion artifacts from resting-state fMRI time series

    PubMed Central

    Patel, Ameera X.; Kundu, Prantik; Rubinov, Mikail; Jones, P. Simon; Vértes, Petra E.; Ersche, Karen D.; Suckling, John; Bullmore, Edward T.

    2014-01-01

    The impact of in-scanner head movement on functional magnetic resonance imaging (fMRI) signals has long been established as undesirable. These effects have been traditionally corrected by methods such as linear regression of head movement parameters. However, a number of recent independent studies have demonstrated that these techniques are insufficient to remove motion confounds, and that even small movements can spuriously bias estimates of functional connectivity. Here we propose a new data-driven, spatially-adaptive, wavelet-based method for identifying, modeling, and removing non-stationary events in fMRI time series, caused by head movement, without the need for data scrubbing. This method involves the addition of just one extra step, the Wavelet Despike, in standard pre-processing pipelines. With this method, we demonstrate robust removal of a range of different motion artifacts and motion-related biases including distance-dependent connectivity artifacts, at a group and single-subject level, using a range of previously published and new diagnostic measures. The Wavelet Despike is able to accommodate the substantial spatial and temporal heterogeneity of motion artifacts and can consequently remove a range of high and low frequency artifacts from fMRI time series, that may be linearly or non-linearly related to physical movements. Our methods are demonstrated by the analysis of three cohorts of resting-state fMRI data, including two high-motion datasets: a previously published dataset on children (N = 22) and a new dataset on adults with stimulant drug dependence (N = 40). We conclude that there is a real risk of motion-related bias in connectivity analysis of fMRI data, but that this risk is generally manageable, by effective time series denoising strategies designed to attenuate synchronized signal transients induced by abrupt head movements. 
    The Wavelet Despiking software described in this article is freely available for download at www.brainwavelet.org. PMID:24657353

  5. Machine learning algorithm accurately detects fMRI signature of vulnerability to major depression.

    PubMed

    Sato, João R; Moll, Jorge; Green, Sophie; Deakin, John F W; Thomaz, Carlos E; Zahn, Roland

    2015-08-30

    Standard functional magnetic resonance imaging (fMRI) analyses cannot assess the potential of a neuroimaging signature as a biomarker to predict individual vulnerability to major depression (MD). Here, we use machine learning for the first time to address this question. Using a recently identified neural signature of guilt-selective functional disconnection, the classification algorithm was able to distinguish remitted MD from control participants with 78.3% accuracy. This demonstrates the high potential of our fMRI signature as a biomarker of MD vulnerability. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.

  6. The Function Biomedical Informatics Research Network Data Repository

    PubMed Central

    Keator, David B.; van Erp, Theo G.M.; Turner, Jessica A.; Glover, Gary H.; Mueller, Bryon A.; Liu, Thomas T.; Voyvodic, James T.; Rasmussen, Jerod; Calhoun, Vince D.; Lee, Hyo Jong; Toga, Arthur W.; McEwen, Sarah; Ford, Judith M.; Mathalon, Daniel H.; Diaz, Michele; O’Leary, Daniel S.; Bockholt, H. Jeremy; Gadde, Syam; Preda, Adrian; Wible, Cynthia G.; Stern, Hal S.; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G.

    2015-01-01

    The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical datasets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 dataset consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 Tesla scanners. The FBIRN Phase 2 and Phase 3 datasets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN’s multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. PMID:26364863

  7. Compressed Sensing for fMRI: Feasibility Study on the Acceleration of Non-EPI fMRI at 9.4T

    PubMed Central

    Kim, Seong-Gi; Ye, Jong Chul

    2015-01-01

    Conventional functional magnetic resonance imaging (fMRI) technique known as gradient-recalled echo (GRE) echo-planar imaging (EPI) is sensitive to image distortion and degradation caused by local magnetic field inhomogeneity at high magnetic fields. Non-EPI sequences such as spoiled gradient echo and balanced steady-state free precession (bSSFP) have been proposed as an alternative high-resolution fMRI technique; however, the temporal resolution of these sequences is lower than the typically used GRE-EPI fMRI. One potential approach to improve the temporal resolution is to use compressed sensing (CS). In this study, we tested the feasibility of k-t FOCUSS—one of the high performance CS algorithms for dynamic MRI—for non-EPI fMRI at 9.4T using the model of rat somatosensory stimulation. To optimize the performance of CS reconstruction, different sampling patterns and k-t FOCUSS variations were investigated. Experimental results show that an optimized k-t FOCUSS algorithm with acceleration by a factor of 4 works well for non-EPI fMRI at high field under various statistical criteria, which confirms that a combination of CS and a non-EPI sequence may be a good solution for high-resolution fMRI at high fields. PMID:26413503

  8. Sex differences in language asymmetry are age-dependent and small: a large-scale, consonant-vowel dichotic listening study with behavioral and fMRI data.

    PubMed

    Hirnstein, Marco; Westerhausen, René; Korsnes, Maria S; Hugdahl, Kenneth

    2013-01-01

    Men are often believed to have a functionally more asymmetrical brain organization than women, but the empirical evidence for sex differences in lateralization is unclear to date. Over the years we have collected data from a vast number of participants using the same consonant-vowel dichotic listening task, a reliable marker for language lateralization. One dataset comprised behavioral data from 1782 participants (885 females, 125 non-right-handers), who were divided in four age groups (children <10 yrs, adolescents = 10-15 yrs, younger adults = 16-49 yrs, and older adults >50 yrs). In addition, we had behavioral and functional imaging (fMRI) data from another 104 younger adults (49 females, aged 18-45 yrs), who completed the same dichotic listening task in a 3T scanner. This database allowed us to comprehensively test whether there is a sex difference in functional language lateralization. Across all participants and in both datasets a right ear advantage (REA) emerged, reflecting left-hemispheric language lateralization. Accordingly, the fMRI data revealed a leftward asymmetry in superior temporal lobe language processing areas. In the N = 1782 dataset no main effect of sex but a significant sex by age interaction emerged: the REA increased with age in both sexes but as a result of an earlier onset in females the REA was stronger in female than male adolescents. In turn, male younger adults showed greater asymmetry than female younger adults (accounting for <1% of variance). There were no sex differences in children and older adults. The males in the fMRI dataset (N = 104) also had a greater REA than females (accounting for 4% of variance), but no sex difference emerged in the neuroimaging data. Handedness did not affect these findings. 
    Taken together, our findings suggest that sex differences in language lateralization as assessed with dichotic listening exist, but they are (a) not necessarily reflected in fMRI data, (b) age-dependent and (c) relatively small. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. A general probabilistic model for group independent component analysis and its estimation methods

    PubMed Central

    Guo, Ying

    2012-01-01

    Independent component analysis (ICA) has become an important tool for analyzing data from functional magnetic resonance imaging (fMRI) studies. ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging due to the unavailability of a pre-specified group design matrix and the uncertainty in between-subjects variability in fMRI data. We present a general probabilistic ICA (PICA) model that can accommodate varying group structures of multi-subject spatio-temporal processes. An advantage of the proposed model is that it can flexibly model various types of group structures in different underlying neural source signals and under different experimental conditions in fMRI studies. A maximum likelihood method is used for estimating this general group ICA model. We propose two EM algorithms to obtain the ML estimates. The first method is an exact EM algorithm which provides an exact E-step and an explicit noniterative M-step. The second method is a variational approximation EM algorithm which is computationally more efficient than the exact EM. In simulation studies, we first compare the performance of the proposed general group PICA model and the existing probabilistic group ICA approach. We then compare the two proposed EM algorithms and show the variational approximation EM achieves comparable accuracy to the exact EM with significantly less computation time. An fMRI data example is used to illustrate application of the proposed methods. PMID:21517789

  10. An Automated, Adaptive Framework for Optimizing Preprocessing Pipelines in Task-Based Functional MRI

    PubMed Central

    Churchill, Nathan W.; Spring, Robyn; Afshin-Pour, Babak; Dong, Fan; Strother, Stephen C.

    2015-01-01

    BOLD fMRI is sensitive to blood-oxygenation changes correlated with brain function; however, it is limited by relatively weak signal and significant noise confounds. Many preprocessing algorithms have been developed to control noise and improve signal detection in fMRI. Although the chosen set of preprocessing and analysis steps (the “pipeline”) significantly affects signal detection, pipelines are rarely quantitatively validated in the neuroimaging literature, due to complex preprocessing interactions. This paper outlines and validates an adaptive resampling framework for evaluating and optimizing preprocessing choices by optimizing data-driven metrics of task prediction and spatial reproducibility. Compared to standard “fixed” preprocessing pipelines, this optimization approach significantly improves independent validation measures of within-subject test-retest, and between-subject activation overlap, and behavioural prediction accuracy. We demonstrate that preprocessing choices function as implicit model regularizers, and that improvements due to pipeline optimization generalize across a range of simple to complex experimental tasks and analysis models. Results are shown for brief scanning sessions (<3 minutes each), demonstrating that with pipeline optimization, it is possible to obtain reliable results and brain-behaviour correlations in relatively small datasets. PMID:26161667

  11. Prediction of activation patterns preceding hallucinations in patients with schizophrenia using machine learning with structured sparsity.

    PubMed

    de Pierrefeu, Amicie; Fovet, Thomas; Hadj-Selem, Fouad; Löfstedt, Tommy; Ciuciu, Philippe; Lefebvre, Stephanie; Thomas, Pierre; Lopes, Renaud; Jardri, Renaud; Duchesnay, Edouard

    2018-04-01

    Despite significant progress in the field, the detection of fMRI signal changes during hallucinatory events remains difficult and time-consuming. This article first proposes a machine-learning algorithm to automatically identify resting-state fMRI periods that precede hallucinations versus periods that do not. When applied to whole-brain fMRI data, state-of-the-art classification methods, such as support vector machines (SVM), yield dense solutions that are difficult to interpret. We proposed to extend the existing sparse classification methods by taking the spatial structure of brain images into account with structured sparsity using the total variation penalty. Based on this approach, we obtained reliable classifying performances associated with interpretable predictive patterns, composed of two clearly identifiable clusters in speech-related brain regions. The variation in transition-to-hallucination functional patterns not only from one patient to another but also from one occurrence to the next (e.g., also depending on the sensory modalities involved) appeared to be the major difficulty when developing effective classifiers. Consequently, second, this article aimed to characterize the variability within the prehallucination patterns using an extension of principal component analysis with spatial constraints. The principal components (PCs) and the associated basis patterns shed light on the intrinsic structures of the variability present in the dataset. Such results are promising in the scope of innovative fMRI-guided therapy for drug-resistant hallucinations, such as fMRI-based neurofeedback. © 2018 Wiley Periodicals, Inc.

  12. TWave: High-Order Analysis of Functional MRI

    PubMed Central

    Barnathan, Michael; Megalooikonomou, Vasileios; Faloutsos, Christos; Faro, Scott; Mohamed, Feroze B.

    2011-01-01

    The traditional approach to functional image analysis models images as matrices of raw voxel intensity values. Although such a representation is widely utilized and heavily entrenched both within neuroimaging and in the wider data mining community, the strong interactions among space, time, and categorical modes such as subject and experimental task inherent in functional imaging yield a dataset with “high-order” structure, which matrix models are incapable of exploiting. Reasoning across all of these modes of data concurrently requires a high-order model capable of representing relationships between all modes of the data in tandem. We thus propose to model functional MRI data using tensors, which are high-order generalizations of matrices equivalent to multidimensional arrays or data cubes. However, several unique challenges exist in the high-order analysis of functional medical data: naïve tensor models are incapable of exploiting spatiotemporal locality patterns, standard tensor analysis techniques exhibit poor efficiency, and mixtures of numeric and categorical modes of data are very often present in neuroimaging experiments. Formulating the problem of image clustering as a form of Latent Semantic Analysis and using the WaveCluster algorithm as a baseline, we propose a comprehensive hybrid tensor and wavelet framework for clustering, concept discovery, and compression of functional medical images which successfully addresses these challenges. Our approach reduced runtime and dataset size on a 9.3 GB finger opposition motor task fMRI dataset by up to 98% while exhibiting improved spatiotemporal coherence relative to standard tensor, wavelet, and voxel-based approaches. 
    Our clustering technique was capable of automatically differentiating between the frontal areas of the brain responsible for task-related habituation and the motor regions responsible for executing the motor task, in contrast to a widely used fMRI analysis program, SPM, which only detected the latter region. Furthermore, our approach discovered latent concepts suggestive of subject handedness nearly 100x faster than standard approaches. These results suggest that a high-order model is an integral component to accurate scalable functional neuroimaging. PMID:21729758

  13. Estimating neural response functions from fMRI

    PubMed Central

    Kumar, Sukhbinder; Penny, William

    2014-01-01

    This paper proposes a methodology for estimating Neural Response Functions (NRFs) from fMRI data. These NRFs describe non-linear relationships between experimental stimuli and neuronal population responses. The method is based on a two-stage model comprising an NRF and a Hemodynamic Response Function (HRF) that are simultaneously fitted to fMRI data using a Bayesian optimization algorithm. This algorithm also produces a model evidence score, providing a formal model comparison method for evaluating alternative NRFs. The HRF is characterized using previously established “Balloon” and BOLD signal models. We illustrate the method with two example applications based on fMRI studies of the auditory system. In the first, we estimate the time constants of repetition suppression and facilitation, and in the second we estimate the parameters of population receptive fields in a tonotopic mapping study. PMID:24847246

  14. An empirical comparison of SPM preprocessing parameters to the analysis of fMRI data.

    PubMed

    Della-Maggiore, Valeria; Chau, Wilkin; Peres-Neto, Pedro R; McIntosh, Anthony R

    2002-09-01

    We present the results from two sets of Monte Carlo simulations aimed at evaluating the robustness of some preprocessing parameters of SPM99 for the analysis of functional magnetic resonance imaging (fMRI). Statistical robustness was estimated by implementing parametric and nonparametric simulation approaches based on the images obtained from an event-related fMRI experiment. Simulated datasets were tested for combinations of the following parameters: basis function, global scaling, low-pass filter, high-pass filter and autoregressive modeling of serial autocorrelation. Based on single-subject SPM analysis, we derived the following conclusions that may serve as a guide for initial analysis of fMRI data using SPM99: (1) The canonical hemodynamic response function is a more reliable basis function to model the fMRI time series than HRF with time derivative. (2) Global scaling should be avoided since it may significantly decrease the power depending on the experimental design. (3) The use of a high-pass filter may be beneficial for event-related designs with fixed interstimulus intervals. (4) When dealing with fMRI time series with short interstimulus intervals (<8 s), the use of first-order autoregressive model is recommended over a low-pass filter (HRF) because it reduces the risk of inferential bias while providing a relatively good power. For datasets with interstimulus intervals longer than 8 seconds, temporal smoothing is not recommended since it decreases power. While the generalizability of our results may be limited, the methods we employed can be easily implemented by other scientists to determine the best parameter combination to analyze their data.
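
    The first-order autoregressive modeling of serial autocorrelation recommended above can be sketched in a few lines. This Python example estimates a lag-1 coefficient from a residual series and applies the corresponding whitening filter. It is a generic textbook illustration and not SPM99's estimation procedure; the function names, the simulated coefficient of 0.5, and the series length are assumptions.

```python
import random


def ar1_coefficient(series):
    """Lag-1 autocorrelation estimate of a series after removing its mean."""
    n = len(series)
    m = sum(series) / n
    centered = [v - m for v in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("constant series has undefined autocorrelation")
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    return num / denom


def prewhiten(series, rho):
    """Apply the AR(1) whitening filter y[t] - rho * y[t-1]; the first sample is dropped."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Simulate noise with a known AR(1) coefficient (hypothetical value of 0.5).
rng = random.Random(0)
true_rho = 0.5
noise = [rng.gauss(0.0, 1.0)]
for _ in range(1999):
    noise.append(true_rho * noise[-1] + rng.gauss(0.0, 1.0))

rho_hat = ar1_coefficient(noise)
whitened = prewhiten(noise, rho_hat)
print(f"estimated rho: {rho_hat:.2f}, "
      f"lag-1 correlation after whitening: {ar1_coefficient(whitened):.2f}")
```

    In a regression setting the same filter would be applied to both the data and the design matrix before re-estimating the model.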

  15. Canonical Correlation Analysis for Feature-Based Fusion of Biomedical Imaging Modalities and Its Application to Detection of Associative Networks in Schizophrenia.

    PubMed

    Correa, Nicolle M; Li, Yi-Ou; Adalı, Tülay; Calhoun, Vince D

    2008-12-01

    Typically data acquired through imaging techniques such as functional magnetic resonance imaging (fMRI), structural MRI (sMRI), and electroencephalography (EEG) are analyzed separately. However, fusing information from such complementary modalities promises to provide additional insight into connectivity across brain networks and changes due to disease. We propose a data fusion scheme at the feature level using canonical correlation analysis (CCA) to determine inter-subject covariations across modalities. As we show both with simulation results and application to real data, multimodal CCA (mCCA) proves to be a flexible and powerful method for discovering associations among various data types. We demonstrate the versatility of the method with application to two datasets, an fMRI and EEG, and an fMRI and sMRI dataset, both collected from patients diagnosed with schizophrenia and healthy controls. CCA results for fMRI and EEG data collected for an auditory oddball task reveal associations of the temporal and motor areas with the N2 and P3 peaks. For the application to fMRI and sMRI data collected for an auditory sensorimotor task, CCA results show an interesting joint relationship between fMRI and gray matter, with patients with schizophrenia showing more functional activity in motor areas and less activity in temporal areas associated with less gray matter as compared to healthy controls. Additionally, we compare our scheme with an independent component analysis based fusion method, joint-ICA that has proven useful for such a study and note that the two methods provide complementary perspectives on data fusion.

  16. A METHOD FOR USING BLOCKED AND EVENT-RELATED FMRI DATA TO STUDY “RESTING STATE” FUNCTIONAL CONNECTIVITY

    PubMed Central

    Fair, Damien A.; Schlaggar, Bradley L.; Cohen, Alexander L.; Miezin, Francis M.; Dosenbach, Nico U.F.; Wenger, Kristin K.; Fox, Michael D.; Snyder, Abraham Z.; Raichle, Marcus E.; Petersen, Steven E.

    2007-01-01

    Resting state functional connectivity MRI (fcMRI) has become a particularly useful tool for studying regional relationships in typical and atypical populations. Because many investigators have already obtained large datasets of task related fMRI, the ability to use this existing task data for resting state fcMRI is of considerable interest. Two classes of datasets could potentially be modified to emulate resting state data. These datasets include: 1) “interleaved” resting blocks from blocked or mixed blocked/event-related sets, and 2) residual timecourses from event-related sets that lack rest blocks. Using correlation analysis, we compared the functional connectivity of resting epochs taken from a mixed blocked/event-related design fMRI data set and the residuals derived from event-related data with standard continuous resting state data to determine which class of data can best emulate resting state data. We show that despite some differences, the functional connectivity for the interleaved resting periods taken from blocked designs is both qualitatively and quantitatively very similar to that of “continuous” resting state data. In contrast, despite being qualitatively similar to “continuous” resting state data, residuals derived from event-related design data had several distinct quantitative differences. These results suggest that the interleaved resting state data such as those taken from blocked or mixed blocked/event-related fMRI designs are well-suited for resting state functional connectivity analyses. Although using event-related data residuals for resting state functional connectivity may still be useful, results should be interpreted with care. PMID:17239622
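
    The first class of data described above, interleaved rest blocks, can be pictured as keeping only the frames flagged as rest, concatenating them, and correlating ROI time courses. The sketch below shows that idea in pure Python. The helper names, the rest mask and the ROI values are hypothetical, and the preprocessing a real fcMRI analysis would need is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rest_frames(timecourse, is_rest):
    """Keep only the frames flagged as rest and concatenate them."""
    return [v for v, rest in zip(timecourse, is_rest) if rest]


# Hypothetical ROI time courses: 4 task frames followed by 4 rest frames, twice.
is_rest = [False] * 4 + [True] * 4 + [False] * 4 + [True] * 4
seed_roi = [5, 6, 5, 6, 1, 2, 3, 2, 6, 5, 6, 5, 2, 1, 2, 3]
target_roi = [0, 9, 0, 9, 2, 4, 6, 4, 9, 0, 9, 0, 4, 2, 4, 6]

# Connectivity estimated from the concatenated rest frames only.
r_rest = pearson(rest_frames(seed_roi, is_rest), rest_frames(target_roi, is_rest))
```

    In this toy case the two ROIs covary perfectly during rest even though their task frames differ, so the rest-only correlation differs from the one computed over the full run.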

  17. A conditional Granger causality model approach for group analysis in functional MRI

    PubMed Central

    Zhou, Zhenyu; Wang, Xunheng; Klahr, Nelson J.; Liu, Wei; Arias, Diana; Liu, Hongzhi; von Deneen, Karen M.; Wen, Ying; Lu, Zuhong; Xu, Dongrong; Liu, Yijun

    2011-01-01

    The Granger causality model (GCM) derived from multivariate vector autoregressive models of data has been employed for identifying effective connectivity in the human brain with functional MR imaging (fMRI) and to reveal complex temporal and spatial dynamics underlying a variety of cognitive processes. In the most recent fMRI effective connectivity measures, pairwise GCM has commonly been applied based on single voxel values or average values from special brain areas at the group level. Although a few novel conditional GCM methods have been proposed to quantify the connections between brain areas, our study is the first to propose a viable standardized approach for group analysis of fMRI data with GCM. To compare the effectiveness of our approach with traditional pairwise GCM models, we applied a well-established conditional GCM to pre-selected time series of brain regions resulting from general linear model (GLM) and group spatial kernel independent component analysis (ICA) of an fMRI dataset in the temporal domain. Datasets consisting of one task-related and one resting-state fMRI were used to investigate connections among brain areas with the conditional GCM method. With the GLM-detected brain activation regions in the emotion-related cortex during the block design paradigm, the conditional GCM method was proposed to study the causality of the habituation between the left amygdala and pregenual cingulate cortex during emotion processing. For the resting-state dataset, it is possible to calculate not only the effective connectivity between networks but also the heterogeneity within a single network. Our results have further shown a particular interacting pattern of the default mode network (DMN) that can be characterized as both afferent and efferent influences on the medial prefrontal cortex (mPFC) and posterior cingulate cortex (PCC). These results suggest that the conditional GCM approach based on a linear multivariate vector autoregressive (MVAR) model can achieve greater accuracy in detecting network connectivity than the widely used pairwise GCM, and this group analysis methodology can be quite useful to extend the information obtainable in fMRI. PMID:21232892
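
    For orientation, the pairwise GCM that the abstract uses as its baseline can be sketched with a single lag: a source series Granger-causes a target if adding the source's past reduces the target's prediction error. The code below is a simplified pure-Python illustration of that pairwise, lag-1 case only. It is not the conditional MVAR method of the paper, and the function names and simulated series are assumptions.

```python
import math
import random


def rss_one_regressor(y, z):
    """Residual sum of squares of y ~ a*z (no intercept, zero-mean data)."""
    a = sum(yi * zi for yi, zi in zip(y, z)) / sum(zi * zi for zi in z)
    return sum((yi - a * zi) ** 2 for yi, zi in zip(y, z))


def rss_two_regressors(y, z1, z2):
    """Residual sum of squares of y ~ a*z1 + b*z2 via 2x2 normal equations."""
    s11 = sum(v * v for v in z1)
    s22 = sum(v * v for v in z2)
    s12 = sum(u * v for u, v in zip(z1, z2))
    t1 = sum(u * v for u, v in zip(z1, y))
    t2 = sum(u * v for u, v in zip(z2, y))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return sum((yi - a * u - b * v) ** 2 for yi, u, v in zip(y, z1, z2))


def granger_lag1(source, target):
    """ln(RSS_restricted / RSS_full) for source -> target with one lag."""
    now, own_past, src_past = target[1:], target[:-1], source[:-1]
    restricted = rss_one_regressor(now, own_past)
    full = rss_two_regressors(now, own_past, src_past)
    return math.log(restricted / full)


# Simulated zero-mean series in which x drives y with a one-sample delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 500)]

gc_x_to_y = granger_lag1(x, y)
gc_y_to_x = granger_lag1(y, x)
```

    A conditional analysis would add the past of further regions to both the restricted and the full model, so that influence mediated by a third region is not attributed to the pair under test.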

  18. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.

  19. A whole brain atlas with sub-parcellation of cortical gyri using resting fMRI

    NASA Astrophysics Data System (ADS)

    Joshi, Anand A.; Choi, Soyoung; Sonkar, Gaurav; Chong, Minqi; Gonzalez-Martinez, Jorge; Nair, Dileep; Shattuck, David W.; Damasio, Hanna; Leahy, Richard M.

    2017-02-01

    The new hybrid-BCI-DNI atlas is a high-resolution MPRAGE, single-subject atlas, constructed using both anatomical and functional information to guide the parcellation of the cerebral cortex. Anatomical labeling was performed manually on coronal single-slice images guided by sulcal and gyral landmarks to generate the original (non-hybrid) BCI-DNI atlas. Functional sub-parcellations of the gyral ROIs were then generated from 40 minimally preprocessed resting fMRI datasets from the HCP database. Gyral ROIs were transferred from the BCI-DNI atlas to the 40 subjects using the HCP grayordinate space as a reference. For each subject, each gyral ROI was subdivided using the fMRI data by applying spectral clustering to a similarity matrix computed from the fMRI time-series correlations between each vertex pair. The sub-parcellations were then transferred back to the original cortical mesh to create the subparcellated hBCI-DNI atlas with a total of 67 cortical regions per hemisphere. To assess the stability of the gyral subdivisions, a separate set of 60 HCP datasets were processed as follows: 1) coregistration of the structural scans to the hBCI-DNI atlas; 2) coregistration of the anatomical BCI-DNI atlas without functional subdivisions, followed by sub-parcellation of each subject's resting fMRI data as described above. We then computed consistency between the anatomically-driven delineation of each gyral subdivision and that obtained per subject using individual fMRI data. The gyral sub-parcellations generated by atlas-based registration show variable but generally good overlap of the confidence intervals with the resting fMRI-based subdivisions. These consistency measures will provide a quantitative measure of reliability of each subdivision to users of the atlas.

  20. A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie.

    PubMed

    Hanke, Michael; Baumgartner, Florian J; Ibe, Pierre; Kaule, Falko R; Pollmann, Stefan; Speck, Oliver; Zinke, Wolf; Stadler, Jörg

    2014-01-01

    Here we present a high-resolution functional magnetic resonance (fMRI) dataset - 20 participants recorded at high field strength (7 Tesla) during prolonged stimulation with an auditory feature film ("Forrest Gump"). In addition, a comprehensive set of auxiliary data (T1w, T2w, DTI, susceptibility-weighted image, angiography) as well as measurements to assess technical and physiological noise components have been acquired. An initial analysis confirms that these data can be used to study common and idiosyncratic brain response patterns to complex auditory stimulation. Among the potential uses of this dataset are the study of auditory attention and cognition, language and music perception, and social perception. The auxiliary measurements enable a large variety of additional analysis strategies that relate functional response patterns to structural properties of the brain. Alongside the acquired data, we provide source code and detailed information on all employed procedures - from stimulus creation to data analysis. In order to facilitate replicative and derived works, only free and open-source software was utilized.

  1. Visual analytics of brain networks.

    PubMed

    Li, Kaiming; Guo, Lei; Faraco, Carlos; Zhu, Dajiang; Chen, Hanbo; Yuan, Yixuan; Lv, Jinglei; Deng, Fan; Jiang, Xi; Zhang, Tuo; Hu, Xintao; Zhang, Degang; Miller, L Stephen; Liu, Tianming

    2012-05-15

    Identification of regions of interest (ROIs) is a fundamental issue in brain network construction and analysis. Recent studies demonstrate that multimodal neuroimaging approaches and joint analysis strategies are crucial for accurate, reliable and individualized identification of brain ROIs. In this paper, we present a novel approach of visual analytics and its open-source software for ROI definition and brain network construction. By combining neuroscience knowledge and computational intelligence capabilities, visual analytics can generate accurate, reliable and individualized ROIs for brain networks via joint modeling of multimodal neuroimaging data and an intuitive and real-time visual analytics interface. Furthermore, it can be used as a functional ROI optimization and prediction solution when fMRI data is unavailable or inadequate. We have applied this approach to an operation span working memory fMRI/DTI dataset, a schizophrenia DTI/resting state fMRI (R-fMRI) dataset, and a mild cognitive impairment DTI/R-fMRI dataset, in order to demonstrate the effectiveness of visual analytics. Our experimental results are encouraging. Copyright © 2012 Elsevier Inc. All rights reserved.

  2. Interleaved EPI based fMRI improved by multiplexed sensitivity encoding (MUSE) and simultaneous multi-band imaging.

    PubMed

    Chang, Hing-Chiu; Gaur, Pooja; Chou, Ying-hui; Chu, Mei-Lan; Chen, Nan-kuei

    2014-01-01

    Functional magnetic resonance imaging (fMRI) is a non-invasive and powerful imaging tool for detecting brain activities. The majority of fMRI studies are performed with single-shot echo-planar imaging (EPI) due to its high temporal resolution. Recent studies have demonstrated that, by increasing the spatial-resolution of fMRI, previously unidentified neuronal networks can be measured. However, it is challenging to improve the spatial resolution of conventional single-shot EPI based fMRI. Although multi-shot interleaved EPI is superior to single-shot EPI in terms of the improved spatial-resolution, reduced geometric distortions, and sharper point spread function (PSF), interleaved EPI based fMRI has two main limitations: 1) the imaging throughput is lower in interleaved EPI; 2) the magnitude and phase signal variations among EPI segments (due to physiological noise, subject motion, and B0 drift) are translated to significant in-plane aliasing artifact across the field of view (FOV). Here we report a method that integrates multiple approaches to address the technical limitations of interleaved EPI-based fMRI. Firstly, the multiplexed sensitivity-encoding (MUSE) post-processing algorithm is used to suppress in-plane aliasing artifacts resulting from time-domain signal instabilities during dynamic scans. Secondly, a simultaneous multi-band interleaved EPI pulse sequence, with a controlled aliasing scheme incorporated, is implemented to increase the imaging throughput. Thirdly, the MUSE algorithm is then generalized to accommodate fMRI data obtained with our multi-band interleaved EPI pulse sequence, suppressing both in-plane and through-plane aliasing artifacts. The blood-oxygenation-level-dependent (BOLD) signal detectability and the scan throughput can be significantly improved for interleaved EPI-based fMRI. Our human fMRI data obtained from 3 Tesla systems demonstrate the effectiveness of the developed methods. 
    It is expected that future fMRI studies requiring high spatial-resolvability and fidelity will largely benefit from the reported techniques.

  3. American Society of Functional Neuroradiology-Recommended fMRI Paradigm Algorithms for Presurgical Language Assessment.

    PubMed

    Black, D F; Vachha, B; Mian, A; Faro, S H; Maheshwari, M; Sair, H I; Petrella, J R; Pillai, J J; Welker, K

    2017-10-01

    Functional MR imaging is increasingly being used for presurgical language assessment in the treatment of patients with brain tumors, epilepsy, vascular malformations, and other conditions. The inherent complexity of fMRI, which includes numerous processing steps and selective analyses, is compounded by institution-unique approaches to patient training, paradigm choice, and an eclectic array of postprocessing options from various vendors. Consequently, institutions perform fMRI in such markedly different manners that data sharing, comparison, and generalization of results are difficult. The American Society of Functional Neuroradiology proposes widespread adoption of common fMRI language paradigms as the first step in countering this lost opportunity to advance our knowledge and improve patient care. A taskforce of American Society of Functional Neuroradiology members from multiple institutions used a broad literature review, member polls, and expert opinion to converge on 2 sets of standard language paradigms that strike a balance between ease of application and clinical usefulness. The taskforce generated an adult language paradigm algorithm for presurgical language assessment including the following tasks: Sentence Completion, Silent Word Generation, Rhyming, Object Naming, and/or Passive Story Listening. The pediatric algorithm includes the following tasks: Sentence Completion, Rhyming, Antonym Generation, or Passive Story Listening. Convergence of fMRI language paradigms across institutions offers the first step in providing a "Rosetta Stone" that provides a common reference point with which to compare and contrast the usefulness and reliability of fMRI data. From this common language task battery, future refinements and improvements are anticipated, particularly as objective measures of reliability become available. Some commonality of practice is a necessary first step to develop a foundation on which to improve the clinical utility of this field. 
    © 2017 by American Journal of Neuroradiology.

  4. Spatial independent component analysis of functional MRI time-series: to what extent do results depend on the algorithm used?

    PubMed

    Esposito, Fabrizio; Formisano, Elia; Seifritz, Erich; Goebel, Rainer; Morrone, Renato; Tedeschi, Gioacchino; Di Salle, Francesco

    2002-07-01

    Independent component analysis (ICA) has been successfully employed to decompose functional MRI (fMRI) time-series into sets of activation maps and associated time-courses. Several ICA algorithms have been proposed in the neural network literature. Applied to fMRI, these algorithms might lead to different spatial or temporal readouts of brain activation. We compared the two ICA algorithms that have been used so far for spatial ICA (sICA) of fMRI time-series: the Infomax (Bell and Sejnowski [1995]: Neural Comput 7:1004-1034) and the Fixed-Point (Hyvärinen [1999]: Adv Neural Inf Proc Syst 10:273-279) algorithms. We evaluated the Infomax- and Fixed Point-based sICA decompositions of simulated motor, and real motor and visual activation fMRI time-series using an ensemble of measures. Log-likelihood (McKeown et al. [1998]: Hum Brain Mapp 6:160-188) was used as a measure of how significantly the estimated independent sources fit the statistical structure of the data; receiver operating characteristics (ROC) and linear correlation analyses were used to evaluate the algorithms' accuracy of estimating the spatial layout and the temporal dynamics of simulated and real activations; cluster sizing calculations and an estimation of a residual gaussian noise term within the components were used to examine the anatomic structure of ICA components and for the assessment of noise reduction capabilities. Whereas both algorithms produced highly accurate results, the Fixed-Point outperformed the Infomax in terms of spatial and temporal accuracy as long as inferential statistics were employed as benchmarks. Conversely, the Infomax sICA was superior in terms of global estimation of the ICA model and noise reduction capabilities. Because of its adaptive nature, the Infomax approach appears to be better suited to investigate activation phenomena that are not predictable or adequately modelled by inferential techniques. Copyright 2002 Wiley-Liss, Inc.

  5. A Space Affine Matching Approach to fMRI Time Series Analysis.

    PubMed

    Chen, Liang; Zhang, Weishi; Liu, Hongbo; Feng, Shigang; Chen, C L Philip; Wang, Huili

    2016-07-01

    For fMRI time series analysis, an important challenge is to overcome the potential delay between the hemodynamic response signal and the cognitive stimuli signal, namely the same frequency but different phase (SFDP) problem. In this paper, a novel space affine matching feature is presented by introducing time domain and frequency domain features. The time domain feature is used to discern different stimuli, while the frequency domain feature is used to eliminate the delay. We then propose a space affine matching (SAM) algorithm to match fMRI time series by our affine feature, in which a normal vector is estimated using gradient descent to explore the time series matching optimally. The experimental results illustrate that the SAM algorithm is insensitive to the delay between the hemodynamic response signal and the cognitive stimuli signal. Our approach significantly outperforms the GLM method when such a delay exists. The approach can help solve the SFDP problem in fMRI time series matching and thus holds great promise for revealing brain dynamics.

  6. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    New data mining algorithms continue to be proposed as related disciplines develop. These algorithms differ in their applicable scope and performance. Hence, finding a suitable algorithm for a dataset is becoming an important concern for biomedical researchers who need to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression models, are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is better suited to the balanced small dataset of the binary-class task. No algorithm maintains the best performance on all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
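
    Two of the dataset characteristics listed above, the class entropy of the task variable and the ratio of the largest to the smallest class, are simple to compute from a label column. The following is a small sketch using only the Python standard library. The function names and the toy label lists are hypothetical, and the base-2 logarithm is an assumption since the abstract does not state the unit.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


balanced = ["a", "a", "b", "b"]
skewed = ["a", "a", "a", "b"]

entropy_balanced = class_entropy(balanced)
entropy_skewed = class_entropy(skewed)
ratio_skewed = imbalance_ratio(skewed)
```

    A balanced binary label column reaches the maximum entropy of one bit with a ratio of 1, while skewed labels lower the entropy and raise the ratio.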

  7. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

    PubMed Central

    Wernisch, Lorenz

    2017-01-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190

  8. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

    PubMed

    Gabasova, Evelina; Reid, John; Wernisch, Lorenz

    2017-10-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.

  10. HAFNI-enabled largescale platform for neuroimaging informatics (HELPNI).

    PubMed

    Makkie, Milad; Zhao, Shijie; Jiang, Xi; Lv, Jinglei; Zhao, Yu; Ge, Bao; Li, Xiang; Han, Junwei; Liu, Tianming

    2015-12-01

    Tremendous efforts have been devoted to the establishment of functional MRI informatics systems that recruit a comprehensive collection of statistical/computational approaches for fMRI data analysis. However, state-of-the-art fMRI informatics systems are designed for specific fMRI sessions or studies in which the data size is not really big, and thus have difficulty in handling fMRI 'big data.' Given that the size of fMRI data has been growing explosively due to the advancement of neuroimaging technologies, an effective and efficient fMRI informatics system which can process and analyze fMRI big data is much needed. To address this challenge, in this work, we introduce our newly developed informatics platform, namely, 'HAFNI-enabled largescale platform for neuroimaging informatics (HELPNI).' HELPNI implements our recently developed computational framework of sparse representation of whole-brain fMRI signals, which is called holistic atlases of functional networks and interactions (HAFNI), for fMRI data analysis. HELPNI provides integrated solutions to archive and process large-scale fMRI data automatically and structurally, to extract and visualize meaningful results information from raw fMRI data, and to share open-access processed and raw data with other collaborators through the web. We tested the proposed HELPNI platform using the publicly available 1000 Functional Connectomes dataset, which includes over 1200 subjects. We identified consistent and meaningful functional brain networks across individuals and populations based on resting state fMRI (rsfMRI) big data. Using an efficient sampling module, the experimental results demonstrate that our HELPNI system outperforms other systems for large-scale fMRI data, processing and storing the data and associated results much faster.

  11. Statistical testing and power analysis for brain-wide association study.

    PubMed

    Gong, Weikang; Wan, Lin; Lu, Wenlian; Ma, Liang; Cheng, Fan; Cheng, Wei; Grünewald, Stefan; Feng, Jianfeng

    2018-04-05

    The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, multiple-comparison correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on the Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need for non-parametric permutation to correct for multiple comparisons; thus, it can efficiently tackle large datasets with high resolution fMRI images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depression disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS. Copyright © 2018 Elsevier B.V. All rights reserved.
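
    The two baselines named in the abstract, Bonferroni correction and false discovery rate control, can be sketched in a few lines. The code below implements Bonferroni and the Benjamini-Hochberg step-up procedure in pure Python; it does not reproduce the random-field framework of the paper. The function names and the example p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) such that p_(k) <= k / m * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


# Illustrative p-values only; not taken from any study.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
n_bonferroni = sum(bonferroni_reject(p_vals))
n_bh = sum(benjamini_hochberg_reject(p_vals))
```

    On these eight p-values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two. Neither procedure uses the spatial structure of the data, which is the information the abstract's method is said to exploit.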

  12. A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie

    PubMed Central

    Hanke, Michael; Baumgartner, Florian J.; Ibe, Pierre; Kaule, Falko R.; Pollmann, Stefan; Speck, Oliver; Zinke, Wolf; Stadler, Jörg

    2014-01-01

    Here we present a high-resolution functional magnetic resonance (fMRI) dataset – 20 participants recorded at high field strength (7 Tesla) during prolonged stimulation with an auditory feature film (“Forrest Gump”). In addition, a comprehensive set of auxiliary data (T1w, T2w, DTI, susceptibility-weighted image, angiography) as well as measurements to assess technical and physiological noise components have been acquired. An initial analysis confirms that these data can be used to study common and idiosyncratic brain response patterns to complex auditory stimulation. Among the potential uses of this dataset are the study of auditory attention and cognition, language and music perception, and social perception. The auxiliary measurements enable a large variety of additional analysis strategies that relate functional response patterns to structural properties of the brain. Alongside the acquired data, we provide source code and detailed information on all employed procedures – from stimulus creation to data analysis. In order to facilitate replicative and derived works, only free and open-source software was utilized. PMID:25977761

  13. Directional connectivity of resting state human fMRI data using cascaded ICA-PDC analysis.

    PubMed

    Silfverhuth, Minna J; Remes, Jukka; Starck, Tuomo; Nikkinen, Juha; Veijola, Juha; Tervonen, Osmo; Kiviniemi, Vesa

    2011-11-01

    Directional connectivity measures, such as partial directed coherence (PDC), give us a means to explore effective connectivity in the human brain. Independent component analysis (ICA) was used to reduce the original data set before further PDC analysis. The aim was to test this cascaded ICA-PDC approach in causality studies of human functional magnetic resonance imaging (fMRI) data. Resting state group data were acquired from 55 subjects using a 1.5 T scanner (TR 1800 ms, 250 volumes). Temporal concatenation group ICA in a probabilistic ICA and further repeatability runs (n = 200) were undertaken. The reduced data set included the time series of the following nine ICA components: secondary somatosensory cortex, inferior temporal gyrus, intracalcarine cortex, primary auditory cortex, amygdala, putamen and the frontal medial cortex, posterior cingulate cortex and precuneus, comprising the default mode network components. Re-normalized PDC (rPDC) values were computed to determine directional connectivity at the group level at each frequency. An integrative role was suggested for the precuneus, while the role of major divergence region may be attributed to the primary auditory cortex and amygdala. This study demonstrates the potential of the cascaded ICA-PDC approach in directional connectivity studies of human fMRI.

  14. Functional feature embedded space mapping of fMRI data.

    PubMed

    Hu, Jin; Tian, Jie; Yang, Lei

    2006-01-01

    We have proposed a new method for fMRI data analysis which is called Functional Feature Embedded Space Mapping (FFESM). Our work mainly focuses on the experimental design with periodic stimuli which can be described by a number of Fourier coefficients in the frequency domain. A nonlinear dimension reduction technique Isomap is applied to the high dimensional features obtained from frequency domain of the fMRI data for the first time. Finally, the presence of activated time series is identified by the clustering method in which the information theoretic criterion of minimum description length (MDL) is used to estimate the number of clusters. The feasibility of our algorithm is demonstrated by real human experiments. Although we focus on analyzing periodic fMRI data, the approach can be extended to analyze non-periodic fMRI data (event-related fMRI) by replacing the Fourier analysis with a wavelet analysis.

  15. Genetic Programming and Frequent Itemset Mining to Identify Feature Selection Patterns of iEEG and fMRI Epilepsy Data

    PubMed Central

    Smart, Otis; Burrell, Lauren

    2014-01-01

    Pattern classification for intracranial electroencephalogram (iEEG) and functional magnetic resonance imaging (fMRI) signals has furthered epilepsy research toward understanding the origin of epileptic seizures and localizing dysfunctional brain tissue for treatment. Prior research has demonstrated that implicitly selecting features with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches. However, for each of the iEEG and fMRI modalities, it is still uncertain whether the stochastic properties of indirect feature selection with a GP yield (a) consistent results within a patient data set and (b) features that are specific or universal across multiple patient data sets. We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) for separate iEEG and fMRI epilepsy patient data. We observed within-subject consistency and across-subject variability with some small similarity for selected features, indicating a clear need for patient-specific features and a possible need for patient-specific feature selection and/or classification. For the fMRI, using nearest-neighbor classification and 30 GP generations, we obtained over 60% median sensitivity and over 60% median selectivity. For the iEEG, using nearest-neighbor classification and 30 GP generations, we obtained over 65% median sensitivity and over 65% median selectivity except for one patient. PMID:25580059

  16. Automatic classification of schizophrenia using resting-state functional language network via an adaptive learning algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi

    2014-03-01

    A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to assess the diagnostic value of functional brain networks discovered from resting-state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using the resting-state functional language network. Furthermore, the classification of schizophrenia was regarded here as a sample selection problem in which a sparse subset of samples was chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinical diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm incorporating the resting-state functional language network achieved 83.6% leave-one-out accuracy on resting-state fMRI data of 27 schizophrenia patients and 28 normal controls. In contrast with K-Nearest-Neighbor (KNN), Support Vector Machine (SVM) and l1-norm, our method yielded better classification performance. Moreover, our results suggested that a dysfunction of the resting-state functional language network plays an important role in the clinical diagnosis of schizophrenia.

  17. Exploring connectivity with large-scale Granger causality on resting-state functional MRI.

    PubMed

    DSouza, Adora M; Abidin, Anas Z; Leistritz, Lutz; Wismüller, Axel

    2017-08-01

    Large-scale Granger causality (lsGC) is a recently developed, resting-state functional MRI (fMRI) connectivity analysis approach that estimates multivariate voxel-resolution connectivity. Unlike most commonly used multivariate approaches, which establish coarse-resolution connectivity by aggregating voxel time-series to avoid an underdetermined problem, lsGC estimates voxel-resolution, fine-grained connectivity by incorporating an embedded dimension reduction. We investigate application of lsGC on realistic fMRI simulations, modeling smoothing of neuronal activity by the hemodynamic response function and repetition time (TR), and empirical resting-state fMRI data. Subsequently, functional subnetworks are extracted from lsGC connectivity measures for both datasets and validated quantitatively. We also provide guidelines to select lsGC free parameters. Results indicate that lsGC reliably recovers underlying network structure with area under receiver operator characteristic curve (AUC) of 0.93 at TR=1.5s for a 10-min session of fMRI simulations. Furthermore, subnetworks of closely interacting modules are recovered from the aforementioned lsGC networks. Results on empirical resting-state fMRI data demonstrate recovery of visual and motor cortex in close agreement with spatial maps obtained from (i) visuo-motor fMRI stimulation task-sequence (Accuracy=0.76) and (ii) independent component analysis (ICA) of resting-state fMRI (Accuracy=0.86). Compared with the conventional Granger causality approach (AUC=0.75), lsGC produces better network recovery on fMRI simulations. Furthermore, the conventional approach cannot recover functional subnetworks from empirical fMRI data, since quantifying voxel-resolution connectivity is not possible as a consequence of encountering an underdetermined problem. Functional network recovery from fMRI data suggests that lsGC gives useful insight into connectivity patterns from resting-state fMRI at a multivariate voxel-resolution. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. The increase of the functional entropy of the human brain with age.

    PubMed

    Yao, Y; Lu, W L; Xu, B; Li, C B; Lin, C P; Waxman, D; Feng, J F

    2013-10-09

    We use entropy to characterize intrinsic ageing properties of the human brain. Analysis of fMRI data from a large dataset of individuals, using resting state BOLD signals, demonstrated that a functional entropy associated with brain activity increases with age. During an average lifespan, the entropy, which was calculated from a population of individuals, increased by approximately 0.1 bits, due to correlations in BOLD activity becoming more widely distributed. We attribute this to the number of excitatory neurons and the excitatory conductance decreasing with age. Incorporating these properties into a computational model leads to quantitatively similar results to the fMRI data. Our dataset involved males and females and we found significant differences between them. The entropy of males at birth was lower than that of females. However, the entropies of the two sexes increase at different rates, and intersect at approximately 50 years; after this age, males have a larger entropy.
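
    The abstract attributes the entropy increase to correlations becoming more widely distributed. One simple way to picture that, assuming entropy is measured as the Shannon entropy (in bits) of a histogram of correlation coefficients, is sketched below in Python. The binning scheme, function name and sample values are illustrative assumptions, not the estimator used by the authors.

```python
import math


def histogram_entropy_bits(values, n_bins=4, lo=-1.0, hi=1.0):
    """Shannon entropy (bits) of values binned into n_bins equal bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical correlation coefficients between pairs of brain regions.
narrow = [0.0] * 8                  # all correlations identical
wide = [-0.9, -0.4, 0.1, 0.6]       # correlations spread across all four bins

entropy_narrow = histogram_entropy_bits(narrow)
entropy_wide = histogram_entropy_bits(wide)
```

    In this toy case the tightly clustered set gives 0 bits, while four values spread evenly over four bins give 2 bits, which is the direction of change the abstract describes.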

  19. The Increase of the Functional Entropy of the Human Brain with Age

    PubMed Central

    Yao, Y.; Lu, W. L.; Xu, B.; Li, C. B.; Lin, C. P.; Waxman, D.; Feng, J. F.

    2013-01-01

    We use entropy to characterize intrinsic ageing properties of the human brain. Analysis of fMRI data from a large dataset of individuals, using resting state BOLD signals, demonstrated that a functional entropy associated with brain activity increases with age. During an average lifespan, the entropy, which was calculated from a population of individuals, increased by approximately 0.1 bits, due to correlations in BOLD activity becoming more widely distributed. We attribute this to the number of excitatory neurons and the excitatory conductance decreasing with age. Incorporating these properties into a computational model leads to quantitatively similar results to the fMRI data. Our dataset involved males and females and we found significant differences between them. The entropy of males at birth was lower than that of females. However, the entropies of the two sexes increase at different rates, and intersect at approximately 50 years; after this age, males have a larger entropy. PMID:24103922

  20. Development, validation, and comparison of ICA-based gradient artifact reduction algorithms for simultaneous EEG-spiral in/out and echo-planar fMRI recordings

    PubMed Central

    Ryali, S; Glover, GH; Chang, C; Menon, V

    2009-01-01

    EEG data acquired in an MRI scanner are heavily contaminated by gradient artifacts that can significantly compromise signal quality. We developed two new methods based on Independent Component Analysis (ICA) for reducing gradient artifacts from spiral in-out and echo-planar pulse sequences at 3T, and compared our algorithms with four other commonly used methods: average artifact subtraction (Allen et al. 2000), principal component analysis (Niazy et al. 2005), Taylor series (Wan et al. 2006) and a conventional temporal ICA algorithm. Models of gradient artifacts were derived from simulations as well as a water phantom and performance of each method was evaluated on datasets constructed using visual event-related potentials (ERPs) as well as resting EEG. Our new methods recovered ERPs and resting EEG below the beta band (< 12.5 Hz) with high signal-to-noise ratio (SNR > 4). Our algorithms outperformed all of these methods on resting EEG in the theta- and alpha-bands (SNR > 4); however, for all methods, signal recovery was modest (SNR ~ 1) in the beta-band and poor (SNR < 0.3) in the gamma-band and above. We found that the conventional ICA algorithm performed poorly with uniformly low SNR (< 0.1). Taken together, our new ICA-based methods offer a more robust technique for gradient artifact reduction when scanning at 3T using spiral in-out and echo-planar pulse sequences. We provide new insights into the strengths and weaknesses of each method using a unified subspace framework. PMID:19580873

  1. Sample-Poor Estimation of Order and Common Signal Subspace with Application to Fusion of Medical Imaging Data

    PubMed Central

    Levin-Schwartz, Yuri; Song, Yang; Schreier, Peter J.; Calhoun, Vince D.; Adalı, Tülay

    2016-01-01

    Due to their data-driven nature, multivariate methods such as canonical correlation analysis (CCA) have proven very useful for fusion of multimodal neurological data. However, being able to determine the degree of similarity between datasets and appropriate order selection are crucial to the success of such techniques. The standard methods for calculating the order of multimodal data focus only on sources with the greatest individual energy and ignore relations across datasets. Additionally, these techniques as well as the most widely-used methods for determining the degree of similarity between datasets assume sufficient sample support and are not effective in the sample-poor regime. In this paper, we propose to jointly estimate the degree of similarity between datasets and their order when few samples are present using principal component analysis and canonical correlation analysis (PCA-CCA). By considering these two problems simultaneously, we are able to minimize the assumptions placed on the data and achieve superior performance in the sample-poor regime compared to traditional techniques. We apply PCA-CCA to the pairwise combinations of functional magnetic resonance imaging (fMRI), structural magnetic resonance imaging (sMRI), and electroencephalogram (EEG) data drawn from patients with schizophrenia and healthy controls while performing an auditory oddball task. The PCA-CCA results indicate that the fMRI and sMRI datasets are the most similar, whereas the sMRI and EEG datasets share the least similarity. We also demonstrate that the degree of similarity obtained by PCA-CCA is highly predictive of the degree of significance found for components generated using CCA. PMID:27039696

  2. Bayesian Inference for Functional Dynamics Exploring in fMRI Data.

    PubMed

    Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.

  3. Missing value imputation for microarray data: a comprehensive comparison study and a web tool.

    PubMed

    Chiu, Chia-Chun; Chan, Shih-Yao; Wang, Chung-Ching; Wu, Wei-Sheng

    2013-01-01

    Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there are many debates on the selection of the optimal algorithm. Studies comparing the performance of different algorithms are still not comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether the datasets from different species have different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of a dataset but not on the species where the samples come from. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets. In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses.

  4. Spectral spatiotemporal imaging of cortical oscillations and interactions in the human brain

    PubMed Central

    Lin, Fa-Hsuan; Witzel, Thomas; Hämäläinen, Matti S.; Dale, Anders M.; Belliveau, John W.; Stufflebeam, Steven M.

    2010-01-01

    This paper presents a computationally efficient source estimation algorithm that localizes cortical oscillations and their phase relationships. The proposed method employs wavelet-transformed magnetoencephalography (MEG) data and uses anatomical MRI to constrain the current locations to the cortical mantle. In addition, the locations of the sources can be further confined with the help of functional MRI (fMRI) data. As a result, we obtain spatiotemporal maps of spectral power and phase relationships. As an example, we show how the phase locking value (PLV), that is, the trial-by-trial phase relationship between the stimulus and response, can be imaged on the cortex. We apply the method to spontaneous, evoked, and driven cortical oscillations measured with MEG. We test the method of combining MEG, structural MRI, and fMRI using simulated cortical oscillations along Heschl’s gyrus (HG). We also analyze sustained auditory gamma-band neuromagnetic fields from MEG and fMRI measurements. Our results show that combining the MEG recording with fMRI improves source localization for the non-noise-normalized wavelet power. In contrast, noise-normalized spectral power or PLV localization may not benefit from the fMRI constraint. We show that if the thresholds are not properly chosen, noise-normalized spectral power or PLV estimates may contain false (phantom) sources, independent of the inclusion of the fMRI prior information. The proposed algorithm can be used for evoked MEG/EEG and block-designed or event-related fMRI paradigms, or for spontaneous MEG data sets. Spectral spatiotemporal imaging of cortical oscillations and interactions in the human brain can provide further understanding of large-scale neural activity and communication between different brain regions. PMID:15488408
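
    The phase locking value mentioned in this abstract has a compact standard definition: the magnitude of the trial-averaged unit vector of the phase difference, PLV = |(1/N) Σ exp(i(φ_a − φ_b))|. The Python sketch below computes it for two lists of per-trial phases; it illustrates the quantity only, not the wavelet-based source estimation pipeline of the paper, and the function name and sample phases are hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV across trials: 1.0 for a constant phase difference, near 0 for random ones."""
    n = len(phases_a)
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / n


# Hypothetical per-trial phases (radians) at one frequency.
stimulus = [0.1, 1.3, 2.7, 4.0]
locked_response = [p - 0.5 for p in stimulus]        # constant lag on every trial
plv_locked = phase_locking_value(stimulus, locked_response)

# Phase differences evenly spread around the circle cancel out.
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
plv_spread = phase_locking_value(spread, [0.0] * 4)
```

    A constant lag yields a PLV of 1 regardless of the lag's size, while evenly spread phase differences yield a PLV of essentially 0.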

  5. The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features.

    PubMed

    Cui, Zaixu; Gong, Gaolang

    2018-06-02

    Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Artificial neural network for suppression of banding artifacts in balanced steady-state free precession MRI.

    PubMed

    Kim, Ki Hwan; Park, Sung-Hong

    2017-04-01

    The balanced steady-state free precession (bSSFP) MR sequence is frequently used in clinics, but is sensitive to off-resonance effects, which can cause banding artifacts. Often multiple bSSFP datasets are acquired at different phase cycling (PC) angles and then combined in a special way for banding artifact suppression. Many strategies of combining the datasets have been suggested for banding artifact suppression, but there are still limitations in their performance, especially when the number of phase-cycled bSSFP datasets is small. The purpose of this study is to develop a learning-based model to combine the multiple phase-cycled bSSFP datasets for better banding artifact suppression. Multilayer perceptron (MLP) is a feedforward artificial neural network consisting of three layers of input, hidden, and output layers. MLP models were trained by input bSSFP datasets acquired from human brain and knee at 3T, which were separately performed for two and four PC angles. Banding-free bSSFP images were generated by maximum-intensity projection (MIP) of 8 or 12 phase-cycled datasets and were used as targets for training the output layer. The trained MLP models were applied to other brain and knee datasets acquired with different scan parameters and also to multiple phase-cycled bSSFP functional MRI datasets acquired on rat brain at 9.4T, in comparison with the conventional MIP method. Simulations were also performed to validate the MLP approach. Both the simulations and human experiments demonstrated that MLP suppressed banding artifacts significantly, superior to MIP in both banding artifact suppression and SNR efficiency. MLP demonstrated superior performance over MIP for the 9.4T fMRI data as well, which was not used for training the models, while visually preserving the fMRI maps very well. Artificial neural network is a promising technique for combining multiple phase-cycled bSSFP datasets for banding artifact suppression. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Scale-Free and Multifractal Time Dynamics of fMRI Signals during Rest and Task

    PubMed Central

    Ciuciu, P.; Varoquaux, G.; Abry, P.; Sadaghiani, S.; Kleinschmidt, A.

    2012-01-01

    Scaling temporal dynamics in functional MRI (fMRI) signals have been evidenced for a decade as intrinsic characteristics of ongoing brain activity (Zarahn et al., 1997). Recently, scaling properties were shown to fluctuate across brain networks and to be modulated between rest and task (He, 2011): notably, Hurst exponent, quantifying long memory, decreases under task in activating and deactivating brain regions. In most cases, such results were obtained: First, from univariate (voxelwise or regionwise) analysis, hence focusing on specific cognitive systems such as Resting-State Networks (RSNs) and raising the issue of the specificity of this scale-free dynamics modulation in RSNs. Second, using analysis tools designed to measure a single scaling exponent related to the second order statistics of the data, thus relying on models that either implicitly or explicitly assume Gaussianity and (asymptotic) self-similarity, while fMRI signals may significantly depart from either of those two assumptions (Ciuciu et al., 2008; Wink et al., 2008). To address these issues, the present contribution elaborates on the analysis of the scaling properties of fMRI temporal dynamics by proposing two significant variations. First, scaling properties are technically investigated using the recently introduced Wavelet Leader-based Multifractal formalism (WLMF; Wendt et al., 2007). This measures a collection of scaling exponents, and thus enables a richer and more versatile description of scale invariance (beyond correlation and Gaussianity), referred to as multifractality. Also, it benefits from improved estimation performance compared to tools previously used in the literature. Second, scaling properties are investigated in both RSN and non-RSN structures (e.g., artifacts), at a broader spatial scale than the voxel one, using a multivariate approach, namely the Multi-Subject Dictionary Learning (MSDL) algorithm (Varoquaux et al., 2011) that produces a set of spatial components that appear more sparse than their Independent Component Analysis (ICA) counterpart. These tools are combined and applied to an fMRI dataset comprising 12 subjects with resting-state and activation runs (Sadaghiani et al., 2009). Results stemming from those analyses confirm the already reported task-related decrease of long memory in functional networks, but also show that it occurs in artifacts, thus making this feature not specific to functional networks. Further, results indicate that most fMRI signals appear multifractal at rest except in non-cortical regions. Task-related modulation of multifractality appears only significant in functional networks and thus can be considered as the key property disentangling functional networks from artifacts. These findings are discussed in the light of the recent literature reporting scaling dynamics of EEG microstate sequences at rest and addressing non-stationarity issues in temporally independent fMRI modes. PMID:22715328

  8. Assessment of Chlorophyll-a Algorithms Considering Different Trophic Statuses and Optimal Bands.

    PubMed

    Salem, Salem Ibrahim; Higa, Hiroto; Kim, Hyungjun; Kobayashi, Hiroshi; Oki, Kazuo; Oki, Taikan

    2017-07-31

    Numerous algorithms have been proposed to retrieve chlorophyll-a concentrations in Case 2 waters; however, the retrieval accuracy is far from satisfactory. In this research, seven algorithms are assessed with different band combinations of multispectral and hyperspectral bands using linear (LN), quadratic polynomial (QP) and power (PW) regression approaches, resulting in altogether 43 algorithmic combinations. These algorithms are evaluated by using simulated and measured datasets to understand the strengths and limitations of these algorithms. Two simulated datasets comprising 500,000 reflectance spectra each, both based on wide ranges of inherent optical properties (IOPs), are generated for the calibration and validation stages. Results reveal that the regression approach (i.e., LN, QP, and PW) has more influence on the simulated dataset than on the measured one. The algorithms that incorporated linear regression provide the highest retrieval accuracy for the simulated dataset. Results from simulated datasets reveal that the 3-band (3b) algorithms that incorporate the 665-nm and 680-nm bands and the band tuning selection approach outperformed other algorithms with root mean square error (RMSE) of 15.87 mg·m−3, 16.25 mg·m−3, and 19.05 mg·m−3, respectively. The spatial distribution of the best performing algorithms, for various combinations of chlorophyll-a (Chla) and non-algal particles (NAP) concentrations, shows that the 3b_tuning_QP and 3b_680_QP outperform other algorithms in terms of minimum RMSE frequency of 33.19% and 60.52%, respectively. However, the two algorithms failed to accurately retrieve Chla for many combinations of Chla and NAP, particularly for low Chla and NAP concentrations. In addition, the spatial distribution emphasizes that no single algorithm can provide outstanding accuracy for Chla retrieval and that multi-algorithms should be included to reduce the error. Comparing the results of the measured and simulated datasets reveals that the algorithms that incorporate the 665-nm band outperform other algorithms for the measured dataset (RMSE = 36.84 mg·m−3), while algorithms that incorporate the band tuning approach provide the highest retrieval accuracy for the simulated dataset (RMSE = 25.05 mg·m−3).

  9. Assessment of Chlorophyll-a Algorithms Considering Different Trophic Statuses and Optimal Bands

    PubMed Central

    Higa, Hiroto; Kobayashi, Hiroshi; Oki, Kazuo

    2017-01-01

    Numerous algorithms have been proposed to retrieve chlorophyll-a concentrations in Case 2 waters; however, the retrieval accuracy is far from satisfactory. In this research, seven algorithms are assessed with different band combinations of multispectral and hyperspectral bands using linear (LN), quadratic polynomial (QP) and power (PW) regression approaches, resulting in altogether 43 algorithmic combinations. These algorithms are evaluated using simulated and measured datasets to understand their strengths and limitations. Two simulated datasets comprising 500,000 reflectance spectra each, both based on wide ranges of inherent optical properties (IOPs), are generated for the calibration and validation stages. Results reveal that the regression approach (i.e., LN, QP, and PW) has more influence on the simulated dataset than on the measured one. The algorithms that incorporated linear regression provide the highest retrieval accuracy for the simulated dataset. Results from simulated datasets reveal that the 3-band (3b) algorithms that incorporate the 665-nm and 680-nm bands and the band tuning selection approach outperformed other algorithms, with root mean square errors (RMSE) of 15.87 mg·m−3, 16.25 mg·m−3, and 19.05 mg·m−3, respectively. The spatial distribution of the best performing algorithms, for various combinations of chlorophyll-a (Chla) and non-algal particles (NAP) concentrations, shows that the 3b_tuning_QP and 3b_680_QP outperform other algorithms, with minimum-RMSE frequencies of 33.19% and 60.52%, respectively. However, the two algorithms failed to accurately retrieve Chla for many combinations of Chla and NAP, particularly for low Chla and NAP concentrations. In addition, the spatial distribution emphasizes that no single algorithm can provide outstanding accuracy for Chla retrieval and that multiple algorithms should be included to reduce the error.
    Comparing the results of the measured and simulated datasets reveals that the algorithms that incorporate the 665-nm band outperform other algorithms for the measured dataset (RMSE = 36.84 mg·m−3), while algorithms that incorporate the band tuning approach provide the highest retrieval accuracy for the simulated dataset (RMSE = 25.05 mg·m−3). PMID:28758984
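
    The three regression forms named in the abstract (LN, QP, PW) and the RMSE used to rank them can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the band-ratio index and Chla values are made up, and the study's actual band combinations and calibration procedure are not reproduced.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_quadratic(x, y):
    """Least squares for y = c0 + c1*x + c2*x**2 via the normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    aug = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

def fit_power(x, y):
    """Fit y = a * x**b as a straight line in log-log space (needs x, y > 0)."""
    log_a, b = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
    return math.exp(log_a), b

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

# Made-up band-ratio index and Chla values, for illustration only.
index = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
chla = [4.0, 9.5, 17.0, 27.5, 39.0, 54.0]

a, b = fit_line(index, chla)
c0, c1, c2 = fit_quadratic(index, chla)
pa, pb = fit_power(index, chla)
scores = {
    "LN": rmse([a + b * v for v in index], chla),
    "QP": rmse([c0 + c1 * v + c2 * v * v for v in index], chla),
    "PW": rmse([pa * v ** pb for v in index], chla),
}
```

    Note that the power fit minimizes error in log space, so its RMSE in the original units is not guaranteed to be the smallest even when the data follow a power law closely.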

  10. Active learning for clinical text classification: is it better than random sampling?

    PubMed

    Figueroa, Rosa L; Zeng-Treitler, Qing; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P

    2012-01-01

    This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
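
    One common reading of a distance-based query strategy is to ask for labels on the unlabeled samples that lie closest to the current decision boundary, while passive learning draws them at random. The sketch below assumes that reading; the abstract does not give the details of DIST, and the function names and decision values are hypothetical.

```python
import random

def select_queries(decision_values, k):
    """Indices of the k unlabeled samples closest to the decision boundary."""
    ranked = sorted(range(len(decision_values)), key=lambda i: abs(decision_values[i]))
    return ranked[:k]

def random_queries(n_unlabeled, k, rng):
    """Passive-learning baseline: k indices drawn uniformly at random."""
    return rng.sample(range(n_unlabeled), k)

# Hypothetical signed distances from a classifier trained on the labeled pool.
decision_values = [2.1, -0.1, 0.5, -3.0, 0.05]
to_annotate = select_queries(decision_values, k=2)
baseline = random_queries(len(decision_values), 2, random.Random(0))
```

    In a full loop, the selected samples would be labeled, added to the training set, and the classifier retrained before the next round of selection.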

  11. Active learning for clinical text classification: is it better than random sampling?

    PubMed Central

    Figueroa, Rosa L; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P

    2012-01-01

    Objective: This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Design: Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Measurements: Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. Results: The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. Conclusion: For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty. PMID:22707743

  12. Combination of rs-fMRI and sMRI Data to Discriminate Autism Spectrum Disorders in Young Children Using Deep Belief Network.

    PubMed

    Akhavan Aghdam, Maryam; Sharifi, Arash; Pedram, Mir Mohsen

    2018-05-07

    In recent years, the use of advanced magnetic resonance (MR) imaging methods such as functional magnetic resonance imaging (fMRI) and structural magnetic resonance imaging (sMRI) has increased greatly in the study of neuropsychiatric disorders. Deep learning is a branch of machine learning that is increasingly being used for applications of medical image analysis such as computer-aided diagnosis. For classification and representation learning, this study used one of the most powerful deep learning algorithms, the deep belief network (DBN), on combined data from the Autism Brain Imaging Data Exchange I and II (ABIDE I and ABIDE II) datasets. The DBN was used to combine resting-state fMRI (rs-fMRI), gray matter (GM), and white matter (WM) data, based on brain regions defined using automated anatomical labeling (AAL), in order to classify autism spectrum disorders (ASDs) from typical controls (TCs). Since the diagnosis of ASD is much more effective at an early age, only 185 individuals (116 ASD and 69 TC) ranging in age from 5 to 10 years were included in this analysis. In contrast to older methods, which consider only simple low-level features extracted from neuroimages, the proposed method exploits the latent or abstract high-level features inside rs-fMRI and sMRI data. Moreover, combining multiple data types and increasing the depth of DBN can improve classification accuracy. In this study, the best combination comprised rs-fMRI, GM, and WM for DBN of depth 3 with 65.56% accuracy (sensitivity = 84%, specificity = 32.96%, F1 score = 74.76%) obtained via 10-fold cross-validation. This result outperforms previously presented methods on the ABIDE I dataset.

  13. A two-step super-Gaussian independent component analysis approach for fMRI data.

    PubMed

    Ge, Ruiyang; Yao, Li; Zhang, Hang; Long, Zhiying

    2015-09-01

    Independent component analysis (ICA) has been widely applied to functional magnetic resonance imaging (fMRI) data analysis. Although ICA assumes that the sources underlying data are statistically independent, it usually ignores sources' additional properties, such as sparsity. In this study, we propose a two-step super-Gaussian ICA (2SGICA) method that incorporates the sparse prior of the sources into the ICA model. 2SGICA uses the super-Gaussian ICA (SGICA) algorithm that is based on a simplified Lewicki-Sejnowski's model to obtain the initial source estimate in the first step. Using a kernel estimator technique, the source density is acquired and fitted to the Laplacian function based on the initial source estimates. The fitted Laplacian prior is used for each source at the second SGICA step. Moreover, the automatic target generation process for initial value generation is used in 2SGICA to guarantee the stability of the algorithm. An adaptive step size selection criterion is also implemented in the proposed algorithm. We performed experimental tests on both simulated data and real fMRI data to investigate the feasibility and robustness of 2SGICA and made a performance comparison between InfomaxICA, FastICA, mean field ICA (MFICA) with Laplacian prior, sparse online dictionary learning (ODL), SGICA and 2SGICA. Both simulated and real fMRI experiments showed that 2SGICA was the most robust to noise and had the best spatial detection power and time course estimation among the six methods. Copyright © 2015. Published by Elsevier Inc.

  14. An evaluation of independent component analyses with an application to resting-state fMRI

    PubMed Central

    Matteson, David S.; Ruppert, David; Eloyan, Ani; Caffo, Brian S.

    2013-01-01

    Summary: We examine differences between independent component analyses (ICAs) arising from different assumptions, measures of dependence, and starting points of the algorithms. ICA is a popular method with diverse applications including artifact removal in electrophysiology data, feature extraction in microarray data, and identifying brain networks in functional magnetic resonance imaging (fMRI). ICA can be viewed as a generalization of principal component analysis (PCA) that takes into account higher-order cross-correlations. Whereas the PCA solution is unique, there are many ICA methods, whose solutions may differ. Infomax, FastICA, and JADE are commonly applied to fMRI studies, with FastICA being arguably the most popular. Hastie and Tibshirani (2003) demonstrated that ProDenICA outperformed FastICA in simulations with two components. We introduce the application of ProDenICA to simulations with more components and to fMRI data. ProDenICA was more accurate in simulations, and we identified differences between biologically meaningful ICs from ProDenICA versus other methods in the fMRI analysis. ICA methods require nonconvex optimization, yet current practices do not recognize the importance of, nor adequately address sensitivity to, initial values. We found that local optima led to dramatically different estimates in both simulations and group ICA of fMRI, and we provide evidence that the global optimum from ProDenICA is the best estimate. We applied a modification of the Hungarian (Kuhn-Munkres) algorithm to match ICs from multiple estimates, thereby gaining novel insights into how brain networks vary in their sensitivity to initial values and ICA method. PMID:24350655
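
    Matching independent components between two runs is an assignment problem: given a similarity matrix (for example, absolute spatial correlations), find the one-to-one pairing with the highest total similarity. The sketch below is a brute-force stand-in, not the Hungarian algorithm or the authors' modification of it. It returns the same optimum but is practical only for a handful of components, and the similarity values are hypothetical.

```python
from itertools import permutations

def match_components(similarity):
    """Return the pairing that maximizes total similarity.

    similarity[i][j] is the similarity between component i of run A and
    component j of run B. The result maps each row i to its matched column.
    Exhaustive search over all n! pairings, so keep n small.
    """
    n = len(similarity)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(similarity[i][perm[i]] for i in range(n)),
    )
    return list(best)

# Hypothetical absolute correlations between three components from two runs.
similarity = [
    [0.10, 0.90, 0.20],
    [0.80, 0.30, 0.10],
    [0.20, 0.10, 0.70],
]
matching = match_components(similarity)  # matching[i] = column matched to row i
```

    For realistic numbers of components, a polynomial-time assignment solver would replace the exhaustive search.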

  15. Correction of 3D rigid body motion in fMRI time series by independent estimation of rotational and translational effects in k-space.

    PubMed

    Costagli, Mauro; Waggoner, R Allen; Ueno, Kenichi; Tanaka, Keiji; Cheng, Kang

    2009-04-15

    In functional magnetic resonance imaging (fMRI), even subvoxel motion dramatically corrupts the blood oxygenation level-dependent (BOLD) signal, invalidating the assumption that intensity variation in time is primarily due to neuronal activity. Thus, correction of the subject's head movements is a fundamental step to be performed prior to data analysis. Most motion correction techniques register a series of volumes assuming that rigid body motion, characterized by rotational and translational parameters, occurs. Unlike the most widely used applications for fMRI data processing, which correct motion in the image domain by numerically estimating rotational and translational components simultaneously, the algorithm presented here operates in a three-dimensional k-space, to decouple and correct rotations and translations independently, offering new ways and more flexible procedures to estimate the parameters of interest. We developed an implementation of this method in MATLAB, and tested it on both simulated and experimental data. Its performance was quantified in terms of square differences and center of mass stability across time. Our data show that the algorithm proposed here successfully corrects for rigid-body motion, and its employment in future fMRI studies is feasible and promising.
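
    The decoupling described above rests on the Fourier shift theorem: a translation changes only the phase of the k-space data, leaving the magnitude untouched, so rotations can be estimated from magnitudes independently of translations. A one-dimensional sketch with a naive DFT shows the translation half of that statement. It is an illustration with a made-up signal, not the authors' MATLAB implementation.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

signal = [0, 1, 4, 2, 0, 0, 0, 0]
shift = 2
shifted = signal[-shift:] + signal[:-shift]  # circular translation by 2 samples

spectrum, spectrum_shifted = dft(signal), dft(shifted)
n = len(signal)

# Translation leaves every k-space magnitude unchanged ...
magnitudes_match = all(
    abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(spectrum, spectrum_shifted)
)
# ... and appears only as a linear phase ramp exp(-2*pi*i*k*shift/n).
phase_ramp_holds = all(
    abs(spectrum_shifted[k] - spectrum[k] * cmath.exp(-2j * cmath.pi * k * shift / n)) < 1e-9
    for k in range(n)
)
```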

  16. Decoding of visual activity patterns from fMRI responses using multivariate pattern analyses and convolutional neural network.

    PubMed

    Zafar, Raheel; Kamel, Nidal; Naufal, Mohamad; Malik, Aamir Saeed; Dass, Sarat C; Ahmad, Rana Fayyaz; Abdullah, Jafri M; Reza, Faruque

    2017-01-01

    Decoding of human brain activity has always been a primary goal in neuroscience, especially with functional magnetic resonance imaging (fMRI) data. In recent years, the convolutional neural network (CNN) has become a popular method for feature extraction due to its higher accuracy; however, it needs a lot of computation and training data. In this study, an algorithm is developed using multivariate pattern analysis (MVPA) and a modified CNN to decode the behavior of the brain for different images with a limited data set. Selection of significant features is an important part of fMRI data analysis, since it reduces the computational burden and improves the prediction performance; significant features are selected using a t-test. MVPA uses machine learning algorithms to classify different brain states and helps in prediction during the task. A general linear model (GLM) is used to find the unknown parameters of every individual voxel, and the classification is done using a multi-class support vector machine (SVM). The proposed MVPA-CNN based algorithm is compared with a region of interest (ROI) based method and MVPA based estimated values. The proposed method showed better overall accuracy (68.6%) compared to ROI (61.88%) and estimation values (64.17%).

  17. Temporal interpolation alters motion in fMRI scans: Magnitudes and consequences for artifact detection.

    PubMed

    Power, Jonathan D; Plitt, Mark; Kundu, Prantik; Bandettini, Peter A; Martin, Alex

    2017-01-01

    Head motion can be estimated at any point of fMRI image processing. Processing steps involving temporal interpolation (e.g., slice time correction or outlier replacement) often precede motion estimation in the literature. From first principles it can be anticipated that temporal interpolation will alter head motion in a scan. Here we demonstrate this effect and its consequences in five large fMRI datasets. Estimated head motion was reduced by 10-50% or more following temporal interpolation, and reductions were often visible to the naked eye. Such reductions make the data seem to be of improved quality. Such reductions also degrade the sensitivity of analyses aimed at detecting motion-related artifact and can cause a dataset with artifact to falsely appear artifact-free. These reduced motion estimates will be particularly problematic for studies needing estimates of motion in time, such as studies of dynamics. Based on these findings, it is sensible to obtain motion estimates prior to any image processing (regardless of subsequent processing steps and the actual timing of motion correction procedures, which need not be changed). We also find that outlier replacement procedures change signals almost entirely during times of motion and therefore have notable similarities to motion-targeting censoring strategies (which withhold or replace signals entirely during times of motion).
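
    The first-principles argument can be illustrated with a toy one-dimensional example: linear interpolation in time averages neighbouring frames, which smooths abrupt position changes and lowers frame-to-frame displacement. The sketch applies the interpolation directly to a made-up position trace, whereas in the study the interpolation acts on image data before motion is estimated. The function names and values are hypothetical.

```python
def frame_displacement(positions):
    """Absolute change in position between consecutive frames."""
    return [abs(b - a) for a, b in zip(positions, positions[1:])]

def half_sample_interpolate(positions):
    """Linear interpolation to the midpoint between consecutive frames."""
    return [(a + b) / 2 for a, b in zip(positions, positions[1:])]

# Made-up head position (mm) with a single one-frame excursion.
position_mm = [0.0, 0.0, 1.0, 0.0, 0.0]

before = frame_displacement(position_mm)
after = frame_displacement(half_sample_interpolate(position_mm))
reduction = 1 - max(after) / max(before)  # fractional drop in peak displacement
```

    In this toy trace the peak displacement is halved, which shows the direction of the effect only; the magnitudes reported in the abstract come from real data.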

  18. Replicability of time-varying connectivity patterns in large resting state fMRI samples.

    PubMed

    Abrol, Anees; Damaraju, Eswar; Miller, Robyn L; Stephen, Julia M; Claus, Eric D; Mayer, Andrew R; Calhoun, Vince D

    2017-12-01

    The past few years have seen an emergence of approaches that leverage temporal changes in whole-brain patterns of functional connectivity (the chronnectome). In this chronnectome study, we investigate the replicability of the human brain's inter-regional coupling dynamics during rest by evaluating two different dynamic functional network connectivity (dFNC) analysis frameworks using 7 500 functional magnetic resonance imaging (fMRI) datasets. To quantify the extent to which the emergent functional connectivity (FC) patterns are reproducible, we characterize the temporal dynamics by deriving several summary measures across multiple large, independent age-matched samples. Reproducibility was demonstrated through the existence of basic connectivity patterns (FC states) amidst an ensemble of inter-regional connections. Furthermore, application of the methods to conservatively configured (statistically stationary, linear and Gaussian) surrogate datasets revealed that some of the studied state summary measures were indeed statistically significant and also suggested that this class of null model did not explain the fMRI data fully. This extensive testing of reproducibility of similarity statistics also suggests that the estimated FC states are robust against variation in data quality, analysis, grouping, and decomposition methods. We conclude that future investigations probing the functional and neurophysiological relevance of time-varying connectivity assume critical importance. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Replicability of time-varying connectivity patterns in large resting state fMRI samples

    PubMed Central

    Abrol, Anees; Damaraju, Eswar; Miller, Robyn L.; Stephen, Julia M.; Claus, Eric D.; Mayer, Andrew R.; Calhoun, Vince D.

    2018-01-01

    The past few years have seen an emergence of approaches that leverage temporal changes in whole-brain patterns of functional connectivity (the chronnectome). In this chronnectome study, we investigate the replicability of the human brain’s inter-regional coupling dynamics during rest by evaluating two different dynamic functional network connectivity (dFNC) analysis frameworks using 7 500 functional magnetic resonance imaging (fMRI) datasets. To quantify the extent to which the emergent functional connectivity (FC) patterns are reproducible, we characterize the temporal dynamics by deriving several summary measures across multiple large, independent age-matched samples. Reproducibility was demonstrated through the existence of basic connectivity patterns (FC states) amidst an ensemble of inter-regional connections. Furthermore, application of the methods to conservatively configured (statistically stationary, linear and Gaussian) surrogate datasets revealed that some of the studied state summary measures were indeed statistically significant and also suggested that this class of null model did not explain the fMRI data fully. This extensive testing of reproducibility of similarity statistics also suggests that the estimated FC states are robust against variation in data quality, analysis, grouping, and decomposition methods. We conclude that future investigations probing the functional and neurophysiological relevance of time-varying connectivity assume critical importance. PMID:28916181

  20. Temporal interpolation alters motion in fMRI scans: Magnitudes and consequences for artifact detection

    PubMed Central

    Plitt, Mark; Kundu, Prantik; Bandettini, Peter A.; Martin, Alex

    2017-01-01

    Head motion can be estimated at any point of fMRI image processing. Processing steps involving temporal interpolation (e.g., slice time correction or outlier replacement) often precede motion estimation in the literature. From first principles it can be anticipated that temporal interpolation will alter head motion in a scan. Here we demonstrate this effect and its consequences in five large fMRI datasets. Estimated head motion was reduced by 10–50% or more following temporal interpolation, and reductions were often visible to the naked eye. Such reductions make the data seem to be of improved quality. Such reductions also degrade the sensitivity of analyses aimed at detecting motion-related artifact and can cause a dataset with artifact to falsely appear artifact-free. These reduced motion estimates will be particularly problematic for studies needing estimates of motion in time, such as studies of dynamics. Based on these findings, it is sensible to obtain motion estimates prior to any image processing (regardless of subsequent processing steps and the actual timing of motion correction procedures, which need not be changed). We also find that outlier replacement procedures change signals almost entirely during times of motion and therefore have notable similarities to motion-targeting censoring strategies (which withhold or replace signals entirely during times of motion). PMID:28880888

  1. Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset

    PubMed

    Jeyasingh, Suganthi; Veluchamy, Malathi

    2017-05-01

    Early diagnosis of breast cancer is essential to save lives of patients. Usually, medical datasets include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery on Database (KDD) process helps to improve efficiency. It requires elimination of inappropriate and repeated data from the dataset before final diagnosis. This can be done using any of the feature selection algorithms available in data mining. Feature selection is considered a vital step to increase the classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection to eliminate irrelevant features from an original dataset. The Bat algorithm was modified using simple random sampling to select random instances from the dataset. Ranking was performed with the global best features to recognize the predominant features available in the dataset. The selected features are used to train a Random Forest (RF) classification algorithm. The MBA feature selection algorithm enhanced the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnosis Breast Cancer Dataset (WDBC) was used to evaluate the performance of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of Kappa statistic, Matthews Correlation Coefficient, Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
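
    Two of the listed measures, the Kappa statistic and the Matthews Correlation Coefficient, can be computed directly from a binary confusion matrix. The sketch below uses the standard definitions; the counts are made up and are not results from the paper.

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0

# Made-up counts: 40 true positives, 10 false positives,
# 5 false negatives, 45 true negatives.
mcc = matthews_corrcoef(40, 10, 5, 45)
kappa = cohen_kappa(40, 10, 5, 45)
```

    Both measures equal 1 for perfect agreement and are near 0 when predictions are no better than chance, which makes them more informative than raw accuracy on imbalanced data.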

  2. Missing value imputation for microarray data: a comprehensive comparison study and a web tool

    PubMed Central

    2013-01-01

    Background: Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there is much debate about the selection of the optimal algorithm. Studies comparing the performance of different algorithms are not yet comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. Results: In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether datasets from different species have a different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of dataset but not on the species the samples come from. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices to handle missing values for most of the microarray datasets. Conclusions: In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation.
Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses. PMID:24565220
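
    Simulation-based comparison of imputation algorithms generally works by hiding entries whose true values are known, imputing them, and scoring the error. The sketch below assumes a normalized RMSE as the statistical measure and uses row-mean imputation as a deliberately simple stand-in for the nine algorithms compared. It is not the MissVIA implementation, and the matrix is made up.

```python
import math

def row_mean_impute(matrix):
    """Replace each missing entry (None) with the mean of its row's known values."""
    imputed = []
    for row in matrix:
        known = [v for v in row if v is not None]
        mean = sum(known) / len(known)
        imputed.append([mean if v is None else v for v in row])
    return imputed

def nrmse(imputed, truth, masked):
    """RMSE over the masked cells, divided by the standard deviation of their true values."""
    errors = [(imputed[i][j] - truth[i][j]) ** 2 for i, j in masked]
    values = [truth[i][j] for i, j in masked]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(sum(errors) / len(errors)) / math.sqrt(variance)

# Made-up complete expression matrix and the cells hidden for the simulation.
truth = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
masked = [(0, 2), (1, 0)]
observed = [
    [None if (i, j) in masked else v for j, v in enumerate(row)]
    for i, row in enumerate(truth)
]

score = nrmse(row_mean_impute(observed), truth, masked)
```

    Repeating the masking with different random cells and averaging the score gives one simulated comparison round per algorithm.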

  3. An algorithm for direct causal learning of influences on patient outcomes.

    PubMed

    Rathnam, Chandramouli; Lee, Sanghoon; Jiang, Xia

    2017-01-01

    This study aims at developing and introducing a new algorithm, called direct causal learner (DCL), for learning the direct causal influences of a single target. We applied it to both simulated and real clinical and genome wide association study (GWAS) datasets and compared its performance to classic causal learning algorithms. The DCL algorithm learns the causes of a single target from passive data using Bayesian-scoring, instead of using independence checks, and a novel deletion algorithm. We generate 14,400 simulated datasets and measure the number of datasets for which DCL correctly and partially predicts the direct causes. We then compare its performance with the constraint-based path consistency (PC) and conservative PC (CPC) algorithms, the Bayesian-score based fast greedy search (FGS) algorithm, and the partial ancestral graphs algorithm fast causal inference (FCI). In addition, we extend our comparison of all five algorithms to both a real GWAS dataset and real breast cancer datasets over various time-points in order to observe how effective they are at predicting the causal influences of Alzheimer's disease and breast cancer survival. DCL consistently outperforms FGS, PC, CPC, and FCI in discovering the parents of the target for the datasets simulated using a simple network. Overall, DCL predicts significantly more datasets correctly (McNemar's test significance: p<0.0001) than any of the other algorithms for these network types. For example, when assessing overall performance (simple and complex network results combined), DCL correctly predicts approximately 1400 more datasets than the top FGS method, 1600 more datasets than the top CPC method, 4500 more datasets than the top PC method, and 5600 more datasets than the top FCI method. 
Although FGS did correctly predict more datasets than DCL for the complex networks, and DCL correctly predicted only a few more datasets than CPC for these networks, there is no significant difference in performance between these three algorithms for this network type. However, when we use a more continuous measure of accuracy, we find that all the DCL methods are able to better partially predict more direct causes than FGS and CPC for the complex networks. In addition, DCL consistently had faster runtimes than the other algorithms. In the application to the real datasets, DCL identified rs6784615, located on the NISCH gene, and rs10824310, located on the PRKG1 gene, as direct causes of late onset Alzheimer's disease (LOAD) development. In addition, DCL identified ER category as a direct predictor of breast cancer mortality within 5 years, and HER2 status as a direct predictor of 10-year breast cancer mortality. These predictors have been identified in previous studies to have a direct causal relationship with their respective phenotypes, supporting the predictive power of DCL. When the other algorithms discovered predictors from the real datasets, these predictors were either also found by DCL or could not be supported by previous studies. Our results show that DCL outperforms FGS, PC, CPC, and FCI in almost every case, demonstrating its potential to advance causal learning. Furthermore, our DCL algorithm effectively identifies direct causes in the LOAD and Metabric GWAS datasets, which indicates its potential for clinical applications. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Enabling Real-Time Volume Rendering of Functional Magnetic Resonance Imaging on an iOS Device.

    PubMed

    Holub, Joseph; Winer, Eliot

    2017-12-01

    Powerful non-invasive imaging technologies like computed tomography (CT), ultrasound, and magnetic resonance imaging (MRI) are used daily by medical professionals to diagnose and treat patients. While 2D slice viewers have long been the standard, many tools allowing 3D representations of digital medical data are now available. The newest imaging advancement, functional MRI (fMRI) technology, has changed medical imaging from viewing static to dynamic physiology (4D) over time, particularly to study brain activity. Add this to the rapid adoption of mobile devices for everyday work and the need to visualize fMRI data on tablets or smartphones arises. However, there are few mobile tools available to visualize 3D MRI data, let alone 4D fMRI data. Building volume rendering tools on mobile devices to visualize 3D and 4D medical data is challenging given the limited computational power of the devices. This paper describes research that explored the feasibility of performing real-time 3D and 4D volume raycasting on a tablet device. The prototype application was tested on a 9.7" iPad Pro using two different fMRI datasets of brain activity. The results show that mobile raycasting is able to achieve between 20 and 40 frames per second for traditional 3D datasets, depending on the sampling interval, and up to 9 frames per second for 4D data. While the prototype application did not always achieve true real-time interaction, these results clearly demonstrated that visualizing 3D and 4D digital medical data is feasible with a properly constructed software framework.
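
    Volume raycasting accumulates colour and opacity from samples taken at a fixed interval along each viewing ray. A smaller sampling interval means more samples per ray and a lower frame rate. The sketch below shows standard front-to-back alpha compositing with early ray termination for a single ray. The abstract does not say which compositing scheme the prototype uses, so this is an assumption, and the sample values are made up.

```python
def composite_ray(samples, opacity_cutoff=0.99):
    """Front-to-back alpha compositing of (intensity, opacity) samples along one ray."""
    color, alpha = 0.0, 0.0
    for intensity, opacity in samples:
        # Each sample contributes in proportion to the light not yet absorbed.
        color += (1.0 - alpha) * opacity * intensity
        alpha += (1.0 - alpha) * opacity
        if alpha >= opacity_cutoff:  # early ray termination: the rest is hidden
            break
    return color, alpha

# Made-up samples along one ray, nearest to the viewer first.
pixel_color, pixel_alpha = composite_ray([(1.0, 0.5), (0.5, 0.5)])
```

    Early termination is one reason frame rates depend on the data as well as on the sampling interval: rays that hit opaque material stop sooner.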

  5. Learning Computational Models of Video Memorability from fMRI Brain Imaging.

    PubMed

    Han, Junwei; Chen, Changyuan; Shao, Ling; Hu, Xintao; Han, Jungong; Liu, Tianming

    2015-08-01

    Generally, various visual media are not equally memorable to the human brain. This paper looks into a new direction of modeling the memorability of video clips and automatically predicting how memorable they are by learning from brain functional magnetic resonance imaging (fMRI). We propose a novel computational framework by integrating the power of low-level audiovisual features and brain activity decoding via fMRI. Initially, a user study experiment is performed to create a ground truth database for measuring video memorability and a set of effective low-level audiovisual features is examined in this database. Then, human subjects' brain fMRI data are obtained while they are watching the video clips. The fMRI-derived features that convey the brain activity of memorizing videos are extracted using a universal brain reference system. Finally, because fMRI scanning is expensive and time-consuming, a computational model is learned on our benchmark dataset with the objective of maximizing the correlation between the low-level audiovisual features and the fMRI-derived features using joint subspace learning. The learned model can then automatically predict the memorability of videos without fMRI scans. Evaluations on publicly available image and video databases demonstrate the effectiveness of the proposed framework.

  6. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    DTIC Science & Technology

    2014-12-23

    publications for benchmarking prognostics algorithms. The turbofan degradation datasets have received over seven thousand unique downloads in the last five...approaches that researchers have taken to implement prognostics using these turbofan datasets. Some unique characteristics of these datasets are also...Description of the five turbofan degradation datasets available from NASA repository. Datasets #Fault Modes #Conditions #Train Units #Test Units

  7. Automatic and Robust Delineation of the Fiducial Points of the Seismocardiogram Signal for Non-invasive Estimation of Cardiac Time Intervals.

    PubMed

    Khosrow-Khavar, Farzad; Tavakolian, Kouhyar; Blaber, Andrew; Menon, Carlo

    2016-10-12

    The purpose of this research was to design a delineation algorithm that could detect specific fiducial points of the seismocardiogram (SCG) signal with or without using the electrocardiogram (ECG) R-wave as the reference point. The detected fiducial points were used to estimate cardiac time intervals. Due to the complexity and sensitivity of the SCG signal, the algorithm was designed to robustly discard the low-quality cardiac cycles, which are the ones that contain unrecognizable fiducial points. The algorithm was trained on a dataset containing 48,318 manually annotated cardiac cycles. It was then applied to three test datasets: 65 young healthy individuals (dataset 1), 15 individuals above 44 years old (dataset 2), and 25 patients with previous heart conditions (dataset 3). The algorithm achieved high prediction accuracy, with a root-mean-square error of less than 5 ms for all the test datasets. The algorithm's overall mean detection rates per individual recording (DRI) were 74, 68, and 42 percent for the three test datasets when concurrent ECG and SCG were used. For the standalone SCG case, the mean DRI was 32, 14, and 21 percent. When the proposed algorithm was applied to concurrent ECG and SCG signals, the desired fiducial points of the SCG signal were successfully estimated with a high detection rate. For the standalone case, however, the algorithm achieved high prediction accuracy and detection rate for only the young individual dataset. The presented algorithm could be used for accurate and non-invasive estimation of cardiac time intervals.

  8. Decoding the encoding of functional brain networks: An fMRI classification comparison of non-negative matrix factorization (NMF), independent component analysis (ICA), and sparse coding algorithms.

    PubMed

    Xie, Jianwen; Douglas, Pamela K; Wu, Ying Nian; Brody, Arthur L; Anderson, Ariana E

    2017-04-15

    Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet other mathematical constraints provide alternate biologically-plausible frameworks for generating brain networks. Non-negative matrix factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1 Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks within scan for different constraints are used as basis functions to encode observed functional activity. These encodings are then decoded using machine learning, by using the time series weights to predict within scan whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. The sparse coding algorithm of L1 Regularized Learning outperformed 4 variations of ICA (p<0.001) for predicting the task being performed within each scan using artifact-cleaned components. The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy compared to the ICA and sparse coding algorithms. Holding constant the effect of the extraction algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p<0.001). Lower classification accuracy occurred when the extracted spatial maps contained more CSF regions (p<0.001). The success of sparse coding algorithms suggests that algorithms which enforce sparsity, discourage multitasking, and promote local specialization may capture better the underlying source processes than those which allow inexhaustible local processes such as ICA. Negative BOLD signal may capture task-related activations. 

  9. Signal Sampling for Efficient Sparse Representation of Resting State FMRI Data

    PubMed Central

    Ge, Bao; Makkie, Milad; Wang, Jin; Zhao, Shijie; Jiang, Xi; Li, Xiang; Lv, Jinglei; Zhang, Shu; Zhang, Wei; Han, Junwei; Guo, Lei; Liu, Tianming

    2015-01-01

    As the size of brain imaging data such as fMRI grows explosively, it provides us with unprecedented and abundant information about the brain. How to reduce the size of fMRI data without losing much information is becoming an increasingly pressing issue. Recent studies have tried to address it with dictionary learning and sparse representation methods; however, their computational complexity is still high, which hampers the wider application of sparse representation methods to large-scale fMRI datasets. To effectively address this problem, this work proposes to represent resting state fMRI (rs-fMRI) signals of a whole brain via a statistical sampling based sparse representation. First, we sampled the whole brain’s signals via different sampling methods; then the sampled signals were aggregated into an input data matrix to learn a dictionary; finally, this dictionary was used to sparsely represent the whole brain’s signals and identify the resting state networks. Comparative experiments demonstrate that the proposed signal sampling framework can speed up the reconstruction of concurrent brain networks by ten times without losing much information. The experiments on the 1000 Functional Connectomes Project further demonstrate its effectiveness and superiority. PMID:26646924

  10. Causal mapping of emotion networks in the human brain: Framework and initial findings.

    PubMed

    Dubois, Julien; Oya, Hiroyuki; Tyszka, J Michael; Howard, Matthew; Eberhardt, Frederick; Adolphs, Ralph

    2017-11-13

    Emotions involve many cortical and subcortical regions, prominently including the amygdala. It remains unknown how these multiple network components interact, and it remains unknown how they cause the behavioral, autonomic, and experiential effects of emotions. Here we describe a framework for combining a novel technique, concurrent electrical stimulation with fMRI (es-fMRI), together with a novel analysis, inferring causal structure from fMRI data (causal discovery). We outline a research program for investigating human emotion with these new tools, and provide initial findings from two large resting-state datasets as well as case studies in neurosurgical patients with electrical stimulation of the amygdala. The overarching goal is to use causal discovery methods on fMRI data to infer causal graphical models of how brain regions interact, and then to further constrain these models with direct stimulation of specific brain regions and concurrent fMRI. We conclude by discussing limitations and future extensions. The approach could yield anatomical hypotheses about brain connectivity, motivate rational strategies for treating mood disorders with deep brain stimulation, and could be extended to animal studies that use combined optogenetic fMRI. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Removal of BCG artefact from concurrent fMRI-EEG recordings based on EMD and PCA.

    PubMed

    Javed, Ehtasham; Faye, Ibrahima; Malik, Aamir Saeed; Abdullah, Jafri Malin

    2017-11-01

    Simultaneous electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) acquisitions provide better insight into brain dynamics. Some artefacts due to simultaneous acquisition pose a threat to the quality of the data. One such problematic artefact is the ballistocardiogram (BCG) artefact. We developed a hybrid algorithm that combines features of empirical mode decomposition (EMD) with principal component analysis (PCA) to reduce the BCG artefact. The algorithm does not require extra electrocardiogram (ECG) or electrooculogram (EOG) recordings to extract the BCG artefact. The method was tested with both simulated and real EEG data of 11 participants. From the simulated data, the similarity index between the extracted BCG and the simulated BCG showed the effectiveness of the proposed method in BCG removal. On the other hand, real data were recorded with two conditions, i.e. resting state (eyes closed dataset) and task influenced (event-related potentials (ERPs) dataset). Using qualitative (visual inspection) and quantitative (similarity index, improved normalized power spectrum (INPS) ratio, power spectrum, sample entropy (SE)) evaluation parameters, the assessment results showed that the proposed method can efficiently reduce the BCG artefact while preserving the neuronal signals. Compared with conventional methods, namely, average artefact subtraction (AAS), optimal basis set (OBS) and combined independent component analysis and principal component analysis (ICA-PCA), the statistical analyses of the results showed that the proposed method has better performance, and the differences were significant for all quantitative parameters except for the power and sample entropy. The proposed method does not require any reference signal, prior information or assumption to extract the BCG artefact. It will be very useful in circumstances where the reference signal is not available. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    NASA Technical Reports Server (NTRS)

    Ramasso, Emannuel; Saxena, Abhinav

    2014-01-01

    Benchmarking of prognostic algorithms has been challenging due to limited availability of common datasets suitable for prognostics. In an attempt to alleviate this problem several benchmarking datasets have been collected by NASA's prognostic center of excellence and made available to the Prognostics and Health Management (PHM) community to allow evaluation and comparison of prognostics algorithms. Among those datasets are five C-MAPSS datasets that have been extremely popular due to their unique characteristics making them suitable for prognostics. The C-MAPSS datasets pose several challenges that have been tackled by different methods in the PHM literature. In particular, management of high variability due to sensor noise, effects of operating conditions, and presence of multiple simultaneous fault modes are some factors that have great impact on the generalization capabilities of prognostics algorithms. More than 70 publications have used the C-MAPSS datasets for developing data-driven prognostic algorithms. The C-MAPSS datasets are also shown to be well-suited for development of new machine learning and pattern recognition tools for several key preprocessing steps such as feature extraction and selection, failure mode assessment, operating conditions assessment, health status estimation, uncertainty management, and prognostics performance evaluation. This paper summarizes a comprehensive literature review of publications using C-MAPSS datasets and provides guidelines and references to further usage of these datasets in a manner that allows clear and consistent comparison between different approaches.

  13. Efficient sequential and parallel algorithms for record linkage.

    PubMed

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
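
    The two ideas named in the abstract above (removing identical records before further processing, then linking similar records in a graph and taking its connected components) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: records are plain strings, Python's built-in sort and `set` stand in for radix sorting, similarity is a Levenshtein distance threshold (`max_distance` is a hypothetical parameter), and the all-pairs comparison is quadratic, which the published algorithms presumably avoid.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def link_records(records, max_distance=1):
    """Group records into clusters of likely duplicates.

    max_distance is a hypothetical similarity threshold, not a value
    taken from the paper.
    """
    # Step 1: drop exact duplicates up front (stand-in for radix sorting).
    unique = sorted(set(records))

    # Step 2: treat records as graph nodes and join similar ones
    # with a union-find structure.
    parent = list(range(len(unique)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if edit_distance(unique[i], unique[j]) <= max_distance:
                parent[find(i)] = find(j)

    # Step 3: each connected component is one linked cluster.
    clusters = defaultdict(list)
    for i, rec in enumerate(unique):
        clusters[find(i)].append(rec)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    sample = ["john smith", "jon smith", "john smith",
              "mary jones", "mary jone", "alice wu"]
    for cluster in link_records(sample):
        print(cluster)
```

    Cutting the similarity graph at a fixed threshold behaves like single-linkage clustering; the abstract only says that hierarchical clustering is used as the basis, so the linkage rule here is an assumption.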

  14. Disk storage management for LHCb based on Data Popularity estimator

    NASA Astrophysics Data System (ADS)

    Hushchyn, Mikhail; Charpentier, Philippe; Ustyuzhanin, Andrey

    2015-12-01

    This paper presents an algorithm providing recommendations for optimizing the LHCb data storage. The LHCb data storage system is a hybrid system. All datasets are kept as archives on magnetic tapes. The most popular datasets are kept on disks. The algorithm takes the dataset usage history and metadata (size, type, configuration etc.) to generate a recommendation report. This article presents how we use machine learning algorithms to predict future data popularity. Using these predictions it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the data popularity and the number of replicas optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function represents all requirements for data distribution in the data storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data.

  15. A Hierarchical Model for Simultaneous Detection and Estimation in Multi-subject fMRI Studies

    PubMed Central

    Degras, David; Lindquist, Martin A.

    2014-01-01

    In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects, while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that not only allows for tests of activation, but also for tests for deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain. PMID:24793829

  16. Resting State fMRI Functional Connectivity-Based Classification Using a Convolutional Neural Network Architecture

    PubMed Central

    Meszlényi, Regina J.; Buza, Krisztian; Vidnyánszky, Zoltán

    2017-01-01

    Machine learning techniques have become increasingly popular in the field of resting state fMRI (functional magnetic resonance imaging) network based classification. However, the application of convolutional networks has been proposed only very recently and has remained largely unexplored. In this paper we describe a convolutional neural network architecture for functional connectome classification called connectome-convolutional neural network (CCNN). Our results on simulated datasets and a publicly available dataset for amnestic mild cognitive impairment classification demonstrate that our CCNN model can efficiently distinguish between subject groups. We also show that the connectome-convolutional network is capable of combining information from diverse functional connectivity metrics and that models using a combination of different connectivity descriptors are able to outperform classifiers using only one metric. It follows from this flexibility that our proposed CCNN model can be easily adapted to a wide range of connectome based classification or regression tasks, by varying which connectivity descriptor combinations are used to train the network. PMID:29089883

  17. Resting State fMRI Functional Connectivity-Based Classification Using a Convolutional Neural Network Architecture.

    PubMed

    Meszlényi, Regina J; Buza, Krisztian; Vidnyánszky, Zoltán

    2017-01-01

    Machine learning techniques have become increasingly popular in the field of resting state fMRI (functional magnetic resonance imaging) network based classification. However, the application of convolutional networks has been proposed only very recently and has remained largely unexplored. In this paper we describe a convolutional neural network architecture for functional connectome classification called connectome-convolutional neural network (CCNN). Our results on simulated datasets and a publicly available dataset for amnestic mild cognitive impairment classification demonstrate that our CCNN model can efficiently distinguish between subject groups. We also show that the connectome-convolutional network is capable of combining information from diverse functional connectivity metrics and that models using a combination of different connectivity descriptors are able to outperform classifiers using only one metric. It follows from this flexibility that our proposed CCNN model can be easily adapted to a wide range of connectome based classification or regression tasks, by varying which connectivity descriptor combinations are used to train the network.

  18. A Java-based fMRI processing pipeline evaluation system for assessment of univariate general linear model and multivariate canonical variate analysis-based pipelines.

    PubMed

    Zhang, Jing; Liang, Lichen; Anderson, Jon R; Gatewood, Lael; Rottenberg, David A; Strother, Stephen C

    2008-01-01

    As functional magnetic resonance imaging (fMRI) becomes widely used, the demand for evaluation of fMRI processing pipelines and validation of fMRI analysis results is increasing rapidly. The current NPAIRS package, an IDL-based fMRI processing pipeline evaluation framework, lacks system interoperability and the ability to evaluate general linear model (GLM)-based pipelines using prediction metrics. Thus, it cannot fully evaluate fMRI analytical software modules such as FSL.FEAT and NPAIRS.GLM. In order to overcome these limitations, a Java-based fMRI processing pipeline evaluation system was developed. It integrated YALE (a machine learning environment) into Fiswidgets (an fMRI software environment) to obtain system interoperability and applied an algorithm to measure GLM prediction accuracy. The results demonstrated that the system can evaluate fMRI processing pipelines with univariate GLM and multivariate canonical variates analysis (CVA)-based models on real fMRI data based on prediction accuracy (classification accuracy) and statistical parametric image (SPI) reproducibility. In addition, a preliminary study was performed where four fMRI processing pipelines with GLM and CVA modules such as FSL.FEAT and NPAIRS.CVA were evaluated with the system. The results indicated that (1) the system can compare different fMRI processing pipelines with heterogeneous models (NPAIRS.GLM, NPAIRS.CVA and FSL.FEAT) and rank their performance by automatic performance scoring, and (2) the rank of pipeline performance is highly dependent on the preprocessing operations. These results suggest that the system will be of value for the comparison, validation, standardization and optimization of functional neuroimaging software packages and fMRI processing pipelines.

  19. Dataset-Driven Research to Support Learning and Knowledge Analytics

    ERIC Educational Resources Information Center

    Verbert, Katrien; Manouselis, Nikos; Drachsler, Hendrik; Duval, Erik

    2012-01-01

    In various research areas, the availability of open datasets is considered as key for research and application purposes. These datasets are used as benchmarks to develop new algorithms and to compare them to other algorithms in given settings. Finding such available datasets for experimentation can be a challenging task in technology enhanced…

  20. Categorical speech processing in Broca's area: an fMRI study using multivariate pattern-based analysis.

    PubMed

    Lee, Yune-Sang; Turkeltaub, Peter; Granger, Richard; Raizada, Rajeev D S

    2012-03-14

    Although much effort has been directed toward understanding the neural basis of speech processing, the neural processes involved in the categorical perception of speech have been relatively less studied, and many questions remain open. In this functional magnetic resonance imaging (fMRI) study, we probed the cortical regions mediating categorical speech perception using an advanced brain-mapping technique, whole-brain multivariate pattern-based analysis (MVPA). Normal healthy human subjects (native English speakers) were scanned while they listened to 10 consonant-vowel syllables along the /ba/-/da/ continuum. Outside of the scanner, individuals' own category boundaries were measured to divide the fMRI data into /ba/ and /da/ conditions per subject. The whole-brain MVPA revealed that Broca's area and the left pre-supplementary motor area evoked distinct neural activity patterns between the two perceptual categories (/ba/ vs /da/). Broca's area was also found when the same analysis was applied to another dataset (Raizada and Poldrack, 2007), which previously yielded the supramarginal gyrus using a univariate adaptation-fMRI paradigm. The consistent MVPA findings from two independent datasets strongly indicate that Broca's area participates in categorical speech perception, with a possible role of translating speech signals into articulatory codes. The difference in results between univariate and multivariate pattern-based analyses of the same data suggests that processes in different cortical areas along the dorsal speech perception stream are distributed on different spatial scales.

  1. Level 2 Ancillary Products and Datasets Algorithm Theoretical Basis

    NASA Technical Reports Server (NTRS)

    Diner, D.; Abdou, W.; Gordon, H.; Kahn, R.; Knyazikhin, Y.; Martonchik, J.; McDonald, D.; McMuldroch, S.; Myneni, R.; West, R.

    1999-01-01

    This Algorithm Theoretical Basis (ATB) document describes the algorithms used to generate the parameters of certain ancillary products and datasets used during Level 2 processing of Multi-angle Imaging SpectroRadiometer (MISR) data.

  2. Efficient sequential and parallel algorithms for record linkage

    PubMed Central

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Background and objective: Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Methods: Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Results: Our sequential and parallel algorithms have been tested on a real dataset of 1 083 878 records and synthetic datasets ranging in size from 50 000 to 9 000 000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). Conclusions: We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm. PMID:24154837
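
    As a quick check on the "almost linear" claim, the reported speedups can be converted to parallel efficiency (speedup divided by core count). The snippet below uses only the three figures quoted in the abstract; the function and variable names are illustrative.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


# Speedups as reported in the abstract: cores -> speedup.
reported = {8: 7.5, 16: 14.1, 32: 26.4}

efficiencies = {cores: parallel_efficiency(s, cores)
                for cores, s in reported.items()}

if __name__ == "__main__":
    for cores, eff in efficiencies.items():
        print(f"{cores:>2} cores: efficiency {eff:.3f}")
```

    Efficiency falls from about 94% on 8 cores (one node) to about 82.5% on 32 cores (four nodes), so scaling stays close to linear but degrades somewhat as nodes are added.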

  3. Large-scale image region documentation for fully automated image biomarker algorithm development and evaluation.

    PubMed

    Reeves, Anthony P; Xie, Yiting; Liu, Shuang

    2017-04-01

    With the advent of fully automated image analysis and modern machine learning methods, there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. This paper presents a method and implementation for facilitating such datasets that addresses the critical issue of size scaling for algorithm validation and evaluation; current evaluation methods that are usually used in academic studies do not scale to large datasets. This method includes protocols for the documentation of many regions in very large image datasets; the documentation may be incrementally updated by new image data and by improved algorithm outcomes. This method has been used for 5 years in the context of chest health biomarkers from low-dose chest CT images that are now being used with increasing frequency in lung cancer screening practice. The lung scans are segmented into over 100 different anatomical regions, and the method has been applied to a dataset of over 20,000 chest CT images. Using this framework, the computer algorithms have been developed to achieve over 90% acceptable image segmentation on the complete dataset.

  4. Imbalanced class learning in epigenetics.

    PubMed

    Haque, M Muksitul; Skinner, Michael K; Holder, Lawrence B

    2014-07-01

    In machine learning, one of the important criteria for higher classification accuracy is a balanced dataset. Datasets with a large ratio between minority and majority classes face hindrance in learning using any classifier. Datasets having a magnitude difference in number of instances between the target concept result in an imbalanced class distribution. Such datasets can range from biological data, sensor data, medical diagnostics, or any other domain where labeling any instances of the minority class can be time-consuming or costly or the data may not be easily available. The current study investigates a number of imbalanced class algorithms for solving the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently come with few differentially DNA methylated regions (DMR) and with a higher number of non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that an imbalanced dataset can have similar accuracy as a regular learner on a balanced dataset.

  5. Magnetic field shift due to mechanical vibration in functional magnetic resonance imaging.

    PubMed

    Foerster, Bernd U; Tomasi, Dardo; Caparelli, Elisabeth C

    2005-11-01

    Mechanical vibrations of the gradient coil system during readout in echo-planar imaging (EPI) can increase the temperature of the gradient system and alter the magnetic field distribution during functional magnetic resonance imaging (fMRI). This effect is enhanced by resonant modes of vibrations and results in apparent motion along the phase encoding direction in fMRI studies. The magnetic field drift was quantified during EPI by monitoring the resonance frequency interleaved with the EPI acquisition, and a novel method is proposed to correct the apparent motion. The knowledge on the frequency drift over time was used to correct the phase of the k-space EPI dataset. Since the resonance frequency changes very slowly over time, two measurements of the resonance frequency, immediately before and after the EPI acquisition, are sufficient to remove the field drift effects from fMRI time series. The frequency drift correction method was tested "in vivo" and compared to the standard image realignment method. The proposed method efficiently corrects spurious motion due to magnetic field drifts during fMRI. (c) 2005 Wiley-Liss, Inc.
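
    The correction described above (two resonance-frequency measurements, one before and one after the EPI run, used to correct the phase of the k-space data) might be sketched as follows. This is a simplified illustration, not the authors' method as published: it assumes the drift is linear in time across volumes, that each k-space line is acquired at a known time after excitation, and that an off-resonance of Δf adds a phase of 2πΔf·t to a sample acquired at time t. The sign convention and all function and parameter names are assumptions.

```python
import cmath
import math


def interpolate_drift(f_before_hz, f_after_hz, n_volumes):
    """Per-volume frequency drift (Hz) relative to the first measurement,
    assuming the drift is linear over the run."""
    if n_volumes == 1:
        return [0.0]
    step = (f_after_hz - f_before_hz) / (n_volumes - 1)
    return [step * v for v in range(n_volumes)]


def correct_volume(kspace_lines, drift_hz, line_times_s):
    """Remove the drift-induced phase from one volume's k-space data.

    kspace_lines: list of lines, each a list of complex samples.
    line_times_s: acquisition time of each line after excitation (seconds).
    """
    corrected = []
    for line, t in zip(kspace_lines, line_times_s):
        # Undo the accumulated phase 2*pi*drift*t for this line.
        factor = cmath.exp(-2j * math.pi * drift_hz * t)
        corrected.append([sample * factor for sample in line])
    return corrected


if __name__ == "__main__":
    drifts = interpolate_drift(0.0, 4.0, 5)   # 4 Hz total drift over 5 volumes
    times = [0.010, 0.011, 0.012]             # hypothetical line timings
    clean = [[1 + 0j, 0.5j], [2 + 0j, -1 + 0j], [0.25 + 0j, 1j]]
    # Simulate the phase error for the last volume, then correct it.
    distorted = [[s * cmath.exp(2j * math.pi * drifts[-1] * t) for s in line]
                 for line, t in zip(clean, times)]
    restored = correct_volume(distorted, drifts[-1], times)
    print(restored[0][0])
```

    Because the phase error grows with acquisition time along the phase-encoding direction, leaving it uncorrected appears as a shift of the image along that direction, which is the apparent motion the abstract refers to.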

  6. Dual-TRACER: High resolution fMRI with constrained evolution reconstruction.

    PubMed

    Li, Xuesong; Ma, Xiaodong; Li, Lyu; Zhang, Zhe; Zhang, Xue; Tong, Yan; Wang, Lihong; Song, Sen; Guo, Hua

    2018-01-01

    fMRI with high spatial resolution is beneficial for studies in psychology and neuroscience, but is limited by various factors such as prolonged imaging time, low signal to noise ratio and scarcity of advanced facilities. Compressed Sensing (CS) based methods for accelerating fMRI data acquisition are promising. Other advanced algorithms like k-t FOCUSS or PICCS have been developed to improve performance. This study aims to investigate a new method, Dual-TRACER, based on Temporal Resolution Acceleration with Constrained Evolution Reconstruction (TRACER), for accelerating fMRI acquisitions using golden angle variable density spiral. Both numerical simulations and in vivo experiments at 3T were conducted to evaluate and characterize this method. Results show that Dual-TRACER can provide functional images with a high spatial resolution (1×1 mm²) under an acceleration factor of 20 while maintaining hemodynamic signals well. Compared with other investigated methods, Dual-TRACER provides a better signal recovery, higher fMRI sensitivity and more reliable activation detection. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. A hybrid method for classifying cognitive states from fMRI data.

    PubMed

    Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R

    2015-09-01

    Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activities in order to elucidate cognitive states. The complex nature of fMRI data requires understanding of the analyses applied to produce possible avenues for developing models of cognitive state classification and improving brain activity prediction. While many classification models for fMRI data analysis have been developed, in this paper we present a novel hybrid technique that combines the best attributes of genetic algorithms (GAs) and an ensemble decision tree technique and that consistently outperforms all other methods which are being used for cognitive-state classification. Specifically, this paper illustrates the combined effort of a decision-tree ensemble and GAs for feature selection through an extensive simulation study and discusses the classification performance with respect to fMRI data. We have shown that our proposed method exhibits a significant reduction in the number of features with a clear edge in classification accuracy over an ensemble of decision trees.

  8. Resting-state functional magnetic resonance imaging in hepatic encephalopathy: current status and perspectives.

    PubMed

    Zhang, Long Jiang; Wu, Shengyong; Ren, Jiaqian; Lu, Guang Ming

    2014-09-01

    Hepatic encephalopathy (HE) is a neuropsychiatric syndrome which develops in patients with severe liver diseases and/or portal-systemic shunting. Minimal HE, the earliest manifestation of HE, has drawn increasing attention in the last decade. Minimal HE is associated with a series of brain functional changes, such as attention, working memory, and so on. Blood oxygen level dependent (BOLD) functional MRI (fMRI), especially resting-state fMRI has been used to explore the brain functional changes of HE, yielding important insights for understanding pathophysiological mechanisms and functional reorganization of HE. This paper briefly reviews the principles of BOLD fMRI, potential applications of resting-state fMRI with advanced post-processing algorithms such as regional homogeneity, amplitude of low frequency fluctuation, functional connectivity and future research perspective in this field.

  9. Distortion analysis of subband adaptive filtering methods for FMRI active noise control systems.

    PubMed

    Milani, Ali A; Panahi, Issa M; Briggs, Richard

    2007-01-01

    Delayless subband filtering structure, as a high performance frequency domain filtering technique, is used for canceling broadband fMRI noise (8 kHz bandwidth). In this method, adaptive filtering is done in subbands and the coefficients of the main canceling filter are computed by stacking the subband weights together. There are two types of stacking methods called FFT and FFT-2. In this paper, we analyze the distortion introduced by these two stacking methods. The effect of the stacking distortion on the performance of different adaptive filters in FXLMS algorithm with non-minimum phase secondary path is explored. The investigation is done for different adaptive algorithms (nLMS, APA and RLS), different weight stacking methods, and different number of subbands.

  10. Bridging the gap between real-life data and simulated data by providing a highly realistic fall dataset for evaluating camera-based fall detection algorithms.

    PubMed

    Baldewijns, Greet; Debard, Glen; Mertes, Gert; Vanrumste, Bart; Croonenborghs, Tom

    2016-03-01

    Fall incidents are an important health hazard for older adults. Automatic fall detection systems can reduce the consequences of a fall incident by assuring that timely aid is given. The development of these systems is therefore getting a lot of research attention. Real-life data which can help evaluate the results of this research is however sparse. Moreover, research groups that have this type of data are not at liberty to share it. Most research groups thus use simulated datasets. These simulation datasets, however, often do not incorporate the challenges the fall detection system will face when implemented in real-life. In this Letter, a more realistic simulation dataset is presented to fill this gap between real-life data and currently available datasets. It was recorded while re-enacting real-life falls recorded during previous studies. It incorporates the challenges faced by fall detection algorithms in real life. A fall detection algorithm from Debard et al. was evaluated on this dataset. This evaluation showed that the dataset possesses extra challenges compared with other publicly available datasets. In this Letter, the dataset is discussed as well as the results of this preliminary evaluation of the fall detection algorithm. The dataset can be downloaded from www.kuleuven.be/advise/datasets.

  11. funcLAB/G-service-oriented architecture for standards-based analysis of functional magnetic resonance imaging in HealthGrids.

    PubMed

    Erberich, Stephan G; Bhandekar, Manasee; Chervenak, Ann; Kesselman, Carl; Nelson, Marvin D

    2007-01-01

    Functional MRI is successfully being used in clinical and research applications including preoperative planning, language mapping, and outcome monitoring. However, clinical use of fMRI is less widespread due to the complexity of its imaging, image workflow, and post-processing, and to a lack of algorithmic standards, which hinders result comparability. As a consequence, widespread adoption of fMRI as a clinical tool is low, contributing to the uncertainty of community physicians about how to integrate fMRI into practice. In addition, training of physicians with fMRI is in its infancy and requires clinical and technical understanding. Therefore, many institutions which perform fMRI have a team of basic researchers and physicians to perform fMRI as a routine imaging tool. In order to provide fMRI as an advanced diagnostic tool to the benefit of a larger patient population, image acquisition and image post-processing must be streamlined, standardized, and available at any institution which does not have these resources available. Here we describe a software architecture, the functional imaging laboratory (funcLAB/G), which addresses (i) standardized image processing using Statistical Parametric Mapping and (ii) its extension to secure sharing and availability for the community using standards-based Grid technology (Globus Toolkit). funcLAB/G carries the potential to overcome the limitations of fMRI in clinical use and thus makes standardized fMRI available to the broader healthcare enterprise utilizing the Internet and HealthGrid Web Services technology.

  12. Dynamic causal modelling on infant fNIRS data: A validation study on a simultaneously recorded fNIRS-fMRI dataset.

    PubMed

    Bulgarelli, Chiara; Blasi, Anna; Arridge, Simon; Powell, Samuel; de Klerk, Carina C J M; Southgate, Victoria; Brigadoi, Sabrina; Penny, William; Tak, Sungho; Hamilton, Antonia

    2018-04-12

    Tracking the connectivity of the developing brain from infancy through childhood is an area of increasing research interest, and fNIRS provides an ideal method for studying the infant brain as it is compact, safe and robust to motion. However, data analysis methods for fNIRS are still underdeveloped compared to those available for fMRI. Dynamic causal modelling (DCM) is an advanced connectivity technique developed for fMRI data, that aims to estimate the coupling between brain regions and how this might be modulated by changes in experimental conditions. DCM has recently been applied to adult fNIRS, but not to infants. The present paper provides a proof-of-principle for the application of this method to infant fNIRS data and a demonstration of the robustness of this method using a simultaneously recorded fMRI-fNIRS single case study, thereby allowing the use of this technique in future infant studies. fMRI and fNIRS were simultaneously recorded from a 6-month-old sleeping infant, who was presented with auditory stimuli in a block design. Both fMRI and fNIRS data were preprocessed using SPM, and analysed using a general linear model approach. The main challenges that adapting DCM for fNIRS infant data posed included: (i) the import of the structural image of the participant for spatial pre-processing, (ii) the spatial registration of the optodes on the structural image of the infant, (iii) calculation of an accurate 3-layer segmentation of the structural image, (iv) creation of a high-density mesh as well as (v) the estimation of the NIRS optical sensitivity functions. To assess our results, we compared the values obtained for variational Free Energy (F), Bayesian Model Selection (BMS) and Bayesian Model Average (BMA) with the same set of possible models applied to both the fMRI and fNIRS datasets. We found high correspondence in F, BMS, and BMA between fMRI and fNIRS data, therefore showing for the first time high reliability of DCM applied to infant fNIRS data. This work opens new avenues for future research on effective connectivity in infancy by contributing a data analysis pipeline and guidance for applying DCM to infant fNIRS data. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  13. Evidence for Model-based Computations in the Human Amygdala during Pavlovian Conditioning

    PubMed Central

    Prévost, Charlotte; McNamee, Daniel; Jessup, Ryan K.; Bossaerts, Peter; O'Doherty, John P.

    2013-01-01

    Contemporary computational accounts of instrumental conditioning have emphasized a role for a model-based system in which values are computed with reference to a rich model of the structure of the world, and a model-free system in which values are updated without encoding such structure. Much less studied is the possibility of a similar distinction operating at the level of Pavlovian conditioning. In the present study, we scanned human participants while they participated in a Pavlovian conditioning task with a simple structure while measuring activity in the human amygdala using a high-resolution fMRI protocol. After fitting a model-based algorithm and a variety of model-free algorithms to the fMRI data, we found evidence for the superiority of a model-based algorithm in accounting for activity in the amygdala compared to the model-free counterparts. These findings support an important role for model-based algorithms in describing the processes underpinning Pavlovian conditioning, as well as providing evidence of a role for the human amygdala in model-based inference. PMID:23436990

  14. EXPLORING FUNCTIONAL CONNECTIVITY IN FMRI VIA CLUSTERING.

    PubMed

    Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina

    2009-04-01

    In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.
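
    For orientation, the seed-based baseline that this abstract compares against can be sketched as a correlation between one seed time course and every voxel time course. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the toy data, and the choice to report 0 for a constant series are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Assumption: a constant series has undefined correlation; report 0.
        return 0.0
    return cov / math.sqrt(vx * vy)


def seed_correlation_map(seed, voxels):
    """Correlate one seed time course with each voxel time course."""
    return [pearson(seed, v) for v in voxels]


# Toy data (hypothetical): three "voxels" observed at six time points.
seed = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
voxels = [
    [0.1, 2.1, 0.1, -1.9, 0.1, 2.1],   # scaled and shifted copy of the seed
    [0.0, -1.0, 0.0, 1.0, 0.0, -1.0],  # inverted copy of the seed
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],    # flat signal
]
fc_map = seed_correlation_map(seed, voxels)
```

    A clustering approach such as K-Means would instead group the voxel time courses directly, without choosing a seed in advance.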

  15. Large-scale image region documentation for fully automated image biomarker algorithm development and evaluation

    PubMed Central

    Reeves, Anthony P.; Xie, Yiting; Liu, Shuang

    2017-01-01

    With the advent of fully automated image analysis and modern machine learning methods, there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. This paper presents a method and implementation for facilitating such datasets that addresses the critical issue of size scaling for algorithm validation and evaluation; current evaluation methods that are usually used in academic studies do not scale to large datasets. This method includes protocols for the documentation of many regions in very large image datasets; the documentation may be incrementally updated by new image data and by improved algorithm outcomes. This method has been used for 5 years in the context of chest health biomarkers from low-dose chest CT images that are now being used with increasing frequency in lung cancer screening practice. The lung scans are segmented into over 100 different anatomical regions, and the method has been applied to a dataset of over 20,000 chest CT images. Using this framework, the computer algorithms have been developed to achieve over 90% acceptable image segmentation on the complete dataset. PMID:28612037

  16. Performances of Machine Learning Algorithms for Binary Classification of Network Anomaly Detection System

    NASA Astrophysics Data System (ADS)

    Nawir, Mukrimah; Amir, Amiza; Lynn, Ong Bi; Yaakob, Naimah; Badlishah Ahmad, R.

    2018-05-01

    The rapid growth of networked technologies exposes them to various network attacks, because data are frequently exchanged over the Internet and large-scale data must be handled. Moreover, network anomaly detection using machine learning is hampered by the scarcity of publicly available labelled network datasets, which has led many researchers to keep using the most common network dataset (KDDCup99), even though it is not relevant for applying machine learning (ML) algorithms to classification. Several issues regarding the available labelled network datasets are discussed in this paper. The aim of this paper is to build a network anomaly detection system using machine learning algorithms that are efficient, effective, and fast. The findings showed that the AODE algorithm performed well in terms of accuracy and processing time for binary classification on the UNSW-NB15 dataset.

  17. Object detection approach using generative sparse, hierarchical networks with top-down and lateral connections for combining texture/color detection and shape/contour detection

    DOEpatents

    Paiton, Dylan M.; Kenyon, Garrett T.; Brumby, Steven P.; Schultz, Peter F.; George, John S.

    2015-07-28

    An approach to detecting objects in an image dataset may combine texture/color detection, shape/contour detection, and/or motion detection using sparse, generative, hierarchical models with lateral and top-down connections. A first independent representation of objects in an image dataset may be produced using a color/texture detection algorithm. A second independent representation of objects in the image dataset may be produced using a shape/contour detection algorithm. A third independent representation of objects in the image dataset may be produced using a motion detection algorithm. The first, second, and third independent representations may then be combined into a single coherent output using a combinatorial algorithm.

  18. Semi-supervised clustering for parcellating brain regions based on resting state fMRI data

    NASA Astrophysics Data System (ADS)

    Cheng, Hewei; Fan, Yong

    2014-03-01

    Many unsupervised clustering techniques have been adopted for parcellating brain regions of interest into functionally homogeneous subregions based on resting state fMRI data. However, the unsupervised clustering techniques are not able to take advantage of existing knowledge of the functional neuroanatomy readily available from studies of cytoarchitectonic parcellation or meta-analysis of the literature. In this study, we propose a semi-supervised clustering method for parcellating the amygdala into functionally homogeneous subregions based on resting state fMRI data. In particular, the semi-supervised clustering is implemented under the framework of graph partitioning, and adopts prior information and spatial consistency constraints to obtain a spatially contiguous parcellation result. The graph partitioning problem is solved using an efficient algorithm similar to the well-known weighted kernel k-means algorithm. Our method has been validated for parcellating the amygdala into 3 subregions based on resting state fMRI data of 28 subjects. The experimental results have demonstrated that the proposed method is more robust than unsupervised clustering and able to parcellate the amygdala into centromedial, laterobasal, and superficial parts with improved functional homogeneity compared with the cytoarchitectonic parcellation result. The validity of the parcellation results is also supported by distinctive functional and structural connectivity patterns of the subregions and high consistency between coactivation patterns derived from a meta-analysis and functional connectivity patterns of corresponding subregions.

  19. Intersession reliability of fMRI activation for heat pain and motor tasks

    PubMed Central

    Quiton, Raimi L.; Keaser, Michael L.; Zhuo, Jiachen; Gullapalli, Rao P.; Greenspan, Joel D.

    2014-01-01

    As the practice of conducting longitudinal fMRI studies to assess mechanisms of pain-reducing interventions becomes more common, there is a great need to assess the test–retest reliability of the pain-related BOLD fMRI signal across repeated sessions. This study quantitatively evaluated the reliability of heat pain-related BOLD fMRI brain responses in healthy volunteers across 3 sessions conducted on separate days using two measures: (1) intraclass correlation coefficients (ICC) calculated based on signal amplitude and (2) spatial overlap. The ICC analysis of pain-related BOLD fMRI responses showed fair-to-moderate intersession reliability in brain areas regarded as part of the cortical pain network. Areas with the highest intersession reliability based on the ICC analysis included the anterior midcingulate cortex, anterior insula, and second somatosensory cortex. Areas with the lowest intersession reliability based on the ICC analysis also showed low spatial reliability; these regions included pregenual anterior cingulate cortex, primary somatosensory cortex, and posterior insula. Thus, this study found regional differences in pain-related BOLD fMRI response reliability, which may provide useful information to guide longitudinal pain studies. A simple motor task (finger-thumb opposition) was performed by the same subjects in the same sessions as the painful heat stimuli were delivered. Intersession reliability of fMRI activation in cortical motor areas was comparable to previously published findings for both spatial overlap and ICC measures, providing support for the validity of the analytical approach used to assess intersession reliability of pain-related fMRI activation. A secondary finding of this study is that the use of standard ICC alone as a measure of reliability may not be sufficient, as the underlying variance structure of an fMRI dataset can result in inappropriately high ICC values; a method to eliminate these false positive results was used in this study and is recommended for future studies of test–retest reliability. PMID:25161897
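
    As a rough illustration of the first reliability measure, the sketch below computes an intraclass correlation from a subjects-by-sessions table of response amplitudes. The abstract does not state which ICC form was used, so the one-way random-effects ICC(1,1) shown here is an assumption, and the function name and toy values are hypothetical.

```python
def icc_oneway(data):
    """One-way random-effects ICC(1,1) for a subjects x sessions table.

    Assumption: this ICC form is chosen for illustration only; the
    abstract does not specify the variant used in the study.
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of sessions per subject
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, >= 2 sessions, equal row lengths")

    subject_means = [sum(row) / k for row in data]
    grand_mean = sum(subject_means) / n

    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("no variance in data; ICC is undefined")
    return (ms_between - ms_within) / denom


# Toy amplitudes (hypothetical): rows are subjects, columns are sessions.
perfect = icc_oneway([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])
noisy = icc_oneway([[1.0, 2.0], [3.0, 4.0]])
```

    In the first table every subject repeats exactly, so the within-subject mean square is zero and the ICC is 1; in the second, session-to-session variation lowers it to 7/9.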

  20. Comparison between hybrid feedforward-feedback, feedforward, and feedback structures for active noise control of fMRI noise.

    PubMed

    Reddy, Rajiv M; Panahi, Issa M S

    2008-01-01

    The performance of FIR feedforward, IIR feedforward, FIR feedback, hybrid FIR feedforward-FIR feedback, and hybrid IIR feedforward-FIR feedback structures for active noise control (ANC) is compared for an fMRI noise application. The filtered-input normalized least mean squares (FxNLMS) algorithm is used to update the coefficients of the adaptive filters in all these structures. Realistic primary and secondary paths of an fMRI bore are used by estimating them on a half cylindrical acrylic bore of 0.76 m (D) x 1.52 m (L). Detailed results of the performance of the ANC system are presented in the paper for each of these structures. We find that the IIR feedforward structure produces most of the performance improvement in the hybrid IIR feedforward-FIR feedback structure and that adding the feedback structure becomes almost redundant in the case of fMRI noise.
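
    The coefficient update at the core of this family of algorithms can be sketched with a plain normalized LMS (NLMS) filter. This is a simplified stand-in rather than the FxNLMS scheme of the paper: in the filtered-x variant the reference signal is first passed through an estimate of the secondary path, which is omitted here. The function name, step size, and toy system are hypothetical.

```python
import random


def nlms_identify(x, d, n_taps, mu=0.5, eps=1e-8):
    """Adapt an FIR filter so that its output tracks d, using NLMS.

    x: input samples, d: desired samples, n_taps: filter length.
    mu is the step size; eps avoids division by zero for silent input.
    """
    w = [0.0] * n_taps
    buf = [0.0] * n_taps  # most recent input sample first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # filter output
        e = dn - y                                   # error signal
        norm = eps + sum(bi * bi for bi in buf)      # input power
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w


# Toy system identification (hypothetical): recover a 2-tap FIR response.
rng = random.Random(0)
true_w = [0.5, -0.3]
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
d = [
    true_w[0] * x[n] + (true_w[1] * x[n - 1] if n > 0 else 0.0)
    for n in range(len(x))
]
w_hat = nlms_identify(x, d, n_taps=2)
```

    Dividing the update by the input power is what distinguishes NLMS from plain LMS and makes the step size less sensitive to the level of the reference signal.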

  1. A model-based 3D phase unwrapping algorithm using Gegenbauer polynomials.

    PubMed

    Langley, Jason; Zhao, Qun

    2009-09-07

    The application of a two-dimensional (2D) phase unwrapping algorithm to a three-dimensional (3D) phase map may result in an unwrapped phase map that is discontinuous in the direction normal to the unwrapped plane. This work investigates the problem of phase unwrapping for 3D phase maps. The phase map is modeled as a product of three one-dimensional Gegenbauer polynomials. The orthogonality of Gegenbauer polynomials and their derivatives on the interval [-1, 1] are exploited to calculate the expansion coefficients. The algorithm was implemented using two well-known Gegenbauer polynomials: Chebyshev polynomials of the first kind and Legendre polynomials. Both implementations of the phase unwrapping algorithm were tested on 3D datasets acquired from a magnetic resonance imaging (MRI) scanner. The first dataset was acquired from a homogeneous spherical phantom. The second dataset was acquired using the same spherical phantom but magnetic field inhomogeneities were introduced by an external coil placed adjacent to the phantom, which provided an additional burden to the phase unwrapping algorithm. Then Gaussian noise was added to generate a low signal-to-noise ratio dataset. The third dataset was acquired from the brain of a human volunteer. The results showed that Chebyshev implementation and the Legendre implementation of the phase unwrapping algorithm give similar results on the 3D datasets. Both implementations of the phase unwrapping algorithm compare well to PRELUDE 3D, 3D phase unwrapping software well recognized for functional MRI.

  2. Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.

    PubMed

    Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

    2015-01-01

    Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it.

  3. Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation

    PubMed Central

    Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

    2015-01-01

    Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133

  4. Dynamic Reorganization of Functional Connectivity Reveals Abnormal Temporal Efficiency in Schizophrenia.

    PubMed

    Sun, Yu; Collinson, Simon L; Suckling, John; Sim, Kang

    2018-06-07

    Emerging evidence suggests that schizophrenia is associated with brain dysconnectivity. Nonetheless, the implicit assumption of stationary functional connectivity (FC) adopted in most previous resting-state functional magnetic resonance imaging (fMRI) studies raises an open question of schizophrenia-related aberrations in dynamic properties of resting-state FC. This study introduces an empirical method to examine the dynamic functional dysconnectivity in patients with schizophrenia. Temporal brain networks were estimated from resting-state fMRI of 2 independent datasets (patients/controls = 18/19 and 53/57 for self-recorded dataset and a publicly available replication dataset, respectively) by the correlation of sliding time-windowed time courses among regions of a predefined atlas. Through the newly introduced temporal efficiency approach and temporal random network models, we examined, for the first time, the 3D spatiotemporal architecture of the temporal brain network. We found that although prominent temporal small-world properties were revealed in both groups, temporal brain networks of patients with schizophrenia in both datasets showed a significantly higher temporal global efficiency, which cannot be simply attributable to head motion and sampling error. Specifically, we found localized changes of temporal nodal properties in the left frontal, right medial parietal, and subcortical areas that were associated with clinical features of schizophrenia. Our findings demonstrate that altered dynamic FC may underlie abnormal brain function and clinical symptoms observed in schizophrenia. Moreover, we provide new evidence to extend the dysconnectivity hypothesis in schizophrenia from static to dynamic brain network and highlight the potential of aberrant brain dynamic FC in unraveling the pathophysiologic mechanisms of the disease.
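
    The sliding-window step described in this abstract, correlating windowed time courses between atlas regions, can be sketched for a single pair of regions as follows. This is a minimal illustration, not the authors' implementation: the window width, step, function names, and toy signals are hypothetical, and the temporal efficiency measures built on top of the windows are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Assumption: a constant window has undefined correlation; report 0.
        return 0.0
    return cov / math.sqrt(vx * vy)


def sliding_window_fc(ts_a, ts_b, width, step=1):
    """Correlation between two regional time courses in each time window."""
    if len(ts_a) != len(ts_b):
        raise ValueError("time courses must have equal length")
    if width < 2 or width > len(ts_a):
        raise ValueError("window width must be between 2 and the series length")
    starts = range(0, len(ts_a) - width + 1, step)
    return [pearson(ts_a[s:s + width], ts_b[s:s + width]) for s in starts]


# Toy signals (hypothetical): coupled in the first half, anti-coupled after.
region_a = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
region_b = [0.0, 1.0, 0.0, -1.0, 0.0, -1.0, 0.0, 1.0]
dynamic_fc = sliding_window_fc(region_a, region_b, width=4, step=4)
```

    A single correlation over the whole recording would average the two regimes to zero here, which is the kind of information a stationary analysis loses.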

  5. Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations

    PubMed Central

    Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.

    2016-01-01

    Automatic classification of vocalization type could potentially become a useful tool for the acoustic monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. Dataset and algorithms are made publicly available. PMID:27654941

  6. Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks

    PubMed Central

    Yamanaka, Ryota; Kitano, Hiroaki

    2013-01-01

    Elucidating gene regulatory networks (GRNs) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression data associated with a known regulatory network are similar to those associated with an unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks. PMID:24278007

  7. Object detection approach using generative sparse, hierarchical networks with top-down and lateral connections for combining texture/color detection and shape/contour detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paiton, Dylan M.; Kenyon, Garrett T.; Brumby, Steven P.

    An approach to detecting objects in an image dataset may combine texture/color detection, shape/contour detection, and/or motion detection using sparse, generative, hierarchical models with lateral and top-down connections. A first independent representation of objects in an image dataset may be produced using a color/texture detection algorithm. A second independent representation of objects in the image dataset may be produced using a shape/contour detection algorithm. A third independent representation of objects in the image dataset may be produced using a motion detection algorithm. The first, second, and third independent representations may then be combined into a single coherent output using a combinatorial algorithm.

  8. Multimodal analysis of cortical chemoarchitecture and macroscale fMRI resting‐state functional connectivity

    PubMed Central

    Scholtens, Lianne H.; Turk, Elise; Mantini, Dante; Vanduffel, Wim; Feldman Barrett, Lisa

    2016-01-01

    The cerebral cortex is well known to display a large variation in excitatory and inhibitory chemoarchitecture, but the effect of this variation on global scale functional neural communication and synchronization patterns remains less well understood. Here, we provide evidence of the chemoarchitecture of cortical regions to be associated with large‐scale region‐to‐region resting‐state functional connectivity. We assessed the excitatory versus inhibitory chemoarchitecture of cortical areas as an ExIn ratio between receptor density mappings of excitatory (AMPA, M1) and inhibitory (GABAA, M2) receptors, computed on the basis of data collated from pioneering studies of autoradiography mappings as present in literature of the human (2 datasets) and macaque (1 dataset) cortex. Cortical variation in ExIn ratio significantly correlated with total level of functional connectivity as derived from resting‐state functional connectivity recordings of cortical areas across all three datasets (human I: P = 0.0004; human II: P = 0.0008; macaque: P = 0.0007), suggesting cortical areas with an overall more excitatory character to show higher levels of intrinsic functional connectivity during resting‐state. Our findings are indicative of the microscale chemoarchitecture of cortical regions to be related to resting‐state fMRI connectivity patterns at the global system's level of connectome organization. Hum Brain Mapp 37:3103–3113, 2016. © 2016 Wiley Periodicals, Inc. PMID:27207489

  9. A unified framework for group independent component analysis for multi-subject fMRI data

    PubMed Central

    Guo, Ying; Pagnoni, Giuseppe

    2008-01-01

    Independent component analysis (ICA) is becoming increasingly popular for analyzing functional magnetic resonance imaging (fMRI) data. While ICA has been successfully applied to single-subject analysis, the extension of ICA to group inferences is not straightforward and remains an active topic of research. Current group ICA models, such as the GIFT (Calhoun et al., 2001) and tensor PICA (Beckmann and Smith, 2005), make different assumptions about the underlying structure of the group spatio-temporal processes and are thus estimated using algorithms tailored for the assumed structure, potentially leading to diverging results. To our knowledge, there are currently no methods for assessing the validity of different model structures in real fMRI data and selecting the most appropriate one among various choices. In this paper, we propose a unified framework for estimating and comparing group ICA models with varying spatio-temporal structures. We consider a class of group ICA models that can accommodate different group structures and include existing models, such as the GIFT and tensor PICA, as special cases. We propose a maximum likelihood (ML) approach with a modified Expectation-Maximization (EM) algorithm for the estimation of the proposed class of models. Likelihood ratio tests (LRT) are presented to compare between different group ICA models. The LRT can be used to perform model comparison and selection, to assess the goodness-of-fit of a model in a particular data set, and to test group differences in the fMRI signal time courses between subject subgroups. Simulation studies are conducted to evaluate the performance of the proposed method under varying structures of group spatio-temporal processes. We illustrate our group ICA method using data from an fMRI study that investigates changes in neural processing associated with the regular practice of Zen meditation. PMID:18650105

  10. A Review of Challenges in the Use of fMRI for Disease Classification / Characterization and A Projection Pursuit Application from Multi-site fMRI Schizophrenia Study.

    PubMed

    Demirci, Oguz; Clark, Vincent P; Magnotta, Vincent A; Andreasen, Nancy C; Lauriello, John; Kiehl, Kent A; Pearlson, Godfrey D; Calhoun, Vince D

    2008-09-01

    Functional magnetic resonance imaging (fMRI) is a fairly new technique that has the potential to characterize and classify brain disorders such as schizophrenia. It has the possibility of playing a crucial role in designing objective prognostic/diagnostic tools, but also presents numerous challenges to analysis and interpretation. Classification provides results for individual subjects, rather than results related to group differences. This is a more complicated endeavor that must be approached more carefully, and efficient methods should be developed to draw generalized and valid conclusions out of high dimensional data with a limited number of subjects, especially for heterogeneous disorders whose pathophysiology is unknown. Numerous research efforts have been reported in the field using fMRI activation of schizophrenia patients and healthy controls. However, the results are usually not generalizable to larger data sets and require careful definition of the techniques used both in designing algorithms and reporting prediction accuracies. In this review paper, we survey a number of previous reports and also identify possible biases (e.g., cross-validation, class size) in class comparison/prediction problems. Some suggestions to improve the effectiveness of the presentation of the prediction accuracy results are provided. We also present our own results using a projection pursuit algorithm followed by an application of independent component analysis proposed in an earlier study. We classify schizophrenia versus healthy controls using fMRI data of 155 subjects from two sites obtained during three different tasks. The results are compared in order to investigate the effectiveness of each task, and differences between patients with schizophrenia and healthy controls are investigated.

  11. Findings in resting-state fMRI by differences from K-means clustering.

    PubMed

    Chyzhyk, Darya; Graña, Manuel

    2014-01-01

    Resting-state fMRI is the subject of a growing number of studies with diverse aims, always centered on some kind of functional connectivity biomarker obtained from correlation with seed regions, or by analytical decomposition of the signal to localize the spatial distribution of functional connectivity patterns. In general, these studies are computationally costly and very sensitive to noise and to the preprocessing of the data. In this paper we consider clustering by K-means as an exploratory procedure that can provide some results with little computational effort, thanks to efficient implementations that are readily available. We demonstrate the approach on a dataset of schizophrenia patients, finding differences between patients with and without auditory hallucinations.
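
As a rough illustration of the exploratory clustering idea in this abstract, the following is a minimal sketch of plain K-means (Lloyd's algorithm) applied to short time courses. The function name `kmeans`, the deterministic "first k points" initialization, and the toy data are assumptions made for this example; the record does not describe the authors' implementation or preprocessing.

```python
def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Illustrative, deterministic initialization: the first k points are the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy "time courses" (three time points each), not real fMRI data.
series = [(0.0, 0.1, 0.0), (1.0, 0.9, 1.1), (0.1, 0.0, 0.1), (0.9, 1.0, 1.0)]
labels, centroids = kmeans(series, k=2)
print(labels)
```

In practice a library implementation with several random restarts would be preferable, since K-means is sensitive to its starting centroids.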

  12. On the relationship between instantaneous phase synchrony and correlation-based sliding windows for time-resolved fMRI connectivity analysis.

    PubMed

    Pedersen, Mangor; Omidvarnia, Amir; Zalesky, Andrew; Jackson, Graeme D

    2018-06-08

    Correlation-based sliding window analysis (CSWA) is the most commonly used method to estimate time-resolved functional MRI (fMRI) connectivity. However, instantaneous phase synchrony analysis (IPSA) is gaining popularity mainly because it offers single time-point resolution of time-resolved fMRI connectivity. We aim to provide a systematic comparison between these two approaches, on both temporal and topological levels. For this purpose, we used resting-state fMRI data from two separate cohorts with different temporal resolutions (45 healthy subjects from Human Connectome Project fMRI data with repetition time of 0.72 s and 25 healthy subjects from a separate validation fMRI dataset with a repetition time of 3 s). For time-resolved functional connectivity analysis, we calculated tapered CSWA over a wide range of different window lengths that were temporally and topologically compared to IPSA. We found a strong association in connectivity dynamics between IPSA and CSWA when considering the absolute values of CSWA. The association between CSWA and IPSA was stronger for a window length of ∼20 s (shorter than filtered fMRI wavelength) than ∼100 s (longer than filtered fMRI wavelength), irrespective of the sampling rate of the underlying fMRI data. Narrow-band filtering of fMRI data (0.03-0.07 Hz) yielded a stronger relationship between IPSA and CSWA than wider-band (0.01-0.1 Hz). On a topological level, time-averaged IPSA and CSWA nodes were non-linearly correlated for both short (∼20 s) and long (∼100 s) windows, mainly because nodes with strong negative correlations (CSWA) displayed high phase synchrony (IPSA). IPSA and CSWA were anatomically similar in the default mode network, sensory cortex, insula and cerebellum. Our results suggest that IPSA and CSWA provide comparable characterizations of time-resolved fMRI connectivity for appropriately chosen window lengths. Although IPSA requires narrow-band fMRI filtering, we recommend the use of IPSA given that it does not mandate a (semi-)arbitrary choice of window length and window overlap. Code for calculating IPSA is provided. Copyright © 2018. Published by Elsevier Inc.
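
The contrast between the two measures can be sketched on synthetic signals. The example below is an assumption-laden illustration, not the code the authors provide: it builds the analytic signal with a naive discrete Fourier transform (adequate only for short series), takes phase synchrony as 1 - |sin(phase difference)| (an assumed definition, chosen because it is consistent with the abstract's remark that strongly anti-correlated nodes show high phase synchrony), and uses an untapered sliding window rather than the tapered windows used in the study. All function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a naive O(n^2) DFT: keep DC/Nyquist, double positive frequencies."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    return [sum(spectrum[k] * weights[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def phase_synchrony(x, y):
    """One synchrony value per time point: 1 - |sin(phase_x - phase_y)| (assumed definition)."""
    zx, zy = analytic_signal(x), analytic_signal(y)
    return [1.0 - abs(math.sin(cmath.phase(a) - cmath.phase(b))) for a, b in zip(zx, zy)]


def sliding_window_corr(x, y, width):
    """Untapered Pearson correlation in each window of `width` samples."""
    out = []
    for start in range(len(x) - width + 1):
        xs, ys = x[start:start + width], y[start:start + width]
        mx, my = sum(xs) / width, sum(ys) / width
        cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        vx = sum((a - mx) ** 2 for a in xs)
        vy = sum((b - my) ** 2 for b in ys)
        out.append(cov / math.sqrt(vx * vy))
    return out


# Synthetic narrow-band signals: four full cycles over 64 samples.
n = 64
x = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
anti = [-v for v in x]                                       # anti-phase copy of x
quad = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # quarter-cycle shift

print(min(phase_synchrony(x, anti)))      # close to 1: anti-phase counts as synchronous
print(max(phase_synchrony(x, quad)))      # close to 0: quarter-cycle lag
print(sliding_window_corr(x, anti, 16)[0])  # close to -1
```

The anti-phase pair has windowed correlation near -1 but phase synchrony near 1, which is why comparing synchrony with the absolute value of the windowed correlation is the natural comparison. The phase estimate is only meaningful for narrow-band signals, matching the filtering requirement the abstract mentions.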

  13. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

    PubMed

    Yu, Qiang; Wei, Dingbang; Huo, Hongwei

    2018-06-18

    Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
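
The qPMS problem statement at the start of this abstract can be made concrete with an exhaustive search. The sketch below is a brute-force baseline written only to illustrate the definition (l-length strings occurring in at least qt sequences with up to d mismatches); it is not SamSelect and not one of the existing qPMS algorithms the abstract refers to, and its function names are hypothetical. Its cost grows as 4^l, so it is only usable for very small l.

```python
from itertools import product
from math import ceil


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def occurs(motif, seq, d):
    """True if some l-length window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def qpms_bruteforce(seqs, l, d, q, alphabet="ACGT"):
    """All l-mers that occur (up to d mismatches) in at least ceil(q * t) of the t sequences."""
    needed = ceil(q * len(seqs))
    hits = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if sum(occurs(motif, s, d) for s in seqs) >= needed:
            hits.append(motif)
    return hits


# Hypothetical toy input: three short sequences sharing the 4-mer ACGT.
seqs = ["ACGTAC", "TTACGT", "GACGTT"]
print(qpms_bruteforce(seqs, l=4, d=0, q=1.0))  # ['ACGT']
```

Because the inner count runs over every input sequence, the running time grows with t, and lowering q admits more candidate motifs; this is the cost that running on a smaller sample set D' is meant to reduce.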

  14. Adaptive Swarm Balancing Algorithms for rare-event prediction in imbalanced healthcare data

    PubMed Central

    Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Sung, Yunsick

    2017-01-01

    Clinical data analysis and forecasting have made substantial contributions to disease control, prevention and detection. However, such data usually suffer from highly imbalanced class distributions. In this paper, we aim to formulate effective methods to rebalance binary imbalanced datasets in which the positive samples are the minority. We investigate two different meta-heuristic algorithms, particle swarm optimization and the bat algorithm, and apply them to strengthen the effects of the synthetic minority over-sampling technique (SMOTE) for pre-processing the datasets. One approach is to process the full dataset as a whole. The other is to split up the dataset and adaptively process it one segment at a time. The experimental results reported in this paper reveal that the performance improvements obtained by the former approach do not scale to larger datasets. The latter methods, which we call Adaptive Swarm Balancing Algorithms, lead to significant efficiency and effectiveness improvements on large datasets, whereas the first approach is not viable there. We also find them more consistent with the practice of typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE. The proposed methods lead to more credible classifier performance and shorten the run time compared with the brute-force method. PMID:28753613
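
For readers unfamiliar with SMOTE, the core interpolation step can be sketched as follows. This is a minimal, assumption-based illustration of basic SMOTE only: the swarm-based tuning and the segment-wise processing described in the abstract are not reproduced, and the function name, the parameters `k` and `n_new`, and the toy data are hypothetical.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the chosen base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])  # one of the k nearest minority neighbours
        gap = rng.random()                  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Every synthetic sample lies on a line segment between two existing minority samples, so the number of neighbours and the number of samples generated are the natural quantities to tune.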

  15. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades.

    PubMed

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches.

  16. PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data.

    PubMed

    Mejia, Amanda F; Nebel, Mary Beth; Eloyan, Ani; Caffo, Brian; Lindquist, Martin A

    2017-07-01

    Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance imaging (fMRI) data, which consist of hundreds of thousands of measurements sampled at hundreds of time points. At a time when the availability of fMRI data is rapidly growing, primarily through large, publicly available grassroots datasets, automated quality control and outlier detection methods are greatly needed. We propose principal components analysis (PCA) leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange and find that removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using independent components analysis. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Squish: Near-Optimal Compression for Archival of Relational Datasets

    PubMed Central

    Gao, Yihan; Parameswaran, Aditya

    2017-01-01

    Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve near-entropy compression rate. Squish also supports user-defined attributes: users can instantiate new data types by simply implementing five functions for a new class interface. We prove the asymptotic optimality of our compression algorithm and conduct experiments to show the effectiveness of our system: Squish achieves a reduction of over 50% in storage size relative to systems developed in prior work on a variety of real datasets. PMID:28180028

  18. Multimodal integration of fMRI and EEG data for high spatial and temporal resolution analysis of brain networks

    PubMed Central

    Mantini, D.; Marzetti, L.; Corbetta, M.; Romani, G.L.; Del Gratta, C.

    2017-01-01

    Two major non-invasive brain mapping techniques, electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), have complementary advantages with regard to their spatial and temporal resolution. We propose an approach based on the integration of EEG and fMRI, enabling the EEG temporal dynamics of information processing to be characterized within spatially well-defined fMRI large-scale networks. First, the fMRI data are decomposed into networks by means of spatial independent component analysis (sICA), and those associated with intrinsic activity and/or responding to task performance are selected using information from the related time-courses. Next, the EEG data over all sensors are averaged with respect to event timing, thus calculating event-related potentials (ERPs). The ERPs are subjected to temporal ICA (tICA), and the resulting components are localized with the weighted minimum norm (WMNLS) algorithm using the task-related fMRI networks as priors. Finally, the temporal contribution of each ERP component in the areas belonging to the fMRI large-scale networks is estimated. The proposed approach has been evaluated on visual target detection data. Our results confirm that two different components, commonly observed in EEG when presenting novel and salient stimuli respectively, are related to the neuronal activation in large-scale networks, operating at different latencies and associated with different functional processes. PMID:20052528

  19. SPHINX--an algorithm for taxonomic binning of metagenomic sequences.

    PubMed

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Singh, Nitin Kumar; Mande, Sharmila S

    2011-01-01

    Compared with composition-based binning algorithms, the binning accuracy and specificity of alignment-based binning algorithms are significantly higher. However, being alignment-based, the latter class of algorithms requires an enormous amount of time and computing resources for binning huge metagenomic datasets. The motivation was to develop a binning approach that can analyze metagenomic datasets as rapidly as composition-based approaches, but nevertheless has the accuracy and specificity of alignment-based algorithms. This article describes a hybrid binning approach (SPHINX) that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms. Validation results with simulated sequence datasets indicate that SPHINX is able to analyze metagenomic sequences as rapidly as composition-based algorithms. Furthermore, the binning efficiency (in terms of accuracy and specificity of assignments) of SPHINX is observed to be comparable with results obtained using alignment-based algorithms. A web server for the SPHINX algorithm is available at http://metagenomics.atc.tcs.com/SPHINX/.

  20. A Statistical Method to Distinguish Functional Brain Networks

    PubMed Central

    Fujita, André; Vidal, Maciel C.; Takahashi, Daniel Y.

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method also showed to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism (p < 0.001). PMID:28261045

  1. A Statistical Method to Distinguish Functional Brain Networks.

    PubMed

    Fujita, André; Vidal, Maciel C; Takahashi, Daniel Y

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method also showed to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism (p < 0.001).

  2. Comparative Evaluation of Background Subtraction Algorithms in Remote Scene Videos Captured by MWIR Sensors

    PubMed Central

    Yao, Guangle; Lei, Tao; Zhong, Jiandan; Jiang, Ping; Jia, Wenwu

    2017-01-01

    Background subtraction (BS) is one of the most commonly encountered tasks in video analysis and tracking systems. It distinguishes the foreground (moving objects) from the video sequences captured by static imaging sensors. Background subtraction in remote scene infrared (IR) video is important and common to lots of fields. This paper provides a Remote Scene IR Dataset captured by our designed medium-wave infrared (MWIR) sensor. Each video sequence in this dataset is identified with specific BS challenges and the pixel-wise ground truth of foreground (FG) for each frame is also provided. A series of experiments were conducted to evaluate BS algorithms on this proposed dataset. The overall performance of BS algorithms and the processor/memory requirements were compared. Proper evaluation metrics or criteria were employed to evaluate the capability of each BS algorithm to handle different kinds of BS challenges represented in this dataset. The results and conclusions in this paper provide valid references to develop new BS algorithm for remote scene IR video sequence, and some of them are not only limited to remote scene or IR video sequence but also generic for background subtraction. The Remote Scene IR dataset and the foreground masks detected by each evaluated BS algorithm are available online: https://github.com/JerryYaoGl/BSEvaluationRemoteSceneIR. PMID:28837112

  3. Image segmentation evaluation for very-large datasets

    NASA Astrophysics Data System (ADS)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes are achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  4. Quantification of intensity variations in functional MR images using rotated principal components

    NASA Astrophysics Data System (ADS)

    Backfrieder, W.; Baumgartner, R.; Sámal, M.; Moser, E.; Bergmann, H.

    1996-08-01

    In functional MRI (fMRI), the changes in cerebral haemodynamics related to stimulated neural brain activity are measured using standard clinical MR equipment. Small intensity variations in fMRI data have to be detected and distinguished from non-neural effects by careful image analysis. Based on multivariate statistics we describe an algorithm involving oblique rotation of the most significant principal components for an estimation of the temporal and spatial distribution of the stimulated neural activity over the whole image matrix. This algorithm takes advantage of strong local signal variations. A mathematical phantom was designed to generate simulated data for the evaluation of the method. In simulation experiments, the potential of the method to quantify small intensity changes, especially when processing data sets containing multiple sources of signal variations, was demonstrated. In vivo fMRI data collected in both visual and motor stimulation experiments were analysed, showing a proper location of the activated cortical regions within well known neural centres and an accurate extraction of the activation time profile. The suggested method yields accurate absolute quantification of in vivo brain activity without the need of extensive prior knowledge and user interaction.

  5. Kernel Principal Component Analysis for dimensionality reduction in fMRI-based diagnosis of ADHD.

    PubMed

    Sidhu, Gagan S; Asgarian, Nasimeh; Greiner, Russell; Brown, Matthew R G

    2012-01-01

    This study explored various feature extraction methods for use in automated diagnosis of Attention-Deficit Hyperactivity Disorder (ADHD) from functional Magnetic Resonance Image (fMRI) data. Each participant's data consisted of a resting state fMRI scan as well as phenotypic data (age, gender, handedness, IQ, and site of scanning) from the ADHD-200 dataset. We used machine learning techniques to produce support vector machine (SVM) classifiers that attempted to differentiate between (1) all ADHD patients vs. healthy controls and (2) ADHD combined (ADHD-c) type vs. ADHD inattentive (ADHD-i) type vs. controls. In different tests, we used only the phenotypic data, only the imaging data, or else both the phenotypic and imaging data. For feature extraction on fMRI data, we tested the Fast Fourier Transform (FFT), different variants of Principal Component Analysis (PCA), and combinations of FFT and PCA. PCA variants included PCA over time (PCA-t), PCA over space and time (PCA-st), and kernelized PCA (kPCA-st). Baseline chance accuracy was 64.2% produced by guessing healthy control (the majority class) for all participants. Using only phenotypic data produced 72.9% accuracy on two class diagnosis and 66.8% on three class diagnosis. Diagnosis using only imaging data did not perform as well as phenotypic-only approaches. Using both phenotypic and imaging data with combined FFT and kPCA-st feature extraction yielded accuracies of 76.0% on two class diagnosis and 68.6% on three class diagnosis, better than phenotypic-only approaches. Our results demonstrate the potential of using FFT and kPCA-st with resting-state fMRI data as well as phenotypic data for automated diagnosis of ADHD. These results are encouraging given known challenges of learning ADHD diagnostic classifiers using the ADHD-200 dataset (see Brown et al., 2012).
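
The "baseline chance accuracy" in this abstract is the accuracy of always predicting the majority class. A minimal sketch of that calculation follows; the function name and the label counts are hypothetical and are not the ADHD-200 figures.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy obtained by always guessing it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical class labels, for illustration only.
labels = ["control"] * 6 + ["adhd"] * 4
print(majority_baseline_accuracy(labels))  # ('control', 0.6)
```

A classifier is only informative to the extent that it beats this figure, which is why the reported accuracies are best read against the 64.2% baseline rather than against 50%.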

  6. A novel approach to calibrate the hemodynamic model using functional Magnetic Resonance Imaging (fMRI) measurements.

    PubMed

    Khoram, Nafiseh; Zayane, Chadia; Djellouli, Rabia; Laleg-Kirati, Taous-Meriem

    2016-03-15

    The calibration of the hemodynamic model that describes changes in blood flow and blood oxygenation during brain activation is a crucial step for successfully monitoring and possibly predicting brain activity. This in turn has the potential to provide diagnosis and treatment of brain diseases in early stages. We propose an efficient numerical procedure for calibrating the hemodynamic model using some fMRI measurements. The proposed solution methodology is a regularized iterative method equipped with a Kalman filtering-type procedure. The Newton component of the proposed method addresses the nonlinear aspect of the problem. The regularization feature is used to ensure the stability of the algorithm. The Kalman filter procedure is incorporated here to address the noise in the data. Numerical results obtained with synthetic data as well as with real fMRI measurements are presented to illustrate the accuracy, robustness to the noise, and the cost-effectiveness of the proposed method. We present numerical results that clearly demonstrate that the proposed method outperforms the Cubature Kalman Filter (CKF), one of the most prominent existing numerical methods. We have designed an iterative numerical technique, called the TNM-CKF algorithm, for calibrating the mathematical model that describes the single-event related brain response when fMRI measurements are given. The method appears to be highly accurate and effective in reconstructing the BOLD signal even when the measurements are tainted with high noise level (as high as 30%). Published by Elsevier B.V.

  7. Android Malware Classification Using K-Means Clustering Algorithm

    NASA Astrophysics Data System (ADS)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

    Malware is designed to gain access to or damage a computer system without the user's notice, and attackers exploit it to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the resulting clusters. The proposed K-Means clustering algorithm shows promising results, with high accuracy when tested using the Random Forest algorithm.

  8. Robust continuous clustering

    PubMed Central

    Shah, Sohil Atul

    2017-01-01

    Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838

  9. A new approach to optic disc detection in human retinal images using the firefly algorithm.

    PubMed

    Rahebi, Javad; Hardalaç, Fırat

    2016-03-01

    There are various methods and algorithms to detect the optic discs in retinal images. In recent years, much attention has been given to the utilization of intelligent algorithms. In this paper, we present a new automated method of optic disc detection in human retinal images using the firefly algorithm. The firefly algorithm is an emerging intelligent algorithm that was inspired by the social behavior of fireflies. The population in this algorithm consists of fireflies, each of which has a specific rate of lighting or fitness. In this method, the insects are compared two by two, and the less attractive insects move toward the more attractive insects. Finally, one of the insects is selected as the most attractive, and this insect presents the optimum response to the problem in question. Here, we used the light intensity of the retinal image pixels instead of firefly lightings. The movement of these insects due to local fluctuations produces different light intensity values in the images. Because the optic disc is the brightest area in the retinal images, all of the insects move toward the brightest area and thus specify the location of the optic disc in the image. The results show that the proposed algorithm achieves an accuracy rate of 100% on the DRIVE dataset, 95% on the STARE dataset, and 94.38% on the DiaRetDB1 dataset. These results reveal the high capability and accuracy of the proposed algorithm in detecting the optic disc in retinal images. The recorded average time required to detect the optic disc in these images is 2.13 s for the DRIVE dataset, 2.81 s for the STARE dataset, and 3.52 s for the DiaRetDB1 dataset.

  10. A projection pursuit algorithm to classify individuals using fMRI data: Application to schizophrenia.

    PubMed

    Demirci, Oguz; Clark, Vincent P; Calhoun, Vince D

    2008-02-15

    Schizophrenia is diagnosed based largely upon behavioral symptoms. Currently, no quantitative, biologically based diagnostic technique has yet been developed to identify patients with schizophrenia. Classification of individuals into patient with schizophrenia and healthy control groups based on quantitative biologically based data is of great interest to support and refine psychiatric diagnoses. We applied a novel projection pursuit technique on various components obtained with independent component analysis (ICA) of 70 subjects' fMRI activation maps obtained during an auditory oddball task. The validity of the technique was tested with a leave-one-out method and the detection performance varied between 80% and 90%. The findings suggest that the proposed data reduction algorithm is effective in classifying individuals into schizophrenia and healthy control groups and may eventually prove useful as a diagnostic tool.

  11. Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs.

    PubMed

    Cao, Hongbao; Duan, Junbo; Lin, Dongdong; Shugart, Yin Yao; Calhoun, Vince; Wang, Yu-Ping

    2014-11-15

    Integrative analysis of multiple data types can take advantage of their complementary information and therefore may provide higher power to identify potential biomarkers that would be missed using individual data analysis. Due to different natures of diverse data modality, data integration is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) using weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM model to a joint analysis of two types of schizophrenia data sets: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample-large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim to identify biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by a ten-fold cross validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared with the traditional statistics method for uni-variant data analysis (Chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show stronger capability in distinguishing schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
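
The idea of choosing test and train sets by known subtype can be sketched as a leave-one-subtype-out split. This is a simplified illustration only: the record layout `(item_id, class_label, subtype_label)` and the function name are hypothetical, and the handling of negative examples and the benchmark's actual hierarchy levels are not shown.

```python
from collections import defaultdict


def leave_one_subtype_out(records):
    """Yield one split per (class, subtype) pair.

    records: list of (item_id, class_label, subtype_label) tuples
    (a hypothetical layout). Members of the held-out subtype form the
    positive test set; the other subtypes of the same class form the
    positive training set.
    """
    by_subtype = defaultdict(list)
    for item, cls, sub in records:
        by_subtype[(cls, sub)].append(item)

    for (cls, sub), test_pos in sorted(by_subtype.items()):
        train_pos = [item for item, c, s in records if c == cls and s != sub]
        if not train_pos:
            continue  # a class with a single subtype cannot be split this way
        yield {
            "class": cls,
            "held_out_subtype": sub,
            "train_pos": train_pos,
            "test_pos": test_pos,
        }
```

Unlike random k-fold splitting, no member of the held-out subtype appears in training, so the classifier is always tested on a subtype it has not seen.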

  13. A benchmark for comparison of cell tracking algorithms

    PubMed Central

    Maška, Martin; Ulman, Vladimír; Svoboda, David; Matula, Pavel; Matula, Petr; Ederra, Cristina; Urbiola, Ainhoa; España, Tomás; Venkatesan, Subramanian; Balak, Deepak M.W.; Karas, Pavel; Bolcková, Tereza; Štreitová, Markéta; Carthel, Craig; Coraluppi, Stefano; Harder, Nathalie; Rohr, Karl; Magnusson, Klas E. G.; Jaldén, Joakim; Blau, Helen M.; Dzyubachyk, Oleh; Křížek, Pavel; Hagen, Guy M.; Pastor-Escuredo, David; Jimenez-Carretero, Daniel; Ledesma-Carbayo, Maria J.; Muñoz-Barrutia, Arrate; Meijering, Erik; Kozubek, Michal; Ortiz-de-Solorzano, Carlos

    2014-01-01

    Motivation: Automatic tracking of cells in multidimensional time-lapse fluorescence microscopy is an important task in many biomedical applications. A novel framework for objective evaluation of cell tracking algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2013 Cell Tracking Challenge. In this article, we present the logistics, datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. Results: The main contributions of the challenge include the creation of a comprehensive video dataset repository and the definition of objective measures for comparison and ranking of the algorithms. With this benchmark, six algorithms covering a variety of segmentation and tracking paradigms have been compared and ranked based on their performance on both synthetic and real datasets. Given the diversity of the datasets, we do not declare a single winner of the challenge. Instead, we present and discuss the results for each individual dataset separately. Availability and implementation: The challenge Web site (http://www.codesolorzano.com/celltrackingchallenge) provides access to the training and competition datasets, along with the ground truth of the training videos. It also provides access to Windows and Linux executable files of the evaluation software and most of the algorithms that competed in the challenge. Contact: codesolorzano@unav.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24526711

  14. Learning discriminative functional network features of schizophrenia

    NASA Astrophysics Data System (ADS)

    Gheiratmand, Mina; Rish, Irina; Cecchi, Guillermo; Brown, Matthew; Greiner, Russell; Bashivan, Pouya; Polosecki, Pablo; Dursun, Serdar

    2017-03-01

    Associating schizophrenia with disrupted functional connectivity is a central idea in schizophrenia research. However, identifying neuroimaging-based features that can serve as reliable "statistical biomarkers" of the disease remains a challenging open problem. We argue that generalization accuracy and stability of candidate features ("biomarkers") must be used as additional criteria on top of standard significance tests in order to discover more robust biomarkers. Generalization accuracy refers to the utility of biomarkers for making predictions about individuals, for example discriminating between patients and controls, in novel datasets. Feature stability refers to the reproducibility of the candidate features across different datasets. Here, we extracted functional connectivity network features from fMRI data at both high-resolution (voxel-level) and a spatially down-sampled lower-resolution ("supervoxel" level). At the supervoxel level, we used whole-brain network links, while at the voxel level, due to the intractably large number of features, we sampled a subset of them. We compared statistical significance, stability and discriminative utility of both feature types in a multi-site fMRI dataset, composed of schizophrenia patients and healthy controls. For both feature types, a considerable fraction of features showed significant differences between the two groups. Also, both feature types were similarly stable across multiple data subsets. However, the whole-brain supervoxel functional connectivity features showed a higher cross-validation classification accuracy of 78.7% vs. 72.4% for the voxel-level features. Cross-site variability and heterogeneity in the patient samples in the multi-site FBIRN dataset made the task more challenging compared to single-site studies. 
The use of the above methodology in combination with the fully data-driven approach using the whole brain information has the potential to shed light on "biomarker discovery" in schizophrenia.

  15. A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

    PubMed

    Wang, Xueyi

    2012-02-08

    The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
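
The two stages described above can be sketched as follows. This is a simplified illustration, not the published kMkNN code: it uses a basic k-means loop and the bound d(q, p) >= |d(q, c) - d(p, c)| to skip candidates, without the sorted early-termination details of the paper. All function names are hypothetical.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(points, n_clusters, n_iter=10, seed=0):
    """Buildup stage: plain k-means, then store each point's distance to its center."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[j].append(p)
        # Recompute each center as the mean of its group (keep old center if empty).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
        clusters[j].append((dist(p, centers[j]), p))
    return centers, clusters


def knn_query(query, centers, clusters, k):
    """Searching stage: visit nearest clusters first, prune with the triangle inequality.

    Returns (sorted list of (distance, point), number of distances evaluated).
    """
    best = []  # max-heap of the k best so far, stored as (-distance, point)
    evaluated = 0
    d_centers = [dist(query, c) for c in centers]
    for c in sorted(range(len(centers)), key=lambda i: d_centers[i]):
        for d_pc, p in clusters[c]:
            # Lower bound on d(query, p); skip if it cannot beat the current k-th best.
            if len(best) == k and abs(d_centers[c] - d_pc) >= -best[0][0]:
                continue
            d = dist(query, p)
            evaluated += 1
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best), evaluated
```

The result is exact because a point is skipped only when its lower bound already rules it out. The savings come from the `evaluated` count being smaller than the dataset size.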

  16. A Comparison of Lung Nodule Segmentation Algorithms: Methods and Results from a Multi-institutional Study.

    PubMed

    Kalpathy-Cramer, Jayashree; Zhao, Binsheng; Goldgof, Dmitry; Gu, Yuhua; Wang, Xingwei; Yang, Hao; Tan, Yongqiang; Gillies, Robert; Napel, Sandy

    2016-08-01

    Tumor volume estimation, as well as accurate and reproducible borders segmentation in medical images, are important in the diagnosis, staging, and assessment of response to cancer therapy. The goal of this study was to demonstrate the feasibility of a multi-institutional effort to assess the repeatability and reproducibility of nodule borders and volume estimate bias of computerized segmentation algorithms in CT images of lung cancer, and to provide results from such a study. The dataset used for this evaluation consisted of 52 tumors in 41 CT volumes (40 patient datasets and 1 dataset containing scans of 12 phantom nodules of known volume) from five collections available in The Cancer Imaging Archive. Three academic institutions developing lung nodule segmentation algorithms submitted results for three repeat runs for each of the nodules. We compared the performance of lung nodule segmentation algorithms by assessing several measurements of spatial overlap and volume measurement. Nodule sizes varied from 29 μl to 66 ml and demonstrated a diversity of shapes. Agreement in spatial overlap of segmentations was significantly higher for multiple runs of the same algorithm than between segmentations generated by different algorithms (p < 0.05) and was significantly higher on the phantom dataset compared to the other datasets (p < 0.05). Algorithms differed significantly in the bias of the measured volumes of the phantom nodules (p < 0.05) underscoring the need for assessing performance on clinical data in addition to phantoms. Algorithms that most accurately estimated nodule volumes were not the most repeatable, emphasizing the need to evaluate both their accuracy and precision. There were considerable differences between algorithms, especially in a subset of heterogeneous nodules, underscoring the recommendation that the same software be used at all time points in longitudinal studies.

  17. Cortical Cartography and Caret Software

    PubMed Central

    Van Essen, David C.

    2011-01-01

    Caret software is widely used for analyzing and visualizing many types of fMRI data, often in conjunction with experimental data from other modalities. This article places Caret’s development in a historical context that spans three decades of brain mapping – from the early days of manually generated flat maps to the nascent field of human connectomics. It also highlights some of Caret’s distinctive capabilities. This includes the ease of visualizing data on surfaces and/or volumes and on atlases as well as individual subjects. Caret can display many types of experimental data using various combinations of overlays (e.g., fMRI activation maps, cortical parcellations, areal boundaries), and it has other features that facilitate the analysis and visualization of complex neuroimaging datasets. PMID:22062192

  18. Competitive learning with pairwise constraints.

    PubMed

    Covões, Thiago F; Hruschka, Eduardo R; Ghosh, Joydeep

    2013-01-01

    Constrained clustering has been an active research topic since the last decade. Most studies focus on batch-mode algorithms. This brief introduces two algorithms for on-line constrained learning, named on-line linear constrained vector quantization error (O-LCVQE) and constrained rival penalized competitive learning (C-RPCL). The former is a variant of the LCVQE algorithm for on-line settings, whereas the latter is an adaptation of the (on-line) RPCL algorithm to deal with constrained clustering. The accuracy results--in terms of the normalized mutual information (NMI)--from experiments with nine datasets show that the partitions induced by O-LCVQE are competitive with those found by the (batch-mode) LCVQE. Compared with this formidable baseline algorithm, it is surprising that C-RPCL can provide better partitions (in terms of the NMI) for most of the datasets. Also, experiments on a large dataset show that on-line algorithms for constrained clustering can significantly reduce the computational time.

  19. DIRBoost-an algorithm for boosting deformable image registration: application to lung CT intra-subject registration.

    PubMed

    Muenzing, Sascha E A; van Ginneken, Bram; Viergever, Max A; Pluim, Josien P W

    2014-04-01

    We introduce a boosting algorithm to improve on existing methods for deformable image registration (DIR). The proposed DIRBoost algorithm is inspired by the theory on hypothesis boosting, well known in the field of machine learning. DIRBoost utilizes a method for automatic registration error detection to obtain estimates of local registration quality. All areas detected as erroneously registered are subjected to boosting, i.e. undergo iterative registrations by employing boosting masks on both the fixed and moving image. We validated the DIRBoost algorithm on three different DIR methods (ANTS gSyn, NiftyReg, and DROP) on three independent reference datasets of pulmonary image scan pairs. DIRBoost reduced registration errors significantly and consistently on all reference datasets for each DIR algorithm, yielding an improvement of the registration accuracy by 5-34% depending on the dataset and the registration algorithm employed. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. An improved filtering algorithm for big read datasets and its application to single-cell assembly.

    PubMed

    Wedemeyer, Axel; Kliemann, Lasse; Srivastav, Anand; Schielke, Christian; Reusch, Thorsten B; Rosenstiel, Philip

    2017-07-03

    For single-cell or metagenomic sequencing projects, it is necessary to sequence with a very high mean coverage in order to make sure that all parts of the sample DNA get covered by the reads produced. This leads to huge datasets with lots of redundant data. A filtering of this data prior to assembly is advisable. Brown et al. (2012) presented the algorithm Diginorm for this purpose, which filters reads based on the abundance of their k-mers. We present Bignorm, a faster and quality-conscious read filtering algorithm. An important new algorithmic feature is the use of phred quality scores together with a detailed analysis of the k-mer counts to decide which reads to keep. We qualify and recommend parameters for our new read filtering algorithm. Guided by these parameters, we remove in terms of median 97.15% of the reads while keeping the mean phred score of the filtered dataset high. Using the SPAdes assembler, we produce assemblies of high quality from these filtered datasets in a fraction of the time needed for an assembly from the datasets filtered with Diginorm. We conclude that read filtering is a practical and efficient method for reducing read data and for speeding up the assembly process. This applies not only to single-cell assembly, as shown in this paper, but also to other projects with high mean coverage datasets like metagenomic sequencing projects. Our Bignorm algorithm allows assemblies of competitive quality in comparison to Diginorm, while being much faster. Bignorm is available for download at https://git.informatik.uni-kiel.de/axw/Bignorm .
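
Filtering reads by k-mer abundance can be sketched in the style of Diginorm: a read is kept only if the median count of its k-mers, among reads kept so far, is below a cutoff. This sketch makes several assumptions. It uses an exact in-memory counter rather than a probabilistic counting structure, the `k` and `cutoff` defaults are toy values, and it does not include the phred-score logic that distinguishes Bignorm, since the abstract does not specify it.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k=4, cutoff=2):
    """Keep a read only while its k-mers are still under-represented.

    k and cutoff are illustrative toy values, not recommended parameters.
    """
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k
        # Counter returns 0 for unseen k-mers without inserting them.
        if median(counts[x] for x in ks) < cutoff:
            kept.append(read)
            counts.update(ks)
    return kept
```

For example, `normalize(["ACGTACGT"] * 5 + ["TTTTGGGG"])` keeps the first two copies of the repeated read plus the unique read, and discards the three redundant copies.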

  1. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases, based on three medical datasets: the Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses k-means clustering with the ANOVA statistic to preprocess the data and select the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we chose three classification algorithms (decision tree, Naïve Bayes, and Support Vector Machines) and applied the medical datasets directly to these algorithms. Our methodology gave a much better classification accuracy of 98% on the Arrhythmia dataset, 92% on the Breast cancer dataset, and 88% on the Hepatitis dataset, compared with using the medical data directly with decision tree, Naïve Bayes, and Support Vector Machines. Also, K-ANOVA-SVM achieved better ROC curve and precision results than the other algorithms.

  2. Analysis of Naïve Bayes Algorithm for Email Spam Filtering across Multiple Datasets

    NASA Astrophysics Data System (ADS)

    Fitriah Rusland, Nurul; Wahid, Norfaradilla; Kasim, Shahreen; Hafit, Hanayanti

    2017-08-01

    E-mail spam continues to be a problem on the Internet. Spam e-mail may contain many copies of the same message, commercial advertisements, or other irrelevant posts such as pornographic content. In previous research, different filtering techniques have been used to detect these e-mails, such as Random Forest, Naïve Bayes, Support Vector Machine (SVM), and Neural Network. In this research, we test the Naïve Bayes algorithm for e-mail spam filtering on two datasets, the Spam Data and SPAMBASE datasets [8]. Performance on each dataset is evaluated based on accuracy, recall, precision, and F-measure. Our research uses the WEKA tool to evaluate the Naïve Bayes algorithm for e-mail spam filtering on both datasets. The results show that the type of e-mail and the number of instances in the dataset influence the performance of Naïve Bayes.
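
The four evaluation measures named in this abstract follow from the confusion-matrix counts. The sketch below uses the standard definitions and assumes binary labels with spam as the positive class. The function name is hypothetical, and this is not WEKA's implementation.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

Reporting precision and recall alongside accuracy matters for spam data because the classes are often imbalanced, so accuracy alone can look high even when many spam messages are missed.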

  3. GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.

    PubMed

    Doungpan, Narumol; Engchuan, Worrawat; Chan, Jonathan H; Meechai, Asawin

    2016-12-05

    Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single gene or gene-set biomarkers are inadequate to provide sufficient understanding of complex disease mechanisms and the relationship among those genes. Network-based methods have thus been considered for inferring the interaction within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS), which is capable of handling case-control and multiclass expression for gene biomarker identification, has been proposed, partly taking network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results. The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. For each iteration of expansion, the neighbour genes of a current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using an activity score of a current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interaction as network data, in order to consider the interaction among significant genes, was discussed. 
Classification was performed to compare the performance of the identified gene subnetworks with three subnetwork identification algorithms. The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The identified lung cancer subnetwork using the proposed searching algorithm resulted in an improvement of the cross-dataset validation and an increase in the consistency of findings between two independent datasets. The homogeneity measurement of the datasets was conducted to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance of the proposed algorithms when compared with the greedy search in the original GNFS method. The proposed searching algorithms provide a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the performance of the subnetworks identified from the GSNFS method was improved in terms of classification performance and gene/gene-set level agreement depending on the homogeneity of the datasets used in the analysis. Some common genes obtained from the four datasets using different searching algorithms are genes known to play a role in lung cancer. The improvement of classification performance and the gene/gene-set level agreement, and the biological relevance indicated the effectiveness of the GSNFS method for gene subnetwork identification using expression data.

  4. Neural imaging to track mental states while using an intelligent tutoring system.

    PubMed

    Anderson, John R; Betts, Shawn; Ferris, Jennifer L; Fincham, Jon M

    2010-04-13

    Hemodynamic measures of brain activity can be used to interpret a student's mental state when they are interacting with an intelligent tutoring system. Functional magnetic resonance imaging (fMRI) data were collected while students worked with a tutoring system that taught an algebra isomorph. A cognitive model predicted the distribution of solution times from measures of problem complexity. Separately, a linear discriminant analysis used fMRI data to predict whether or not students were engaged in problem solving. A hidden Markov algorithm merged these two sources of information to predict the mental states of students during problem-solving episodes. The algorithm was trained on data from 1 day of interaction and tested with data from a later day. In terms of predicting what state a student was in during a 2-s period, the algorithm achieved 87% accuracy on the training data and 83% accuracy on the test data. The results illustrate the importance of integrating the bottom-up information from imaging data with the top-down information from a cognitive model.

  5. Parallel group independent component analysis for massive fMRI data sets.

    PubMed

    Chen, Shaojie; Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H; Pekar, James J; Lindquist, Martin A; Eloyan, Ani; Caffo, Brian S

    2017-01-01

    Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively.

  6. A wavelet-based estimator of the degrees of freedom in denoised fMRI time series for probabilistic testing of functional connectivity and brain graphs.

    PubMed

    Patel, Ameera X; Bullmore, Edward T

    2016-11-15

    Connectome mapping using techniques such as functional magnetic resonance imaging (fMRI) has become a focus of systems neuroscience. There remain many statistical challenges in analysis of functional connectivity and network architecture from BOLD fMRI multivariate time series. One key statistic for any time series is its (effective) degrees of freedom, df, which will generally be less than the number of time points (or nominal degrees of freedom, N). If we know the df, then probabilistic inference on other fMRI statistics, such as the correlation between two voxel or regional time series, is feasible. However, we currently lack good estimators of df in fMRI time series, especially after the degrees of freedom of the "raw" data have been modified substantially by denoising algorithms for head movement. Here, we used a wavelet-based method both to denoise fMRI data and to estimate the (effective) df of the denoised process. We show that seed voxel correlations corrected for locally variable df could be tested for false positive connectivity with better control over Type I error and greater specificity of anatomical mapping than probabilistic connectivity maps using the nominal degrees of freedom. We also show that wavelet despiked statistics can be used to estimate all pairwise correlations between a set of regional nodes, assign a P value to each edge, and then iteratively add edges to the graph in order of increasing P. These probabilistically thresholded graphs are likely more robust to regional variation in head movement effects than comparable graphs constructed by thresholding correlations. Finally, we show that time-windowed estimates of df can be used for probabilistic connectivity testing or dynamic network analysis so that apparent changes in the functional connectome are appropriately corrected for the effects of transient noise bursts. 
Wavelet despiking is both an algorithm for fMRI time series denoising and an estimator of the (effective) df of denoised fMRI time series. Accurate estimation of df offers many potential advantages for probabilistically thresholding functional connectivity and network statistics tested in the context of spatially variant and non-stationary noise. Code for wavelet despiking, seed correlational testing and probabilistic graph construction is freely available to download as part of the BrainWavelet Toolbox at www.brainwavelet.org. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
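
The role of the effective degrees of freedom in edge testing, and the construction of a graph by adding edges in order of increasing P, can be sketched as below. The specific test is an assumption: the abstract does not state which statistic the authors use, so the Fisher z-transform with a normal approximation stands in here, and both function names are hypothetical rather than taken from the BrainWavelet Toolbox.

```python
import math


def corr_p_value(r, df):
    """Two-sided P value for H0: rho = 0, given correlation r and df samples.

    Assumes the Fisher z-transform with a normal approximation;
    requires |r| < 1 and df > 3.
    """
    z = math.atanh(r) * math.sqrt(df - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def build_graph(edge_pvalues, alpha):
    """Add edges in order of increasing P until P exceeds alpha.

    edge_pvalues maps an edge, e.g. a (node_a, node_b) tuple, to its P value.
    """
    graph = []
    for edge, p in sorted(edge_pvalues.items(), key=lambda kv: kv[1]):
        if p > alpha:
            break
        graph.append(edge)
    return graph
```

Under this approximation, a correlation of 0.3 is significant at the 0.05 level when tested with a nominal N of 200, but not when the effective df after denoising is 40. This is the kind of difference that df-corrected testing is meant to capture.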

  7. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms.

    PubMed

    Lin, Kuan-Cheng; Hsieh, Yi-Hsiu

    2015-10-01

    The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a problem of feature subset selection, such as combination optimization problems. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to problems of optimization in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.

  8. Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

    PubMed

    Bansal, Ravi; Peterson, Bradley S

    2018-06-01

    Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. 
Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Combining task-evoked and spontaneous activity to improve pre-operative brain mapping with fMRI

    PubMed Central

    Fox, Michael D.; Qian, Tianyi; Madsen, Joseph R.; Wang, Danhong; Li, Meiling; Ge, Manling; Zuo, Huan-cong; Groppe, David M.; Mehta, Ashesh D.; Hong, Bo; Liu, Hesheng

    2016-01-01

    Noninvasive localization of brain function is used to understand and treat neurological disease, exemplified by pre-operative fMRI mapping prior to neurosurgical intervention. The principal approach for generating these maps relies on brain responses evoked by a task and, despite known limitations, has dominated clinical practice for over 20 years. Recently, pre-operative fMRI mapping based on correlations in spontaneous brain activity has been demonstrated; however, this approach has its own limitations and has not seen widespread clinical use. Here we show that spontaneous and task-based mapping can be performed together using the same pre-operative fMRI data, provide complementary information relevant for functional localization, and can be combined to improve identification of eloquent motor cortex. Accuracy, sensitivity, and specificity of our approach are quantified through comparison with electrical cortical stimulation mapping in eight patients with intractable epilepsy. Broad applicability and reproducibility of our approach are demonstrated through prospective replication in an independent dataset of six patients from a different center. In both cohorts and every individual patient, we see a significant improvement in signal to noise and mapping accuracy independent of threshold, quantified using receiver operating characteristic curves. Collectively, our results suggest that modifying the processing of fMRI data to incorporate both task-based and spontaneous activity significantly improves functional localization in pre-operative patients. Because this method requires no additional scan time or modification to conventional pre-operative data acquisition protocols it could have widespread utility. PMID:26408860

  10. The DataBridge: A System For Optimizing The Use Of Dark Data From The Long Tail Of Science

    NASA Astrophysics Data System (ADS)

    Lander, H.; Rajasekar, A.

    2015-12-01

    The DataBridge is a National Science Foundation funded collaborative project (OCI-1247652, OCI-1247602, OCI-1247663) designed to assist in the discovery of dark data sets from the long tail of science. The DataBridge aims to build queryable communities of datasets using sociometric network analysis. This approach is being tested to evaluate the ability to leverage various forms of metadata to facilitate discovery of new knowledge. Each dataset in the DataBridge has an associated name space used as a first level partitioning. In addition to testing known algorithms for SNA community building, the DataBridge project has built a message-based platform that allows users to provide their own algorithms for each of the stages in the community building process. The stages are: Signature Generation (SG): An SG algorithm creates a metadata signature for a dataset. Signature algorithms might use text metadata provided by the dataset creator or derive metadata. Relevance Algorithm (RA): An RA compares a pair of datasets and produces a similarity value between 0 and 1 for the two datasets. Sociometric Network Analysis (SNA): The SNA will operate on a similarity matrix produced by an RA to partition all of the datasets in the name space into a set of clusters. These clusters represent communities of closely related datasets. The DataBridge also includes a web application that produces a visual representation of the clustering. Future work includes a more complete application that will allow different types of searching of the network of datasets. The DataBridge approach is relevant to geoscience research and informatics. In this presentation we will outline the project, illustrate the deployment of the approach, and discuss other potential applications and next steps for the research such as applying this approach to models. In addition we will explore the relevance of DataBridge to other geoscience projects such as various EarthCube Building Blocks and DIBBS projects.
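
    The three stages named in the abstract (SG, RA, SNA) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the DataBridge implementation: the bag-of-words signature, the Jaccard similarity, the connected-components partition, the 0.3 threshold, and the toy dataset names are all hypothetical stand-ins.

```python
from itertools import combinations


def signature(metadata_text):
    """SG stage (hypothetical): a bag-of-words signature from text metadata."""
    return frozenset(metadata_text.lower().split())


def relevance(sig_a, sig_b):
    """RA stage (hypothetical): Jaccard similarity, always between 0 and 1."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0


def communities(names, similarity, threshold=0.3):
    """SNA stage (stand-in): connected components of the thresholded
    similarity graph, computed with a small union-find."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if similarity[(a, b)] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy name space with made-up metadata strings.
datasets = {
    "ocean_temp": "sea surface temperature buoy pacific",
    "ocean_salt": "sea surface salinity buoy pacific",
    "quake_log": "seismic event catalog california",
}
sigs = {name: signature(text) for name, text in datasets.items()}
names = sorted(sigs)
sim = {(a, b): relevance(sigs[a], sigs[b]) for a, b in combinations(names, 2)}
clusters = communities(names, sim)
```

    In this toy name space the two ocean datasets share four of six distinct metadata words and fall into one community, while the seismic catalog stays alone. Any of the three functions could be swapped out independently, which mirrors the per-stage pluggability the abstract describes.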

  11. PyMVPA: A python toolbox for multivariate pattern analysis of fMRI data.

    PubMed

    Hanke, Michael; Halchenko, Yaroslav O; Sederberg, Per B; Hanson, Stephen José; Haxby, James V; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability.

  12. PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data

    PubMed Central

    Hanke, Michael; Halchenko, Yaroslav O.; Sederberg, Per B.; Hanson, Stephen José; Haxby, James V.; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine-learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability. PMID:19184561

  13. Is Rest Really Rest? Resting State Functional Connectivity during Rest and Motor Task Paradigms.

    PubMed

    Jurkiewicz, Michael T; Crawley, Adrian P; Mikulis, David J

    2018-04-18

    Numerous studies have identified the default mode network (DMN) within the brain of healthy individuals, which has been attributed to the ongoing mental activity of the brain during the wakeful resting-state. While engaged during specific resting-state fMRI paradigms, it remains unclear as to whether traditional block-design simple movement fMRI experiments significantly influence the default mode network or other areas. Using blood-oxygen level dependent (BOLD) fMRI we characterized the pattern of functional connectivity in healthy subjects during a resting-state paradigm and compared this to the same resting-state analysis performed on motor task data residual time courses after regressing out the task paradigm. Using seed-voxel analysis to define the DMN, the executive control network (ECN), and sensorimotor, auditory and visual networks, the resting-state analysis of the residual time courses demonstrated reduced functional connectivity in the motor network and reduced connectivity between the insula and the ECN compared to the standard resting-state datasets. Overall, performance of simple self-directed motor tasks does little to change the resting-state functional connectivity across the brain, especially in non-motor areas. This would suggest that previously acquired fMRI studies incorporating simple block-design motor tasks could be mined retrospectively for assessment of the resting-state connectivity.

  14. A wavelet-based statistical analysis of FMRI data: I. motivation and data distribution modeling.

    PubMed

    Dinov, Ivo D; Boscardin, John W; Mega, Michael S; Sowell, Elizabeth L; Toga, Arthur W

    2005-01-01

    We propose a new method for statistical analysis of functional magnetic resonance imaging (fMRI) data. The discrete wavelet transformation is employed as a tool for efficient and robust signal representation. We use structural magnetic resonance imaging (MRI) and fMRI to empirically estimate the distribution of the wavelet coefficients of the data both across individuals and spatial locations. An anatomical subvolume probabilistic atlas is used to tessellate the structural and functional signals into smaller regions each of which is processed separately. A frequency-adaptive wavelet shrinkage scheme is employed to obtain essentially optimal estimations of the signals in the wavelet space. The empirical distributions of the signals on all the regions are computed in a compressed wavelet space. These are modeled by heavy-tail distributions because their histograms exhibit slower tail decay than the Gaussian. We discovered that the Cauchy, Bessel K Forms, and Pareto distributions provide the most accurate asymptotic models for the distribution of the wavelet coefficients of the data. Finally, we propose a new model for statistical analysis of functional MRI data using this atlas-based wavelet space representation. In the second part of our investigation, we will apply this technique to analyze a large fMRI dataset involving repeated presentation of sensory-motor response stimuli in young, elderly, and demented subjects.

  15. External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets.

    PubMed

    Benndorf, Matthias; Burnside, Elizabeth S; Herda, Christoph; Langer, Mathias; Kotter, Elmar

    2015-08-01

    Lesions detected at mammography are described with a highly standardized terminology: the breast imaging-reporting and data system (BI-RADS) lexicon. Up to now, no validated semantic computer assisted classification algorithm exists to interactively link combinations of morphological descriptors from the lexicon to a probabilistic risk estimate of malignancy. The authors therefore aim at the external validation of the mammographic mass diagnosis (MMassDx) algorithm. A classification algorithm like MMassDx must perform well in a variety of clinical circumstances and in datasets that were not used to generate the algorithm in order to ultimately become accepted in clinical routine. The MMassDx algorithm uses a naïve Bayes network and calculates post-test probabilities of malignancy based on two distinct sets of variables, (a) BI-RADS descriptors and age ("descriptor model") and (b) BI-RADS descriptors, age, and BI-RADS assessment categories ("inclusive model"). The authors evaluate both the MMassDx (descriptor) and MMassDx (inclusive) models using two large publicly available datasets of mammographic mass lesions: the digital database for screening mammography (DDSM) dataset, which contains two subsets from the same examinations-a medio-lateral oblique (MLO) view and cranio-caudal (CC) view dataset-and the mammographic mass (MM) dataset. The DDSM contains 1220 mass lesions and the MM dataset contains 961 mass lesions. The authors evaluate discriminative performance using area under the receiver-operating-characteristic curve (AUC) and compare this to the BI-RADS assessment categories alone (i.e., the clinical performance) using the DeLong method. The authors also evaluate whether assigned probabilistic risk estimates reflect the lesions' true risk of malignancy using calibration curves. The authors demonstrate that the MMassDx algorithms show good discriminatory performance. 
AUC for the MMassDx (descriptor) model in the DDSM data is 0.876/0.895 (MLO/CC view) and AUC for the MMassDx (inclusive) model in the DDSM data is 0.891/0.900 (MLO/CC view). AUC for the MMassDx (descriptor) model in the MM data is 0.862 and AUC for the MMassDx (inclusive) model in the MM data is 0.900. In all scenarios, MMassDx performs significantly better than clinical performance, P < 0.05 each. The authors furthermore demonstrate that the MMassDx algorithm systematically underestimates the risk of malignancy in the DDSM and MM datasets, especially when low probabilities of malignancy are assigned. The authors' results reveal that the MMassDx algorithms have good discriminatory performance but less accurate calibration when tested on two independent validation datasets. Improvement in calibration and testing in a prospective clinical population will be important steps in the pursuit of translation of these algorithms to the clinic.
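
    For readers unfamiliar with the AUC figures quoted above, the following sketch shows one standard way to compute the area under the ROC curve: as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case, with ties counted as one half. The scores are made up for illustration and are not taken from the DDSM or MM datasets; the DeLong comparison and the calibration analysis are not reproduced here.

```python
def auc(malignant_scores, benign_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0   # malignant case ranked above benign case
            elif m == b:
                wins += 0.5   # ties count as half a win
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical post-test probabilities of malignancy.
malignant = [0.9, 0.8, 0.4]
benign = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(malignant, benign)
```

    Of the 12 malignant/benign pairs, 11 are ordered correctly, so the example AUC is 11/12. Note that AUC measures discrimination only: a model can rank cases well and still systematically under- or overestimate risk, which is why the abstract reports calibration separately.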

  16. Investigating the enhancement of template-free activation detection of event-related fMRI data using wavelet shrinkage and figures of merit.

    PubMed

    Ngan, Shing-Chung; Hu, Xiaoping; Khong, Pek-Lan

    2011-03-01

    We propose a method for preprocessing event-related functional magnetic resonance imaging (fMRI) data that can lead to enhancement of template-free activation detection. The method is based on using a figure of merit to guide the wavelet shrinkage of a given fMRI data set. Several previous studies have demonstrated that in the root-mean-square error setting, wavelet shrinkage can improve the signal-to-noise ratio of fMRI time courses. However, preprocessing fMRI data in the root-mean-square error setting does not necessarily lead to enhancement of template-free activation detection. Motivated by this observation, in this paper, we move to the detection setting and investigate the possibility of using wavelet shrinkage to enhance template-free activation detection of fMRI data. The main ingredients of our method are (i) forward wavelet transform of the voxel time courses, (ii) shrinking the resulting wavelet coefficients as directed by an appropriate figure of merit, (iii) inverse wavelet transform of the shrunk data, and (iv) submitting these preprocessed time courses to a given activation detection algorithm. Two figures of merit are developed in the paper, and two other figures of merit adapted from the literature are described. Receiver-operating characteristic analyses with simulated fMRI data showed quantitative evidence that data preprocessing as guided by the figures of merit developed in the paper can yield improved detectability of the template-free measures. We also demonstrate the application of our methodology on an experimental fMRI data set. The proposed method is useful for enhancing template-free activation detection in event-related fMRI data. It is of significant interest to extend the present framework to produce comprehensive, adaptive and fully automated preprocessing of fMRI data optimally suited for subsequent data analysis steps. Copyright © 2010 Elsevier B.V. All rights reserved.
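
    Steps (i) to (iii) of the method can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses a Haar wavelet, a fixed soft threshold in place of the figure-of-merit guidance (which the abstract does not specify), and a made-up eight-point time course whose length is a power of two.

```python
import math


def haar_forward(signal):
    """(i) Full Haar decomposition of a length-2^k signal.
    Returns [approximation, coarsest details, ..., finest details]."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


def shrink(coeffs, threshold):
    """(ii) Soft-threshold the detail coefficients; keep the approximation.
    The fixed threshold is a stand-in for a figure-of-merit-driven choice."""
    def soft(c):
        return math.copysign(max(abs(c) - threshold, 0.0), c)
    return coeffs[:1] + [soft(c) for c in coeffs[1:]]


def haar_inverse(coeffs):
    """(iii) Invert haar_forward."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        approx = rebuilt
    return approx


# Hypothetical voxel time course: a step response with small fluctuations.
time_course = [0.1, -0.2, 0.0, 0.1, 2.1, 1.9, 2.0, 2.2]
coeffs = haar_forward(time_course)
roundtrip = haar_inverse(coeffs)
denoised = haar_inverse(shrink(coeffs, threshold=0.3))
```

    Step (iv) would pass `denoised` to whichever activation detection algorithm is in use. Because the approximation coefficient is left untouched, the shrinkage preserves the mean of the time course and only attenuates the detail coefficients.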

  17. Progeny Clustering: A Method to Identify Biological Phenotypes

    PubMed Central

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476

  18. Fast Automatic Segmentation of White Matter Streamlines Based on a Multi-Subject Bundle Atlas.

    PubMed

    Labra, Nicole; Guevara, Pamela; Duclap, Delphine; Houenou, Josselin; Poupon, Cyril; Mangin, Jean-François; Figueroa, Miguel

    2017-01-01

    This paper presents an algorithm for fast segmentation of white matter bundles from massive dMRI tractography datasets using a multisubject atlas. We use a distance metric to compare streamlines in a subject dataset to labeled centroids in the atlas, and label them using a per-bundle configurable threshold. In order to reduce segmentation time, the algorithm first preprocesses the data using a simplified distance metric to rapidly discard candidate streamlines in multiple stages, while guaranteeing that no false negatives are produced. The smaller set of remaining streamlines is then segmented using the original metric, thus eliminating any false positives from the preprocessing stage. As a result, a single-thread implementation of the algorithm can segment a dataset of almost 9 million streamlines in less than 6 minutes. Moreover, parallel versions of our algorithm for multicore processors and graphics processing units further reduce the segmentation time to less than 22 seconds and to 5 seconds, respectively. This performance enables the use of the algorithm in truly interactive applications for visualization, analysis, and segmentation of large white matter tractography datasets.
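
    The two-stage idea (a cheap filter that cannot produce false negatives, followed by the full metric to remove false positives) can be sketched as follows. The abstract does not give the actual metrics, so the ones below are assumptions: the full distance is the maximum Euclidean distance between corresponding points, and the simplified distance looks at the two endpoints only. The sketch also assumes every streamline is resampled to the same number of points and oriented like the centroid.

```python
import math


def full_distance(streamline, centroid):
    """Assumed full metric: maximum distance between corresponding points."""
    return max(math.dist(p, q) for p, q in zip(streamline, centroid))


def cheap_distance(streamline, centroid):
    """Simplified metric: endpoints only. It is a maximum over a subset of
    the same point pairs, so it never exceeds full_distance."""
    return max(math.dist(streamline[0], centroid[0]),
               math.dist(streamline[-1], centroid[-1]))


def segment(streamlines, centroid, threshold):
    """Label streamlines whose full distance to the centroid is within threshold."""
    # Stage 1: discard quickly. Since cheap <= full, anything rejected here
    # would also be rejected by the full metric (no false negatives).
    candidates = [s for s in streamlines
                  if cheap_distance(s, centroid) <= threshold]
    # Stage 2: the full metric removes the false positives from stage 1.
    return [s for s in candidates if full_distance(s, centroid) <= threshold]


centroid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
near = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)]
bulge = [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0), (2.0, 0.0, 0.0)]  # passes stage 1 only
far = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]    # fails stage 1
labeled = segment([near, bulge, far], centroid, threshold=0.5)
```

    The no-false-negative guarantee holds for any simplified metric that is a lower bound on the full one; the endpoint-only choice here is just the simplest such bound.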

  19. Integration of patient specific modeling and advanced image processing techniques for image-guided neurosurgery

    NASA Astrophysics Data System (ADS)

    Archip, Neculai; Fedorov, Andriy; Lloyd, Bryn; Chrisochoides, Nikos; Golby, Alexandra; Black, Peter M.; Warfield, Simon K.

    2006-03-01

    A major challenge in neurosurgery oncology is to achieve maximal tumor removal while avoiding postoperative neurological deficits. Therefore, estimation of the brain deformation during the image guided tumor resection process is necessary. While anatomic MRI is highly sensitive for intracranial pathology, its specificity is limited. Different pathologies may have a very similar appearance on anatomic MRI. Moreover, since fMRI and diffusion tensor imaging are not currently available during the surgery, non-rigid registration of preoperative MR with intra-operative MR is necessary. This article presents a translational research effort that aims to integrate a number of state-of-the-art technologies for MRI-guided neurosurgery at the Brigham and Women's Hospital (BWH). Our ultimate goal is to routinely provide the neurosurgeons with accurate information about brain deformation during the surgery. The current system is tested during the weekly neurosurgeries in the open magnet at the BWH. The preoperative data is processed, prior to the surgery, while both rigid and non-rigid registration algorithms are run in the vicinity of the operating room. The system is tested on 9 image datasets from 3 neurosurgery cases. A method based on edge detection is used to quantitatively validate the results. 95% Hausdorff distance between points of the edges is used to estimate the accuracy of the registration. Overall, the minimum error is 1.4 mm, the mean error 2.23 mm, and the maximum error 3.1 mm. The mean ratio between brain deformation estimation and rigid alignment is 2.07. It demonstrates that our results can be 2.07 times more precise than the current technology. The major contribution of the presented work is the rigid and non-rigid alignment of the pre-operative fMRI with intra-operative 0.5T MRI achieved during the neurosurgery.
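
    The 95% Hausdorff distance mentioned above can be sketched as follows. This is one common definition, assumed here because the abstract does not spell out the variant used: for each point in one edge set, take the distance to the nearest point in the other set, take the 95th percentile of those distances (nearest-rank method), and report the larger of the two directions. The edge points are made up for illustration.

```python
import math


def directed_percentile_distance(a, b, percentile):
    """Percentile (nearest-rank) of nearest-neighbour distances from a to b."""
    nearest = sorted(min(math.dist(p, q) for q in b) for p in a)
    rank = max(1, math.ceil(percentile * len(nearest) / 100))
    return nearest[rank - 1]


def hausdorff_percentile(a, b, percentile=95):
    """Symmetric percentile Hausdorff distance between two point sets."""
    return max(directed_percentile_distance(a, b, percentile),
               directed_percentile_distance(b, a, percentile))


# Two hypothetical edge contours one unit apart, with a single outlier point.
edge_a = [(float(x), 0.0) for x in range(20)]
edge_b = [(float(x), 1.0) for x in range(19)] + [(19.0, 10.0)]

hd95 = hausdorff_percentile(edge_a, edge_b, 95)
hd100 = hausdorff_percentile(edge_a, edge_b, 100)
```

    In this example the classic (100%) Hausdorff distance is dominated by the single outlier and equals 10.0, whereas the 95% version ignores it and returns 1.0. That robustness to a few spurious edge points is the usual reason for preferring the percentile variant when validating registration.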

  20. Establishing the resting state default mode network derived from functional magnetic resonance imaging tasks as an endophenotype: A twins study.

    PubMed

    Korgaonkar, Mayuresh S; Ram, Kaushik; Williams, Leanne M; Gatt, Justine M; Grieve, Stuart M

    2014-08-01

    The resting state default mode network (DMN) has been shown to characterize a number of neurological and psychiatric disorders. Evidence suggests an underlying genetic basis for this network and hence could serve as potential endophenotype for these disorders. Heritability is a defining criterion for endophenotypes. The DMN is measured either using a resting-state functional magnetic resonance imaging (fMRI) scan or by extracting resting state activity from task-based fMRI. The current study is the first to evaluate heritability of this task-derived resting activity. 250 healthy adult twins (79 monozygotic and 46 dizygotic same sex twin pairs) completed five cognitive and emotion processing fMRI tasks. Resting state DMN functional connectivity was derived from these five fMRI tasks. We validated this approach by comparing connectivity estimates from task-derived resting activity for all five fMRI tasks, with those obtained using a dedicated task-free resting state scan in an independent cohort of 27 healthy individuals. Structural equation modeling using the classic twin design was used to estimate the genetic and environmental contributions to variance for the resting-state DMN functional connectivity. About 9-41% of the variance in functional connectivity between the DMN nodes was attributed to genetic contribution with the greatest heritability found for functional connectivity between the posterior cingulate and right inferior parietal nodes (P<0.001). Our data provide new evidence that functional connectivity measures from the intrinsic DMN derived from task-based fMRI datasets are under genetic control and have the potential to serve as endophenotypes for genetically predisposed psychiatric and neurological disorders. Copyright © 2014 Wiley Periodicals, Inc.

  1. Optimization of Contrast Detection Power with Probabilistic Behavioral Information

    PubMed Central

    Cordes, Dietmar; Herzmann, Grit; Nandy, Rajesh; Curran, Tim

    2012-01-01

    Recent progress in the experimental design for event-related fMRI experiments made it possible to find the optimal stimulus sequence for maximum contrast detection power using a genetic algorithm. In this study, a novel algorithm is proposed for optimization of contrast detection power by including probabilistic behavioral information, based on pilot data, in the genetic algorithm. As a particular application, a recognition memory task is studied and the design matrix optimized for contrasts involving the familiarity of individual items (pictures of objects) and the recollection of qualitative information associated with the items (left/right orientation). Optimization of contrast efficiency is a complicated issue whenever subjects’ responses are not deterministic but probabilistic. Contrast efficiencies are not predictable unless behavioral responses are included in the design optimization. However, available software for design optimization does not include options for probabilistic behavioral constraints. If the anticipated behavioral responses are included in the optimization algorithm, the design is optimal for the assumed behavioral responses, and the resulting contrast efficiency is greater than what either a block design or a random design can achieve. Furthermore, improvements of contrast detection power depend strongly on the behavioral probabilities, the perceived randomness, and the contrast of interest. The present genetic algorithm can be applied to any case in which fMRI contrasts are dependent on probabilistic responses that can be estimated from pilot data. PMID:22326984

  2. An advanced algorithm for fetal heart rate estimation from non-invasive low electrode density recordings.

    PubMed

    Dessì, Alessia; Pani, Danilo; Raffo, Luigi

    2014-08-01

    Non-invasive fetal electrocardiography is still an open research issue. The recent publication of an annotated dataset on Physionet providing four-channel non-invasive abdominal ECG traces promoted an international challenge on the topic. Starting from that dataset, an algorithm for the identification of the fetal QRS complexes from a reduced number of electrodes and without any a priori information about the electrode positioning has been developed, entering into the top ten best-performing open-source algorithms presented at the challenge. In this paper, an improved version of that algorithm is presented and evaluated exploiting the same challenge metrics. It is mainly based on the subtraction of the maternal QRS complexes in every lead, obtained by synchronized averaging of morphologically similar complexes, the filtering of the maternal P and T waves and the enhancement of the fetal QRS through independent component analysis (ICA) applied on the processed signals before a final fetal QRS detection stage. The RR time series of both the mother and the fetus are analyzed to enhance pseudoperiodicity with the aim of correcting wrong annotations. The algorithm has been designed and extensively evaluated on the open dataset A (N = 75), and finally evaluated on datasets B (N = 100) and C (N = 272) to have the mean scores over data not used during the algorithm development. Compared to the results achieved by the previous version of the algorithm, the current version would mark the 5th and 4th position in the final ranking related to the events 1 and 2, reserved to the open-source challenge entries, taking into account both official and unofficial entrants. On dataset A, the algorithm achieves 0.982 median sensitivity and 0.976 median positive predictivity.

  3. A Robust Dynamic Heart-Rate Detection Algorithm Framework During Intense Physical Activities Using Photoplethysmographic Signals

    PubMed Central

    Song, Jiajia; Li, Dan; Ma, Xiaoyuan; Teng, Guowei; Wei, Jianming

    2017-01-01

    Dynamic accurate heart-rate (HR) estimation using a photoplethysmogram (PPG) during intense physical activities is always challenging due to corruption by motion artifacts (MAs). It is difficult to reconstruct a clean signal and extract HR from contaminated PPG. This paper proposes a robust HR-estimation algorithm framework that uses one-channel PPG and tri-axis acceleration data to reconstruct the PPG and calculate the HR based on features of the PPG and spectral analysis. Firstly, the signal is judged by the presence of MAs. Then, the spectral peaks corresponding to acceleration data are filtered from the periodogram of the PPG when MAs exist. Different signal-processing methods are applied based on the amount of remaining PPG spectral peaks. The main MA-removal algorithm (NFEEMD) includes the repeated single-notch filter and ensemble empirical mode decomposition. Finally, HR calibration is designed to ensure the accuracy of HR tracking. The NFEEMD algorithm was performed on the 23 datasets from the 2015 IEEE Signal Processing Cup Database. The average estimation errors were 1.12 BPM (12 training datasets), 2.63 BPM (10 testing datasets) and 1.87 BPM (all 23 datasets), respectively. The Pearson correlation was 0.992. The experiment results illustrate that the proposed algorithm is not only suitable for HR estimation during continuous activities, like slow running (13 training datasets), but also for intense physical activities with acceleration, like arm exercise (10 testing datasets). PMID:29068403

  4. Automated identification of drug and food allergies entered using non-standard terminology.

    PubMed

    Epstein, Richard H; St Jacques, Paul; Stockin, Michael; Rothman, Brian; Ehrenfeld, Jesse M; Denny, Joshua C

    2013-01-01

    An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems. An algorithm was developed in Transact-SQL to identify ingredients to which patients had allergies in a perioperative information management system. The algorithm used RxNorm and natural language processing techniques developed on a training set of 24 599 entries from 9445 records. Accuracy, specificity, precision, recall, and F-measure were determined for the training dataset and repeated for the testing dataset (24 857 entries from 9430 records). Accuracy, precision, recall, and F-measure for medication allergy matches were all above 98% in the training dataset and above 97% in the testing dataset for all allergy entries. Corresponding values for food allergy matches were above 97% and above 93%, respectively. Specificities of the algorithm were 90.3% and 85.0% for drug matches and 100% and 88.9% for food matches in the training and testing datasets, respectively. The algorithm had high performance for identification of medication and food allergies. Maintenance is practical, as updates are managed through upload of new RxNorm versions and additions to companion database tables. However, direct entry of codified allergy information by providers (through autocompleters or drop lists) is still preferred to post-hoc encoding of the data. Data tables used in the algorithm are available for download. A high performing, easily maintained algorithm can successfully identify medication and food allergies from free text entries in EHR systems.
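
    The performance measures listed in the abstract above (accuracy, specificity, precision, recall and F-measure) all follow from the four confusion-matrix counts. A minimal sketch, using hypothetical counts rather than the study's data:

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard metrics from true/false positive/negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=90, fp=5, tn=45, fn=10)
```

    The sketch assumes every denominator is non-zero. A production version would need to handle empty classes explicitly.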

  5. Topologic analysis and comparison of brain activation in children with epilepsy versus controls: an fMRI study

    NASA Astrophysics Data System (ADS)

    Oweis, Khalid J.; Berl, Madison M.; Gaillard, William D.; Duke, Elizabeth S.; Blackstone, Kaitlin; Loew, Murray H.; Zara, Jason M.

    2010-03-01

    This paper describes the development of novel computer-aided analysis algorithms to identify the language activation patterns at a certain Region of Interest (ROI) in Functional Magnetic Resonance Imaging (fMRI). Previous analysis techniques have been used to compare typical and pathologic activation patterns in fMRI images resulting from identical tasks but none of them analyzed activation topographically in a quantitative manner. This paper presents new analysis techniques and algorithms capable of identifying a pattern of language activation associated with localization related epilepsy. fMRI images of 64 healthy individuals and 31 patients with localization related epilepsy have been studied and analyzed on an ROI basis. All subjects are right handed with normal MRI scans and have been classified into three age groups (4-6, 7-9, 10-12 years). Our initial efforts have focused on investigating activation in the Left Inferior Frontal Gyrus (LIFG). A number of volumetric features have been extracted from the data. The LIFG has been cut into slices and the activation has been investigated topographically on a slice by slice basis. Overall, a total of 809 features have been extracted, and correlation analysis was applied to eliminate highly correlated features. Principal Component analysis was then applied to account only for major components in the data and One-Way Analysis of Variance (ANOVA) has been applied to test for significantly different features between normal and patient groups. Twenty-nine features were found to be significantly different (p<0.05) between patient and control groups.

  6. An automated method for identifying artifact in independent component analysis of resting-state FMRI.

    PubMed

    Bhaganagarapu, Kaushik; Jackson, Graeme D; Abbott, David F

    2013-01-01

    An enduring issue with data-driven analysis and filtering methods is the interpretation of results. To assist, we present an automatic method for identification of artifact in independent components (ICs) derived from functional MRI (fMRI). The method was designed with the following features: does not require temporal information about an fMRI paradigm; does not require the user to train the algorithm; requires only the fMRI images (additional acquisition of anatomical imaging not required); is able to identify a high proportion of artifact-related ICs without removing components that are likely to be of neuronal origin; can be applied to resting-state fMRI; is automated, requiring minimal or no human intervention. We applied the method to a MELODIC probabilistic ICA of resting-state functional connectivity data acquired in 50 healthy control subjects, and compared the results to a blinded expert manual classification. The method identified between 26 and 72% of the components as artifact (mean 55%). About 0.3% of components identified as artifact were discordant with the manual classification; retrospective examination of these ICs suggested the automated method had correctly identified these as artifact. We have developed an effective automated method which removes a substantial number of unwanted noisy components in ICA analyses of resting-state fMRI data. Source code of our implementation of the method is available.

  7. An Automated Method for Identifying Artifact in Independent Component Analysis of Resting-State fMRI

    PubMed Central

    Bhaganagarapu, Kaushik; Jackson, Graeme D.; Abbott, David F.

    2013-01-01

    An enduring issue with data-driven analysis and filtering methods is the interpretation of results. To assist, we present an automatic method for identification of artifact in independent components (ICs) derived from functional MRI (fMRI). The method was designed with the following features: does not require temporal information about an fMRI paradigm; does not require the user to train the algorithm; requires only the fMRI images (additional acquisition of anatomical imaging not required); is able to identify a high proportion of artifact-related ICs without removing components that are likely to be of neuronal origin; can be applied to resting-state fMRI; is automated, requiring minimal or no human intervention. We applied the method to a MELODIC probabilistic ICA of resting-state functional connectivity data acquired in 50 healthy control subjects, and compared the results to a blinded expert manual classification. The method identified between 26 and 72% of the components as artifact (mean 55%). About 0.3% of components identified as artifact were discordant with the manual classification; retrospective examination of these ICs suggested the automated method had correctly identified these as artifact. We have developed an effective automated method which removes a substantial number of unwanted noisy components in ICA analyses of resting-state fMRI data. Source code of our implementation of the method is available. PMID:23847511

  8. Sparse representation of whole-brain fMRI signals for identification of functional networks.

    PubMed

    Lv, Jinglei; Jiang, Xi; Li, Xiang; Zhu, Dajiang; Chen, Hanbo; Zhang, Tuo; Zhang, Shu; Hu, Xintao; Han, Junwei; Huang, Heng; Zhang, Jing; Guo, Lei; Liu, Tianming

    2015-02-01

    There have been several recent studies that used sparse representation for fMRI signal analysis and activation detection based on the assumption that each voxel's fMRI signal is linearly composed of sparse components. Previous studies have employed sparse coding to model functional networks in various modalities and scales. These prior contributions inspired the exploration of whether/how sparse representation can be used to identify functional networks in a voxel-wise way and on the whole brain scale. This paper presents a novel, alternative methodology of identifying multiple functional networks via sparse representation of whole-brain task-based fMRI signals. Our basic idea is that all fMRI signals within the whole brain of one subject are aggregated into a big data matrix, which is then factorized into an over-complete dictionary basis matrix and a reference weight matrix via an effective online dictionary learning algorithm. Our extensive experimental results have shown that this novel methodology can uncover multiple functional networks that can be well characterized and interpreted in spatial, temporal and frequency domains based on current brain science knowledge. Importantly, these well-characterized functional network components are quite reproducible in different brains. In general, our methods offer a novel, effective and unified solution to multiple fMRI data analysis tasks including activation detection, de-activation detection, and functional network identification. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

    PubMed

    Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S

    2018-01-01

    Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of functionally coherent groups of genes. On the other hand, a distance among genes can be defined according to their information stored in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes which establishes their functional similarity. A scatter search-based algorithm that optimizes a merit function that integrates GO information is studied in this paper. This merit function uses a term that addresses the information through a GO measure. The effect of two possible different gene pairwise GO measures on the performance of the algorithm is analyzed. Firstly, three well known yeast datasets with approximately one thousand genes are studied. Secondly, a group of human datasets related to clinical data of cancer is also explored by the algorithm. Most of these data are high-dimensional datasets composed of a huge number of genes. The resultant biclusters reveal groups of genes linked by the same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer disease perspective. It can be concluded that the integration of biological information improves the performance of the biclustering process. The two different GO measures studied show an improvement in the results obtained for the yeast dataset. However, if datasets are composed of a huge number of genes, only one of them really improves the algorithm performance. This second case constitutes a clear option to explore interesting datasets from a clinical point of view.

  10. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare.

    PubMed

    Mozaffari-Kermani, Mehran; Sur-Kolay, Susmita; Raghunathan, Anand; Jha, Niraj K

    2015-11-01

    Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus, any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. Hindrance of a diagnosis may have life-threatening consequences and could cause distrust. On the other hand, not only may a false diagnosis prompt users to distrust the machine-learning algorithm and even abandon the entire system but also such a false positive classification may cause patient distress. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data, which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class), or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at a lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks that are based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.

  11. Fault Tolerant Frequent Pattern Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan

    FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivotal to consider fault tolerant FP-Growth, which can address the increasing fault rates in large scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and MPI advanced features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access, and incurring no memory overhead for checkpointing. We evaluate our FT algorithm on a large scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed 20x average speed-up in comparison to Spark, establishing that a well designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.

  12. Identifying injection drug use and estimating population size of people who inject drugs using healthcare administrative datasets.

    PubMed

    Janjua, Naveed Zafar; Islam, Nazrul; Kuo, Margot; Yu, Amanda; Wong, Stanley; Butt, Zahid A; Gilbert, Mark; Buxton, Jane; Chapinal, Nuria; Samji, Hasina; Chong, Mei; Alvarez, Maria; Wong, Jason; Tyndall, Mark W; Krajden, Mel

    2018-05-01

    Large linked healthcare administrative datasets could be used to monitor programs providing prevention and treatment services to people who inject drugs (PWID). However, diagnostic codes in administrative datasets do not differentiate non-injection from injection drug use (IDU). We validated algorithms based on diagnostic codes and prescription records representing IDU in administrative datasets against interview-based IDU data. The British Columbia Hepatitis Testers Cohort (BC-HTC) includes ∼1.7 million individuals tested for HCV/HIV or reported HBV/HCV/HIV/tuberculosis cases in BC from 1990 to 2015, linked to administrative datasets including physician visit, hospitalization and prescription drug records. IDU, assessed through interviews as part of enhanced surveillance at the time of HIV or HCV/HBV diagnosis from a subset of cases included in the BC-HTC (n = 6559), was used as the gold standard. ICD-9/ICD-10 codes for IDU and injecting-related infections (IRI) were grouped with records of opioid substitution therapy (OST) into multiple IDU algorithms in administrative datasets. We assessed the performance of IDU algorithms through calculation of sensitivity, specificity, positive predictive, and negative predictive values. Sensitivity was highest (90-94%), and specificity was lowest (42-73%) for algorithms based either on IDU or IRI and drug misuse codes. Algorithms requiring both drug misuse and IRI had lower sensitivity (57-60%) and higher specificity (90-92%). An optimal sensitivity and specificity combination was found with two medical visits or a single hospitalization for injectable drugs with (83%/82%) and without OST (78%/83%), respectively. Based on algorithms that included two medical visits, a single hospitalization or OST records, there were 41,358 (1.2% of individuals aged 11-65 years in BC) recent PWID in BC based on health encounters during a 3-year period (2013-2015). Algorithms for identifying PWID using diagnostic codes in linked administrative data could be used for tracking the progress of programming aimed at PWID. With population-based datasets, this tool can be used to inform much needed estimates of PWID population size. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Biclustering sparse binary genomic data.

    PubMed

    van Uitert, Miranda; Meuleman, Wouter; Wessels, Lodewyk

    2008-12-01

    Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, varying from many rows to few columns and few rows to many columns. It allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.

  14. CIFAR10-DVS: An Event-Stream Dataset for Object Classification

    PubMed Central

    Li, Hongmin; Liu, Hanchao; Ji, Xiangyang; Li, Guoqi; Shi, Luping

    2017-01-01

    Neuromorphic vision research requires high-quality and appropriately challenging event-stream datasets to support continuous improvement of algorithms and methods. However, creating event-stream datasets is a time-consuming task, which needs to be recorded using the neuromorphic cameras. Currently, there are limited event-stream datasets available. In this work, by utilizing the popular computer vision dataset CIFAR-10, we converted 10,000 frame-based images into 10,000 event streams using a dynamic vision sensor (DVS), providing an event-stream dataset of intermediate difficulty in 10 different classes, named as “CIFAR10-DVS.” The conversion of event-stream dataset was implemented by a repeated closed-loop smooth (RCLS) movement of frame-based images. Unlike the conversion of frame-based images by moving the camera, the image movement is more realistic in respect of its practical applications. The repeated closed-loop image movement generates rich local intensity changes in continuous time which are quantized by each pixel of the DVS camera to generate events. Furthermore, a performance benchmark in event-driven object classification is provided based on state-of-the-art classification algorithms. This work provides a large event-stream dataset and an initial benchmark for comparison, which may boost algorithm developments in event-driven pattern recognition and object classification. PMID:28611582

  15. CIFAR10-DVS: An Event-Stream Dataset for Object Classification.

    PubMed

    Li, Hongmin; Liu, Hanchao; Ji, Xiangyang; Li, Guoqi; Shi, Luping

    2017-01-01

    Neuromorphic vision research requires high-quality and appropriately challenging event-stream datasets to support continuous improvement of algorithms and methods. However, creating event-stream datasets is a time-consuming task, which needs to be recorded using the neuromorphic cameras. Currently, there are limited event-stream datasets available. In this work, by utilizing the popular computer vision dataset CIFAR-10, we converted 10,000 frame-based images into 10,000 event streams using a dynamic vision sensor (DVS), providing an event-stream dataset of intermediate difficulty in 10 different classes, named as "CIFAR10-DVS." The conversion of event-stream dataset was implemented by a repeated closed-loop smooth (RCLS) movement of frame-based images. Unlike the conversion of frame-based images by moving the camera, the image movement is more realistic in respect of its practical applications. The repeated closed-loop image movement generates rich local intensity changes in continuous time which are quantized by each pixel of the DVS camera to generate events. Furthermore, a performance benchmark in event-driven object classification is provided based on state-of-the-art classification algorithms. This work provides a large event-stream dataset and an initial benchmark for comparison, which may boost algorithm developments in event-driven pattern recognition and object classification.

  16. Aerosol Climate Time Series in ESA Aerosol_cci

    NASA Astrophysics Data System (ADS)

    Popp, Thomas; de Leeuw, Gerrit; Pinnock, Simon

    2016-04-01

    Within the ESA Climate Change Initiative (CCI) Aerosol_cci (2010 - 2017) conducts intensive work to improve algorithms for the retrieval of aerosol information from European sensors. Meanwhile, full mission time series of 2 GCOS-required aerosol parameters are completely validated and released: Aerosol Optical Depth (AOD) from dual view ATSR-2 / AATSR radiometers (3 algorithms, 1995 - 2012), and stratospheric extinction profiles from star occultation GOMOS spectrometer (2002 - 2012). Additionally, a 35-year multi-sensor time series of the qualitative Absorbing Aerosol Index (AAI) together with sensitivity information and an AAI model simulator is available. Complementary aerosol properties requested by GCOS are in a "round robin" phase, where various algorithms are inter-compared: fine mode AOD, mineral dust AOD (from the thermal IASI spectrometer, but also from ATSR instruments and the POLDER sensor), absorption information and aerosol layer height. As a quasi-reference for validation in a few selected regions with sparse ground-based observations the multi-pixel GRASP algorithm for the POLDER instrument is used. Validation of first dataset versions (vs. AERONET, MAN) and inter-comparison to other satellite datasets (MODIS, MISR, SeaWIFS) proved the high quality of the available datasets comparable to other satellite retrievals and revealed needs for algorithm improvement (for example for higher AOD values) which were taken into account for a reprocessing. The datasets contain pixel level uncertainty estimates which were also validated and improved in the reprocessing. For the three ATSR algorithms the use of an ensemble method was tested. The paper will summarize and discuss the status of dataset reprocessing and validation. The focus will be on the ATSR, GOMOS and IASI datasets. Pixel level uncertainties validation will be summarized and discussed including unknown components and their potential usefulness and limitations. Opportunities for time series extension with successor instruments of the Sentinel family will be described and the complementarity of the different satellite aerosol products (e.g. dust vs. total AOD, ensembles from different algorithms for the same sensor) will be discussed.

  17. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

    PubMed

    Goldstein, Markus; Uchida, Seiichi

    2016-01-01

    Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, computational effort, the impact of parameter settings as well as the global/local anomaly detection behavior are outlined. As a conclusion, we give advice on algorithm selection for typical real-world tasks.
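
    As an illustration of unsupervised anomaly detection on unlabeled data, the sketch below scores each point by its mean distance to its k nearest neighbours. This is one common distance-based approach and is shown only as an example. The abstract above does not name the 19 algorithms evaluated, so the function and the toy data are assumptions.

```python
import math


def knn_anomaly_scores(points, k):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Four clustered points and one isolated point (hypothetical data, no labels used).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_anomaly_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

    Higher scores indicate points that lie far from their neighbours. The choice of `k` is one of the parameter settings whose impact such an evaluation would need to examine.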

  18. Rare itemsets mining algorithm based on RP-Tree and spark framework

    NASA Astrophysics Data System (ADS)

    Liu, Sainan; Pan, Haoan

    2018-05-01

    To address rare itemset mining in big data, this paper proposes a rare itemsets mining algorithm based on RP-Tree and the Spark framework. First, it arranges the data vertically according to the transaction identifier; to avoid scanning the entire data set, the vertical datasets are divided into frequent vertical datasets and rare vertical datasets. Then, it adopts the RP-Tree algorithm to construct the frequent pattern tree that contains rare items and generates rare 1-itemsets. After that, it calculates the support of the itemsets by scanning the two vertical datasets. Finally, it uses an iterative process to generate rare itemsets. The experimental results show that the algorithm can effectively mine rare itemsets and has a clear advantage in execution time.
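
    The vertical data layout and support counting mentioned in the abstract above can be sketched in a few lines. This example covers only those two ideas. It does not implement RP-Tree construction or use Spark, and the transactions and the `min_support` threshold are hypothetical.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def support(itemset, vertical):
    """Support of an itemset = size of the intersection of its items' tid sets."""
    tid_sets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tid_sets)) if tid_sets else 0


transactions = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
    4: {"b"},
    5: {"a", "d"},
}
vertical = to_vertical(transactions)
min_support = 3  # hypothetical threshold separating frequent from rare items
frequent_items = {item for item, tids in vertical.items() if len(tids) >= min_support}
rare_items = set(vertical) - frequent_items
```

    Because each item carries its own tid set, the support of a candidate itemset is obtained by intersecting sets rather than rescanning every transaction.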

  19. Joint Blind Source Separation by Multi-set Canonical Correlation Analysis

    PubMed Central

    Li, Yi-Ou; Adalı, Tülay; Wang, Wei; Calhoun, Vince D

    2009-01-01

    In this work, we introduce a simple and effective scheme to achieve joint blind source separation (BSS) of multiple datasets using multi-set canonical correlation analysis (M-CCA) [1]. We first propose a generative model of joint BSS based on the correlation of latent sources within and between datasets. We specify source separability conditions, and show that, when the conditions are satisfied, the group of corresponding sources from each dataset can be jointly extracted by M-CCA through maximization of correlation among the extracted sources. We compare source separation performance of the M-CCA scheme with other joint BSS methods and demonstrate the superior performance of the M-CCA scheme in achieving joint BSS for a large number of datasets, group of corresponding sources with heterogeneous correlation values, and complex-valued sources with circular and non-circular distributions. We apply M-CCA to analysis of functional magnetic resonance imaging (fMRI) data from multiple subjects and show its utility in estimating meaningful brain activations from a visuomotor task. PMID:20221319

  20. Image fusion using sparse overcomplete feature dictionaries

    DOEpatents

    Brumby, Steven P.; Bettencourt, Luis; Kenyon, Garrett T.; Chartrand, Rick; Wohlberg, Brendt

    2015-10-06

    Approaches for deciding what individuals in a population of visual system "neurons" are looking for using sparse overcomplete feature dictionaries are provided. A sparse overcomplete feature dictionary may be learned for an image dataset and a local sparse representation of the image dataset may be built using the learned feature dictionary. A local maximum pooling operation may be applied on the local sparse representation to produce a translation-tolerant representation of the image dataset. An object may then be classified and/or clustered within the translation-tolerant representation of the image dataset using a supervised classification algorithm and/or an unsupervised clustering algorithm.
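
    The local maximum pooling operation mentioned in the abstract above can be illustrated with a small sketch. It assumes non-overlapping square windows over a single 2-D map of feature activations. The patent's actual pooling scheme is not specified in the abstract, so the window layout and the example maps are assumptions.

```python
def local_max_pool(feature_map, size):
    """Non-overlapping size x size max pooling over a 2-D list of activations."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c]
                for r in range(r0, min(r0 + size, rows))
                for c in range(c0, min(c0 + size, cols))
            )
            for c0 in range(0, cols, size)
        ]
        for r0 in range(0, rows, size)
    ]


# Two hypothetical activation maps: the second is the first with each
# activation shifted by one pixel inside the same pooling window.
original = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 3]]
shifted = [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 3, 0]]
pooled_original = local_max_pool(original, size=2)
pooled_shifted = local_max_pool(shifted, size=2)
```

    Both maps pool to the same output, which is the sense in which the pooled representation tolerates small translations.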

  1. Bayesian spatiotemporal model of fMRI data using transfer functions.

    PubMed

    Quirós, Alicia; Diez, Raquel Montes; Wilson, Simon P

    2010-09-01

    This research describes a new Bayesian spatiotemporal model to analyse BOLD fMRI studies. In the temporal dimension, we describe the shape of the hemodynamic response function (HRF) with a transfer function model. The spatial continuity and local homogeneity of the evoked responses are modelled by a Gaussian Markov random field prior on the parameter indicating activations. The proposal constitutes an extension of the spatiotemporal model presented in a previous approach [Quirós, A., Montes Diez, R. and Gamerman, D., 2010. Bayesian spatiotemporal model of fMRI data, Neuroimage, 49: 442-456], offering more flexibility in the estimation of the HRF and computational advantages in the resulting MCMC algorithm. Simulations from the model are performed in order to ascertain the performance of the sampling scheme and the ability of the posterior to estimate model parameters, as well as to check the model sensitivity to signal to noise ratio. Results are shown on synthetic data and on a real data set from a block-design fMRI experiment. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  2. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    NASA Astrophysics Data System (ADS)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  3. MoleculeNet: a benchmark for molecular machine learning (Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc02664a)

    PubMed Central

    Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl

    2017-01-01

    Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm. PMID:29629118

  4. A Novel Anti-classification Approach for Knowledge Protection.

    PubMed

    Lin, Chen-Yi; Chen, Tung-Shou; Tsai, Hui-Fang; Lee, Wei-Bin; Hsu, Tien-Yu; Kao, Yuan-Hung

    2015-10-01

    Classification is the problem of identifying to which of a set of categories new data belong, on the basis of a set of training data whose category membership is known. It is widely applied, for example in the medical science domain. The protection of classification knowledge has received increasing attention in recent years because of the popularity of cloud environments. In this paper, we propose a Shaking Sorted-Sampling (triple-S) algorithm for protecting the classification knowledge of a dataset. The triple-S algorithm sorts the data of an original dataset according to the projection results of principal component analysis so that the features of adjacent data are similar. Then, we generate noise data with incorrect classes and add those data to the original dataset. In addition, we develop an effective positioning strategy, which determines the positions at which noise data are added to the original dataset, to ensure that the original dataset can be restored after those noise data are removed. The experimental results show that the disturbance effect of the triple-S algorithm on the CLC, MySVM, and LibSVM classifiers increases when the noise data ratio increases. In addition, compared with existing methods, the disturbance effect of the triple-S algorithm is more significant on MySVM and LibSVM once a certain amount of noise data has been added to the original dataset.

  5. Gaussian diffusion sinogram inpainting for X-ray CT metal artifact reduction.

    PubMed

    Peng, Chengtao; Qiu, Bensheng; Li, Ming; Guan, Yihui; Zhang, Cheng; Wu, Zhongyi; Zheng, Jian

    2017-01-05

    Metal objects implanted in the bodies of patients usually generate severe streaking artifacts in reconstructed images of X-ray computed tomography, which degrade the image quality and affect the diagnosis of disease. Therefore, it is essential to reduce these artifacts to meet the clinical demands. In this work, we propose a Gaussian diffusion sinogram inpainting metal artifact reduction algorithm based on prior images to reduce these artifacts for fan-beam computed tomography reconstruction. In this algorithm, prior information that originated from a tissue-classified prior image is used for the inpainting of metal-corrupted projections, and it is incorporated into a Gaussian diffusion function. The prior knowledge is particularly designed to locate the diffusion position and improve the sparsity of the subtraction sinogram, which is obtained by subtracting the prior sinogram of the metal regions from the original sinogram. The sinogram inpainting algorithm is implemented through an approach of diffusing prior energy and is then solved by gradient descent. The performance of the proposed metal artifact reduction algorithm is compared with two conventional metal artifact reduction algorithms, namely the interpolation metal artifact reduction algorithm and normalized metal artifact reduction algorithm. The experimental datasets used included both simulated and clinical datasets. By evaluating the results subjectively, the proposed metal artifact reduction algorithm causes fewer secondary artifacts than the two conventional metal artifact reduction algorithms, which lead to severe secondary artifacts resulting from impertinent interpolation and normalization. Additionally, the objective evaluation shows the proposed approach has the smallest normalized mean absolute deviation and the highest signal-to-noise ratio, indicating that the proposed method has produced the image with the best quality. 
For both the simulated and the clinical datasets, the proposed algorithm clearly reduced the metal artifacts.

  6. NIRS-SPM: statistical parametric mapping for near infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul

    2008-02-01

    Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using the GLM and makes inferences based on the excursion probability of the random field that is interpolated from the sparse measurements. In order to obtain correct inference, NIRS-SPM offers pre-coloring and pre-whitening methods for temporal correlation estimation. For simultaneous recording of NIRS signals with fMRI, the spatial mapping between the fMRI image and the real coordinates in a 3-D digitizer is estimated using Horn's algorithm. These powerful tools allow super-resolution localization of brain activation, which is not possible using conventional NIRS analysis tools.

  7. Classification of fMRI independent components using IC-fingerprints and support vector machine classifiers.

    PubMed

    De Martino, Federico; Gentile, Francesco; Esposito, Fabrizio; Balsi, Marco; Di Salle, Francesco; Goebel, Rainer; Formisano, Elia

    2007-01-01

    We present a general method for the classification of independent components (ICs) extracted from functional MRI (fMRI) data sets. The method consists of two steps. In the first step, each fMRI-IC is associated with an IC-fingerprint, i.e., a representation of the component in a multidimensional space of parameters. These parameters are post hoc estimates of global properties of the ICs and are largely independent of a specific experimental design and stimulus timing. In the second step a machine learning algorithm automatically separates the IC-fingerprints into six general classes after preliminary training performed on a small subset of expert-labeled components. We illustrate this approach in a multisubject fMRI study employing visual structure-from-motion stimuli encoding faces and control random shapes. We show that: (1) IC-fingerprints are a valuable tool for the inspection, characterization and selection of fMRI-ICs and (2) automatic classifications of fMRI-ICs in new subjects present a high correspondence with those obtained by expert visual inspection of the components. Importantly, our classification procedure highlights several neurophysiologically interesting processes. The most intriguing of which is reflected, with high intra- and inter-subject reproducibility, in one IC exhibiting a transiently task-related activation in the 'face' region of the primary sensorimotor cortex. This suggests that in addition to or as part of the mirror system, somatotopic regions of the sensorimotor cortex are involved in disambiguating the perception of a moving body part. Finally, we show that the same classification algorithm can be successfully applied, without re-training, to fMRI collected using acquisition parameters, stimulation modality and timing considerably different from those used for training.

  8. Multi-subject hierarchical inverse covariance modelling improves estimation of functional brain networks.

    PubMed

    Colclough, Giles L; Woolrich, Mark W; Harrison, Samuel J; Rojas López, Pedro A; Valdes-Sosa, Pedro A; Smith, Stephen M

    2018-05-07

    A Bayesian model for sparse, hierarchical, inverse-covariance estimation is presented, and applied to multi-subject functional connectivity estimation in the human brain. It enables simultaneous inference of the strength of connectivity between brain regions at both subject and population level, and is applicable to fMRI, MEG and EEG data. Two versions of the model can encourage sparse connectivity, either using continuous priors to suppress irrelevant connections, or using an explicit description of the network structure to estimate the connection probability between each pair of regions. A large evaluation of this model, and thirteen methods that represent the state of the art of inverse covariance modelling, is conducted using both simulated and resting-state functional imaging datasets. Our novel Bayesian approach has similar performance to the best extant alternative, Ng et al.'s Sparse Group Gaussian Graphical Model algorithm, which also is based on a hierarchical structure. Using data from the Human Connectome Project, we show that these hierarchical models are able to reduce the measurement error in MEG beta-band functional networks by 10%, producing concomitant increases in estimates of the genetic influence on functional connectivity. Copyright © 2018. Published by Elsevier Inc.

  9. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    PubMed

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. 
Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of PWM (position weight matrix) motif model.

  10. Decision tree methods: applications for classification and prediction.

    PubMed

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
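    As an illustration of how a tree carves a population into segments, the following is a minimal sketch of a single split chosen by Gini impurity, the criterion commonly associated with CART. It is not taken from the paper: the function names, the one-feature setup, and the toy data are assumptions made for this example, and pruning or validation-based choice of tree size is not shown.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    Returns (None, gini(labels)) when no split improves on the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one covariate (age) and a binary outcome.
ages = [22, 25, 30, 45, 52, 60]
outcome = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, outcome)
print(threshold, impurity)
```

    A full tree would apply the same search recursively to each branch and then use held-out validation data to decide how deep to grow.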

  11. Attribute Utility Motivated k-anonymization of Datasets to Support the Heterogeneous Needs of Biomedical Researchers

    PubMed Central

    Ye, Huimin; Chen, Elizabeth S.

    2011-01-01

    In order to support the increasing need to share electronic health data for research purposes, various methods have been proposed for privacy preservation including k-anonymity. Many k-anonymity models provide the same level of anonymization regardless of practical need, which may decrease the utility of the dataset for a particular research study. In this study, we explore extensions to the k-anonymity algorithm that aim to satisfy the heterogeneous needs of different researchers while preserving privacy as well as utility of the dataset. The proposed algorithm, Attribute Utility Motivated k-anonymization (AUM), involves analyzing the characteristics of attributes and utilizing them to minimize information loss during the anonymization process. Through comparison with two existing algorithms, Mondrian and Incognito, preliminary results indicate that AUM may preserve more information from original datasets thus providing higher quality results with lower distortion. PMID:22195223
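    For readers unfamiliar with the underlying property, the sketch below checks whether a table is k-anonymous, meaning every combination of quasi-identifier values is shared by at least k records. It illustrates the general definition only, not the AUM algorithm; the function name, field names, and records are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (age bands, truncated ZIP codes).
records = [
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))
print(is_k_anonymous(records, ["age", "zip"], k=3))
```

    Anonymization algorithms such as those compared in the abstract differ in how they generalize or suppress values to reach this property while losing as little information as possible.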

  12. Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI.

    PubMed

    Colas, Jaron T; Pauli, Wolfgang M; Larsen, Tobias; Tyszka, J Michael; O'Doherty, John P

    2017-10-01

    Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models-namely, "actor/critic" models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
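    The two error signals contrasted in this abstract can be written in their textbook temporal-difference forms. The sketch below assumes those standard forms (a state-value error for the critic and a Q-learning error for action values); the models actually fitted in the study may differ, and the state names, discount factor, and learning rate are hypothetical.

```python
def state_value_prediction_error(V, s, r, s_next, gamma=0.9):
    """SVPE: r + gamma * V(s') - V(s); it does not depend on the action taken."""
    return r + gamma * V[s_next] - V[s]


def action_value_prediction_error(Q, s, a, r, s_next, gamma=0.9):
    """AVPE in its Q-learning form: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    return r + gamma * max(Q[s_next].values()) - Q[s][a]


# Hypothetical two-state task: a cue followed by an outcome state.
V = {"cue": 0.0, "outcome": 0.0}
Q = {"cue": {"left": 0.0, "right": 0.0}, "outcome": {"none": 0.0}}
alpha = 0.5  # learning rate (assumed)

# One rewarded trial in which the agent chose "left" at the cue.
svpe = state_value_prediction_error(V, "cue", 1.0, "outcome")
V["cue"] += alpha * svpe

avpe = action_value_prediction_error(Q, "cue", "left", 1.0, "outcome")
Q["cue"]["left"] += alpha * avpe

print(svpe, V["cue"], avpe, Q["cue"])
```

    The state-value update changes the value of the cue regardless of the choice, whereas the action-value update changes only the entry for the chosen action, which is what allows the two signals to be dissociated.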

  13. Optimizing Complexity Measures for fMRI Data: Algorithm, Artifact, and Sensitivity

    PubMed Central

    Rubin, Denis; Fekete, Tomer; Mujica-Parodi, Lilianne R.

    2013-01-01

    Introduction: Complexity in the brain has been well-documented at both neuronal and hemodynamic scales, with increasing evidence supporting its use in sensitively differentiating between mental states and disorders. However, application of complexity measures to fMRI time-series, which are short, sparse, and have low signal/noise, requires careful modality-specific optimization. Methods: Here we use both simulated and real data to address two fundamental issues: choice of algorithm and degree/type of signal processing. Methods were evaluated with regard to resilience to acquisition artifacts common to fMRI as well as detection sensitivity. Detection sensitivity was quantified in terms of grey-white matter contrast and overlap with activation. We additionally investigated the variation of complexity with activation and emotional content, optimal task length, and the degree to which results scaled with scanner using the same paradigm with two 3T magnets made by different manufacturers. Methods for evaluating complexity were: power spectrum, structure function, wavelet decomposition, second derivative, rescaled range, Higuchi’s estimate of fractal dimension, aggregated variance, and detrended fluctuation analysis. To permit direct comparison across methods, all results were normalized to Hurst exponents. Results: Power-spectrum, Higuchi’s fractal dimension, and generalized Hurst exponent based estimates were most successful by all criteria; the poorest-performing measures were wavelet, detrended fluctuation analysis, aggregated variance, and rescaled range. Conclusions: Functional MRI data have artifacts that interact with complexity calculations in nontrivially distinct ways compared to other physiological data (such as EKG, EEG) for which these measures are typically used. 
Our results clearly demonstrate that decisions regarding choice of algorithm, signal processing, time-series length, and scanner have a significant impact on the reliability and sensitivity of complexity estimates. PMID:23700424

  14. Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI

    PubMed Central

    Pauli, Wolfgang M.; Larsen, Tobias; Tyszka, J. Michael; O’Doherty, John P.

    2017-01-01

    Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models—namely, “actor/critic” models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning. PMID:29049406

  15. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    PubMed

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms have proven effective when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Endmember extraction from hyperspectral image based on discrete firefly algorithm (EE-DFA)

    NASA Astrophysics Data System (ADS)

    Zhang, Chengye; Qin, Qiming; Zhang, Tianyuan; Sun, Yuanheng; Chen, Chao

    2017-04-01

    This study proposed a novel method to extract endmembers from hyperspectral image based on discrete firefly algorithm (EE-DFA). Endmembers are the input of many spectral unmixing algorithms. Hence, in this paper, endmember extraction from hyperspectral image is regarded as a combinational optimization problem to get best spectral unmixing results, which can be solved by the discrete firefly algorithm. Two series of experiments were conducted on the synthetic hyperspectral datasets with different SNR and the AVIRIS Cuprite dataset, respectively. The experimental results were compared with the endmembers extracted by four popular methods: the sequential maximum angle convex cone (SMACC), N-FINDR, Vertex Component Analysis (VCA), and Minimum Volume Constrained Nonnegative Matrix Factorization (MVC-NMF). What's more, the effect of the parameters in the proposed method was tested on both synthetic hyperspectral datasets and AVIRIS Cuprite dataset, and the recommended parameters setting was proposed. The results in this study demonstrated that the proposed EE-DFA method showed better performance than the existing popular methods. Moreover, EE-DFA is robust under different SNR conditions.

  17. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data

    PubMed Central

    Goldstein, Markus; Uchida, Seiichi

    2016-01-01

    Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, the computational effort, the impact of parameter settings, and the global/local anomaly detection behavior are outlined. As a conclusion, we give advice on algorithm selection for typical real-world tasks. PMID:27093601

  18. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

    PubMed

    Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up processing by approximately 3.4 times on large-scale datasets, which demonstrates the clear advantage of our method. The proposed algorithm demonstrates both better edge detection performance and improved time performance.
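    To make the thresholding step concrete, here is a minimal sketch of Otsu's method on a histogram, followed by one way of deriving Canny's two hysteresis thresholds from it. Using the Otsu level as the high threshold and half of it as the low threshold is a common heuristic assumed here for illustration; the abstract does not state the exact rule the authors use, and the histogram is hypothetical.

```python
def otsu_threshold(histogram):
    """Return the level t that maximizes between-class variance.

    Levels <= t form one class and levels > t form the other.
    """
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples at or below the candidate level
    sum0 = 0  # sum of levels at or below the candidate level
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def canny_dual_thresholds(histogram, low_ratio=0.5):
    """Assumed heuristic: high = Otsu level, low = low_ratio * high."""
    high = otsu_threshold(histogram)
    return low_ratio * high, high


# Hypothetical histogram of gradient magnitudes quantized to 10 levels.
hist = [5, 5, 5, 0, 0, 0, 0, 5, 5, 5]
low, high = canny_dual_thresholds(hist)
print(low, high)
```

    Because the threshold is computed per image from its own histogram, this step fits naturally into a map task that processes one image at a time.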

  19. Fast and Accurate Support Vector Machines on Large Scale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry

    Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary --- also known as hyperplane --- which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminate the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively --- potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm --- de facto sequential SVM software --- which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as UCI HIGGS Boson dataset and Offending URL dataset.

  20. Towards a robust framework for catchment classification

    NASA Astrophysics Data System (ADS)

    Deshmukh, A.; Samal, A.; Singh, R.

    2017-12-01

    Classification of catchments based on various measures of similarity has emerged as an important technique to understand regional scale hydrologic behavior. Classification of catchment characteristics and/or streamflow response has been used to reveal which characteristics are more likely to explain the observed variability of hydrologic response. However, numerous algorithms for supervised or unsupervised classification are available, making it hard to identify the algorithm most suitable for the dataset at hand. Consequently, existing catchment classification studies vary significantly in the classification algorithms employed, with no previous attempt at understanding the degree of uncertainty in classification due to this algorithmic choice. This hinders the generalizability of interpretations related to hydrologic behavior. Our goal is to develop a protocol that can be followed while classifying hydrologic datasets. We focus on a classification framework for unsupervised classification and provide a step-by-step classification procedure. The steps include testing the clusterability of the original dataset prior to classification, feature selection, validation of clustered data, and quantification of similarity of two clusterings. We test several commonly available methods within this framework to understand the level of similarity of classification results across algorithms. We apply the proposed framework to recently developed datasets for India to analyze to what extent catchment properties can explain observed catchment response. Our testing dataset includes watershed characteristics for over 200 watersheds, which comprise both natural (physio-climatic) characteristics and socio-economic characteristics. This framework allows us to understand the controls on observed hydrologic variability across India.
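    The last step of the procedure, quantifying the similarity of two clusterings, can be illustrated with the Rand index, which counts the fraction of item pairs on which two partitions agree. The abstract does not say which similarity measure the authors use, so the choice of the Rand index, the function name, and the toy labels below are assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs grouped consistently by both clusterings.

    A pair counts as an agreement when both clusterings put the two items
    together, or both put them apart. Cluster label values themselves are
    irrelevant, so relabelled but identical partitions score 1.0.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster assignments for six catchments from two algorithms.
clustering_1 = [0, 0, 1, 1, 2, 2]
clustering_2 = [1, 1, 0, 0, 0, 2]
print(rand_index(clustering_1, clustering_2))
```

    Chance-corrected variants such as the adjusted Rand index are often preferred when the number of clusters differs between algorithms.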

  1. Minimalist ensemble algorithms for genome-wide protein localization prediction.

    PubMed

    Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun

    2012-07-03

    Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.

  2. Minimalist ensemble algorithms for genome-wide protein localization prediction

    PubMed Central

    2012-01-01

    Background: Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results: This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.
    Conclusions: We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391

  3. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    PubMed

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
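
The key idea in the abstract above (initial centroids taken from dense regions and adequately separated in feature space) can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `density_based_init`, the `radius` parameter, the neighbour-count density, and the greedy selection rule are all assumptions made for illustration.

```python
import math


def density_based_init(points, k, radius):
    """Pick k initial centroids deterministically.

    Hypothetical rule (an assumption, not the published algorithm):
    prefer points with many neighbours within `radius`, and skip any
    candidate that lies within `radius` of an already chosen centroid.
    """
    # Density of a point = number of other points within `radius`.
    density = [
        sum(1 for j, q in enumerate(points) if i != j and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    # Visit points from densest to sparsest; ties are broken by index,
    # so repeated runs on the same data give the same result.
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    chosen = []
    for i in order:
        # Accept only candidates that are well separated from earlier picks.
        if all(math.dist(points[i], points[c]) > radius for c in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [points[i] for i in chosen]
```

Because no random numbers are involved, the centroids returned for a given dataset never change between executions, which is the property the abstract argues for. The function may return fewer than `k` centroids if not enough well-separated points exist.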

  4. Highlights of the Version 8 SBUV and TOMS Datasets Released at this Symposium

    NASA Technical Reports Server (NTRS)

    Bhartia, Pawan K.; McPeters, Richard D.; Flynn, Lawrence E.; Wellemeyer, Charles G.

    2004-01-01

    Last October was the 25th anniversary of the launch of the SBUV and TOMS instruments on NASA's Nimbus-7 satellite. Total Ozone and ozone profile datasets produced by these and following instruments have produced a quarter century long record. Over time we have released several versions of these datasets to incorporate advances in UV radiative transfer, inverse modeling, and instrument characterization. In this meeting we are releasing datasets produced from the version 8 algorithms. They replace the previous versions (V6 SBUV, and V7 TOMS) released about a decade ago. About a dozen companion papers in this meeting provide details of the new algorithms and intercomparison of the new data with external data. In this paper we present key features of the new algorithm, and discuss how the new results differ from those released previously. We show that the new datasets have better internal consistency and also agree better with external datasets. A key feature of the V8 SBUV algorithm is that the climatology has no influence on inter-annual variability and trends; it only affects the mean values and, to a limited extent, the seasonal dependence. By contrast, climatology does have some influence on TOMS total O3 trends, particularly at large solar zenith angles. For this reason, and also because TOMS record has gaps, and EP/TOMS is suffering from data quality problems, we recommend using SBUV total ozone data for applications where the high spatial resolution of TOMS is not essential.

  5. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification.

    PubMed

    Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L

    2016-01-01

    An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques together. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed method ultimately outperforms other conventional methods and attains higher credibility with even greater accuracy of the classification model.
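
For orientation, the basic SMOTE step that the method above builds on can be sketched as follows. This shows plain SMOTE only, not ASCB_DmSMOTE: the swarm optimisation, clustering, and under-sampling are omitted. The function name `smote` and the parameters `n_new` (number of synthetic samples) and `k` (number of nearest neighbours) are illustrative; it is an assumption that these correspond to the two SMOTE parameters the abstract refers to.

```python
import math
import random


def smote(minority, n_new, k, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic
```

A rebalancing method in the spirit of the abstract would search over values such as `n_new` and `k` rather than fixing them by hand.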

  6. Separation of pulsar signals from noise using supervised machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Bethapudi, S.; Desai, S.

    2018-04-01

    We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for the cross-validation of the SPINN-based machine learning engine, obtained from the reprocessing of the HTRU-S survey data (Morello et al., 2014). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with high-class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean for both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum Relevance approach to report algorithm-agnostic feature ranking. From these methods, we find the signal to noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in Morello et al. (2014), for the same recall value.
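
The quality scores named above (recall, FPR, G-mean) can be computed from binary labels as in the sketch below. It uses the common definition of G-mean as the geometric mean of recall and specificity; whether the paper uses exactly this definition is an assumption, and the function name `recall_fpr_gmean` is hypothetical.

```python
import math


def recall_fpr_gmean(y_true, y_pred):
    """Return (recall, false positive rate, G-mean) for 0/1 labels.

    Assumes both classes are present in y_true, so neither
    denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn)  # fraction of true positives recovered
    fpr = fp / (fp + tn)     # fraction of negatives wrongly flagged
    gmean = math.sqrt(recall * (1.0 - fpr))  # sqrt(recall * specificity)
    return recall, fpr, gmean
```

Unlike plain accuracy, the G-mean drops to zero if either class is entirely misclassified, which is why it is often reported for heavily imbalanced data.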

  7. Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

    PubMed Central

    Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud

    2015-01-01

    This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309

  8. Meta-Learning Approach for Automatic Parameter Tuning: A Case Study with Educational Datasets

    ERIC Educational Resources Information Center

    Molina, M. M.; Luna, J. M.; Romero, C.; Ventura, S.

    2012-01-01

    This paper proposes the use of a meta-learning approach for automatic parameter tuning of a well-known decision tree algorithm by using past information about algorithm executions. Fourteen educational datasets were analysed using various combinations of parameter values to examine the effects of the parameter values on accuracy classification.…

  9. Identifying autism from neural representations of social interactions: neurocognitive markers of autism.

    PubMed

    Just, Marcel Adam; Cherkassky, Vladimir L; Buchweitz, Augusto; Keller, Timothy A; Mitchell, Tom M

    2014-01-01

    Autism is a psychiatric/neurological condition in which alterations in social interaction (among other symptoms) are diagnosed by behavioral psychiatric methods. The main goal of this study was to determine how the neural representations and meanings of social concepts (such as to insult) are altered in autism. A second goal was to determine whether these alterations can serve as neurocognitive markers of autism. The approach is based on previous advances in fMRI analysis methods that permit (a) the identification of a concept, such as the thought of a physical object, from its fMRI pattern, and (b) the ability to assess the semantic content of a concept from its fMRI pattern. These factor analysis and machine learning methods were applied to the fMRI activation patterns of 17 adults with high-functioning autism and matched controls, scanned while thinking about 16 social interactions. One prominent neural representation factor that emerged (manifested mainly in posterior midline regions) was related to self-representation, but this factor was present only for the control participants, and was near-absent in the autism group. Moreover, machine learning algorithms classified individuals as autistic or control with 97% accuracy from their fMRI neurocognitive markers. The findings suggest that psychiatric alterations of thought can begin to be biologically understood by assessing the form and content of the altered thought's underlying brain activation patterns.

  10. Identifying Autism from Neural Representations of Social Interactions: Neurocognitive Markers of Autism

    PubMed Central

    Just, Marcel Adam; Cherkassky, Vladimir L.; Buchweitz, Augusto; Keller, Timothy A.; Mitchell, Tom M.

    2014-01-01

    Autism is a psychiatric/neurological condition in which alterations in social interaction (among other symptoms) are diagnosed by behavioral psychiatric methods. The main goal of this study was to determine how the neural representations and meanings of social concepts (such as to insult) are altered in autism. A second goal was to determine whether these alterations can serve as neurocognitive markers of autism. The approach is based on previous advances in fMRI analysis methods that permit (a) the identification of a concept, such as the thought of a physical object, from its fMRI pattern, and (b) the ability to assess the semantic content of a concept from its fMRI pattern. These factor analysis and machine learning methods were applied to the fMRI activation patterns of 17 adults with high-functioning autism and matched controls, scanned while thinking about 16 social interactions. One prominent neural representation factor that emerged (manifested mainly in posterior midline regions) was related to self-representation, but this factor was present only for the control participants, and was near-absent in the autism group. Moreover, machine learning algorithms classified individuals as autistic or control with 97% accuracy from their fMRI neurocognitive markers. The findings suggest that psychiatric alterations of thought can begin to be biologically understood by assessing the form and content of the altered thought’s underlying brain activation patterns. PMID:25461818

  11. Effects of stimulants on brain function in attention-deficit/hyperactivity disorder: a systematic review and meta-analysis.

    PubMed

    Rubia, Katya; Alegria, Analucia A; Cubillo, Ana I; Smith, Anna B; Brammer, Michael J; Radua, Joaquim

    2014-10-15

    Psychostimulant medication, most commonly the catecholamine agonist methylphenidate, is the most effective treatment for attention-deficit/hyperactivity disorder (ADHD). However, relatively little is known on the mechanisms of action. Acute effects on brain function can elucidate underlying neurocognitive effects. We tested methylphenidate effects relative to placebo in functional magnetic resonance imaging (fMRI) during three disorder-relevant tasks in medication-naïve ADHD adolescents. In addition, we conducted a systematic review and meta-analysis of the fMRI findings of acute stimulant effects on ADHD brain function. The fMRI study compared 20 adolescents with ADHD under either placebo or methylphenidate in a randomized controlled trial while performing stop, working memory, and time discrimination tasks. The meta-analysis was conducted searching PubMed, ScienceDirect, Web of Knowledge, Google Scholar, and Scopus databases. Peak coordinates of clusters of significant effects of stimulant medication relative to placebo or off medication were extracted for each study. The fMRI analysis showed that methylphenidate significantly enhanced activation in bilateral inferior frontal cortex (IFC)/insula during inhibition and time discrimination but had no effect on working memory networks. The meta-analysis, including 14 fMRI datasets and 212 children with ADHD, showed that stimulants most consistently enhanced right IFC/insula activation, which also remained for a subgroup analysis of methylphenidate effects alone. A more lenient threshold also revealed increased putamen activation. Psychostimulants most consistently increase right IFC/insula activation, which are key areas of cognitive control and also the most replicated neurocognitive dysfunction in ADHD. These neurocognitive effects may underlie their positive clinical effects. © 2013 Society of Biological Psychiatry Published by Society of Biological Psychiatry All rights reserved.

  12. Accuracy of automated classification of major depressive disorder as a function of symptom severity.

    PubMed

    Ramasubbu, Rajamannar; Brown, Matthew R G; Cortese, Filmeno; Gaxiola, Ismael; Goodyear, Bradley; Greenshaw, Andrew J; Dursun, Serdar M; Greiner, Russell

    2016-01-01

    Growing evidence documents the potential of machine learning for developing brain based diagnostic methods for major depressive disorder (MDD). As symptom severity may influence brain activity, we investigated whether the severity of MDD affected the accuracies of machine learned MDD-vs-Control diagnostic classifiers. Forty-five medication-free patients with DSM-IV defined MDD and 19 healthy controls participated in the study. Based on depression severity as determined by the Hamilton Rating Scale for Depression (HRSD), MDD patients were sorted into three groups: mild to moderate depression (HRSD 14-19), severe depression (HRSD 20-23), and very severe depression (HRSD ≥ 24). We collected functional magnetic resonance imaging (fMRI) data during both resting-state and an emotional-face matching task. Patients in each of the three severity groups were compared against controls in separate analyses, using either the resting-state or task-based fMRI data. We use each of these six datasets with linear support vector machine (SVM) binary classifiers for identifying individuals as patients or controls. The resting-state fMRI data showed statistically significant classification accuracy only for the very severe depression group (accuracy 66%, p = 0.012 corrected), while mild to moderate (accuracy 58%, p = 1.0 corrected) and severe depression (accuracy 52%, p = 1.0 corrected) were only at chance. With task-based fMRI data, the automated classifier performed at chance in all three severity groups. Binary linear SVM classifiers achieved significant classification of very severe depression with resting-state fMRI, but the contribution of brain measurements may have limited potential in differentiating patients with less severe depression from healthy controls.

  13. Hippocampal Sharp-Wave Ripples Influence Selective Activation of the Default Mode Network

    PubMed Central

    Kaplan, Raphael; Adhikari, Mohit H.; Hindriks, Rikkert; Mantini, Dante; Murayama, Yusuke; Logothetis, Nikos K.; Deco, Gustavo

    2016-01-01

    Summary The default mode network (DMN) is a commonly observed resting-state network (RSN) that includes medial temporal, parietal, and prefrontal regions involved in episodic memory [1, 2, 3]. The behavioral relevance of endogenous DMN activity remains elusive, despite an emerging literature correlating resting fMRI fluctuations with memory performance [4, 5]—particularly in DMN regions [6, 7, 8]. Mechanistic support for the DMN’s role in memory consolidation might come from investigation of large deflections (sharp-waves) in the hippocampal local field potential that co-occur with high-frequency (>80 Hz) oscillations called ripples—both during sleep [9, 10] and awake deliberative periods [11, 12, 13]. Ripples are ideally suited for memory consolidation [14, 15], since the reactivation of hippocampal place cell ensembles occurs during ripples [16, 17, 18, 19]. Moreover, the number of ripples after learning predicts subsequent memory performance in rodents [20, 21, 22] and humans [23], whereas electrical stimulation of the hippocampus after learning interferes with memory consolidation [24, 25, 26]. A recent study in macaques showed diffuse fMRI neocortical activation and subcortical deactivation specifically after ripples [27]. Yet it is unclear whether ripples and other hippocampal neural events influence endogenous fluctuations in specific RSNs—like the DMN—unitarily. Here, we examine fMRI datasets from anesthetized monkeys with simultaneous hippocampal electrophysiology recordings, where we observe a dramatic increase in the DMN fMRI signal following ripples, but not following other hippocampal electrophysiological events. Crucially, we find increases in ongoing DMN activity after ripples, but not in other RSNs. Our results relate endogenous DMN fluctuations to hippocampal ripples, thereby linking network-level resting fMRI fluctuations with behaviorally relevant circuit-level neural dynamics. PMID:26898464

  14. Effects of Stimulants on Brain Function in Attention-Deficit/Hyperactivity Disorder: A Systematic Review and Meta-Analysis

    PubMed Central

    Rubia, Katya; Alegria, Analucia A.; Cubillo, Ana I.; Smith, Anna B.; Brammer, Michael J.; Radua, Joaquim

    2014-01-01

    Background: Psychostimulant medication, most commonly the catecholamine agonist methylphenidate, is the most effective treatment for attention-deficit/hyperactivity disorder (ADHD). However, relatively little is known on the mechanisms of action. Acute effects on brain function can elucidate underlying neurocognitive effects. We tested methylphenidate effects relative to placebo in functional magnetic resonance imaging (fMRI) during three disorder-relevant tasks in medication-naïve ADHD adolescents. In addition, we conducted a systematic review and meta-analysis of the fMRI findings of acute stimulant effects on ADHD brain function. Methods: The fMRI study compared 20 adolescents with ADHD under either placebo or methylphenidate in a randomized controlled trial while performing stop, working memory, and time discrimination tasks. The meta-analysis was conducted searching PubMed, ScienceDirect, Web of Knowledge, Google Scholar, and Scopus databases. Peak coordinates of clusters of significant effects of stimulant medication relative to placebo or off medication were extracted for each study. Results: The fMRI analysis showed that methylphenidate significantly enhanced activation in bilateral inferior frontal cortex (IFC)/insula during inhibition and time discrimination but had no effect on working memory networks. The meta-analysis, including 14 fMRI datasets and 212 children with ADHD, showed that stimulants most consistently enhanced right IFC/insula activation, which also remained for a subgroup analysis of methylphenidate effects alone. A more lenient threshold also revealed increased putamen activation. Conclusions: Psychostimulants most consistently increase right IFC/insula activation, which are key areas of cognitive control and also the most replicated neurocognitive dysfunction in ADHD. These neurocognitive effects may underlie their positive clinical effects. PMID:24314347

  15. An End-to-End simulator for the development of atmospheric corrections and temperature - emissivity separation algorithms in the TIR spectral domain

    NASA Astrophysics Data System (ADS)

    Rock, Gilles; Fischer, Kim; Schlerf, Martin; Gerhards, Max; Udelhoven, Thomas

    2017-04-01

    The development and optimization of image processing algorithms requires the availability of datasets depicting every step from earth surface to the sensor's detector. The lack of ground truth data makes it necessary to develop algorithms on simulated data. The simulation of hyperspectral remote sensing data is a useful tool for a variety of tasks such as the design of systems, the understanding of the image formation process, and the development and validation of data processing algorithms. An end-to-end simulator has been set up consisting of a forward simulator, a backward simulator and a validation module. The forward simulator derives radiance datasets based on laboratory sample spectra, applies atmospheric contributions using radiative transfer equations, and simulates the instrument response using configurable sensor models. This is followed by the backward simulation branch, consisting of an atmospheric correction (AC), a temperature and emissivity separation (TES) or a hybrid AC and TES algorithm. An independent validation module allows the comparison between input and output dataset and the benchmarking of different processing algorithms. In this study, hyperspectral thermal infrared scenes of a variety of surfaces have been simulated to analyze existing AC and TES algorithms. The ARTEMISS algorithm was optimized and benchmarked against the original implementations. The errors in TES were found to be related to incorrect water vapor retrieval. The atmospheric characterization could be optimized resulting in increasing accuracies in temperature and emissivity retrieval. Airborne datasets of different spectral resolutions were simulated from terrestrial HyperCam-LW measurements. The simulated airborne radiance spectra were subjected to atmospheric correction and TES and further used for a plant species classification study analyzing effects related to noise and mixed pixels.

  16. Alzheimer Classification Using a Minimum Spanning Tree of High-Order Functional Network on fMRI Dataset

    PubMed Central

    Guo, Hao; Liu, Lei; Chen, Junjie; Xu, Yong; Jie, Xiang

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is one of the most useful methods to generate functional connectivity networks of the brain. However, conventional network generation methods ignore dynamic changes of functional connectivity between brain regions. Previous studies proposed constructing high-order functional connectivity networks that consider the time-varying characteristics of functional connectivity, and a clustering method was performed to decrease computational cost. However, random selection of the initial clustering centers and the number of clusters negatively affected classification accuracy, and the network lost neurological interpretability. Here we propose a novel method that introduces the minimum spanning tree method to high-order functional connectivity networks. As an unbiased method, the minimum spanning tree simplifies high-order network structure while preserving its core framework. The dynamic characteristics of time series are not lost with this approach, and the neurological interpretation of the network is guaranteed. Simultaneously, we propose a multi-parameter optimization framework that involves extracting discriminative features from the minimum spanning tree high-order functional connectivity networks. Compared with the conventional methods, our resting-state fMRI classification method based on minimum spanning tree high-order functional connectivity networks greatly improved the diagnostic accuracy for Alzheimer's disease. PMID:29249926
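
As a rough illustration of how a minimum spanning tree reduces a dense connectivity network to its backbone, the sketch below runs Kruskal's algorithm on a symmetric weight matrix. It is not the authors' pipeline: the helper `correlation_to_distance` and its 1 - |r| conversion are assumptions chosen for illustration, and the high-order network construction and feature extraction described in the abstract are not shown.

```python
def correlation_to_distance(corr):
    """Hypothetical conversion: strong correlations become short edges."""
    return [[1.0 - abs(c) for c in row] for row in corr]


def minimum_spanning_tree(weights):
    """Kruskal's algorithm on a symmetric weight matrix.

    Returns the tree as a list of (i, j, weight) edges with i < j,
    in the order they were added (ascending weight).
    """
    n = len(weights)
    parent = list(range(n))  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Consider every undirected edge once, lightest first.
    edges = sorted((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components, so no cycle
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree
```

For a network of n regions the tree always has n - 1 edges, so every subject yields a graph of the same size, and no clustering step with random initial centres is needed.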

  17. Large-Scale functional network overlap is a general property of brain functional organization: Reconciling inconsistent fMRI findings from general-linear-model-based analyses

    PubMed Central

    Xu, Jiansong; Potenza, Marc N.; Calhoun, Vince D.; Zhang, Rubin; Yip, Sarah W.; Wall, John T.; Pearlson, Godfrey D.; Worhunsky, Patrick D.; Garrison, Kathleen A.; Moran, Joseph M.

    2016-01-01

    Functional magnetic resonance imaging (fMRI) studies regularly use univariate general-linear-model-based analyses (GLM). Their findings are often inconsistent across different studies, perhaps because of several fundamental brain properties including functional heterogeneity, balanced excitation and inhibition (E/I), and sparseness of neuronal activities. These properties stipulate heterogeneous neuronal activities in the same voxels and likely limit the sensitivity and specificity of GLM. This paper selectively reviews findings of histological and electrophysiological studies and fMRI spatial independent component analysis (sICA) and reports new findings by applying sICA to two existing datasets. The extant and new findings consistently demonstrate several novel features of brain functional organization not revealed by GLM. They include overlap of large-scale functional networks (FNs) and their concurrent opposite modulations, and no significant modulations in activity of most FNs across the whole brain during any task conditions. These novel features of brain functional organization are highly consistent with the brain’s properties of functional heterogeneity, balanced E/I, and sparseness of neuronal activity, and may help reconcile inconsistent GLM findings. PMID:27592153

  18. Decoding power-spectral profiles from FMRI brain activities during naturalistic auditory experience.

    PubMed

    Hu, Xintao; Guo, Lei; Han, Junwei; Liu, Tianming

    2017-02-01

    Recent studies have demonstrated a close relationship between computational acoustic features and neural brain activities, and have largely advanced our understanding of auditory information processing in the human brain. Along this line, we proposed a multidisciplinary study to examine whether power spectral density (PSD) profiles can be decoded from brain activities during naturalistic auditory experience. The study was performed on a high resolution functional magnetic resonance imaging (fMRI) dataset acquired when participants freely listened to the audio-description of the movie "Forrest Gump". Representative PSD profiles existing in the audio-movie were identified by clustering the audio samples according to their PSD descriptors. Support vector machine (SVM) classifiers were trained to differentiate the representative PSD profiles using corresponding fMRI brain activities. Based on PSD profile decoding, we explored how the neural decodability correlated to power intensity and frequency deviants. Our experimental results demonstrated that PSD profiles can be reliably decoded from brain activities. We also suggested a sigmoidal relationship between the neural decodability and power intensity deviants of PSD profiles. Our study in addition substantiates the feasibility and advantage of naturalistic paradigm for studying neural encoding of complex auditory information.

  19. Sparse network-based models for patient classification using fMRI

    PubMed Central

    Rosa, Maria J.; Portugal, Liana; Hahn, Tim; Fallgatter, Andreas J.; Garrido, Marta I.; Shawe-Taylor, John; Mourao-Miranda, Janaina

    2015-01-01

    Pattern recognition applied to whole-brain neuroimaging data, such as functional Magnetic Resonance Imaging (fMRI), has proved successful at discriminating psychiatric patients from healthy participants. However, predictive patterns obtained from whole-brain voxel-based features are difficult to interpret in terms of the underlying neurobiology. Many psychiatric disorders, such as depression and schizophrenia, are thought to be brain connectivity disorders. Therefore, pattern recognition based on network models might provide deeper insights and potentially more powerful predictions than whole-brain voxel-based approaches. Here, we build a novel sparse network-based discriminative modeling framework, based on Gaussian graphical models and L1-norm regularized linear Support Vector Machines (SVM). In addition, the proposed framework is optimized in terms of both predictive power and reproducibility/stability of the patterns. Our approach aims to provide better pattern interpretation than voxel-based whole-brain approaches by yielding stable brain connectivity patterns that underlie discriminative changes in brain function between the groups. We illustrate our technique by classifying patients with major depressive disorder (MDD) and healthy participants, in two (event- and block-related) fMRI datasets acquired while participants performed a gender discrimination and emotional task, respectively, during the visualization of emotional valent faces. PMID:25463459

  20. A comparison of public datasets for acceleration-based fall detection.

    PubMed

    Igual, Raul; Medrano, Carlos; Plaza, Inmaculada

    2015-09-01

    Falls are one of the leading causes of mortality among the older population, and the rapid detection of a fall is a key factor in mitigating its main adverse health consequences. In this context, several authors have conducted studies on acceleration-based fall detection using external accelerometers or smartphones. The published detection rates are diverse, sometimes close to those of a perfect detector. This divergence may be explained by the difficulty of comparing different fall detection studies fairly, since each study uses its own dataset obtained under different conditions. In this regard, several datasets have been made publicly available recently. This paper presents a comparison, to the best of our knowledge for the first time, of these public fall detection datasets in order to determine whether they have an influence on the declared performances. Using two different detection algorithms, the study shows that the performances of the fall detection techniques are affected, to a greater or lesser extent, by the specific datasets used to validate them. We have also found large differences in the generalization capability of a fall detector depending on the dataset used for training. In fact, the performance decreases dramatically when the algorithms are tested on a dataset different from the one used for training. Other characteristics of the datasets, such as the number of training samples, also have an influence on the performance, while the algorithms seem less sensitive to the sampling frequency or the acceleration range. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.

  1. [Spatial domain display for interference image dataset].

    PubMed

    Wang, Cai-Ling; Li, Yu-Shan; Liu, Xue-Bin; Hu, Bing-Liang; Jing, Juan-Juan; Wen, Jia

    2011-11-01

    Users urgently need visualization of imaging interferometer data for image interpretation and information extraction. However, conventional research on visualization focuses only on the spectral image dataset in the spectral domain. Hence, quick display of the interference spectral image dataset is one of the key steps in interference image processing. Conventional visualization of an interference dataset applies a classical spectral image display method after Fourier transformation. In the present paper, the problem of quick view of interferometer imagery in the image domain is addressed, and an algorithm is proposed that simplifies the matter. The Fourier transformation is an obstacle because its computation time is very large, and the situation deteriorates further as the size of the dataset increases. The proposed algorithm, named interference weighted envelopes, frees the dataset from the transformation. The authors choose three interference weighted envelopes based respectively on the Fourier transformation, features of interference data, and the human visual system. A comparison of the proposed and conventional methods shows a huge difference in display time.

  2. Effects of ageing and Alzheimer disease on haemodynamic response function: a challenge for event-related fMRI.

    PubMed

    Asemani, Davud; Morsheddost, Hassan; Shalchy, Mahsa Alizadeh

    2017-06-01

    Functional magnetic resonance imaging (fMRI) can generate brain images that show neuronal activity due to sensory, cognitive or motor tasks. The haemodynamic response function (HRF) may be considered as a biomarker to discriminate Alzheimer disease (AD) from healthy ageing. As the blood-oxygenation-level-dependent fMRI signal is weak and noisy, particularly for elderly subjects, a robust method is necessary for HRF estimation to efficiently differentiate AD. After applying a minimum description length wavelet as an extra denoising step, a deconvolution algorithm is employed here for HRF estimation, replacing the averaging method used in previous works. The HRF amplitude peaks are compared across three groups (young, non-demented elderly and demented elderly) for both vision and motor regions. Prior works often reported significant differences in the HRF peak amplitude between the young and the elderly. The authors' experiments show that the HRF peaks are not significantly different when comparing young adults with the elderly (either demented or non-demented). It is demonstrated here that the contradictory findings of previous studies on the HRF peaks for the elderly compared with the young originate from the noise contribution in fMRI data.

  3. Accuracy and robustness evaluation in stereo matching

    NASA Astrophysics Data System (ADS)

    Nguyen, Duc M.; Hanca, Jan; Lu, Shao-Ping; Schelkens, Peter; Munteanu, Adrian

    2016-09-01

    Stereo matching has received a lot of attention from the computer vision community, thanks to its wide range of applications. Despite the large variety of algorithms that have been proposed so far, it is not trivial to select suitable algorithms for the construction of practical systems. One of the main problems is that many algorithms lack sufficient robustness when employed in various operational conditions. This problem is due to the fact that most of the methods proposed in the literature are tested and tuned to perform well on one specific dataset. To alleviate this problem, an extensive evaluation in terms of accuracy and robustness of state-of-the-art stereo matching algorithms is presented. Three datasets (Middlebury, KITTI, and MPEG FTV) representing different operational conditions are employed. Based on the analysis, improvements over existing algorithms have been proposed. The experimental results show that our improved versions of cross-based and cost volume filtering algorithms outperform the original versions by large margins on the Middlebury and KITTI datasets. In addition, the latter of the two proposed algorithms ranks among the best local stereo matching approaches on the KITTI benchmark. Under evaluations using specific settings for depth-image-based-rendering applications, our improved belief propagation algorithm is less complex than MPEG's FTV depth estimation reference software (DERS), while yielding similar depth estimation performance. Finally, several conclusions on stereo matching algorithms are also presented.

  4. Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms.

    PubMed

    Ozcift, Akin; Gulten, Arif

    2011-12-01

    Improving the accuracy of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart disease datasets from the literature. In the experiments, first the feature dimension of the three datasets is reduced using the correlation based feature selection (CFS) algorithm. Second, the classification performances of the 30 machine learning algorithms are calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess the performances of the respective classifiers with the same disease data. All the experiments are carried out with a leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% for the diabetes, heart and Parkinson's datasets, respectively. The RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms to design advanced CADx systems. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  5. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

    PubMed

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
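
    The majority voting step mentioned in this abstract can be illustrated with a minimal sketch. The function names and the tie-breaking rule (first label seen wins) are assumptions made for illustration; the abstract does not say how GA-EoC resolves ties, and the genetic search itself is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among one sample's predictions.

    Ties go to the label that appears first in `predictions`
    (an arbitrary choice for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_outputs):
    """Combine equal-length label lists, one list per base classifier."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_outputs)]


# Three hypothetical base classifiers labelling the same three samples.
outputs = [
    ["pos", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "pos", "neg"],
]
combined = ensemble_predict(outputs)
```

    In a search such as the one described, a candidate ensemble would presumably be scored by running a combiner like this inside each cross-validation fold.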

  6. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification

    PubMed Central

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers’ decisions into the ensemble’s output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911

  7. CrossLink: a novel method for cross-condition classification of cancer subtypes.

    PubMed

    Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei

    2016-08-22

    We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions, as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address this problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. Instead, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73%, the highest among the methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from an Arabic population to assign a PAM50 classification to each tumor based on its gene expression profile. A novel algorithm, CrossLink, for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.

  8. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

    PubMed Central

    Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the clear superiority of our method. The algorithm proposed in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
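
    As a rough illustration of how an Otsu threshold could feed Canny's dual threshold, the sketch below computes Otsu's threshold from a flat list of 8-bit grey levels by maximising between-class variance. Using the Otsu value as the high threshold and a fixed fraction of it as the low threshold is a common heuristic assumed here; the abstract does not state the mapping the authors use, and the MapReduce part is not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form one class, the rest the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_bg = 0      # pixel count at or below t
    sum_bg = 0    # intensity sum at or below t
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def canny_thresholds(pixels, low_ratio=0.5):
    """Hypothetical helper: derive (low, high) hysteresis thresholds."""
    high = otsu_threshold(pixels)
    return (low_ratio * high, high)


# A toy bimodal "image": half dark pixels, half bright pixels.
toy_pixels = [10] * 50 + [200] * 50
low, high = canny_thresholds(toy_pixels)
```

    Because the threshold depends only on the histogram, per-image thresholds like these can be computed independently, which is what makes the step easy to distribute across workers.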

  9. Hearing Shapes: Event-related Potentials Reveal the Time Course of Auditory-Visual Sensory Substitution.

    PubMed

    Graulty, Christian; Papaioannou, Orestis; Bauer, Phoebe; Pitts, Michael A; Canseco-Gonzalez, Enriqueta

    2018-04-01

    In auditory-visual sensory substitution, visual information (e.g., shape) can be extracted through strictly auditory input (e.g., soundscapes). Previous studies have shown that image-to-sound conversions that follow simple rules [such as the Meijer algorithm; Meijer, P. B. L. An experimental system for auditory image representation. Transactions on Biomedical Engineering, 39, 111-121, 1992] are highly intuitive and rapidly learned by both blind and sighted individuals. A number of recent fMRI studies have begun to explore the neuroplastic changes that result from sensory substitution training. However, the time course of cross-sensory information transfer in sensory substitution is largely unexplored and may offer insights into the underlying neural mechanisms. In this study, we recorded ERPs to soundscapes before and after sighted participants were trained with the Meijer algorithm. We compared these posttraining versus pretraining ERP differences with those of a control group who received the same set of 80 auditory/visual stimuli but with arbitrary pairings during training. Our behavioral results confirmed the rapid acquisition of cross-sensory mappings, and the group trained with the Meijer algorithm was able to generalize their learning to novel soundscapes at impressive levels of accuracy. The ERP results revealed an early cross-sensory learning effect (150-210 msec) that was significantly enhanced in the algorithm-trained group compared with the control group as well as a later difference (420-480 msec) that was unique to the algorithm-trained group. These ERP modulations are consistent with previous fMRI results and provide additional insight into the time course of cross-sensory information transfer in sensory substitution.

  10. Data Mining and Optimization Tools for Developing Engine Parameters Tools

    NASA Technical Reports Server (NTRS)

    Dhawan, Atam P.

    1998-01-01

    This project was awarded for understanding the problem and developing a plan for Data Mining tools for use in designing and implementing an Engine Condition Monitoring System. Tricia Erhardt and I studied the problem domain for developing an Engine Condition Monitoring system using the sparse and non-standardized datasets to be available through a consortium at NASA Lewis Research Center. We visited NASA three times to discuss additional issues related to the dataset, which was not made available to us. We discussed and developed a general framework of data mining and optimization tools to extract useful information from sparse and non-standard datasets. These discussions led to the training of Tricia Erhardt to develop Genetic Algorithm (GA) based search programs, which were written in C++ and used to demonstrate the capability of the GA in searching for an optimal solution in noisy datasets. From the study and discussion with NASA LeRC personnel, we then prepared a proposal, which is being submitted to NASA for future work on the development of data mining algorithms for engine condition monitoring. The proposed set of algorithms uses wavelet processing for creating multi-resolution pyramid of tile data for GA based multi-resolution optimal search.

  11. Kernel-based discriminant feature extraction using a representative dataset

    NASA Astrophysics Data System (ADS)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified on both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.

  12. HyRA: A Hybrid Recommendation Algorithm Focused on Smart POI. Ceutí as a Study Scenario.

    PubMed

    Alvarado-Uribe, Joanna; Gómez-Oliva, Andrea; Barrera-Animas, Ari Yair; Molina, Germán; Gonzalez-Mendoza, Miguel; Parra-Meroño, María Concepción; Jara, Antonio J

    2018-03-17

    Nowadays, Physical Web together with the increase in the use of mobile devices, Global Positioning System (GPS), and Social Networking Sites (SNS) have caused users to share enriched information on the Web such as their tourist experiences. Therefore, an area that has been significantly improved by using the contextual information provided by these technologies is tourism. In this way, the main goals of this work are to propose and develop an algorithm focused on the recommendation of Smart Point of Interaction (Smart POI) for a specific user according to his/her preferences and the Smart POIs' context. Hence, a novel Hybrid Recommendation Algorithm (HyRA) is presented by incorporating an aggregation operator into the user-based Collaborative Filtering (CF) algorithm as well as including the Smart POIs' categories and geographical information. For the experimental phase, two real-world datasets have been collected and preprocessed. In addition, one Smart POIs' categories dataset was built. As a result, a dataset composed of 16 Smart POIs, another constituted by the explicit preferences of 200 respondents, and the last dataset integrated by 13 Smart POIs' categories are provided. The experimental results show that the recommendations suggested by HyRA are promising.
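
    For readers unfamiliar with the user-based Collaborative Filtering baseline that HyRA extends, a minimal sketch follows. It predicts a rating as a similarity-weighted mean of other users' ratings, with cosine similarity computed over co-rated items. The names, the toy ratings and this similarity variant are all assumptions for illustration; HyRA's aggregation operator, Smart POI categories and geographical information are not modelled.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over items rated by both users (0.0 if none shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, item):
    """Similarity-weighted mean of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


# Hypothetical explicit preferences (user -> point of interest -> rating).
ratings = {
    "ana": {"museum": 5, "park": 3},
    "ben": {"museum": 5, "park": 3, "castle": 4},
    "eva": {"museum": 1, "park": 5, "castle": 2},
}
predicted = predict_rating(ratings, "ana", "castle")
```

    Here "ana" agrees with "ben" on every co-rated item, so the prediction for "castle" is pulled towards ben's rating of 4 rather than eva's 2.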

  13. HyRA: A Hybrid Recommendation Algorithm Focused on Smart POI. Ceutí as a Study Scenario

    PubMed Central

    Gómez-Oliva, Andrea; Molina, Germán

    2018-01-01

    Nowadays, Physical Web together with the increase in the use of mobile devices, Global Positioning System (GPS), and Social Networking Sites (SNS) have caused users to share enriched information on the Web such as their tourist experiences. Therefore, an area that has been significantly improved by using the contextual information provided by these technologies is tourism. In this way, the main goals of this work are to propose and develop an algorithm focused on the recommendation of Smart Point of Interaction (Smart POI) for a specific user according to his/her preferences and the Smart POIs’ context. Hence, a novel Hybrid Recommendation Algorithm (HyRA) is presented by incorporating an aggregation operator into the user-based Collaborative Filtering (CF) algorithm as well as including the Smart POIs’ categories and geographical information. For the experimental phase, two real-world datasets have been collected and preprocessed. In addition, one Smart POIs’ categories dataset was built. As a result, a dataset composed of 16 Smart POIs, another constituted by the explicit preferences of 200 respondents, and the last dataset integrated by 13 Smart POIs’ categories are provided. The experimental results show that the recommendations suggested by HyRA are promising. PMID:29562590

  14. Picking ChIP-seq peak detectors for analyzing chromatin modification experiments

    PubMed Central

    Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D.; Kluger, Yuval

    2012-01-01

    Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads, to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that parameters that are optimal in one dataset typically do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development. PMID:22307239

  15. Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.

    PubMed

    Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D; Kluger, Yuval

    2012-05-01

    Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads, to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that parameters that are optimal in one dataset typically do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.

  16. Discriminating Projections for Estimating Face Age in Wild Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tokola, Ryan A; Bolme, David S; Ricanek, Karl

    2014-01-01

    We introduce a novel approach to estimating the age of a human from a single uncontrolled image. Current face age estimation algorithms work well in highly controlled images, and some are robust to changes in illumination, but it is usually assumed that images are close to frontal. This bias is clearly seen in the datasets that are commonly used to evaluate age estimation, which either entirely or mostly consist of frontal images. Using pose-specific projections, our algorithm maps image features into a pose-insensitive latent space that is discriminative with respect to age. Age estimation is then performed using a multi-class SVM. We show that our approach outperforms other published results on the Images of Groups dataset, which is the only age-related dataset with a non-trivial number of off-axis face images, and that we are competitive with recent age estimation algorithms on the mostly-frontal FG-NET dataset. We also experimentally demonstrate that our feature projections introduce insensitivity to pose.

  17. Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014.

    PubMed

    Makonin, Stephen; Ellert, Bradley; Bajić, Ivan V; Popowich, Fred

    2016-06-07

    With the cost of consuming resources increasing (both economically and ecologically), homeowners need to find ways to curb consumption. The Almanac of Minutely Power dataset Version 2 (AMPds2) has been released to help computational sustainability researchers, power and energy engineers, building scientists and technologists, utility companies, and eco-feedback researchers test their models, systems, algorithms, or prototypes on real house data. In the vast majority of cases, real-world datasets lead to more accurate models and algorithms. AMPds2 is the first dataset to capture all three main types of consumption (electricity, water, and natural gas) over a long period of time (2 years) and provide 11 measurement characteristics for electricity. No other such datasets from Canada exist. Each meter has 730 days of captured data. We also include environmental and utility billing data for cost analysis. AMPds2 data has been pre-cleaned to provide for consistent and comparable accuracy results amongst different researchers and machine learning algorithms.

  18. Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014

    PubMed Central

    Makonin, Stephen; Ellert, Bradley; Bajić, Ivan V.; Popowich, Fred

    2016-01-01

    With the cost of consuming resources increasing (both economically and ecologically), homeowners need to find ways to curb consumption. The Almanac of Minutely Power dataset Version 2 (AMPds2) has been released to help computational sustainability researchers, power and energy engineers, building scientists and technologists, utility companies, and eco-feedback researchers test their models, systems, algorithms, or prototypes on real house data. In the vast majority of cases, real-world datasets lead to more accurate models and algorithms. AMPds2 is the first dataset to capture all three main types of consumption (electricity, water, and natural gas) over a long period of time (2 years) and provide 11 measurement characteristics for electricity. No other such datasets from Canada exist. Each meter has 730 days of captured data. We also include environmental and utility billing data for cost analysis. AMPds2 data has been pre-cleaned to provide for consistent and comparable accuracy results amongst different researchers and machine learning algorithms. PMID:27271937

  19. Canonical PSO Based K-Means Clustering Approach for Real Datasets.

    PubMed

    Dey, Lopamudra; Chakraborty, Sanjay

    2014-01-01

    The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why the proper evaluation of the results and the measurement of the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as a cluster validity measure. Different types of indexes are used to solve different types of problems, and index selection depends on the kind of available data. This paper first proposes a Canonical PSO based K-means clustering algorithm, analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is more desirable for producing properly compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indexes and presents their evaluation results in mathematical and graphical forms.

  20. Canonical PSO Based K-Means Clustering Approach for Real Datasets

    PubMed Central

    Dey, Lopamudra; Chakraborty, Sanjay

    2014-01-01

    Clustering is a technique whose significance and applications span various fields. Clustering is an unsupervised process in data mining, which is why properly evaluating the results and measuring the compactness and separability of the clusters are important issues. The procedure for evaluating the results of a clustering algorithm is known as cluster validity measurement. Different types of indices are used to solve different types of problems, and index selection depends on the kind of data available. This paper first proposes a Canonical PSO based K-means clustering algorithm, analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. The paper also describes the nature of the clusters and compares the performance of these clustering algorithms according to the validity assessment. It identifies which of these algorithms is most desirable for producing compact clusters on these particular real-life datasets. Overall, it examines the behaviour of these clustering algorithms with respect to validation indices and presents the evaluation results in mathematical and graphical form. PMID:27355083
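
    As a minimal sketch of the intracluster and intercluster indices mentioned in this abstract, the Python below measures compactness as the mean point-to-centroid distance and separability as the smallest centroid-to-centroid distance. These definitions, the function names, and the toy data are assumptions for illustration; the abstract does not state the exact formulas the paper uses.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def intracluster_distance(clusters):
    """Mean distance from each point to its own centroid (lower = more compact)."""
    total, count = 0.0, 0
    for points in clusters:
        c = centroid(points)
        total += sum(math.dist(p, c) for p in points)
        count += len(points)
    return total / count


def intercluster_distance(clusters):
    """Smallest distance between two cluster centroids (higher = better separated)."""
    cents = [centroid(points) for points in clusters]
    return min(
        math.dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:]
    )


# Hypothetical toy data: two well-separated clusters in 2D.
toy_clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(intracluster_distance(toy_clusters), intercluster_distance(toy_clusters))
```

    A partition that lowers the first value while raising the second would usually be judged better by such indices, whichever clustering algorithm produced it.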

  1. A Metadata Based Approach for Analyzing UAV Datasets for Photogrammetric Applications

    NASA Astrophysics Data System (ADS)

    Dhanda, A.; Remondino, F.; Santana Quintero, M.

    2018-05-01

    This paper proposes a methodology for pre-processing and analysing Unmanned Aerial Vehicle (UAV) datasets before photogrammetric processing. In cases where images are gathered without a detailed flight plan and at regular acquisition intervals, the datasets can be quite large and time consuming to process. This paper proposes a method to calculate the image overlap and filter out images to reduce large block sizes and speed up photogrammetric processing. The Python-based algorithm that implements this methodology leverages the metadata in each image to determine the end and side overlap of grid-based UAV flights. Utilizing user input, the algorithm filters out images that are unneeded for photogrammetric processing. The result is an algorithm that can speed up photogrammetric processing and provide valuable information to the user about the flight path.
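
    The overlap calculation and filtering described in this abstract can be sketched roughly as follows. This is not the authors' implementation: it assumes a nadir-pointing pinhole camera, flat terrain, and a single flight strip, and every function name and number is hypothetical.

```python
def ground_footprint(altitude_m, sensor_mm, focal_mm):
    """Ground distance covered by one image side (nadir pinhole camera, flat terrain)."""
    return altitude_m * sensor_mm / focal_mm


def overlap_fraction(spacing_m, footprint_m):
    """Fraction of the footprint shared by two images taken spacing_m apart."""
    return max(0.0, 1.0 - spacing_m / footprint_m)


def thin_images(positions_m, footprint_m, min_overlap):
    """Greedily drop images while consecutive kept images still overlap enough.

    positions_m: along-track camera positions in metres, sorted ascending.
    Returns the indices of the images to keep.
    """
    if not positions_m:
        return []
    kept = [0]
    last = len(positions_m) - 1
    for i in range(1, last + 1):
        if i == last:
            kept.append(i)  # always keep the final image of the strip
            break
        gap_if_skipped = positions_m[i + 1] - positions_m[kept[-1]]
        if overlap_fraction(gap_if_skipped, footprint_m) < min_overlap:
            kept.append(i)  # skipping image i would leave too little overlap
    return kept


# Hypothetical strip: one image every 10 m, 100 m footprint, 65% end overlap wanted.
footprint = ground_footprint(altitude_m=100.0, sensor_mm=24.0, focal_mm=24.0)
positions = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(thin_images(positions, footprint, min_overlap=0.65))
```

    In practice the altitude, focal length, and positions would be read from each image's metadata, and side overlap between adjacent strips would be handled the same way using the across-track footprint.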

  2. A globally optimal k-anonymity method for the de-identification of health data.

    PubMed

    El Emam, Khaled; Dankar, Fida Kamal; Issa, Romeo; Jonker, Elizabeth; Amyot, Daniel; Cogo, Elise; Corriveau, Jean-Pierre; Walker, Mark; Chowdhury, Sadrul; Vaillancourt, Regis; Roffey, Tyson; Bottomley, Jim

    2009-01-01

    Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified. The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets. The authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's performance speed was also evaluated. The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
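
    To make the k-anonymity criterion concrete, the sketch below checks whether every combination of quasi-identifier values is shared by at least k records, and shows one generalization step (age banding) that can restore the property. It illustrates the criterion only, not OLA or any of the compared algorithms; the field names and records are invented.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if each quasi-identifier combination occurs in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_age(record, width=10):
    """Replace an exact age with a band such as '30-39' (one generalization step)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical records; 'age' and 'zip' are the assumed quasi-identifiers.
raw = [
    {"age": 31, "zip": "K1A", "diagnosis": "A"},
    {"age": 34, "zip": "K1A", "diagnosis": "B"},
    {"age": 38, "zip": "K1A", "diagnosis": "A"},
]
banded = [generalize_age(r) for r in raw]

print(is_k_anonymous(raw, ["age", "zip"], k=2))
print(is_k_anonymous(banded, ["age", "zip"], k=2))
```

    Coarser bands make the criterion easier to satisfy but lose more information, which is the trade-off the information loss metrics in the abstract quantify.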

  3. Physical environment virtualization for human activities recognition

    NASA Astrophysics Data System (ADS)

    Poshtkar, Azin; Elangovan, Vinayak; Shirkhodaie, Amir; Chan, Alex; Hu, Shuowen

    2015-05-01

    Human activity recognition research relies heavily on extensive datasets to verify and validate the performance of activity recognition algorithms. However, obtaining real datasets is expensive and highly time consuming. A physics-based virtual simulation can accelerate the development of context-based human activity recognition algorithms and techniques by generating relevant training and testing videos simulating diverse operational scenarios. In this paper, we discuss in detail the requisite capabilities of a virtual environment that serves as a test bed for evaluating and enhancing activity recognition algorithms. To demonstrate the numerous advantages of virtual environment development, a newly developed virtual environment simulation modeling (VESM) environment is presented here to generate calibrated multisource imagery datasets suitable for development and testing of recognition algorithms for context-based human activities. The VESM environment serves as a versatile test bed to generate a vast amount of realistic data for training and testing of sensor processing algorithms. To demonstrate the effectiveness of the VESM environment, we present various simulated scenarios and processed results to infer proper semantic annotations from the high fidelity imagery data for human-vehicle activity recognition under different operational contexts.

  4. Tractography Verified by Intraoperative Magnetic Resonance Imaging and Subcortical Stimulation During Tumor Resection Near the Corticospinal Tract.

    PubMed

    Münnich, Timo; Klein, Jan; Hattingen, Elke; Noack, Anika; Herrmann, Eva; Seifert, Volker; Senft, Christian; Forster, Marie-Therese

    2018-04-14

    Tractography is a popular tool for visualizing the corticospinal tract (CST). However, results may be influenced by numerous variables, e.g., the selection of seeding regions of interest (ROIs) or the chosen tracking algorithm. The objective was to compare different variable sets by correlating tractography results with intraoperative subcortical stimulation of the CST, correcting intraoperative brain shift by the use of intraoperative MRI. Seeding ROIs were created by means of motor cortex segmentation, functional MRI (fMRI), and navigated transcranial magnetic stimulation (nTMS). Based on these ROIs, tractography was run for each patient using a deterministic and a probabilistic algorithm. Tractographies were processed on pre- and postoperatively acquired data. Using a linear mixed effects statistical model, the best correlation between subcortical stimulation intensity and the distance between tractography and stimulation sites was achieved by using the segmented motor cortex as seeding ROI and applying the probabilistic algorithm on preoperatively acquired imaging sequences. Tractographies based on fMRI or nTMS results differed very little, but with enlargement of positive nTMS sites the stimulation-distance correlation of nTMS-based tractography improved. Our results underline that the use of tractography demands careful interpretation of its virtual results by considering all influencing variables.

  5. A hybrid personalized data recommendation approach for geoscience data sharing

    NASA Astrophysics Data System (ADS)

    WANG, M.; Wang, J.

    2016-12-01

    Recommender systems are effective tools for helping Internet users overcome information overload. The two most widely used recommendation algorithms are collaborative filtering (CF) and content-based filtering (CBF). A number of recommender systems based on these two algorithms have been developed for multimedia, online sales, and other domains. Each of the two algorithms has its advantages and shortcomings. Hybrid approaches that combine the two are better choices in many cases. In the geoscience data sharing domain, where the items (datasets) are more informative (in space and time) and domain-specific, no recommender system is specialized for data users. This paper reports a dynamic weighted hybrid recommendation algorithm that combines CF and CBF for a geoscience data sharing portal. We first derive users' ratings on items from their historical visiting time using Jenks natural breaks. In the CBF part, we incorporate the space, time, and subject information of geoscience datasets to compute item similarity. Predicted ratings were computed with the k-NN method separately using CBF and CF, and then combined with weights. With the training dataset we attempted to find the best model describing the relationship between ideal weights and users' co-rating numbers. A logarithmic function was confirmed to be the best model. The model was then used to tune the weights of CF and CBF on a user-item basis with the test dataset. Evaluation results show that the dynamic weighted approach outperforms either the solo CF or the solo CBF approach in terms of precision and recall.
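
    The dynamic weighting idea can be illustrated with a short sketch in which the CF weight grows logarithmically with the number of co-ratings and the CBF prediction receives the remainder. The coefficients `a` and `b`, the clamping to [0, 1], and the assumption that more co-ratings favour CF are all illustrative choices; the abstract reports only that a logarithmic model fitted best.

```python
import math


def cf_weight(co_ratings, a=0.2, b=0.1):
    """Hypothetical logarithmic model for the CF weight, clamped to [0, 1]."""
    return min(1.0, max(0.0, a * math.log(1 + co_ratings) + b))


def hybrid_rating(cf_pred, cbf_pred, co_ratings):
    """Blend the CF and CBF predictions using the co-rating-dependent weight."""
    w = cf_weight(co_ratings)
    return w * cf_pred + (1.0 - w) * cbf_pred


# With no co-ratings the blend leans on CBF; with many it leans on CF.
for n in (0, 5, 50):
    print(n, round(cf_weight(n), 3), round(hybrid_rating(4.0, 2.0, n), 3))
```

    Computing the weight per user-item pair, rather than fixing one global weight, is what makes the combination dynamic in the sense the abstract describes.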

  6. McTwo: a two-step feature selection algorithm based on maximal information coefficient.

    PubMed

    Ge, Ruiquan; Zhou, Manli; Luo, Youxi; Meng, Qinghan; Mai, Guoqin; Ma, Dongli; Wang, Guoqing; Zhou, Fengfeng

    2016-03-23

    High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature. McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.

  7. Ranking and averaging independent component analysis by reproducibility (RAICAR).

    PubMed

    Yang, Zhi; LaConte, Stephen; Weng, Xuchu; Hu, Xiaoping

    2008-06-01

    Independent component analysis (ICA) is a data-driven approach that has exhibited great utility for functional magnetic resonance imaging (fMRI). Standard ICA implementations, however, do not provide the number and relative importance of the resulting components. In addition, ICA algorithms utilizing gradient-based optimization give decompositions that are dependent on initialization values, which can lead to dramatically different results. In this work, a new method, RAICAR (Ranking and Averaging Independent Component Analysis by Reproducibility), is introduced to address these issues for spatial ICA applied to fMRI. RAICAR utilizes repeated ICA realizations and relies on the reproducibility between them to rank and select components. Different realizations are aligned based on correlations, leading to aligned components. Each component is ranked and thresholded based on between-realization correlations. Furthermore, different realizations of each aligned component are selectively averaged to generate the final estimate of the given component. Reliability and accuracy of this method are demonstrated with both simulated and experimental fMRI data. Copyright 2007 Wiley-Liss, Inc.

  8. Decoupling function and anatomy in atlases of functional connectivity patterns: language mapping in tumor patients.

    PubMed

    Langs, Georg; Sweet, Andrew; Lashkari, Danial; Tie, Yanmei; Rigolo, Laura; Golby, Alexandra J; Golland, Polina

    2014-12-01

    In this paper we construct an atlas that summarizes functional connectivity characteristics of a cognitive process from a population of individuals. The atlas encodes functional connectivity structure in a low-dimensional embedding space that is derived from a diffusion process on a graph that represents correlations of fMRI time courses. The functional atlas is decoupled from the anatomical space, and thus can represent functional networks with variable spatial distribution in a population. In practice the atlas is represented by a common prior distribution for the embedded fMRI signals of all subjects. We derive an algorithm for fitting this generative model to the observed data in a population. Our results in a language fMRI study demonstrate that the method identifies coherent and functionally equivalent regions across subjects. The method also successfully maps functional networks from a healthy population used as a training set to individuals whose language networks are affected by tumors. Copyright © 2014. Published by Elsevier Inc.

  9. A Robust Motion Artifact Detection Algorithm for Accurate Detection of Heart Rates From Photoplethysmographic Signals Using Time-Frequency Spectral Features.

    PubMed

    Dao, Duy; Salehizadeh, S M A; Noh, Yeonsik; Chong, Jo Woon; Cho, Chae Ho; McManus, Dave; Darling, Chad E; Mendelson, Yitzhak; Chon, Ki H

    2017-09-01

    Motion and noise artifacts (MNAs) impose limits on the usability of the photoplethysmogram (PPG), particularly in the context of ambulatory monitoring. MNAs can distort PPG, causing erroneous estimation of physiological parameters such as heart rate (HR) and arterial oxygen saturation (SpO2). In this study, we present a novel approach, "TifMA," based on using the time-frequency spectrum of PPG to first detect the MNA-corrupted data and next discard the nonusable part of the corrupted data. The term "nonusable" refers to segments of PPG data from which the HR signal cannot be recovered accurately. Two sequential classification procedures were included in the TifMA algorithm. The first classifier distinguishes between MNA-corrupted and MNA-free PPG data. Once a segment of data is deemed MNA-corrupted, the next classifier determines whether the HR can be recovered from the corrupted segment or not. A support vector machine (SVM) classifier was used to build a decision boundary for the first classification task using data segments from a training dataset. Features from time-frequency spectra of PPG were extracted to build the detection model. Five datasets were considered for evaluating TifMA performance: (1) and (2) were laboratory-controlled PPG recordings from forehead and finger pulse oximeter sensors with subjects making random movements, (3) and (4) were actual patient PPG recordings from UMass Memorial Medical Center with random free movements and (5) was a laboratory-controlled PPG recording dataset measured at the forehead while the subjects ran on a treadmill. The first dataset was used to analyze the noise sensitivity of the algorithm. Datasets 2-4 were used to evaluate the MNA detection phase of the algorithm. The results from the first phase of the algorithm (MNA detection) were compared to results from three existing MNA detection algorithms: the Hjorth, kurtosis-Shannon entropy, and time-domain variability-SVM approaches. 
    This last is an approach recently developed in our laboratory. The proposed TifMA algorithm consistently provided higher detection rates than the other three methods, with accuracies greater than 95% for all data. Moreover, our algorithm was able to pinpoint the start and end times of the MNA with an error of less than 1 s in duration, whereas the next-best algorithm had a detection error of more than 2.2 s. The final, most challenging, dataset was collected to verify the performance of the algorithm in discriminating between corrupted data that were usable for accurate HR estimations and data that were nonusable. On average, 48% of the data segments were found to have MNA, and of these, 38% could be used to provide reliable HR estimation.

  10. An improved algorithm of fiber tractography demonstrates postischemic cerebral reorganization

    NASA Astrophysics Data System (ADS)

    Liu, Xiao-dong; Lu, Jie; Yao, Li; Li, Kun-cheng; Zhao, Xiao-jie

    2008-03-01

    In vivo white matter tractography by diffusion tensor imaging (DTI) accurately represents the organizational architecture of white matter in the vicinity of brain lesions, especially in the ischemic brain. In this study, we propose an improved fiber tracking algorithm based on TEND, called TENDAS (tensor deflection with adaptive stepping), which introduces a stepping framework for interpreting the algorithm behavior as a function of the tensor shape (linear-shaped or not) and tract history. The propagation direction at each step is given by the deflection vector. TENDAS tractography, combined with fMRI, was used to examine a 17-year-old patient in recovery with congenital right hemisphere artery stenosis. A meaningless picture location task was used as the spatial working memory task in this study. We detected a shift of functional localization to the contralateral homotypic cortex and more prominent and extensive left-sided parietal and medial frontal cortical activations, which were used directly as seed masks for tractography to reconstruct individual spatial parietal pathways. Compared with the TEND algorithm, TENDAS shows a smoother and less sharply bending characterization of the white matter architecture of the parietal cortex. The results of this preliminary study were twofold. First, TENDAS may provide more adaptability and accuracy in reconstructing certain anatomical features, although it is very difficult to verify tractography maps of white matter connectivity in the living human brain. Second, our study indicates that the combination of TENDAS and fMRI provides a unique image of functional cortical reorganization and structural modifications of postischemic spatial working memory.

  11. Automatical and accurate segmentation of cerebral tissues in fMRI dataset with combination of image processing and deep learning

    NASA Astrophysics Data System (ADS)

    Kong, Zhenglun; Luo, Junyi; Xu, Shengpu; Li, Ting

    2018-02-01

    Image segmentation plays an important role in medical science. One application is multimodality imaging, especially the fusion of structural imaging with functional imaging, which includes CT, MRI and new types of imaging technology such as optical imaging to obtain functional images. The fusion process requires precisely extracted structural information in order to register the image to it. Here we used image enhancement and morphometry methods to extract accurate contours of different tissues such as skull, cerebrospinal fluid (CSF), grey matter (GM) and white matter (WM) on 5 fMRI head image datasets. We then used a convolutional neural network to segment the images automatically with deep learning. This approach greatly reduced the processing time compared to manual and semi-automatic segmentation, and is of great importance for improving speed and accuracy as more and more samples are learned. The contours of the borders of different tissues on all images were accurately extracted and visualized in 3D. This can be used in low-level light therapy and optical simulation software such as MCVM. We obtained a precise three-dimensional distribution of the brain, which offers doctors and researchers quantitative volume data and detailed morphological characterization for personalized precision medicine of cerebral atrophy/expansion. We hope this technique can bring convenience to medical visualization and personalized medicine.

  12. Inferring Boolean network states from partial information

    PubMed Central

    2013-01-01

    Networks of molecular interactions regulate key processes in living cells. Therefore, understanding their functionality is a high priority in advancing biological knowledge. Boolean networks are often used to describe cellular networks mathematically and are fitted to experimental datasets. The fitting often results in ambiguities since the interpretation of the measurements is not straightforward and since the data contain noise. In order to facilitate a more reliable mapping between datasets and Boolean networks, we develop an algorithm that infers network trajectories from a dataset distorted by noise. We analyze our algorithm theoretically and demonstrate its accuracy using simulation and microarray expression data. PMID:24006954

  13. LEAP: biomarker inference through learning and evaluating association patterns.

    PubMed

    Jiang, Xia; Neapolitan, Richard E

    2015-03-01

    Single nucleotide polymorphism (SNP) high-dimensional datasets are available from Genome Wide Association Studies (GWAS). Such data provide researchers opportunities to investigate the complex genetic basis of diseases. Much of genetic risk might be due to undiscovered epistatic interactions, which are interactions in which combinations of several genes affect disease. Research aimed at discovering interacting SNPs from GWAS datasets has proceeded in two directions. First, tools were developed to evaluate candidate interactions. Second, algorithms were developed to search over the space of candidate interactions. Another problem when learning interacting SNPs, which has not received much attention, is evaluating how likely it is that the learned SNPs are associated with the disease. A complete system should provide this information as well. We develop such a system. Our system, called LEAP, includes a new heuristic search algorithm for learning interacting SNPs, and a Bayesian network based algorithm for computing the probability of their association. We evaluated the performance of LEAP using 100 1,000-SNP simulated datasets, each of which contains 15 SNPs involved in interactions. When learning interacting SNPs from these datasets, LEAP outperformed seven other methods. Furthermore, only SNPs involved in interactions were found to be probable. We also used LEAP to analyze real Alzheimer's disease and breast cancer GWAS datasets. We obtained interesting and new results from the Alzheimer's dataset, but limited results from the breast cancer dataset. We conclude that our results support that LEAP is a useful tool for extracting candidate interacting SNPs from high-dimensional datasets and determining their probability. © 2015 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.

  14. m-BIRCH: an online clustering approach for computer vision applications

    NASA Astrophysics Data System (ADS)

    Madan, Siddharth K.; Dana, Kristin J.

    2015-03-01

    We adapt a classic online clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset memory to perform clustering, and updates the clustering decisions when new data comes in. Modifications made in m-BIRCH enable data-driven parameter selection and effectively handle varying density regions in the feature space. Data-driven parameter selection automatically controls the level of coarseness of the data summarization. Effective handling of varying density regions is necessary to represent the different density regions well in data summarization. We use m-BIRCH to cluster 840K color SIFT descriptors, and 60K outlier-corrupted grayscale patches. We use the algorithm to cluster datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides a useful clustering tool and is made publicly available.
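
    BIRCH can cluster incrementally with little memory because each subcluster is summarized by a clustering feature: the point count, the linear sum, and the sum of squared norms. The sketch below shows that summary for classic BIRCH only. It does not include the m-BIRCH modifications, and the class and method names are chosen for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusteringFeature:
    """Classic BIRCH summary (N, LS, SS) of the points absorbed so far."""
    n: int = 0
    linear_sum: list = field(default_factory=list)
    square_sum: float = 0.0

    def add(self, point):
        """Absorb one point by updating the summary; the point is not stored."""
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(point)
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.square_sum += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root-mean-square distance of the absorbed points from the centroid."""
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(0.0, self.square_sum / self.n - c2))


cf = ClusteringFeature()
for p in [(0.0, 0.0), (2.0, 0.0)]:
    cf.add(p)
print(cf.centroid(), cf.radius())
```

    A new point is typically absorbed into the nearest subcluster only if the resulting radius stays under a threshold, and that threshold is the kind of parameter a data-driven selection scheme would control.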

  15. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    ERIC Educational Resources Information Center

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…

  16. Joint estimation of motion and illumination change in a sequence of images

    NASA Astrophysics Data System (ADS)

    Koo, Ja-Keoung; Kim, Hyo-Hun; Hong, Byung-Woo

    2015-09-01

    We present an algorithm that simultaneously computes optical flow and estimates illumination change from an image sequence in a unified framework. We propose an energy functional consisting of the conventional optical flow energy based on the Horn-Schunck method and an additional constraint that is designed to compensate for illumination changes. Any undesirable illumination change that occurs in the imaging procedure in a sequence while the optical flow is being computed is considered a nuisance factor. In contrast to the conventional optical flow algorithm based on the Horn-Schunck functional, which assumes the brightness constancy constraint, our algorithm is shown to be robust with respect to temporal illumination changes in the computation of optical flows. An efficient conjugate gradient descent technique is used in the optimization procedure as a numerical scheme. The experimental results obtained from the Middlebury benchmark dataset demonstrate the robustness and the effectiveness of our algorithm. In addition, a comparative analysis of our algorithm and the Horn-Schunck algorithm is performed on an additional test dataset that is constructed by applying a variety of synthetic bias fields to the original image sequences in the Middlebury benchmark dataset in order to demonstrate that our algorithm outperforms the Horn-Schunck algorithm. The superior performance of the proposed method is observed in terms of both qualitative visualizations and quantitative accuracy errors when compared to the Horn-Schunck optical flow algorithm, which easily yields poor results in the presence of small illumination changes that violate the brightness constancy constraint.

  17. Developing Novel Machine Learning Algorithms to Improve Sedentary Assessment for Youth Health Enhancement.

    PubMed

    Golla, Gowtham Kumar; Carlson, Jordan A; Huan, Jun; Kerr, Jacqueline; Mitchell, Tarrah; Borner, Kelsey

    2016-10-01

    Sedentary behavior of youth is an important determinant of health. However, better measures are needed to improve understanding of this relationship and the mechanisms at play, as well as to evaluate health promotion interventions. Wearable accelerometers are considered the standard for assessing physical activity in research, but do not perform well for assessing posture (i.e., sitting vs. standing), a critical component of sedentary behavior. The machine learning algorithms that we propose for assessing sedentary behavior will allow us to re-examine existing accelerometer data to better understand the association between sedentary time and health in various populations. We collected two datasets, a laboratory-controlled dataset and a free-living dataset. We trained machine learning classifiers separately on each dataset and compared performance across datasets. The classifiers predict five postures: sit, stand, sit-stand, stand-sit, and stand/walk. We compared a manually constructed Hidden Markov model (HMM) with an automated HMM from existing software. The manually constructed HMM achieved a higher F1-Macro score on both datasets.
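
    For readers unfamiliar with the F1-Macro score used to compare the two HMMs, the sketch below computes it from scratch: an F1 score per class, then an unweighted mean, so rare postures count as much as common ones. The labels and predictions are invented, and the zero-division convention (F1 of 0 for a class with no true or predicted instances) is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels that appear."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical posture labels for four accelerometer windows.
truth = ["sit", "sit", "stand", "stand"]
predicted = ["sit", "stand", "stand", "stand"]
print(round(macro_f1(truth, predicted), 4))
```

    Here "sit" scores 2/3 and "stand" scores 4/5, so the macro average is 11/15.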

  18. The Status of the NASA MEaSUREs Combined ASTER and MODIS Emissivity Over Land (CAMEL) Products

    NASA Astrophysics Data System (ADS)

    Borbas, E. E.; Feltz, M.; Hulley, G. C.; Knuteson, R. O.; Hook, S. J.

    2017-12-01

    As part of a NASA MEaSUREs Land Surface Temperature and Emissivity project, the University of Wisconsin, Space Science and Engineering Center and NASA's Jet Propulsion Laboratory have developed a global monthly mean emissivity Earth System Data Record (ESDR). The CAMEL ESDR was produced by merging two current state-of-the-art emissivity datasets: the UW-Madison MODIS Infrared emissivity dataset (UWIREMIS), and the JPL ASTER Global Emissivity Dataset v4 (GEDv4). The dataset includes monthly global data records of emissivity, uncertainty at 13 hinge points between 3.6-14.3 µm, and Principal Components Analysis (PCA) coefficients at 5 kilometer resolution for years 2003 to 2015. A high spectral resolution algorithm is also provided for HSR applications. The dataset is currently being tested in sounder retrieval algorithms (e.g. CrIS, IASI) and has already been implemented in RTTOV-12 for immediate use in numerical weather modeling and data assimilation. This poster will present the current status of the dataset.

  19. An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

    PubMed Central

    2013-01-01

    Background: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333

  20. Version 2 of the IASI NH3 neural network retrieval algorithm: near-real-time and reanalysed datasets

    NASA Astrophysics Data System (ADS)

    Van Damme, Martin; Whitburn, Simon; Clarisse, Lieven; Clerbaux, Cathy; Hurtmans, Daniel; Coheur, Pierre-François

    2017-12-01

    Recently, Whitburn et al. (2016) presented a neural-network-based algorithm for retrieving atmospheric ammonia (NH3) columns from Infrared Atmospheric Sounding Interferometer (IASI) satellite observations. In the past year, several improvements have been introduced, and the resulting new baseline version, Artificial Neural Network for IASI (ANNI)-NH3-v2.1, is documented here. One of the main changes to the algorithm is that separate neural networks were trained for land and sea observations, resulting in a better training performance for both groups. By reducing and transforming the input parameter space, performance is now also better for observations associated with favourable sounding conditions (i.e. enhanced thermal contrasts). Other changes relate to the introduction of a bias correction over land and sea and the treatment of the satellite zenith angle. In addition to these algorithmic changes, new recommendations for post-filtering the data and for averaging data in time or space are formulated. We also introduce a second dataset (ANNI-NH3-v2.1R-I) which relies on ERA-Interim ECMWF meteorological input data, along with surface temperature retrieved from a dedicated network, rather than the operationally provided Eumetsat IASI Level 2 (L2) data used for the standard near-real-time version. The need for such a dataset emerged after a series of sharp discontinuities was identified in the NH3 time series, which could be traced back to incremental changes in the IASI L2 algorithms for temperature and clouds. The reanalysed dataset is coherent in time and can therefore be used to study trends. Furthermore, both datasets agree reasonably well in the mean on recent data, after the date when the IASI meteorological L2 version 6 became operational (30 September 2014).

  1. Efficient Record Linkage Algorithms Using Complete Linkage Clustering.

    PubMed

    Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

    2016-01-01

    Datasets from different agencies often contain records of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy while consuming reasonable run times.
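
    The three ingredients named in this abstract (sorting to drop identical copies, blocking, and complete linkage clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the edit-distance measure, the distance threshold, the first-character blocking key, and all function names are assumptions chosen only for illustration.

```python
from itertools import groupby


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def complete_linkage(records, threshold):
    """Merge clusters while the LARGEST pairwise distance between them
    stays within the threshold (the complete linkage criterion)."""
    clusters = [[r] for r in records]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(edit_distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if d <= threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]


def link_records(records, threshold=2, block_key=lambda r: r[:1]):
    """Hypothetical pipeline: deduplicate, block, then cluster per block."""
    # Sorting places identical copies next to each other; groupby drops them.
    unique = [key for key, _ in groupby(sorted(records))]
    linked = []
    # Blocking: only records sharing a block key are ever compared.
    for _, block in groupby(sorted(unique, key=block_key), key=block_key):
        linked.extend(complete_linkage(list(block), threshold))
    return linked
```

    For example, `link_records(["aaa", "aab", "abb"], threshold=1)` keeps "abb" apart from the cluster containing "aaa" and "aab", because complete linkage looks at the farthest pair ("aaa" vs. "abb", distance 2) rather than the nearest one.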

  2. Efficient Record Linkage Algorithms Using Complete Linkage Clustering

    PubMed Central

    Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

    2016-01-01

    Datasets from different agencies often contain records of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy while consuming reasonable run times. PMID:27124604

  3. Pattern classification of fMRI data: applications for analysis of spatially distributed cortical networks.

    PubMed

    Yourganov, Grigori; Schmah, Tanya; Churchill, Nathan W; Berman, Marc G; Grady, Cheryl L; Strother, Stephen C

    2014-08-01

    The field of fMRI data analysis is rapidly growing in sophistication, particularly in the domain of multivariate pattern classification. However, the interaction between the properties of the analytical model and the parameters of the BOLD signal (e.g. signal magnitude, temporal variance and functional connectivity) is still an open problem. We addressed this problem by evaluating a set of pattern classification algorithms on simulated and experimental block-design fMRI data. The set of classifiers consisted of linear and quadratic discriminants, linear support vector machine, and linear and nonlinear Gaussian naive Bayes classifiers. For linear discriminant, we used two methods of regularization: principal component analysis, and ridge regularization. The classifiers were used (1) to classify the volumes according to the behavioral task that was performed by the subject, and (2) to construct spatial maps that indicated the relative contribution of each voxel to classification. Our evaluation metrics were: (1) accuracy of out-of-sample classification and (2) reproducibility of spatial maps. In simulated data sets, we performed an additional evaluation of spatial maps with ROC analysis. We varied the magnitude, temporal variance and connectivity of simulated fMRI signal and identified the optimal classifier for each simulated environment. Overall, the best performers were linear and quadratic discriminants (operating on principal components of the data matrix) and, in some rare situations, a nonlinear Gaussian naïve Bayes classifier. The results from the simulated data were supported by within-subject analysis of experimental fMRI data, collected in a study of aging. This is the first study that systematically characterizes interactions between analysis model and signal parameters (such as magnitude, variance and correlation) on the performance of pattern classifiers for fMRI. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. A hierarchical model for probabilistic independent component analysis of multi-subject fMRI studies

    PubMed Central

    Tang, Li

    2014-01-01

    Summary An important goal in fMRI studies is to decompose the observed series of brain images to identify and characterize underlying brain functional networks. Independent component analysis (ICA) has been shown to be a powerful computational tool for this purpose. Classic ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging due to the unavailability of a pre-specified group design matrix. Existing group ICA methods generally concatenate observed fMRI data across subjects on the temporal domain and then decompose multi-subject data in a similar manner to single-subject ICA. The major limitation of existing methods is that they ignore between-subject variability in spatial distributions of brain functional networks in group ICA. In this paper, we propose a new hierarchical probabilistic group ICA method to formally model subject-specific effects in both temporal and spatial domains when decomposing multi-subject fMRI data. The proposed method provides model-based estimation of brain functional networks at both the population and subject level. An important advantage of the hierarchical model is that it provides a formal statistical framework to investigate similarities and differences in brain functional networks across subjects, e.g., subjects with mental disorders or neurodegenerative diseases such as Parkinson’s as compared to normal subjects. We develop an EM algorithm for model estimation where both the E-step and M-step have explicit forms. We compare the performance of the proposed hierarchical model with that of two popular group ICA methods via simulation studies. We illustrate our method with application to an fMRI study of Zen meditation. PMID:24033125

  5. Brain correlates of autonomic modulation: combining heart rate variability with fMRI.

    PubMed

    Napadow, Vitaly; Dhond, Rupali; Conti, Giulia; Makris, Nikos; Brown, Emery N; Barbieri, Riccardo

    2008-08-01

    The central autonomic network (CAN) has been described in animal models but has been difficult to elucidate in humans. Potential confounds include physiological noise artifacts affecting brainstem neuroimaging data, and difficulty in deriving non-invasive continuous assessments of autonomic modulation. We have developed and implemented a new method which relates cardiac-gated fMRI timeseries with continuous-time heart rate variability (HRV) to estimate central autonomic processing. As many autonomic structures of interest are in brain regions strongly affected by cardiogenic pulsatility, we chose to cardiac-gate our fMRI acquisition to increase sensitivity. Cardiac-gating introduces T1-variability, which was corrected by transforming fMRI data to a fixed TR using a previously published method [Guimaraes, A.R., Melcher, J.R., et al., 1998. Imaging subcortical auditory activity in humans. Hum. Brain Mapp. 6(1), 33-41]. The electrocardiogram was analyzed with a novel point process adaptive-filter algorithm for computation of the high-frequency (HF) index, reflecting the time-varying dynamics of efferent cardiovagal modulation. Central command of cardiovagal outflow was inferred by using the resampled HF timeseries as a regressor to the fMRI data. A grip task was used to perturb the autonomic nervous system. Our combined HRV-fMRI approach demonstrated HF correlation with fMRI activity in the hypothalamus, cerebellum, parabrachial nucleus/locus ceruleus, periaqueductal gray, amygdala, hippocampus, thalamus, and dorsomedial/dorsolateral prefrontal, posterior insular, and middle temporal cortices. While some regions consistent with central cardiovagal control in animal models gave corroborative evidence for our methodology, other mostly higher cortical or limbic-related brain regions may be unique to humans. Our approach should be optimized and applied to study the human brain correlates of autonomic modulation for various stimuli in both physiological and pathological states.

  6. Comparing MODIS C6 'Deep Blue' and 'Dark Target' Aerosol Data

    NASA Technical Reports Server (NTRS)

    Hsu, N. C.; Sayer, A. M.; Bettenhausen, C.; Lee, J.; Levy, R. C.; Mattoo, S.; Munchak, L. A.; Kleidman, R.

    2014-01-01

    The MODIS Collection 6 Atmospheres product suite includes refined versions of both 'Deep Blue' (DB) and 'Dark Target' (DT) aerosol algorithms, with the DB dataset now expanded to include coverage over vegetated land surfaces. This means that, over much of the global land surface, users will have both DB and DT data to choose from. A 'merged' dataset is also provided, primarily for visualization purposes, which takes retrievals from either or both algorithms based on regional and seasonal climatologies of normalized difference vegetation index (NDVI). This poster presents some comparisons of these two C6 aerosol algorithms, focusing on AOD at 550 nm derived from MODIS Aqua measurements, with each other and with Aerosol Robotic Network (AERONET) data, with the intent to facilitate user decisions about the suitability of the two datasets for their desired applications.

  7. Fully Automated Segmentation of Fluid/Cyst Regions in Optical Coherence Tomography Images With Diabetic Macular Edema Using Neutrosophic Sets and Graph Algorithms.

    PubMed

    Rashno, Abdolreza; Koozekanani, Dara D; Drayna, Paul M; Nazari, Behzad; Sadri, Saeed; Rabbani, Hossein; Parhi, Keshab K

    2018-05-01

    This paper presents a fully automated algorithm to segment fluid-associated (fluid-filled) and cyst regions in optical coherence tomography (OCT) retina images of subjects with diabetic macular edema. The OCT image is segmented using a novel neutrosophic transformation and a graph-based shortest path method. In neutrosophic domain, an image is transformed into three sets: T (true), I (indeterminate) that represents noise, and F (false). This paper makes four key contributions. First, a new method is introduced to compute the indeterminacy set I, and a new -correction operation is introduced to compute the set in neutrosophic domain. Second, a graph shortest-path method is applied in neutrosophic domain to segment the inner limiting membrane and the retinal pigment epithelium as regions of interest (ROI) and outer plexiform layer and inner segment myeloid as middle layers using a novel definition of the edge weights. Third, a new cost function for cluster-based fluid/cyst segmentation in ROI is presented which also includes a novel approach in estimating the number of clusters in an automated manner. Fourth, the final fluid regions are achieved by ignoring very small regions and the regions between middle layers. The proposed method is evaluated using two publicly available datasets: Duke, Optima, and a third local dataset from the UMN clinic which is available online. The proposed algorithm outperforms the previously proposed Duke algorithm by 8% with respect to the dice coefficient and by 5% with respect to precision on the Duke dataset, while achieving about the same sensitivity. Also, the proposed algorithm outperforms a prior method for Optima dataset by 6%, 22%, and 23% with respect to the dice coefficient, sensitivity, and precision, respectively. Finally, the proposed algorithm also achieves sensitivity of 67.3%, 88.8%, and 76.7%, for the Duke, Optima, and the University of Minnesota (UMN) datasets, respectively.
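
    The three evaluation measures quoted in this abstract (dice coefficient, sensitivity, precision) have standard definitions in terms of true positives, false positives and false negatives. The Python sketch below shows those standard definitions for flattened binary masks; the function name and the convention of returning 0.0 for an empty denominator are assumptions, and the paper's own evaluation code may differ.

```python
def segmentation_scores(pred, truth):
    """Return (dice, sensitivity, precision) for two equal-length binary masks.

    pred and truth are flat sequences of 0/1 (or False/True) pixel labels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)        # fluid, found
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)    # false alarm
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)    # fluid, missed

    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    dice = ratio(2 * tp, 2 * tp + fp + fn)
    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return dice, sensitivity, precision
```

    With `pred = [1, 1, 0, 0, 1]` and `truth = [1, 0, 0, 1, 1]` there are two true positives, one false positive and one false negative, so all three scores equal 2/3.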

  8. Aerosol Climate Time Series Evaluation In ESA Aerosol_cci

    NASA Astrophysics Data System (ADS)

    Popp, T.; de Leeuw, G.; Pinnock, S.

    2015-12-01

    Within the ESA Climate Change Initiative (CCI), Aerosol_cci (2010 - 2017) conducts intensive work to improve algorithms for the retrieval of aerosol information from European sensors. By the end of 2015 full mission time series of 2 GCOS-required aerosol parameters are completely validated and released: Aerosol Optical Depth (AOD) from dual view ATSR-2 / AATSR radiometers (3 algorithms, 1995 - 2012), and stratospheric extinction profiles from star occultation GOMOS spectrometer (2002 - 2012). Additionally, a 35-year multi-sensor time series of the qualitative Absorbing Aerosol Index (AAI) together with sensitivity information and an AAI model simulator is available. Complementary aerosol properties requested by GCOS are in a "round robin" phase, where various algorithms are inter-compared: fine mode AOD, mineral dust AOD (from the thermal IASI spectrometer), absorption information and aerosol layer height. As a quasi-reference for validation in a few selected regions with sparse ground-based observations the multi-pixel GRASP algorithm for the POLDER instrument is used. Validation of first dataset versions (vs. AERONET, MAN) and inter-comparison to other satellite datasets (MODIS, MISR, SeaWIFS) proved the high quality of the available datasets comparable to other satellite retrievals and revealed needs for algorithm improvement (for example for higher AOD values) which were taken into account for a reprocessing. The datasets contain pixel level uncertainty estimates which are also validated. The paper will summarize and discuss the results of major reprocessing and validation conducted in 2015. The focus will be on the ATSR, GOMOS and IASI datasets. Validation of pixel level uncertainties will be summarized and discussed including unknown components and their potential usefulness and limitations. Opportunities for time series extension with successor instruments of the Sentinel family will be described and the complementarity of the different satellite aerosol products (e.g. dust vs. total AOD, ensembles from different algorithms for the same sensor) will be discussed.

  9. Inferring microbial interaction networks from metagenomic data using SgLV-EKF algorithm.

    PubMed

    Alshawaqfeh, Mustafa; Serpedin, Erchin; Younes, Ahmad Bani

    2017-03-27

    Inferring the microbial interaction networks (MINs) and modeling their dynamics are critical in understanding the mechanisms of the bacterial ecosystem and designing antibiotic and/or probiotic therapies. Recently, several approaches were proposed to infer MINs using the generalized Lotka-Volterra (gLV) model. A main drawback of these approaches is that they consider only the measurement noise without taking into consideration the uncertainties in the underlying dynamics. Furthermore, inferring the MIN is characterized by the limited number of observations and nonlinearity in the regulatory mechanisms. Therefore, novel estimation techniques are needed to address these challenges. This work proposes SgLV-EKF: a stochastic gLV model that adopts the extended Kalman filter (EKF) algorithm to model the MIN dynamics. In particular, SgLV-EKF employs a stochastic modeling of the MIN by adding a noise term to the dynamical model to compensate for modeling uncertainties. This stochastic modeling is more realistic than the conventional gLV model which assumes that the MIN dynamics are perfectly governed by the gLV equations. After specifying the stochastic model structure, we propose the EKF to estimate the MIN. SgLV-EKF was compared with two similarity-based algorithms, one algorithm from the integral-based family and two regression-based algorithms, in terms of the achieved performance on two synthetic data-sets and two real data-sets. The first data-set models the randomness in measurement data, whereas the second data-set incorporates uncertainties in the underlying dynamics. The real data-sets are provided by a recent study pertaining to an antibiotic-mediated Clostridium difficile infection. The experimental results demonstrate that SgLV-EKF outperforms the alternative methods in terms of robustness to measurement noise, modeling errors, and tracking the dynamics of the MIN. Performance analysis demonstrates that the proposed SgLV-EKF algorithm represents a powerful and reliable tool to infer MINs and track their dynamics.
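
    For orientation, the difference between the conventional and the stochastic gLV model described in this abstract can be written out as follows. The deterministic equation is the textbook gLV form; the additive placement of the process noise and the measurement equation are assumptions made here for illustration, since the abstract does not give the paper's exact formulation. The symbols x_i (abundance of taxon i), mu_i (growth rate), a_ij (effect of taxon j on taxon i), w_i (process noise) and v_i (measurement noise) are likewise introduced only for this sketch.

```latex
% Conventional (deterministic) gLV dynamics for taxa i = 1, ..., N:
\frac{dx_i(t)}{dt} = x_i(t)\Bigl(\mu_i + \sum_{j=1}^{N} a_{ij}\, x_j(t)\Bigr)

% Stochastic variant: an assumed additive noise term w_i(t) absorbs
% modeling uncertainty in the dynamics:
\frac{dx_i(t)}{dt} = x_i(t)\Bigl(\mu_i + \sum_{j=1}^{N} a_{ij}\, x_j(t)\Bigr) + w_i(t)

% Assumed noisy observation at sampling time t_k:
y_i(t_k) = x_i(t_k) + v_i(t_k)
```

    Under these assumptions the interaction network corresponds to the matrix of coefficients a_ij, and a filter such as the EKF would estimate it jointly with the states from the noisy observations.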

  10. Tracking children's mental states while solving algebra equations.

    PubMed

    Anderson, John R; Betts, Shawn; Ferris, Jennifer L; Fincham, Jon M

    2012-11-01

    Behavioral and functional magnetic resonance imaging (fMRI) data were combined to infer the mental states of students as they interacted with an intelligent tutoring system. Sixteen children interacted with a computer tutor for solving linear equations over a six-day period (days 0-5), with days 1 and 5 occurring in an fMRI scanner. Hidden Markov model algorithms combined a model of student behavior with multi-voxel imaging pattern data to predict the mental states of students. We separately assessed the algorithms' ability to predict which step in a problem-solving sequence was performed and whether the step was performed correctly. For day 1, the data patterns of other students were used to predict the mental states of a target student. These predictions were improved on day 5 by adding information about the target student's behavioral and imaging data from day 1. Successful tracking of mental states depended on using the combination of a behavioral model and multi-voxel pattern analysis, illustrating the effectiveness of an integrated approach to tracking the cognition of individuals in real time as they perform complex tasks. Copyright © 2011 Wiley Periodicals, Inc.

  11. A photogrammetric technique for generation of an accurate multispectral optical flow dataset

    NASA Astrophysics Data System (ADS)

    Kniaz, V. V.

    2017-06-01

    The availability of an accurate dataset is the key requirement for the successful development of an optical flow estimation algorithm. A large number of freely available optical flow datasets were developed in recent years and gave rise to many powerful algorithms. However, most of the datasets include only images captured in the visible spectrum. This paper is focused on the creation of a multispectral optical flow dataset with an accurate ground truth. The generation of an accurate ground truth optical flow is a rather complex problem, as no device for error-free optical flow measurement has been developed to date. Existing methods for ground truth optical flow estimation are based on hidden textures, 3D modelling or laser scanning. Such techniques either work only with a synthetic optical flow or provide a sparse ground truth optical flow. In this paper a new photogrammetric method for generation of an accurate ground truth optical flow is proposed. The method combines the benefits of the accuracy and density of synthetic optical flow datasets with the flexibility of laser scanning based techniques. A multispectral dataset including various image sequences was generated using the developed method. The dataset is freely available on the accompanying web site.

  12. Identification of Disease Critical Genes Using Collective Meta-heuristic Approaches: An Application to Preeclampsia.

    PubMed

    Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar

    2017-12-01

    Identifying a small subset of disease critical genes out of a large size of microarray gene expression data is a challenge in computational life sciences. This paper has applied four meta-heuristic algorithms, namely, honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and genetic algorithm (basic version GA) to find disease critical genes of preeclampsia which affects women during gestation. Two hybrid algorithms, namely, HBMO-kNN and HS-kNN have been newly proposed here where kNN (k nearest neighbor classifier) is used for sample classification. Performances of these new approaches have been compared with other two hybrid algorithms, namely, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In a dataset, the set of genes found common in the output of each algorithm is considered here as disease critical genes. In different datasets, the percentage of classification or classification accuracy of meta-heuristic algorithms varied between 92.46 and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all data sets. DE-kNN secures the second position (99.42-100%). Disease critical genes obtained here match with clinically revealed preeclampsia genes to a large extent.

  13. Functional magnetic resonance imaging activation detection: fuzzy cluster analysis in wavelet and multiwavelet domains.

    PubMed

    Jahanian, Hesamoddin; Soltanian-Zadeh, Hamid; Hossein-Zadeh, Gholam-Ali

    2005-09-01

    To present novel feature spaces, based on multiscale decompositions obtained by scalar wavelet and multiwavelet transforms, to remedy problems associated with high dimension of functional magnetic resonance imaging (fMRI) time series (when they are used directly in clustering algorithms) and their poor signal-to-noise ratio (SNR) that limits accurate classification of fMRI time series according to their activation contents. Using randomization, the proposed method finds wavelet/multiwavelet coefficients that represent the activation content of fMRI time series and combines them to define new feature spaces. Using simulated and experimental fMRI data sets, the proposed feature spaces are compared to the cross-correlation (CC) feature space and their performances are evaluated. In these studies, the false positive detection rate is controlled using randomization. To compare different methods, several points of the receiver operating characteristics (ROC) curves, using simulated data, are estimated and compared. The proposed features suppress the effects of confounding signals and improve activation detection sensitivity. Experimental results show improved sensitivity and robustness of the proposed method compared to the conventional CC analysis. More accurate and sensitive activation detection can be achieved using the proposed feature spaces compared to CC feature space. Multiwavelet features show superior detection sensitivity compared to the scalar wavelet features. (c) 2005 Wiley-Liss, Inc.

  14. Regression Models for Identifying Noise Sources in Magnetic Resonance Images

    PubMed Central

    Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

    2009-01-01

    Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478

  15. Negative Selection Algorithm for Aircraft Fault Detection

    NASA Technical Reports Server (NTRS)

    Dasgupta, D.; KrishnaKumar, K.; Wong, D.; Berry, M.

    2004-01-01

    We investigated a real-valued Negative Selection Algorithm (NSA) for fault detection in man-in-the-loop aircraft operation. The detection algorithm uses body-axes angular rate sensory data exhibiting the normal flight behavior patterns, to generate probabilistically a set of fault detectors that can detect any abnormalities (including faults and damages) in the behavior pattern of the aircraft flight. We performed experiments with datasets (collected under normal and various simulated failure conditions) using the NASA Ames man-in-the-loop high-fidelity C-17 flight simulator. The paper provides results of experiments with different datasets representing various failure conditions.
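
    The general idea of a real-valued negative selection algorithm, as outlined in this abstract, is to generate detectors at random and keep only those that do not match any sample of normal ("self") behavior. The Python sketch below illustrates that idea under stated assumptions: data scaled to the unit hypercube, Euclidean distance, one shared radius for matching and detection, and hypothetical function names. It is not the detector-generation scheme of the paper.

```python
import math
import random


def generate_detectors(self_samples, n_detectors, radius, dim,
                       seed=0, max_tries=100_000):
    """Draw random points in [0, 1]^dim and keep those lying at least
    `radius` away from every normal (self) sample."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        # Negative selection: reject candidates that match normal behavior.
        if all(math.dist(candidate, s) >= radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomalous(sample, detectors, radius):
    """A sample is flagged when it falls inside any detector's radius."""
    return any(math.dist(sample, d) < radius for d in detectors)
```

    Because every retained detector is at least `radius` away from each self sample, the self samples themselves are never flagged; whether a given abnormal sample is caught depends on how densely the random detectors cover the non-self region.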

  16. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.
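
    The similarity measure at the core of this abstract, the Dynamic Time Warping (DTW) distance, can be sketched in a few lines of Python. The code below is the classic unconstrained dynamic-programming formulation with absolute difference as the local cost; the helper `nearest_profile` and its name are hypothetical and only hint at how candidate profiles might be ranked. It does not reproduce the DTWimpute algorithms themselves.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two numeric series."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeated
                                     cost[i][j - 1],      # b[j-1] repeated
                                     cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


def nearest_profile(target, candidates):
    """Hypothetical helper: the candidate profile closest to target in DTW."""
    return min(candidates, key=lambda c: dtw_distance(target, c))
```

    Because DTW may stretch one series against the other, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the profiles have different lengths, which is what makes the measure attractive for expression profiles that are shifted or sampled unevenly in time.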

  17. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space

    PubMed Central

    Karnik, Rahul; Beer, Michael A.

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884
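
    To make the notion of a full PWM (position weight matrix) concrete, the Python sketch below scores every window of a DNA sequence against a small PWM using log-odds relative to a uniform background. The matrix values, the uniform background of 0.25 and the function names are illustrative assumptions; this is generic PWM scanning, not the MotifSpec search procedure or its learned threshold.

```python
import math


def window_score(window, pwm, background=0.25):
    """Log-odds score of one window; pwm[i][base] is the probability of
    `base` at motif position i."""
    return sum(math.log2(pwm[i][base] / background)
               for i, base in enumerate(window))


def best_hit(sequence, pwm):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scored = [(window_score(sequence[s:s + width], pwm), s)
              for s in range(len(sequence) - width + 1)]
    # On ties, prefer the leftmost window.
    return max(scored, key=lambda pair: (pair[0], -pair[1]))


# Toy two-position motif that prefers "AC" (values are made up).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
```

    A sequence would then be called a predicted binding site when its best score exceeds some threshold; how that threshold is chosen is exactly the kind of detail the abstract attributes to the learned, discriminative part of the method.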

  18. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space.

    PubMed

    Karnik, Rahul; Beer, Michael A

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.

  19. Mapping Global Ocean Surface Albedo from Satellite Observations: Models, Algorithms, and Datasets

    NASA Astrophysics Data System (ADS)

    Li, X.; Fan, X.; Yan, H.; Li, A.; Wang, M.; Qu, Y.

    2018-04-01

    Ocean surface albedo (OSA) is one of the important parameters in surface radiation budget (SRB). It is usually considered as a controlling factor of the heat exchange among the atmosphere and ocean. The temporal and spatial dynamics of OSA determine the energy absorption of upper level ocean water, and have influences on the oceanic currents, atmospheric circulations, and transportation of material and energy of hydrosphere. Therefore, various parameterizations and models have been developed for describing the dynamics of OSA. However, it has been demonstrated that the currently available OSA datasets cannot fulfill the requirements of global climate change studies. In this study, we present a literature review on mapping global OSA from satellite observations. The models (parameterizations, the coupled ocean-atmosphere radiative transfer (COART), and the three component ocean water albedo (TCOWA)), algorithms (the estimation method based on reanalysis data, and the direct-estimation algorithm), and datasets (the cloud, albedo and radiation (CLARA) surface albedo product, dataset derived by the TCOWA model, and the global land surface satellite (GLASS) phase-2 surface broadband albedo product) of OSA are discussed separately.

  20. Aerosol climate time series from ESA Aerosol_cci (Invited)

    NASA Astrophysics Data System (ADS)

    Holzer-Popp, T.

    2013-12-01

    Within the ESA Climate Change Initiative (CCI) the Aerosol_cci project (mid 2010 - mid 2013, phase 2 proposed 2014-2016) has conducted intensive work to improve algorithms for the retrieval of aerosol information from European sensors AATSR (3 algorithms), PARASOL, MERIS (3 algorithms), synergetic AATSR/SCIAMACHY, OMI and GOMOS. Whereas OMI and GOMOS were used to derive absorbing aerosol index and stratospheric extinction profiles, respectively, Aerosol Optical Depth (AOD) and Angstrom coefficient were retrieved from the other sensors. Global datasets for 2008 were produced and validated versus independent ground-based data and other satellite data sets (MODIS, MISR). An additional 17-year dataset is currently generated using ATSR-2/AATSR data. During the three years of the project, intensive collaborative efforts were made to improve the retrieval algorithms focusing on the most critical modules. The team agreed on the use of a common definition for the aerosol optical properties. Cloud masking was evaluated, but a rigorous analysis with a prescribed cloud mask did not lead to improvement for all algorithms. Better results were obtained using a post-processing step in which sudden transitions, indicative of possible occurrence of cloud contamination, were removed. Surface parameterization, which is most critical for the nadir only algorithms (MERIS and synergetic AATSR / SCIAMACHY) was studied to a limited extent. The retrieval results for AOD, Ångström exponent (AE) and uncertainties were evaluated by comparison with data from AERONET (and a limited amount of MAN) sun photometer and with satellite data available from MODIS and MISR. Both level2 and level3 (gridded daily) datasets were validated. 
Several validation metrics were used (standard statistical quantities such as bias, rmse, Pearson correlation, linear regression, as well as scoring approaches to quantitatively evaluate the spatial and temporal correlations against AERONET), and in some cases developed further, to evaluate the datasets and their regional and seasonal merits. The validation showed that most datasets have improved significantly and in particular PARASOL (ocean only) provides excellent results. The metrics for AATSR (land and ocean) datasets are similar to those of MODIS and MISR, with AATSR better in some land regions and less good in some others (ocean). However, AATSR coverage is smaller than that of MODIS due to swath width. The MERIS dataset provides better coverage than AATSR but has lower quality (especially over land) than the other datasets. Also the synergetic AATSR/SCIAMACHY dataset has lower quality. The evaluation of the pixel uncertainties shows first good results but also reveals that more work needs to be done to provide comprehensive information for data assimilation. Users (MACC/ECMWF, AEROCOM) confirmed the relevance of this additional information and encouraged Aerosol_cci to release the current uncertainties. The paper will summarize and discuss the results of three year work in Aerosol_cci, extract the lessons learned and conclude with an outlook to the work proposed for the next three years. In this second phase a cyclic effort of algorithm evolution, dataset generation, validation and assessment will be applied to produce and further improve complete time series from all sensors under investigation, new sensors will be added (e.g. IASI), and preparation for the Sentinel missions will be made.

  1. Encoding the local connectivity patterns of fMRI for cognitive task and state classification.

    PubMed

    Onal Ertugrul, Itir; Ozay, Mete; Yarman Vural, Fatos T

    2018-06-15

    In this work, we propose a novel framework to encode the local connectivity patterns of the brain, using Fisher vectors (FV), vector of locally aggregated descriptors (VLAD) and bag-of-words (BoW) methods. We first obtain local descriptors, called mesh arc descriptors (MADs) from fMRI data, by forming local meshes around anatomical regions, and estimating their relationship within a neighborhood. Then, we extract a dictionary of relationships, called brain connectivity dictionary by fitting a generative Gaussian mixture model (GMM) to a set of MADs, and selecting codewords at the mean of each component of the mixture. Codewords represent connectivity patterns among anatomical regions. We also encode MADs by VLAD and BoW methods using k-Means clustering. We classify cognitive tasks using the Human Connectome Project (HCP) task fMRI dataset and cognitive states using the Emotional Memory Retrieval (EMR). We train support vector machines (SVMs) using the encoded MADs. Results demonstrate that FV encoding of MADs can be successfully employed for classification of cognitive tasks, and outperforms VLAD and BoW representations. Moreover, we identify the significant Gaussians in mixture models by computing energy of their corresponding FV parts, and analyze their effect on classification accuracy. Finally, we suggest a new method to visualize the codewords of the learned brain connectivity dictionary.

  2. Multi-class SVM model for fMRI-based classification and grading of liver fibrosis

    NASA Astrophysics Data System (ADS)

    Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.

    2010-03-01

    We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.

  3. Supervised dictionary learning for inferring concurrent brain networks.

    PubMed

    Zhao, Shijie; Han, Junwei; Lv, Jinglei; Jiang, Xi; Hu, Xintao; Zhao, Yu; Ge, Bao; Guo, Lei; Liu, Tianming

    2015-10-01

    Task-based fMRI (tfMRI) has been widely used to explore functional brain networks via a predefined stimulus paradigm in the fMRI scan. Traditionally, the general linear model (GLM) has been a dominant approach to detect task-evoked networks. However, GLM focuses on task-evoked or event-evoked brain responses and possibly ignores the intrinsic brain functions. In comparison, dictionary learning and sparse coding methods have attracted much attention recently, and these methods have shown the promise of automatically and systematically decomposing fMRI signals into meaningful task-evoked and intrinsic concurrent networks. Nevertheless, two notable limitations of current data-driven dictionary learning methods are that the prior knowledge of the task paradigm is not sufficiently utilized and that the establishment of correspondences among dictionary atoms in different brains has been challenging. In this paper, we propose a novel supervised dictionary learning and sparse coding method for inferring functional networks from tfMRI data, which combines the advantages of the model-driven and data-driven methods. The basic idea is to fix the task stimulus curves as predefined model-driven dictionary atoms and only optimize the other portion of data-driven dictionary atoms. Application of this novel methodology on the publicly available human connectome project (HCP) tfMRI datasets has achieved promising results.

  4. Evolving hard problems: Generating human genetics datasets with a complex etiology.

    PubMed

    Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H

    2011-07-07

    A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight-hundred Pareto fronts; one for each independent run of our algorithm. In each run the predictiveness of single genetic variation and pairs of genetic variants have been minimized, while the predictiveness of third, fourth, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive four or five order interactions and minimized lower-level effects. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.

  5. Functional overestimation due to spatial smoothing of fMRI data.

    PubMed

    Liu, Peng; Calhoun, Vince; Chen, Zikuan

    2017-11-01

    Pearson correlation (simply correlation) is a basic technique for neuroimage function analysis. It has been observed that spatial smoothing may cause functional overestimation, a phenomenon that is not yet completely understood. Herein, we present a theoretical explanation from the perspective of correlation scale invariance. For a task-evoked spatiotemporal functional dataset, we can extract the functional spatial map by calculating the temporal correlations (tcorr) of voxel timecourses against the task timecourse. From the relationship between image noise level (changed through spatial smoothing) and the tcorr map calculation, we show that spatial smoothing causes a noise reduction, which in turn smooths the tcorr map and leads to a spatial expansion on neuroactivity blob estimation. Through numerical simulations and subject experiments, we show that the spatial smoothing of fMRI data may overestimate activation spots in the correlation functional map. Our results suggest a small spatial smoothing (with a smoothing kernel with a full width at half maximum (FWHM) of no more than two voxels) on fMRI data processing for correlation-based functional mapping. Comparison with existing methods: In extreme noiselessness, the scale-invariance property of correlation defines a meaningless binary tcorr map. In reality, a functional activity blob in a tcorr map is shaped by the spoilage of correlative responses by image noise. We may reduce the data noise level by smoothing, which in turn has a smoothing effect on the correlation. This logic allows us to understand the noise dependence and the smoothing effect of correlation-based fMRI data analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
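
    The noiseless case mentioned in this abstract can be illustrated with a small sketch. It is not the authors' simulation: the one-dimensional row of seven voxels, the boxcar task timecourse, and the three-point smoothing kernel are all assumptions chosen for illustration. Because Pearson correlation is scale invariant, any voxel that receives even a small fraction of the task signal through smoothing gets a tcorr of 1, so the binary blob widens.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def tcorr_map(voxels, task):
    """Temporal correlation of every voxel timecourse against the task timecourse."""
    return [round(pearson(v, task), 6) for v in voxels]

def smooth_space(voxels, kernel=(0.25, 0.5, 0.25)):
    """Smooth across neighbouring voxels (zero-padded edges); kernel is hypothetical."""
    n, t_len = len(voxels), len(voxels[0])
    out = []
    for i in range(n):
        row = [0.0] * t_len
        for offset, w in zip((-1, 0, 1), kernel):
            j = i + offset
            if 0 <= j < n:
                for t in range(t_len):
                    row[t] += w * voxels[j][t]
        out.append(row)
    return out

# Hypothetical noiseless data: only the centre voxel follows the task.
task = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
voxels = [[0.0] * len(task) for _ in range(7)]
voxels[3] = list(task)

before = tcorr_map(voxels, task)               # one active voxel
after = tcorr_map(smooth_space(voxels), task)  # three active voxels
print(before)
print(after)
```

    In this toy setting the active region grows from one voxel to three after smoothing, although the underlying activity has not changed. With realistic noise the tcorr values would be graded rather than binary, which is the regime the abstract analyses.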

  6. Fuzzy Naive Bayesian model for medical diagnostic decision support.

    PubMed

    Wagholikar, Kavishwar B; Vijayraghavan, Sundararajan; Deshpande, Ashok W

    2009-01-01

    This work relates to the development of computational algorithms to provide decision support to physicians. The authors propose a Fuzzy Naive Bayesian (FNB) model for medical diagnosis, which extends the Fuzzy Bayesian approach proposed by Okuda. A physician's interview based method is described to define an orthogonal fuzzy symptom information system, required to apply the model. For the purpose of elaboration and elicitation of characteristics, the algorithm is applied to a simple simulated dataset, and compared with the conventional Naive Bayes (NB) approach. As a preliminary evaluation of FNB in a real-world scenario, the comparison is repeated on a real fuzzy dataset of 81 patients diagnosed with infectious diseases. The case study on the simulated dataset elucidates that FNB can be optimal over NB for diagnosing patients with imprecise-fuzzy information, on account of the following characteristics - 1) it can model the information that values of some attributes are semantically closer than values of other attributes, and 2) it offers a mechanism to temper exaggerations in patient information. Although the algorithm requires precise training data, its utility for fuzzy training data is argued for. This is supported by the case study on the infectious disease dataset, which indicates optimality of FNB over NB for the infectious disease domain. Further case studies on large datasets are required to establish the utility of FNB.

  7. Countering imbalanced datasets to improve adverse drug event predictive models in labor and delivery.

    PubMed

    Taft, L M; Evans, R S; Shyu, C R; Egger, M J; Chawla, N; Mitchell, J A; Thornton, S N; Bray, B; Varner, M

    2009-04-01

    The IOM report, Preventing Medication Errors, emphasizes the overall lack of knowledge of the incidence of adverse drug events (ADE). Operating rooms, emergency departments and intensive care units are known to have a higher incidence of ADE. Labor and delivery (L&D) is an emergency care unit that could have an increased risk of ADE, where reported rates remain low and under-reporting is suspected. Risk factor identification with electronic pattern recognition techniques could improve ADE detection rates. The objective of the present study is to apply Synthetic Minority Over Sampling Technique (SMOTE) as an enhanced sampling method in a sparse dataset to generate prediction models to identify ADE in women admitted for labor and delivery based on patient risk factors and comorbidities. By creating synthetic cases with the SMOTE algorithm and using a 10-fold cross-validation technique, we demonstrated improved performance of the Naïve Bayes and the decision tree algorithms. The true positive rate (TPR) of 0.32 in the raw dataset increased to 0.67 in the 800% over-sampled dataset. Enhanced performance from classification algorithms can be attained with the use of synthetic minority class oversampling techniques in sparse clinical datasets. Predictive models created in this manner can be used to develop evidence based ADE monitoring systems.
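
    The oversampling step named in this abstract can be sketched as follows. This is a generic illustration of SMOTE-style interpolation, not the study's pipeline: the two-dimensional toy points, the neighbour count k, and the helper name smote are assumptions. An oversampling amount of 800% corresponds to eight synthetic cases per original minority case.

```python
import random

def smote(minority, n_per_sample, k, rng):
    """Create synthetic minority cases by interpolating towards nearest minority neighbours.

    minority: list of feature vectors (lists of floats) from the minority class
    n_per_sample: synthetic cases per original case (8 for 800% oversampling)
    k: number of nearest neighbours to choose from (hypothetical value below)
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for i, x in enumerate(minority):
        # Indices of the k nearest other minority cases.
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: dist2(x, minority[j]))[:k]
        for _ in range(n_per_sample):
            nb = minority[rng.choice(neighbours)]
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Hypothetical minority-class cases with two numeric risk-factor features.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 2.0], [2.0, 1.0]]
synthetic = smote(minority, n_per_sample=8, k=2, rng=random.Random(0))
print(len(synthetic), "synthetic cases from", len(minority), "originals")
```

    Synthetic cases should be generated from training folds only when cross-validating; otherwise interpolated copies of test cases leak into training and inflate the reported true positive rate.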

  8. Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.

    PubMed

    Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong

    2018-05-24

    This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.

  9. Identification of Patients with Family History of Pancreatic Cancer--Investigation of an NLP System Portability.

    PubMed

    Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang

    2015-01-01

    In this study we have developed a rule-based natural language processing (NLP) system to identify patients with family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6% respectively. Negation detection resulted in precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance.

  10. Multiple imputation of missing fMRI data in whole brain analysis

    PubMed Central

    Vaden, Kenneth I.; Gebregziabher, Mulugeta; Kuchinsky, Stefanie E.; Eckert, Mark A.

    2012-01-01

    Whole brain fMRI analyses rarely include the entire brain because of missing data that result from data acquisition limits and susceptibility artifact, in particular. This missing data problem is typically addressed by omitting voxels from analysis, which may exclude brain regions that are of theoretical interest and increase the potential for Type II error at cortical boundaries or Type I error when spatial thresholds are used to establish significance. Imputation could significantly expand statistical map coverage, increase power, and enhance interpretations of fMRI results. We examined multiple imputation for group level analyses of missing fMRI data using methods that leverage the spatial information in fMRI datasets for both real and simulated data. Available case analysis, neighbor replacement, and regression based imputation approaches were compared in a general linear model framework to determine the extent to which these methods quantitatively (effect size) and qualitatively (spatial coverage) increased the sensitivity of group analyses. In both real and simulated data analysis, multiple imputation provided 1) variance that was most similar to estimates for voxels with no missing data, 2) fewer false positive errors in comparison to mean replacement, and 3) fewer false negative errors in comparison to available case analysis. Compared to the standard analysis approach of omitting voxels with missing data, imputation methods increased brain coverage in this study by 35% (from 33,323 to 45,071 voxels). In addition, multiple imputation increased the size of significant clusters by 58% and number of significant clusters across statistical thresholds, compared to the standard voxel omission approach. While neighbor replacement produced similar results, we recommend multiple imputation because it uses an informed sampling distribution to deal with missing data across subjects that can include neighbor values and other predictors. 
Multiple imputation is anticipated to be particularly useful for 1) large fMRI data sets with inconsistent missing voxels across subjects and 2) addressing the problem of increased artifact at ultra-high field, which significantly limit the extent of whole brain coverage and interpretations of results. PMID:22500925

  11. PAIR Comparison between Two Within-Group Conditions of Resting-State fMRI Improves Classification Accuracy

    PubMed Central

    Zhou, Zhen; Wang, Jian-Bao; Zang, Yu-Feng; Pan, Gang

    2018-01-01

    Classification approaches have been increasingly applied to differentiate patients and normal controls using resting-state functional magnetic resonance imaging data (RS-fMRI). Although most previous classification studies have reported promising accuracy within individual datasets, achieving high levels of accuracy with multiple datasets remains challenging for two main reasons: high dimensionality, and high variability across subjects. We used two independent RS-fMRI datasets (n = 31, 46, respectively) both with eyes closed (EC) and eyes open (EO) conditions. For each dataset, we first reduced the number of features to a small number of brain regions with paired t-tests, using the amplitude of low frequency fluctuation (ALFF) as a metric. Second, we employed a new method for feature extraction, named the PAIR method, examining EC and EO as paired conditions rather than independent conditions. Specifically, for each dataset, we obtained EC minus EO (EC—EO) maps of ALFF from half of subjects (n = 15 for dataset-1, n = 23 for dataset-2) and obtained EO—EC maps from the other half (n = 16 for dataset-1, n = 23 for dataset-2). A support vector machine (SVM) method was used for classification of EC RS-fMRI mapping and EO mapping. The mean classification accuracy of the PAIR method was 91.40% for dataset-1, and 92.75% for dataset-2 in the conventional frequency band of 0.01–0.08 Hz. For cross-dataset validation, we applied the classifier from dataset-1 directly to dataset-2, and vice versa. The mean accuracy of cross-dataset validation was 94.93% for dataset-1 to dataset-2 and 90.32% for dataset-2 to dataset-1 in the 0.01–0.08 Hz range. For the UNPAIR method, classification accuracy was substantially lower (mean 69.89% for dataset-1 and 82.97% for dataset-2), and was much lower for cross-dataset validation (64.69% for dataset-1 to dataset-2 and 64.98% for dataset-2 to dataset-1) in the 0.01–0.08 Hz range. 
In conclusion, for within-group design studies (e.g., paired conditions or follow-up studies), we recommend the PAIR method for feature extraction. In addition, dimensionality reduction with strong prior knowledge of specific brain regions should also be considered for feature selection in neuroimaging studies. PMID:29375288

  12. Brain-Inspired Constructive Learning Algorithms with Evolutionally Additive Nonlinear Neurons

    NASA Astrophysics Data System (ADS)

    Fang, Le-Heng; Lin, Wei; Luo, Qiang

    In this article, inspired partially by the physiological evidence of brain’s growth and development, we developed a new type of constructive learning algorithm with evolutionally additive nonlinear neurons. The new algorithms have remarkable ability in effective regression and accurate classification. In particular, the algorithms are able to sustain a certain reduction of the loss function when the dynamics of the trained network are bogged down in the vicinity of the local minima. The algorithm augments the neural network by adding only a few connections as well as neurons whose activation functions are nonlinear, nonmonotonic, and self-adapted to the dynamics of the loss functions. Indeed, we analytically demonstrate the reduction dynamics of the algorithm for different problems, and further modify the algorithms so as to obtain an improved generalization capability for the augmented neural networks. Finally, through comparing with the classical algorithm and architecture for neural network construction, we show that our constructive learning algorithms as well as their modified versions have better performances, such as faster training speed and smaller network size, on several representative benchmark datasets including the MNIST dataset for handwriting digits.

  13. Compressed sensing based missing nodes prediction in temporal communication network

    NASA Astrophysics Data System (ADS)

    Cheng, Guangquan; Ma, Yang; Liu, Zhong; Xie, Fuli

    2018-02-01

    The reconstruction of complex network topology is of great theoretical and practical significance. Most research so far focuses on the prediction of missing links. There are many mature algorithms for link prediction which have achieved good results, but research on the prediction of missing nodes has just begun. In this paper, we propose an algorithm for missing node prediction in complex networks. We detect the position of missing nodes based on their neighbor nodes under the theory of compressed sensing, and extend the algorithm to the case of multiple missing nodes using spectral clustering. Experiments on real public network datasets and simulated datasets show that our algorithm can detect the locations of hidden nodes effectively with high precision.

  14. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    PubMed

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the back-propagation (BP) algorithm is used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network using the BP algorithm is then evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  15. Efficient algorithms for fast integration on large data sets from multiple sources.

    PubMed

    Mi, Tian; Rajasekaran, Sanguthevar; Aseltine, Robert

    2012-06-28

    Recent large scale deployments of health information technology have created opportunities for the integration of patient medical records with disparate public health, human service, and educational databases to provide comprehensive information related to health and development. Data integration techniques, which identify records belonging to the same individual that reside in multiple data sets, are essential to these efforts. Several algorithms have been proposed in the literature that are adept at integrating records from two different datasets. Our algorithms are aimed at integrating multiple (in particular more than two) datasets efficiently. Hierarchical clustering based solutions are used to integrate multiple (in particular more than two) datasets. Edit distance is used as the basic distance calculation, while distance calculation of common input errors is also studied. Several techniques have been applied to improve the algorithms in terms of both time and space: 1) Partial Construction of the Dendrogram (PCD) that ignores the level above the threshold; 2) Ignoring the Dendrogram Structure (IDS); 3) Faster Computation of the Edit Distance (FCED) that predicts the distance with the threshold by upper bounds on edit distance; and 4) A pre-processing blocking phase that limits dynamic computation within each block. We have experimentally validated our algorithms on large simulated as well as real data. Accuracy and completeness are defined stringently to show the performance of our algorithms. In addition, we employ a four-category analysis. Comparison with FEBRL shows the robustness of our approach. In the experiments we conducted, the accuracy we observed exceeded 90% for the simulated data in most cases. 97.7% and 98.1% accuracy were achieved for the constant and proportional threshold, respectively, in a real dataset of 1,083,878 records.
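
    The abstract does not give the details of FCED, so the sketch below shows a different, standard technique with the same goal: deciding whether two record strings are within a distance threshold without always completing the full edit-distance table. It uses a length check and an early exit on the dynamic-programming row minimum. The function name and the example strings are hypothetical.

```python
def within_threshold(a, b, threshold):
    """Return True if the Levenshtein distance between a and b is <= threshold.

    Stops early when no alignment can stay within the threshold; this is a
    generic pruning idea, not the FCED method from the paper.
    """
    # The distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > threshold:
        return False
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        # Row minima never decrease, so the final distance cannot be smaller.
        if min(cur) > threshold:
            return False
        prev = cur
    return prev[-1] <= threshold

# Hypothetical name fields from two data sets.
print(within_threshold("jonathan smith", "jonathon smith", 1))
print(within_threshold("kitten", "sitting", 2))
```

    A linkage pipeline would typically call such a check only on record pairs that share a blocking key, which matches the blocking phase listed as technique 4 in the abstract.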

  16. Reference-Free Removal of EEG-fMRI Ballistocardiogram Artifacts with Harmonic Regression

    PubMed Central

    Krishnaswamy, Pavitra; Bonmassar, Giorgio; Poulsen, Catherine; Pierce, Eric T; Purdon, Patrick L.; Brown, Emery N.

    2016-01-01

    Combining electroencephalogram (EEG) recording and functional magnetic resonance imaging (fMRI) offers the potential for imaging brain activity with high spatial and temporal resolution. This potential remains limited by the significant ballistocardiogram (BCG) artifacts induced in the EEG by cardiac pulsation-related head movement within the magnetic field. We model the BCG artifact using a harmonic basis, pose the artifact removal problem as a local harmonic regression analysis, and develop an efficient maximum likelihood algorithm to estimate and remove BCG artifacts. Our analysis paradigm accounts for time-frequency overlap between the BCG artifacts and neurophysiologic EEG signals, and tracks the spatiotemporal variations in both the artifact and the signal. We evaluate performance on: simulated oscillatory and evoked responses constructed with realistic artifacts; actual anesthesia-induced oscillatory recordings; and actual visual evoked potential recordings. In each case, the local harmonic regression analysis effectively removes the BCG artifacts, and recovers the neurophysiologic EEG signals. We further show that our algorithm outperforms commonly used reference-based and component analysis techniques, particularly in low SNR conditions, the presence of significant time-frequency overlap between the artifact and the signal, and/or large spatiotemporal variations in the BCG. Because our algorithm does not require reference signals and has low computational complexity, it offers a practical tool for removing BCG artifacts from EEG data recorded in combination with fMRI. PMID:26151100

  17. Investigating the performance of neural network backpropagation algorithms for TEC estimations using South African GPS data

    NASA Astrophysics Data System (ADS)

    Habarulema, J. B.; McKinnell, L.-A.

    2012-05-01

    In this work, results obtained by investigating the application of different neural network backpropagation training algorithms are presented. This was done to assess the performance accuracy of each training algorithm in total electron content (TEC) estimations using identical datasets in model development and verification processes. Investigated training algorithms are standard backpropagation (SBP), backpropagation with weight decay (BPWD), backpropagation with momentum (BPM) term, backpropagation with chunkwise weight update (BPC) and backpropagation for batch (BPB) training. These five algorithms are inbuilt functions within the Stuttgart Neural Network Simulator (SNNS) and the main objective was to identify the training algorithm that generates the minimum error between the TEC derived from Global Positioning System (GPS) observations and the modelled TEC data. Another investigated algorithm is the MATLAB-based Levenberg-Marquardt backpropagation (L-MBP), which achieves convergence after the least number of iterations during training. In this paper, neural network (NN) models were developed using hourly TEC data (for 8 years: 2000-2007) derived from GPS observations over a receiver station located at Sutherland (SUTH) (32.38° S, 20.81° E), South Africa. Verification of the NN models for all algorithms considered was performed on both "seen" and "unseen" data. Hourly TEC values over SUTH for 2003 formed the "seen" dataset. The "unseen" dataset consisted of hourly TEC data for 2002 and 2008 over Cape Town (CPTN) (33.95° S, 18.47° E) and SUTH, respectively. The models' verification showed that all algorithms investigated provide comparable results statistically, but differ significantly in terms of time required to achieve convergence during input-output data training/learning. This paper therefore provides a guide to neural network users for choosing appropriate algorithms based on the availability of computation capabilities used for research.

  18. Quantifying the tibiofemoral joint space using x-ray tomosynthesis.

    PubMed

    Kalinosky, Benjamin; Sabol, John M; Piacsek, Kelly; Heckel, Beth; Gilat Schmidt, Taly

    2011-12-01

    Digital x-ray tomosynthesis (DTS) has the potential to provide 3D information about the knee joint in a load-bearing posture, which may improve diagnosis and monitoring of knee osteoarthritis compared with projection radiography, the current standard of care. Manually quantifying and visualizing the joint space width (JSW) from 3D tomosynthesis datasets may be challenging. This work developed a semiautomated algorithm for quantifying the 3D tibiofemoral JSW from reconstructed DTS images. The algorithm was validated through anthropomorphic phantom experiments and applied to three clinical datasets. A user-selected volume of interest within the reconstructed DTS volume was enhanced with 1D multiscale gradient kernels. The edge-enhanced volumes were divided by polarity into tibial and femoral edge maps and combined across kernel scales. A 2D connected components algorithm was performed to determine candidate tibial and femoral edges. A 2D joint space width map (JSW) was constructed to represent the 3D tibiofemoral joint space. To quantify the algorithm accuracy, an adjustable knee phantom was constructed, and eleven posterior-anterior (PA) and lateral DTS scans were acquired with the medial minimum JSW of the phantom set to 0-5 mm in 0.5 mm increments (VolumeRad™, GE Healthcare, Chalfont St. Giles, United Kingdom). The accuracy of the algorithm was quantified by comparing the minimum JSW in a region of interest in the medial compartment of the JSW map to the measured phantom setting for each trial. In addition, the algorithm was applied to DTS scans of a static knee phantom and the JSW map compared to values estimated from a manually segmented computed tomography (CT) dataset. The algorithm was also applied to three clinical DTS datasets of osteoarthritic patients. The algorithm segmented the JSW and generated a JSW map for all phantom and clinical datasets. 
For the adjustable phantom, the estimated minimum JSW values were plotted against the measured values for all trials. A linear fit estimated a slope of 0.887 (R² = 0.962) and a mean error across all trials of 0.34 mm for the PA phantom data. The estimated minimum JSW values for the lateral adjustable phantom acquisitions were found to have low correlation to the measured values (R² = 0.377), with a mean error of 2.13 mm. The error in the lateral adjustable-phantom datasets appeared to be caused by artifacts due to unrealistic features in the phantom bones. JSW maps generated by DTS and CT varied by a mean of 0.6 mm and 0.8 mm across the knee joint, for PA and lateral scans. The tibial and femoral edges were successfully segmented and JSW maps determined for PA and lateral clinical DTS datasets. A semiautomated method is presented for quantifying the 3D joint space in a 2D JSW map using tomosynthesis images. The proposed algorithm quantified the JSW across the knee joint to sub-millimeter accuracy for PA tomosynthesis acquisitions. Overall, the results suggest that x-ray tomosynthesis may be beneficial for diagnosing and monitoring disease progression or treatment of osteoarthritis by providing quantitative images of JSW in the load-bearing knee.
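
    The accuracy figures above (slope, R², mean error) come from comparing estimated against measured values. A minimal sketch of how such summary statistics can be computed follows; the function name is hypothetical, and taking "mean error" to be the mean absolute difference is an assumption, since the abstract does not define it.

```python
def linear_fit_summary(measured, estimated):
    """Ordinary least-squares line estimated ~ intercept + slope * measured.

    Returns (slope, intercept, r_squared, mean_abs_error).
    """
    n = len(measured)
    mx = sum(measured) / n
    my = sum(estimated) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, estimated))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(measured, estimated))
    ss_tot = sum((y - my) ** 2 for y in estimated)
    r_squared = 1.0 - ss_res / ss_tot
    # Assumed definition: mean absolute difference between the two series.
    mean_abs_error = sum(abs(y - x) for x, y in zip(measured, estimated)) / n
    return slope, intercept, r_squared, mean_abs_error
```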

  19. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffers when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
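
    Feature-set stability measured by a Tanimoto distance can be sketched as follows. The set-based form used here (one minus the ratio of shared to total features) is an assumption, since the abstract does not state which variant was used, and the gene names in the usage note are placeholders.

```python
def tanimoto_distance(features_a, features_b):
    """Set-based Tanimoto (Jaccard) distance: 1 - |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0  # two empty selections are treated as identical
    return 1.0 - len(a & b) / len(union)
```

    For example, `tanimoto_distance({"g1", "g2", "g3"}, {"g2", "g3", "g4"})` evaluates to 0.5; values near 0 across repeated runs would indicate a stable selection.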

  20. SVM classification of microaneurysms with imbalanced dataset based on borderline-SMOTE and data cleaning techniques

    NASA Astrophysics Data System (ADS)

    Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning

    2017-03-01

    Microaneurysms are the earliest clinical signs of diabetic retinopathy, and many algorithms have been developed for the automatic classification of this specific pathology. However, the imbalanced class distribution of the dataset usually causes the classification accuracy for true microaneurysms to be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of minority samples in the neighborhood of the borderline, and 2) the removal of redundant training samples to improve the efficiency of data utilization. Moreover, a modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. Experiments with a public microaneurysm database show that the proposed algorithms achieve better classification performance in terms of the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
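
    The synthesis step that SMOTE-style methods share can be sketched as interpolation between a minority sample and one of its nearest minority neighbours. This is plain SMOTE interpolation only; the borderline selection, Tomek-link and ENN cleaning described in the abstract are not reproduced, and the function name and parameters are hypothetical.

```python
import math
import random

def smote_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating minority samples.

    minority: list of equal-length numeric tuples (at least two points).
    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic
```

    A borderline variant would restrict `base` to minority samples whose neighbourhoods are dominated by majority-class points before interpolating.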

  1. Gene selection heuristic algorithm for nutrigenomics studies.

    PubMed

    Valour, D; Hue, I; Grimard, B; Valour, B

    2013-07-15

    Large datasets from -omics studies need to be deeply investigated. The aim of this paper is to provide a new method (LEM method) for the search of transcriptome and metabolome connections. The heuristic algorithm described here extends the classical canonical correlation analysis (CCA) to a high number of variables (without regularization) and combines well-conditioning and fast-computing in "R." Reduced CCA models are summarized in PageRank matrices, the product of which gives a stochastic matrix that summarizes the self-avoiding walk covered by the algorithm. Then, a homogeneous Markov process applied to this stochastic matrix converges to the probabilities of interconnection between genes, providing a selection of disjoint subsets of genes. This is an alternative to regularized generalized CCA for the determination of blocks within the structure matrix. Each gene subset is thus linked to the whole metabolic or clinical dataset that represents the biological phenotype of interest. Moreover, this selection process meets the needs of biologists, who often require small sets of genes for further validation or extended phenotyping. The algorithm is shown to work efficiently on three published datasets, resulting in meaningfully broadened gene networks.

  2. Benchmarking neuromorphic vision: lessons learnt from computer vision

    PubMed Central

    Tan, Cheston; Lallee, Stephane; Orchard, Garrick

    2015-01-01

    Neuromorphic Vision sensors have improved greatly since the first silicon retina was presented almost three decades ago. They have recently matured to the point where they are commercially available and can be operated by laymen. However, despite improved availability of sensors, there remains a lack of good datasets, while algorithms for processing spike-based visual data are still in their infancy. On the other hand, frame-based computer vision algorithms are far more mature, thanks in part to widely accepted datasets which allow direct comparison between algorithms and encourage competition. We are presented with a unique opportunity to shape the development of Neuromorphic Vision benchmarks and challenges by leveraging what has been learnt from the use of datasets in frame-based computer vision. Taking advantage of this opportunity, in this paper we review the role that benchmarks and challenges have played in the advancement of frame-based computer vision, and suggest guidelines for the creation of Neuromorphic Vision benchmarks and challenges. We also discuss the unique challenges faced when benchmarking Neuromorphic Vision algorithms, particularly when attempting to provide direct comparison with frame-based computer vision. PMID:26528120

  3. Search for Patterns of Functional Specificity in the Brain: A Nonparametric Hierarchical Bayesian Model for Group fMRI Data

    PubMed Central

    Sridharan, Ramesh; Vul, Edward; Hsieh, Po-Jang; Kanwisher, Nancy; Golland, Polina

    2012-01-01

    Functional MRI studies have uncovered a number of brain areas that demonstrate highly specific functional patterns. In the case of visual object recognition, small, focal regions have been characterized with selectivity for visual categories such as human faces. In this paper, we develop an algorithm that automatically learns patterns of functional specificity from fMRI data in a group of subjects. The method does not require spatial alignment of functional images from different subjects. The algorithm is based on a generative model that comprises two main layers. At the lower level, we express the functional brain response to each stimulus as a binary activation variable. At the next level, we define a prior over sets of activation variables in all subjects. We use a Hierarchical Dirichlet Process as the prior in order to learn the patterns of functional specificity shared across the group, which we call functional systems, and estimate the number of these systems. Inference based on our model enables automatic discovery and characterization of dominant and consistent functional systems. We apply the method to data from a visual fMRI study comprised of 69 distinct stimulus images. The discovered system activation profiles correspond to selectivity for a number of image categories such as faces, bodies, and scenes. Among systems found by our method, we identify new areas that are deactivated by face stimuli. In empirical comparisons with previously proposed exploratory methods, our results appear superior in capturing the structure in the space of visual categories of stimuli. PMID:21884803

  4. Novel linkage disequilibrium clustering algorithm identifies new lupus genes on meta-analysis of GWAS datasets.

    PubMed

    Saeed, Mohammad

    2017-05-01

    Systemic lupus erythematosus (SLE) is a complex disorder. Genetic association studies of complex disorders suffer from the following three major issues: phenotypic heterogeneity, false positive (type I error), and false negative (type II error) results. Hence, genes with low to moderate effects are missed in standard analyses, especially after statistical corrections. OASIS is a novel linkage disequilibrium clustering algorithm that can potentially address false positives and negatives in genome-wide association studies (GWAS) of complex disorders such as SLE. OASIS was applied to two SLE dbGAP GWAS datasets (6077 subjects; ∼0.75 million single-nucleotide polymorphisms). OASIS identified three known SLE genes viz. IFIH1, TNIP1, and CD44, not previously reported using these GWAS datasets. In addition, 22 novel loci for SLE were identified and the 5 SLE genes previously reported using these datasets were verified. OASIS methodology was validated using single-variant replication and gene-based analysis with GATES. This led to the verification of 60% of OASIS loci. New SLE genes that OASIS identified and were further verified include TNFAIP6, DNAJB3, TTF1, GRIN2B, MON2, LATS2, SNX6, RBFOX1, NCOA3, and CHAF1B. This study presents the OASIS algorithm, software, and the meta-analyses of two publicly available SLE GWAS datasets along with the novel SLE genes. Hence, OASIS is a novel linkage disequilibrium clustering method that can be universally applied to existing GWAS datasets for the identification of new genes.

  5. Dynamic association rules for gene expression data analysis.

    PubMed

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to address the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps to efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. 
Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
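
    The two quantities the abstract thresholds, support and confidence, are standard in association rule mining and can be sketched as below. The confidence-interval procedure the authors use to choose the minimum support and minimum confidence is not reproduced; the function names and the item labels in the usage note are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)
```

    With samples encoded as sets such as `{"geneA_up", "disease"}`, a rule `{"geneA_up"} -> {"disease"}` would be retained only if both values exceed the chosen minimums.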

  6. Analysis of energy-based algorithms for RNA secondary structure prediction

    PubMed Central

    2012-01-01

    Background: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results: We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. 
Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions: Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803
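
    The F-measure and the bootstrap percentile method mentioned above can be sketched briefly. The F-measure is the harmonic mean of sensitivity and positive predictive value; the bootstrap routine resamples per-RNA scores with replacement and reads an interval off the sorted resampled means. The number of resamples, the 95% level and the function names are assumptions, not settings reported in the abstract.

```python
import random

def f_measure(sensitivity, ppv):
    """Harmonic mean of sensitivity and positive predictive value."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)

def bootstrap_percentile_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    # Mean of each resample drawn with replacement, sorted ascending.
    means = sorted(sum(rng.choices(values, k=n)) / n
                   for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

    A narrow interval for a large benchmark set and a wide one for a small class would reflect the abstract's point that averages over few structures are less reliable.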

  7. Analysis of energy-based algorithms for RNA secondary structure prediction.

    PubMed

    Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

    2012-02-01

    RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. 
Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.

  8. Preserving subject variability in group fMRI analysis: performance evaluation of GICA vs. IVA

    PubMed Central

    Michael, Andrew M.; Anderson, Mathew; Miller, Robyn L.; Adalı, Tülay; Calhoun, Vince D.

    2014-01-01

    Independent component analysis (ICA) is a widely applied technique to derive functionally connected brain networks from fMRI data. Group ICA (GICA) and Independent Vector Analysis (IVA) are extensions of ICA that enable users to perform group fMRI analyses; however, a full comparison of the performance limits of GICA and IVA has not been carried out. Recent interest in resting state fMRI data with potentially higher degree of subject variability makes the evaluation of the above techniques important. In this paper we compare component estimation accuracies of GICA and an improved version of IVA using simulated fMRI datasets. We systematically change the degree of inter-subject spatial variability of components and evaluate estimation accuracy over all spatial maps (SMs) and time courses (TCs) of the decomposition. Our results indicate the following: (1) at low levels of SM variability or when just one SM is varied, both GICA and IVA perform well, (2) at higher levels of SM variability or when more than one SM is varied, IVA continues to perform well but GICA yields SM estimates that are composites of other SMs with errors in TCs, (3) both GICA and IVA remove spatial correlations of overlapping SMs and introduce artificial correlations in their TCs, (4) if the number of SMs is overestimated, IVA continues to perform well but GICA introduces artifacts in the varying and extra SMs with artificial correlations in the TCs of extra components, and (5) in the absence or presence of SMs unique to one subject, GICA produces errors in TCs and IVA estimates are accurate. In summary, our simulation experiments (both simplistic and realistic) and our holistic analysis approach indicate that IVA produces results that are closer to ground truth and thereby better preserves subject variability. The improved version of IVA is now packaged into the GIFT toolbox (http://mialab.mrn.org/software/gift). PMID:25018704

  9. Gaussian process based independent analysis for temporal source separation in fMRI.

    PubMed

    Hald, Ditte Høvenhoff; Henao, Ricardo; Winther, Ole

    2017-05-15

    Functional Magnetic Resonance Imaging (fMRI) gives us a unique insight into the processes of the brain and opens the way to analyzing the functional activation patterns of the underlying sources. Task-inferred supervised learning, with restrictive assumptions in the regression set-up, restricts the exploratory nature of the analysis. Fully unsupervised independent component analysis (ICA) algorithms, on the other hand, can struggle to detect clear classifiable components on single-subject data. We attribute this shortcoming to inadequate modeling of the fMRI source signals by failing to incorporate their temporal nature. fMRI source signals, biological stimuli and non-stimuli-related artifacts are all smooth over a time-scale compatible with the sampling time (TR). We therefore propose Gaussian process ICA (GPICA), which facilitates temporal dependency by the use of Gaussian process source priors. On two fMRI data sets with different sampling frequencies, we show that the GPICA-inferred temporal components and associated spatial maps allow for a more definite interpretation than standard temporal ICA methods. The temporal structures of the sources are controlled by the covariance of the Gaussian process, specified by a kernel function with an interpretable and controllable temporal length scale parameter. We propose a hierarchical model specification, considering both instantaneous and convolutive mixing, and we infer source spatial maps, temporal patterns and temporal length scale parameters by Markov Chain Monte Carlo. A companion implementation made as a plug-in for SPM can be downloaded from https://github.com/dittehald/GPICA.

  10. Segmentation of Unstructured Datasets

    NASA Technical Reports Server (NTRS)

    Bhat, Smitha

    1996-01-01

    Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.

  11. M-AMST: an automatic 3D neuron tracing method based on mean shift and adapted minimum spanning tree.

    PubMed

    Wan, Zhijiang; He, Yishan; Hao, Ming; Yang, Jian; Zhong, Ning

    2017-03-29

    Understanding the working mechanism of the brain is one of the grandest challenges for modern science. Toward this end, the BigNeuron project was launched to gather a worldwide community to establish a big data resource and a set of state-of-the-art single neuron reconstruction algorithms. Many groups contributed their own algorithms for the project, including our mean shift and minimum spanning tree (M-MST). Although M-MST is intuitive and easy to implement, the MST just considers spatial information of a single neuron and ignores the shape information, which might lead to less precise connections between some neuron segments. In this paper, we propose an improved algorithm, namely M-AMST, in which a rotating sphere model based on coordinate transformation is used to improve the weight calculation method in M-MST. Two experiments are designed to illustrate, respectively, the effect of the adapted minimum spanning tree algorithm and the adaptability of M-AMST in reconstructing a variety of neuron image datasets. In experiment 1, taking the reconstruction of APP2 as reference, we produce four difference scores (entire structure average (ESA), different structure average (DSA), percentage of different structure (PDS) and max distance of neurons' nodes (MDNN)) by comparing the neuron reconstruction of APP2 with those of the other 5 competing algorithms. The result shows that M-AMST gets lower difference scores than M-MST in ESA, PDS and MDNN. Meanwhile, M-AMST is better than N-MST in ESA and MDNN. It indicates that utilizing the adapted minimum spanning tree algorithm, which takes the shape information of the neuron into account, can achieve better neuron reconstructions. In experiment 2, 7 neuron image datasets are reconstructed and the four difference scores are calculated by comparing the gold standard reconstruction and the reconstructions produced by 6 competing algorithms. 
Comparing the four difference scores of M-AMST and the other 5 algorithm, we can conclude that M-AMST is able to achieve the best difference score in 3 datasets and get the second-best difference score in the other 2 datasets. We develop a pathway extraction method using a rotating sphere model based on coordinate transformation to improve the weight calculation approach in MST. The experimental results show that M-AMST utilizes the adapted minimum spanning tree algorithm which takes the shape information of neuron into account can achieve better neuron reconstructions. Moreover, M-AMST is able to get good neuron reconstruction in variety of image datasets.
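
    As a rough illustration of the baseline idea only, the sketch below builds a minimum spanning tree over 3D node coordinates using purely spatial (Euclidean) edge weights, which is the limitation the abstract attributes to M-MST. The function name, the use of Prim's algorithm, and the sample coordinates are assumptions made for illustration; the rotating sphere weighting of M-AMST is not reproduced here.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph of 3D points.

    Edge weights are plain Euclidean distances (spatial information only).
    Returns a list of (parent_index, child_index) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from the tree to each node
    parent = [-1] * n
    best_dist[0] = 0.0          # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax the links from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u
    return edges


# Hypothetical node coordinates, for illustration only.
nodes = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (1, 1, 0)]
tree = minimum_spanning_tree(nodes)
print(tree)
```

    A shape-aware variant would replace `math.dist` with a weight that also depends on local image structure, which is the role the abstract assigns to the rotating sphere model.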

  12. A 30+ Year AVHRR LAI and FAPAR Climate Data Record: Algorithm Description, Validation, and Case Study

    NASA Technical Reports Server (NTRS)

    Claverie, Martin; Matthews, Jessica L.; Vermote, Eric F.; Justice, Christopher O.

    2016-01-01

    In land surface models, which are used to evaluate the role of vegetation in the context of global climate change and variability, LAI and FAPAR play a key role, specifically with respect to the carbon and water cycles. The AVHRR-based LAI/FAPAR dataset offers daily temporal resolution, an improvement over previous products. This climate data record is based on a carefully calibrated and corrected land surface reflectance dataset to provide a high-quality, consistent time-series suitable for climate studies. It spans from mid-1981 to the present. Further, this operational dataset is available in near real-time allowing use for monitoring purposes. The algorithm relies on artificial neural networks calibrated using the MODIS LAI/FAPAR dataset. Evaluation based on cross-comparison with MODIS products and in situ data shows the dataset is consistent and reliable with overall uncertainties of 1.03 and 0.15 for LAI and FAPAR, respectively. However, a clear saturation effect is observed in the broadleaf forest biomes with high LAI (greater than 4.5) and FAPAR (greater than 0.8) values.

  13. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm.

    PubMed

    Lee, Jae-Hong; Kim, Do-Hyung; Jeong, Seong-Nyum; Choi, Seong-Ho

    2018-04-01

    The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%-91.2%) for premolars and 73.4% (95% CI, 59.9%-84.0%) for molars. We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.
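
    The diagnostic measures listed in the abstract above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts, for illustration only.
m = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```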

  14. Validation of Point Clouds Segmentation Algorithms Through Their Application to Several Case Studies for Indoor Building Modelling

    NASA Astrophysics Data System (ADS)

    Macher, H.; Landes, T.; Grussenmeyer, P.

    2016-06-01

    Laser scanners are widely used for the modelling of existing buildings and particularly in the creation process of as-built BIM (Building Information Modelling). However, the generation of as-built BIM from point clouds involves mainly manual steps and it is consequently time consuming and error-prone. Along the path to automation, a three steps segmentation approach has been developed. This approach is composed of two phases: a segmentation into sub-spaces namely floors and rooms and a plane segmentation combined with the identification of building elements. In order to assess and validate the developed approach, different case studies are considered. Indeed, it is essential to apply algorithms to several datasets and not to develop algorithms with a unique dataset which could influence the development with its particularities. Indoor point clouds of different types of buildings will be used as input for the developed algorithms, going from an individual house of almost one hundred square meters to larger buildings of several thousand square meters. Datasets provide various space configurations and present numerous different occluding objects as for example desks, computer equipments, home furnishings and even wine barrels. For each dataset, the results will be illustrated. The analysis of the results will provide an insight into the transferability of the developed approach for the indoor modelling of several types of buildings.

  15. Two novel motion-based algorithms for surveillance video analysis on embedded platforms

    NASA Astrophysics Data System (ADS)

    Vijverberg, Julien A.; Loomans, Marijn J. H.; Koeleman, Cornelis J.; de With, Peter H. N.

    2010-05-01

    This paper proposes two novel motion-vector based techniques for target detection and target tracking in surveillance videos. The algorithms are designed to operate on a resource-constrained device, such as a surveillance camera, and to reuse the motion vectors generated by the video encoder. The first novel algorithm for target detection uses motion vectors to construct a consistent motion mask, which is combined with a simple background segmentation technique to obtain a segmentation mask. The second proposed algorithm aims at multi-target tracking and uses motion vectors to assign blocks to targets employing five features. The weights of these features are adapted based on the interaction between targets. These algorithms are combined in one complete analysis application. The performance of this application for target detection has been evaluated for the i-LIDS sterile zone dataset and achieves an F1-score of 0.40-0.69. The performance of the analysis algorithm for multi-target tracking has been evaluated using the CAVIAR dataset and achieves an MOTP of around 9.7 and MOTA of 0.17-0.25. On a selection of targets in videos from other datasets, the achieved MOTP and MOTA are 8.8-10.5 and 0.32-0.49 respectively. The execution time on a PC-based platform is 36 ms. This includes the 20 ms for generating motion vectors, which are also required by the video encoder.
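
    For readers unfamiliar with the scores quoted above, the sketch below computes an F1-score from detection counts and MOTA (multiple object tracking accuracy) in its usual CLEAR MOT form, one minus the ratio of all errors to ground-truth objects. The function names and counts are hypothetical; whether the authors used exactly these definitions is an assumption.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mota(false_negatives, false_positives, id_switches, ground_truth_objects):
    """MOTA = 1 - (misses + false positives + identity switches) / ground truth.

    All counts are summed over every frame of the sequence.
    """
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth_objects


# Hypothetical counts, for illustration only.
detection_f1 = f1_score(tp=6, fp=2, fn=2)
tracking_mota = mota(false_negatives=30, false_positives=20,
                     id_switches=5, ground_truth_objects=100)
print(detection_f1, tracking_mota)
```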

  16. Optimal HRF and smoothing parameters for fMRI time series within an autoregressive modeling framework.

    PubMed

    Galka, Andreas; Siniatchkin, Michael; Stephani, Ulrich; Groening, Kristina; Wolff, Stephan; Bosch-Bayard, Jorge; Ozaki, Tohru

    2010-12-01

    The analysis of time series obtained by functional magnetic resonance imaging (fMRI) may be approached by fitting predictive parametric models, such as nearest-neighbor autoregressive models with exogenous input (NNARX). As a part of the modeling procedure, it is possible to apply instantaneous linear transformations to the data. Spatial smoothing, a common preprocessing step, may be interpreted as such a transformation. The autoregressive parameters may be constrained, such that they provide a response behavior that corresponds to the canonical haemodynamic response function (HRF). We present an algorithm for estimating the parameters of the linear transformations and of the HRF within a rigorous maximum-likelihood framework. Using this approach, an optimal amount of both the spatial smoothing and the HRF can be estimated simultaneously for a given fMRI data set. An example from a motor-task experiment is discussed. It is found that, for this data set, weak, but non-zero, spatial smoothing is optimal. Furthermore, it is demonstrated that activated regions can be estimated within the maximum-likelihood framework.
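
    To make the notion of a canonical HRF concrete, the sketch below evaluates a commonly used double-gamma shape: a positive gamma-density peak minus a scaled, later gamma-density undershoot. The shape parameters (6 and 16) and the undershoot ratio (1/6) are widely used defaults and are assumptions here; the abstract does not state which parameterization the authors constrain their autoregressive model to.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (math.gamma(shape) * scale ** shape))


def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (seconds); parameters are assumed defaults."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


# Sample the response every 2.5 s over 30 s.
for step in range(13):
    t = 2.5 * step
    print(f"t = {t:4.1f} s  h(t) = {canonical_hrf(t):+.4f}")
```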

  17. Discriminant analysis of resting-state functional connectivity patterns on the Grassmann manifold

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Liu, Yong; Jiang, Tianzi; Liu, Zhening; Hao, Yihui; Liu, Haihong

    2010-03-01

    The functional networks, extracted from fMRI images using independent component analysis, have been demonstrated informative for distinguishing brain states of cognitive functions and neurological diseases. In this paper, we propose a novel algorithm for discriminant analysis of functional networks encoded by spatial independent components. The functional networks of each individual are used as bases for a linear subspace, referred to as a functional connectivity pattern, which facilitates a comprehensive characterization of temporal signals of fMRI data. The functional connectivity patterns of different individuals are analyzed on the Grassmann manifold by adopting a principal angle based subspace distance. In conjunction with a support vector machine classifier, a forward component selection technique is proposed to select independent components for constructing the most discriminative functional connectivity pattern. The discriminant analysis method has been applied to an fMRI based schizophrenia study with 31 schizophrenia patients and 31 healthy individuals. The experimental results demonstrate that the proposed method not only achieves a promising classification performance for distinguishing schizophrenia patients from healthy controls, but also identifies discriminative functional networks that are informative for schizophrenia diagnosis.

  18. Multistability of the Brain Network for Self-other Processing

    PubMed Central

    Chen, Yi-An; Huang, Tsung-Ren

    2017-01-01

    Early fMRI studies suggested that brain areas processing self-related and other-related information were highly overlapping. Hypothesising functional localisation of the cortex, researchers have tried to locate “self-specific” and “other-specific” regions within these overlapping areas by subtracting suspected confounding signals in task-based fMRI experiments. Inspired by recent advances in whole-brain dynamic modelling, we instead explored an alternative hypothesis that similar spatial activation patterns could be associated with different processing modes in the form of different synchronisation patterns. Combining an automated synthesis of fMRI data with a presumption-free diffusion spectrum image (DSI) fibre-tracking algorithm, we isolated a network putatively composed of brain areas and white matter tracts involved in self-other processing. We sampled synchronisation patterns from the dynamical systems of this network using various combinations of physiological parameters. Our results showed that the self-other processing network, with simulated gamma-band activity, tended to stabilise at a number of distinct synchronisation patterns. This phenomenon, termed “multistability,” could serve as an alternative model in theorising the mechanism of processing self-other information. PMID:28256520

  19. A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.

    PubMed

    Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong

    2017-01-01

    To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. We adopted support vector machines (SVM) as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
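
    The first stage described above, screening genes with the K-S test, can be sketched as follows: for each gene, compute the two-sample K-S statistic (the largest gap between the empirical distribution functions of the two classes) and rank genes by it. The gene names and expression values are hypothetical, and the CFS stage and the SVM evaluation are not reproduced.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        # Empirical CDFs evaluated at x (fraction of values <= x).
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical expression values per gene: (class 0 samples, class 1 samples).
expression = {
    "gene_1": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
    "gene_2": ([1.0, 5.0, 9.0], [2.0, 6.0, 10.0]),
}
scores = {gene: ks_statistic(c0, c1) for gene, (c0, c1) in expression.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

    Genes whose class distributions barely differ receive a small statistic and would be dropped before the second selection stage.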

  20. Effect of a Noise-Optimized Second-Generation Monoenergetic Algorithm on Image Noise and Conspicuity of Hypervascular Liver Tumors: An In Vitro and In Vivo Study.

    PubMed

    Marin, Daniele; Ramirez-Giraldo, Juan Carlos; Gupta, Sonia; Fu, Wanyi; Stinnett, Sandra S; Mileto, Achille; Bellini, Davide; Patel, Bhavik; Samei, Ehsan; Nelson, Rendon C

    2016-06-01

    The purpose of this study is to investigate whether the reduction in noise using a second-generation monoenergetic algorithm can improve the conspicuity of hypervascular liver tumors on dual-energy CT (DECT) images of the liver. An anthropomorphic liver phantom in three body sizes and iodine-containing inserts simulating hypervascular lesions was imaged with DECT and single-energy CT at various energy levels (80-140 kV). In addition, a retrospective clinical study was performed in 31 patients with 66 hypervascular liver tumors who underwent DECT during the late hepatic arterial phase. Datasets at energy levels ranging from 40 to 80 keV were reconstructed using first- and second-generation monoenergetic algorithms. Noise, tumor-to-liver contrast-to-noise ratio (CNR), and CNR with a noise constraint (CNRNC) set with a maximum noise increase of 50% were calculated and compared among the different reconstructed datasets. The maximum CNR for the second-generation monoenergetic algorithm, which was attained at 40 keV in both phantom and clinical datasets, was statistically significantly higher than the maximum CNR for the first-generation monoenergetic algorithm (p < 0.001) or single-energy CT acquisitions across a wide range of kilovoltage values. With the second-generation monoenergetic algorithm, the optimal CNRNC occurred at 55 keV, corresponding to lower energy levels compared with first-generation algorithm (predominantly at 70 keV). Patient body size did not substantially affect the selection of the optimal energy level to attain maximal CNR and CNRNC using the second-generation monoenergetic algorithm. A noise-optimized second-generation monoenergetic algorithm significantly improves the conspicuity of hypervascular liver tumors.
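
    The abstract does not give the exact CNR formula. A common definition, assumed here, divides the difference between mean tumor and mean liver attenuation by the image noise, with noise taken as the standard deviation of the liver region. The ROI values below are hypothetical.

```python
import statistics


def contrast_to_noise_ratio(tumor_roi, liver_roi):
    """CNR = (mean tumor - mean liver) / noise, with noise = SD of liver ROI.

    This definition is an assumption; studies differ in how noise is measured.
    """
    noise = statistics.stdev(liver_roi)
    return (statistics.mean(tumor_roi) - statistics.mean(liver_roi)) / noise


# Hypothetical attenuation values in Hounsfield units.
cnr = contrast_to_noise_ratio(tumor_roi=[150, 160, 170],
                              liver_roi=[90, 100, 110])
print(cnr)
```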

  1. Joint demosaicking and zooming using moderate spectral correlation and consistent edge map

    NASA Astrophysics Data System (ADS)

    Zhou, Dengwen; Dong, Weiming; Chen, Wengang

    2014-07-01

    The recently published joint demosaicking and zooming algorithms for single-sensor digital cameras all overfit the popular Kodak test images, which have been found to have higher spectral correlation than typical color images. Their performance perhaps significantly degrades on other datasets, such as the McMaster test images, which have weak spectral correlation. A new joint demosaicking and zooming algorithm is proposed for the Bayer color filter array (CFA) pattern, in which the edge direction information (edge map) extracted from the raw CFA data is consistently used in demosaicking and zooming. It also moderately utilizes the spectral correlation between color planes. The experimental results confirm that the proposed algorithm produces an excellent performance on both the Kodak and McMaster datasets in terms of both subjective and objective measures. Our algorithm also has high computational efficiency. It provides a better tradeoff among adaptability, performance, and computational cost compared to the existing algorithms.

  2. Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.

    PubMed

    Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai

    2016-03-01

    Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms that are based on tensor factorization.

  3. A multi-band semi-analytical algorithm for estimating chlorophyll-a concentration in the Yellow River Estuary, China.

    PubMed

    Chen, Jun; Quan, Wenting; Cui, Tingwei

    2015-01-01

    In this study, two sample semi-analytical algorithms and one new unified multi-band semi-analytical algorithm (UMSA) for estimating chlorophyll-a (Chla) concentration were constructed by specifying optimal wavelengths. The three sample semi-analytical algorithms, including the three-band semi-analytical algorithm (TSA), four-band semi-analytical algorithm (FSA), and UMSA algorithm, were calibrated and validated by the dataset collected in the Yellow River Estuary between September 1 and 10, 2009. A comparison of the accuracy of the TSA, FSA, and UMSA algorithms showed that the UMSA algorithm outperformed the other two. Using the UMSA algorithm to retrieve Chla concentration in the Yellow River Estuary decreased the NRMSE (normalized root mean square error) by 25.54% compared with the FSA algorithm and by 29.66% compared with the TSA algorithm. These are very significant improvements upon previous methods. Additionally, the study revealed that the TSA and FSA algorithms are merely more specific forms of the UMSA algorithm. Owing to the special form of the UMSA algorithm, if the same bands were used for both the TSA and UMSA algorithms or FSA and UMSA algorithms, the UMSA algorithm would theoretically produce superior results in comparison with the TSA and FSA algorithms. Thus, good results may also be produced if the UMSA algorithm were to be applied for predicting Chla concentration for datasets of Gitelson et al. (2008) and Le et al. (2009).
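
    The NRMSE used to compare the three algorithms can be computed as in the sketch below. Normalizing the RMSE by the mean of the observations is an assumption; other conventions (for example dividing by the observed range) exist, and the abstract does not say which was used. The sample values are hypothetical.

```python
import math


def nrmse(observed, predicted):
    """Root mean square error divided by the mean observed value."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return rmse / (sum(observed) / len(observed))


# Hypothetical measured and retrieved Chla concentrations.
measured = [2.0, 4.0, 6.0]
retrieved = [3.0, 4.0, 5.0]
score = nrmse(measured, retrieved)
print(f"NRMSE = {score:.4f}")
```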

  4. Validity of Five Satellite-Based Latent Heat Flux Algorithms for Semi-arid Ecosystems

    DOE PAGES

    Feng, Fei; Chen, Jiquan; Li, Xianglan; ...

    2015-12-09

    Accurate estimation of latent heat flux (LE) is critical in characterizing semiarid ecosystems. Many LE algorithms have been developed during the past few decades. However, the algorithms have not been directly compared, particularly over global semiarid ecosystems. In this paper, we evaluated the performance of five LE models over semiarid ecosystems such as grassland, shrub, and savanna using the Fluxnet dataset of 68 eddy covariance (EC) sites during the period 2000–2009. We also used a modern-era retrospective analysis for research and applications (MERRA) dataset, the Normalized Difference Vegetation Index (NDVI) and Fractional Photosynthetically Active Radiation (FPAR) from the moderate resolution imaging spectroradiometer (MODIS) products; the leaf area index (LAI) from the global land surface satellite (GLASS) products; and the digital elevation model (DEM) from shuttle radar topography mission (SRTM30) dataset to generate LE at region scale during the period 2003–2006. The models were the moderate resolution imaging spectroradiometer LE (MOD16) algorithm, revised remote sensing based Penman–Monteith LE algorithm (RRS), the Priestley–Taylor LE algorithm of the Jet Propulsion Laboratory (PT-JPL), the modified satellite-based Priestley–Taylor LE algorithm (MS-PT), and the semi-empirical Penman LE algorithm (UMD). Direct comparison with ground measured LE showed the PT-JPL and MS-PT algorithms had relative high performance over semiarid ecosystems with the coefficient of determination (R²) ranging from 0.6 to 0.8 and root mean squared error (RMSE) of approximately 20 W/m². Empirical parameters in the structure algorithms of MOD16 and RRS, and calibrated coefficients of the UMD algorithm may be the cause of the reduced performance of these LE algorithms with R² ranging from 0.5 to 0.7 and RMSE ranging from 20 to 35 W/m² for MOD16, RRS and UMD. Sensitivity analysis showed that radiation and vegetation terms were the dominating variables affecting LE Fluxes in global semiarid ecosystem.
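
    Two of the five models build on the Priestley–Taylor equation, LE = α · Δ / (Δ + γ) · (Rn − G). The sketch below evaluates only this basic potential form; it is not PT-JPL or MS-PT, which add further constraint terms. The coefficient α = 1.26, the psychrometric constant γ ≈ 0.066 kPa/°C, the FAO-56 style expression for Δ, and the input values are all assumptions made for illustration.

```python
import math


def slope_saturation_vapor_pressure(temp_c):
    """Slope of the saturation vapor pressure curve (kPa per deg C)."""
    es = 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))  # kPa
    return 4098.0 * es / (temp_c + 237.3) ** 2


def priestley_taylor_le(net_radiation, ground_heat_flux, temp_c,
                        alpha=1.26, gamma=0.066):
    """Potential latent heat flux (W/m^2) from the basic Priestley-Taylor form.

    net_radiation and ground_heat_flux are in W/m^2; gamma is in kPa per deg C.
    """
    delta = slope_saturation_vapor_pressure(temp_c)
    return alpha * delta / (delta + gamma) * (net_radiation - ground_heat_flux)


# Hypothetical midday conditions.
delta_20 = slope_saturation_vapor_pressure(20.0)
le = priestley_taylor_le(net_radiation=400.0, ground_heat_flux=50.0,
                         temp_c=20.0)
print(f"delta = {delta_20:.4f} kPa/degC, LE = {le:.1f} W/m^2")
```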

  5. Validity of Five Satellite-Based Latent Heat Flux Algorithms for Semi-arid Ecosystems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, Fei; Chen, Jiquan; Li, Xianglan

    Accurate estimation of latent heat flux (LE) is critical in characterizing semiarid ecosystems. Many LE algorithms have been developed during the past few decades. However, the algorithms have not been directly compared, particularly over global semiarid ecosystems. In this paper, we evaluated the performance of five LE models over semiarid ecosystems such as grassland, shrub, and savanna using the Fluxnet dataset of 68 eddy covariance (EC) sites during the period 2000–2009. We also used a modern-era retrospective analysis for research and applications (MERRA) dataset, the Normalized Difference Vegetation Index (NDVI) and Fractional Photosynthetically Active Radiation (FPAR) from the moderate resolution imaging spectroradiometer (MODIS) products; the leaf area index (LAI) from the global land surface satellite (GLASS) products; and the digital elevation model (DEM) from shuttle radar topography mission (SRTM30) dataset to generate LE at region scale during the period 2003–2006. The models were the moderate resolution imaging spectroradiometer LE (MOD16) algorithm, revised remote sensing based Penman–Monteith LE algorithm (RRS), the Priestley–Taylor LE algorithm of the Jet Propulsion Laboratory (PT-JPL), the modified satellite-based Priestley–Taylor LE algorithm (MS-PT), and the semi-empirical Penman LE algorithm (UMD). Direct comparison with ground measured LE showed the PT-JPL and MS-PT algorithms had relative high performance over semiarid ecosystems with the coefficient of determination (R²) ranging from 0.6 to 0.8 and root mean squared error (RMSE) of approximately 20 W/m². Empirical parameters in the structure algorithms of MOD16 and RRS, and calibrated coefficients of the UMD algorithm may be the cause of the reduced performance of these LE algorithms with R² ranging from 0.5 to 0.7 and RMSE ranging from 20 to 35 W/m² for MOD16, RRS and UMD. Sensitivity analysis showed that radiation and vegetation terms were the dominating variables affecting LE Fluxes in global semiarid ecosystem.

  6. A new collaborative recommendation approach based on users clustering using artificial bee colony algorithm.

    PubMed

    Ju, Chunhua; Xu, Chonghuan

    2013-01-01

    Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods.
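
    The abstract does not define its "modified cosine similarity". One common reading, assumed here, is the mean-centered (adjusted) cosine: each user's ratings are shifted by that user's average before the cosine is taken over co-rated items. The user names and ratings are hypothetical, and the K-means and artificial bee colony clustering stages are not reproduced.

```python
import math


def adjusted_cosine(ratings_u, ratings_v):
    """Mean-centered cosine similarity between two users' rating dicts."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v)
              for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0.0 or den_v == 0.0:
        return 0.0  # a user with constant ratings carries no preference signal
    return num / (den_u * den_v)


# Hypothetical ratings of three users in the same cluster.
alice = {"film_a": 5, "film_b": 3, "film_c": 1}
bob = {"film_a": 4, "film_b": 3, "film_c": 2}
carol = {"film_a": 1, "film_b": 3, "film_c": 5}
sim_alice_bob = adjusted_cosine(alice, bob)
sim_alice_carol = adjusted_cosine(alice, carol)
print(sim_alice_bob, sim_alice_carol)
```

    Centering removes the effect of users who rate everything high or low, so Alice and Bob count as similar even though their raw scores differ.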

  7. A New Collaborative Recommendation Approach Based on Users Clustering Using Artificial Bee Colony Algorithm

    PubMed Central

    Ju, Chunhua

    2013-01-01

    Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525

  8. A Convex Formulation for Learning a Shared Predictive Structure from Multiple Tasks

    PubMed Central

    Chen, Jianhui; Tang, Lei; Liu, Jun; Ye, Jieping

    2013-01-01

    In this paper, we consider the problem of learning from multiple related tasks for improved generalization performance by extracting their shared structures. The alternating structure optimization (ASO) algorithm, which couples all tasks using a shared feature representation, has been successfully applied in various multitask learning problems. However, ASO is nonconvex and the alternating algorithm only finds a local solution. We first present an improved ASO formulation (iASO) for multitask learning based on a new regularizer. We then convert iASO, a nonconvex formulation, into a relaxed convex one (rASO). Interestingly, our theoretical analysis reveals that rASO finds a globally optimal solution to its nonconvex counterpart iASO under certain conditions. rASO can be equivalently reformulated as a semidefinite program (SDP), which is, however, not scalable to large datasets. We propose to employ the block coordinate descent (BCD) method and the accelerated projected gradient (APG) algorithm separately to find the globally optimal solution to rASO; we also develop efficient algorithms for solving the key subproblems involved in BCD and APG. The experiments on the Yahoo webpages datasets and the Drosophila gene expression pattern images datasets demonstrate the effectiveness and efficiency of the proposed algorithms and confirm our theoretical analysis. PMID:23520249

  9. An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier.

    PubMed

    Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi

    2017-03-15

    Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is through template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining both solutions to improve the prediction accuracy has not been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing them with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolution information. Availability: http://yanglab.nankai.edu.cn/TA-fold/. Contact: yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online.

  10. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, gene regulatory network is important to both biological research study and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets as well as to take the robustness to large error or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm which combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.
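
    Two generic ingredients behind a method of this kind can be sketched independently of the authors' solver: the Huber loss, which grows linearly rather than quadratically for large residuals and so limits the influence of outliers, and the group soft-thresholding operator, which shrinks a whole group of coefficients toward zero together. Function names, the threshold `delta`, and the sample values are assumptions; the auxiliary function minimization and block descent algorithm itself is not reproduced.

```python
import math


def huber_loss(residual, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def group_soft_threshold(group, threshold):
    """Proximal operator of the group LASSO penalty for one group.

    The whole group is set to zero when its Euclidean norm is below the
    threshold; otherwise every coefficient is shrunk by the same factor.
    """
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]


# Hypothetical values: one small and one outlying residual, and the
# coefficients of one candidate edge across two datasets.
print(huber_loss(0.5), huber_loss(3.0))
print(group_soft_threshold([3.0, 4.0], 1.0))
print(group_soft_threshold([0.3, 0.4], 1.0))
```

    Grouping the coefficients of the same edge across datasets is what lets a single topology be selected for all time courses at once.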

  11. Predicting drug-target interactions by dual-network integrated logistic matrix factorization

    NASA Astrophysics Data System (ADS)

    Hao, Ming; Bryant, Stephen H.; Wang, Yanli

    2017-01-01

    In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing profile kernel matrix; (2) diffusing drug profile kernel matrix with drug structure kernel matrix; (3) diffusing target profile kernel matrix with target sequence kernel matrix; and (4) building DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method based on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms the previously reported approaches in terms of AUPR (area under precision-recall curve) and AUC (area under curve of receiver operating characteristic) based on the 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends on not only the proposed objective function, but also the used nonlinear diffusion technique which is important but under studied in the DTI prediction field. In addition, we also compile a new DTI dataset for increasing the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.
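
    At the core of logistic matrix factorization, the probability that a drug and a target interact is modeled as the logistic function of the inner product of their latent vectors. The sketch below shows only that scoring step; the latent vectors are hypothetical, and the kernel diffusion and neighbor smoothing steps listed in the abstract are not reproduced.

```python
import math


def interaction_probability(drug_vec, target_vec):
    """Logistic of the inner product of two latent factor vectors."""
    score = sum(d * t for d, t in zip(drug_vec, target_vec))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical two-dimensional latent factors.
p_neutral = interaction_probability([0.0, 0.0], [1.0, 1.0])
p_likely = interaction_probability([1.0, 2.0], [0.5, 0.25])
print(p_neutral, p_likely)
```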

  12. Cloud Properties and Radiative Heating Rates for TWP

    DOE Data Explorer

    Comstock, Jennifer

    2013-11-07

    A cloud properties and radiative heating rates dataset is presented where cloud properties retrieved using lidar and radar observations are input into a radiative transfer model to compute radiative fluxes and heating rates at three ARM sites located in the Tropical Western Pacific (TWP) region. The cloud properties retrieval is a conditional retrieval that applies various retrieval techniques depending on the available data, that is if lidar, radar or both instruments detect cloud. This Combined Remote Sensor Retrieval Algorithm (CombRet) produces vertical profiles of liquid or ice water content (LWC or IWC), droplet effective radius (re), ice crystal generalized effective size (Dge), cloud phase, and cloud boundaries. The algorithm was compared with 3 other independent algorithms to help estimate the uncertainty in the cloud properties, fluxes, and heating rates (Comstock et al. 2013). The dataset is provided at 2 min temporal and 90 m vertical resolution. The current dataset is applied to time periods when the MMCR (Millimeter Cloud Radar) version of the ARSCL (Active Remotely-Sensed Cloud Locations) Value Added Product (VAP) is available. The MERGESONDE VAP is utilized where temperature and humidity profiles are required. Future additions to this dataset will utilize the new KAZR instrument and its associated VAPs.

  13. Pushing spatial and temporal resolution for functional and diffusion MRI in the Human Connectome Project

    PubMed Central

    Uğurbil, Kamil; Xu, Junqian; Auerbach, Edward J.; Moeller, Steen; Vu, An; Duarte-Carvajalino, Julio M.; Lenglet, Christophe; Wu, Xiaoping; Schmitter, Sebastian; Van de Moortele, Pierre Francois; Strupp, John; Sapiro, Guillermo; De Martino, Federico; Wang, Dingxin; Harel, Noam; Garwood, Michael; Chen, Liyong; Feinberg, David A.; Smith, Stephen M.; Miller, Karla L.; Sotiropoulos, Stamatios N; Jbabdi, Saad; Andersson, Jesper L; Behrens, Timothy EJ; Glasser, Matthew F.; Van Essen, David; Yacoub, Essa

    2013-01-01

    The human connectome project (HCP) relies primarily on three complementary magnetic resonance (MR) methods. These are: 1) resting state functional MR imaging (rfMRI) which uses correlations in the temporal fluctuations in an fMRI time series to deduce ‘functional connectivity’; 2) diffusion imaging (dMRI), which provides the input for tractography algorithms used for the reconstruction of the complex axonal fiber architecture; and 3) task based fMRI (tfMRI), which is employed to identify functional parcellation in the human brain in order to assist analyses of data obtained with the first two methods. We describe technical improvements and optimization of these methods as well as instrumental choices that impact speed of acquisition of fMRI and dMRI images at 3 Tesla, leading to whole brain coverage with 2 mm isotropic resolution in 0.7 second for fMRI, and 1.25 mm isotropic resolution dMRI data for tractography analysis with three-fold reduction in total data acquisition time. Ongoing technical developments and optimization for acquisition of similar data at 7 Tesla magnetic field are also presented, targeting higher resolution, specificity of functional imaging signals, mitigation of the inhomogeneous radio frequency (RF) fields and power deposition. Results demonstrate that overall, these approaches represent a significant advance in MR imaging of the human brain to investigate brain function and structure. PMID:23702417

  14. Impacts of simultaneous multislice acquisition on sensitivity and specificity in fMRI.

    PubMed

    Risk, Benjamin B; Kociuba, Mary C; Rowe, Daniel B

    2018-05-15

    Simultaneous multislice (SMS) imaging can be used to decrease the time between acquisition of fMRI volumes, which can increase sensitivity by facilitating the removal of higher-frequency artifacts and boosting effective sample size. The technique requires an additional processing step in which the slices are separated, or unaliased, to recover the whole brain volume. However, this may result in signal "leakage" between aliased locations, i.e., slice "leakage," and lead to spurious activation (decreased specificity). SMS can also lead to noise amplification, which can reduce the benefits of decreased repetition time. In this study, we evaluate the original slice-GRAPPA (no leak block) reconstruction algorithm and acceleration factor (AF = 8) used in the fMRI data in the young adult Human Connectome Project (HCP). We also evaluate split slice-GRAPPA (leak block), which can reduce slice leakage. We use simulations to disentangle higher test statistics into true positives (sensitivity) and false positives (decreased specificity). Slice leakage was greatly decreased by split slice-GRAPPA. Noise amplification was decreased by using moderate acceleration factors (AF = 4). We examined slice leakage in unprocessed fMRI motor task data from the HCP. When data were smoothed, we found evidence of slice leakage in some, but not all, subjects. We also found evidence of SMS noise amplification in unprocessed task and processed resting-state HCP data. Copyright © 2018 Elsevier Inc. All rights reserved.

  15. Regional-scale calculation of the LS factor using parallel processing

    NASA Astrophysics Data System (ADS)

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on the Message Passing Interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. Depending on whether data dependence exists, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of each algorithm, including a decomposition method for maintaining the integrity of the results, an optimized workflow for reducing the time spent exporting unnecessary intermediate data, and a buffer-communication-computation strategy for improving communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.

  16. Near-lossless multichannel EEG compression based on matrix and tensor decompositions.

    PubMed

    Dauwels, Justin; Srinivasan, K; Reddy, M Ramasubba; Cichocki, Andrzej

    2013-05-01

    A novel near-lossless compression algorithm for multichannel electroencephalogram (MC-EEG) is proposed based on matrix/tensor decomposition models. MC-EEG is represented in suitable multiway (multidimensional) forms to efficiently exploit temporal and spatial correlations simultaneously. Several matrix/tensor decomposition models are analyzed in view of efficient decorrelation of the multiway forms of MC-EEG. A compression algorithm is built based on the principle of “lossy plus residual coding,” consisting of a matrix/tensor decomposition-based coder in the lossy layer followed by arithmetic coding in the residual layer. This approach guarantees a specifiable maximum absolute error between original and reconstructed signals. The compression algorithm is applied to three different scalp EEG datasets and an intracranial EEG dataset, each with different sampling rate and resolution. The proposed algorithm achieves attractive compression ratios compared to compressing individual channels separately. For similar compression ratios, the proposed algorithm achieves nearly fivefold lower average error compared to a similar wavelet-based volumetric MC-EEG compression algorithm.
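    The "lossy plus residual coding" principle described above can be sketched for integer samples as follows. This is a minimal illustration under stated assumptions, not the authors' codec: the block-mean approximation is a hypothetical stand-in for the matrix/tensor decomposition coder, the arithmetic coding of the residual indices is omitted, and all function names are invented for the example. What it does show is how uniform quantization of the residual with step 2*delta + 1 bounds the maximum absolute reconstruction error by delta.

```python
def lossy_approx(signal, block=4):
    """Stand-in lossy layer: replace each block of samples by its rounded mean."""
    approx = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = round(sum(chunk) / len(chunk))
        approx.extend([mean] * len(chunk))
    return approx


def quantize_residual(residual, delta):
    """Map each integer residual to a bin index; bins are 2*delta + 1 wide."""
    step = 2 * delta + 1
    return [(r + delta) // step for r in residual]


def dequantize_residual(indices, delta):
    """Reconstruct each residual as the centre of its bin."""
    step = 2 * delta + 1
    return [q * step for q in indices]


def encode(signal, delta, block=4):
    """Return the lossy approximation and the quantized residual indices."""
    approx = lossy_approx(signal, block)
    residual = [s - a for s, a in zip(signal, approx)]
    return approx, quantize_residual(residual, delta)


def decode(approx, indices, delta):
    """Add the dequantized residual back onto the lossy approximation."""
    restored = dequantize_residual(indices, delta)
    return [a + r for a, r in zip(approx, restored)]


if __name__ == "__main__":
    samples = [10, 12, 9, 40, -3, 7, 8, 100, 55]
    delta = 2
    approx, indices = encode(samples, delta)
    rebuilt = decode(approx, indices, delta)
    print(max(abs(s - r) for s, r in zip(samples, rebuilt)))
```

    Setting delta to 0 makes the step equal to 1, so the residual is stored exactly and the scheme becomes lossless; larger values of delta trade a bounded error for smaller residual indices that an entropy coder can compress more effectively.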

  17. Automatic detection of lift-off and touch-down of a pick-up walker using 3D kinematics.

    PubMed

    Grootveld, L; Thies, S B; Ogden, D; Howard, D; Kenney, L P J

    2014-02-01

    Walking aids have been associated with falls and it is believed that incorrect use limits their usefulness. Measures are therefore needed that characterize their stable use and the classification of key events in walking aid movement is the first step in their development. This study presents an automated algorithm for detection of lift-off (LO) and touch-down (TD) events of a pick-up walker. For algorithm design and initial testing, a single user performed trials for which the four individual walker feet lifted off the ground and touched down again in various sequences, and for different amounts of frame loading (Dataset_1). For further validation, ten healthy young subjects walked with the pick-up walker on flat ground (Dataset_2a) and on a narrow beam (Dataset_2b), to challenge balance. One 88-year-old walking frame user was also assessed. Kinematic data were collected with a 3D optoelectronic camera system. The algorithm detected over 93% of events (Dataset_1), and 95% and 92% in Dataset_2a and b, respectively. Of the various LO/TD sequences, those associated with natural progression resulted in up to 100% correctly identified events. For the 88-year-old walking frame user, 96% of LO events and 93% of TD events were detected, demonstrating the potential of the approach. Copyright © 2013 IPEM. Published by Elsevier Ltd. All rights reserved.

  18. Boosting association rule mining in large datasets via Gibbs sampling.

    PubMed

    Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua

    2016-05-03

    Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also a general rule importance measure is proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm.

  19. A modified active appearance model based on an adaptive artificial bee colony.

    PubMed

    Abdulameer, Mohammed Hasan; Sheikh Abdullah, Siti Norul Huda; Othman, Zulaiha Ali

    2014-01-01

    The active appearance model (AAM) is one of the most popular model-based approaches and has been used extensively to extract features by highly accurate modeling of human faces under various physical and environmental circumstances. However, fitting such a model to the original image is a challenging task. The state of the art shows that optimization methods can be applied to resolve this problem; however, applying the optimization is itself a common difficulty. Hence, in this paper we propose an AAM-based face recognition technique that is capable of resolving the fitting problem of AAM by introducing a new adaptive ABC algorithm. The adaptation increases the efficiency of fitting compared with the conventional ABC algorithm. We used three datasets in our experiments: the CASIA dataset, property 2.5D face dataset, and the UBIRIS v1 images dataset. The results reveal that the proposed face recognition technique performs effectively in terms of face recognition accuracy.

  20. Long Term Cloud Property Datasets From MODIS and AVHRR Using the CERES Cloud Algorithm

    NASA Technical Reports Server (NTRS)

    Minnis, Patrick; Bedka, Kristopher M.; Doelling, David R.; Sun-Mack, Sunny; Yost, Christopher R.; Trepte, Qing Z.; Bedka, Sarah T.; Palikonda, Rabindra; Scarino, Benjamin R.; Chen, Yan

    2015-01-01

    Cloud properties play a critical role in climate change. Monitoring cloud properties over long time periods is needed to detect changes and to validate and constrain models. The Clouds and the Earth's Radiant Energy System (CERES) project has developed several cloud datasets from Aqua and Terra MODIS data to better interpret broadband radiation measurements and improve understanding of the role of clouds in the radiation budget. The algorithms applied to MODIS data have been adapted to utilize various combinations of channels on the Advanced Very High Resolution Radiometer (AVHRR) on the long-term time series of NOAA and MetOp satellites to provide a new cloud climate data record. These datasets can be useful for a variety of studies. This paper presents results of the MODIS and AVHRR analyses covering the period from 1980-2014. Validation and comparisons with other datasets are also given.

  1. SAR image dataset of military ground targets with multiple poses for ATR

    NASA Astrophysics Data System (ADS)

    Belloni, Carole; Balleri, Alessio; Aouf, Nabil; Merlet, Thomas; Le Caillec, Jean-Marc

    2017-10-01

    Automatic Target Recognition (ATR) is the task of automatically detecting and classifying targets. Recognition using Synthetic Aperture Radar (SAR) images is interesting because SAR images can be acquired at night and under any weather conditions, whereas optical sensors operating in the visible band do not have this capability. Existing SAR ATR algorithms have mostly been evaluated using the MSTAR dataset.1 The problem with the MSTAR is that some of the proposed ATR methods have shown good classification performance even when targets were hidden,2 suggesting the presence of a bias in the dataset. Evaluations of SAR ATR techniques are currently challenging due to the lack of publicly available data in the SAR domain. In this paper, we present a high resolution SAR dataset consisting of images of a set of ground military target models taken at various aspect angles. The dataset can be used for a fair evaluation and comparison of SAR ATR algorithms. We applied the Inverse Synthetic Aperture Radar (ISAR) technique to echoes from targets rotating on a turntable and illuminated with a stepped frequency waveform. The targets in the database consist of four variants of two 1.7m-long models of T-64 and T-72 tanks. The gun, the turret position and the depression angle are varied to form 26 different sequences of images. The emitted signal spanned the frequency range from 13 GHz to 18 GHz to achieve a bandwidth of 5 GHz sampled with 4001 frequency points. The resolution obtained with respect to the size of the model targets is comparable to typical values obtained using SAR airborne systems. Single polarized images (Horizontal-Horizontal) are generated using the backprojection algorithm.3 A total of 1480 images are produced using a 20° integration angle. The images in the dataset are organized in a suggested training and testing set to facilitate a standard evaluation of SAR ATR algorithms.
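    The waveform figures quoted above (5 GHz of bandwidth sampled with 4001 frequency points) can be turned into approximate range figures using the standard stepped-frequency radar relations. The sketch below is a worked calculation only; it assumes free-space propagation, uniformly spaced frequency steps with both band edges included, and function names chosen for illustration. None of the derived numbers are stated in the abstract itself.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, free space


def range_resolution(bandwidth_hz):
    """Slant-range resolution of a waveform with the given total bandwidth."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


def frequency_step(bandwidth_hz, n_points):
    """Spacing between adjacent frequencies, assuming uniform steps."""
    return bandwidth_hz / (n_points - 1)


def unambiguous_range(step_hz):
    """Largest range extent that does not alias for a given frequency step."""
    return SPEED_OF_LIGHT / (2.0 * step_hz)


if __name__ == "__main__":
    bandwidth = 18e9 - 13e9          # 13 GHz to 18 GHz, from the abstract
    step = frequency_step(bandwidth, 4001)
    print(f"range resolution: {range_resolution(bandwidth) * 100:.2f} cm")
    print(f"frequency step: {step / 1e6:.2f} MHz")
    print(f"unambiguous range: {unambiguous_range(step):.1f} m")
```

    Under these assumptions the resolution is about 3 cm, which is small relative to a 1.7 m target model and is consistent with the abstract's remark that the resolution, scaled to the model size, is comparable to airborne SAR systems.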

  2. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. 
    These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  3. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Canadian Astronomy Data Centre

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  4. Can the usage of human growth hormones affect facial appearance and the accuracy of face recognition systems?

    NASA Astrophysics Data System (ADS)

    Rose, Jake; Martin, Michael; Bourlai, Thirimachos

    2014-06-01

    In law enforcement and security applications, the acquisition of face images is critical in producing key trace evidence for the successful identification of potential threats. The goal of the study is to demonstrate that steroid usage significantly affects human facial appearance and hence, the performance of commercial and academic face recognition (FR) algorithms. In this work, we evaluate the performance of state-of-the-art FR algorithms on two unique face image datasets of subjects before (gallery set) and after (probe set) steroid (or human growth hormone) usage. For the purpose of this study, datasets of 73 subjects were created from multiple sources found on the Internet, containing images of men and women before and after steroid usage. Next, we geometrically pre-processed all images of both face datasets. Then, we applied image restoration techniques on the same face datasets, and finally, we applied FR algorithms in order to match the pre-processed face images of our probe datasets against the face images of the gallery set. Experimental results demonstrate that only a specific set of FR algorithms obtain the most accurate results (in terms of the rank-1 identification rate). This is because there are several factors that influence the efficiency of face matchers including (i) the time lapse between the before and after image pre-processing and restoration face photos, (ii) the usage of different drugs (e.g. Dianabol, Winstrol, and Decabolan), (iii) the usage of different cameras to capture face images, and finally, (iv) the variability of standoff distance, illumination and other noise factors (e.g. motion noise). All of the previously mentioned complicated scenarios make clear that cross-scenario matching is a very challenging problem and, thus, further investigation is required.

  5. Spatio-temporal models of mental processes from fMRI.

    PubMed

    Janoos, Firdaus; Machiraju, Raghu; Singh, Shantanu; Morocz, Istvan Ákos

    2011-07-15

    Understanding the highly complex, spatially distributed and temporally organized phenomena entailed by mental processes using functional MRI is an important research problem in cognitive and clinical neuroscience. Conventional analysis methods focus on the spatial dimension of the data discarding the information about brain function contained in the temporal dimension. This paper presents a fully spatio-temporal multivariate analysis method using a state-space model (SSM) for brain function that yields not only spatial maps of activity but also its temporal structure along with spatially varying estimates of the hemodynamic response. Efficient algorithms for estimating the parameters along with quantitative validations are given. A novel low-dimensional feature-space for representing the data, based on a formal definition of functional similarity, is derived. Quantitative validation of the model and the estimation algorithms is provided with a simulation study. Using a real fMRI study for mental arithmetic, the ability of this neurophysiologically inspired model to represent the spatio-temporal information corresponding to mental processes is demonstrated. Moreover, by comparing the models across multiple subjects, natural patterns in mental processes organized according to different mental abilities are revealed. Copyright © 2011 Elsevier Inc. All rights reserved.

  6. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

    PubMed

    Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C

    2017-12-01

    The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. 
    Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance, yet shallow learning algorithms with the word and concept representation achieve comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions.

  7. Linear Discriminant Analysis Achieves High Classification Accuracy for the BOLD fMRI Response to Naturalistic Movie Stimuli

    PubMed Central

    Mandelkow, Hendrik; de Zwart, Jacco A.; Duyn, Jeff H.

    2016-01-01

    Naturalistic stimuli like movies evoke complex perceptual processes, which are of great interest in the study of human cognition by functional MRI (fMRI). However, conventional fMRI analysis based on statistical parametric mapping (SPM) and the general linear model (GLM) is hampered by a lack of accurate parametric models of the BOLD response to complex stimuli. In this situation, statistical machine-learning methods, a.k.a. multivariate pattern analysis (MVPA), have received growing attention for their ability to generate stimulus response models in a data-driven fashion. However, machine-learning methods typically require large amounts of training data as well as computational resources. In the past, this has largely limited their application to fMRI experiments involving small sets of stimulus categories and small regions of interest in the brain. By contrast, the present study compares several classification algorithms known as Nearest Neighbor (NN), Gaussian Naïve Bayes (GNB), and (regularized) Linear Discriminant Analysis (LDA) in terms of their classification accuracy in discriminating the global fMRI response patterns evoked by a large number of naturalistic visual stimuli presented as a movie. Results show that LDA regularized by principal component analysis (PCA) achieved high classification accuracies, above 90% on average for single fMRI volumes acquired 2 s apart during a 300 s movie (chance level 0.7% = 2 s/300 s). The largest source of classification errors were autocorrelations in the BOLD signal compounded by the similarity of consecutive stimuli. All classifiers performed best when given input features from a large region of interest comprising around 25% of the voxels that responded significantly to the visual stimulus. Consistent with this, the most informative principal components represented widespread distributions of co-activated brain regions that were similar between subjects and may represent functional networks. 
    In light of these results, the combination of naturalistic movie stimuli and classification analysis in fMRI experiments may prove to be a sensitive tool for the assessment of changes in natural cognitive processes under experimental manipulation. PMID:27065832

  8. Challenges in measuring individual differences in functional connectivity using fMRI: The case of healthy aging

    PubMed Central

    Tsvetanov, Kamen A.; Cam‐CAN; Henson, Richard N.

    2017-01-01

    Many studies report individual differences in functional connectivity, such as those related to age. However, estimates of connectivity from fMRI are confounded by other factors, such as vascular health, head motion and changes in the location of functional regions. Here, we investigate the impact of these confounds, and pre‐processing strategies that can mitigate them, using data from the Cambridge Centre for Ageing & Neuroscience (www.cam-can.com). This dataset contained two sessions of resting‐state fMRI from 214 adults aged 18–88. Functional connectivity between all regions was strongly related to vascular health, most likely reflecting respiratory and cardiac signals. These variations in mean connectivity limit the validity of between‐participant comparisons of connectivity estimates, and were best mitigated by regression of mean connectivity over participants. We also showed that high‐pass filtering, instead of band‐pass filtering, produced stronger and more reliable age‐effects. Head motion was correlated with gray‐matter volume in selected brain regions, and with various cognitive measures, suggesting that it has a biological (trait) component, and warning against regressing out motion over participants. Finally, we showed that the location of functional regions was more variable in older adults, which was alleviated by smoothing the data, or using a multivariate measure of connectivity. These results demonstrate that analysis choices have a dramatic impact on connectivity differences between individuals, ultimately affecting the associations found between connectivity and cognition. It is important that fMRI connectivity studies address these issues, and we suggest a number of ways to optimize analysis choices. Hum Brain Mapp 38:4125–4156, 2017. © 2017 Wiley Periodicals, Inc. PMID:28544076

  9. Motor and non-motor circuitry activation induced by subthalamic nucleus deep brain stimulation (STN DBS) in Parkinson’s disease patients: Intraoperative fMRI for DBS

    PubMed Central

    Knight, Emily J.; Testini, Paola; Min, Hoon-Ki; Gibson, William S.; Gorny, Krzysztof R.; Favazza, Christopher P.; Felmlee, Joel P.; Kim, Inyong; Welker, Kirk M.; Clayton, Daniel A.; Klassen, Bryan T.; Chang, Su-youne; Lee, Kendall H.

    2015-01-01

    Objective: To test the hypothesis suggested by previous studies that subthalamic nucleus (STN) deep brain stimulation (DBS) in patients with PD would affect the activity of both motor and non-motor networks, we applied intraoperative fMRI to patients receiving DBS. Patients and Methods: Ten patients receiving STN DBS for PD underwent intraoperative 1.5T fMRI during high frequency stimulation delivered via an external pulse generator. The study was conducted between the dates of January 1, 2013 and September 30, 2014. Results: We observed blood oxygen level dependent (BOLD) signal changes (FDR<.001) in the motor circuitry, including primary motor, premotor, and supplementary motor cortices, thalamus, pedunculopontine nucleus (PPN), and cerebellum, as well as in the limbic circuitry, including cingulate and insular cortices. Activation of the motor network was observed also after applying a Bonferroni correction (p<.001) to our dataset, suggesting that, across subjects, BOLD changes in the motor circuitry are more consistent compared to those occurring in the non-motor network. Conclusions: These findings support the modulatory role of STN DBS on the activity of motor and non-motor networks, and suggest complex mechanisms at the basis of the efficacy of this treatment modality. Furthermore, these results suggest that, across subjects, BOLD changes in the motor circuitry are more consistent compared to those occurring in the non-motor network. With further studies combining the use of real time intraoperative fMRI with clinical outcomes in patients treated with DBS, functional imaging techniques have the potential not only to elucidate the mechanisms of DBS functioning, but also to guide and assist in the surgical treatment of patients affected by movement and neuropsychiatric disorders. PMID:26046412

  10. Effect of phase-encoding direction on group analysis of resting-state functional magnetic resonance imaging.

    PubMed

    Mori, Yasuo; Miyata, Jun; Isobe, Masanori; Son, Shuraku; Yoshihara, Yujiro; Aso, Toshihiko; Kouchiyama, Takanori; Murai, Toshiya; Takahashi, Hidehiko

    2018-05-17

    Echo-planar imaging is a common technique used in functional magnetic resonance imaging (fMRI); however, it suffers from image distortion and signal loss because of large susceptibility effects that are related to the phase-encoding direction of the scan. Despite this relationship, the majority of neuroimaging studies have not considered the influence of phase-encoding direction. Here, we aimed to clarify how phase-encoding direction can affect the outcome of an fMRI connectivity study of schizophrenia. Resting-state fMRI using anterior to posterior (A-P) and posterior to anterior (P-A) directions was used to examine 25 patients with schizophrenia (SC) and 37 matched healthy controls (HC). We conducted a functional connectivity analysis using independent component analysis and performed three group comparisons: A-P vs. P-A (all participants), SC vs. HC for the A-P and P-A datasets, and the interaction between phase-encoding direction and participant group. The estimated functional connectivity differed between the two phase-encoding directions in areas that were more extensive than those where signal loss has been reported. Although functional connectivity in the SC group was lower than that in the HC group for both directions, the A-P and P-A conditions did not exhibit the same specific pattern of differences. Further, we observed an interaction between participant group and the phase-encoding direction in the left temporo-parietal junction and left fusiform gyrus. Phase-encoding direction can influence the results of functional connectivity studies. Thus, appropriate selection and documentation of phase-encoding direction will be important in future resting-state fMRI studies. This article is protected by copyright. All rights reserved.

  11. Cortical processing of pitch: Model-based encoding and decoding of auditory fMRI responses to real-life sounds.

    PubMed

    De Angelis, Vittoria; De Martino, Federico; Moerel, Michelle; Santoro, Roberta; Hausfeld, Lars; Formisano, Elia

    2017-11-13

    Pitch is a perceptual attribute related to the fundamental frequency (or periodicity) of a sound. So far, the cortical processing of pitch has been investigated mostly using synthetic sounds. However, the complex harmonic structure of natural sounds may require different mechanisms for the extraction and analysis of pitch. This study investigated the neural representation of pitch in human auditory cortex using model-based encoding and decoding analyses of high field (7 T) functional magnetic resonance imaging (fMRI) data collected while participants listened to a wide range of real-life sounds. Specifically, we modeled the fMRI responses as a function of the sounds' perceived pitch height and salience (related to the fundamental frequency and the harmonic structure respectively), which we estimated with a computational algorithm of pitch extraction (de Cheveigné and Kawahara, 2002). First, using single-voxel fMRI encoding, we identified a pitch-coding region in the antero-lateral Heschl's gyrus (HG) and adjacent superior temporal gyrus (STG). In these regions, the pitch representation model combining height and salience predicted the fMRI responses comparatively better than other models of acoustic processing and, in the right hemisphere, better than pitch representations based on height/salience alone. Second, we assessed with model-based decoding that multi-voxel response patterns of the identified regions are more informative of perceived pitch than the remainder of the auditory cortex. Further multivariate analyses showed that complementing a multi-resolution spectro-temporal sound representation with pitch produces a small but significant improvement to the decoding of complex sounds from fMRI response patterns. 
    In sum, this work extends model-based fMRI encoding and decoding methods - previously employed to examine the representation and processing of acoustic sound features in the human auditory system - to the representation and processing of a relevant perceptual attribute such as pitch. Taken together, the results of our model-based encoding and decoding analyses indicated that the pitch of complex real life sounds is extracted and processed in lateral HG/STG regions, at locations consistent with those indicated in several previous fMRI studies using synthetic sounds. Within these regions, pitch-related sound representations reflect the modulatory combination of height and the salience of the pitch percept. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Testing mapping algorithms of the cancer-specific EORTC QLQ-C30 onto EQ-5D in malignant mesothelioma.

    PubMed

    Arnold, David T; Rowen, Donna; Versteegh, Matthijs M; Morley, Anna; Hooper, Clare E; Maskell, Nicholas A

    2015-01-23

    In order to estimate utilities for cancer studies where the EQ-5D was not used, the EORTC QLQ-C30 can be used to estimate EQ-5D using existing mapping algorithms. Several mapping algorithms exist for this transformation, however, algorithms tend to lose accuracy in patients in poor health states. The aim of this study was to test all existing mapping algorithms of QLQ-C30 onto EQ-5D, in a dataset of patients with malignant pleural mesothelioma, an invariably fatal malignancy where no previous mapping estimation has been published. Health related quality of life (HRQoL) data where both the EQ-5D and QLQ-C30 were used simultaneously was obtained from the UK-based prospective observational SWAMP (South West Area Mesothelioma and Pemetrexed) trial. In the original trial 73 patients with pleural mesothelioma were offered palliative chemotherapy and their HRQoL was assessed across five time points. This data was used to test the nine available mapping algorithms found in the literature, comparing predicted against observed EQ-5D values. The ability of algorithms to predict the mean, minimise error and detect clinically significant differences was assessed. The dataset had a total of 250 observations across 5 timepoints. The linear regression mapping algorithms tested generally performed poorly, over-estimating the predicted compared to observed EQ-5D values, especially when observed EQ-5D was below 0.5. The best performing algorithm used a response mapping method and predicted the mean EQ-5D with accuracy with an average root mean squared error of 0.17 (Standard Deviation; 0.22). This algorithm reliably discriminated between clinically distinct subgroups seen in the primary dataset. This study tested mapping algorithms in a population with poor health states, where they have been previously shown to perform poorly. Further research into EQ-5D estimation should be directed at response mapping methods given its superior performance in this study.
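
    The comparison of predicted against observed EQ-5D values rests on two simple summaries: the mean error (positive when a mapping over-estimates) and the root mean squared error. A minimal sketch follows, using invented utility values rather than SWAMP trial data; the function names are illustrative only.

```python
import math


def mean_error(predicted, observed):
    """Average signed error; a positive value means over-estimation."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Invented EQ-5D utilities, for illustration only.
observed = [0.30, 0.45, 0.70, 0.85]
predicted = [0.50, 0.55, 0.70, 0.80]

bias = mean_error(predicted, observed)
error = rmse(predicted, observed)
```

    In this toy case the largest over-estimates fall on the lowest observed utilities, which is the pattern the abstract reports for the linear regression mappings.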

  13. A biclustering algorithm for extracting bit-patterns from binary datasets.

    PubMed

    Rodriguez-Baena, Domingo S; Perez-Pulido, Antonio J; Aguilar-Ruiz, Jesus S

    2011-10-01

    Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations. A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results. The source and binary codes, the datasets used in the experiments and the results can be found at: http://www.upo.es/eps/bigs/BiBit.html dsrodbae@upo.es Supplementary data are available at Bioinformatics online.
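
    The bit-pattern idea can be illustrated with a short sketch. It encodes each binary row as an integer, takes the bitwise AND of every pair of rows as a candidate column pattern, and then collects all rows that contain that pattern. This is a simplified illustration in the spirit of the abstract, not the authors' BiBit code; the function name and the thresholds `min_rows` and `min_cols` are assumptions.

```python
from itertools import combinations


def bit_pattern_biclusters(rows, min_rows=2, min_cols=2):
    """Return (row indices, column pattern) pairs.

    rows: list of ints, each encoding one binary row as a bit mask.
    Every returned row has a 1 in each column that is set in the pattern.
    """
    seen = set()
    result = []
    for i, j in combinations(range(len(rows)), 2):
        pattern = rows[i] & rows[j]  # columns set in both rows
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # Keep every row that contains all of the pattern's 1-bits.
        members = [k for k, r in enumerate(rows) if (r & pattern) == pattern]
        if len(members) >= min_rows:
            result.append((members, pattern))
    return result


# Four rows over four columns, written as bit masks.
rows = [0b1101, 0b1100, 0b0111, 0b1110]
biclusters = bit_pattern_biclusters(rows)
```

    Working on machine words keeps each pattern test to a single AND and comparison, which is the kind of operation the abstract credits for the method's speed.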

  14. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer

    PubMed Central

    2018-01-01

    This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in the machine learning domain. Traditional training algorithms in general tend to become trapped in local optima, leading to premature convergence, which makes them ineffective when applied to datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using Logistic chaotic sequence, which enhances the population diversity and aids the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis is performed on 7 publicly available benchmark datasets. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy. PMID:29768463
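
    A population initialization based on a Logistic chaotic sequence can be sketched as follows. The sketch iterates the logistic map x <- mu * x * (1 - x) and rescales each value to the search bounds. The control parameter mu = 4, the seed x0 = 0.7, and the function name are assumptions; the abstract does not give the authors' settings.

```python
def logistic_chaotic_population(pop_size, dim, lower, upper, x0=0.7, mu=4.0):
    """Build pop_size candidate vectors of length dim from a logistic-map sequence."""
    population = []
    x = x0
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)  # logistic map; stays in [0, 1] for mu = 4
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population


# Five hypothetical candidates in a 3-dimensional search space bounded by [-1, 1].
population = logistic_chaotic_population(pop_size=5, dim=3, lower=-1.0, upper=1.0)
```

    The sequence is deterministic for a given seed, so runs are repeatable, while consecutive values still spread irregularly across the interval.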

  15. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer.

    PubMed

    Rani R, Hannah Jessie; Victoire T, Aruldoss Albert

    2018-01-01

    This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in the machine learning domain. Traditional training algorithms in general tend to become trapped in local optima, leading to premature convergence, which makes them ineffective when applied to datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using Logistic chaotic sequence, which enhances the population diversity and aids the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis is performed on 7 publicly available benchmark datasets. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy.

  16. [An operational remote sensing algorithm of land surface evapotranspiration based on NOAA PAL dataset].

    PubMed

    Hou, Ying-Yu; He, Yan-Bo; Wang, Jian-Lin; Tian, Guo-Liang

    2009-10-01

    Based on the time series 10-day composite NOAA Pathfinder AVHRR Land (PAL) dataset (8 km x 8 km), and by using land surface energy balance equation and "VI-Ts" (vegetation index-land surface temperature) method, a new algorithm of land surface evapotranspiration (ET) was constructed. This new algorithm did not need the support from meteorological observation data, and all of its parameters and variables were directly inversed or derived from remote sensing data. A widely accepted ET model of remote sensing, i. e., SEBS model, was chosen to validate the new algorithm. The validation test showed that both the ET and its seasonal variation trend estimated by SEBS model and our new algorithm accorded well, suggesting that the ET estimated from the new algorithm was reliable, being able to reflect the actual land surface ET. The new ET algorithm of remote sensing was practical and operational, which offered a new approach to study the spatiotemporal variation of ET in continental scale and global scale based on the long-term time series satellite remote sensing images.

  17. OpenSHS: Open Smart Home Simulator.

    PubMed

    Alshammari, Nasser; Alshammari, Talal; Sedky, Mohamed; Champion, Justin; Bauer, Carolin

    2017-05-02

    This paper develops a new hybrid, open-source, cross-platform 3D smart home simulator, OpenSHS, for dataset generation. OpenSHS offers an opportunity for researchers in the field of the Internet of Things (IoT) and machine learning to test and evaluate their models. Following a hybrid approach, OpenSHS combines advantages from both interactive and model-based approaches. This approach reduces the time and efforts required to generate simulated smart home datasets. We have designed a replication algorithm for extending and expanding a dataset. A small sample dataset produced, by OpenSHS, can be extended without affecting the logical order of the events. The replication provides a solution for generating large representative smart home datasets. We have built an extensible library of smart devices that facilitates the simulation of current and future smart home environments. Our tool divides the dataset generation process into three distinct phases: first design: the researcher designs the initial virtual environment by building the home, importing smart devices and creating contexts; second, simulation: the participant simulates his/her context-specific events; and third, aggregation: the researcher applies the replication algorithm to generate the final dataset. We conducted a study to assess the ease of use of our tool on the System Usability Scale (SUS).

  18. Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics.

    PubMed

    Mahmood, Khalid; Jung, Chol-Hee; Philip, Gayle; Georgeson, Peter; Chung, Jessica; Pope, Bernard J; Park, Daniel J

    2017-05-16

    Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as reported prior. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.
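
    For readers less familiar with the metric quoted above, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance. The sketch below computes it by direct pairwise comparison on invented scores and labels; these are not values from UniFun, BRCA1-DMS or TP53-TA.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented predictor scores; label 1 = functionally damaging, 0 = neutral.
scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.2]
labels = [1, 0, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

    Read this way, the reported ranges of 0.52 to 0.63 and 0.54 to 0.75 mean the tools often ranked damaging and neutral variants only modestly better than chance on those datasets.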

  19. OpenSHS: Open Smart Home Simulator

    PubMed Central

    Alshammari, Nasser; Alshammari, Talal; Sedky, Mohamed; Champion, Justin; Bauer, Carolin

    2017-01-01

    This paper develops a new hybrid, open-source, cross-platform 3D smart home simulator, OpenSHS, for dataset generation. OpenSHS offers an opportunity for researchers in the field of the Internet of Things (IoT) and machine learning to test and evaluate their models. Following a hybrid approach, OpenSHS combines advantages from both interactive and model-based approaches. This approach reduces the time and efforts required to generate simulated smart home datasets. We have designed a replication algorithm for extending and expanding a dataset. A small sample dataset produced, by OpenSHS, can be extended without affecting the logical order of the events. The replication provides a solution for generating large representative smart home datasets. We have built an extensible library of smart devices that facilitates the simulation of current and future smart home environments. Our tool divides the dataset generation process into three distinct phases: first design: the researcher designs the initial virtual environment by building the home, importing smart devices and creating contexts; second, simulation: the participant simulates his/her context-specific events; and third, aggregation: the researcher applies the replication algorithm to generate the final dataset. We conducted a study to assess the ease of use of our tool on the System Usability Scale (SUS). PMID:28468330

  20. Identification of Patients with Family History of Pancreatic Cancer - Investigation of an NLP System Portability

    PubMed Central

    Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang

    2018-01-01

    In this study we have developed a rule-based natural language processing (NLP) system to identify patients with family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6% respectively. Negation detection resulted in precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance. PMID:26262122
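
    The reported F-measure can be checked from the precision and recall, assuming it is the balanced F1 score (the harmonic mean of the two). The abstract does not state which F-measure variant was used, so that is an assumption.

```python
def f_measure(precision, recall):
    """Balanced F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall for family member relation discovery, in percent.
f1 = f_measure(75.3, 91.6)
```

    The result is about 82.65, which agrees with the reported 82.6% up to rounding of the inputs.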

  1. Partial covariance based functional connectivity computation using Ledoit-Wolf covariance regularization.

    PubMed

    Brier, Matthew R; Mitra, Anish; McCarthy, John E; Ances, Beau M; Snyder, Abraham Z

    2015-11-01

    Functional connectivity refers to shared signals among brain regions and is typically assessed in a task free state. Functional connectivity commonly is quantified between signal pairs using Pearson correlation. However, resting-state fMRI is a multivariate process exhibiting a complicated covariance structure. Partial covariance assesses the unique variance shared between two brain regions excluding any widely shared variance, hence is appropriate for the analysis of multivariate fMRI datasets. However, calculation of partial covariance requires inversion of the covariance matrix, which, in most functional connectivity studies, is not invertible owing to rank deficiency. Here we apply Ledoit-Wolf shrinkage (L2 regularization) to invert the high dimensional BOLD covariance matrix. We investigate the network organization and brain-state dependence of partial covariance-based functional connectivity. Although RSNs are conventionally defined in terms of shared variance, removal of widely shared variance, surprisingly, improved the separation of RSNs in a spring embedded graphical model. This result suggests that pair-wise unique shared variance plays a heretofore unrecognized role in RSN covariance organization. In addition, application of partial correlation to fMRI data acquired in the eyes open vs. eyes closed states revealed focal changes in uniquely shared variance between the thalamus and visual cortices. This result suggests that partial correlation of resting state BOLD time series reflect functional processes in addition to structural connectivity. Copyright © 2015 Elsevier Inc. All rights reserved.
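
    The computation described above can be written compactly. The expressions below assume the common form of the Ledoit-Wolf estimator, which shrinks the sample covariance S of p regional time series toward a scaled identity matrix with intensity alpha. The abstract does not state the shrinkage target or how alpha was chosen, so both are assumptions here.

```latex
\hat{\Sigma} = (1-\alpha)\,S + \alpha\,\frac{\operatorname{tr}(S)}{p}\,I_p,
\qquad 0 < \alpha \le 1,
\qquad
P = \hat{\Sigma}^{-1},
\qquad
\rho_{ij\,\cdot\,\mathrm{rest}} = -\frac{P_{ij}}{\sqrt{P_{ii}\,P_{jj}}}
```

    Because S is positive semidefinite, any alpha > 0 makes the shrunk matrix positive definite and therefore invertible, even when S itself is rank deficient. The off-diagonal entries of the precision matrix P then give the partial correlation between regions i and j after removing variance shared with all other regions.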

  2. Partial covariance based functional connectivity computation using Ledoit-Wolf covariance regularization

    PubMed Central

    Brier, Matthew R.; Mitra, Anish; McCarthy, John E.; Ances, Beau M.; Snyder, Abraham Z.

    2015-01-01

    Functional connectivity refers to shared signals among brain regions and is typically assessed in a task free state. Functional connectivity commonly is quantified between signal pairs using Pearson correlation. However, resting-state fMRI is a multivariate process exhibiting a complicated covariance structure. Partial covariance assesses the unique variance shared between two brain regions excluding any widely shared variance, hence is appropriate for the analysis of multivariate fMRI datasets. However, calculation of partial covariance requires inversion of the covariance matrix, which, in most functional connectivity studies, is not invertible owing to rank deficiency. Here we apply Ledoit-Wolf shrinkage (L2 regularization) to invert the high dimensional BOLD covariance matrix. We investigate the network organization and brain-state dependence of partial covariance-based functional connectivity. Although RSNs are conventionally defined in terms of shared variance, removal of widely shared variance, surprisingly, improved the separation of RSNs in a spring embedded graphical model. This result suggests that pair-wise unique shared variance plays a heretofore unrecognized role in RSN covariance organization. In addition, application of partial correlation to fMRI data acquired in the eyes open vs. eyes closed states revealed focal changes in uniquely shared variance between the thalamus and visual cortices. This result suggests that partial correlation of resting state BOLD time series reflect functional processes in addition to structural connectivity. PMID:26208872

  3. Functional brain networks reconstruction using group sparsity-regularized learning.

    PubMed

    Zhao, Qinghua; Li, Will X Y; Jiang, Xi; Lv, Jinglei; Lu, Jianfeng; Liu, Tianming

    2018-06-01

    Investigating functional brain networks and patterns using sparse representation of fMRI data has received significant interests in the neuroimaging community. It has been reported that sparse representation is effective in reconstructing concurrent and interactive functional brain networks. To date, most of data-driven network reconstruction approaches rarely take consideration of anatomical structures, which are the substrate of brain function. Furthermore, it has been rarely explored whether structured sparse representation with anatomical guidance could facilitate functional networks reconstruction. To address this problem, in this paper, we propose to reconstruct brain networks utilizing the structure guided group sparse regression (S2GSR) in which 116 anatomical regions from the AAL template, as prior knowledge, are employed to guide the network reconstruction when performing sparse representation of whole-brain fMRI data. Specifically, we extract fMRI signals from standard space aligned with the AAL template. Then by learning a global over-complete dictionary, with the learned dictionary as a set of features (regressors), the group structured regression employs anatomical structures as group information to regress whole brain signals. Finally, the decomposition coefficients matrix is mapped back to the brain volume to represent functional brain networks and patterns. We use the publicly available Human Connectome Project (HCP) Q1 dataset as the test bed, and the experimental results indicate that the proposed anatomically guided structure sparse representation is effective in reconstructing concurrent functional brain networks.

  4. A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes

    PubMed Central

    2011-01-01

    Background Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data. Methods A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information. Results The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, and SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% less phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available. Conclusions The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets. PMID:21388557

  5. Automatic localization of the left ventricular blood pool centroid in short axis cardiac cine MR images.

    PubMed

    Tan, Li Kuo; Liew, Yih Miin; Lim, Einly; Abdul Aziz, Yang Faridah; Chee, Kok Han; McLaughlin, Robert A

    2018-06-01

    In this paper, we develop and validate an open source, fully automatic algorithm to localize the left ventricular (LV) blood pool centroid in short axis cardiac cine MR images, enabling follow-on automated LV segmentation algorithms. The algorithm comprises four steps: (i) quantify motion to determine an initial region of interest surrounding the heart, (ii) identify potential 2D objects of interest using an intensity-based segmentation, (iii) assess contraction/expansion, circularity, and proximity to lung tissue to score all objects of interest in terms of their likelihood of constituting part of the LV, and (iv) aggregate the objects into connected groups and construct the final LV blood pool volume and centroid. This algorithm was tested against 1140 datasets from the Kaggle Second Annual Data Science Bowl, as well as 45 datasets from the STACOM 2009 Cardiac MR Left Ventricle Segmentation Challenge. Correct LV localization was confirmed in 97.3% of the datasets. The mean absolute error between the gold standard and localization centroids was 2.8 to 4.7 mm, or 12 to 22% of the average endocardial radius. Graphical abstract Fully automated localization of the left ventricular blood pool in short axis cardiac cine MR images.

  6. A Monocular Vision Sensor-Based Obstacle Detection Algorithm for Autonomous Robots.

    PubMed

    Lee, Tae-Jae; Yi, Dong-Hoon; Cho, Dong-Il Dan

    2016-03-01

    This paper presents a monocular vision sensor-based obstacle detection algorithm for autonomous robots. Each individual image pixel at the bottom region of interest is labeled as belonging either to an obstacle or the floor. While conventional methods depend on point tracking for geometric cues for obstacle detection, the proposed algorithm uses the inverse perspective mapping (IPM) method. This method is much more advantageous when the camera is not high off the floor, which makes point tracking near the floor difficult. Markov random field-based obstacle segmentation is then performed using the IPM results and a floor appearance model. Next, the shortest distance between the robot and the obstacle is calculated. The algorithm is tested by applying it to 70 datasets, 20 of which include nonobstacle images where considerable changes in floor appearance occur. The obstacle segmentation accuracies and the distance estimation error are quantitatively analyzed. For obstacle datasets, the segmentation precision and the average distance estimation error of the proposed method are 81.4% and 1.6 cm, respectively, whereas those for a conventional method are 57.5% and 9.9 cm, respectively. For nonobstacle datasets, the proposed method gives 0.0% false positive rates, while the conventional method gives 17.6%.
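
    The geometry behind inverse perspective mapping can be illustrated with a deliberately simplified case: a pinhole camera whose optical axis is parallel to a flat floor. Under that assumption, a pixel below the horizon maps to a floor point by similar triangles. The abstract gives no camera model, so the function, its parameter names, and the numbers below are hypothetical; practical IPM usually applies a full homography that also accounts for camera tilt.

```python
def pixel_to_floor(u, v, focal_px, cx, cy, camera_height):
    """Map an image pixel below the horizon to floor coordinates.

    Returns (lateral offset x, forward distance z) in the units of camera_height.
    Assumes a pinhole camera with its optical axis parallel to a flat floor
    and image rows increasing downward.
    """
    dv = v - cy
    if dv <= 0:
        raise ValueError("pixel is at or above the horizon and does not hit the floor")
    z = focal_px * camera_height / dv  # forward distance along the floor
    x = (u - cx) * z / focal_px        # lateral offset from the optical axis
    return x, z


# Hypothetical camera mounted 0.2 m above the floor, 640 x 480 image.
x, z = pixel_to_floor(u=400, v=340, focal_px=500.0, cx=320.0, cy=240.0,
                      camera_height=0.2)
```

    Because the forward distance is inversely proportional to the pixel's offset below the horizon, a low camera places nearby floor points over many image rows.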

  7. ROS-based ground stereo vision detection: implementation and experiments.

    PubMed

    Hu, Tianjiang; Zhao, Boxin; Tang, Dengqing; Zhang, Daibing; Kong, Weiwei; Shen, Lincheng

    This article concentrates on open-source implementation on flying object detection in cluttered scenes. It is of significance for ground stereo-aided autonomous landing of unmanned aerial vehicles. The ground stereo vision guidance system is presented with details on system architecture and workflow. The Chan-Vese detection algorithm is further considered and implemented in the robot operating systems (ROS) environment. A data-driven interactive scheme is developed to collect datasets for parameter tuning and performance evaluating. The flying vehicle outdoor experiments capture the stereo sequential images dataset and record the simultaneous data from pan-and-tilt unit, onboard sensors and differential GPS. Experimental results by using the collected dataset validate the effectiveness of the published ROS-based detection algorithm.

  8. Iterative reconstruction of simulated low count data: a comparison of post-filtering versus regularised OSEM

    NASA Astrophysics Data System (ADS)

    Karaoglanis, K.; Efthimiou, N.; Tsoumpas, C.

    2015-09-01

    Low count PET data is a challenge for medical image reconstruction. The statistics of a dataset are a key factor in the quality of the reconstructed images. Reconstruction algorithms able to compensate for low count datasets could provide the means to reduce the patient injected doses and/or reduce the scan times. It has been shown that the use of priors improves the image quality in low count conditions. In this study we compared regularised versus post-filtered OSEM for their performance on challenging simulated low count datasets. Initial visual comparison demonstrated that both algorithms improve the image quality, although, unlike post-filtering, regularization does not introduce undesired blurring.

  9. Neural CMOS-integrated circuit and its application to data classification.

    PubMed

    Göknar, Izzet Cem; Yildiz, Merih; Minaei, Shahram; Deniz, Engin

    2012-05-01

    Implementation and new applications of a tunable complementary metal-oxide-semiconductor-integrated circuit (CMOS-IC) of a recently proposed classifier core-cell (CC) are presented and tested with two different datasets. With two algorithms, one based on Fisher's linear discriminant analysis and the other on perceptron learning, used to obtain the CCs' tunable parameters, the Haberman and Iris datasets are classified. The parameters so obtained are used for hard-classification of datasets with a neural network structured circuit. Classification performance and coefficient calculation times for both algorithms are given. The CC has 6-ns response time and 1.8-mW power consumption. The fabrication parameters used for the IC are taken from CMOS AMS 0.35-μm technology.
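
    The record above names perceptron learning as one way to obtain a classifier's tunable parameters. As a rough illustration of that learning rule only (not of the authors' circuit or their parameter mapping), here is a minimal sketch on a made-up, linearly separable toy set; the function names, data and learning rate are all assumptions.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule; labels must be +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != y:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:  # every sample classified correctly: stop early
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Hypothetical two-feature toy data, not taken from the Haberman or Iris sets.
X = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
y = [1, 1, -1, -1]
weights, bias = train_perceptron(X, y)
```

    On this toy set the rule converges after a single update, giving weights (2.0, 1.0) and bias 1.0; on data that is not linearly separable the loop simply stops after `epochs` passes.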

  10. Multi-Complementary Model for Long-Term Tracking

    PubMed Central

    Zhang, Deng; Zhang, Junchang; Xia, Chenyang

    2018-01-01

    In recent years, video target tracking algorithms have been widely used. However, many tracking algorithms do not achieve satisfactory performance, especially when dealing with problems such as object occlusions, background clutters, motion blur, low illumination color images, and sudden illumination changes in real scenes. In this paper, we incorporate an object model based on contour information into a Staple tracker that combines the correlation filter model and color model to greatly improve the tracking robustness. Since each model is responsible for tracking specific features, the three complementary models combine for more robust tracking. In addition, we propose an efficient object detection model with contour and color histogram features, which has good detection performance and better detection efficiency compared to the traditional target detection algorithm. Finally, we optimize the traditional scale calculation, which greatly improves the tracking execution speed. We evaluate our tracker on the Object Tracking Benchmarks 2013 (OTB-13) and Object Tracking Benchmarks 2015 (OTB-15) benchmark datasets. With the OTB-13 benchmark datasets, our algorithm is improved by 4.8%, 9.6%, and 10.9% on the success plots of OPE, TRE and SRE, respectively, in contrast to another classic LCT (Long-term Correlation Tracking) algorithm. On the OTB-15 benchmark datasets, when compared with the LCT algorithm, our algorithm achieves 10.4%, 12.5%, and 16.1% improvement on the success plots of OPE, TRE, and SRE, respectively. At the same time, it needs to be emphasized that, due to the high computational efficiency of the color model and the object detection model using efficient data structures, and the speed advantage of the correlation filters, our tracking algorithm could still achieve good tracking speed. PMID:29425170

  11. New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data

    PubMed Central

    Gogoshin, Grigoriy; Boerwinkle, Eric

    2017-01-01

    Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypotheses and testing and validating existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, software scalability and usability on less-than-exotic computer hardware are a priority, as is the applicability of the algorithm and software to heterogeneous datasets containing many data types: single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, phenotypes, etc. PMID:27681505

  12. A hierarchical network-based algorithm for multi-scale watershed delineation

    NASA Astrophysics Data System (ADS)

    Castronova, Anthony M.; Goodall, Jonathan L.

    2014-11-01

    Watershed delineation is a process for defining a land area that contributes surface water flow to a single outlet point. It is commonly used in water resources analysis to define the domain in which hydrologic process calculations are applied. There has been a growing effort over the past decade to improve surface elevation measurements in the U.S., which has had a significant impact on the accuracy of hydrologic calculations. Traditional watershed processing on these elevation rasters, however, becomes more burdensome as data resolution increases. As a result, processing of these datasets can be troublesome on standard desktop computers. This challenge has resulted in numerous works that aim to provide high performance computing solutions to large data, high resolution data, or both. This work proposes an efficient watershed delineation algorithm for use in desktop computing environments that leverages existing data, U.S. Geological Survey (USGS) National Hydrography Dataset Plus (NHD+), and open source software tools to construct watershed boundaries. This approach makes use of U.S. national-level hydrography data that has been precomputed using raster processing algorithms coupled with quality control routines. Our approach uses carefully arranged data and mathematical graph theory to traverse river networks and identify catchment boundaries. We demonstrate this new watershed delineation technique, compare its accuracy with traditional algorithms that derive watersheds solely from digital elevation models, and then extend our approach to address subwatershed delineation. Our findings suggest that the open-source hierarchical network-based delineation procedure presented in this work is a promising approach to watershed delineation that can be used to summarize publicly available datasets for hydrologic model input pre-processing. Through our analysis, we explore the benefits of reusing the NHD+ datasets for watershed delineation, and find that our technique offers greater flexibility and extendability than traditional raster algorithms.
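
    The abstract describes traversing a river network with graph theory to identify the catchments that drain to an outlet. A minimal sketch of that general idea, assuming a hypothetical table that maps each catchment ID to the catchment it flows into, is a breadth-first walk against the flow direction. The table layout and names below are illustrative and are not the NHD+ schema or the authors' implementation.

```python
from collections import deque


def upstream_catchments(downstream_of, outlet):
    """Return the set of catchment IDs draining to `outlet`, outlet included.

    `downstream_of` maps a catchment ID to the ID it flows into
    (None for a terminal catchment). Hypothetical stand-in for a
    precomputed hydrography network.
    """
    # Invert the flow table so the network can be walked upstream.
    upstream = {}
    for node, down in downstream_of.items():
        if down is not None:
            upstream.setdefault(down, []).append(node)

    seen = {outlet}
    queue = deque([outlet])
    while queue:
        current = queue.popleft()
        for neighbour in upstream.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Toy network: A and B flow into C; C and D flow into E; F is isolated.
flows = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None, "F": None}
watershed_c = upstream_catchments(flows, "C")  # {"A", "B", "C"}
watershed_e = upstream_catchments(flows, "E")  # {"A", "B", "C", "D", "E"}
```

    Because the subwatershed at C is a subset of the watershed at E, the same traversal serves both scales, which is the kind of reuse a network-based approach makes cheap compared with reprocessing elevation rasters.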

  13. New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data.

    PubMed

    Gogoshin, Grigoriy; Boerwinkle, Eric; Rodin, Andrei S

    2017-04-01

    Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypotheses and testing and validating existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, software scalability and usability on less-than-exotic computer hardware are a priority, as is the applicability of the algorithm and software to heterogeneous datasets containing many data types: single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, phenotypes, etc.

  14. BiPACE 2D--graph-based multiple alignment for comprehensive 2D gas chromatography-mass spectrometry.

    PubMed

    Hoffmann, Nils; Wilhelm, Mathias; Doebbe, Anja; Niehaus, Karsten; Stoye, Jens

    2014-04-01

    Comprehensive 2D gas chromatography-mass spectrometry is an established method for the analysis of complex mixtures in analytical chemistry and metabolomics. It produces large amounts of data that require semiautomatic, but preferably automatic handling. This involves the location of significant signals (peaks) and their matching and alignment across different measurements. To date, there exist only a few openly available algorithms for the retention time alignment of peaks originating from such experiments that scale well with increasing sample and peak numbers, while providing reliable alignment results. We describe BiPACE 2D, an automated algorithm for retention time alignment of peaks from 2D gas chromatography-mass spectrometry experiments and evaluate it on three previously published datasets against the mSPA, SWPA and Guineu algorithms. We also provide a fourth dataset from an experiment studying the H2 production of two different strains of Chlamydomonas reinhardtii that is available from the MetaboLights database together with the experimental protocol, peak-detection results and manually curated multiple peak alignment for future comparability with newly developed algorithms. BiPACE 2D is contained in the freely available Maltcms framework, version 1.3, hosted at http://maltcms.sf.net, under the terms of the L-GPL v3 or Eclipse Open Source licenses. The software used for the evaluation along with the underlying datasets is available at the same location. The C.reinhardtii dataset is freely available at http://www.ebi.ac.uk/metabolights/MTBLS37.

  15. Identifying patients with Alzheimer's disease using resting-state fMRI and graph theory.

    PubMed

    Khazaee, Ali; Ebrahimzadeh, Ata; Babajani-Feremi, Abbas

    2015-11-01

    The study of brain networks on the basis of resting-state functional magnetic resonance imaging (fMRI) has provided promising results for investigating disease-related changes in connectivity among different brain regions. Graph theory can efficiently characterize different aspects of the brain network by calculating measures of integration and segregation. In this study, we combine graph theoretical approaches with advanced machine learning methods to study functional brain network alteration in patients with Alzheimer's disease (AD). A support vector machine (SVM) was used to explore the ability of graph measures to diagnose AD. We applied our method to the resting-state fMRI data of twenty patients with AD and twenty age- and gender-matched healthy subjects. The data were preprocessed and each subject's graph was constructed by parcellation of the whole brain into 90 distinct regions using the automated anatomical labeling (AAL) atlas. The graph measures were then calculated and used as the discriminating features. Extracted network-based features were fed to different feature selection algorithms to choose the most significant features. In addition to the machine learning approach, statistical analysis was performed on connectivity matrices to find altered connectivity patterns in patients with AD. Using the selected features, we were able to classify patients with AD from healthy subjects with an accuracy of 100%. Results of this study show that pattern recognition and graph analysis of the brain network, on the basis of resting-state fMRI data, can efficiently assist in the diagnosis of AD. Classification based on resting-state fMRI can be used as a non-invasive and automatic tool for the diagnosis of Alzheimer's disease. Copyright © 2015 International Federation of Clinical Neurophysiology. All rights reserved.
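
    The abstract builds a graph per subject from a connectivity matrix and uses graph measures as classifier features. A minimal sketch of that feature-extraction step, assuming a simple absolute-correlation threshold and two common measures (node degree and local clustering coefficient), might look as follows. The 4-region matrix and the 0.5 threshold are made up for illustration; the study itself uses 90 AAL regions and its own choice of measures.

```python
def binarize(corr, threshold):
    """Turn a correlation matrix into a binary adjacency matrix (no self-loops)."""
    n = len(corr)
    return [
        [1 if i != j and abs(corr[i][j]) >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def degrees(adj):
    """Number of neighbours of each node."""
    return [sum(row) for row in adj]


def clustering_coefficients(adj):
    """Fraction of each node's neighbour pairs that are themselves connected."""
    n = len(adj)
    coeffs = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # undefined for fewer than two neighbours; use 0
            continue
        links = sum(adj[a][b] for idx, a in enumerate(nbrs) for b in nbrs[idx + 1:])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return coeffs


# Hypothetical 4-region correlation matrix (symmetric, unit diagonal).
corr = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.2],
    [0.7, 0.6, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
]
adj = binarize(corr, 0.5)
node_degrees = degrees(adj)                   # [2, 2, 3, 1]
node_clustering = clustering_coefficients(adj)
features = node_degrees + node_clustering     # one feature vector per subject
```

    Concatenating such per-node measures gives one feature vector per subject, which could then be passed to a feature selection step and a classifier such as an SVM.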

  16. PreSurgMapp: a MATLAB Toolbox for Presurgical Mapping of Eloquent Functional Areas Based on Task-Related and Resting-State Functional MRI.

    PubMed

    Huang, Huiyuan; Ding, Zhongxiang; Mao, Dewang; Yuan, Jianhua; Zhu, Fangmei; Chen, Shuda; Xu, Yan; Lou, Lin; Feng, Xiaoyan; Qi, Le; Qiu, Wusi; Zhang, Han; Zang, Yu-Feng

    2016-10-01

    The main goal of brain tumor surgery is to maximize tumor resection while minimizing the risk of irreversible postoperative functional sequelae. Eloquent functional areas should be delineated preoperatively, particularly for patients with tumors near eloquent areas. Functional magnetic resonance imaging (fMRI) is a noninvasive technique that demonstrates great promise for presurgical planning. However, specialized data processing toolkits for presurgical planning remain lacking. Based on several functions in open-source software such as Statistical Parametric Mapping (SPM), Resting-State fMRI Data Analysis Toolkit (REST), Data Processing Assistant for Resting-State fMRI (DPARSF) and Multiple Independent Component Analysis (MICA), here, we introduce an open-source MATLAB toolbox named PreSurgMapp. This toolbox can reveal eloquent areas using comprehensive methods and various complementary fMRI modalities. For example, PreSurgMapp supports both model-based (general linear model, GLM, and seed correlation) and data-driven (independent component analysis, ICA) methods and processes both task-based and resting-state fMRI data. PreSurgMapp is designed for highly automatic and individualized functional mapping with a user-friendly graphical user interface (GUI) for time-saving pipeline processing. For example, sensorimotor and language-related components can be automatically identified without human input interference using an effective, accurate component identification algorithm using discriminability index. All the results generated can be further evaluated and compared by neuro-radiologists or neurosurgeons. This software has substantial value for clinical neuro-radiology and neuro-oncology, including application to patients with low- and high-grade brain tumors and those with epilepsy foci in the dominant language hemisphere who are planning to undergo a temporal lobectomy.

  17. Task-Related Edge Density (TED)—A New Method for Revealing Dynamic Network Formation in fMRI Data of the Human Brain

    PubMed Central

    Lohmann, Gabriele; Stelzer, Johannes; Zuber, Verena; Buschmann, Tilo; Margulies, Daniel; Bartels, Andreas; Scheffler, Klaus

    2016-01-01

    The formation of transient networks in response to external stimuli or as a reflection of internal cognitive processes is a hallmark of human brain function. However, its identification in fMRI data of the human brain is notoriously difficult. Here we propose a new method of fMRI data analysis that tackles this problem by considering large-scale, task-related synchronisation networks. Networks consist of nodes and edges connecting them, where nodes correspond to voxels in fMRI data, and the weight of an edge is determined via task-related changes in dynamic synchronisation between their respective time series. Based on these definitions, we developed a new data analysis algorithm that identifies edges that show differing levels of synchrony between two distinct task conditions and that occur in dense packs with similar characteristics. Hence, we call this approach “Task-related Edge Density” (TED). TED proved to be a very strong marker for dynamic network formation that easily lends itself to statistical analysis using large scale statistical inference. A major advantage of TED compared to other methods is that it does not depend on any specific hemodynamic response model, and it also does not require a presegmentation of the data for dimensionality reduction as it can handle large networks consisting of tens of thousands of voxels. We applied TED to fMRI data of a fingertapping and an emotion processing task provided by the Human Connectome Project. TED revealed network-based involvement of a large number of brain areas that evaded detection using traditional GLM-based analysis. We show that our proposed method provides an entirely new window into the immense complexity of human brain function. PMID:27341204

  18. Task-Related Edge Density (TED)-A New Method for Revealing Dynamic Network Formation in fMRI Data of the Human Brain.

    PubMed

    Lohmann, Gabriele; Stelzer, Johannes; Zuber, Verena; Buschmann, Tilo; Margulies, Daniel; Bartels, Andreas; Scheffler, Klaus

    2016-01-01

    The formation of transient networks in response to external stimuli or as a reflection of internal cognitive processes is a hallmark of human brain function. However, its identification in fMRI data of the human brain is notoriously difficult. Here we propose a new method of fMRI data analysis that tackles this problem by considering large-scale, task-related synchronisation networks. Networks consist of nodes and edges connecting them, where nodes correspond to voxels in fMRI data, and the weight of an edge is determined via task-related changes in dynamic synchronisation between their respective time series. Based on these definitions, we developed a new data analysis algorithm that identifies edges that show differing levels of synchrony between two distinct task conditions and that occur in dense packs with similar characteristics. Hence, we call this approach "Task-related Edge Density" (TED). TED proved to be a very strong marker for dynamic network formation that easily lends itself to statistical analysis using large scale statistical inference. A major advantage of TED compared to other methods is that it does not depend on any specific hemodynamic response model, and it also does not require a presegmentation of the data for dimensionality reduction as it can handle large networks consisting of tens of thousands of voxels. We applied TED to fMRI data of a fingertapping and an emotion processing task provided by the Human Connectome Project. TED revealed network-based involvement of a large number of brain areas that evaded detection using traditional GLM-based analysis. We show that our proposed method provides an entirely new window into the immense complexity of human brain function.

  19. Exploring the reproducibility of functional connectivity alterations in Parkinson’s disease

    PubMed Central

    Onu, Mihaela; Wu, Tao; Roceanu, Adina; Bajenaru, Ovidiu

    2017-01-01

    Since anatomic MRI is presently not able to directly discern neuronal loss in Parkinson’s Disease (PD), studying the associated functional connectivity (FC) changes seems a promising approach toward developing non-invasive and non-radioactive neuroimaging markers for this disease. While several groups have reported such FC changes in PD, there are also significant discrepancies between studies. Investigating the reproducibility of PD-related FC changes on independent datasets is therefore of crucial importance. We acquired resting-state fMRI scans for 43 subjects (27 patients and 16 normal controls, with 2 replicate scans per subject) and compared the observed FC changes with those obtained in two independent datasets, one made available by the PPMI consortium (91 patients, 18 controls) and a second one by the group of Tao Wu (20 patients, 20 controls). Unfortunately, PD-related functional connectivity changes turned out to be non-reproducible across datasets. This could be due to disease heterogeneity, but also to technical differences. To distinguish between the two, we devised a method to directly check for disease heterogeneity using random splits of a single dataset. Since we still observe non-reproducibility in a large fraction of random splits of the same dataset, we conclude that functional heterogeneity may be a dominating factor behind the lack of reproducibility of FC alterations in different rs-fMRI studies of PD. While global PD-related functional connectivity changes were non-reproducible across datasets, we identified a few individual brain region pairs with marginally consistent FC changes across all three datasets. However, training classifiers on each one of the three datasets to discriminate PD scans from controls produced only low accuracies on the remaining two test datasets. 
Moreover, classifiers trained and tested on random splits of the same dataset (which are technically homogeneous) also had low test accuracies, directly substantiating disease heterogeneity. PMID:29182621

  20. Reference layer adaptive filtering (RLAF) for EEG artifact reduction in simultaneous EEG-fMRI.

    PubMed

    Steyrl, David; Krausz, Gunther; Koschutnig, Karl; Edlinger, Günter; Müller-Putz, Gernot R

    2017-04-01

    Simultaneous electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) combines advantages of both methods, namely high temporal resolution of EEG and high spatial resolution of fMRI. However, EEG quality is limited due to severe artifacts caused by fMRI scanners. To improve EEG data quality substantially, we introduce methods that use a reusable reference layer EEG cap prototype in combination with adaptive filtering. The first method, reference layer adaptive filtering (RLAF), uses adaptive filtering with reference layer artifact data to optimize artifact subtraction from EEG. In the second method, multi band reference layer adaptive filtering (MBRLAF), adaptive filtering is performed on bandwidth limited sub-bands of the EEG and the reference channels. The results suggest that RLAF outperforms the baseline method, average artifact subtraction, in all settings and also its direct predecessor, reference layer artifact subtraction (RLAS), in lower (<35 Hz) frequency ranges. MBRLAF is computationally more demanding than RLAF, but highly effective in all EEG frequency ranges. Effectivity is determined by visual inspection, as well as root-mean-square voltage reduction and power reduction of EEG provided that physiological EEG components such as occipital EEG alpha power and visual evoked potentials (VEP) are preserved. We demonstrate that both RLAF and MBRLAF improve VEP quality. For that, we calculate the mean-squared-distance of single trial VEP to the mean VEP and estimate single trial VEP classification accuracies. We found that the average mean-squared-distance is lowest and the average classification accuracy is highest after MBRLAF. RLAF was second best. In conclusion, the results suggest that RLAF and MBRLAF are potentially very effective in improving EEG quality of simultaneous EEG-fMRI.
    Highlights: We present a new and reusable reference layer cap prototype for simultaneous EEG-fMRI. We introduce new algorithms for reducing EEG artifacts due to simultaneous fMRI. The algorithms combine a reference layer and adaptive filtering. Several evaluation criteria suggest superior effectivity in terms of artifact reduction. We demonstrate that physiological EEG components are preserved.
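
    The abstract combines a reference channel that records only artifact with adaptive filtering. One textbook way to realise that idea is a least-mean-squares (LMS) noise canceller, sketched below on synthetic data. The choice of LMS, the tap count, the step size and the sine-wave signals are all assumptions made for illustration; the record does not say which adaptive algorithm the authors use.

```python
import math


def lms_cancel(primary, reference, n_taps=4, mu=0.01):
    """Remove the part of `primary` that is linearly predictable from `reference`."""
    weights = [0.0] * n_taps
    cleaned = []
    for t in range(len(primary)):
        # Most recent n_taps reference samples, zero-padded at the start.
        window = [reference[t - k] if t - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(w * r for w, r in zip(weights, window))
        error = primary[t] - estimate  # the error is also the cleaned sample
        # Standard LMS update: step along the instantaneous gradient.
        weights = [w + 2.0 * mu * error * r for w, r in zip(weights, window)]
        cleaned.append(error)
    return cleaned


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-ins: a weak 10 Hz "EEG" component plus a strong 50 Hz artifact.
fs = 1000  # assumed sampling rate in Hz
n = 2000
brain = [0.5 * math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]
reference = [math.sin(2 * math.pi * 50 * t / fs) for t in range(n)]
recorded = [s + 3.0 * r for s, r in zip(brain, reference)]

cleaned = lms_cancel(recorded, reference)

# Compare against the known clean signal once the filter has had time to adapt.
tail = slice(n - 500, n)
mse_before = mean_squared_error(recorded[tail], brain[tail])
mse_after = mean_squared_error(cleaned[tail], brain[tail])
```

    Comparing `mse_after` with `mse_before` shows how much artifact remains after adaptation. A multi-band variant in the spirit of MBRLAF would apply the same filter separately to band-limited versions of both channels.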

  1. Reference layer adaptive filtering (RLAF) for EEG artifact reduction in simultaneous EEG-fMRI

    NASA Astrophysics Data System (ADS)

    Steyrl, David; Krausz, Gunther; Koschutnig, Karl; Edlinger, Günter; Müller-Putz, Gernot R.

    2017-04-01

    Objective. Simultaneous electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) combines advantages of both methods, namely high temporal resolution of EEG and high spatial resolution of fMRI. However, EEG quality is limited due to severe artifacts caused by fMRI scanners. Approach. To improve EEG data quality substantially, we introduce methods that use a reusable reference layer EEG cap prototype in combination with adaptive filtering. The first method, reference layer adaptive filtering (RLAF), uses adaptive filtering with reference layer artifact data to optimize artifact subtraction from EEG. In the second method, multi band reference layer adaptive filtering (MBRLAF), adaptive filtering is performed on bandwidth limited sub-bands of the EEG and the reference channels. Main results. The results suggest that RLAF outperforms the baseline method, average artifact subtraction, in all settings and also its direct predecessor, reference layer artifact subtraction (RLAS), in lower (<35 Hz) frequency ranges. MBRLAF is computationally more demanding than RLAF, but highly effective in all EEG frequency ranges. Effectivity is determined by visual inspection, as well as root-mean-square voltage reduction and power reduction of EEG provided that physiological EEG components such as occipital EEG alpha power and visual evoked potentials (VEP) are preserved. We demonstrate that both RLAF and MBRLAF improve VEP quality. For that, we calculate the mean-squared-distance of single trial VEP to the mean VEP and estimate single trial VEP classification accuracies. We found that the average mean-squared-distance is lowest and the average classification accuracy is highest after MBRLAF. RLAF was second best. Significance. In conclusion, the results suggest that RLAF and MBRLAF are potentially very effective in improving EEG quality of simultaneous EEG-fMRI.
    Highlights: We present a new and reusable reference layer cap prototype for simultaneous EEG-fMRI. We introduce new algorithms for reducing EEG artifacts due to simultaneous fMRI. The algorithms combine a reference layer and adaptive filtering. Several evaluation criteria suggest superior effectivity in terms of artifact reduction. We demonstrate that physiological EEG components are preserved.

  2. Compensatory neurofuzzy model for discrete data classification in biomedical

    NASA Astrophysics Data System (ADS)

    Ceylan, Rahime

    2015-03-01

    Biomedical data can be divided into two main categories: signals and discrete data. Studies in this area therefore address either biomedical signal classification or biomedical discrete data classification. Artificial intelligence models exist for the classification of ECG, EMG or EEG signals. Likewise, the literature contains many models for the classification of discrete data, taken as sample values that can be the results of blood analysis or biopsy in a medical process. Not every algorithm achieves a high accuracy rate on both signal and discrete data classification. In this study, a compensatory neurofuzzy network model is presented for the classification of discrete data in the biomedical pattern recognition area. The compensatory neurofuzzy network is a hybrid, binary classifier. In this system, the parameters of the fuzzy systems are updated by the backpropagation algorithm. The realized classifier model is applied to two benchmark datasets (the Wisconsin Breast Cancer dataset and the Pima Indian Diabetes dataset). Experimental studies show that the compensatory neurofuzzy network model achieved a 96.11% accuracy rate in classification of the breast cancer dataset, and a 69.08% accuracy rate was obtained in experiments on the diabetes dataset, with only 10 iterations.

  3. Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul

    2011-01-01

    Diabetic macular edema (DME) is a common vision threatening complication of diabetic retinopathy. In a large scale screening environment DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR dataset (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.

  4. GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures.

    PubMed

    Urbanowicz, Ryan J; Kiralis, Jeff; Sinnott-Armstrong, Nicholas A; Heberling, Tamra; Fisher, Jonathan M; Moore, Jason H

    2012-10-01

    Geneticists who look beyond single locus disease associations require additional strategies for the detection of complex multi-locus effects. Epistasis, a multi-locus masking effect, presents a particular challenge, and has been the target of bioinformatic development. Thorough evaluation of new algorithms calls for simulation studies in which known disease models are sought. To date, the best methods for generating simulated multi-locus epistatic models rely on genetic algorithms. However, such methods are computationally expensive, difficult to adapt to multiple objectives, and unlikely to yield models with a precise form of epistasis which we refer to as pure and strict. Purely and strictly epistatic models constitute the worst-case in terms of detecting disease associations, since such associations may only be observed if all n-loci are included in the disease model. This makes them an attractive gold standard for simulation studies considering complex multi-locus effects. We introduce GAMETES, a user-friendly software package and algorithm which generates complex biallelic single nucleotide polymorphism (SNP) disease models for simulation studies. GAMETES rapidly and precisely generates random, pure, strict n-locus models with specified genetic constraints. These constraints include heritability, minor allele frequencies of the SNPs, and population prevalence. GAMETES also includes a simple dataset simulation strategy which may be utilized to rapidly generate an archive of simulated datasets for given genetic models. We highlight the utility and limitations of GAMETES with an example simulation study using MDR, an algorithm designed to detect epistasis. GAMETES is a fast, flexible, and precise tool for generating complex n-locus models with random architectures. 
While GAMETES has a limited ability to generate models with higher heritabilities, it is proficient at generating the lower heritability models typically used in simulation studies evaluating new algorithms. In addition, the GAMETES modeling strategy may be flexibly combined with any dataset simulation strategy. Beyond dataset simulation, GAMETES could be employed to pursue theoretical characterization of genetic models and epistasis.

  5. SU-F-I-45: An Automated Technique to Measure Image Contrast in Clinical CT Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sanders, J; Abadi, E; Meng, B

    Purpose: To develop and validate an automated technique for measuring image contrast in chest computed tomography (CT) exams. Methods: An automated computer algorithm was developed to measure the distribution of Hounsfield units (HUs) inside four major organs: the lungs, liver, aorta, and bones. These organs were first segmented or identified using computer vision and image processing techniques. Regions of interest (ROIs) were automatically placed inside the lungs, liver, and aorta and histograms of the HUs inside the ROIs were constructed. The mean and standard deviation of each histogram were computed for each CT dataset. Comparison of the mean and standard deviation of the HUs in the different organs provides different contrast values. The ROI for the bones is simply the segmentation mask of the bones. Since the histogram for bones does not follow a Gaussian distribution, the 25th and 75th percentiles were computed instead of the mean. The sensitivity and accuracy of the algorithm were investigated by comparing the automated measurements with manual measurements. Fifteen contrast enhanced and fifteen non-contrast enhanced chest CT clinical datasets were examined in the validation procedure. Results: The algorithm successfully measured the histograms of the four organs in both contrast and non-contrast enhanced chest CT exams. The automated measurements were in agreement with manual measurements. The algorithm has sufficient sensitivity as indicated by the near unity slope of the automated versus manual measurement plots. Furthermore, the algorithm has sufficient accuracy as indicated by the high coefficient of determination (R²) values, ranging from 0.879 to 0.998. Conclusion: Patient-specific image contrast can be measured from clinical datasets. The algorithm can be run on both contrast enhanced and non-enhanced clinical datasets.
The method can be applied to automatically assess the contrast characteristics of clinical chest CT images and quantify dependencies that may not be captured in phantom data.
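
The per-organ statistics described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `roi_contrast_stats` and `contrast_between` are hypothetical, the segmentation step is assumed to have already produced a boolean ROI mask, and "contrast" is taken here to be a simple difference of mean HU values, which the abstract does not specify.

```python
import numpy as np

def roi_contrast_stats(hu_volume, roi_mask, gaussian=True):
    """Summarise the Hounsfield units inside one ROI.

    gaussian=True  -> mean and standard deviation (lungs, liver, aorta)
    gaussian=False -> 25th and 75th percentiles (bones, non-Gaussian histogram)
    """
    values = np.asarray(hu_volume, dtype=float)[np.asarray(roi_mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("ROI mask selects no voxels")
    if gaussian:
        return {"mean": float(values.mean()), "std": float(values.std())}
    p25, p75 = np.percentile(values, [25, 75])
    return {"p25": float(p25), "p75": float(p75)}

def contrast_between(stats_a, stats_b):
    # Assumption: contrast is the difference of mean HU between two organs.
    return stats_a["mean"] - stats_b["mean"]
```

A caller would compute `roi_contrast_stats` once per organ mask and then compare organs pairwise with `contrast_between`.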

  6. A Novel Time-Varying Spectral Filtering Algorithm for Reconstruction of Motion Artifact Corrupted Heart Rate Signals During Intense Physical Activities Using a Wearable Photoplethysmogram Sensor

    PubMed Central

    Salehizadeh, Seyed M. A.; Dao, Duy; Bolkhovsky, Jeffrey; Cho, Chae; Mendelson, Yitzhak; Chon, Ki H.

    2015-01-01

    Accurate estimation of heart rates from photoplethysmogram (PPG) signals during intense physical activity is a very challenging problem. This is because strenuous and high intensity exercise can result in severe motion artifacts in PPG signals, making accurate heart rate (HR) estimation difficult. In this study we investigated a novel technique to accurately reconstruct motion-corrupted PPG signals and HR based on time-varying spectral analysis. The algorithm is called Spectral filter algorithm for Motion Artifacts and heart rate reconstruction (SpaMA). The idea is to calculate the power spectral density of both PPG and accelerometer signals for each time shift of a windowed data segment. By comparing time-varying spectra of PPG and accelerometer data, those frequency peaks resulting from motion artifacts can be distinguished from the PPG spectrum. The SpaMA approach was applied to three different datasets and four types of activities: (1) training datasets from the 2015 IEEE Signal Process. Cup Database recorded from 12 subjects while performing treadmill exercise from 1 km/h to 15 km/h; (2) test datasets from the 2015 IEEE Signal Process. Cup Database recorded from 11 subjects while performing forearm and upper arm exercise. (3) Chon Lab dataset including 10 min recordings from 10 subjects during treadmill exercise. The ECG signals from all three datasets provided the reference HRs which were used to determine the accuracy of our SpaMA algorithm. The performance of the SpaMA approach was calculated by computing the mean absolute error between the estimated HR from the PPG and the reference HR from the ECG. The average estimation errors using our method on the first, second and third datasets are 0.89, 1.93 and 1.38 beats/min respectively, while the overall error on all 33 subjects is 1.86 beats/min and the performance on only treadmill experiment datasets (22 subjects) is 1.11 beats/min. 
Moreover, it was found that dynamics of heart rate variability can be accurately captured using the algorithm where the mean Pearson’s correlation coefficient between the power spectral densities of the reference and the reconstructed heart rate time series was found to be 0.98. These results show that the SpaMA method has a potential for PPG-based HR monitoring in wearable devices for fitness tracking and health monitoring during intense physical activities. PMID:26703618
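
The core idea in this abstract, comparing PPG and accelerometer spectra so that motion-related peaks can be set aside, can be illustrated with a heavily simplified single-window sketch. It is not the published SpaMA algorithm, which tracks time-varying spectra across shifted windows. The function names, the 0.5 to 3.5 Hz search band, and the 0.2 Hz guard width are assumptions made for this example only.

```python
import numpy as np

def estimate_hr_bpm(ppg, accel, fs, band=(0.5, 3.5), guard_hz=0.2):
    """Estimate heart rate from one window of PPG, ignoring the motion peak."""
    ppg = np.asarray(ppg, dtype=float)
    accel = np.asarray(accel, dtype=float)
    n = len(ppg)
    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Windowed periodograms of both signals (mean removed).
    ppg_psd = np.abs(np.fft.rfft((ppg - ppg.mean()) * window)) ** 2
    acc_psd = np.abs(np.fft.rfft((accel - accel.mean()) * window)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Dominant accelerometer frequency is treated as the motion artifact.
    motion_freq = freqs[in_band][np.argmax(acc_psd[in_band])]
    # Keep only PPG bins that are not close to the motion frequency.
    keep = in_band & (np.abs(freqs - motion_freq) > guard_hz)
    hr_freq = freqs[keep][np.argmax(ppg_psd[keep])]
    return 60.0 * hr_freq

def mean_absolute_error(estimated, reference):
    """Mean absolute error between estimated and reference heart rates."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(diff)))
```

In practice the estimate would be repeated over successive windows and compared against ECG-derived reference values with `mean_absolute_error`, as the abstract describes for its evaluation.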

  7. A Novel Time-Varying Spectral Filtering Algorithm for Reconstruction of Motion Artifact Corrupted Heart Rate Signals During Intense Physical Activities Using a Wearable Photoplethysmogram Sensor.

    PubMed

    Salehizadeh, Seyed M A; Dao, Duy; Bolkhovsky, Jeffrey; Cho, Chae; Mendelson, Yitzhak; Chon, Ki H

    2015-12-23

    Accurate estimation of heart rates from photoplethysmogram (PPG) signals during intense physical activity is a very challenging problem. This is because strenuous and high intensity exercise can result in severe motion artifacts in PPG signals, making accurate heart rate (HR) estimation difficult. In this study we investigated a novel technique to accurately reconstruct motion-corrupted PPG signals and HR based on time-varying spectral analysis. The algorithm is called Spectral filter algorithm for Motion Artifacts and heart rate reconstruction (SpaMA). The idea is to calculate the power spectral density of both PPG and accelerometer signals for each time shift of a windowed data segment. By comparing time-varying spectra of PPG and accelerometer data, those frequency peaks resulting from motion artifacts can be distinguished from the PPG spectrum. The SpaMA approach was applied to three different datasets and four types of activities: (1) training datasets from the 2015 IEEE Signal Process. Cup Database recorded from 12 subjects while performing treadmill exercise from 1 km/h to 15 km/h; (2) test datasets from the 2015 IEEE Signal Process. Cup Database recorded from 11 subjects while performing forearm and upper arm exercise. (3) Chon Lab dataset including 10 min recordings from 10 subjects during treadmill exercise. The ECG signals from all three datasets provided the reference HRs which were used to determine the accuracy of our SpaMA algorithm. The performance of the SpaMA approach was calculated by computing the mean absolute error between the estimated HR from the PPG and the reference HR from the ECG. The average estimation errors using our method on the first, second and third datasets are 0.89, 1.93 and 1.38 beats/min respectively, while the overall error on all 33 subjects is 1.86 beats/min and the performance on only treadmill experiment datasets (22 subjects) is 1.11 beats/min. 
Moreover, it was found that dynamics of heart rate variability can be accurately captured using the algorithm where the mean Pearson's correlation coefficient between the power spectral densities of the reference and the reconstructed heart rate time series was found to be 0.98. These results show that the SpaMA method has a potential for PPG-based HR monitoring in wearable devices for fitness tracking and health monitoring during intense physical activities.

  8. The Behavioral and Neural Mechanisms Underlying the Tracking of Expertise

    PubMed Central

    Boorman, Erie D.; O’Doherty, John P.; Adolphs, Ralph; Rangel, Antonio

    2013-01-01

    Summary Evaluating the abilities of others is fundamental for successful economic and social behavior. We investigated the computational and neurobiological basis of ability tracking by designing an fMRI task that required participants to use and update estimates of both people and algorithms’ expertise through observation of their predictions. Behaviorally, we find a model-based algorithm characterized subject predictions better than several alternative models. Notably, when the agent’s prediction was concordant rather than discordant with the subject’s own likely prediction, participants credited people more than algorithms for correct predictions and penalized them less for incorrect predictions. Neurally, many components of the mentalizing network—medial prefrontal cortex, anterior cingulate gyrus, temporoparietal junction, and precuneus—represented or updated expertise beliefs about both people and algorithms. Moreover, activity in lateral orbitofrontal and medial prefrontal cortex reflected behavioral differences in learning about people and algorithms. These findings provide basic insights into the neural basis of social learning. PMID:24360551

  9. A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters

    PubMed Central

    Wang, Zhihao; Yi, Jing

    2016-01-01

    For the shortcoming of fuzzy c-means algorithm (FCM) needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule √n and obtained the optimal initial cluster centroids, improving the limitation of FCM that randomly selected cluster centroids lead the convergence result to the local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by the iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the iteration of FCM with the stable clustering result. PMID:28042291
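
For orientation, the baseline FCM iteration that this record builds on can be sketched as below. This is the standard fuzzy c-means update only, not the self-adaptive method or the validity index proposed in the paper. The initial centroids are passed in by the caller (the paper obtains them from a density-based step that is not reproduced here), and the function names are hypothetical.

```python
import numpy as np

def fcm_memberships(points, centers, m=2.0, eps=1e-9):
    """Membership of each point in each cluster; every row sums to 1."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(points, initial_centers, m=2.0, n_iter=50):
    """Run a fixed number of standard FCM updates from the given centroids."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(initial_centers, dtype=float).copy()
    for _ in range(n_iter):
        weights = fcm_memberships(points, centers, m) ** m
        # Each centroid is the membership-weighted mean of all points.
        centers = (weights.T @ points) / weights.sum(axis=0)[:, None]
    return centers, fcm_memberships(points, centers, m)
```

A method that searches for the number of clusters would call `fcm` once per candidate cluster count and score each result with a validity index.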

  10. Genes@Work: an efficient algorithm for pattern discovery and multivariate feature selection in gene expression data.

    PubMed

    Lepre, Jorge; Rice, J Jeremy; Tu, Yuhai; Stolovitzky, Gustavo

    2004-05-01

    Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissues types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).

  11. Feasibility Study of a Generalized Framework for Developing Computer-Aided Detection Systems-a New Paradigm.

    PubMed

    Nemoto, Mitsutaka; Hayashi, Naoto; Hanaoka, Shouhei; Nomura, Yukihiro; Miki, Soichiro; Yoshikawa, Takeharu

    2017-10-01

    We propose a generalized framework for developing computer-aided detection (CADe) systems whose characteristics depend only on those of the training dataset. The purpose of this study is to show the feasibility of the framework. Two different CADe systems were experimentally developed by a prototype of the framework, but with different training datasets. The CADe systems include four components: preprocessing, candidate area extraction, candidate detection, and candidate classification. Four pretrained algorithms with dedicated optimization/setting methods corresponding to the respective components were prepared in advance. The pretrained algorithms were sequentially trained in the order of processing of the components. In this study, two different datasets, brain MRA with cerebral aneurysms and chest CT with lung nodules, were collected to develop two different types of CADe systems in the framework. The performances of the developed CADe systems were evaluated by threefold cross-validation. The CADe systems for detecting cerebral aneurysms in brain MRAs and for detecting lung nodules in chest CTs were successfully developed using the respective datasets. The framework was shown to be feasible by the successful development of the two different types of CADe systems. The feasibility of this framework shows promise for a new paradigm in the development of CADe systems: development of CADe systems without any lesion-specific algorithm design.

  12. A GPU-Accelerated Approach for Feature Tracking in Time-Varying Imagery Datasets.

    PubMed

    Peng, Chao; Sahani, Sandip; Rushing, John

    2017-10-01

    We propose a novel parallel connected component labeling (CCL) algorithm along with efficient out-of-core data management to detect and track feature regions of large time-varying imagery datasets. Our approach contributes to the big data field with parallel algorithms tailored for GPU architectures. We remove the data dependency between frames and achieve pixel-level parallelism. Due to the large size, the entire dataset cannot fit into cached memory. Frames have to be streamed through the memory hierarchy (disk to CPU main memory and then to GPU memory), partitioned, and processed as batches, where each batch is small enough to fit into the GPU. To reconnect the feature regions that are separated due to data partitioning, we present a novel batch merging algorithm to extract the region connection information across multiple batches in a parallel fashion. The information is organized in a memory-efficient structure and supports fast indexing on the GPU. Our experiment uses a commodity workstation equipped with a single GPU. The results show that our approach can efficiently process a weather dataset composed of terabytes of time-varying radar images. The advantages of our approach are demonstrated by comparing to the performance of an efficient CPU cluster implementation which is being used by the weather scientists.

  13. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts.

    PubMed

    Dashtban, M; Balafar, Mohammadali

    2017-03-01

    Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of feature space followed by employing an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked upon five popular high-dimensional cancer datasets; for each, top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods in DLBCL dataset. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification.

    PubMed

    Lu, Huijuan; Wei, Shasha; Zhou, Zili; Miao, Yanzi; Lu, Yi

    2015-01-01

    The main purpose of traditional classification algorithms in bioinformatics applications is to acquire better classification accuracy. However, these algorithms cannot meet the requirement of minimising the average misclassification cost. In this paper, a new cost-sensitive regularised extreme learning machine (CS-RELM) algorithm was proposed, using probability estimation and misclassification cost to reconstruct the classification results. By improving the classification accuracy of a small group of samples with higher misclassification cost, the new CS-RELM can minimise the classification cost. The 'rejection cost' was integrated into the CS-RELM algorithm to further reduce the average misclassification cost. Using the Colon Tumour dataset and the SRBCT (Small Round Blue Cells Tumour) dataset, CS-RELM was compared with other algorithms such as extreme learning machine (ELM), cost-sensitive extreme learning machine, regularised extreme learning machine, and cost-sensitive support vector machine (SVM). The results of the experiments show that CS-RELM with embedded rejection cost could reduce the average cost of misclassification and make more credible classification decisions than the others.
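
The combination of misclassification cost and rejection cost mentioned here can be illustrated with a generic expected-cost decision rule. This sketch is not the CS-RELM algorithm: it assumes class probabilities are already available from some classifier, and the function name, the cost-matrix layout, and the use of `None` for a rejected sample are choices made for the example.

```python
import numpy as np

def cost_sensitive_decision(posteriors, cost_matrix, rejection_cost):
    """Pick the label with the lowest expected cost, or reject.

    posteriors[i]     : estimated probability that the true class is i
    cost_matrix[i][j] : cost of predicting class j when the true class is i
    Returns the chosen class index, or None if rejecting is cheaper.
    """
    p = np.asarray(posteriors, dtype=float)
    costs = np.asarray(cost_matrix, dtype=float)
    expected = p @ costs  # expected cost of each candidate prediction
    best = int(np.argmin(expected))
    if expected[best] > rejection_cost:
        return None
    return best
```

With an asymmetric cost matrix, the rule can prefer the less probable class when missing it is expensive, and it abstains when even the best label is expected to cost more than a rejection.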

  15. A Node Linkage Approach for Sequential Pattern Mining

    PubMed Central

    Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel

    2014-01-01

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123

  16. Can We Train Machine Learning Methods to Outperform the High-dimensional Propensity Score Algorithm?

    PubMed

    Karim, Mohammad Ehsanul; Pang, Menglan; Platt, Robert W

    2018-03-01

    The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. Utilizing patient's health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of postmyocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator, and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of machine learning and high-dimensional propensity score algorithms generally perform slightly better than both in terms of mean squared error, when a bias-based analysis is used.

  17. Reliability Correction for Functional Connectivity: Theory and Implementation

    PubMed Central

    Mueller, Sophia; Wang, Danhong; Fox, Michael D.; Pan, Ruiqi; Lu, Jie; Li, Kuncheng; Sun, Wei; Buckner, Randy L.; Liu, Hesheng

    2016-01-01

    Network properties can be estimated using functional connectivity MRI (fcMRI). However, regional variation of the fMRI signal causes systematic biases in network estimates including correlation attenuation in regions of low measurement reliability. Here we computed the spatial distribution of fcMRI reliability using longitudinal fcMRI datasets and demonstrated how pre-estimated reliability maps can correct for correlation attenuation. As a test case of reliability-based attenuation correction we estimated properties of the default network, where reliability was significantly lower than average in the medial temporal lobe and higher in the posterior medial cortex, heterogeneity that impacts estimation of the network. Accounting for this bias using attenuation correction revealed that the medial temporal lobe’s contribution to the default network is typically underestimated. To render this approach useful to a greater number of datasets, we demonstrate that test-retest reliability maps derived from repeated runs within a single scanning session can be used as a surrogate for multi-session reliability mapping. Using data segments with different scan lengths between 1 and 30 min, we found that test-retest reliability of connectivity estimates increases with scan length while the spatial distribution of reliability is relatively stable even at short scan lengths. Finally, analyses of tertiary data revealed that reliability distribution is influenced by age, neuropsychiatric status and scanner type, suggesting that reliability correction may be especially important when studying between-group differences. Collectively, these results illustrate that reliability-based attenuation correction is an easily implemented strategy that mitigates certain features of fMRI signal nonuniformity. PMID:26493163
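
As a rough illustration of attenuation correction, the classical Spearman form divides an observed correlation by the square root of the product of the two measurement reliabilities. Whether the paper applies exactly this form to its reliability maps is an assumption here. The function name is hypothetical, and clipping the result to [-1, 1] is a convenience added for this sketch.

```python
import math

def correct_attenuation(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation of a correlation coefficient.

    r_corrected = r_observed / sqrt(reliability_x * reliability_y)
    Reliabilities must lie in (0, 1]; the result is clipped to [-1, 1].
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliability must be in (0, 1]")
    corrected = r_observed / math.sqrt(reliability_x * reliability_y)
    return max(-1.0, min(1.0, corrected))
```

For example, an observed correlation of 0.3 between two regions whose reliabilities are both 0.5 corrects to about 0.6, which shows how low-reliability regions can have their connectivity underestimated.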

  18. A CCA+ICA based model for multi-task brain imaging data fusion and its application to schizophrenia.

    PubMed

    Sui, Jing; Adali, Tülay; Pearlson, Godfrey; Yang, Honghui; Sponheim, Scott R; White, Tonya; Calhoun, Vince D

    2010-05-15

    Collection of multiple-task brain imaging data from the same subject has now become common practice in medical imaging studies. In this paper, we propose a simple yet effective model, "CCA+ICA", as a powerful tool for multi-task data fusion. This joint blind source separation (BSS) model takes advantage of two multivariate methods: canonical correlation analysis and independent component analysis, to achieve both high estimation accuracy and to provide the correct connection between two datasets in which sources can have either common or distinct between-dataset correlation. In both simulated and real fMRI applications, we compare the proposed scheme with other joint BSS models and examine the different modeling assumptions. The contrast images of two tasks: sensorimotor (SM) and Sternberg working memory (SB), derived from a general linear model (GLM), were chosen to contribute real multi-task fMRI data, both of which were collected from 50 schizophrenia patients and 50 healthy controls. When examining the relationship with duration of illness, CCA+ICA revealed a significant negative correlation with temporal lobe activation. Furthermore, CCA+ICA located sensorimotor cortex as the group-discriminative regions for both tasks and identified the superior temporal gyrus in SM and prefrontal cortex in SB as task-specific group-discriminative brain networks. In summary, we compared the new approach to some competitive methods with different assumptions, and found consistent results regarding each of their hypotheses on connecting the two tasks. Such an approach fills a gap in existing multivariate methods for identifying biomarkers from brain imaging data.

  19. Subtle In-Scanner Motion Biases Automated Measurement of Brain Anatomy From In Vivo MRI

    PubMed Central

    Alexander-Bloch, Aaron; Clasen, Liv; Stockman, Michael; Ronan, Lisa; Lalonde, Francois; Giedd, Jay; Raznahan, Armin

    2016-01-01

    While the potential for small amounts of motion in functional magnetic resonance imaging (fMRI) scans to bias the results of functional neuroimaging studies is well appreciated, the impact of in-scanner motion on morphological analysis of structural MRI is relatively under-studied. Even among “good quality” structural scans, there may be systematic effects of motion on measures of brain morphometry. In the present study, the subjects’ tendency to move during fMRI scans, acquired in the same scanning sessions as their structural scans, yielded a reliable, continuous estimate of in-scanner motion. Using this approach within a sample of 127 children, adolescents, and young adults, significant relationships were found between this measure and estimates of cortical gray matter volume and mean curvature, as well as trend-level relationships with cortical thickness. Specifically, cortical volume and thickness decreased with greater motion, and mean curvature increased. These effects of subtle motion were anatomically heterogeneous, were present across different automated imaging pipelines, showed convergent validity with effects of frank motion assessed in a separate sample of 274 scans, and could be demonstrated in both pediatric and adult populations. Thus, using different motion assays in two large non-overlapping sets of structural MRI scans, convergent evidence showed that in-scanner motion—even at levels which do not manifest in visible motion artifact—can lead to systematic and regionally specific biases in anatomical estimation. These findings have special relevance to structural neuroimaging in developmental and clinical datasets, and inform ongoing efforts to optimize neuroanatomical analysis of existing and future structural MRI datasets in non-sedated humans. PMID:27004471

  20. GCALIGNER 1.0: an alignment program to compute a multiple sample comparison data matrix from large eco-chemical datasets obtained by GC.

    PubMed

    Dellicour, Simon; Lecocq, Thomas

    2013-10-01

    GCALIGNER 1.0 is a computer program designed to perform a preliminary data comparison matrix of chemical data obtained by GC without MS information. The alignment algorithm is based on the comparison between the retention times of each detected compound in a sample. In this paper, we test the GCALIGNER efficiency on three datasets of the chemical secretions of bumble bees. The algorithm performs the alignment with a low error rate (<3%). GCALIGNER 1.0 is a useful, simple and free program based on an algorithm that enables the alignment of table-type data from GC. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Feature Based Retention Time Alignment for Improved HDX MS Analysis

    NASA Astrophysics Data System (ADS)

    Venable, John D.; Scuba, William; Brock, Ansgar

    2013-04-01

    An algorithm for retention time alignment of mass shifted hydrogen-deuterium exchange (HDX) data based on an iterative distance minimization procedure is described. The algorithm performs pairwise comparisons in an iterative fashion between a list of features from a reference file and a file to be time aligned to calculate a retention time mapping function. Features are characterized by their charge, retention time and mass of the monoisotopic peak. The algorithm is able to align datasets with mass shifted features, which is a prerequisite for aligning hydrogen-deuterium exchange mass spectrometry datasets. Confidence assignments from the fully automated processing of a commercial HDX software package are shown to benefit significantly from retention time alignment prior to extraction of deuterium incorporation values.

  2. Convex Accelerated Maximum Entropy Reconstruction

    PubMed Central

    Worley, Bradley

    2016-01-01

    Maximum entropy (MaxEnt) spectral reconstruction methods provide a powerful framework for spectral estimation of nonuniformly sampled datasets. Many methods exist within this framework, usually defined based on the magnitude of a Lagrange multiplier in the MaxEnt objective function. An algorithm is presented here that utilizes accelerated first-order convex optimization techniques to rapidly and reliably reconstruct nonuniformly sampled NMR datasets using the principle of maximum entropy. This algorithm – called CAMERA for Convex Accelerated Maximum Entropy Reconstruction Algorithm – is a new approach to spectral reconstruction that exhibits fast, tunable convergence in both constant-aim and constant-lambda modes. A high-performance, open source NMR data processing tool is described that implements CAMERA, and brief comparisons to existing reconstruction methods are made on several example spectra. PMID:26894476

  3. Auto-SEIA: simultaneous optimization of image processing and machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Negro Maggio, Valentina; Iocchi, Luca

    2015-02-01

    Object classification from images is an important task for machine vision and it is a crucial ingredient for many computer vision applications, ranging from security and surveillance to marketing. Image based object classification techniques properly integrate image processing and machine learning (i.e., classification) procedures. In this paper we present a system for automatic simultaneous optimization of algorithms and parameters for object classification from images. More specifically, the proposed system is able to process a dataset of labelled images and to return a best configuration of image processing and classification algorithms and of their parameters with respect to the accuracy of classification. Experiments with real public datasets are used to demonstrate the effectiveness of the developed system.

  4. A Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data

    NASA Astrophysics Data System (ADS)

    Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.

    2017-09-01

    Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.

  5. A new randomized Kaczmarz based kernel canonical correlation analysis algorithm with applications to information retrieval.

    PubMed

    Cai, Jia; Tang, Yi

    2018-02-01

    Canonical correlation analysis (CCA) is a powerful statistical tool for detecting the linear relationship between two sets of multivariate variables. Its kernel generalization, kernel CCA, was proposed to describe nonlinear relationships between two variables. Although kernel CCA can achieve dimensionality reduction for high-dimensional feature selection problems, it also suffers from the so-called over-fitting phenomenon. In this paper, we consider a new kernel CCA algorithm via the randomized Kaczmarz method. The main contributions of the paper are: (1) a new kernel CCA algorithm is developed, (2) theoretical convergence of the proposed algorithm is addressed by means of the scaled condition number, (3) a lower bound on the minimum number of iterations is presented. We test on both a synthetic dataset and several real-world datasets in cross-language document retrieval and content-based image retrieval to demonstrate the effectiveness of the proposed algorithm. Numerical results demonstrate the performance and efficiency of the new algorithm, which is competitive with several state-of-the-art kernel CCA methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
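
    For orientation, the classical randomized Kaczmarz iteration for a consistent linear system A x = b can be sketched as follows. This is only the basic building block, not the kernel CCA algorithm of the paper; the function name, the toy matrix, and the iteration count are assumptions made for illustration. Rows are sampled with probability proportional to their squared norm, and the iterate is projected onto the hyperplane of the sampled equation.

```python
import random


def randomized_kaczmarz(A, b, iterations=2000, seed=0):
    """Approximate the solution of a consistent system A x = b.

    Rows are sampled with probability proportional to their squared norm;
    each step projects x onto the hyperplane of the sampled equation.
    """
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms_sq[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


# Toy overdetermined but consistent system (illustrative values only).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

    Each step touches a single row, which is why Kaczmarz-type methods are attractive when the full system is too large to factor.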

  6. An Automatic Image Processing System for Glaucoma Screening

    PubMed Central

    Alodhayb, Sami; Lakshminarayanan, Vasudevan

    2017-01-01

    Horizontal and vertical cup to disc ratios are the most crucial parameters used clinically to detect glaucoma or monitor its progress and are manually evaluated from retinal fundus images of the optic nerve head. Because glaucoma experts are scarce and the number of glaucoma patients is increasing, automatically calculated horizontal and vertical cup to disc ratios (HCDR and VCDR, resp.) can be useful for glaucoma screening. We report on two algorithms to calculate the HCDR and VCDR. In the algorithms, level set and inpainting techniques were developed for segmenting the disc, while thresholding using a Type-II fuzzy approach was developed for segmenting the cup. The results from the algorithms were verified using the manual markings of images from a dataset of glaucomatous images (retinal fundus images for glaucoma analysis (RIGA dataset)) by six ophthalmologists. The algorithm's accuracy for HCDR and VCDR combined was 74.2%. Only the accuracy of manual markings by one ophthalmologist was higher than the algorithm's accuracy. The algorithm's best agreement was with markings by ophthalmologist number 1 in 230 images (41.8%) of the total tested images. PMID:28947898
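
    Once the cup and the disc have been segmented, the two ratios themselves are simple to compute. The sketch below assumes binary masks given as nested lists and measures each structure by its bounding box; the helper names and the toy masks are hypothetical, and the segmentation steps described in the abstract (level set, inpainting, fuzzy thresholding) are not reproduced here.

```python
def extent(mask):
    """Return (width, height) of the bounding box of the nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return 0, 0
    return cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1


def cup_to_disc_ratios(cup_mask, disc_mask):
    """Return (HCDR, VCDR) from binary cup and disc masks."""
    cup_w, cup_h = extent(cup_mask)
    disc_w, disc_h = extent(disc_mask)
    if disc_w == 0 or disc_h == 0:
        raise ValueError("disc mask is empty")
    return cup_w / disc_w, cup_h / disc_h


# Toy 6x6 masks: a 4x4 disc containing a cup 2 pixels wide and 1 pixel tall.
disc = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)] for r in range(6)]
cup = [[1 if r == 2 and 2 <= c <= 3 else 0 for c in range(6)] for r in range(6)]
hcdr, vcdr = cup_to_disc_ratios(cup, disc)
```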

  7. Super-resolution reconstruction of MR image with a novel residual learning network algorithm

    NASA Astrophysics Data System (ADS)

    Shi, Jun; Liu, Qingping; Wang, Chaofeng; Zhang, Qi; Ying, Shihui; Xu, Haoyu

    2018-04-01

    Spatial resolution is one of the key parameters of magnetic resonance imaging (MRI). The image super-resolution (SR) technique offers an alternative approach to improve the spatial resolution of MRI due to its simplicity. Convolutional neural networks (CNN)-based SR algorithms have achieved state-of-the-art performance, in which the global residual learning (GRL) strategy is now commonly used due to its effectiveness for learning image details for SR. However, the partial loss of image details usually happens in a very deep network due to the degradation problem. In this work, we propose a novel residual learning-based SR algorithm for MRI, which combines both multi-scale GRL and shallow network block-based local residual learning (LRL). The proposed LRL module works effectively in capturing high-frequency details by learning local residuals. One simulated MRI dataset and two real MRI datasets have been used to evaluate our algorithm. The experimental results show that the proposed SR algorithm achieves superior performance to all of the other compared CNN-based SR algorithms in this work.

  8. Edge-oriented dual-dictionary guided enrichment (EDGE) for MRI-CT image reconstruction.

    PubMed

    Li, Liang; Wang, Bigong; Wang, Ge

    2016-01-01

    In this paper, we formulate the joint/simultaneous X-ray CT and MRI image reconstruction. In particular, a novel algorithm is proposed for MRI image reconstruction from highly under-sampled MRI data and CT images. It consists of two steps. First, a training dataset is generated from a series of well-registered MRI and CT images on the same patients. Then, an initial MRI image of a patient can be reconstructed via edge-oriented dual-dictionary guided enrichment (EDGE) based on the training dataset and a CT image of the patient. Second, an MRI image is reconstructed using the dictionary learning (DL) algorithm from highly under-sampled k-space data and the initial MRI image. Our algorithm can establish a one-to-one correspondence between the two imaging modalities, and obtain a good initial MRI estimation. Both noise-free and noisy simulation studies were performed to evaluate and validate the proposed algorithm. The results with different under-sampling factors show that the proposed algorithm performed significantly better than those reconstructed using the DL algorithm from MRI data alone.

  9. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    NASA Astrophysics Data System (ADS)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper discusses the problem of feature selection using genetic algorithms (GA) on a dataset for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and Decision Tree models handle the classification problem when the features of the dataset are selected using GA. The performance of both models is then compared to determine whether accuracy increases. The results show an increase in accuracy when features are selected using GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.

  10. Classifying Structures in the ISM with Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, A. A.; Williams, J. P.

    2011-01-01

    The processes which govern molecular cloud evolution and star formation often sculpt structures in the ISM: filaments, pillars, shells, outflows, etc. Because of their morphological complexity, these objects are often identified manually. Manual classification has several disadvantages; the process is subjective, not easily reproducible, and does not scale well to handle increasingly large datasets. We have explored to what extent machine learning algorithms can be trained to autonomously identify specific morphological features in molecular cloud datasets. We show that the Support Vector Machine algorithm can successfully locate filaments and outflows blended with other emission structures. When the objects of interest are morphologically distinct from the surrounding emission, this autonomous classification achieves >90% accuracy. We have developed a set of IDL-based tools to apply this technique to other datasets.

  11. Time course based artifact identification for independent components of resting-state FMRI.

    PubMed

    Rummel, Christian; Verma, Rajeev Kumar; Schöpf, Veronika; Abela, Eugenio; Hauf, Martinus; Berruecos, José Fernando Zapata; Wiest, Roland

    2013-01-01

    In functional magnetic resonance imaging (fMRI) coherent oscillations of the blood oxygen level-dependent (BOLD) signal can be detected. These arise when brain regions respond to external stimuli or are activated by tasks. The same networks have been characterized during wakeful rest when functional connectivity of the human brain is organized in generic resting-state networks (RSN). Alterations of RSN emerge as neurobiological markers of pathological conditions such as altered mental state. In single-subject fMRI data the coherent components can be identified by blind source separation of the pre-processed BOLD data using spatial independent component analysis (ICA) and related approaches. The resulting maps may represent physiological RSNs or may be due to various artifacts. In this methodological study, we propose a conceptually simple and fully automatic time course based filtering procedure to detect obvious artifacts in the ICA output for resting-state fMRI. The filter is trained on six and tested on 29 healthy subjects, yielding mean filter accuracy, sensitivity and specificity of 0.80, 0.82, and 0.75 in out-of-sample tests. To estimate the impact of clearly artifactual single-subject components on group resting-state studies we analyze unfiltered and filtered output with a second level ICA procedure. Although the automated filter does not reach performance values of visual analysis by human raters, we propose that resting-state compatible analysis of ICA time courses could be very useful to complement the existing map or task/event oriented artifact classification algorithms.

  12. Predicting Long-Term Cognitive Outcome Following Breast Cancer with Pre-Treatment Resting State fMRI and Random Forest Machine Learning.

    PubMed

    Kesler, Shelli R; Rao, Arvind; Blayney, Douglas W; Oakley-Girvan, Ingrid A; Karuturi, Meghan; Palesh, Oxana

    2017-01-01

    We aimed to determine if resting state functional magnetic resonance imaging (fMRI) acquired at pre-treatment baseline could accurately predict breast cancer-related cognitive impairment at long-term follow-up. We evaluated 31 patients with breast cancer (age 34-65) prior to any treatment, post-chemotherapy and 1 year later. Cognitive testing scores were normalized based on data obtained from 43 healthy female controls and then used to categorize patients as impaired or not based on longitudinal changes. We measured clustering coefficient, a measure of local connectivity, by applying graph theory to baseline resting state fMRI and entered these metrics along with relevant patient-related and medical variables into random forest classification. Incidence of cognitive impairment at 1 year follow-up was 55% and was predicted by classification algorithms with up to 100% accuracy ( p < 0.0001). The neuroimaging-based model was significantly more accurate than a model involving patient-related and medical variables ( p = 0.005). Hub regions belonging to several distinct functional networks were the most important predictors of cognitive outcome. Characteristics of these hubs indicated potential spread of brain injury from default mode to other networks over time. These findings suggest that resting state fMRI is a promising tool for predicting future cognitive impairment associated with breast cancer. This information could inform treatment decision making by identifying patients at highest risk for long-term cognitive impairment.
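
    The clustering coefficient mentioned above has a compact definition: for a node with k neighbors, it is the fraction of the k(k-1)/2 possible neighbor pairs that are themselves connected. A minimal sketch for an unweighted, undirected graph follows; the adjacency-dictionary representation and the toy graph are assumptions, and the study's actual connectivity graphs and thresholds are not specified here.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are directly connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Toy undirected graph: A-B-C form a triangle, D hangs off A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coeff_a = local_clustering(graph, "A")  # one linked pair out of three
coeff_b = local_clustering(graph, "B")  # its only neighbor pair is linked
```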

  13. Predicting Long-Term Cognitive Outcome Following Breast Cancer with Pre-Treatment Resting State fMRI and Random Forest Machine Learning

    PubMed Central

    Kesler, Shelli R.; Rao, Arvind; Blayney, Douglas W.; Oakley-Girvan, Ingrid A.; Karuturi, Meghan; Palesh, Oxana

    2017-01-01

    We aimed to determine if resting state functional magnetic resonance imaging (fMRI) acquired at pre-treatment baseline could accurately predict breast cancer-related cognitive impairment at long-term follow-up. We evaluated 31 patients with breast cancer (age 34–65) prior to any treatment, post-chemotherapy and 1 year later. Cognitive testing scores were normalized based on data obtained from 43 healthy female controls and then used to categorize patients as impaired or not based on longitudinal changes. We measured clustering coefficient, a measure of local connectivity, by applying graph theory to baseline resting state fMRI and entered these metrics along with relevant patient-related and medical variables into random forest classification. Incidence of cognitive impairment at 1 year follow-up was 55% and was predicted by classification algorithms with up to 100% accuracy (p < 0.0001). The neuroimaging-based model was significantly more accurate than a model involving patient-related and medical variables (p = 0.005). Hub regions belonging to several distinct functional networks were the most important predictors of cognitive outcome. Characteristics of these hubs indicated potential spread of brain injury from default mode to other networks over time. These findings suggest that resting state fMRI is a promising tool for predicting future cognitive impairment associated with breast cancer. This information could inform treatment decision making by identifying patients at highest risk for long-term cognitive impairment. PMID:29187817

  14. Spatial Variance in Resting fMRI Networks of Schizophrenia Patients: An Independent Vector Analysis

    PubMed Central

    Gopal, Shruti; Miller, Robyn L.; Michael, Andrew; Adali, Tulay; Cetin, Mustafa; Rachakonda, Srinivas; Bustillo, Juan R.; Cahill, Nathan; Baum, Stefi A.; Calhoun, Vince D.

    2016-01-01

    Spatial variability in resting functional MRI (fMRI) brain networks has not been well studied in schizophrenia, a disease known for both neurodevelopmental and widespread anatomic changes. Motivated by abundant evidence of neuroanatomical variability from previous studies of schizophrenia, we draw upon a relatively new approach called independent vector analysis (IVA) to assess this variability in resting fMRI networks. IVA is a blind-source separation algorithm, which segregates fMRI data into temporally coherent but spatially independent networks and has been shown to be especially good at capturing spatial variability among subjects in the extracted networks. We introduce several new ways to quantify differences in variability of IVA-derived networks between schizophrenia patients (SZs = 82) and healthy controls (HCs = 89). Voxelwise amplitude analyses showed significant group differences in the spatial maps of auditory cortex, the basal ganglia, the sensorimotor network, and visual cortex. Tests for differences (HC-SZ) in the spatial variability maps suggest, that at rest, SZs exhibit more activity within externally focused sensory and integrative network and less activity in the default mode network thought to be related to internal reflection. Additionally, tests for difference of variance between groups further emphasize that SZs exhibit greater network variability. These results, consistent with our prediction of increased spatial variability within SZs, enhance our understanding of the disease and suggest that it is not just the amplitude of connectivity that is different in schizophrenia, but also the consistency in spatial connectivity patterns across subjects. PMID:26106217

  15. Remote Sensing Applications to Water Quality Management in Florida

    EPA Science Inventory

    Increasingly, optical datasets from estuarine and coastal systems are becoming available for remote sensing algorithm development, validation, and application. With validated algorithms, the data streams from satellite sensors can provide unprecedented spatial and temporal data ...

  16. A 2D range Hausdorff approach to 3D facial recognition.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koch, Mark William; Russ, Trina Denise; Little, Charles Quentin

    2004-11-01

    This paper presents a 3D facial recognition algorithm based on the Hausdorff distance metric. The standard 3D formulation of the Hausdorff matching algorithm has been modified to operate on a 2D range image, enabling a reduction in computation from O(N^2) to O(N) without large storage requirements. The Hausdorff distance is known for its robustness to data outliers and inconsistent data between two data sets, making it a suitable choice for dealing with the inherent problems in many 3D datasets due to sensor noise and object self-occlusion. For optimal performance, the algorithm assumes a good initial alignment between probe and template datasets. However, to minimize the error between two faces, the alignment can be iteratively refined. Results from the algorithm are presented using 3D face images from the Face Recognition Grand Challenge database version 1.0.
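
    As a point of reference, the brute-force Hausdorff distance between two point sets can be sketched as below. It compares every point of one set against every point of the other, which is the quadratic cost the abstract refers to; the 2D range-image formulation that brings this down to O(N) is not reproduced here. Function names and sample points are illustrative assumptions.

```python
import math


def directed_hausdorff(points_a, points_b):
    """Largest distance from a point in A to its nearest point in B."""
    return max(min(math.dist(p, q) for q in points_b) for p in points_a)


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(points_a, points_b),
               directed_hausdorff(points_b, points_a))


# Toy point sets: B contains one point far from everything in A.
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
distance = hausdorff(set_a, set_b)
```

    In this toy case every point of A has an exact match in B, so the distance is driven entirely by the unmatched point of B.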

  17. A novel clinical decision support system using improved adaptive genetic algorithm for the assessment of fetal well-being.

    PubMed

    Ravindran, Sindhu; Jambek, Asral Bahari; Muthusamy, Hariharan; Neoh, Siew-Chin

    2015-01-01

    A novel clinical decision support system is proposed in this paper for evaluating fetal well-being from the cardiotocogram (CTG) dataset through an Improved Adaptive Genetic Algorithm (IAGA) and Extreme Learning Machine (ELM). IAGA employs a new scaling technique (called sigma scaling) to avoid premature convergence and applies adaptive crossover and mutation techniques with masking concepts to enhance population diversity. Also, this search algorithm utilizes three different fitness functions (two single-objective fitness functions and a multi-objective fitness function) to assess its performance. The classification results show that a promising classification accuracy of 94% is obtained with an optimal feature subset using IAGA. Also, the classification results are compared with those of other feature reduction techniques to substantiate its exhaustive search towards the global optimum. In addition, five other benchmark datasets are used to gauge the strength of the proposed IAGA algorithm.
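
    Sigma scaling, in its common textbook form, rescales each raw fitness by its distance from the population mean in units of the standard deviation, so that selection pressure stays moderate when a few individuals dominate early on. The sketch below uses that common form, 1 + (f - mean) / (2 * sigma), clamped from below; whether IAGA uses exactly this formula is an assumption, and the function name and floor value are illustrative.

```python
import statistics


def sigma_scale(fitness, floor=0.1):
    """Rescale raw fitness values to expected selection weights.

    Uses 1 + (f - mean) / (2 * sigma), clamped from below at `floor`.
    When all fitness values are equal, every individual gets weight 1.0.
    """
    mean = statistics.fmean(fitness)
    sigma = statistics.pstdev(fitness)
    if sigma == 0:
        return [1.0] * len(fitness)
    return [max(floor, 1.0 + (f - mean) / (2.0 * sigma)) for f in fitness]


scaled = sigma_scale([1.0, 2.0, 3.0])
uniform = sigma_scale([5.0, 5.0, 5.0])
```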

  18. Utilization of Ancillary Data Sets for SMAP Algorithm Development and Product Generation

    NASA Technical Reports Server (NTRS)

    ONeill, P.; Podest, E.; Njoku, E.

    2011-01-01

    Algorithms being developed for the Soil Moisture Active Passive (SMAP) mission require a variety of both static and ancillary data. The selection of the most appropriate source for each ancillary data parameter is driven by a number of considerations, including accuracy, latency, availability, and consistency across all SMAP products and with SMOS (Soil Moisture Ocean Salinity). It is anticipated that initial selection of all ancillary datasets, which are needed for ongoing algorithm development activities on the SMAP algorithm testbed at JPL, will be completed within the year. These datasets will be updated as new or improved sources become available, and all selections and changes will be documented for the benefit of the user community. Wise choices in ancillary data will help to enable SMAP to provide new global measurements of soil moisture and freeze/thaw state at the targeted accuracy necessary to tackle hydrologically-relevant societal issues.

  19. A statistical framework for evaluating neural networks to predict recurrent events in breast cancer

    NASA Astrophysics Data System (ADS)

    Gorunescu, Florin; Gorunescu, Marina; El-Darzi, Elia; Gorunescu, Smaranda

    2010-07-01

    Breast cancer is the second leading cause of cancer deaths in women today. Sometimes, breast cancer can return after primary treatment. A medical diagnosis of recurrent cancer is often a more challenging task than the initial one. In this paper, we investigate the potential contribution of neural networks (NNs) to support health professionals in diagnosing such events. The NN algorithms are tested and applied to two different datasets. An extensive statistical analysis has been performed to verify our experiments. The results show that a simple network structure for both the multi-layer perceptron and radial basis function can produce equally good results, not all attributes are needed to train these algorithms and, finally, the classification performances of all algorithms are statistically robust. Moreover, we have shown that the best performing algorithm will strongly depend on the features of the datasets, and hence, there is not necessarily a single best classifier.

  20. A novel orthoimage mosaic method using the weighted A* algorithm for UAV imagery

    NASA Astrophysics Data System (ADS)

    Zheng, Maoteng; Zhou, Shunping; Xiong, Xiaodong; Zhu, Junfeng

    2017-12-01

    A weighted A* algorithm is proposed to select optimal seam-lines in orthoimage mosaic for UAV (Unmanned Aircraft Vehicle) imagery. The whole workflow includes four steps: the initial seam-line network is firstly generated by standard Voronoi Diagram algorithm; an edge diagram is then detected based on DSM (Digital Surface Model) data; the vertices (conjunction nodes) of initial network are relocated since some of them are on the high objects (buildings, trees and other artificial structures); and, the initial seam-lines are finally refined using the weighted A* algorithm based on the edge diagram and the relocated vertices. The method was tested with two real UAV datasets. Preliminary results show that the proposed method produces acceptable mosaic images in both the urban and mountainous areas, and is better than the result of the state-of-the-art methods on the datasets.
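
    The seam-line refinement step can be pictured as a shortest-path search over a cost grid in which cells on high objects are expensive. The sketch below is a generic weighted A* on a 4-connected grid, ranking nodes by g + weight * h with a Manhattan-distance heuristic. The paper's own weighting scheme, edge diagram, and vertex relocation are not reproduced; the grid, the `weight` parameter, and the function name are assumptions.

```python
import heapq


def weighted_a_star(cost, start, goal, weight=1.5):
    """Search a 4-connected grid; entering cell (r, c) costs cost[r][c].

    Nodes are ranked by g + weight * h, where h is the Manhattan distance
    to the goal. With weight > 1 the search is greedier and the returned
    path is not guaranteed to be the cheapest one.
    """
    rows, cols = len(cost), len(cost[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0.0}
    parent = {start: None}
    frontier = [(weight * h(start), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                g = best_g[cell] + cost[nxt[0]][nxt[1]]
                if g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g
                    parent[nxt] = cell
                    heapq.heappush(frontier, (g + weight * h(nxt), nxt))
    return None


# Toy cost grid: the middle column is expensive (a "building") except at the bottom.
grid = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
seam = weighted_a_star(grid, (0, 0), (0, 2))
```

    On this grid the search routes the seam around the expensive cells rather than across them.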

  1. Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets.

    PubMed

    Hoffmann, Nils; Keck, Matthias; Neuweger, Heiko; Wilhelm, Mathias; Högy, Petra; Niehaus, Karsten; Stoye, Jens

    2012-08-27

    Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). These hyphenated methods provide high-dimensional data. Comparing such data manually to find corresponding signals is a laborious task, as each experiment usually consists of thousands of individual scans, each containing hundreds or even thousands of distinct signals. In order to allow for successful identification of metabolites or proteins within such data, especially in the context of metabolomics and proteomics, an accurate alignment and matching of corresponding features between two or more experiments is required. Such a matching algorithm should capture fluctuations in the chromatographic system which lead to non-linear distortions on the time axis, as well as systematic changes in recorded intensities. Many different algorithms for the retention time alignment of GC-MS and LC-MS data have been proposed and published, but all of them focus either on aligning previously extracted peak features or on aligning and comparing the complete raw data containing all available features. In this paper we introduce two algorithms for retention time alignment of multiple GC-MS datasets: multiple alignment by bidirectional best hits peak assignment and cluster extension (BIPACE) and center-star multiple alignment by pairwise partitioned dynamic time warping (CeMAPP-DTW). We show how the similarity-based peak group matching method BIPACE may be used for multiple alignment calculation individually and how it can be used as a preprocessing step for the pairwise alignments performed by CeMAPP-DTW. We evaluate the algorithms individually and in combination on a previously published small GC-MS dataset studying the Leishmania parasite and on a larger GC-MS dataset studying grains of wheat (Triticum aestivum). 
We have shown that BIPACE achieves very high precision and recall and a very low number of false positive peak assignments on both evaluation datasets. CeMAPP-DTW finds a high number of true positives when executed on its own, but achieves even better results when BIPACE is used to constrain its search space. The source code of both algorithms is included in the OpenSource software framework Maltcms, which is available from http://maltcms.sf.net. The evaluation scripts of the present study are available from the same source.
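
    The "bidirectional best hits" idea behind BIPACE can be illustrated in a few lines: a peak in one run is paired with a peak in another run only if each is the other's most similar candidate. The sketch below uses retention time alone as the similarity, which is a simplification; the similarity function, peak values, and names are assumptions, and the cluster extension step is omitted.

```python
def bidirectional_best_hits(peaks_a, peaks_b, similarity):
    """Return index pairs (i, j) where a[i] and b[j] are mutual best hits."""
    best_for_a = [
        max(range(len(peaks_b)), key=lambda j: similarity(a, peaks_b[j]))
        for a in peaks_a
    ]
    best_for_b = [
        max(range(len(peaks_a)), key=lambda i: similarity(peaks_a[i], b))
        for b in peaks_b
    ]
    return [(i, j) for i, j in enumerate(best_for_a) if best_for_b[j] == i]


def rt_similarity(a, b):
    """Hypothetical similarity: closer retention times score higher."""
    return -abs(a - b)


# Toy retention times (minutes) from two runs.
run_a = [10.0, 20.0, 30.5]
run_b = [10.2, 29.9, 45.0]
pairs = bidirectional_best_hits(run_a, run_b, rt_similarity)
```

    Peaks without a mutual partner, such as the one at 20.0 in the first run, are left unassigned rather than forced into a match, which is one way such a scheme keeps false positive assignments low.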

  2. Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets

    PubMed Central

    2012-01-01

    Background Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). These hyphenated methods provide high-dimensional data. Comparing such data manually to find corresponding signals is a laborious task, as each experiment usually consists of thousands of individual scans, each containing hundreds or even thousands of distinct signals. In order to allow for successful identification of metabolites or proteins within such data, especially in the context of metabolomics and proteomics, an accurate alignment and matching of corresponding features between two or more experiments is required. Such a matching algorithm should capture fluctuations in the chromatographic system which lead to non-linear distortions on the time axis, as well as systematic changes in recorded intensities. Many different algorithms for the retention time alignment of GC-MS and LC-MS data have been proposed and published, but all of them focus either on aligning previously extracted peak features or on aligning and comparing the complete raw data containing all available features. Results In this paper we introduce two algorithms for retention time alignment of multiple GC-MS datasets: multiple alignment by bidirectional best hits peak assignment and cluster extension (BIPACE) and center-star multiple alignment by pairwise partitioned dynamic time warping (CeMAPP-DTW). We show how the similarity-based peak group matching method BIPACE may be used for multiple alignment calculation individually and how it can be used as a preprocessing step for the pairwise alignments performed by CeMAPP-DTW. We evaluate the algorithms individually and in combination on a previously published small GC-MS dataset studying the Leishmania parasite and on a larger GC-MS dataset studying grains of wheat (Triticum aestivum). 
Conclusions We have shown that BIPACE achieves very high precision and recall and a very low number of false positive peak assignments on both evaluation datasets. CeMAPP-DTW finds a high number of true positives when executed on its own, but achieves even better results when BIPACE is used to constrain its search space. The source code of both algorithms is included in the OpenSource software framework Maltcms, which is available from http://maltcms.sf.net. The evaluation scripts of the present study are available from the same source. PMID:22920415

  3. The centroidal algorithm in molecular similarity and diversity calculations on confidential datasets.

    PubMed

    Trepalin, Sergey; Osadchiy, Nikolay

    2005-01-01

    Chemical structure provides an exhaustive description of a compound, but it is often proprietary and thus an impediment to the exchange of information. For example, structure disclosure is often needed for the selection of the most similar or dissimilar compounds. The authors propose a centroidal algorithm based on structural fragments (screens) that can be efficiently used for similarity and diversity selections without disclosing structures from the reference set. For increased security, the authors recommend that such a set contain at least some tens of structures. Analysis of reverse engineering feasibility showed that the problem difficulty grows as the screen's radius decreases. The algorithm is illustrated with concrete calculations on known steroidal, quinoline, and quinazoline drugs. We also investigate the problem of scaffold identification in a combinatorial library dataset. The results show that relatively small screens with a radius of 2 bond lengths perform well in similarity sorting, while radius 4 screens yield better results in diversity sorting. The software implementation of the algorithm takes an SDF file with a reference set and generates screens of various radii, which are subsequently used for the similarity and diversity sorting of external SDFs. Since the reverse engineering of the reference set molecules from their screens has the same difficulty as the RSA asymmetric encryption algorithm, generated screens can be stored openly without further encryption. This approach ensures an end user transfers only a set of structural fragments and no other data. Like other encryption algorithms, the centroid algorithm cannot give a 100% guarantee of protecting a chemical structure from a dataset, but the probability of initial structure identification is very small, on the order of 10^-40 in typical cases.

  4. The centroidal algorithm in molecular similarity and diversity calculations on confidential datasets

    NASA Astrophysics Data System (ADS)

    Trepalin, Sergey; Osadchiy, Nikolay

    2005-09-01

    Chemical structure provides an exhaustive description of a compound, but it is often proprietary and thus an impediment to the exchange of information. For example, structure disclosure is often needed for the selection of the most similar or dissimilar compounds. The authors propose a centroidal algorithm based on structural fragments (screens) that can be efficiently used for similarity and diversity selections without disclosing structures from the reference set. For increased security, the authors recommend that such a set contain at least some tens of structures. Analysis of reverse engineering feasibility showed that the problem difficulty grows as the screen's radius decreases. The algorithm is illustrated with concrete calculations on known steroidal, quinoline, and quinazoline drugs. We also investigate the problem of scaffold identification in a combinatorial library dataset. The results show that relatively small screens with a radius of 2 bond lengths perform well in similarity sorting, while radius 4 screens yield better results in diversity sorting. The software implementation of the algorithm takes an SDF file with a reference set and generates screens of various radii, which are subsequently used for the similarity and diversity sorting of external SDFs. Since the reverse engineering of the reference set molecules from their screens has the same difficulty as the RSA asymmetric encryption algorithm, generated screens can be stored openly without further encryption. This approach ensures an end user transfers only a set of structural fragments and no other data. Like other encryption algorithms, the centroid algorithm cannot give a 100% guarantee of protecting a chemical structure from a dataset, but the probability of initial structure identification is very small, on the order of 10^-40 in typical cases.

  5. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul

    2011-01-01

    Diabetic macular edema (DME) is a common vision threatening complication of diabetic retinopathy. In a large scale screening environment DME can be assessed by detecting exudates (a type of bright lesions) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis on an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.

  6. A Modified Active Appearance Model Based on an Adaptive Artificial Bee Colony

    PubMed Central

    Othman, Zulaiha Ali

    2014-01-01

    Active appearance model (AAM) is one of the most popular model-based approaches that have been extensively used to extract features by highly accurate modeling of human faces under various physical and environmental circumstances. However, in such active appearance model, fitting the model with original image is a challenging task. State of the art shows that optimization method is applicable to resolve this problem. However, another common problem is applying optimization. Hence, in this paper we propose an AAM based face recognition technique, which is capable of resolving the fitting problem of AAM by introducing a new adaptive ABC algorithm. The adaptation increases the efficiency of fitting as against the conventional ABC algorithm. We have used three datasets: CASIA dataset, property 2.5D face dataset, and UBIRIS v1 images dataset in our experiments. The results have revealed that the proposed face recognition technique has performed effectively, in terms of accuracy of face recognition. PMID:25165748

  7. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Yu-Wei; Simmons, Blake A.; Singer, Steven W.

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  8. Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text

    PubMed Central

    Bejan, Cosmin Adrian; Wei, Wei-Qi; Denny, Joshua C

    2015-01-01

    Objective: To evaluate the contribution of the MEDication Indication (MEDI) resource and SemRep for identifying treatment relations in clinical text. Materials and methods: We first processed clinical documents with SemRep to extract the Unified Medical Language System (UMLS) concepts and the treatment relations between them. Then, we incorporated MEDI into a simple algorithm that identifies treatment relations between two concepts if they match a medication-indication pair in this resource. For better coverage, we expanded MEDI using ontology relationships from RxNorm and UMLS Metathesaurus. We also developed two ensemble methods, which combined the predictions of SemRep and the MEDI algorithm. We evaluated our selected methods on two datasets, a Vanderbilt corpus of 6864 discharge summaries and the 2010 Informatics for Integrating Biology and the Bedside (i2b2)/Veterans Affairs (VA) challenge dataset. Results: The Vanderbilt dataset included 958 manually annotated treatment relations. A double annotation was performed on 25% of relations with high agreement (Cohen's κ = 0.86). The evaluation consisted of comparing the manually annotated relations with the relations identified by SemRep, the MEDI algorithm, and the two ensemble methods. On the first dataset, the best F1-measure results achieved by the MEDI algorithm and the union of the two resources (78.7 and 80, respectively) were significantly higher than the SemRep results (72.3). On the second dataset, the MEDI algorithm achieved better precision and significantly lower recall values than the best system in the i2b2 challenge. The two systems obtained comparable F1-measure values on the subset of i2b2 relations with both arguments in MEDI. Conclusions: Both SemRep and MEDI can be used to extract treatment relations from clinical text. Knowledge-based extraction with MEDI outperformed use of SemRep alone, but superior performance was achieved by integrating both systems. The integration of knowledge-based resources such as MEDI into information extraction systems such as SemRep and the i2b2 relation extractors may improve treatment relation extraction from clinical text. PMID:25336593
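
    The pair-matching step and the F1-measure described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy drug/indication pairs, and the stand-in output of a second system are all hypothetical, and the ontology-based expansion of MEDI is not modeled.

```python
def match_known_pairs(candidate_pairs, known_pairs):
    """Keep the candidate (medication, indication) pairs found in the resource."""
    return {pair for pair in candidate_pairs if pair in known_pairs}


def f1_measure(predicted, gold):
    """Harmonic mean of precision and recall over sets of relations."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical, not taken from MEDI or from the evaluated corpora).
known_pairs = {("metformin", "diabetes"), ("lisinopril", "hypertension")}
candidates = [
    ("metformin", "diabetes"),
    ("aspirin", "headache"),
    ("lisinopril", "hypertension"),
]
gold = {("metformin", "diabetes"), ("aspirin", "headache")}

lookup_predictions = match_known_pairs(candidates, known_pairs)
# Stand-in for the output of a second extractor; a union ensemble merges both sets.
other_predictions = {("aspirin", "headache")}
union_predictions = lookup_predictions | other_predictions

lookup_f1 = f1_measure(lookup_predictions, gold)
union_f1 = f1_measure(union_predictions, gold)
```

    In this toy case the union recovers a relation that the lookup alone misses, which raises recall at some cost in precision.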

  9. A Fully Automated Pipeline for Classification Tasks with an Application to Remote Sensing

    NASA Astrophysics Data System (ADS)

    Suzuki, K.; Claesen, M.; Takeda, H.; De Moor, B.

    2016-06-01

    Nowadays deep learning has been in the spotlight owing to its great victories at major competitions, which undeservedly pushed 'shallow' machine learning methods, relatively naive/handy algorithms commonly used by industrial engineers, to the background in spite of their advantages, such as the small amount of time and data they require for training. From a practical point of view, we used shallow learning algorithms to construct a learning pipeline such that operators can use machine learning without any special knowledge, an expensive computation environment, or a large amount of labelled data. The proposed pipeline automates the whole classification process, namely feature selection, feature weighting, and the selection of the most suitable classifier with optimized hyperparameters. The configuration uses particle swarm optimization, one of the well-known metaheuristic algorithms, for generally fast and fine optimization, which enables us not only to optimize (hyper)parameters but also to determine appropriate features and a classifier for the problem, a choice which has conventionally been made a priori based on domain knowledge and either left untouched or handled with naïve algorithms such as grid search. Through experiments with the MNIST and CIFAR-10 datasets, common datasets in the computer vision field for character recognition and object recognition problems respectively, our automated learning approach provides high performance considering its simple setting (i.e. a setting not specialized to the dataset), small amount of training data, and practical learning time. Moreover, compared to deep learning, the performance stays robust with almost no modification even on a remote sensing object recognition problem, which in turn indicates that there is a high possibility that our approach contributes to general classification problems.

  10. Data Mining and Optimization Tools for Developing Engine Parameters Tools

    NASA Technical Reports Server (NTRS)

    Dhawan, Atam P.

    1998-01-01

    This project was awarded for understanding the problem and developing a plan for Data Mining tools for use in designing and implementing an Engine Condition Monitoring System. From the total budget of $5,000, Tricia and I studied the problem domain for developing an Engine Condition Monitoring system using the sparse and non-standardized datasets to be available through a consortium at NASA Lewis Research Center. We visited NASA three times to discuss additional issues related to the dataset, which was not made available to us. We discussed and developed a general framework of data mining and optimization tools to extract useful information from sparse and non-standard datasets. These discussions led to the training of Tricia Erhardt to develop Genetic Algorithm based search programs which were written in C++ and used to demonstrate the capability of the GA algorithm in searching for an optimal solution in noisy datasets. From the study and discussion with NASA LERC personnel, we then prepared a proposal, which is being submitted to NASA for future work for the development of data mining algorithms for engine condition monitoring. The proposed set of algorithms uses wavelet processing for creating a multi-resolution pyramid of the data for GA based multi-resolution optimal search. Wavelet processing is proposed to create a coarse resolution representation of data providing two advantages in GA based search: 1. We will have less data to begin with to make search sub-spaces. 2. It will have robustness against the noise because at every level of wavelet based decomposition, we will be decomposing the signal into low pass and high pass filters.

  11. Automatic estimation of heart boundaries and cardiothoracic ratio from chest x-ray images

    NASA Astrophysics Data System (ADS)

    Dallal, Ahmed H.; Agarwal, Chirag; Arbabshirani, Mohammad R.; Patel, Aalpen; Moore, Gregory

    2017-03-01

    Cardiothoracic ratio (CTR) is a widely used radiographic index to assess heart size on chest X-rays (CXRs). Recent studies have suggested that two-dimensional CTR might also contain clinical information about heart function. However, manual measurement of such indices is both subjective and time consuming. This study proposes a fast algorithm to automatically estimate CTR indices based on CXRs. The algorithm has three main steps: 1) model based lung segmentation, 2) estimation of heart boundaries from lung contours, and 3) computation of cardiothoracic indices from the estimated boundaries. We extended a previously employed lung detection algorithm to automatically estimate heart boundaries without using ground truth heart markings. We used two datasets: a publicly available dataset with 247 images as well as a clinical dataset with 167 studies from Geisinger Health System. The models of lung fields are learned from both datasets. The lung regions in a given test image are estimated by registering the learned models to patient CXRs. Then, the heart region is estimated by applying the Harris operator on segmented lung fields to detect the corner points corresponding to the heart boundaries. The algorithm calculates three indices, CTR1D, CTR2D, and cardiothoracic area ratio (CTAR). The method was tested on 103 clinical CXRs and average error rates of 7.9%, 25.5%, and 26.4% (for CTR1D, CTR2D, and CTAR respectively) were achieved. The proposed method outperforms previous CTR estimation methods without using any heart templates. This method can have important clinical implications as it can provide fast and accurate estimates of cardiothoracic indices.
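
    Step 3 of the pipeline above, for the one-dimensional index only, might look like the sketch below. The abstract does not give formulas, so this assumes the conventional definition of CTR as the maximal transverse cardiac diameter divided by the maximal internal thoracic diameter; the function names and boundary points are hypothetical, and CTR2D and CTAR are not sketched because their definitions are not stated here.

```python
def horizontal_extent(points):
    """Width spanned by a set of (x, y) boundary points along the x axis."""
    xs = [x for x, _ in points]
    return max(xs) - min(xs)


def ctr_1d(heart_points, thorax_points):
    """Conventional 1D cardiothoracic ratio from estimated boundary points.

    Assumption: both point sets are in the same pixel coordinate system,
    with x running across the chest.
    """
    thorax_width = horizontal_extent(thorax_points)
    if thorax_width <= 0:
        raise ValueError("thoracic width must be positive")
    return horizontal_extent(heart_points) / thorax_width


# Hypothetical boundary points, e.g. corner points detected on lung contours.
heart = [(40, 50), (90, 60), (60, 80)]
thorax = [(10, 70), (110, 70), (60, 20)]
ratio = ctr_1d(heart, thorax)
```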

  12. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space.

    PubMed

    Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

    2008-07-01

    UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
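
    To make the memory requirement concrete, here is a minimal sketch of plain UPGMA (not the memory-constrained MC-UPGMA variant this record proposes). It keeps every pairwise dissimilarity in a table, which is the cost the abstract describes as prohibitive for very large datasets. The function name and the toy matrix are illustrative assumptions.

```python
def upgma(labels, matrix):
    """Naive average-linkage clustering over a full dissimilarity matrix.

    Returns a list of merges as (members_a, members_b, height).
    """
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    # Full pairwise table, keyed by (smaller_id, larger_id).
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), height = min(dist.items(), key=lambda item: item[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Average linkage: size-weighted mean of the two old distances.
            updated[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every entry that involves the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(updated)
        merges.append((clusters[a], clusters[b], height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Toy symmetric dissimilarity matrix for three items.
merges = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 8], [6, 8, 0]])
```

    On this toy input, A and B merge first at height 2, and C then joins them at the average of 6 and 8.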

  13. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    PubMed

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

  14. A novel approach for incremental uncertainty rule generation from databases with missing values handling: application to dynamic medical databases.

    PubMed

    Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos

    2005-09-01

    Current approaches for mining association rules usually assume that the mining is performed in a static database, where the problem of missing attribute values does not practically exist. However, these assumptions are not preserved in some medical databases, like in a home care system. In this paper, a novel uncertainty rule algorithm is illustrated, namely URG-2 (Uncertainty Rule Generator), which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass from the initial dataset in order to generate the item set, while new metrics corresponding to the notion of Support and Confidence are used. URG-2 was evaluated over two medical databases, introducing randomly multiple missing values for each record's attribute (rate: 5-20% by 5% increments) in the initial dataset. Compared with the classical approach (records with missing values are ignored), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.
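
    The classical baseline mentioned above (records with missing values are ignored) can be sketched with the standard Support and Confidence measures. This is not URG-2, whose modified metrics are not defined in the abstract; the function name, the use of None for a missing value, and the toy records are assumptions made for illustration.

```python
def support_confidence(records, antecedent, consequent):
    """Classical support and confidence of the rule antecedent -> consequent.

    records: list of dicts mapping attribute -> value, with None for missing.
    Records with any missing value are discarded before counting.
    """
    complete = [r for r in records if all(v is not None for v in r.values())]
    if not complete:
        return 0.0, 0.0

    def matches(record, condition):
        return all(record.get(attr) == value for attr, value in condition.items())

    n_antecedent = sum(1 for r in complete if matches(r, antecedent))
    n_both = sum(
        1 for r in complete if matches(r, antecedent) and matches(r, consequent)
    )
    support = n_both / len(complete)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy records (hypothetical); the fourth has a missing value and is ignored.
records = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "no"},
    {"fever": "yes", "cough": None},
    {"fever": "yes", "cough": "yes"},
]
support, confidence = support_confidence(records, {"fever": "yes"}, {"cough": "yes"})
```

    Discarding the incomplete record shrinks the evidence available for the rule, which is the weakness of the classical approach that the abstract contrasts with URG-2.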

  15. A Monocular Vision Sensor-Based Obstacle Detection Algorithm for Autonomous Robots

    PubMed Central

    Lee, Tae-Jae; Yi, Dong-Hoon; Cho, Dong-Il “Dan”

    2016-01-01

    This paper presents a monocular vision sensor-based obstacle detection algorithm for autonomous robots. Each individual image pixel at the bottom region of interest is labeled as belonging either to an obstacle or the floor. While conventional methods depend on point tracking for geometric cues for obstacle detection, the proposed algorithm uses the inverse perspective mapping (IPM) method. This method is much more advantageous when the camera is not high off the floor, which makes point tracking near the floor difficult. Markov random field-based obstacle segmentation is then performed using the IPM results and a floor appearance model. Next, the shortest distance between the robot and the obstacle is calculated. The algorithm is tested by applying it to 70 datasets, 20 of which include nonobstacle images where considerable changes in floor appearance occur. The obstacle segmentation accuracies and the distance estimation error are quantitatively analyzed. For obstacle datasets, the segmentation precision and the average distance estimation error of the proposed method are 81.4% and 1.6 cm, respectively, whereas those for a conventional method are 57.5% and 9.9 cm, respectively. For nonobstacle datasets, the proposed method gives 0.0% false positive rates, while the conventional method gives 17.6%. PMID:26938540

  16. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    PubMed Central

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028

  17. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    PubMed

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  18. A Locally Adaptive Regularization Based on Anisotropic Diffusion for Deformable Image Registration of Sliding Organs

    PubMed Central

    Pace, Danielle F.; Aylward, Stephen R.; Niethammer, Marc

    2014-01-01

    We propose a deformable image registration algorithm that uses anisotropic smoothing for regularization to find correspondences between images of sliding organs. In particular, we apply the method for respiratory motion estimation in longitudinal thoracic and abdominal computed tomography scans. The algorithm uses locally adaptive diffusion tensors to determine the direction and magnitude with which to smooth the components of the displacement field that are normal and tangential to an expected sliding boundary. Validation was performed using synthetic, phantom, and 14 clinical datasets, including the publicly available DIR-Lab dataset. We show that motion discontinuities caused by sliding can be effectively recovered, unlike conventional regularizations that enforce globally smooth motion. In the clinical datasets, target registration error showed improved accuracy for lung landmarks compared to the diffusive regularization. We also present a generalization of our algorithm to other sliding geometries, including sliding tubes (e.g., needles sliding through tissue, or contrast agent flowing through a vessel). Potential clinical applications of this method include longitudinal change detection and radiotherapy for lung or abdominal tumours, especially those near the chest or abdominal wall. PMID:23899632

  19. A locally adaptive regularization based on anisotropic diffusion for deformable image registration of sliding organs.

    PubMed

    Pace, Danielle F; Aylward, Stephen R; Niethammer, Marc

    2013-11-01

    We propose a deformable image registration algorithm that uses anisotropic smoothing for regularization to find correspondences between images of sliding organs. In particular, we apply the method for respiratory motion estimation in longitudinal thoracic and abdominal computed tomography scans. The algorithm uses locally adaptive diffusion tensors to determine the direction and magnitude with which to smooth the components of the displacement field that are normal and tangential to an expected sliding boundary. Validation was performed using synthetic, phantom, and 14 clinical datasets, including the publicly available DIR-Lab dataset. We show that motion discontinuities caused by sliding can be effectively recovered, unlike conventional regularizations that enforce globally smooth motion. In the clinical datasets, target registration error showed improved accuracy for lung landmarks compared to the diffusive regularization. We also present a generalization of our algorithm to other sliding geometries, including sliding tubes (e.g., needles sliding through tissue, or contrast agent flowing through a vessel). Potential clinical applications of this method include longitudinal change detection and radiotherapy for lung or abdominal tumours, especially those near the chest or abdominal wall.

  20. A ground truth based comparative study on clustering of gene expression data.

    PubMed

    Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

    2008-05-01

    Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.

  1. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks

    PubMed Central

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-01-01

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found, but also provides an effective tool for the study and analysis of intrusion detection in large networks. PMID:27754380

  2. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks.

    PubMed

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-10-13

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found, but also provides an effective tool for the study and analysis of intrusion detection in large networks.

  3. Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms.

    PubMed

    Bromuri, Stefano; Zufferey, Damien; Hennebert, Jean; Schumacher, Michael

    2014-10-01

    This research is motivated by the issue of classifying illnesses of chronically ill patients for decision support in clinical settings. Our main objective is to propose multi-label classification of multivariate time series contained in medical records of chronically ill patients, by means of quantization methods, such as bag of words (BoW), and multi-label classification algorithms. Our second objective is to compare supervised dimensionality reduction techniques to state-of-the-art multi-label classification algorithms. The hypothesis is that kernel methods and locality preserving projections make such algorithms good candidates to study multi-label medical time series. We combine BoW and supervised dimensionality reduction algorithms to perform multi-label classification on health records of chronically ill patients. The considered algorithms are compared with state-of-the-art multi-label classifiers in two real world datasets. Portavita dataset contains 525 diabetes type 2 (DT2) patients, with co-morbidities of DT2 such as hypertension, dyslipidemia, and microvascular or macrovascular issues. MIMIC II dataset contains 2635 patients affected by thyroid disease, diabetes mellitus, lipoid metabolism disease, fluid electrolyte disease, hypertensive disease, thrombosis, hypotension, chronic obstructive pulmonary disease (COPD), liver disease and kidney disease. The algorithms are evaluated using multi-label evaluation metrics such as hamming loss, one error, coverage, ranking loss, and average precision. Non-linear dimensionality reduction approaches behave well on medical time series quantized using the BoW algorithm, with results comparable to state-of-the-art multi-label classification algorithms. Chaining the projected features has a positive impact on the performance of the algorithm with respect to pure binary relevance approaches. 
The evaluation highlights the feasibility of representing medical health records using the BoW for multi-label classification tasks. The study also highlights that dimensionality reduction algorithms based on kernel methods, locality preserving projections or both are good candidates to deal with multi-label classification tasks in medical time series with many missing values and high label density. Copyright © 2014 Elsevier Inc. All rights reserved.
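
    Two of the multi-label metrics named in this abstract, hamming loss and one error, can be computed as in the sketch below. It follows the standard textbook definitions rather than anything specific to this study; the function names and toy labels and scores are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) positions where prediction and truth differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def one_error(y_true, scores):
    """Fraction of instances whose top-scored label is not a true label."""
    misses = 0
    for true_row, score_row in zip(y_true, scores):
        top = max(range(len(score_row)), key=lambda k: score_row[k])
        if true_row[top] != 1:
            misses += 1
    return misses / len(y_true)


# Toy data: two instances, three labels each (1 = label applies).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 0]]
scores = [[0.9, 0.2, 0.4], [0.6, 0.3, 0.1]]

hl = hamming_loss(y_true, y_pred)
oe = one_error(y_true, scores)
```

    Lower is better for both: hamming loss penalizes each wrong label decision, while one error only checks the single highest-ranked label per instance.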

  4. Inferring consistent functional interaction patterns from natural stimulus FMRI data

    PubMed Central

    Sun, Jiehuan; Hu, Xintao; Huang, Xiu; Liu, Yang; Li, Kaiming; Li, Xiang; Han, Junwei; Guo, Lei

    2014-01-01

    There has been increasing interest in how the human brain responds to natural stimuli such as video watching in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under natural stimulus of video watching among known functional brain regions identified by task-based fMRI. Then, we applied and compared four statistical approaches, including Bayesian network modeling with searching algorithms: greedy equivalence search (GES), Peter and Clark (PC) analysis, independent multiple greedy equivalence search (IMaGES), and the commonly used Granger causality analysis (GCA), to infer consistent and reproducible functional interaction patterns among these brain regions. It is interesting that a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages. PMID:22440644

  5. How similar are forest disturbance maps derived from different Landsat time series algorithms?

    Treesearch

    Warren B. Cohen; Sean P. Healey; Zhiqiang Yang; Stephen V. Stehman; C. Kenneth Brewer; Evan B. Brooks; Noel Gorelick; Chengquan Huang; M. Joseph Hughes; Robert E. Kennedy; Thomas R. Loveland; Gretchen G. Moisen; Todd A. Schroeder; James E. Vogelmann; Curtis E. Woodcock; Limin Yang; Zhe Zhu

    2017-01-01

    Disturbance is a critical ecological process in forested systems, and disturbance maps are important for understanding forest dynamics. Landsat data are a key remote sensing dataset for monitoring forest disturbance and there recently has been major growth in the development of disturbance mapping algorithms. Many of these algorithms take advantage of the high temporal...

  6. Using Data-Driven Model-Brain Mappings to Constrain Formal Models of Cognition

    PubMed Central

    Borst, Jelmer P.; Nijboer, Menno; Taatgen, Niels A.; van Rijn, Hedderik; Anderson, John R.

    2015-01-01

    In this paper we propose a method to create data-driven mappings from components of cognitive models to brain regions. Cognitive models are notoriously hard to evaluate, especially based on behavioral measures alone. Neuroimaging data can provide additional constraints, but this requires a mapping from model components to brain regions. Although such mappings can be based on the experience of the modeler or on a reading of the literature, a formal method is preferred to prevent researcher-based biases. In this paper we used model-based fMRI analysis to create a data-driven model-brain mapping for five modules of the ACT-R cognitive architecture. We then validated this mapping by applying it to two new datasets with associated models. The new mapping was at least as powerful as an existing mapping that was based on the literature, and indicated where the models were supported by the data and where they have to be improved. We conclude that data-driven model-brain mappings can provide strong constraints on cognitive models, and that model-based fMRI is a suitable way to create such mappings. PMID:25747601

  7. Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank.

    PubMed

    Alfaro-Almagro, Fidel; Jenkinson, Mark; Bangerter, Neal K; Andersson, Jesper L R; Griffanti, Ludovica; Douaud, Gwenaëlle; Sotiropoulos, Stamatios N; Jbabdi, Saad; Hernandez-Fernandez, Moises; Vallee, Emmanuel; Vidaurre, Diego; Webster, Matthew; McCarthy, Paul; Rorden, Christopher; Daducci, Alessandro; Alexander, Daniel C; Zhang, Hui; Dragonu, Iulius; Matthews, Paul M; Miller, Karla L; Smith, Stephen M

    2018-02-01

    UK Biobank is a large-scale prospective epidemiological study with all data accessible to researchers worldwide. It is currently in the process of bringing back 100,000 of the original participants for brain, heart and body MRI, carotid ultrasound and low-dose bone/fat x-ray. The brain imaging component covers 6 modalities (T1, T2 FLAIR, susceptibility weighted MRI, Resting fMRI, Task fMRI and Diffusion MRI). Raw and processed data from the first 10,000 imaged subjects has recently been released for general research access. To help convert this data into useful summary information we have developed an automated processing and QC (Quality Control) pipeline that is available for use by other researchers. In this paper we describe the pipeline in detail, following a brief overview of UK Biobank brain imaging and the acquisition protocol. We also describe several quantitative investigations carried out as part of the development of both the imaging protocol and the processing pipeline. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Test-retest resting-state fMRI in healthy elderly persons with a family history of Alzheimer's disease.

    PubMed

    Orban, Pierre; Madjar, Cécile; Savard, Mélissa; Dansereau, Christian; Tam, Angela; Das, Samir; Evans, Alan C; Rosa-Neto, Pedro; Breitner, John C S; Bellec, Pierre

    2015-01-01

    We present a test-retest dataset of resting-state fMRI data obtained in 80 cognitively normal elderly volunteers enrolled in the "Pre-symptomatic Evaluation of Novel or Experimental Treatments for Alzheimer's Disease" (PREVENT-AD) Cohort. Subjects with a family history of Alzheimer's disease in first-degree relatives were recruited as part of an on-going double blind randomized clinical trial of Naproxen or placebo. Two pairs of scans were acquired ~3 months apart, allowing the assessment of both intra- and inter-session reliability, with the possible caveat of treatment effects as a source of inter-session variation. Using the NeuroImaging Analysis Kit (NIAK), we report on the standard quality of co-registration and motion parameters of the data, and assess their validity based on the spatial distribution of seed-based connectivity maps as well as intra- and inter-session reliability metrics in the default-mode network. This resource, released publicly as sample UM1 of the Consortium for Reliability and Reproducibility (CoRR), will benefit future studies focusing on the preclinical period preceding the appearance of dementia in Alzheimer's disease.

  9. Benchmarking Commercial Conformer Ensemble Generators.

    PubMed

    Friedrich, Nils-Ole; de Bruyn Kops, Christina; Flachsenberg, Florian; Sommer, Kai; Rarey, Matthias; Kirchmair, Johannes

    2017-11-27

    We assess and compare the performance of eight commercial conformer ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import, and OMEGA) and one leading free algorithm, the distance geometry algorithm implemented in RDKit. The comparative study is based on a new version of the Platinum Diverse Dataset, a high-quality benchmarking dataset of 2859 protein-bound ligand conformations extracted from the PDB. Differences in the performance of commercial algorithms are much smaller than those observed for free algorithms in our previous study (J. Chem. Inf. 2017, 57, 529-539). For commercial algorithms, the median minimum root-mean-square deviations measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers are between 0.46 and 0.61 Å. Commercial conformer ensemble generators are characterized by their high robustness, with at least 99% of all input molecules successfully processed and few or even no substantial geometrical errors detectable in their output conformations. The RDKit distance geometry algorithm (with minimization enabled) appears to be a good free alternative since its performance is comparable to that of the midranked commercial algorithms. Based on a statistical analysis, we elaborate on which algorithms to use and how to parametrize them for best performance in different application scenarios.

  10. Comparison of Different Machine Learning Algorithms for Lithological Mapping Using Remote Sensing Data and Morphological Features: A Case Study in Kurdistan Region, NE Iraq

    NASA Astrophysics Data System (ADS)

    Othman, Arsalan; Gloaguen, Richard

    2015-04-01

    Topographic effects and complex vegetation cover hinder lithology classification in mountain regions, whether it is based on field work or on reflectance remote sensing data. The area of interest, "Bardi-Zard", is located in the NE of Iraq. It is part of the Zagros orogenic belt, where seven lithological units outcrop, and is known for its chromite deposit. The aim of this study is to compare three machine learning algorithms (MLAs): Maximum Likelihood (ML), Support Vector Machines (SVM), and Random Forest (RF) in the context of a supervised lithology classification task using Advanced Space-borne Thermal Emission and Reflection Radiometer (ASTER) satellite data, its derivatives, spatial information (spatial coordinates) and geomorphic data. We emphasize the enhancement in remote sensing lithological mapping accuracy that arises from the integration of geomorphic features and spatial information (spatial coordinates) in classifications. This study finds that RF outperforms the ML and SVM algorithms in almost all of the sixteen dataset combinations that were tested. The overall accuracy of the best dataset combination with the RF map for all seven classes reaches ~80%, the producer's and user's accuracies are ~73.91% and 76.09% respectively, and the kappa coefficient is ~0.76. TPI is more effective with the SVM algorithm than with the RF algorithm. This paper demonstrates that adding geomorphic indices such as TPI and spatial information to the dataset increases the lithological classification accuracy.
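
    The accuracy figures quoted in this abstract (overall accuracy, producer's and user's accuracy, kappa coefficient) are all derived from a confusion matrix. The sketch below shows the standard definitions; the function name and the 2-class matrix are hypothetical and unrelated to the study's data, and rows are assumed to hold the reference classes.

```python
def accuracy_report(cm):
    """cm[i][j] = number of samples of reference class i mapped to class j."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]  # reference totals per class
    col_sums = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
    diag = [cm[i][i] for i in range(n_classes)]

    overall = sum(diag) / total
    producers = [d / r for d, r in zip(diag, row_sums)]  # 1 - omission error
    users = [d / c for d, c in zip(diag, col_sums)]      # 1 - commission error

    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "producers": producers,
            "users": users, "kappa": kappa}


# Hypothetical 2-class confusion matrix (rows: reference, columns: map).
report = accuracy_report([[50, 10], [5, 35]])
print(report)
```

    Kappa discounts the agreement that the class proportions alone would produce, which is why it is lower than the overall accuracy.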

  11. FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm.

    PubMed

    Tuo, Shouheng; Zhang, Junying; Yuan, Xiguo; Zhang, Yuanyuan; Liu, Zhaowen

    2016-01-01

    The two-locus model is a typical significant disease model to be identified in genome-wide association studies (GWAS). Due to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computation cost, and a preference for some types of disease models. In this study, two scoring functions (Bayesian network based K2-score and Gini-score) are used for characterizing two SNP loci as a candidate model; the two criteria are adopted simultaneously to improve identification power and to tackle the preference problem for disease models. The harmony search algorithm (HSA) is improved for quickly finding the most likely candidate models among all two-locus models, in which a local search algorithm with a two-dimensional tabu table is presented to avoid repeatedly evaluating some disease models that have a strong marginal effect. Finally, the G-test statistic is used to further test the candidate models. We investigate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical methods (MACOED and CSE) which have been developed recently based on swarm intelligence search algorithms. The results of simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method has identified two SNPs (rs3775652 and rs10511467) that may also be associated with disease in the AMD dataset.
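
    The final filtering step mentioned in this abstract, the G-test, is a likelihood-ratio test of independence on a contingency table. The sketch below uses the standard formula G = 2 Σ O ln(O/E); it is not the FHSA-SED implementation, and the table layout (genotype combinations in rows, cases and controls in columns) and the counts are assumptions made for illustration.

```python
import math


def g_test(table):
    """Return (G statistic, degrees of freedom) for a contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)

    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # the term O * ln(O / E) tends to 0 as O -> 0
            expected = row_sums[i] * col_sums[j] / total
            g += observed * math.log(observed / expected)

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, dof


# Hypothetical counts: 3 genotype combinations x (cases, controls).
g_stat, dof = g_test([[30, 10], [20, 20], [10, 30]])
print(g_stat, dof)
```

    Under the null hypothesis of independence, G is approximately chi-squared distributed with the returned degrees of freedom, so a large value suggests an association between the genotype combination and disease status.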

  12. FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm

    PubMed Central

    Tuo, Shouheng; Zhang, Junying; Yuan, Xiguo; Zhang, Yuanyuan; Liu, Zhaowen

    2016-01-01

    Motivation: The two-locus model is a typical significant disease model to be identified in genome-wide association studies (GWAS). Due to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computation cost, and a preference for some types of disease models. Method: In this study, two scoring functions (Bayesian network based K2-score and Gini-score) are used for characterizing two SNP loci as a candidate model; the two criteria are adopted simultaneously to improve identification power and to tackle the preference problem for disease models. The harmony search algorithm (HSA) is improved for quickly finding the most likely candidate models among all two-locus models, in which a local search algorithm with a two-dimensional tabu table is presented to avoid repeatedly evaluating some disease models that have a strong marginal effect. Finally, the G-test statistic is used to further test the candidate models. Results: We investigate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical methods (MACOED and CSE) which have been developed recently based on swarm intelligence search algorithms. The results of simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method has identified two SNPs (rs3775652 and rs10511467) that may also be associated with disease in the AMD dataset. PMID:27014873

  13. Configurable pattern-based evolutionary biclustering of gene expression data

    PubMed Central

    2013-01-01

    Background: Biclustering algorithms for microarray data aim at discovering functionally related gene sets under different subsets of experimental conditions. Due to the problem complexity and the characteristics of microarray datasets, heuristic searches are usually used instead of exhaustive algorithms. Also, the comparison among different techniques is still a challenge. The obtained results vary in relevant features such as the number of genes or conditions, which makes it difficult to carry out a fair comparison. Moreover, existing approaches do not allow the user to specify any preferences on these properties. Results: Here, we present the first biclustering algorithm in which it is possible to customize several bicluster features in terms of different objectives. This can be done by tuning the specified features in the algorithm or by incorporating new objectives into the search. Furthermore, our approach bases the bicluster evaluation on the use of expression patterns, and is able to recognize both shifting and scaling patterns, either simultaneously or separately. Evolutionary computation has been chosen as the search strategy, hence the name of our proposal, Evo-Bexpa (Evolutionary Biclustering based in Expression Patterns). Conclusions: We have conducted experiments on both synthetic and real datasets demonstrating Evo-Bexpa's ability to obtain meaningful biclusters. Synthetic experiments have been designed in order to compare Evo-Bexpa's performance with other approaches when looking for perfect patterns. Experiments with four different real datasets also confirm the proper performance of our algorithm, whose results have been biologically validated through Gene Ontology. PMID:23433178

  14. Lessons learned and way forward from 6 years of Aerosol_cci

    NASA Astrophysics Data System (ADS)

    Popp, Thomas; de Leeuw, Gerrit; Pinnock, Simon

    2017-04-01

    Within the ESA Climate Change Initiative (CCI) Aerosol_cci (2010 - 2017) conducts intensive work to improve and qualify algorithms for the retrieval of aerosol information from European sensors. Meanwhile, several validated (multi-) decadal time series of different aerosol parameters from complementary sensors are available: Aerosol Optical Depth (AOD), stratospheric extinction profiles, a qualitative Absorbing Aerosol Index (AAI), fine mode AOD, mineral dust AOD; absorption information and aerosol layer height are in an evaluation phase and the multi-pixel GRASP algorithm for the POLDER instrument is used for selected regions. Validation (vs. AERONET, MAN) and inter-comparison to other satellite datasets (MODIS, MISR, SeaWIFS) proved the high quality of the available datasets comparable to other satellite retrievals and revealed needs for algorithm improvement (for example for higher AOD values) which were taken into account in an iterative evolution cycle. The datasets contain pixel level uncertainty estimates which were also validated and improved in the reprocessing. The use of an ensemble method was tested, where several algorithms are applied to the same sensor. The presentation will summarize and discuss the lessons learned from the 6 years of intensive collaboration and highlight major achievements (significantly improved AOD quality, fine mode AOD, dust AOD, pixel level uncertainties, ensemble approach); also limitations and remaining deficits shall be discussed. An outlook will discuss the way forward for the continuous algorithm improvement and re-processing together with opportunities for time series extension with successor instruments of the Sentinel family and the complementarity of the different satellite aerosol products.

  15. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    PubMed

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  16. Tree-based approach for exploring marine spatial patterns with raster datasets.

    PubMed

    Liao, Xiaohan; Xue, Cunjin; Su, Fenzhen

    2017-01-01

    From multiple raster datasets to spatial association patterns, the data-mining technique is divided into three subtasks, i.e., raster dataset pretreatment, mining algorithm design, and spatial pattern exploration from the mining results. Comparison with the former two subtasks reveals that the latter remains unresolved. Confronted with the interrelated marine environmental parameters, we propose a Tree-based Approach for eXploring Marine Spatial Patterns with multiple raster datasets called TAXMarSP, which includes two models. One is the Tree-based Cascading Organization Model (TCOM), and the other is the Spatial Neighborhood-based CAlculation Model (SNCAM). TCOM designs the "Spatial node→Pattern node" from top to bottom layers to store the table-formatted frequent patterns. Together with TCOM, SNCAM considers the spatial neighborhood contributions to calculate the pattern-matching degree between the specified marine parameters and the table-formatted frequent patterns and then explores the marine spatial patterns. Using the prevalent quantification Apriori algorithm and a real remote sensing dataset from January 1998 to December 2014, a successful application of TAXMarSP to marine spatial patterns in the Pacific Ocean is described, and the obtained marine spatial patterns present not only the well-known but also new patterns to Earth scientists.

  17. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets.

    PubMed

    Huang, Min-Wei; Lin, Wei-Chao; Tsai, Chih-Fong

    2018-01-01

    Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for mixed-type medical datasets. However, instance selection does not have a definitely positive impact on the imputation result for categorical medical datasets.
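
    The idea of cleaning the observed data before imputing can be illustrated with a small sketch. Note the substitutions: a simple z-score filter stands in for the instance selection algorithms named in the abstract (DROP3, GA, IB3), and a plain k-nearest-neighbour mean stands in for KNNI. All function names, thresholds and data values are hypothetical, and at least one complete row is assumed to survive the filter.

```python
import math
import statistics


def select_instances(complete_rows, z_max=1.5):
    """Stand-in instance selection: drop rows with any attribute |z| > z_max."""
    cols = list(zip(*complete_rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [
        row for row in complete_rows
        if all(abs(v - m) / s <= z_max for v, m, s in zip(row, means, stds))
    ]


def knn_impute(row, reference_rows, k=2):
    """Fill None entries with the mean of the k nearest reference rows."""
    observed = [j for j, v in enumerate(row) if v is not None]

    def dist(other):
        # Euclidean distance over the attributes that are present in `row`.
        return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

    neighbours = sorted(reference_rows, key=dist)[:k]
    return [
        v if v is not None else sum(nb[j] for nb in neighbours) / len(neighbours)
        for j, v in enumerate(row)
    ]


# Hypothetical complete records; the last one contains an outlying value.
complete = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.0, 500.0]]
incomplete = [2.0, None]

print(knn_impute(incomplete, complete))                    # outlier included
print(knn_impute(incomplete, select_instances(complete)))  # outlier removed
```

    In this toy case the outlier is one of the nearest neighbours of the incomplete record, so removing it first changes the imputed value substantially.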

  18. Enhanced subject-specific resting-state network detection and extraction with fast fMRI.

    PubMed

    Akin, Burak; Lee, Hsu-Lei; Hennig, Jürgen; LeVan, Pierre

    2017-02-01

    Resting-state networks have become an important tool for the study of brain function. An ultra-fast imaging technique for measuring brain function, called Magnetic Resonance Encephalography (MREG), achieves an order of magnitude higher temporal resolution than standard echo-planar imaging (EPI). This new sequence helps to correct physiological artifacts and improves the sensitivity of the fMRI analysis. In this study, EPI is compared with MREG in terms of capability to extract resting-state networks. Healthy controls underwent two consecutive resting-state scans, one with EPI and the other with MREG. Subject-level independent component analyses (ICA) were performed separately for each of the two datasets. Using Stanford FIND atlas parcels as network templates, the presence of ICA maps corresponding to each network was quantified in each subject. The number of detected individual networks was significantly higher in the MREG data set than for EPI. Moreover, using short time segments of MREG data, such as 50 seconds, one can still detect and track consistent networks. Fast fMRI thus results in an increased capability to extract distinct functional regions at the individual subject level for the same scan times, and also allows the extraction of consistent networks within shorter time intervals than when using EPI, which is notably relevant for the analysis of dynamic functional connectivity fluctuations. Hum Brain Mapp 38:817-830, 2017. © 2016 Wiley Periodicals, Inc.

  19. Functional magnetic resonance imaging phase synchronization as a measure of dynamic functional connectivity.

    PubMed

    Glerean, Enrico; Salmi, Juha; Lahnakoski, Juha M; Jääskeläinen, Iiro P; Sams, Mikko

    2012-01-01

    Functional brain activity and connectivity have been studied by calculating intersubject and seed-based correlations of hemodynamic data acquired with functional magnetic resonance imaging (fMRI). To inspect temporal dynamics, these correlation measures have been calculated over sliding time windows with necessary restrictions on the length of the temporal window that compromises the temporal resolution. Here, we show that it is possible to increase temporal resolution by using instantaneous phase synchronization (PS) as a measure of dynamic (time-varying) functional connectivity. We applied PS to an fMRI dataset obtained while 12 healthy volunteers watched a feature film. A narrow frequency band (0.04-0.07 Hz) was used in the PS analysis to avoid artifactual results. We defined three metrics for computing time-varying functional connectivity and time-varying intersubject reliability based on estimation of instantaneous PS across the subjects: (1) seed-based PS, (2) intersubject PS, and (3) intersubject seed-based PS. Our findings show that these PS-based metrics yield results consistent with both seed-based correlation and intersubject correlation methods when inspected over the whole time series, but provide an important advantage of maximal single-TR temporal resolution. These metrics can be applied both in studies with complex naturalistic stimuli (e.g., watching a movie or listening to music in the MRI scanner) and more controlled (e.g., event-related or blocked design) paradigms. A MATLAB toolbox FUNPSY ( http://becs.aalto.fi/bml/software.html ) is openly available for using these metrics in fMRI data analysis.
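
    To make the notion of instantaneous phase synchronization concrete, the sketch below estimates a phase at every time point from the analytic signal and then measures how tightly the phases of several signals cluster. The abstract does not give formulas, so two assumptions are made here: the phase is obtained with a Hilbert-transform style analytic signal, and synchrony is measured with a Kuramoto-style order parameter. The code is not taken from the FUNPSY toolbox, the function names are hypothetical, and the input is assumed to be already band-pass filtered to a narrow band.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a plain O(n^2) DFT (adequate for short series)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0  # keep DC (and Nyquist for even n) unchanged
        elif k < (n + 1) // 2:
            gain = 2.0  # double the positive frequencies
        else:
            gain = 0.0  # remove the negative frequencies
        spectrum[k] *= gain
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def phase_synchrony(signals):
    """Per time point: |mean of exp(i * phase)| across signals, in [0, 1]."""
    phases = [[cmath.phase(z) for z in analytic_signal(s)] for s in signals]
    n_signals = len(phases)
    return [
        abs(sum(cmath.exp(1j * p[t]) for p in phases) / n_signals)
        for t in range(len(phases[0]))
    ]


# Two synthetic narrow-band signals, the second lagging by a quarter cycle.
n = 32
x = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
y = [math.cos(2 * math.pi * 3 * t / n - math.pi / 2) for t in range(n)]

sync_same = phase_synchrony([x, x])
sync_lag = phase_synchrony([x, y])
print(sync_same[0], sync_lag[0])
```

    Because a value is produced for every sample, no sliding window is needed, which is the source of the single-TR temporal resolution highlighted in the abstract.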

  20. Underconnectivity of the superior temporal sulcus predicts emotion recognition deficits in autism

    PubMed Central

    Woolley, Daniel G.; Steyaert, Jean; Di Martino, Adriana; Swinnen, Stephan P.; Wenderoth, Nicole

    2014-01-01

    Neurodevelopmental disconnections have been assumed to cause behavioral alterations in autism spectrum disorders (ASDs). Here, we combined measurements of intrinsic functional connectivity (iFC) from resting-state functional magnetic resonance imaging (fMRI) with task-based fMRI to explore whether altered activity and/or iFC of the right posterior superior temporal sulcus (pSTS) mediates deficits in emotion recognition in ASD. Fifteen adults with ASD and 15 matched-controls underwent resting-state and task-based fMRI, during which participants discriminated emotional states from point light displays (PLDs). Intrinsic FC of the right pSTS was further examined using 584 (278 ASD/306 controls) resting-state data of the Autism Brain Imaging Data Exchange (ABIDE). Participants with ASD were less accurate than controls in recognizing emotional states from PLDs. Analyses revealed pronounced ASD-related reductions both in task-based activity and resting-state iFC of the right pSTS with fronto-parietal areas typically encompassing the action observation network (AON). Notably, pSTS-hypo-activity was related to pSTS-hypo-connectivity, and both measures were predictive of emotion recognition performance with each measure explaining a unique part of the variance. Analyses with the large independent ABIDE dataset replicated reductions in pSTS-iFC to fronto-parietal regions. These findings provide novel evidence that pSTS hypo-activity and hypo-connectivity with the fronto-parietal AON are linked to the social deficits characteristic of ASD. PMID:24078018
