Sample records for set analysis methods

  1. GSA-PCA: gene set generation by principal component analysis of the Laplacian matrix of a metabolic network

    PubMed Central

    2012-01-01

    Background: Gene Set Analysis (GSA) has proven to be a useful approach to microarray analysis. However, most of the method development for GSA has focused on the statistical tests to be used rather than on the generation of sets that will be tested. Existing methods of set generation are often overly simplistic. The creation of sets from individual pathways (in isolation) is a poor reflection of the complexity of the underlying metabolic network. We have developed a novel approach to set generation via the use of Principal Component Analysis of the Laplacian matrix of a metabolic network. We have analysed a relatively simple data set to show the difference in results between our method and the current state-of-the-art pathway-based sets. Results: The sets generated with this method are semi-exhaustive and capture much of the topological complexity of the metabolic network. The semi-exhaustive nature of this method has also allowed us to design a hypergeometric enrichment test to determine which genes are likely responsible for set significance. We show that our method finds significant aspects of biology that would be missed (i.e. false negatives) and addresses the false positive rates found with the use of simple pathway-based sets. Conclusions: The set generation step for GSA is often neglected but is a crucial part of the analysis as it defines the full context for the analysis. As such, set generation methods should be robust and yield as complete a representation of the extant biological knowledge as possible. The method reported here achieves this goal and is demonstrably superior to previous set analysis methods. PMID:22876834
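
    The two computational ingredients named above, principal component analysis of a network Laplacian and a hypergeometric enrichment test, can be illustrated with a short sketch. This is a toy example under assumed values (a 5-node network and a hypothetical loading threshold), not the authors' implementation.

    ```python
    # Minimal sketch (not the paper's code): PCA of a graph Laplacian to suggest
    # a gene set, followed by a hypergeometric enrichment test for that set.
    import numpy as np
    from scipy.stats import hypergeom

    # Toy adjacency matrix of a small metabolic network (illustrative only).
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A              # graph Laplacian

    # Principal components of the symmetric Laplacian via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(L)
    component = eigvecs[:, -1]                  # component with the largest eigenvalue
    gene_set = set(np.where(np.abs(component) > 0.3)[0])   # hypothetical threshold

    # Hypergeometric test: is the overlap between the set and the differentially
    # expressed genes larger than expected by chance?
    population = A.shape[0]                     # genes in the network
    significant = {0, 2, 3}                     # differentially expressed genes (toy)
    overlap = len(gene_set & significant)
    p_value = hypergeom.sf(overlap - 1, population, len(gene_set), len(significant))
    print(f"set = {sorted(gene_set)}, overlap = {overlap}, p = {p_value:.3f}")
    ```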

  2. Down-weighting overlapping genes improves gene set analysis

    PubMed Central

    2012-01-01

    Background: The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set. Results: In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions: PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org. PMID:22713124
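
    The weighting idea is easy to sketch: count how many sets each gene belongs to, down-weight genes that appear in many sets, and score a set as the mean of absolute weighted gene statistics. The inverse-count weights below are an assumption for illustration and are not necessarily PADOG's exact formula.

    ```python
    # Illustrative down-weighting of overlapping genes (not PADOG's exact formula).
    import numpy as np

    gene_sets = {
        "pathA": ["g1", "g2", "g3"],
        "pathB": ["g2", "g3", "g4"],
        "pathC": ["g4", "g5"],
    }
    t_scores = {"g1": 2.5, "g2": -0.4, "g3": 1.1, "g4": -3.0, "g5": 0.2}

    # Count set memberships per gene and weight each gene inversely (assumed scheme).
    counts = {}
    for genes in gene_sets.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    weights = {g: 1.0 / c for g, c in counts.items()}

    # Set score: mean of absolute weighted gene scores, as described in the abstract.
    set_scores = {
        name: np.mean([abs(t_scores[g]) * weights[g] for g in genes])
        for name, genes in gene_sets.items()
    }
    print(set_scores)   # sets driven by set-specific genes score higher
    ```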

  3. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool

    PubMed Central

    Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi

    2016-01-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed, nor has it been implemented as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405

  4. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

    PubMed

    Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

    2015-11-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed, nor has it been implemented as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.

  5. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating characteristic curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
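
    The contrast between a signed and an absolute gene statistic in a gene-sampling test can be reproduced in a few lines: score a set by the mean absolute t-score of its members and build the null by sampling random gene sets of the same size. The data below are simulated and the scoring choice is only one of several the paper considers.

    ```python
    # Toy gene-sampling test using the absolute gene statistic as the set score.
    import numpy as np

    rng = np.random.default_rng(0)
    gene_t = rng.normal(size=2000)            # stand-in for genome-wide t-scores
    gene_t[:30] += 2.0                        # a few truly changed genes
    set_idx = np.arange(50)                   # the gene set under test (toy)

    def set_score(idx):
        return np.abs(gene_t[idx]).mean()     # absolute gene statistic

    observed = set_score(set_idx)
    null = np.array([
        set_score(rng.choice(gene_t.size, size=set_idx.size, replace=False))
        for _ in range(5000)
    ])
    p = (1 + np.sum(null >= observed)) / (1 + null.size)
    print(f"observed = {observed:.3f}, gene-sampling p = {p:.4f}")
    ```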

  6. Analysis Code - Data Analysis in 'Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications' (LMSMIPNFA) v. 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lewis, John R

    R code that performs the analysis of a data set presented in the paper ‘Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications’ by Lewis, J., Zhang, A., Anderson-Cook, C. It provides functions for doing inverse predictions in this setting using several different statistical methods. The data set is a publicly available data set from a historical Plutonium production experiment.

  7. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

    PubMed Central

    Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe

    2015-01-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA identified 4 gene sets that were ultimately found to be linked to the influenza vaccine as well, although previous analyses had associated them only with the pneumococcal vaccine. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package. PMID:26111374
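
    The modeling strategy described, a random-effects model fitted by maximum likelihood and a test for expression variation over time, can be sketched with a likelihood-ratio test on one simulated gene. This uses statsmodels rather than the TcGSA R package and omits the gene-set-level heterogeneity handling.

    ```python
    # Sketch of the modeling idea (not the TcGSA package): mixed model with and
    # without a time effect, fitted by ML and compared with a likelihood-ratio test.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    rng = np.random.default_rng(1)
    n_subj, n_time = 20, 4
    df = pd.DataFrame({
        "subject": np.repeat(np.arange(n_subj), n_time),
        "time": np.tile(np.arange(n_time), n_subj),
    })
    subject_effect = rng.normal(0, 0.5, n_subj)[df["subject"]]
    df["expr"] = 0.3 * df["time"] + subject_effect + rng.normal(0, 0.4, len(df))

    full = smf.mixedlm("expr ~ time", df, groups=df["subject"]).fit(reml=False)
    null = smf.mixedlm("expr ~ 1", df, groups=df["subject"]).fit(reml=False)
    lr = 2 * (full.llf - null.llf)
    print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=1):.3g}")
    ```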

  8. Functional Multiple-Set Canonical Correlation Analysis

    ERIC Educational Resources Information Center

    Hwang, Heungsun; Jung, Kwanghee; Takane, Yoshio; Woodward, Todd S.

    2012-01-01

    We propose functional multiple-set canonical correlation analysis for exploring associations among multiple sets of functions. The proposed method includes functional canonical correlation analysis as a special case when only two sets of functions are considered. As in classical multiple-set canonical correlation analysis, computationally, the…

  9. Comparative study on gene set and pathway topology-based enrichment methods.

    PubMed

    Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

    2015-10-22

    Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis, both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods; however, their sensitivity was lower. We conducted one of the first comprehensive comparative studies evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that the simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.

  10. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGES

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure is used to identify gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.

  11. Set of new draft methods for the analysis of organic disinfection by-products, including 551 and 552. Draft report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1993-01-01

    The set of documents discusses the new draft methods (EPA method 551, EPA method 552) for the analysis of disinfection byproducts contained in drinking water. The methods use the techniques of liquid/liquid extraction and gas chromatography with electron capture detection.

  12. Comparative study of joint analysis of microarray gene expression data in survival prediction and risk assessment of breast cancer patients

    PubMed Central

    2016-01-01

    Microarray gene expression data sets are jointly analyzed to increase statistical power. They could either be merged together or analyzed by meta-analysis. For a given ensemble of data sets, it cannot be foreseen which of these paradigms, merging or meta-analysis, works better. In this article, three joint analysis methods, Z-score normalization, ComBat and the inverse normal method (meta-analysis) were selected for survival prognosis and risk assessment of breast cancer patients. The methods were applied to eight microarray gene expression data sets, totaling 1324 patients with two clinical endpoints, overall survival and relapse-free survival. The performance derived from the joint analysis methods was evaluated using Cox regression for survival analysis and independent validation used as bias estimation. Overall, Z-score normalization had a better performance than ComBat and meta-analysis. Higher Area Under the Receiver Operating Characteristic curve and hazard ratio were also obtained when independent validation was used as bias estimation. With a lower time and memory complexity, Z-score normalization is a simple method for joint analysis of microarray gene expression data sets. The derived findings suggest further assessment of this method in future survival prediction and cancer classification applications. PMID:26504096

  13. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

    PubMed

    Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter

    2014-01-01

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.
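
    The p-value combination step can be illustrated with Stouffer's method: transform per-pathway p-values to z-scores and sum them, inflating the variance of the sum by an estimated correlation when the data sets are dependent. The correlation below is a placeholder; in the framework it would come from the covariance estimated on nonsignificant pathways.

    ```python
    # Sketch of combining pathway-level p-values across platforms (Stouffer's method),
    # with a correlation-adjusted denominator for dependent data sets.
    import numpy as np
    from scipy.stats import norm

    p_transcriptomics = np.array([0.001, 0.20, 0.04, 0.70])
    p_metabolomics    = np.array([0.010, 0.35, 0.02, 0.55])

    z = norm.isf(np.vstack([p_transcriptomics, p_metabolomics]))   # p -> z
    k = z.shape[0]

    z_independent = z.sum(axis=0) / np.sqrt(k)                     # classic Stouffer

    rho = 0.3   # assumed correlation between the two analyses (placeholder)
    z_dependent = z.sum(axis=0) / np.sqrt(k + k * (k - 1) * rho)

    print("independent:", np.round(norm.sf(z_independent), 4))
    print("dependent:  ", np.round(norm.sf(z_dependent), 4))
    ```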

  14. Iterative Strain-Gage Balance Calibration Data Analysis for Extended Independent Variable Sets

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred

    2011-01-01

    A new method was developed that makes it possible to use an extended set of independent calibration variables for an iterative analysis of wind tunnel strain gage balance calibration data. The new method permits the application of the iterative analysis method whenever the total number of balance loads and other independent calibration variables is greater than the total number of measured strain gage outputs. Iteration equations used by the iterative analysis method have the limitation that the number of independent and dependent variables must match. The new method circumvents this limitation. It simply adds a missing dependent variable to the original data set by using an additional independent variable also as an additional dependent variable. Then, the desired solution of the regression analysis problem can be obtained that fits each gage output as a function of both the original and additional independent calibration variables. The final regression coefficients can be converted to data reduction matrix coefficients because the missing dependent variables were added to the data set without changing the regression analysis result for each gage output. Therefore, the new method still supports the application of the two load iteration equation choices that the iterative method traditionally uses for the prediction of balance loads during a wind tunnel test. An example is discussed in the paper that illustrates the application of the new method to a realistic simulation of a temperature-dependent calibration data set of a six-component balance.

  15. The limitations of simple gene set enrichment analysis assuming gene independence.

    PubMed

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored due to the significant variance inflation they produced on the enrichment scores and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. © The Author(s) 2012.
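
    The variance-inflation argument is easy to verify numerically: for a set of n genes with average pairwise correlation rho, the variance of the mean gene statistic is (1 + (n - 1) * rho) / n rather than 1 / n, so a null that assumes independence is far too narrow. The numbers below are simulated for illustration.

    ```python
    # Simulation of the variance inflation caused by gene-gene correlation.
    import numpy as np

    rng = np.random.default_rng(0)
    n_genes, rho, n_samples = 50, 0.2, 100_000

    cov = np.full((n_genes, n_genes), rho)
    np.fill_diagonal(cov, 1.0)
    x = rng.multivariate_normal(np.zeros(n_genes), cov, size=n_samples)

    empirical = x.mean(axis=1).var()
    under_independence = 1.0 / n_genes
    theoretical = (1 + (n_genes - 1) * rho) / n_genes
    print(f"empirical {empirical:.4f} vs independence {under_independence:.4f} "
          f"vs theory {theoretical:.4f}")
    ```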

  16. Method for predicting peptide detection in mass spectrometry

    DOEpatents

    Kangas, Lars [West Richland, WA]; Smith, Richard D [Richland, WA]; Petritis, Konstantinos [Richland, WA]

    2010-07-13

    A method of predicting whether a peptide present in a biological sample will be detected by analysis with a mass spectrometer. The method uses at least one mass spectrometer to perform repeated analysis of a sample containing peptides from proteins with known amino acids. The method then generates a data set of peptides identified as contained within the sample by the repeated analysis. The method then calculates the probability that a specific peptide in the data set was detected in the repeated analysis. The method then creates a plurality of vectors, where each vector has a plurality of dimensions, and each dimension represents a property of one or more of the amino acids present in each peptide and adjacent peptides in the data set. Using these vectors, the method then generates an algorithm from the plurality of vectors and the calculated probabilities that specific peptides in the data set were detected in the repeated analysis. The algorithm is thus capable of calculating the probability that a hypothetical peptide represented as a vector will be detected by a mass spectrometry based proteomic platform, given that the peptide is present in a sample introduced into a mass spectrometer.

  17. Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2014-01-01

    Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

  18. Estimation of gene induction enables a relevance-based ranking of gene sets.

    PubMed

    Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens

    2009-07-01

    In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.

  19. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710

  20. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
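
    The gene-set layer described above is a regression of gene-level association statistics on set membership, optionally with gene-level covariates. The sketch below shows that structure on simulated Z-statistics; it is not the MAGMA software, and the covariate is only an example.

    ```python
    # Sketch of a gene-set regression on gene-level statistics (not MAGMA itself).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n_genes = 1000
    gene_z = rng.normal(size=n_genes)
    in_set = np.zeros(n_genes)
    in_set[:40] = 1                              # membership indicator for one set
    gene_z[:40] += 0.5                           # make the set mildly enriched (toy)
    gene_size = rng.integers(1, 100, n_genes)    # example gene-level covariate

    X = sm.add_constant(np.column_stack([in_set, np.log(gene_size)]))
    fit = sm.OLS(gene_z, X).fit()
    print(f"membership beta = {fit.params[1]:.3f}, p = {fit.pvalues[1]:.4g}")
    ```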

  21. Assessment of protein set coherence using functional annotations

    PubMed Central

    Chagoyen, Monica; Carazo, Jose M; Pascual-Montano, Alberto

    2008-01-01

    Background: Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. Results: In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. Conclusion: We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at PMID:18937846

  22. Selecting risk factors: a comparison of discriminant analysis, logistic regression and Cox's regression model using data from the Tromsø Heart Study.

    PubMed

    Brenn, T; Arnesen, E

    1985-01-01

    For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.

  23. A Unifying Framework for Causal Analysis in Set-Theoretic Multimethod Research

    ERIC Educational Resources Information Center

    Rohlfing, Ingo; Schneider, Carsten Q.

    2018-01-01

    The combination of Qualitative Comparative Analysis (QCA) with process tracing, which we call set-theoretic multimethod research (MMR), is steadily becoming more popular in empirical research. Despite the fact that both methods have an elective affinity based on set theory, it is not obvious how a within-case method operating in a single case and a…

  24. Exploitation of SAR data for measurement of ocean currents and wave velocities

    NASA Technical Reports Server (NTRS)

    Shuchman, R. A.; Lyzenga, D. R.; Klooster, A., Jr.

    1981-01-01

    Methods of extracting information on ocean currents and wave orbital velocities from SAR data by an analysis of the Doppler frequency content of the data are discussed. The theory and data analysis methods are discussed, and results are presented for both aircraft and satellite (SEASAT) data sets. A method of measuring the phase velocity of a gravity wave field is also described. This method uses the shift in position of the wave crests on two images generated from the same data set using two separate Doppler bands. Results of the current measurements are presented for 11 aircraft data sets and 4 SEASAT data sets.

  25. Integrative analysis of gene expression and copy number alterations using canonical correlation analysis.

    PubMed

    Soneson, Charlotte; Lilljebjörn, Henrik; Fioretos, Thoas; Fontes, Magnus

    2010-04-15

    With the rapid development of new genetic measurement methods, several types of genetic alterations can be quantified in a high-throughput manner. While the initial focus has been on investigating each data set separately, there is an increasing interest in studying the correlation structure between two or more data sets. Multivariate methods based on Canonical Correlation Analysis (CCA) have been proposed for integrating paired genetic data sets. The high dimensionality of microarray data imposes computational difficulties, which have been addressed for instance by studying the covariance structure of the data, or by reducing the number of variables prior to applying the CCA. In this work, we propose a new method for analyzing high-dimensional paired genetic data sets, which mainly emphasizes the correlation structure and still permits efficient application to very large data sets. The method is implemented by translating a regularized CCA to its dual form, where the computational complexity depends mainly on the number of samples instead of the number of variables. The optimal regularization parameters are chosen by cross-validation. We apply the regularized dual CCA, as well as a classical CCA preceded by a dimension-reducing Principal Components Analysis (PCA), to a paired data set of gene expression changes and copy number alterations in leukemia. Using the correlation-maximizing methods, regularized dual CCA and PCA+CCA, we show that without pre-selection of known disease-relevant genes, and without using information about clinical class membership, an exploratory analysis singles out two patient groups, corresponding to well-known leukemia subtypes. Furthermore, the variables showing the highest relevance to the extracted features agree with previous biological knowledge concerning copy number alterations and gene expression changes in these subtypes. Finally, the correlation-maximizing methods are shown to yield results which are more biologically interpretable than those resulting from a covariance-maximizing method, and provide different insight compared to when each variable set is studied separately using PCA. We conclude that regularized dual CCA as well as PCA+CCA are useful methods for exploratory analysis of paired genetic data sets, and can be efficiently implemented also when the number of variables is very large.
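
    Of the two strategies compared in the abstract, the PCA+CCA route is straightforward to sketch with scikit-learn: reduce each data set with PCA, then look for correlated directions with CCA. The regularized dual CCA itself is not reproduced here, and the data below are simulated.

    ```python
    # Sketch of the PCA+CCA strategy on simulated paired data (scikit-learn).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(3)
    n_samples = 60
    shared = rng.normal(size=(n_samples, 2))     # hidden signal common to both sets
    expression = shared @ rng.normal(size=(2, 500)) + rng.normal(0, 1, (n_samples, 500))
    copy_number = shared @ rng.normal(size=(2, 300)) + rng.normal(0, 1, (n_samples, 300))

    # Reduce each data set before CCA to keep the problem well conditioned.
    expr_reduced = PCA(n_components=10).fit_transform(expression)
    cn_reduced = PCA(n_components=10).fit_transform(copy_number)

    cca = CCA(n_components=2).fit(expr_reduced, cn_reduced)
    u, v = cca.transform(expr_reduced, cn_reduced)
    corrs = [np.corrcoef(u[:, i], v[:, i])[0, 1] for i in range(2)]
    print("canonical correlations:", np.round(corrs, 3))
    ```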

  26. Extracting insights from the shape of complex data using topology

    PubMed Central

    Lum, P. Y.; Singh, G.; Lehman, A.; Ishkanov, T.; Vejdemo-Johansson, M.; Alagappan, M.; Carlsson, J.; Carlsson, G.

    2013-01-01

    This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods. PMID:23393618

  27. Extracting insights from the shape of complex data using topology.

    PubMed

    Lum, P Y; Singh, G; Lehman, A; Ishkanov, T; Vejdemo-Johansson, M; Alagappan, M; Carlsson, J; Carlsson, G

    2013-01-01

    This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.

  28. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
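
    The core procedure, classifying samples from one gene set's expression with a random forest and judging the set by its out-of-bag (OOB) error against a resampled null, can be sketched as follows. The label-permutation null used here is a simple stand-in for the paper's resampling scheme, and the data are simulated.

    ```python
    # Sketch of a random-forest gene-set test: OOB error versus a permutation null.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(4)
    n_samples, n_genes_in_set = 80, 25
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_genes_in_set))
    X[y == 1, :5] += 1.0                          # a few informative genes (toy)

    def oob_error(X, y, seed=0):
        rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                                    random_state=seed).fit(X, y)
        return 1.0 - rf.oob_score_

    observed = oob_error(X, y)
    null = np.array([oob_error(X, rng.permutation(y), seed=i) for i in range(200)])
    p = (1 + np.sum(null <= observed)) / (1 + null.size)
    print(f"OOB error = {observed:.3f}, empirical p = {p:.3f}")
    ```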

  29. Functional Extended Redundancy Analysis

    ERIC Educational Resources Information Center

    Hwang, Heungsun; Suk, Hye Won; Lee, Jang-Han; Moskowitz, D. S.; Lim, Jooseop

    2012-01-01

    We propose a functional version of extended redundancy analysis that examines directional relationships among several sets of multivariate variables. As in extended redundancy analysis, the proposed method posits that a weighted composite of each set of exogenous variables influences a set of endogenous variables. It further considers endogenous…

  30. Setting Cut Scores on an EFL Placement Test Using the Prototype Group Method: A Receiver Operating Characteristic (ROC) Analysis

    ERIC Educational Resources Information Center

    Eckes, Thomas

    2017-01-01

    This paper presents an approach to standard setting that combines the prototype group method (PGM; Eckes, 2012) with a receiver operating characteristic (ROC) analysis. The combined PGM-ROC approach is applied to setting cut scores on a placement test of English as a foreign language (EFL). To implement the PGM, experts first named learners whom…

  31. A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets.

    PubMed

    Li, Der-Chiang; Liu, Chiao-Wen; Hu, Susan C

    2011-05-01

    Medical data sets are usually small and have very high dimensionality. Too many attributes will make the analysis less efficient and will not necessarily increase accuracy, while too few data will decrease the modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small. This paper proposes a fuzzy-based non-linear transformation method to extend classification related information from the original data attribute values for a small data set. Based on the new transformed data set, this study applies principal component analysis (PCA) to extract the optimal subset of features. Finally, we use the transformed data with these optimal features as the input data for a learning tool, a support vector machine (SVM). Six medical data sets: Pima Indians' diabetes, Wisconsin diagnostic breast cancer, Parkinson disease, echocardiogram, BUPA liver disorders dataset, and bladder cancer cases in Taiwan, are employed to illustrate the approach presented in this paper. This research uses the t-test to evaluate the classification accuracy for a single data set; and uses the Friedman test to show the proposed method is better than other methods over the multiple data sets. The experiment results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest creating new purpose-related information to improve the analysis performance. This paper has shown that feature extraction is important as a function of feature selection for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the information available produces better results than the PCA and KPCA approaches. Copyright © 2011 Elsevier B.V. All rights reserved.

  32. Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

    PubMed Central

    Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298

  33. Signal-to-noise contribution of principal component loads in reconstructed near-infrared Raman tissue spectra.

    PubMed

    Grimbergen, M C M; van Swol, C F P; Kendall, C; Verdaasdonk, R M; Stone, N; Bosch, J L H R

    2010-01-01

    The overall quality of Raman spectra in the near-infrared region, where biological samples are often studied, has benefited from various improvements to optical instrumentation over the past decade. However, obtaining ample spectral quality for analysis is still challenging due to device requirements and short integration times required for (in vivo) clinical applications of Raman spectroscopy. Multivariate analytical methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA), are routinely applied to Raman spectral datasets to develop classification models. Data compression is necessary prior to discriminant analysis to prevent or decrease the degree of over-fitting. The logical threshold for the selection of principal components (PCs) to be used in discriminant analysis is likely to be at a point before the PCs begin to introduce equivalent signal and noise and, hence, include no additional value. Assessment of the signal-to-noise ratio (SNR) at a certain peak or over a specific spectral region will depend on the sample measured. Therefore, the mean SNR over the whole spectral region (SNR(msr)) is determined in the original spectrum as well as for spectra reconstructed from an increasing number of principal components. This paper introduces a method of assessing the influence of signal and noise from individual PC loads and indicates a method of selection of PCs for LDA. To evaluate this method, two data sets with different SNRs were used. The sets were obtained with the same Raman system and the same measurement parameters on bladder tissue collected during white light cystoscopy (set A) and fluorescence-guided cystoscopy (set B). This method shows that the mean SNR over the spectral range in the original Raman spectra of these two data sets is related to the signal and noise contribution of principal component loads. The difference in mean SNR over the spectral range can also be appreciated since fewer principal components can reliably be used in the low SNR data set (set B) compared to the high SNR data set (set A). Despite the fact that no definitive threshold could be found, this method may help to determine the cutoff for the number of principal components used in discriminant analysis. Future analysis of a selection of spectral databases using this technique will allow optimum thresholds to be selected for different applications and spectral data quality levels.
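
    The assessment described, reconstructing spectra from an increasing number of principal components and tracking the mean signal-to-noise ratio over the spectral range, can be sketched on synthetic spectra. Because the toy data have a known noise-free band, the SNR here is computed against that ground truth, which is not available for real tissue spectra; the definition is purely illustrative.

    ```python
    # Sketch: reconstruct synthetic spectra from k principal components and track
    # a mean SNR over the spectral range (SNR definition is illustrative).
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(5)
    x = np.linspace(0, 1, 400)
    clean = np.exp(-((x - 0.4) / 0.03) ** 2)                 # one Raman-like band
    spectra = clean + rng.normal(0, 0.2, size=(100, 400))    # noisy measurements

    pca = PCA().fit(spectra)
    scores = pca.transform(spectra)

    for k in (1, 3, 10, 50):
        recon = pca.mean_ + scores[:, :k] @ pca.components_[:k]
        noise = recon - clean            # possible only because the toy truth is known
        snr = np.mean(clean ** 2) / np.mean(noise ** 2)
        print(f"{k:3d} PCs: mean SNR ~ {snr:.1f}")   # later PCs mostly add noise
    ```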

  34. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with far divergence over a large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.

  35. Random left censoring: a second look at bone lead concentration measurements

    NASA Astrophysics Data System (ADS)

    Popovic, M.; Nie, H.; Chettle, D. R.; McNeill, F. E.

    2007-09-01

    Bone lead concentrations measured in vivo by x-ray fluorescence (XRF) are subjected to left censoring due to limited precision of the technique at very low concentrations. In the analysis of bone lead measurements, inverse variance weighting (IVW) of measurements is commonly used to estimate the mean of a data set and its standard error. Student's t-test is used to compare the IVW means of two sets, testing the hypothesis that the two sets are from the same population. This analysis was undertaken to assess the adequacy of IVW in the analysis of bone lead measurements or to confirm the results of IVW using an independent approach. The rationale is provided for the use of methods of survival data analysis in the study of XRF bone lead measurements. The procedure is provided for bone lead data analysis using the Kaplan-Meier and Nelson-Aalen estimators. The methodology is also outlined for the rank tests that are used to determine whether two censored sets are from the same population. The methods are applied on six data sets acquired in epidemiological studies. The estimated parameters and test statistics were compared with the results of the IVW approach. It is concluded that the proposed methods of statistical analysis can provide valid inference about bone lead concentrations, but the computed parameters do not differ substantially from those derived by the more widely used method of IVW.
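
    The two estimators contrasted above are short to write down: the inverse-variance-weighted mean with its standard error, and a Kaplan-Meier survival estimate in which concentrations below the detection limit are treated as censored (the scale is flipped so that the usual right-censoring machinery applies). All numbers are toy values for illustration.

    ```python
    # Sketch of IVW and a Kaplan-Meier-style treatment of left-censored values.
    import numpy as np

    values = np.array([5.2, 1.1, 8.4, 0.3, 12.0, 2.5])   # toy bone lead measurements
    sigmas = np.array([2.0, 3.5, 2.2, 4.0, 2.5, 3.0])    # per-measurement uncertainties

    w = 1.0 / sigmas ** 2
    ivw_mean = np.sum(w * values) / np.sum(w)
    ivw_se = np.sqrt(1.0 / np.sum(w))
    print(f"IVW mean = {ivw_mean:.2f} +/- {ivw_se:.2f}")

    def kaplan_meier(times, observed):
        """Survival curve for right-censored data (observed=False means censored)."""
        order = np.argsort(times)
        times, observed = times[order], observed[order]
        surv, at_risk, curve = 1.0, len(times), []
        for t, seen in zip(times, observed):
            if seen:
                surv *= (at_risk - 1) / at_risk
            curve.append((float(t), round(surv, 3)))
            at_risk -= 1
        return curve

    # Flip the scale so left-censored (below detection limit) becomes right-censored.
    detection_limit = 1.0
    flipped = values.max() + 1 - values
    detected = values >= detection_limit
    print(kaplan_meier(flipped, detected))
    ```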

  36. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-06

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
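
    The rank-product idea with missing values can be sketched directly: rank features within each replicate, then combine the ranks a feature received in the replicates where it was actually quantified (geometric mean below). This is only the scoring step under an assumed handling of missing values; the paper's exact algorithm and its significance calculation are not reproduced.

    ```python
    # Sketch of a rank-product score tolerant to missing values (NaN = missing).
    import numpy as np
    from scipy.stats import rankdata

    # Rows = features, columns = replicates (toy log fold changes).
    fold_changes = np.array([
        [ 2.1,  1.8, np.nan],
        [-0.2,  0.1,  0.3],
        [ 1.5, np.nan,  2.0],
        [-1.9, -2.2, -1.7],
    ])

    ranks = np.full_like(fold_changes, np.nan)
    for j in range(fold_changes.shape[1]):
        col = fold_changes[:, j]
        present = ~np.isnan(col)
        ranks[present, j] = rankdata(-col[present])   # rank 1 = strongest up-regulation

    # Geometric mean of the available ranks per feature (smaller = more consistent).
    rank_product = np.exp(np.nanmean(np.log(ranks), axis=1))
    print(np.round(rank_product, 2))
    ```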

  37. Improved score statistics for meta-analysis in single-variant and gene-level association studies.

    PubMed

    Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo

    2018-06-01

    Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss caused by the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods that perform equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In simulated gene-level association studies under unbalanced settings, our method recovered up to 85% of the power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.
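
    For orientation, the snippet below shows the ordinary fixed-effect inverse-variance meta-analysis of per-study effect estimates for a single variant. This is the standard baseline that improved score statistics refine, not the method proposed in the abstract, and the three studies are hypothetical.

    import numpy as np
    from scipy import stats

    def fixed_effect_meta(betas, ses):
        """Generic inverse-variance fixed-effect meta-analysis of per-study effects."""
        betas = np.asarray(betas, dtype=float)
        w = 1.0 / np.asarray(ses, dtype=float) ** 2     # inverse-variance weights
        beta = np.sum(w * betas) / np.sum(w)            # pooled effect estimate
        se = np.sqrt(1.0 / np.sum(w))
        z = beta / se
        p = 2 * stats.norm.sf(abs(z))
        return beta, se, z, p

    # Three hypothetical studies of one variant: effect sizes and standard errors
    print(fixed_effect_meta([0.12, 0.20, 0.05], [0.06, 0.10, 0.04]))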

  18. Analysis of pressure distortion testing

    NASA Technical Reports Server (NTRS)

    Koch, K. E.; Rees, R. L.

    1976-01-01

    The development of a distortion methodology, method D, was documented, and its application to steady state and unsteady data was demonstrated. Three methodologies based upon DIDENT, a NASA-LeRC distortion methodology based upon the parallel compressor model, were investigated by applying them to a set of steady state data. The best formulation was then applied to an independent data set. The good correlation achieved with this data set showed that method E, one of the above methodologies, is a viable concept. Unsteady data were analyzed by using the method E methodology. This analysis pointed out that the method E sensitivities are functions of pressure defect level as well as corrected speed and pattern.

  19. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

    PubMed

    Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E

    2017-02-01

    Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The ensemble of gene set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene sets collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/. monther.alhamdoosh@csl.com.au mritchie@wehi.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
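
    EGSEA's own scoring combines twelve algorithms in several configurable ways; purely to convey the ensemble idea, the toy sketch below aggregates per-gene-set p-values from a few tools by average rank and by Fisher's combination. The tool names and p-values are placeholders, and this is not EGSEA's implementation or output.

    import numpy as np
    from scipy import stats

    def ensemble_gene_set_scores(pvals_by_method):
        """Toy ensemble scoring of gene sets from several GSE tools.

        pvals_by_method -- dict mapping method name -> dict of {gene_set: p-value}.
        Returns (average rank, Fisher-combined p-value) per gene set; this is one
        simple aggregation among many possible ones.
        """
        gene_sets = sorted(set.intersection(*[set(d) for d in pvals_by_method.values()]))
        ranks, logs = [], []
        for d in pvals_by_method.values():
            p = np.array([d[g] for g in gene_sets])
            ranks.append(stats.rankdata(p))
            logs.append(np.log(p))
        avg_rank = np.mean(ranks, axis=0)
        chi2 = -2 * np.sum(logs, axis=0)
        fisher_p = stats.chi2.sf(chi2, df=2 * len(pvals_by_method))
        return dict(zip(gene_sets, zip(avg_rank, fisher_p)))

    # Placeholder per-method p-values for two gene sets
    results = {
        "camera": {"KEGG_A": 0.001, "KEGG_B": 0.40},
        "roast":  {"KEGG_A": 0.004, "KEGG_B": 0.25},
        "fry":    {"KEGG_A": 0.010, "KEGG_B": 0.60},
    }
    print(ensemble_gene_set_scores(results))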

  20. Highly Efficient Design-of-Experiments Methods for Combining CFD Analysis and Experimental Data

    NASA Technical Reports Server (NTRS)

    Anderson, Bernhard H.; Haller, Harold S.

    2009-01-01

    It is the purpose of this study to examine the impact of "highly efficient" Design-of-Experiments (DOE) methods for combining sets of CFD-generated analysis data with smaller sets of experimental test data in order to accurately predict performance results where experimental test data were not obtained. The study examines the impact of micro-ramp flow control on the shock wave boundary layer (SWBL) interaction, where a complete paired set of data exists from both CFD analysis and experimental measurements. By combining the complete set of CFD analysis data, composed of fifteen (15) cases, with a smaller subset of experimental test data containing four or five (4/5) cases, compound data sets (CFD/EXP) were generated that allow the prediction of the complete set of experimental results. No statistical differences were found between the combined (CFD/EXP) generated data sets and the complete experimental data set composed of fifteen (15) cases. The same optimal micro-ramp configuration was obtained using the (CFD/EXP) generated data as with the complete set of experimental data, and the DOE response surfaces generated by the two data sets were also not statistically different.

  1. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448

  2. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
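
    As context for the network-weighted approach described above, the snippet below implements only the standard unweighted over-representation test, i.e. the hypergeometric/Fisher baseline that LEGO is compared against; the gene identifiers are synthetic and LEGO's network-based gene weights are not reproduced here.

    from scipy.stats import hypergeom

    def ora_pvalue(de_genes, gene_set, background):
        """Standard (unweighted) over-representation p-value via the hypergeometric test."""
        de = set(de_genes) & set(background)
        gs = set(gene_set) & set(background)
        k = len(de & gs)            # differentially expressed genes inside the set
        M = len(set(background))    # background size
        n = len(gs)                 # gene-set size
        N = len(de)                 # number of differentially expressed genes
        # P(X >= k) for X ~ Hypergeometric(M, n, N)
        return hypergeom.sf(k - 1, M, n, N)

    # Synthetic example: 1000 background genes, a 50-gene set, 50 DE genes
    background = [f"g{i}" for i in range(1000)]
    gene_set = background[:50]
    de_genes = background[:20] + background[500:530]
    print(ora_pvalue(de_genes, gene_set, background))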

  3. Taming Log Files from Game/Simulation-Based Assessments: Data Models and Data Analysis Tools. Research Report. ETS RR-16-10

    ERIC Educational Resources Information Center

    Hao, Jiangang; Smith, Lawrence; Mislevy, Robert; von Davier, Alina; Bauer, Malcolm

    2016-01-01

    Extracting information efficiently from game/simulation-based assessment (G/SBA) logs requires two things: a well-structured log file and a set of analysis methods. In this report, we propose a generic data model specified as an extensible markup language (XML) schema for the log files of G/SBAs. We also propose a set of analysis methods for…

  4. Prioritizing individual genetic variants after kernel machine testing using variable selection.

    PubMed

    He, Qianchuan; Cai, Tianxi; Liu, Yang; Zhao, Ni; Harmon, Quaker E; Almli, Lynn M; Binder, Elisabeth B; Engel, Stephanie M; Ressler, Kerry J; Conneely, Karen N; Lin, Xihong; Wu, Michael C

    2016-12-01

    Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level, and does not directly indicate which SNP(s) within an identified set are actually driving the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and real data application are used to demonstrate the proposed approach. © 2016 WILEY PERIODICALS, INC.

  5. Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.

    PubMed

    Simovski, Boris; Kanduri, Chakravarthi; Gundersen, Sveinung; Titov, Dmytro; Domanska, Diana; Bock, Christoph; Bossini-Castillo, Lara; Chikina, Maria; Favorov, Alexander; Layer, Ryan M; Mironov, Andrey A; Quinlan, Aaron R; Sheffield, Nathan C; Trynka, Gosia; Sandve, Geir K

    2018-06-05

    Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.
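
    The core statistical question handled by such tools can be illustrated with a deliberately simplified Monte-Carlo overlap test: shuffle one region set on a single linear "genome" and ask how often the permuted overlap reaches the observed one. Real colocalization tools, including those wrapped by Coloc-stats, use far more careful null models that preserve chromosome structure and other covariates; the intervals and genome length below are made up.

    import numpy as np

    def overlap_bp(a, b):
        """Total base pairs of overlap between two lists of (start, end) intervals."""
        total = 0
        for s1, e1 in a:
            for s2, e2 in b:
                total += max(0, min(e1, e2) - max(s1, s2))
        return total

    def colocalization_pvalue(query, reference, genome_len, n_perm=1000, seed=0):
        """Monte-Carlo colocalization test on a single linear genome (toy null model)."""
        rng = np.random.default_rng(seed)
        obs = overlap_bp(query, reference)
        lengths = [e - s for s, e in query]
        hits = 0
        for _ in range(n_perm):
            starts = rng.integers(0, genome_len - max(lengths), size=len(lengths))
            perm = [(int(s), int(s) + ln) for s, ln in zip(starts, lengths)]
            if overlap_bp(perm, reference) >= obs:
                hits += 1
        return (hits + 1) / (n_perm + 1)

    query = [(100, 200), (1_000, 1_100), (5_000, 5_300)]
    reference = [(150, 260), (5_100, 5_200), (9_000, 9_050)]
    print(colocalization_pvalue(query, reference, genome_len=10_000))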

  6. Methods for Conducting Cognitive Task Analysis for a Decision Making Task.

    DTIC Science & Technology

    1996-01-01

    Cognitive task analysis (CTA) improves traditional task analysis procedures by analyzing the thought processes of performers while they complete a...for using these methods to conduct a CTA for domains which involve critical decision making tasks in naturalistic settings. The cognitive task analysis methods

  7. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations.

    PubMed

    Dwivedi, Bhakti; Kowalski, Jeanne

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.

  8. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations

    PubMed Central

    Dwivedi, Bhakti

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010

  9. [The principal components analysis--method to classify the statistical variables with applications in medicine].

    PubMed

    Dascălu, Cristina Gena; Antohe, Magda Ecaterina

    2009-01-01

    Based on eigenvalue and eigenvector analysis, principal component analysis aims to identify the subspace of principal components from a set of parameters that is sufficient to characterize the whole set of parameters. Interpreting the data as a cloud of points, we find through geometrical transformations the directions along which the cloud's dispersion is maximal: the lines that pass through the cloud's center of weight and have a maximal density of points around them (by defining an appropriate criterion function and minimizing it). This method can be successfully used to simplify the statistical analysis of questionnaires, because it helps us select from a set of items only the most relevant ones, those that cover the variation of the whole data set. For instance, in the presented sample we started from a questionnaire with 28 items and, applying principal component analysis, we identified 7 principal components, or main items, which simplifies the subsequent statistical analysis considerably.
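
    A minimal version of the computation described above, assuming standardised items and using the eigendecomposition of the correlation matrix, might look as follows; the 28-item response matrix is randomly generated purely for illustration, so the retained components carry no real meaning.

    import numpy as np

    def pca(X, n_components):
        """PCA of item responses via the correlation matrix.

        X -- respondents x items matrix (e.g., 28 questionnaire items).
        Returns explained-variance ratios and item loadings for the leading components.
        """
        R = np.corrcoef(X, rowvar=False)                 # item correlation matrix
        eigval, eigvec = np.linalg.eigh(R)               # eigenvalues in ascending order
        idx = np.argsort(eigval)[::-1][:n_components]    # largest components first
        explained = eigval[idx] / eigval.sum()
        loadings = eigvec[:, idx]
        return explained, loadings

    rng = np.random.default_rng(1)
    X = rng.normal(size=(120, 28))                       # hypothetical 28-item questionnaire
    explained, loadings = pca(X, n_components=7)
    print(explained.round(3))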

  10. Methods for Determining Particle Size Distributions from Nuclear Detonations.

    DTIC Science & Technology

    1987-03-01

    Excerpt from the report's contents: Summary of Sample Preparation Method; Set Parameters for PCS; Analysis by Vendors; Results From Brookhaven Analysis Using the Method of Cumulants; Results From Brookhaven Analysis of Sample R-3 Using the Histogram Method; Results From Brookhaven Analysis of Sample R-8 Using the Histogram Method; TEM Particle ...

  11. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    PubMed

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression with an empirical Bayes-based linear method and investigated gene set expression association with knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independence analysis to evaluate the expression association profile of genes and gene sets between studies and tissues. Our analysis showed that the PGRMC1 and HADH genes were significant across diabetes studies, while the IRS1 and MPST genes were significant across insulin response studies; joint analysis showed that the HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no pairwise studies differed significantly in expression association of gene sets. Our analysis indicated that there are both tissue-specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.

  12. Recent advances in quantitative high throughput and high content data analysis.

    PubMed

    Moutsatsos, Ioannis K; Parker, Christian N

    2016-01-01

    High throughput screening has become a basic technique with which to explore biological systems. Advances in technology, including increased screening capacity, as well as methods that generate multiparametric readouts, are driving the need for improvements in the analysis of data sets derived from such screens. This article covers the recent advances in the analysis of high throughput screening data sets from arrayed samples, as well as the recent advances in the analysis of cell-by-cell data sets derived from imaging or flow cytometry applications. Screening multiple genomic reagents targeting any given gene creates additional challenges, and so methods that prioritize individual gene targets have been developed. The article reviews many of the open source data analysis methods that are now available and which are helping to define a consensus on the best practices to use when analyzing screening data. As data sets become larger and more complex, the need for easily accessible data analysis tools will continue to grow. The presentation of such complex data sets, to facilitate quality-control monitoring and interpretation of the results, will require the development of novel visualizations. In addition, advanced statistical and machine learning algorithms that can help identify patterns, correlations and the best features in massive data sets will be required. The ease of use of these tools will be important, as they will need to be used iteratively by laboratory scientists to improve the outcomes of complex analyses.

  13. Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline

    PubMed Central

    2013-01-01

    Background As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. Results We applied 12 microarray meta-analysis methods for combining multiple simulated expression profiles; such methods can be categorized by hypothesis setting: (1) HSA: DE genes with non-zero effect sizes in all studies, (2) HSB: DE genes with non-zero effect sizes in one or more studies and (3) HSr: DE genes with non-zero effect in the "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated the hypothesis settings behind the methods and further applied multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. Conclusions The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HSA, HSB, and HSr). Evaluation in real data and results from the MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author's publication website. PMID:24359104

  14. Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline.

    PubMed

    Chang, Lun-Ching; Lin, Hui-Min; Sibille, Etienne; Tseng, George C

    2013-12-21

    As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. We applied 12 microarray meta-analysis methods for combining multiple simulated expression profiles; such methods can be categorized by hypothesis setting: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE genes with non-zero effect in the "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated the hypothesis settings behind the methods and further applied multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from the MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author's publication website.
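
    Two classic combination rules sit at opposite ends of these hypothesis settings and are easy to sketch: Fisher's method, which is sensitive when a gene is differentially expressed in one or more studies, and the maximum-p statistic, which requires a signal in essentially all studies. The sketch below is a generic illustration on made-up per-study p-values for one gene, not the benchmark code from the paper.

    import numpy as np
    from scipy import stats

    def fisher_combine(pvals):
        """Fisher's method: combine one gene's p-values from k independent studies."""
        pvals = np.asarray(pvals, dtype=float)
        chi2 = -2.0 * np.sum(np.log(pvals))
        return stats.chi2.sf(chi2, df=2 * len(pvals))

    def max_p_combine(pvals):
        """maxP: significant only if the gene shows a signal in all k studies."""
        p = max(pvals)
        return p ** len(pvals)        # P(max of k uniform p-values <= p)

    per_study_p = [0.01, 0.03, 0.20]  # made-up p-values for one gene in three studies
    print(fisher_combine(per_study_p), max_p_combine(per_study_p))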

  15. How to test validity in orthodontic research: a mixed dentition analysis example.

    PubMed

    Donatelli, Richard E; Lee, Shin-Jae

    2015-02-01

    The data used to test the validity of a prediction method should be different from the data used to generate the prediction model. In this study, we explored whether an independent data set is mandatory for testing the validity of a new prediction method and how validity can be tested without independent new data. Several validation methods were compared in an example using the data from a mixed dentition analysis with a regression model. The validation errors of real mixed dentition analysis data and simulation data were analyzed for increasingly large data sets. The validation results of both the real and the simulation studies demonstrated that the leave-1-out cross-validation method had the smallest errors. The largest errors occurred in the traditional simple validation method. The differences between the validation methods diminished as the sample size increased. The leave-1-out cross-validation method seems to be an optimal validation method for improving the prediction accuracy in a data set with limited sample sizes. Copyright © 2015 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
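
    The leave-one-out idea recommended above is simple to implement for a regression-based prediction: refit the model with each observation held out and record the held-out prediction error. The sketch below assumes a plain least-squares predictor on synthetic mixed-dentition-style numbers (sums of tooth widths), which are invented for illustration and are not the study data.

    import numpy as np

    def loocv_errors(X, y):
        """Leave-one-out cross-validation errors for an ordinary least-squares predictor."""
        X = np.column_stack([np.ones(len(X)), X])        # add intercept column
        errors = []
        for i in range(len(y)):
            keep = np.arange(len(y)) != i                # drop observation i
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            errors.append(y[i] - X[i] @ beta)            # held-out prediction error
        return np.asarray(errors)

    # Hypothetical data: predict unerupted canine/premolar widths from incisor widths
    rng = np.random.default_rng(0)
    x = rng.normal(22, 2, size=40)
    y = 10 + 0.45 * x + rng.normal(0, 0.8, size=40)
    err = loocv_errors(x.reshape(-1, 1), y)
    print("LOOCV RMSE:", np.sqrt(np.mean(err ** 2)).round(3))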

  16. Methods and apparatuses for information analysis on shared and distributed computing systems

    DOEpatents

    Bohn, Shawn J [Richland, WA; Krishnan, Manoj Kumar [Richland, WA; Cowley, Wendy E [Richland, WA; Nieplocha, Jarek [Richland, WA

    2011-02-22

    Apparatuses and computer-implemented methods for analyzing, on shared and distributed computing systems, information comprising one or more documents are disclosed according to some aspects. In one embodiment, information analysis can comprise distributing one or more distinct sets of documents among each of a plurality of processes, wherein each process performs operations on a distinct set of documents substantially in parallel with other processes. Operations by each process can further comprise computing term statistics for terms contained in each distinct set of documents, thereby generating a local set of term statistics for each distinct set of documents. Still further, operations by each process can comprise contributing the local sets of term statistics to a global set of term statistics, and participating in generating a major term set from an assigned portion of a global vocabulary.

  17. Clinical Trials With Large Numbers of Variables: Important Advantages of Canonical Analysis.

    PubMed

    Cleophas, Ton J

    2016-01-01

    Canonical analysis assesses the combined effects of a set of predictor variables on a set of outcome variables, but it is little used in clinical trials despite the omnipresence of multiple variables. The aim of this study was to assess the performance of canonical analysis as compared with traditional multivariate methods using multivariate analysis of covariance (MANCOVA). As an example, a simulated data file with 12 gene expression levels and 4 drug efficacy scores was used. The correlation coefficient between the 12 predictor and 4 outcome variables was 0.87 (P = 0.0001), meaning that 76% of the variability in the outcome variables was explained by the 12 covariates. Repeated testing after the removal of 5 unimportant predictor and 1 outcome variable produced virtually the same overall result. The MANCOVA identified identical unimportant variables, but it was unable to provide overall statistics. (1) Canonical analysis is remarkable, because it can handle many more variables than traditional multivariate methods such as MANCOVA can. (2) At the same time, it accounts for the relative importance of the separate variables, their interactions and differences in units. (3) Canonical analysis provides overall statistics of the effects of sets of variables, whereas traditional multivariate methods only provide the statistics of the separate variables. (4) Unlike other methods for combining the effects of multiple variables such as factor analysis/partial least squares, canonical analysis is scientifically entirely rigorous. (5) Limitations include that it is less flexible than factor analysis/partial least squares, because only 2 sets of variables are used and because multiple solutions instead of one are offered. We do hope that this article will stimulate clinical investigators to start using this remarkable method.

  18. Quantitative Analysis Tools and Digital Phantoms for Deformable Image Registration Quality Assurance.

    PubMed

    Kim, Haksoo; Park, Samuel B; Monroe, James I; Traughber, Bryan J; Zheng, Yiran; Lo, Simon S; Yao, Min; Mansur, David; Ellis, Rodney; Machtay, Mitchell; Sohn, Jason W

    2015-08-01

    This article proposes quantitative analysis tools and digital phantoms to quantify intrinsic errors of deformable image registration (DIR) systems and to establish quality assurance (QA) procedures for clinical use of DIR systems, utilizing local and global error analysis methods with clinically realistic digital image phantoms. Landmark-based image registration verifications are suitable only for images with significant feature points. To address this shortfall, we adapted a deformation vector field (DVF) comparison approach with new analysis techniques to quantify the results. Digital image phantoms are derived from data sets of actual patient images (a reference image set, R, and a test image set, T). Image sets from the same patient taken at different times are registered with deformable methods, producing a reference DVFref. Applying DVFref to the original reference image deforms T into a new image R'. The data set (R', T, and DVFref) forms a realistic truth set and can therefore be used to analyze any DIR system and expose intrinsic errors by comparing DVFref and DVFtest. For quantitative error analysis, that is, calculating and delineating differences between DVFs, 2 methods were used: (1) a local error analysis tool that displays deformation error magnitudes with color mapping on each image slice and (2) a global error analysis tool that calculates a deformation error histogram, which describes a cumulative probability function of errors for each anatomical structure. Three digital image phantoms were generated from three patients with head and neck, lung, and liver cancers. The DIR QA was evaluated using the head and neck case. © The Author(s) 2014.
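
    A stripped-down version of the two error-analysis steps described above, assuming displacement vector fields stored as (z, y, x, 3) arrays, could look as follows; the fields here are random numbers rather than registration output, so the resulting values are meaningless and serve only to show the computation.

    import numpy as np

    def dvf_error_maps(dvf_ref, dvf_test, mask=None):
        """Local and global DVF error analysis.

        dvf_ref, dvf_test -- arrays of shape (z, y, x, 3) holding displacement vectors.
        mask              -- optional boolean array selecting one anatomical structure.
        Returns the per-voxel error-magnitude map (for slice-by-slice colour display)
        and the cumulative error histogram for the selected region.
        """
        err = np.linalg.norm(dvf_test - dvf_ref, axis=-1)   # local error magnitude per voxel
        sel = err[mask] if mask is not None else err.ravel()
        edges = np.linspace(0, sel.max(), 51)
        hist, _ = np.histogram(sel, bins=edges)
        cumulative = np.cumsum(hist) / sel.size             # cumulative probability of errors
        return err, edges[1:], cumulative

    rng = np.random.default_rng(3)
    ref = rng.normal(size=(10, 32, 32, 3))                   # stand-in reference DVF
    test = ref + rng.normal(scale=0.5, size=ref.shape)       # stand-in test DVF
    err_map, bins, cdf = dvf_error_maps(ref, test)
    print(err_map.mean().round(2), float(cdf[-1]))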

  19. Autoregressive modeling for the spectral analysis of oceanographic data

    NASA Technical Reports Server (NTRS)

    Gangopadhyay, Avijit; Cornillon, Peter; Jackson, Leland B.

    1989-01-01

    Over the last decade there has been a dramatic increase in the number and volume of data sets useful for oceanographic studies. Many of these data sets consist of long temporal or spatial series derived from satellites and large-scale oceanographic experiments. These data sets are, however, often 'gappy' in space, irregular in time, and always of finite length. The conventional Fourier transform (FT) approach to the spectral analysis is thus often inapplicable, or where applicable, it provides questionable results. Here, through comparative analysis with the FT for different oceanographic data sets, the possibilities offered by autoregressive (AR) modeling to perform spectral analysis of gappy, finite-length series, are discussed. The applications demonstrate that as the length of the time series becomes shorter, the resolving power of the AR approach as compared with that of the FT improves. For the longest data sets examined here, 98 points, the AR method performed only slightly better than the FT, but for the very short ones, 17 points, the AR method showed a dramatic improvement over the FT. The application of the AR method to a gappy time series, although a secondary concern of this manuscript, further underlines the value of this approach.
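
    A compact illustration of the AR approach, assuming a regularly sampled series, is to fit the AR coefficients from the Yule-Walker equations and evaluate the model spectrum. The 17-point sinusoid below mirrors the short-series situation discussed in the abstract but is otherwise arbitrary, and the handling of gappy sampling is not reproduced here.

    import numpy as np

    def ar_spectrum(x, order, n_freq=256):
        """Autoregressive spectral estimate via the Yule-Walker equations."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        # Biased autocovariance estimates r[0..order]
        r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R, r[1:order + 1])            # AR coefficients
        sigma2 = r[0] - np.dot(a, r[1:order + 1])         # innovation variance
        freqs = np.linspace(0, 0.5, n_freq)               # cycles per sample
        z = np.exp(-2j * np.pi * np.outer(freqs, np.arange(1, order + 1)))
        spectrum = sigma2 / np.abs(1 - z @ a) ** 2
        return freqs, spectrum

    t = np.arange(17)                                     # deliberately short series
    x = np.sin(2 * np.pi * 0.15 * t) + 0.3 * np.random.default_rng(2).normal(size=17)
    f, s = ar_spectrum(x, order=4)
    print(f[np.argmax(s)])                                # dominant frequency estimate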

  20. Improved omit set displacement recoveries in dynamic analysis

    NASA Technical Reports Server (NTRS)

    Allen, Tom; Cook, Greg; Walls, Bill

    1993-01-01

    Two related methods for improving the dependent (OMIT set) displacements after performing a Guyan reduction are presented. The theoretical bases for the methods are derived. The NASTRAN DMAP ALTERs used to implement the methods in a NASTRAN execution are described. Data are presented that verify the methods and the NASTRAN DMAP ALTERs.

  1. Estimating a test's accuracy using tailored meta-analysis: how setting-specific data may aid study selection.

    PubMed

    Willis, Brian H; Hyde, Christopher J

    2014-05-01

    To determine a plausible estimate for a test's performance in a specific setting using a new method for selecting studies. It is shown how routine data from practice may be used to define an "applicable region" for studies in receiver operating characteristic space. After qualitative appraisal, studies are selected based on the probability that their study accuracy estimates arose from parameters lying in this applicable region. Three methods for calculating these probabilities are developed and used to tailor the selection of studies for meta-analysis. The Pap test applied to the UK National Health Service (NHS) Cervical Screening Programme provides a case example. The meta-analysis for the Pap test included 68 studies, but at most 17 studies were considered applicable to the NHS. For conventional meta-analysis, the sensitivity and specificity (with 95% confidence intervals) were estimated to be 72.8% (65.8, 78.8) and 75.4% (68.1, 81.5) compared with 50.9% (35.8, 66.0) and 98.0% (95.4, 99.1) from tailored meta-analysis using a binomial method for selection. Thus, for a cervical intraepithelial neoplasia (CIN) 1 prevalence of 2.2%, the post-test probability for CIN 1 would increase from 6.2% to 36.6% between the two methods of meta-analysis. Tailored meta-analysis provides a method for augmenting study selection based on the study's applicability to a setting. As such, the summary estimate is more likely to be plausible for a setting and could improve diagnostic prediction in practice. Copyright © 2014 Elsevier Inc. All rights reserved.
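
    The post-test probabilities quoted above follow directly from Bayes' rule applied to the pooled sensitivity, specificity, and the 2.2% CIN 1 prevalence; the short calculation below reproduces the roughly 6% versus roughly 36% figures (small differences are due to rounding of the pooled estimates).

    def post_test_probability(sensitivity, specificity, prevalence):
        """Probability of disease after a positive test (positive predictive value)."""
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    prev = 0.022  # CIN 1 prevalence quoted in the abstract
    print(round(post_test_probability(0.728, 0.754, prev), 3))  # conventional meta-analysis: ~0.06
    print(round(post_test_probability(0.509, 0.980, prev), 3))  # tailored meta-analysis:     ~0.36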

  2. Regression Analysis and Calibration Recommendations for the Characterization of Balance Temperature Effects

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Volden, T.

    2018-01-01

    Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods are presented and compared that may be used to process temperature-dependent strain-gage balance data. The first method uses an extended set of independent variables in order to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.

  3. Application of economic principles in healthcare priority setting.

    PubMed

    Bate, Angela; Mitton, Craig

    2006-06-01

    In healthcare, resources are often insufficient to meet all claims on them. In this respect, resources are considered scarce and have to be managed by prioritizing between competing claims. Economics as a discipline explicitly addresses this reality by acknowledging resource scarcity. However, the extent to which economics actually influences such prioritizing decisions in healthcare is unclear. The purpose of this paper is to review the use of economics in priority setting decision making. We outline the key principles of economics as they apply to priority setting and review the methods reported in the literature with respect to these. We find that these methods, even economic methods (e.g., those typically used in conducting economic evaluations such as cost-effectiveness analyses) do not tend to explicitly incorporate economic principles. We argue therefore that these methods, when applied to the context of priority setting, are not sufficient and that what is required is a broader framework that can incorporate the output from economic methods yet also be pragmatically applicable. We then go on to present an alternative approach - namely program budgeting and marginal analysis. Finally, we put forward our case for using program budgeting and marginal analysis in priority setting practice and set out some future research challenges.

  4. 16th IHIW: Global analysis of registry HLA haplotypes from 20 Million individuals: Report from the IHIW Registry Diversity Group

    PubMed Central

    Maiers, M; Gragert, L; Madbouly, A; Steiner, D; Marsh, S G E; Gourraud, P-A; Oudshoorn, M; Zanden, H; Schmidt, A H; Pingel, J; Hofmann, J; Müller, C; Eberhard, H-P

    2013-01-01

    This project has the goal to validate bioinformatics methods and tools for HLA haplotype frequency analysis specifically addressing unique issues of haematopoietic stem cell registry data sets. In addition to generating new methods and tools for the analysis of registry data sets, the intent is to produce a comprehensive analysis of HLA data from 20 million donors from the Bone Marrow Donors Worldwide (BMDW) database. This report summarizes the activity on this project as of the 16IHIW meeting in Liverpool. PMID:23280139

  5. An analysis of possible applications of fuzzy set theory to the actuarial credibility theory

    NASA Technical Reports Server (NTRS)

    Ostaszewski, Krzysztof; Karwowski, Waldemar

    1992-01-01

    In this work, we review the basic concepts of actuarial credibility theory from the point of view of introducing applications of the fuzzy set-theoretic method. We show how the concept of actuarial credibility can be modeled through the fuzzy set membership functions and how fuzzy set methods, especially fuzzy pattern recognition, can provide an alternative tool for estimating credibility.

  6. Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments

    PubMed Central

    Maza, Elie; Frasse, Pierre; Senin, Pavel; Bouzayen, Mondher; Zouine, Mohamed

    2013-01-01

    In recent years, RNA-Seq technologies became a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of a normalization procedure leads to great variability in the results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of the studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Analyses of real RNA-Seq data sets, performed with all the different normalization methods, show that only 50% of the significantly differentially expressed genes are common. This result highlights the influence of the normalization step on the differential expression analysis. Analyses of real and simulated data sets give similar results, revealing 3 groups of procedures with the same behavior. The group including the novel method, named "Median Ratio Normalization" (MRN), gives the lowest number of false discoveries. Within this group the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes, such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with the intrinsic bias resulting from the relative size of the studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods. PMID:26442135
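
    The family of procedures that MRN belongs to scales each library by a factor derived from gene-wise ratios to a reference. The sketch below shows only the classical median-of-ratios computation on a toy count matrix; it is meant to convey the idea of ratio-based size factors and is not the MRN algorithm itself, which modifies this scheme.

    import numpy as np

    def median_of_ratios_size_factors(counts):
        """Classical median-of-ratios size factors (one factor per sample).

        counts -- genes x samples matrix of raw counts.
        """
        counts = np.asarray(counts, dtype=float)
        with np.errstate(divide="ignore"):
            log_counts = np.log(counts)
        log_geo_mean = log_counts.mean(axis=1)              # per-gene log geometric mean
        usable = np.isfinite(log_geo_mean)                  # drop genes with a zero count
        ratios = log_counts[usable] - log_geo_mean[usable, None]
        return np.exp(np.median(ratios, axis=0))            # median log-ratio per sample

    counts = np.array([[100,  210,  95],
                       [ 50,  110,  40],
                       [  0,    5,   1],
                       [800, 1500, 760]])
    sf = median_of_ratios_size_factors(counts)
    print(sf, (counts / sf)[:2])                             # size factors and normalised counts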

  7. The Impact of Normalization Methods on RNA-Seq Data Analysis

    PubMed Central

    Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.

    2015-01-01

    High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014

  8. David W. Templeton | NREL

    Science.gov Websites

    Staff profile excerpt: algal biomass analysis methods and applications of these methods to different processes; an internally funded research project to develop microalgal compositional analysis methods; closing mass and component balances around pretreatment, saccharification, and fermentation unit ...

  9. Regularized Generalized Canonical Correlation Analysis

    ERIC Educational Resources Information Center

    Tenenhaus, Arthur; Tenenhaus, Michel

    2011-01-01

    Regularized generalized canonical correlation analysis (RGCCA) is a generalization of regularized canonical correlation analysis to three or more sets of variables. It constitutes a general framework for many multi-block data analysis methods. It combines the power of multi-block data analysis methods (maximization of well identified criteria) and…

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

    Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorials explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.

  11. A Critical Analysis of the Body of Work Method for Setting Cut-Scores

    ERIC Educational Resources Information Center

    Radwan, Nizam; Rogers, W. Todd

    2006-01-01

    The recent increase in the use of constructed-response items in educational assessment and the dissatisfaction with the nature of the decision that the judges must make using traditional standard-setting methods created a need to develop new and effective standard-setting procedures for tests that include both multiple-choice and…

  12. Machine learning applications in genetics and genomics.

    PubMed

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  13. Multivariate analysis in thoracic research.

    PubMed

    Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego

    2015-03-01

    Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods emerged to analyze large databases and increasingly complex data. Since modeling is the best way to represent knowledge of reality, we should use multivariate statistical methods. Multivariate methods are designed to analyze data sets simultaneously, i.e., to analyze different variables for each person or object studied. Keep in mind at all times that all variables must be treated in a way that accurately reflects the reality of the problem addressed. There are different types of multivariate analysis, and each one should be employed according to the type of variables being analyzed: dependence, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and for finding cause-and-effect relationships between variables; there is a wide range of analysis types that we can use.

  14. ADAGE signature analysis: differential expression analysis with data-defined gene sets.

    PubMed

    Tan, Jie; Huyck, Matthew; Hu, Dongbo; Zelaya, René A; Hogan, Deborah A; Greene, Casey S

    2017-11-22

    Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on cystic fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.

  15. Implementation of Steiner point of fuzzy set.

    PubMed

    Liang, Jiuzhen; Wang, Dejiang

    2014-01-01

    This paper deals with the implementation of the Steiner point of a fuzzy set. Some definitions and properties of the Steiner point are investigated and extended to fuzzy sets. The paper focuses on establishing efficient methods to compute the Steiner point of a fuzzy set, and two strategies are proposed. One is a linear combination of Steiner points computed from a series of crisp α-cut sets of the fuzzy set. The other is an approximate method that tries to find the optimal α-cut set approaching the fuzzy set. Stability analysis of the Steiner point of a fuzzy set is also studied. Some experiments on image processing are given, in which the two methods are applied to implement the Steiner point of a fuzzy image; both strategies show their own advantages in computing the Steiner point of a fuzzy set.

  16. Transformation-cost time-series method for analyzing irregularly sampled data

    NASA Astrophysics Data System (ADS)

    Ozken, Ibrahim; Eroglu, Deniz; Stemler, Thomas; Marwan, Norbert; Bagci, G. Baris; Kurths, Jürgen

    2015-06-01

    Irregular sampling of data sets is one of the challenges often encountered in time-series analysis, since traditional methods cannot be applied and the frequently used interpolation approach can corrupt the data and bias the subsequence analysis. Here we present the TrAnsformation-Cost Time-Series (TACTS) method, which allows us to analyze irregularly sampled data sets without degenerating the quality of the data set. Instead of using interpolation we consider time-series segments and determine how close they are to each other by determining the cost needed to transform one segment into the following one. Using a limited set of operations—with associated costs—to transform the time series segments, we determine a new time series, that is our transformation-cost time series. This cost time series is regularly sampled and can be analyzed using standard methods. While our main interest is the analysis of paleoclimate data, we develop our method using numerical examples like the logistic map and the Rössler oscillator. The numerical data allows us to test the stability of our method against noise and for different irregular samplings. In addition we provide guidance on how to choose the associated costs based on the time series at hand. The usefulness of the TACTS method is demonstrated using speleothem data from the Secret Cave in Borneo that is a good proxy for paleoclimatic variability in the monsoon activity around the maritime continent.

  17. Transformation-cost time-series method for analyzing irregularly sampled data.

    PubMed

    Ozken, Ibrahim; Eroglu, Deniz; Stemler, Thomas; Marwan, Norbert; Bagci, G Baris; Kurths, Jürgen

    2015-06-01

    Irregular sampling of data sets is one of the challenges often encountered in time-series analysis, since traditional methods cannot be applied and the frequently used interpolation approach can corrupt the data and bias the subsequence analysis. Here we present the TrAnsformation-Cost Time-Series (TACTS) method, which allows us to analyze irregularly sampled data sets without degenerating the quality of the data set. Instead of using interpolation we consider time-series segments and determine how close they are to each other by determining the cost needed to transform one segment into the following one. Using a limited set of operations-with associated costs-to transform the time series segments, we determine a new time series, that is our transformation-cost time series. This cost time series is regularly sampled and can be analyzed using standard methods. While our main interest is the analysis of paleoclimate data, we develop our method using numerical examples like the logistic map and the Rössler oscillator. The numerical data allows us to test the stability of our method against noise and for different irregular samplings. In addition we provide guidance on how to choose the associated costs based on the time series at hand. The usefulness of the TACTS method is demonstrated using speleothem data from the Secret Cave in Borneo that is a good proxy for paleoclimatic variability in the monsoon activity around the maritime continent.

  18. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation.

    PubMed

    Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng

    2009-04-21

    In this paper, we predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two widely used data sets were employed, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as a feature representation of the protein sequence. Based on this feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods. In particular, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than in other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.

  19. A Comparison of Two Balance Calibration Model Building Methods

    NASA Technical Reports Server (NTRS)

    DeLoach, Richard; Ulbrich, Norbert

    2007-01-01

    Simulated strain-gage balance calibration data is used to compare the accuracy of two balance calibration model building methods for different noise environments and calibration experiment designs. The first building method obtains a math model for the analysis of balance calibration data after applying a candidate math model search algorithm to the calibration data set. The second building method uses stepwise regression analysis in order to construct a model for the analysis. Four balance calibration data sets were simulated in order to compare the accuracy of the two math model building methods. The simulated data sets were prepared using the traditional One Factor At a Time (OFAT) technique and the Modern Design of Experiments (MDOE) approach. Random and systematic errors were introduced in the simulated calibration data sets in order to study their influence on the math model building methods. Residuals of the fitted calibration responses and other statistical metrics were compared in order to evaluate the calibration models developed with different combinations of noise environment, experiment design, and model building method. Overall, predicted math models and residuals of both math model building methods show very good agreement. Significant differences in model quality were attributable to noise environment, experiment design, and their interaction. Generally, the addition of systematic error significantly degraded the quality of calibration models developed from OFAT data by either method, but MDOE experiment designs were more robust with respect to the introduction of a systematic component of the unexplained variance.

  20. Evaluation of peak-picking algorithms for protein mass spectrometry.

    PubMed

    Bauer, Chris; Cramer, Rainer; Schuchhardt, Johannes

    2011-01-01

    Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
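
    To make two of the compared strategies concrete, the sketch below applies a simple signal-to-noise threshold and SciPy's continuous wavelet transform picker to a toy spectrum. The synthetic peaks, the robust noise estimate, and the thresholds are illustrative assumptions, not settings from the study.

      import numpy as np
      from scipy.signal import find_peaks, find_peaks_cwt

      mz = np.linspace(1000, 1100, 2000)
      spectrum = (np.exp(-(mz - 1020) ** 2 / 0.5)
                  + 0.6 * np.exp(-(mz - 1065) ** 2 / 0.5)
                  + np.random.default_rng(0).normal(0, 0.02, mz.size))

      # (1) signal-to-noise ratio: keep local maxima exceeding k times the noise level
      noise = np.median(np.abs(spectrum - np.median(spectrum)))   # robust noise estimate
      snr_idx, _ = find_peaks(spectrum, height=5 * noise)

      # (2) continuous wavelet transform picker
      cwt_idx = find_peaks_cwt(spectrum, widths=np.arange(1, 20))

      print("S/N picks:", mz[snr_idx])
      print("CWT picks:", mz[cwt_idx])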

  1. MetaboTools: A comprehensive toolbox for analysis of genome-scale metabolic models

    DOE PAGES

    Aurich, Maike K.; Fleming, Ronan M. T.; Thiele, Ines

    2016-08-03

    Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Previous work, by us and others, revealed the potential of analyzing extracellular metabolomic data in the context of the metabolic model using constraint-based modeling. With the MetaboTools, we make our methods available to the broader scientific community. The MetaboTools consist of a protocol, a toolbox, and tutorials of two use cases. The protocol describes, in a step-wise manner, the workflow of data integration, and computational analysis. The MetaboTools comprise the Matlab code required to complete the workflow described in the protocol. Tutorials explain the computational steps for integration of two different data sets and demonstrate a comprehensive set of methods for the computational analysis of metabolic models and stratification thereof into different phenotypes. The presented workflow supports integrative analysis of multiple omics data sets. Importantly, all analysis tools can be applied to metabolic models without performing the entire workflow. Taken together, the MetaboTools constitute a comprehensive guide to the intra-model analysis of extracellular metabolomic data from microbial, plant, or human cells. In conclusion, this computational modeling resource offers a broad set of computational analysis tools for a wide biomedical and non-biomedical research community.

  2. A Survey of Popular R Packages for Cluster Analysis

    ERIC Educational Resources Information Center

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…

  3. Weighted partial least squares based on the error and variance of the recovery rate in calibration set.

    PubMed

    Yu, Shaohui; Xiao, Xue; Ding, Hong; Xu, Ge; Li, Haixia; Liu, Jing

    2017-08-05

    Quantitative analysis is very difficult for the excitation-emission fluorescence spectroscopy of multi-component mixtures whose fluorescence peaks overlap severely. As an effective method for quantitative analysis, partial least squares can extract the latent variables from both the independent variables and the dependent variables, so it can model multiple correlations between variables. However, several factors usually affect the prediction results of partial least squares, such as noise and the distribution and number of samples in the calibration set. This work focuses on the problems in the calibration set mentioned above. Firstly, the outliers in the calibration set are removed by leave-one-out cross-validation. Then, according to two different prediction requirements, the EWPLS method and the VWPLS method are proposed. The independent variables and dependent variables are weighted in the EWPLS method by the maximum error of the recovery rate and weighted in the VWPLS method by the maximum variance of the recovery rate. Three organic compounds with severely overlapping excitation-emission fluorescence spectra are selected for the experiments. The step adjustment parameter, the iteration number and the sample amount in the calibration set are discussed. The results show that the EWPLS method and the VWPLS method are superior to the PLS method, especially for the case of small samples in the calibration set. Copyright © 2017 Elsevier B.V. All rights reserved.
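
    A minimal sketch of the weighting idea is given below: column weights are applied to the calibration matrices before an ordinary PLS fit. The weight vectors wx and wy, the component count, and the scaling scheme are illustrative assumptions and do not reproduce the exact EWPLS/VWPLS recovery-rate formulas.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      def weighted_pls(X, Y, x_weights, y_weights, n_components=3):
          Xw = X * x_weights          # scale independent variables (spectral channels)
          Yw = Y * y_weights          # scale dependent variables (concentrations)
          return PLSRegression(n_components=n_components).fit(Xw, Yw)

      # Prediction for a new spectrum x_new (weights must be applied consistently):
      # model = weighted_pls(X, Y, wx, wy)
      # y_pred = model.predict((x_new * wx).reshape(1, -1)) / wy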

  4. Probabilistic finite elements for transient analysis in nonlinear continua

    NASA Technical Reports Server (NTRS)

    Liu, W. K.; Belytschko, T.; Mani, A.

    1985-01-01

    The probabilistic finite element method (PFEM), which is a combination of finite element methods and second-moment analysis, is formulated for linear and nonlinear continua with inhomogeneous random fields. Analogous to the discretization of the displacement field in finite element methods, the random field is also discretized. The formulation is simplified by transforming the correlated variables to a set of uncorrelated variables through an eigenvalue orthogonalization. Furthermore, it is shown that a reduced set of the uncorrelated variables is sufficient for the second-moment analysis. Based on the linear formulation of the PFEM, the method is then extended to transient analysis in nonlinear continua. The accuracy and efficiency of the method are demonstrated by application to a one-dimensional, elastic/plastic wave propagation problem. The moments calculated compare favorably with those obtained by Monte Carlo simulation. Also, the procedure is amenable to implementation in deterministic FEM based computer programs.
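
    The eigenvalue orthogonalization step mentioned above can be illustrated with a few lines of NumPy: correlated random variables with covariance C are mapped onto uncorrelated variables, and only the leading modes are retained for the second-moment analysis. The matrix size, the synthetic covariance, and the truncation level k are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(1)
      A = rng.normal(size=(6, 6))
      C = A @ A.T                                  # synthetic positive-definite covariance of the discretized field
      eigvals, eigvecs = np.linalg.eigh(C)         # C = V diag(lambda) V^T

      order = np.argsort(eigvals)[::-1]            # sort modes by variance contribution
      k = 3                                        # reduced set of uncorrelated variables
      V_k = eigvecs[:, order[:k]]

      samples = rng.multivariate_normal(np.zeros(6), C, size=10000)
      uncorrelated = samples @ V_k                 # transformed variables
      print(np.round(np.cov(uncorrelated.T), 2))   # approximately diagonal covariance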

  5. Improvements to direct quantitative analysis of multiple microRNAs facilitating faster analysis.

    PubMed

    Ghasemi, Farhad; Wegman, David W; Kanoatov, Mirzo; Yang, Burton B; Liu, Stanley K; Yousef, George M; Krylov, Sergey N

    2013-11-05

    Studies suggest that patterns of deregulation in sets of microRNA (miRNA) can be used as cancer diagnostic and prognostic biomarkers. Establishing a "miRNA fingerprint"-based diagnostic technique requires a suitable miRNA quantitation method. The appropriate method must be direct, sensitive, capable of simultaneous analysis of multiple miRNAs, rapid, and robust. Direct quantitative analysis of multiple microRNAs (DQAMmiR) is a recently introduced capillary electrophoresis-based hybridization assay that satisfies most of these criteria. Previous implementations of the method suffered, however, from slow analysis time and required lengthy and stringent purification of hybridization probes. Here, we introduce a set of critical improvements to DQAMmiR that address these technical limitations. First, we have devised an efficient purification procedure that achieves the required purity of the hybridization probe in a fast and simple fashion. Second, we have optimized the concentrations of the DNA probe to decrease the hybridization time to 10 min. Lastly, we have demonstrated that the increased probe concentrations and decreased incubation time removed the need for masking DNA, further simplifying the method and increasing its robustness. The presented improvements bring DQAMmiR closer to use in a clinical setting.

  6. Comparison of Seven Methods for Boolean Factor Analysis and Their Evaluation by Information Gain.

    PubMed

    Frolov, Alexander A; Húsek, Dušan; Polyakov, Pavel Yu

    2016-03-01

    A common task in large data set analysis is searching for an appropriate data representation in a space of fewer dimensions. One of the most efficient methods to solve this task is factor analysis. In this paper, we compare seven methods for Boolean factor analysis (BFA) in solving the so-called bars problem (BP), which is a BFA benchmark. The performance of the methods is evaluated by means of information gain. Study of the results obtained in solving the BP at different levels of complexity has allowed us to reveal the strengths and weaknesses of these methods. It is shown that the Likelihood maximization Attractor Neural Network with Increasing Activity (LANNIA) is the most efficient BFA method for solving the BP in many cases. The efficacy of the LANNIA method is also shown when applied to real data from the Kyoto Encyclopedia of Genes and Genomes database, which contains full genome sequencing for 1368 organisms, and to the text data set R52 (from Reuters 21578), typically used for label categorization.

  7. Python package for model STructure ANalysis (pySTAN)

    NASA Astrophysics Data System (ADS)

    Van Hoey, Stijn; van der Kwast, Johannes; Nopens, Ingmar; Seuntjens, Piet

    2013-04-01

    The selection and identification of a suitable hydrological model structure is more than fitting parameters of a model structure to reproduce a measured hydrograph. The procedure is highly dependent on various criteria, i.e. the modelling objective, the characteristics and the scale of the system under investigation as well as the available data. Rigorous analysis of the candidate model structures is needed to support and objectify the selection of the most appropriate structure for a specific case (or eventually justify the use of a proposed ensemble of structures). This holds both in the situation of choosing between a limited set of different structures and in the framework of flexible model structures with interchangeable components. Many different methods to evaluate and analyse model structures exist. This leads to a sprawl of available methods, all characterized by different assumptions, changing conditions of application and various code implementations. Methods typically focus on optimization, sensitivity analysis or uncertainty analysis, with backgrounds from optimization, machine-learning or statistics amongst others. These methods also need an evaluation metric (objective function) to compare the model outcome with some observed data. However, for current methods described in literature, implementations are not always transparent and reproducible (if available at all). No standard procedures exist to share code and the popularity (and number of applications) of the methods is sometimes more dependent on the availability than the merits of the method. Moreover, new implementations of existing methods are difficult to verify and the different theoretical backgrounds make it difficult for environmental scientists to decide about the usefulness of a specific method. A common and open framework with a large set of methods can support users in deciding about the most appropriate method. Hence, it enables users to simultaneously apply and compare different methods on a fair basis. We developed and present pySTAN (python framework for STructure Analysis), a python package containing a set of functions for model structure evaluation to provide the analysis of (hydrological) model structures. A selected set of algorithms for optimization, uncertainty and sensitivity analysis is currently available, together with a set of evaluation (objective) functions and input distributions to sample from. The methods are implemented in a model-independent way, and the python language provides the wrapper functions to administer external model codes. Different objective functions can be considered simultaneously with both statistical metrics and more hydrology specific metrics. By using so-called reStructuredText (sphinx documentation generator) and Python documentation strings (docstrings), the generation of manual pages is semi-automated and a specific environment is available to enhance both the readability and transparency of the code. It thereby enables a larger group of users to apply and compare these methods and to extend the functionalities.

  8. Coding and Commonality Analysis: Non-ANOVA Methods for Analyzing Data from Experiments.

    ERIC Educational Resources Information Center

    Thompson, Bruce

    The advantages and disadvantages of three analytic methods used to analyze experimental data in educational research are discussed. The same hypothetical data set is used with all methods for a direct comparison. The Analysis of Variance (ANOVA) method and its several analogs are collectively labeled OVA methods and are evaluated. Regression…

  9. Comparison of Motion Blur Measurement Methods

    NASA Technical Reports Server (NTRS)

    Watson, Andrew B.

    2008-01-01

    Motion blur is a significant display property for which accurate, valid measurement methods are needed. Recent measurements of a set of eight displays by a set of six measurement devices provide an opportunity to evaluate techniques of measurement and of the analysis of those measurements.

  10. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries.
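
    One of the best-performing metrics named above, the absolute signal-to-noise ratio, is simple to compute per gene for a two-group comparison, as in the sketch below. The inputs expr (genes x samples) and groups (boolean case/control labels) are assumed variable names.

      import numpy as np

      def abs_signal_to_noise(expr, groups):
          """expr: genes x samples matrix; groups: boolean array, True = case."""
          case, ctrl = expr[:, groups], expr[:, ~groups]
          s2n = (case.mean(axis=1) - ctrl.mean(axis=1)) / (
              case.std(axis=1, ddof=1) + ctrl.std(axis=1, ddof=1))
          return np.abs(s2n)

      # ranking = np.argsort(-abs_signal_to_noise(expr, groups))  # gene ranking for enrichment analysis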

  11. Integrating Cross-Case Analyses and Process Tracing in Set-Theoretic Research: Strategies and Parameters of Debate

    ERIC Educational Resources Information Center

    Beach, Derek; Rohlfing, Ingo

    2018-01-01

    In recent years, there has been increasing interest in the combination of two methods on the basis of set theory. In our introduction and this special issue, we focus on two variants of cross-case set-theoretic methods--"qualitative comparative analysis" (QCA) and typological theory (TT)--and their combination with process tracing (PT).…

  12. Comparison of fMRI analysis methods for heterogeneous BOLD responses in block design studies.

    PubMed

    Liu, Jia; Duffy, Ben A; Bernal-Casas, David; Fang, Zhongnan; Lee, Jin Hyung

    2017-02-15

    A large number of fMRI studies have shown that the temporal dynamics of evoked BOLD responses can be highly heterogeneous. Failing to model heterogeneous responses in statistical analysis can lead to significant errors in signal detection and characterization and alter the neurobiological interpretation. However, to date it is not clear which methods, out of a large number of options, are robust against variability in the temporal dynamics of BOLD responses in block-design studies. Here, we used rodent optogenetic fMRI data with heterogeneous BOLD responses and simulations guided by experimental data as a means to investigate different analysis methods' performance against heterogeneous BOLD responses. Evaluations are carried out within the general linear model (GLM) framework and consist of standard basis sets as well as independent component analysis (ICA). Analyses show that, in the presence of heterogeneous BOLD responses, the conventionally used GLM with a canonical basis set leads to considerable errors in the detection and characterization of BOLD responses. Our results suggest that the 3rd and 4th order gamma basis sets, the 7th to 9th order finite impulse response (FIR) basis sets, the 5th to 9th order B-spline basis sets, and the 2nd to 5th order Fourier basis sets are optimal for a good balance between detection and characterization, while the 1st order Fourier basis set (coherence analysis) used in our earlier studies shows good detection capability. ICA has mostly good detection and characterization capabilities, but detects a large volume of spurious activation with the control fMRI data. Copyright © 2016 Elsevier Inc. All rights reserved.
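
    A minimal sketch of one of the basis sets compared above, the finite impulse response (FIR) set, is shown below: each regressor models the response at one post-stimulus scan. The scan count, stimulus onsets, and FIR order are arbitrary illustrative values, not the study's settings.

      import numpy as np

      def fir_design(n_scans, onsets, order):
          """FIR design matrix: one column per post-stimulus lag (0 .. order-1 scans)."""
          X = np.zeros((n_scans, order))
          for onset in onsets:
              for lag in range(order):
                  if onset + lag < n_scans:
                      X[onset + lag, lag] = 1.0
          return X

      X_fir = fir_design(n_scans=200, onsets=[20, 80, 140], order=8)
      # X_fir can be placed in a GLM design matrix alongside drift and nuisance regressors.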

  13. [Local Regression Algorithm Based on Net Analyte Signal and Its Application in Near Infrared Spectral Analysis].

    PubMed

    Zhang, Hong-guang; Lu, Jian-gang

    2016-02-01

    To overcome the problems of significant differences among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, the net analyte signal (NAS) method was first used to obtain the net analyte signal of the calibration samples and unknown samples; then the Euclidean distance between the net analyte signal of a sample and the net analyte signals of the calibration samples was calculated and used as a similarity index. According to the defined similarity index, a local calibration set was individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration set for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to the global PLS regression method and the conventional local regression algorithm based on spectral Euclidean distance.
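
    The local-regression idea can be sketched as follows: for each unknown sample, select the most similar calibration samples and fit a local PLS model on them. The similarity index used in this sketch is plain spectral Euclidean distance, i.e. the conventional approach the paper improves on; the paper's method would replace it with a net-analyte-signal distance. The neighbourhood size k and component count are assumptions.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      def local_pls_predict(X_cal, y_cal, x_new, k=30, n_components=5):
          d = np.linalg.norm(X_cal - x_new, axis=1)        # similarity index (spectral distance)
          nearest = np.argsort(d)[:k]                      # local calibration set
          model = PLSRegression(n_components=n_components)
          model.fit(X_cal[nearest], y_cal[nearest])
          return float(model.predict(x_new.reshape(1, -1)).ravel()[0])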

  14. Screening and Evaluation Tool (SET) Users Guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pincock, Layne

    This document is the user's guide for the Screening and Evaluation Tool (SET). SET is a tool for comparing multiple fuel cycle options against a common set of criteria and metrics. It does this using standard multi-attribute utility decision analysis methods.

  15. Development and application of computer assisted optimal method for treatment of femoral neck fracture.

    PubMed

    Wang, Monan; Zhang, Kai; Yang, Ning

    2018-04-09

    To help doctors decide on treatment from the aspect of mechanical analysis, this work built a computer-assisted optimization system for the treatment of femoral neck fracture oriented to clinical application. The whole system encompassed three parts: a preprocessing module, a finite element mechanical analysis module, and a post-processing module. The preprocessing module included parametric modeling of bone, parametric modeling of the fracture face, parametric modeling of the fixation screw and fixation position, and input and transmission of model parameters. The finite element mechanical analysis module included grid division, element type setting, material property setting, contact setting, constraint and load setting, analysis method setting and batch processing operation. The post-processing module included extraction and display of batch processing results, image generation for batch processing, optimization program operation and display of the optimization results. The system implemented the whole workflow from input of fracture parameters to output of the optimal fixation plan according to the specific patient's real fracture parameters and the optimization rules, which demonstrated the effectiveness of the system. Meanwhile, the system had a friendly interface and simple operation, and the system function could be improved quickly by modifying a single module.

  16. Data mining and computationally intensive methods: summary of Group 7 contributions to Genetic Analysis Workshop 13.

    PubMed

    Costello, Tracy J; Falk, Catherine T; Ye, Kenny Q

    2003-01-01

    The Framingham Heart Study data, as well as a related simulated data set, were generously provided to the participants of the Genetic Analysis Workshop 13 in order that newly developed and emerging statistical methodologies could be tested on that well-characterized data set. The impetus driving the development of novel methods is to elucidate the contributions of genes, environment, and interactions between and among them, as well as to allow comparison between and validation of methods. The seven papers that comprise this group used data-mining methodologies (tree-based methods, neural networks, discriminant analysis, and Bayesian variable selection) in an attempt to identify the underlying genetics of cardiovascular disease and related traits in the presence of environmental and genetic covariates. Data-mining strategies are gaining popularity because they are extremely flexible and may have greater efficiency and potential in identifying the factors involved in complex disorders. While the methods grouped together here constitute a diverse collection, some papers asked similar questions with very different methods, while others used the same underlying methodology to ask very different questions. This paper briefly describes the data-mining methodologies applied to the Genetic Analysis Workshop 13 data sets and the results of those investigations. Copyright 2003 Wiley-Liss, Inc.

  17. The use of the wavelet cluster analysis for asteroid family determination

    NASA Technical Reports Server (NTRS)

    Benjoya, Phillippe; Slezak, E.; Froeschle, Claude

    1992-01-01

    Asteroid family determination has long been dependent on the analysis method used. A new cluster analysis based on the wavelet transform has allowed an automatic definition of families with a degree of significance versus randomness. Actually this method is rather general and can be applied to any kind of structural analysis. We will concentrate here on the main features of the method. The analysis has been performed on the set of 4100 asteroid proper elements computed by Milani and Knezevic (see Milani and Knezevic 1990). Twenty-one families have been found and the influence of the chosen metric has been tested. The results have been compared to those of Zappala et al. (see Zappala et al. 1990), obtained by the use of a completely different method applied to the same set of data. For the first time, a good overlap has been found between the results of both methods, not only for the big well-known families but also for the smallest ones.

  18. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.

  19. Simplified Processing Method for Meter Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fowler, Kimberly M.; Colotelo, Alison H. A.; Downs, Janelle L.

    2015-11-01

    Simple/Quick metered data processing method that can be used for Army Metered Data Management System (MDMS) and Logistics Innovation Agency data, but may also be useful for other large data sets. Intended for large data sets when analyst has little information about the buildings.

  20. Automated Classification and Analysis of Non-metallic Inclusion Data Sets

    NASA Astrophysics Data System (ADS)

    Abdulsalam, Mohammad; Zhang, Tongsheng; Tan, Jia; Webler, Bryan A.

    2018-05-01

    The aim of this study is to utilize principal component analysis (PCA), clustering methods, and correlation analysis to condense and examine large, multivariate data sets produced from automated analysis of non-metallic inclusions. Non-metallic inclusions play a major role in defining the properties of steel and their examination has been greatly aided by automated analysis in scanning electron microscopes equipped with energy dispersive X-ray spectroscopy. The methods were applied to analyze inclusions on two sets of samples: two laboratory-scale samples and four industrial samples from a near-finished 4140 alloy steel components with varying machinability. The laboratory samples had well-defined inclusions chemistries, composed of MgO-Al2O3-CaO, spinel (MgO-Al2O3), and calcium aluminate inclusions. The industrial samples contained MnS inclusions as well as (Ca,Mn)S + calcium aluminate oxide inclusions. PCA could be used to reduce inclusion chemistry variables to a 2D plot, which revealed inclusion chemistry groupings in the samples. Clustering methods were used to automatically classify inclusion chemistry measurements into groups, i.e., no user-defined rules were required.
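
    A minimal sketch of the workflow described above is given below: per-inclusion chemistry measurements are reduced to two principal components and then grouped automatically. The input layout (inclusions x elements), the scaling step, and the number of clusters are illustrative assumptions.

      import numpy as np
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.cluster import KMeans

      def classify_inclusions(chem, n_clusters=4):
          """chem: inclusions x elements matrix of compositions (e.g. Mg, Al, Ca, Mn, S)."""
          z = StandardScaler().fit_transform(chem)
          scores = PCA(n_components=2).fit_transform(z)     # coordinates for the 2D plot
          labels = KMeans(n_clusters=n_clusters, n_init=10,
                          random_state=0).fit_predict(scores)
          return scores, labels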

  1. Spatio-Chromatic Adaptation via Higher-Order Canonical Correlation Analysis of Natural Images

    PubMed Central

    Gutmann, Michael U.; Laparra, Valero; Hyvärinen, Aapo; Malo, Jesús

    2014-01-01

    Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Canonical correlation analysis explains similarly chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method which combines the desirable properties of independent component and canonical correlation analysis: It finds independent components in each data set which, across the two data sets, are related to each other via linear or higher-order correlations. The new method is as widely applicable as canonical correlation analysis, and also to more than two data sets. We call it higher-order canonical correlation analysis. When applied to chromatic natural images, we found that it provides a single (unified) statistical framework which accounts for both spatio-chromatic processing and adaptation. Filters with spatio-chromatic tuning properties as in the primary visual cortex emerged and corresponding-colors psychophysics was reproduced reasonably well. We used the new method to make a theory-driven testable prediction on how the neural response to colored patterns should change when the illumination changes. We predict shifts in the responses which are comparable to the shifts reported for chromatic contrast habituation. PMID:24533049

  2. Spatio-chromatic adaptation via higher-order canonical correlation analysis of natural images.

    PubMed

    Gutmann, Michael U; Laparra, Valero; Hyvärinen, Aapo; Malo, Jesús

    2014-01-01

    Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Canonical correlation analysis explains similarly chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method which combines the desirable properties of independent component and canonical correlation analysis: It finds independent components in each data set which, across the two data sets, are related to each other via linear or higher-order correlations. The new method is as widely applicable as canonical correlation analysis, and also to more than two data sets. We call it higher-order canonical correlation analysis. When applied to chromatic natural images, we found that it provides a single (unified) statistical framework which accounts for both spatio-chromatic processing and adaptation. Filters with spatio-chromatic tuning properties as in the primary visual cortex emerged and corresponding-colors psychophysics was reproduced reasonably well. We used the new method to make a theory-driven testable prediction on how the neural response to colored patterns should change when the illumination changes. We predict shifts in the responses which are comparable to the shifts reported for chromatic contrast habituation.

  3. On the use of cartographic projections in visualizing phylogenetic tree space

    PubMed Central

    2010-01-01

    Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger data sets. PMID:20529355

  4. Applying reliability analysis to design electric power systems for More-electric aircraft

    NASA Astrophysics Data System (ADS)

    Zhang, Baozhu

    The More-Electric Aircraft (MEA) is a type of aircraft that replaces conventional hydraulic and pneumatic systems with electrically powered components. These changes have significantly challenged the aircraft electric power system design. This thesis investigates how reliability analysis can be applied to automatically generate system topologies for the MEA electric power system. We first use a traditional method of reliability block diagrams to analyze the reliability level on different system topologies. We next propose a new methodology in which system topologies, constrained by a set reliability level, are automatically generated. The path-set method is used for analysis. Finally, we interface these sets of system topologies with control synthesis tools to automatically create correct-by-construction control logic for the electric power system.
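
    One way to make the path-set idea concrete is a small Monte Carlo evaluation: the system works if every component of at least one minimal path set works. The component reliabilities and the example path sets below are illustrative placeholders, not values from the thesis.

      import numpy as np

      def system_reliability(path_sets, p, n_trials=200_000, seed=0):
          rng = np.random.default_rng(seed)
          up = rng.random((n_trials, len(p))) < np.asarray(p)   # sampled component states
          ok = np.zeros(n_trials, dtype=bool)
          for ps in path_sets:
              ok |= up[:, ps].all(axis=1)                       # system up if any path set is fully up
          return ok.mean()

      # Example: generator -> bus A -> load, with a redundant bus B
      print(system_reliability(path_sets=[[0, 1, 3], [0, 2, 3]],
                               p=[0.999, 0.99, 0.99, 0.995]))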

  5. Multivariate statistical data analysis methods for detecting baroclinic wave interactions in the thermally driven rotating annulus

    NASA Astrophysics Data System (ADS)

    von Larcher, Thomas; Harlander, Uwe; Alexandrov, Kiril; Wang, Yongtai

    2010-05-01

    Experiments on baroclinic wave instabilities in a rotating cylindrical gap have long been performed, e.g., to reveal regular waves of different zonal wave number, to better understand the transition to the quasi-chaotic regime, and to uncover the underlying dynamical processes of complex wave flows. We present the application of appropriate multivariate data analysis methods to time series data sets acquired by the use of non-intrusive measurement techniques of a quite different nature. While the highly accurate Laser-Doppler velocimetry (LDV) is used for measurements of the radial velocity component at equidistant azimuthal positions, a highly sensitive thermographic camera measures the surface temperature field. The measurements are performed at particular parameter points where our former studies show that complex wave patterns occur [1, 2]. Obviously, the temperature data set has much more information content than the velocity data set due to the particular measurement techniques. Both sets of time series data are analyzed using multivariate statistical techniques. While the LDV data sets are studied by applying Multi-Channel Singular Spectrum Analysis (M-SSA), the temperature data sets are analyzed by applying Empirical Orthogonal Functions (EOF). Our goal is (a) to verify the results yielded with the analysis of the velocity data and (b) to compare the data analysis methods. Therefore, the temperature data are processed in a way that makes them comparable to the LDV data, i.e. the size of the data set is reduced in such a manner that the temperature measurements are treated as if they had been performed at equidistant azimuthal positions only. This approach initially results in a great loss of information, but applying the M-SSA to the reduced temperature data sets enables us to compare the methods. [1] Th. von Larcher and C. Egbers, Experiments on transitions of baroclinic waves in a differentially heated rotating annulus, Nonlinear Processes in Geophysics, 2005, 12, 1033-1041, NPG Print: ISSN 1023-5809, NPG Online: ISSN 1607-7946. [2] U. Harlander, Th. von Larcher, Y. Wang and C. Egbers, PIV- and LDV-measurements of baroclinic wave interactions in a thermally driven rotating annulus, Experiments in Fluids, 2009, DOI: 10.1007/s00348-009-0792-5.
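
    The EOF analysis applied to the temperature fields can be sketched in a few lines: anomalies of a space-time data matrix are decomposed by SVD into spatial patterns and principal-component time series. The array layout (time x space) and the number of retained modes are assumptions.

      import numpy as np

      def eof_analysis(field, n_modes=3):
          """field: time x space matrix of surface temperature measurements."""
          anomalies = field - field.mean(axis=0)
          U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
          eofs = Vt[:n_modes]                      # spatial patterns
          pcs = U[:, :n_modes] * s[:n_modes]       # associated principal-component time series
          explained = s[:n_modes] ** 2 / np.sum(s ** 2)
          return eofs, pcs, explained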

  6. gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2017-05-01

    Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.

  7. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values ( r th ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
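
    A minimal sketch of the rOP statistic is given below: for each gene, the r-th smallest p-value across the combined studies is taken, and a combined p-value follows from the fact that the r-th order statistic of n independent uniform p-values is Beta(r, n-r+1) under the null. The default r = ceil(n/2) is an illustrative "majority of studies" setting, not the paper's estimated r.

      import numpy as np
      from scipy.stats import beta

      def rop_pvalues(pmat, r=None):
          """pmat: genes x studies matrix of per-study p-values."""
          n = pmat.shape[1]
          if r is None:
              r = int(np.ceil(n / 2))                    # illustrative choice of r
          rop = np.sort(pmat, axis=1)[:, r - 1]          # r-th ordered p-value per gene
          return beta.cdf(rop, r, n - r + 1)             # combined p-value under the null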

  8. Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies.

    PubMed

    Martini, Paolo; Risso, Davide; Sales, Gabriele; Romualdi, Chiara; Lanfranchi, Gerolamo; Cagnin, Stefano

    2011-04-11

    In the last decades, microarray technology has spread, leading to a dramatic increase of publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while, Gene Set Analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies, addressing the same biological question to fully exploit public gene expression datasets. We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced region as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches allowing the direct comparison of different pathologies and different studies on a gene set activation level.

  9. A Comparison of the Effectiveness of Two Design Methodologies in a Secondary School Setting.

    ERIC Educational Resources Information Center

    Cannizzaro, Brenton; Boughton, Doug

    1998-01-01

    Examines the effectiveness of the analysis-synthesis and generator-conjuncture-analysis models of design education. Concludes that the generator-conjecture-analysis design method produced student design product of a slightly higher standard than the analysis-synthesis design method. Discusses the findings in more detail and considers implications.…

  10. Looking for trees in the forest: summary tree from posterior samples

    PubMed Central

    2013-01-01

    Background Bayesian phylogenetic analysis generates a set of trees which are often condensed into a single tree representing the whole set. Many methods exist for selecting a representative topology for a set of unrooted trees, few exist for assigning branch lengths to a fixed topology, and even fewer for simultaneously setting the topology and branch lengths. However, there is very little research into locating a good representative for a set of rooted time trees like the ones obtained from a BEAST analysis. Results We empirically compare new and known methods for generating a summary tree. Some new methods are motivated by mathematical constructions such as tree metrics, while the rest employ tree concepts which work well in practice. These use more of the posterior than existing methods, which discard information not directly mapped to the chosen topology. Using results from a large number of simulations we assess the quality of a summary tree, measuring (a) how well it explains the sequence data under the model and (b) how close it is to the “truth”, i.e to the tree used to generate the sequences. Conclusions Our simulations indicate that no single method is “best”. Methods producing good divergence time estimates have poor branch lengths and lower model fit, and vice versa. Using the results presented here, a user can choose the appropriate method based on the purpose of the summary tree. PMID:24093883

  11. Looking for trees in the forest: summary tree from posterior samples.

    PubMed

    Heled, Joseph; Bouckaert, Remco R

    2013-10-04

    Bayesian phylogenetic analysis generates a set of trees which are often condensed into a single tree representing the whole set. Many methods exist for selecting a representative topology for a set of unrooted trees, few exist for assigning branch lengths to a fixed topology, and even fewer for simultaneously setting the topology and branch lengths. However, there is very little research into locating a good representative for a set of rooted time trees like the ones obtained from a BEAST analysis. We empirically compare new and known methods for generating a summary tree. Some new methods are motivated by mathematical constructions such as tree metrics, while the rest employ tree concepts which work well in practice. These use more of the posterior than existing methods, which discard information not directly mapped to the chosen topology. Using results from a large number of simulations we assess the quality of a summary tree, measuring (a) how well it explains the sequence data under the model and (b) how close it is to the "truth", i.e to the tree used to generate the sequences. Our simulations indicate that no single method is "best". Methods producing good divergence time estimates have poor branch lengths and lower model fit, and vice versa. Using the results presented here, a user can choose the appropriate method based on the purpose of the summary tree.

  12. NEAT: an efficient network enrichment analysis test.

    PubMed

    Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C

    2016-09-05

    Network enrichment analysis is a powerful method which allows gene enrichment analysis to be integrated with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as that of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN ( https://cran.r-project.org/package=neat ).
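
    The hypergeometric calculation underlying this kind of test can be sketched as below: the observed number of network links from a gene set A to a functional set B is compared with a hypergeometric null. The parameterization and the numbers in the usage comment are illustrative placeholders; the exact null used by NEAT is defined in the paper and the neat package.

      from scipy.stats import hypergeom

      def enrichment_pvalue(n_links_AB, out_degree_A, in_degree_B, total_links):
          """P(X >= observed) with X ~ Hypergeom(total_links, in_degree_B, out_degree_A)."""
          return hypergeom.sf(n_links_AB - 1, total_links, in_degree_B, out_degree_A)

      # print(enrichment_pvalue(n_links_AB=40, out_degree_A=200,
      #                         in_degree_B=500, total_links=10000))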

  13. Recent advances in (soil moisture) triple collocation analysis

    USDA-ARS?s Scientific Manuscript database

    To date, triple collocation (TC) analysis is one of the most important methods for the global scale evaluation of remotely sensed soil moisture data sets. In this study we review existing implementations of soil moisture TC analysis as well as investigations of the assumptions underlying the method....
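
    For orientation, the covariance form of triple collocation can be sketched as follows: given three collocated soil-moisture estimates x, y, z of the same signal with mutually independent errors, their error variances follow from the pairwise data covariances. Variable names are generic, and the bias/rescaling handling of operational TC implementations is omitted.

      import numpy as np

      def tc_error_variances(x, y, z):
          C = np.cov(np.vstack([x, y, z]))
          var_ex = C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2]   # error variance of x
          var_ey = C[1, 1] - C[0, 1] * C[1, 2] / C[0, 2]   # error variance of y
          var_ez = C[2, 2] - C[0, 2] * C[1, 2] / C[0, 1]   # error variance of z
          return var_ex, var_ey, var_ez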

  14. Computer-Assisted Traffic Engineering Using Assignment, Optimal Signal Setting, and Modal Split

    DOT National Transportation Integrated Search

    1978-05-01

    Methods of traffic assignment, traffic signal setting, and modal split analysis are combined in a set of computer-assisted traffic engineering programs. The system optimization and user optimization traffic assignments are described. Travel time func...

  15. The GeoViz Toolkit: Using component-oriented coordination methods for geographic visualization and analysis

    PubMed Central

    Hardisty, Frank; Robinson, Anthony C.

    2010-01-01

    In this paper we present the GeoViz Toolkit, an open-source, internet-delivered program for geographic visualization and analysis that features a diverse set of software components which can be flexibly combined by users who do not have programming expertise. The design and architecture of the GeoViz Toolkit allows us to address three key research challenges in geovisualization: allowing end users to create their own geovisualization and analysis component set on-the-fly, integrating geovisualization methods with spatial analysis methods, and making geovisualization applications sharable between users. Each of these tasks necessitates a robust yet flexible approach to inter-tool coordination. The coordination strategy we developed for the GeoViz Toolkit, called Introspective Observer Coordination, leverages and combines key advances in software engineering from the last decade: automatic introspection of objects, software design patterns, and reflective invocation of methods. PMID:21731423

  16. Residual mercury content and leaching of mercury and silver from used amalgam capsules.

    PubMed

    Stone, M E; Pederson, E D; Cohen, M E; Ragain, J C; Karaway, R S; Auxer, R A; Saluta, A R

    2002-06-01

    The objective of this investigation was to carry out residual mercury (Hg) determinations and toxicity characteristic leaching procedure (TCLP) analysis of used amalgam capsules. For residual Hg analysis, 25 capsules (20 capsules for one brand) from each of 10 different brands of amalgam were analyzed. Total residual Hg levels per capsule were determined using United States Environmental Protection Agency (USEPA) Method 7471. For TCLP analysis, 25 amalgam capsules for each of 10 brands were extracted using a modification of USEPA Method 1311. Hg analysis of the TCLP extracts was done with USEPA Method 7470A. Analysis of silver (Ag) concentrations in the TCLP extract was done with USEPA Method 6010B. Analysis of the residual Hg data resulted in the segregation of brands into three groups: Dispersalloy capsules, Group A, retained the most Hg (1.225 mg/capsule). These capsules were the only ones to include a pestle. Group B capsules, Valliant PhD, Optaloy II, Megalloy and Valliant Snap Set, retained the next highest amount of Hg (0.534-0.770 mg/capsule), and were characterized by a groove in the inside of the capsule. Group C, Tytin regular set double-spill, Tytin FC, Contour, Sybraloy regular set, and Tytin regular set single-spill retained the least amount of Hg (0.125-0.266 mg/capsule). TCLP analysis of the triturated capsules showed Sybraloy and Contour leached Hg at greater than the 0.2 mg/l Resource Conservation and Recovery Act (RCRA) limit. This study demonstrated that residual mercury may be related to capsule design features and that TCLP extracts from these capsules could, in some brands, exceed RCRA Hg limits, making their disposal problematic. At current RCRA limits, the leaching of Ag is not a problem.

  17. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

    PubMed Central

    Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi

    2016-01-01

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961

  18. A Ricin Forensic Profiling Approach Based on a Complex Set of Biomarkers

    DOE PAGES

    Fredriksson, Sten-Ake; Wunschel, David S.; Lindstrom, Susanne Wiklund; ...

    2018-03-28

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids and seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1 – PM4), ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data was collected by use of a range of analytical methods, and robust orthogonal partial least squares-discriminant analysis (OPLS-DA) models were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test set samples were determined. The model statistics of the two models were good and a 100% rate of correct predictions of the test set was achieved.

  19. A Ricin Forensic Profiling Approach Based on a Complex Set of Biomarkers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fredriksson, Sten-Ake; Wunschel, David S.; Lindstrom, Susanne Wiklund

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids and seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1 – PM4), ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data was collected by use of a range of analytical methods, and robust orthogonal partial least squares-discriminant analysis (OPLS-DA) models were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test set samples were determined. The model statistics of the two models were good and a 100% rate of correct predictions of the test set was achieved.

  20. Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.

    PubMed

    Rohrer, Sebastian G; Baumann, Knut

    2009-02-01

    Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
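
    A minimal sketch of the kind of nearest-neighbour distance statistic this record relies on, written in Python. The descriptor matrices, radii and the G/F naming are illustrative assumptions; the actual MUV workflow and its refined nearest neighbour functions are only available from the cited software package.

```python
# Hedged sketch: empirical nearest-neighbour distance functions of the kind used in
# refined nearest neighbour analysis (spatial statistics), applied to points in a
# chemical descriptor space. Function names and the descriptor matrices are
# illustrative assumptions, not the MUV implementation.
import numpy as np
from scipy.spatial import cKDTree

def nn_distance_cdf(from_points, to_points, radii, exclude_self=False):
    """Empirical CDF of distances from each `from_points` row to its nearest
    neighbour in `to_points`, evaluated at the given radii."""
    tree = cKDTree(to_points)
    k = 2 if exclude_self else 1             # skip the point itself if the sets coincide
    dists, _ = tree.query(from_points, k=k)
    nn = dists[:, -1] if dists.ndim > 1 else dists
    return np.array([(nn <= r).mean() for r in radii])

rng = np.random.default_rng(0)
actives = rng.normal(size=(50, 10))          # descriptor vectors of actives (toy data)
decoys  = rng.normal(size=(500, 10))         # descriptor vectors of decoys (toy data)
radii   = np.linspace(0.5, 6.0, 25)

G = nn_distance_cdf(actives, actives, radii, exclude_self=True)  # active-to-active
F = nn_distance_cdf(decoys,  actives, radii)                     # decoy-to-active
# Large G relative to F at small radii suggests self-similar (clumped) actives,
# i.e. analogue bias; F rising much faster than G suggests artificial enrichment.
print(np.round(G - F, 3))
```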

  1. Nonlinear multivariate and time series analysis by neural network methods

    NASA Astrophysics Data System (ADS)

    Hsieh, William W.

    2004-03-01

    Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.

  2. FDDO and DSMC analyses of rarefied gas flow through 2D nozzles

    NASA Technical Reports Server (NTRS)

    Chung, Chan-Hong; De Witt, Kenneth J.; Jeng, Duen-Ren; Penko, Paul F.

    1992-01-01

    Two different approaches, the finite-difference method coupled with the discrete-ordinate method (FDDO), and the direct-simulation Monte Carlo (DSMC) method, are used in the analysis of the flow of a rarefied gas expanding through a two-dimensional nozzle and into a surrounding low-density environment. In the FDDO analysis, by employing the discrete-ordinate method, the Boltzmann equation simplified by a model collision integral is transformed to a set of partial differential equations which are continuous in physical space but are point functions in molecular velocity space. The set of partial differential equations are solved by means of a finite-difference approximation. In the DSMC analysis, the variable hard sphere model is used as a molecular model and the no time counter method is employed as a collision sampling technique. The results of both the FDDO and the DSMC methods show good agreement. The FDDO method requires less computational effort than the DSMC method by factors of 10 to 40 in CPU time, depending on the degree of rarefaction.

  3. Advancing our thinking in presence-only and used-available analysis.

    PubMed

    Warton, David; Aarts, Geert

    2013-11-01

    1. The problems of analysing used-available data and presence-only data are equivalent, and this paper uses this equivalence as a platform for exploring opportunities for advancing analysis methodology. 2. We suggest some potential methodological advances in used-available analysis, made possible via lessons learnt in the presence-only literature, for example, using modern methods to improve predictive performance. We also consider the converse - potential advances in presence-only analysis inspired by used-available methodology. 3. Notwithstanding these potential advances in methodology, perhaps a greater opportunity is in advancing our thinking about how to apply a given method to a particular data set. 4. It is shown by example that strikingly different results can be achieved for a single data set by applying a given method of analysis in different ways - hence having chosen a method of analysis, the next step of working out how to apply it is critical to performance. 5. We review some key issues to consider in deciding how to apply an analysis method: apply the method in a manner that reflects the study design; consider data properties; and use diagnostic tools to assess how reasonable a given analysis is for the data at hand. © 2013 The Authors. Journal of Animal Ecology © 2013 British Ecological Society.

  4. Policy Analysis: A Tool for Setting District Computer Use Policy. Paper and Report Series No. 97.

    ERIC Educational Resources Information Center

    Gray, Peter J.

    This report explores the use of policy analysis as a tool for setting computer use policy in a school district by discussing the steps in the policy formation and implementation processes and outlining how policy analysis methods can contribute to the creation of effective policy. Factors related to the adoption and implementation of innovations…

  5. Naturally-Emerging Technology-Based Leadership Roles in Three Independent Schools: A Social Network-Based Case Study Using Fuzzy Set Qualitative Comparative Analysis

    ERIC Educational Resources Information Center

    Velastegui, Pamela J.

    2013-01-01

    This hypothesis-generating case study investigates the naturally emerging roles of technology brokers and technology leaders in three independent schools in New York involving 92 school educators. A multiple and mixed method design utilizing Social Network Analysis (SNA) and fuzzy set Qualitative Comparative Analysis (FSQCA) involved gathering…

  6. Detecting discordance enrichment among a series of two-sample genome-wide expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2017-01-25

    With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient, with a parameter space that grows only linearly as the number of data sets is increased. The model-based probability of discordance enrichment can be calculated for gene set detection. We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (e.g. more expressed in non-tumor tissue) to a positive value (e.g. more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (when the log-ratio is increased from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research. Among the top list of detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. The log-ratio also ranges from a negative value (e.g. more expressed in non-tumor tissue) to a positive value (e.g. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results, and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.

  7. A Review of Functional Analysis Methods Conducted in Public School Classroom Settings

    ERIC Educational Resources Information Center

    Lloyd, Blair P.; Weaver, Emily S.; Staubitz, Johanna L.

    2016-01-01

    The use of functional behavior assessments (FBAs) to address problem behavior in classroom settings has increased as a result of education legislation and long-standing evidence supporting function-based interventions. Although functional analysis remains the standard for identifying behavior--environment functional relations, this component is…

  8. Robustness of Type I Error and Power in Set Correlation Analysis of Contingency Tables.

    ERIC Educational Resources Information Center

    Cohen, Jacob; Nee, John C. M.

    1990-01-01

    The analysis of contingency tables via set correlation allows the assessment of subhypotheses involving contrast functions of the categories of the nominal scales. The robustness of such methods with regard to Type I error and statistical power was studied via a Monte Carlo experiment. (TJH)

  9. Spacelab Charcoal Analyses

    NASA Technical Reports Server (NTRS)

    Slivon, L. E.; Hernon-Kenny, L. A.; Katona, V. R.; Dejarme, L. E.

    1995-01-01

    This report describes analytical methods and results obtained from chemical analysis of 31 charcoal samples in five sets. Each set was obtained from a single scrubber used to filter ambient air on board a Spacelab mission. Analysis of the charcoal samples was conducted by thermal desorption followed by gas chromatography/mass spectrometry (GC/MS). All samples were analyzed using identical methods. The method used for these analyses was able to detect compounds independent of their polarity or volatility. In addition to the charcoal samples, analyses of three Environmental Control and Life Support System (ECLSS) water samples were conducted specifically for trimethylamine.

  10. Analysis of 2014 Post UTME Score of Candidates in the University of Ibadan with Two Methods of Standard Setting to Set Cut Off Points

    ERIC Educational Resources Information Center

    Oladele, Babatunde

    2017-01-01

    The aim of the current study is to analyse the 2014 Post UTME scores of candidates in the university of Ibadan towards the establishment of cut off using two methods of standard settings. Prospective candidates who seek admission to higher institution are often denied admission through the Post UTME exercise. There is no single recommended…

  11. An integrated methods study of the experiences of youth with severe disabilities in leisure activity settings: the importance of belonging, fun, and control and choice.

    PubMed

    King, Gillian; Gibson, Barbara E; Mistry, Bhavnita; Pinto, Madhu; Goh, Freda; Teachman, Gail; Thompson, Laura

    2014-01-01

    The aim was to examine the leisure activity setting experiences of two groups of youth with severe disabilities - those with complex continuing care (CCC) needs and those who have little functional speech and communicate using augmentative and alternative communication (AAC). Twelve youth took part in a mixed methods study, in which their experiences were ascertained using qualitative methods (observations, photo elicitation and interviews) and the measure of Self-Reported Experiences of Activity Settings (SEAS). Data integration occurred using a "following a thread" technique and case-by-case analysis. The analysis revealed several highly valued aspects of leisure activity setting experiences for youth, including engagement with others, enjoying the moment, and control and choice in selection and participation in activity settings. The findings provide preliminary insights into the nature of optimal activity settings for youth with severe disabilities, and the mediators of these experiences. Compared to other youth, the data illustrate both the commonalities of experiences and differences in the ways in which these experiences are attained. Implications for research concern the utility of mixed methods approaches in understanding the complex nature of participation experiences. Implications for clinical practice concern the importance of not assuming the nature of youths' experiences.

  12. Sentiment Analysis of Health Care Tweets: Review of the Methods Used.

    PubMed

    Gohil, Sunir; Vuik, Sabine; Darzi, Ara

    2018-04-23

    Twitter is a microblogging service where users can send and read short 140-character messages called "tweets." Large numbers of unstructured, free-text tweets relating to health care are being shared on Twitter, which is becoming a popular area for health care research. Sentiment is a metric commonly used to investigate the positive or negative opinion within these messages. Exploring the methods used for sentiment analysis in Twitter health care research may allow us to better understand the options available for future research in this growing field. The first objective of this study was to understand which tools would be available for sentiment analysis of Twitter health care research, by reviewing existing studies in this area and the methods they used. The second objective was to determine which method would work best in the health care setting, by analyzing how the methods were used to answer specific health care questions, their production, and how their accuracy was analyzed. A review of the literature was conducted pertaining to Twitter and health care research, which used a quantitative method of sentiment analysis for the free-text messages (tweets). The study compared the types of tools used in each case and examined methods for tool production, tool training, and analysis of accuracy. A total of 12 papers studying the quantitative measurement of sentiment in the health care setting were found. More than half of these studies produced tools specifically for their research, 4 used open source tools available freely, and 2 used commercially available software. Moreover, 4 out of the 12 tools were trained using a smaller sample of the study's final data. The sentiment method was trained against, on average, 0.45% (2816/627,024) of the total sample data. One of the 12 papers commented on the analysis of the accuracy of the tool used. Multiple methods are used for sentiment analysis of tweets in the health care setting. These range from self-produced basic categorizations to more complex and expensive commercial software. The open source and commercial methods are developed on product reviews and generic social media messages. None of these methods have been extensively tested against a corpus of health care messages to check their accuracy. This study suggests a need for an accurate and tested tool for sentiment analysis of tweets, trained first on a health care setting-specific corpus of manually annotated tweets. ©Sunir Gohil, Sabine Vuik, Ara Darzi. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 23.04.2018.

  13. Cut set-based risk and reliability analysis for arbitrarily interconnected networks

    DOEpatents

    Wyss, Gregory D.

    2000-01-01

    Method for computing all-terminal reliability for arbitrarily interconnected networks such as the United States public switched telephone network. The method includes an efficient search algorithm to generate minimal cut sets for nonhierarchical networks directly from the network connectivity diagram. Efficiency of the search algorithm stems in part from its basis on only link failures. The method also includes a novel quantification scheme that likewise reduces computational effort associated with assessing network reliability based on traditional risk importance measures. Vast reductions in computational effort are realized since combinatorial expansion and subsequent Boolean reduction steps are eliminated through analysis of network segmentations using a technique of assuming node failures to occur on only one side of a break in the network, and repeating the technique for all minimal cut sets generated with the search algorithm. The method functions equally well for planar and non-planar networks.
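
    To illustrate the quantity the cut-set method computes, here is a hedged Python sketch that estimates all-terminal reliability by Monte Carlo sampling of independent link failures. It is not the patented minimal cut set search or its quantification scheme; the toy network and link reliability value are assumptions.

```python
# Hedged sketch: Monte Carlo estimate of all-terminal reliability under independent
# link failures. This only illustrates the quantity the cut-set method evaluates;
# it is not the patented search or quantification algorithm.
import random
import networkx as nx

def all_terminal_reliability(graph, link_reliability, n_trials=20000, seed=1):
    rng = random.Random(seed)
    up = 0
    for _ in range(n_trials):
        g = nx.Graph()
        g.add_nodes_from(graph.nodes)
        # keep each link independently with its survival probability
        g.add_edges_from(e for e in graph.edges if rng.random() < link_reliability)
        if nx.is_connected(g):
            up += 1
    return up / n_trials

G = nx.cycle_graph(6)       # toy network: a 6-node ring
G.add_edge(0, 3)            # one chord for redundancy
print(all_terminal_reliability(G, link_reliability=0.95))
```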

  14. Image segmentation and dynamic lineage analysis in single-cell fluorescence microscopy.

    PubMed

    Wang, Quanli; Niemi, Jarad; Tan, Chee-Meng; You, Lingchong; West, Mike

    2010-01-01

    An increasingly common component of studies in synthetic and systems biology is analysis of dynamics of gene expression at the single-cell level, a context that is heavily dependent on the use of time-lapse movies. Extracting quantitative data on the single-cell temporal dynamics from such movies remains a major challenge. Here, we describe novel methods for automating key steps in the analysis of single-cell fluorescent images, namely segmentation and lineage reconstruction, to recognize and track individual cells over time. The automated analysis iteratively combines a set of extended morphological methods for segmentation, and uses a neighborhood-based scoring method for frame-to-frame lineage linking. Our studies with bacteria, budding yeast and human cells demonstrate the portability and usability of these methods, whether using phase, bright field or fluorescent images. These examples also demonstrate the utility of our integrated approach in facilitating analyses of engineered and natural cellular networks in diverse settings. The automated methods are implemented in freely available, open-source software.

  15. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

    PubMed

    Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

    2011-08-01

    The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. Multi-disciplinary optimization of aeroservoelastic systems

    NASA Technical Reports Server (NTRS)

    Karpel, Mordechay

    1991-01-01

    New methods were developed for efficient aeroservoelastic analysis and optimization. The main target was to develop a method for investigating large structural variations using a single set of modal coordinates. This task was accomplished by basing the structural modal coordinates on normal modes calculated with a set of fictitious masses loading the locations of anticipated structural changes. The following subject areas are covered: (1) modal coordinates for aeroelastic analysis with large local structural variations; and (2) time simulation of flutter with large stiffness changes.

  17. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution caused by poor centroid definition and from failure to assign particles to a cluster, both resulting from the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
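
    A minimal Python sketch of the best-performing configuration described above (Ward linkage after z-score normalisation), using SciPy. The toy particle features and the two-class structure are assumptions standing in for WIBS-4 measurements.

```python
# Hedged sketch: hierarchical agglomerative clustering with Ward linkage after
# z-score normalisation, as evaluated in the paper. The feature layout (optical
# size, asymmetry factor, fluorescence channels) and the toy data are assumptions.
import numpy as np
from scipy.stats import zscore
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# toy UV-LIF-like feature matrix: [size, asymmetry, FL1, FL2, FL3]
particles = np.vstack([
    rng.normal([2.0, 10.0, 200.0, 50.0, 20.0], 5.0, size=(300, 5)),   # e.g. spore-like
    rng.normal([1.0,  5.0,  50.0, 150.0, 10.0], 5.0, size=(300, 5)),  # e.g. bacteria-like
])

X = zscore(particles, axis=0)                    # normalise each feature to zero mean, unit SD
Z = linkage(X, method="ward")                    # Ward's minimum-variance linkage
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two clusters
print(np.bincount(labels))
```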

  18. [Application of Stata software to test heterogeneity in meta-analysis method].

    PubMed

    Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

    2008-07-01

    To introduce the application of Stata software to heterogeneity test in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were Q-test and I2 statistic attached to the fixed effect model forest plot, H statistic and Galbraith plot. The existence of the heterogeneity among studies could be detected by Q-test and H statistic and the degree of the heterogeneity could be detected by I2 statistic. The outliers which were the sources of the heterogeneity could be spotted from the Galbraith plot. Heterogeneity test in meta-analysis can be completed by the four methods in Stata software simply and quickly. H and I2 statistics are more robust, and the outliers of the heterogeneity can be clearly seen in the Galbraith plot among the four methods.
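
    The abstract works these statistics in Stata; the same quantities can be written out directly. A hedged Python sketch with toy effect sizes and variances follows. The formulas (Cochran's Q, H and I²) are the standard ones rather than a reproduction of the Stata commands.

```python
# Hedged sketch: the Q, H and I^2 heterogeneity statistics described in the abstract,
# computed for generic per-study effect sizes and variances (toy numbers).
import numpy as np
from scipy.stats import chi2

effects   = np.array([0.30, 0.10, 0.55, 0.25, 0.80])   # per-study effect estimates
variances = np.array([0.02, 0.03, 0.04, 0.02, 0.05])   # their sampling variances

w = 1.0 / variances                         # fixed-effect (inverse-variance) weights
pooled = np.sum(w * effects) / np.sum(w)
Q = np.sum(w * (effects - pooled) ** 2)     # Cochran's Q
df = len(effects) - 1
p_value = chi2.sf(Q, df)                    # Q ~ chi-square(df) under homogeneity
H = np.sqrt(Q / df)                         # H statistic
I2 = max(0.0, (Q - df) / Q) * 100.0         # I^2 as a percentage

print(f"Q={Q:.2f} (p={p_value:.3f}), H={H:.2f}, I2={I2:.1f}%")
```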

  19. Optimizing Probability of Detection Point Estimate Demonstration

    NASA Technical Reports Server (NTRS)

    Koshti, Ajay M.

    2017-01-01

    Probability of detection (POD) analysis is used in assessing the reliably detectable flaw size in nondestructive evaluation (NDE). MIL-HDBK-1823 and the associated mh1823 POD software give the most common methods of POD analysis. These NDE methods are intended to detect real flaws such as cracks and crack-like flaws. A reliably detectable crack size is required for safe-life analysis of fracture-critical parts. The paper discusses optimizing probability of detection (POD) demonstration experiments using the point estimate method, which is used by NASA for qualifying special NDE procedures. The point estimate method uses the binomial distribution for the probability density. Normally, a set of 29 flaws of the same size, within some tolerance, is used in the demonstration. The optimization is performed to provide an acceptable value for the probability of passing the demonstration (PPD) and an acceptable value for the probability of false calls (POF), while keeping the flaw sizes in the set as small as possible.
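
    A hedged Python sketch of the binomial arithmetic behind a point-estimate POD demonstration with 29 flaws. The 29-of-29 acceptance criterion and the 90/95 interpretation are the commonly cited values; the specific NASA procedure parameters and the optimization itself are not reproduced here.

```python
# Hedged sketch: binomial arithmetic for a 29-of-29 point-estimate POD demonstration.
# The acceptance criterion and target values are assumptions based on common usage.
from scipy.stats import binom

n = 29                 # flaws in the demonstration set
required_hits = 29     # pass only if every flaw is detected

def prob_pass(true_pod, n=n, required=required_hits):
    """Probability of passing the demonstration given the true POD."""
    return binom.sf(required - 1, n, true_pod)   # P(X >= required)

for pod in (0.85, 0.90, 0.95, 0.99):
    print(f"true POD {pod:.2f} -> probability of passing {prob_pass(pod):.3f}")

# Lower one-sided 95% confidence bound on POD after 29/29 hits:
# solve p^29 = 0.05  ->  p = 0.05**(1/29) ~= 0.902, i.e. the familiar 90/95 claim.
print(0.05 ** (1 / 29))
```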

  20. Standard Setting Methods for Pass/Fail Decisions on High-Stakes Objective Structured Clinical Examinations: A Validity Study.

    PubMed

    Yousuf, Naveed; Violato, Claudio; Zuberi, Rukhsana W

    2015-01-01

    CONSTRUCT: Authentic standard setting methods will demonstrate high convergent validity evidence of their outcomes, that is, cutoff scores and pass/fail decisions, with most other methods when compared with each other. The objective structured clinical examination (OSCE) was established for valid, reliable, and objective assessment of clinical skills in health professions education. Various standard setting methods have been proposed to identify objective, reliable, and valid cutoff scores on OSCEs. These methods may identify different cutoff scores for the same examinations. Identification of valid and reliable cutoff scores for OSCEs remains an important issue and a challenge. Thirty OSCE stations administered at least twice in the years 2010-2012 to 393 medical students in Years 2 and 3 at Aga Khan University are included. Psychometric properties of the scores are determined. Cutoff scores and pass/fail decisions of the Wijnen, Cohen, Mean-1.5SD, Mean-1SD, Angoff, borderline group, and borderline regression (BL-R) methods are compared with each other and with three variants of cluster analysis using repeated measures analysis of variance and Cohen's kappa. The mean psychometric indices on the 30 OSCE stations are reliability coefficient = 0.76 (SD = 0.12); standard error of measurement = 5.66 (SD = 1.38); coefficient of determination = 0.47 (SD = 0.19), and intergrade discrimination = 7.19 (SD = 1.89). The BL-R and Wijnen methods show the highest convergent validity evidence among the methods on the defined criteria. The Angoff and Mean-1.5SD methods demonstrated the least convergent validity evidence. The three cluster variants showed substantial convergent validity with the borderline methods. Although the Wijnen method showed a high level of convergent validity, it lacks the theoretical strength to be used for competency-based assessments. The BL-R method showed the highest convergent validity evidence for OSCEs against the other standard setting methods used in the present study. We also found that cluster analysis using the mean method can be used for quality assurance of the borderline methods. These findings should be further confirmed by studies in other settings.
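
    A hedged Python sketch contrasting two of the standard-setting rules compared in the study (Mean-1SD and borderline regression) on toy OSCE station data. The rating scale, the borderline code and the generated scores are assumptions.

```python
# Hedged sketch: two standard-setting rules applied to toy OSCE station data.
# The grading scale and the "borderline" code are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(7)
scores  = np.clip(rng.normal(65, 12, size=200), 0, 100)       # station checklist scores
# examiner global ratings: 0=fail, 1=borderline, 2=pass, 3=good (toy assignment)
ratings = np.digitize(scores + rng.normal(0, 8, size=200), [50, 60, 75])

cut_mean_minus_sd = scores.mean() - 1.0 * scores.std(ddof=1)  # Mean-1SD method

# Borderline regression (BL-R): regress checklist scores on global ratings and
# read off the predicted score at the borderline rating.
slope, intercept = np.polyfit(ratings, scores, 1)
borderline_code = 1
cut_blr = slope * borderline_code + intercept

print(f"Mean-1SD cutoff: {cut_mean_minus_sd:.1f}")
print(f"Borderline regression cutoff: {cut_blr:.1f}")
```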

  1. Numerical simulation of rarefied gas flow through a slit

    NASA Technical Reports Server (NTRS)

    Keith, Theo G., Jr.; Jeng, Duen-Ren; De Witt, Kenneth J.; Chung, Chan-Hong

    1990-01-01

    Two different approaches, the finite-difference method coupled with the discrete-ordinate method (FDDO), and the direct-simulation Monte Carlo (DSMC) method, are used in the analysis of the flow of a rarefied gas from one reservoir to another through a two-dimensional slit. The cases considered are for hard vacuum downstream pressure, finite pressure ratios, and isobaric pressure with thermal diffusion, which are not well established in spite of the simplicity of the flow field. In the FDDO analysis, by employing the discrete-ordinate method, the Boltzmann equation simplified by a model collision integral is transformed to a set of partial differential equations which are continuous in physical space but are point functions in molecular velocity space. The set of partial differential equations are solved by means of a finite-difference approximation. In the DSMC analysis, three kinds of collision sampling techniques, the time counter (TC) method, the null collision (NC) method, and the no time counter (NTC) method, are used.

  2. An analytic data analysis method for oscillatory slug tests.

    PubMed

    Chen, Chia-Shyun

    2006-01-01

    An analytical data analysis method is developed for slug tests in partially penetrating wells in confined or unconfined aquifers of high hydraulic conductivity. As adapted from the van der Kamp method, the determination of the hydraulic conductivity is based on the occurrence times and the displacements of the extreme points measured from the oscillatory data and their theoretical counterparts available in the literature. This method is applied to two sets of slug test response data presented by Butler et al.: one set shows slow damping with seven discernible extreme points, and the other shows rapid damping with three extreme points. The estimates of the hydraulic conductivity obtained by the analytic method are in good agreement with those determined by an available curve-matching technique.

  3. Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill

    2016-01-01

    Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells and comparative methods examining adherent and sphere cells are widely used to investigate mechanism underlying cancer stem cells, because sphere cells have been known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with the three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using the gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets such as cytokine-cytokine receptor interaction that demonstrated significance in both datasets. Our observations demonstrated that among the genes of four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. Additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. Distinct from other meta-analysis, by using gene set analysis, we selected possible markers which can explain the biological mechanisms and suggested network analysis as an additional criterion for selecting candidates. PMID:26870956

  4. A projection operator method for the analysis of magnetic neutron form factors

    NASA Astrophysics Data System (ADS)

    Kaprzyk, S.; Van Laar, B.; Maniawski, F.

    1981-03-01

    A set of projection operators in matrix form has been derived on the basis of decomposition of the spin density into a series of fully symmetrized cubic harmonics. This set of projection operators allows a formulation of the Fourier analysis of magnetic form factors in a convenient way. The presented method is capable of checking the validity of various theoretical models used for spin density analysis up to now. The general formalism is worked out in explicit form for the fcc and bcc structures and deals with that part of spin density which is contained within the sphere inscribed in the Wigner-Seitz cell. This projection operator method has been tested on the magnetic form factors of nickel and iron.

  5. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128
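
    As a concrete point of reference for the competitive/self-contained distinction discussed above, here is a hedged Python sketch of a simple competitive-style over-representation test using the hypergeometric distribution. The gene counts are toy values, and this is not one of the specific GSA implementations evaluated in the review.

```python
# Hedged sketch: a simple competitive-style over-representation test (hypergeometric),
# of the general kind the review contrasts with self-contained approaches.
# All counts are toy values, not data from the paper.
from scipy.stats import hypergeom

N = 20000          # genes measured in the RNA-seq experiment
K = 150            # genes belonging to the gene set under test
n = 800            # differentially expressed (DE) genes
k = 18             # DE genes that fall inside the gene set

# P(at least k of the n DE genes land in the set purely by chance)
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p-value: {p_value:.3g}")
```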

  6. Parameter Optimization for Selected Correlation Analysis of Intracranial Pathophysiology.

    PubMed

    Faltermeier, Rupert; Proescholdt, Martin A; Bele, Sylvia; Brawanski, Alexander

    2015-01-01

    Recently we proposed a mathematical tool set, called selected correlation analysis, that reliably detects positive and negative correlations between arterial blood pressure (ABP) and intracranial pressure (ICP). Such correlations are associated with severe impairment of the cerebral autoregulation and intracranial compliance, as predicted by a mathematical model. The time resolved selected correlation analysis is based on a windowing technique combined with Fourier-based coherence calculations and therefore depends on several parameters. For real time application of this method at an ICU it is inevitable to adjust this mathematical tool for high sensitivity and distinct reliability. In this study, we will introduce a method to optimize the parameters of the selected correlation analysis by correlating an index, called selected correlation positive (SCP), with the outcome of the patients represented by the Glasgow Outcome Scale (GOS). For that purpose, the data of twenty-five patients were used to calculate the SCP value for each patient and multitude of feasible parameter sets of the selected correlation analysis. It could be shown that an optimized set of parameters is able to improve the sensitivity of the method by a factor greater than four in comparison to our first analyses.
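
    A hedged Python sketch of the windowed, Fourier-based coherence calculation that selected correlation analysis builds on. The sampling rate, window length, overlap, frequency band and toy ABP/ICP signals are assumptions, not the authors' optimized parameters.

```python
# Hedged sketch: windowed magnitude-squared coherence between ABP and ICP, the kind of
# Fourier-based calculation underlying selected correlation analysis. All parameters
# and the toy signals are assumptions, not the tuned values from the paper.
import numpy as np
from scipy.signal import coherence

fs = 1.0                                    # one sample per second (assumed)
t = np.arange(0, 3600, 1 / fs)              # one hour of monitoring data
rng = np.random.default_rng(3)
abp = np.sin(2 * np.pi * 0.01 * t) + 0.5 * rng.standard_normal(t.size)
icp = 0.8 * np.sin(2 * np.pi * 0.01 * t + 0.3) + 0.5 * rng.standard_normal(t.size)

# Coherence on overlapping windows; in the paper these parameters are what the
# optimization procedure selects.
f, Cxy = coherence(abp, icp, fs=fs, nperseg=256, noverlap=128)
band = (f > 0.005) & (f < 0.05)             # slow-wave band of interest (assumed)
print(f"mean coherence in band: {Cxy[band].mean():.2f}")
```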

  7. Parameter Optimization for Selected Correlation Analysis of Intracranial Pathophysiology

    PubMed Central

    Faltermeier, Rupert; Proescholdt, Martin A.; Bele, Sylvia; Brawanski, Alexander

    2015-01-01

    Recently we proposed a mathematical tool set, called selected correlation analysis, that reliably detects positive and negative correlations between arterial blood pressure (ABP) and intracranial pressure (ICP). Such correlations are associated with severe impairment of the cerebral autoregulation and intracranial compliance, as predicted by a mathematical model. The time resolved selected correlation analysis is based on a windowing technique combined with Fourier-based coherence calculations and therefore depends on several parameters. For real time application of this method at an ICU it is inevitable to adjust this mathematical tool for high sensitivity and distinct reliability. In this study, we will introduce a method to optimize the parameters of the selected correlation analysis by correlating an index, called selected correlation positive (SCP), with the outcome of the patients represented by the Glasgow Outcome Scale (GOS). For that purpose, the data of twenty-five patients were used to calculate the SCP value for each patient and multitude of feasible parameter sets of the selected correlation analysis. It could be shown that an optimized set of parameters is able to improve the sensitivity of the method by a factor greater than four in comparison to our first analyses. PMID:26693250

  8. Challenges in combining different data sets during analysis when using grounded theory.

    PubMed

    Rintala, Tuula-Maria; Paavilainen, Eija; Astedt-Kurki, Päivi

    2014-05-01

    To describe the challenges in combining two data sets during grounded theory analysis. The use of grounded theory in nursing research is common. It is a suitable method for studying human action and interaction. It is recommended that many alternative sources of data are collected to create as rich a dataset as possible. Data from interviews with people with diabetes (n=19) and their family members (n=19). Combining two data sets. When using grounded theory, there are numerous challenges in collecting and managing data, especially for the novice researcher. One challenge is to combine different data sets during the analysis. There are many methodological textbooks about grounded theory but there is little written in the literature about combining different data sets. Discussion is needed on the management of data and the challenges of grounded theory. This article provides a means for combining different data sets in the grounded theory analysis process.

  9. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

    PubMed

    Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi

    2016-07-08

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. A strategy for compression and analysis of massive geophysical data sets

    NASA Technical Reports Server (NTRS)

    Braverman, A.

    2001-01-01

    This paper describes a method for summarizing data in a way that approximately preserves high-resolution data structure while reducing data volume and maintaining the global integrity of very large remote sensing data sets. The method is under development for one of Terra's instruments, the Multi-angle Imaging SpectroRadiometer (MISR).

  11. A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

    PubMed

    Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang

    2017-08-23

    Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question of such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a unified gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their occurring probabilities in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Different existing meta-analysis methods were shown to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its unified framework and controllable way of dealing with study heterogeneity.

  12. Normalized Polarization Ratios for the Analysis of Cell Polarity

    PubMed Central

    Shimoni, Raz; Pham, Kim; Yassin, Mohammed; Ludford-Menting, Mandy J.; Gu, Min; Russell, Sarah M.

    2014-01-01

    The quantification and analysis of molecular localization in living cells is increasingly important for elucidating biological pathways, and new methods are rapidly emerging. The quantification of cell polarity has generated much interest recently, and ratiometric analysis of fluorescence microscopy images provides one means to quantify cell polarity. However, detection of fluorescence, and the ratiometric measurement, is likely to be sensitive to acquisition settings and image processing parameters. Using imaging of EGFP-expressing cells and computer simulations of variations in fluorescence ratios, we characterized the dependence of ratiometric measurements on processing parameters. This analysis showed that image settings alter polarization measurements; and that clustered localization is more susceptible to artifacts than homogeneous localization. To correct for such inconsistencies, we developed and validated a method for choosing the most appropriate analysis settings, and for incorporating internal controls to ensure fidelity of polarity measurements. This approach is applicable to testing polarity in all cells where the axis of polarity is known. PMID:24963926

  13. Application of class-modelling techniques to infrared spectra for analysis of pork adulteration in beef jerkys.

    PubMed

    Kuswandi, Bambang; Putri, Fitra Karima; Gani, Agus Abdul; Ahmad, Musa

    2015-12-01

    The use of chemometrics to analyse infrared spectra to predict pork adulteration in beef jerky (dendeng) was explored. In the first step, the analysis of pork in the beef jerky formulation was conducted by blending the beef jerky with pork at 5-80 % levels. The samples were then powdered and classified into a training set and a test set. In the second step, the spectra of the two sets were recorded by Fourier Transform Infrared (FTIR) spectroscopy using an attenuated total reflection (ATR) cell on the basis of spectral data in the frequency region 4000-700 cm(-1). The spectra were categorised into four data sets, i.e. (a) spectra in the whole region as data set 1; (b) spectra in the fingerprint region (1500-600 cm(-1)) as data set 2; (c) spectra in the whole region with treatment as data set 3; and (d) spectra in the fingerprint region with treatment as data set 4. In the third step, chemometric analyses were carried out on the data sets using three class-modelling techniques (i.e. LDA, SIMCA and SVM). Finally, the best-performing model across the data sets for the adulteration analysis of the samples was selected and compared with the ELISA method. From the chemometric results, the LDA model on data set 1 was found to be the best model, since it classified and predicted the samples tested with 100 % accuracy. The LDA model was applied to real samples of beef jerky marketed in Jember, and the results showed that the LDA model developed was in good agreement with the ELISA method.
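
    A hedged Python sketch of an LDA class model of the kind applied to the FTIR/ATR spectra, with PCA compression first because spectra have far more wavenumbers than samples. The toy spectra, class sizes and the PCA step are assumptions, not the published workflow.

```python
# Hedged sketch: LDA classification of FTIR-like spectra after PCA compression.
# The toy spectra and adulteration classes are assumptions, not the published data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
n_wavenumbers = 900                          # e.g. the 1500-600 cm^-1 fingerprint region
pure        = rng.normal(0.0, 1.0, size=(40, n_wavenumbers))
adulterated = rng.normal(0.3, 1.0, size=(40, n_wavenumbers))
X = np.vstack([pure, adulterated])
y = np.array([0] * 40 + [1] * 40)            # 0 = pure beef, 1 = contains pork

model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```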

  14. Aggregation Bias and the Analysis of Necessary and Sufficient Conditions in fsQCA

    ERIC Educational Resources Information Center

    Braumoeller, Bear F.

    2017-01-01

    Fuzzy-set qualitative comparative analysis (fsQCA) has become one of the most prominent methods in the social sciences for capturing causal complexity, especially for scholars with small- and medium-"N" data sets. This research note explores two key assumptions in fsQCA's methodology for testing for necessary and sufficient…
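
    For readers unfamiliar with fsQCA's sufficiency test, a hedged Python sketch of the standard consistency and coverage measures follows; the membership scores are toy values, and the research note's own analysis of aggregation bias is not reproduced.

```python
# Hedged sketch: standard fsQCA consistency and coverage for a sufficiency test
# (condition X sufficient for outcome Y), computed over toy fuzzy membership scores.
import numpy as np

X = np.array([0.9, 0.7, 0.8, 0.2, 0.6, 0.1])   # membership in the condition set
Y = np.array([0.8, 0.9, 0.6, 0.3, 0.7, 0.4])   # membership in the outcome set

overlap = np.minimum(X, Y).sum()
consistency = overlap / X.sum()   # how consistently X is a subset of Y (sufficiency)
coverage    = overlap / Y.sum()   # how much of the outcome the condition accounts for

print(f"consistency={consistency:.2f}, coverage={coverage:.2f}")
```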

  15. Homology and the optimization of DNA sequence data

    NASA Technical Reports Server (NTRS)

    Wheeler, W.

    2001-01-01

    Three methods of nucleotide character analysis are discussed. Their implications for molecular sequence homology and phylogenetic analysis are compared. The criteria of inter-data set congruence, both character-based and topological, are applied to two data sets to elucidate and potentially discriminate among these parsimony-based ideas. © 2001 The Willi Hennig Society.

  16. A Comparison of Imputation Methods for Bayesian Factor Analysis Models

    ERIC Educational Resources Information Center

    Merkle, Edgar C.

    2011-01-01

    Imputation methods are popular for the handling of missing data in psychology. The methods generally consist of predicting missing data based on observed data, yielding a complete data set that is amiable to standard statistical analyses. In the context of Bayesian factor analysis, this article compares imputation under an unrestricted…

  17. The Symbolic Violence of Setting: A Bourdieusian Analysis of Mixed Methods Data on Secondary Students' Views about Setting

    ERIC Educational Resources Information Center

    Archer, Louise; Francis, Becky; Miller, Sarah; Taylor, Becky; Tereshchenko, Antonina; Mazenod, Anna; Pepper, David; Travers, Mary-Claire

    2018-01-01

    "Setting" is a widespread practice in the UK, despite little evidence of its efficacy and substantial evidence of its detrimental impact on those allocated to the lowest sets. Taking a Bourdieusian approach, we propose that setting can be understood as a practice through which the social and cultural reproduction of dominant power…

  18. A novel hazard assessment method for biomass gasification stations based on extended set pair analysis

    PubMed Central

    Yan, Fang; Xu, Kaili; Li, Deshun; Cui, Zhikai

    2017-01-01

    Biomass gasification stations face many hazard factors; therefore, it is necessary to perform hazard assessments for them. In this study, a novel hazard assessment method called extended set pair analysis (ESPA) is proposed based on set pair analysis (SPA). In SPA, however, the calculation of the connection degree (CD) requires the classification of hazard grades and their corresponding thresholds. For hazard assessment using ESPA, a novel algorithm for calculating the CD is worked out for the case where hazard grades and their corresponding thresholds are unknown. The CD can then be converted into a Euclidean distance (ED) by a simple and concise calculation, and the hazard of each sample is ranked based on the value of the ED. In this paper, six biomass gasification stations are used to carry out hazard assessments with ESPA and general set pair analysis (GSPA), respectively. By comparing the hazard assessment results obtained from ESPA and GSPA, the availability and validity of ESPA for the hazard assessment of biomass gasification stations can be demonstrated. Meanwhile, the reasonableness of ESPA is also supported by a sensitivity analysis of the hazard assessment results obtained by ESPA and GSPA. PMID:28938011

  19. iTemplate: A template-based eye movement data analysis approach.

    PubMed

    Xiao, Naiqi G; Lee, Kang

    2018-02-08

    Current eye movement data analysis methods rely on defining areas of interest (AOIs). Because AOIs are created and modified manually, variances in their size, shape, and location are unavoidable. These variances affect not only the consistency of the AOI definitions, but also the validity of the eye movement analyses based on the AOIs. To reduce the variances in AOI creation and modification and to achieve a procedure for processing eye movement data with high precision and efficiency, we propose a template-based eye movement data analysis method. Using a linear transformation algorithm, this method registers the eye movement data from each individual stimulus to a template. Thus, users only need to create one set of AOIs for the template in order to analyze eye movement data, rather than creating a unique set of AOIs for each individual stimulus. This change greatly reduces the error caused by the variance from manually created AOIs and boosts the efficiency of the data analysis. Furthermore, this method can help researchers prepare eye movement data for some advanced analysis approaches, such as iMap. We have developed software (iTemplate) with a graphic user interface to make this analysis method available to researchers.
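
    A hedged Python sketch of the registration idea: a least-squares linear (affine) transform mapping stimulus coordinates onto a template, then applied to fixation locations. The landmark points and fixations are toy values; this is not the iTemplate implementation.

```python
# Hedged sketch: registering fixation coordinates from one stimulus onto a template
# with a least-squares affine (linear) transform. Landmarks and fixations are toy
# values chosen for illustration only.
import numpy as np

# corresponding landmarks (e.g. eye corners, nose tip, mouth) on stimulus and template
stimulus_pts = np.array([[120, 150], [210, 148], [165, 220], [160, 280]], float)
template_pts = np.array([[100, 140], [200, 140], [150, 210], [148, 275]], float)

# augment with a constant column so the fit includes translation: [x y 1] @ A = [x' y']
A_src = np.hstack([stimulus_pts, np.ones((len(stimulus_pts), 1))])
transform, *_ = np.linalg.lstsq(A_src, template_pts, rcond=None)   # 3x2 matrix

fixations = np.array([[130, 160], [205, 150], [162, 230]], float)  # raw fixations
fix_aug = np.hstack([fixations, np.ones((len(fixations), 1))])
registered = fix_aug @ transform     # fixations expressed in template coordinates
print(np.round(registered, 1))
```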

  20. Pathway Distiller - multisource biological pathway consolidation

    PubMed Central

    2012-01-01

    Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments. PMID:23134636

  1. Reevaluation of Stratospheric Ozone Trends From SAGE II Data Using a Simultaneous Temporal and Spatial Analysis

    NASA Technical Reports Server (NTRS)

    Damadeo, R. P.; Zawodny, J. M.; Thomason, L. W.

    2014-01-01

    This paper details a new method of regression for sparsely sampled data sets for use with time-series analysis, in particular the Stratospheric Aerosol and Gas Experiment (SAGE) II ozone data set. Non-uniform spatial, temporal, and diurnal sampling present in the data set result in biased values for the long-term trend if not accounted for. This new method is performed close to the native resolution of measurements and is a simultaneous temporal and spatial analysis that accounts for potential diurnal ozone variation. Results show biases, introduced by the way data is prepared for use with traditional methods, can be as high as 10%. Derived long-term changes show declines in ozone similar to other studies but very different trends in the presumed recovery period, with differences up to 2% per decade. The regression model allows for a variable turnaround time and reveals a hemispheric asymmetry in derived trends in the middle to upper stratosphere. Similar methodology is also applied to SAGE II aerosol optical depth data to create a new volcanic proxy that covers the SAGE II mission period. Ultimately this technique may be extensible towards the inclusion of multiple data sets without the need for homogenization.

  2. Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms.

    PubMed

    Chen, Hongming; Carlsson, Lars; Eriksson, Mats; Varkonyi, Peter; Norinder, Ulf; Nilsson, Ingemar

    2013-06-24

    A novel methodology was developed to build Free-Wilson-like local QSAR models by combining R-group signatures and the SVM algorithm. Unlike Free-Wilson analysis, this method is able to make predictions for compounds with R-groups not present in a training set. Eleven public data sets were chosen as test cases for comparing the performance of our new method with several other traditional modeling strategies, including Free-Wilson analysis. Our results show that the R-group signature SVM models achieve better prediction accuracy compared with Free-Wilson analysis in general. Moreover, the predictions of R-group signature models are also comparable to the models using ECFP6 fingerprints and signatures for the whole compound. Most importantly, R-group contributions to the SVM model can be obtained by calculating the gradient for R-group signatures. For most of the studied data sets, these contributions show a significant correlation with those from a corresponding Free-Wilson analysis. These results suggest that the R-group contribution can be used to interpret bioactivity data and highlight that the R-group signature based SVM modeling method is as interpretable as Free-Wilson analysis. Hence, the signature SVM model can be a useful modeling tool for any drug discovery project.

  3. Performance Analysis of Hybrid Electric Vehicle over Different Driving Cycles

    NASA Astrophysics Data System (ADS)

    Panday, Aishwarya; Bansal, Hari Om

    2017-02-01

    This article examines the nature and response of a hybrid vehicle over various standard driving cycles. Road profile parameters play an important role in determining fuel efficiency. Typical road profile parameters can be reduced to a useful smaller set using principal component analysis and independent component analysis; the resultant data set obtained after size reduction may yield a more appropriate and informative parameter cluster. With the reduced parameter set, fuel economies over various driving cycles are ranked using the TOPSIS and VIKOR multi-criteria decision-making methods. The ranking trend is then compared with the fuel economies achieved after driving the vehicle over the respective roads. The control strategy responsible for the power split is optimized using a genetic algorithm. A 1RC battery model and a modified SOC estimation method are used in the simulation, and improved results compared with the default configuration are obtained.
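
    A minimal sketch of the TOPSIS ranking step is given below; the decision matrix, weights, and criterion directions are invented and are not taken from the study.

```python
# Minimal TOPSIS sketch: rank alternatives (driving cycles) on several criteria
# given weights and criterion directions.
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; benefit[j] is True if larger is better."""
    norm = matrix / np.linalg.norm(matrix, axis=0)            # vector normalisation
    v = norm * weights                                         # weighted normalised matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))    # positive ideal solution
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))     # negative ideal solution
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)                             # closeness: higher = better

cycles = np.array([[18.2, 0.45, 32.0],    # e.g. fuel economy, aggressiveness, mean speed
                   [15.9, 0.60, 45.0],
                   [21.4, 0.30, 25.0]])
score = topsis(cycles, weights=np.array([0.5, 0.2, 0.3]),
               benefit=np.array([True, False, True]))
print("ranking (best first):", np.argsort(-score))
```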

  4. Secondary data analysis of large data sets in urology: successes and errors to avoid.

    PubMed

    Schlomer, Bruce J; Copp, Hillary L

    2014-03-01

    Secondary data analysis is the use of data collected for research by someone other than the investigator. In the last several years there has been a dramatic increase in the number of these studies being published in urological journals and presented at urological meetings, especially involving secondary data analysis of large administrative data sets. Along with this expansion, skepticism for secondary data analysis studies has increased for many urologists. In this narrative review we discuss the types of large data sets that are commonly used for secondary data analysis in urology, and discuss the advantages and disadvantages of secondary data analysis. A literature search was performed to identify urological secondary data analysis studies published since 2008 using commonly used large data sets, and examples of high quality studies published in high impact journals are given. We outline an approach for performing a successful hypothesis or goal driven secondary data analysis study and highlight common errors to avoid. More than 350 secondary data analysis studies using large data sets have been published on urological topics since 2008 with likely many more studies presented at meetings but never published. Nonhypothesis or goal driven studies have likely constituted some of these studies and have probably contributed to the increased skepticism of this type of research. However, many high quality, hypothesis driven studies addressing research questions that would have been difficult to conduct with other methods have been performed in the last few years. Secondary data analysis is a powerful tool that can address questions which could not be adequately studied by another method. Knowledge of the limitations of secondary data analysis and of the data sets used is critical for a successful study. There are also important errors to avoid when planning and performing a secondary data analysis study. Investigators and the urological community need to strive to use secondary data analysis of large data sets appropriately to produce high quality studies that hopefully lead to improved patient outcomes. Copyright © 2014 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  5. Mutual Coupling Analysis for Conformal Microstrip Antennas.

    DTIC Science & Technology

    1984-12-01

    … the infinite integral is terminated at k = 150 k0 … C. MUTUAL COUPLING ANALYSIS. In this section, the moment method … fact that it does provide an attractive alternative to the Green's function method on which the analysis in later sections is based. In the present … by the moment method, the chosen set of expansion dipole modes plays a very important role. The efficiency as well as accuracy of the analysis depend …

  6. Comparison of fMRI analysis methods for heterogeneous BOLD responses in block design studies

    PubMed Central

    Bernal-Casas, David; Fang, Zhongnan; Lee, Jin Hyung

    2017-01-01

    A large number of fMRI studies have shown that the temporal dynamics of evoked BOLD responses can be highly heterogeneous. Failing to model heterogeneous responses in statistical analysis can lead to significant errors in signal detection and characterization and alter the neurobiological interpretation. However, to date it is not clear which methods, out of a large number of options, are robust against variability in the temporal dynamics of BOLD responses in block-design studies. Here, we used rodent optogenetic fMRI data with heterogeneous BOLD responses and simulations guided by experimental data as a means to investigate different analysis methods’ performance against heterogeneous BOLD responses. Evaluations are carried out within the general linear model (GLM) framework and consist of standard basis sets as well as independent component analysis (ICA). Analyses show that, in the presence of heterogeneous BOLD responses, the conventionally used GLM with a canonical basis set leads to considerable errors in the detection and characterization of BOLD responses. Our results suggest that the 3rd and 4th order gamma basis sets, the 7th to 9th order finite impulse response (FIR) basis sets, the 5th to 9th order B-spline basis sets, and the 2nd to 5th order Fourier basis sets are optimal for a good balance between detection and characterization, while the 1st order Fourier basis set (coherence analysis) used in our earlier studies shows good detection capability. ICA has mostly good detection and characterization capabilities, but detects a large volume of spurious activation with the control fMRI data. PMID:27993672

  7. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    Background Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared the results to those from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored as significant against the PPARA GE signature. Thirty-three environmental toxicant gene sets were significantly altered in the triazole GE data sets. Twenty-one of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect. PMID:23356878

  8. [Local fractal analysis of noise-like time series by all permutations method for 1-115 min periods].

    PubMed

    Panchelyuga, V A; Panchelyuga, M S

    2015-01-01

    Results of local fractal analysis of 329-per-day time series of 239Pu alpha-decay rate fluctuations by means of the all permutations method (APM) are presented. The APM analysis reveals a steady set of frequencies in the time series. This frequency set was shown to coincide with the Earth's natural oscillations. A short review is given of works by different authors who analyzed time series of fluctuations in processes of different nature. We show that the periods observed in those works correspond to the periods revealed in our study, which points to a common mechanism of the observed phenomenon.

  9. A ricin forensic profiling approach based on a complex set of biomarkers.

    PubMed

    Fredriksson, Sten-Åke; Wunschel, David S; Lindström, Susanne Wiklund; Nilsson, Calle; Wahl, Karen; Åstot, Crister

    2018-08-15

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids, and seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1-PM4) ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data were collected using a range of analytical methods, and robust orthogonal partial least squares-discriminant analysis (OPLS-DA) models were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test set samples were determined. The model statistics of the two models were good and a 100% rate of correct predictions was achieved for the test set. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Similarities among receptor pockets and among compounds: analysis and application to in silico ligand screening.

    PubMed

    Fukunishi, Yoshifumi; Mikami, Yoshiaki; Nakamura, Haruki

    2005-09-01

    We developed a new method to evaluate the distances and similarities between receptor pockets or chemical compounds based on a multi-receptor versus multi-ligand docking affinity matrix. The receptors were classified by a cluster analysis based on calculations of the distance between receptor pockets. A set of receptors with low homology that bind a similar compound could be classified into one cluster. Based on this line of reasoning, we proposed a new in silico screening method. According to this method, compounds in a database were docked to multiple targets. The new docking score was a slightly modified version of the multiple active site correction (MASC) score. Receptors that were at a set distance from the target receptor were not included in the analysis, and the modified MASC scores were calculated for the selected receptors. The choice of the receptors is important to achieve a good screening result, and our clustering of receptors is useful for this purpose. This method was applied to the analysis of a set of 132 receptors and 132 compounds, and the results demonstrated that it achieves a high hit ratio compared with uniform sampling, using the newly developed receptor-ligand docking program Sievgene, which shows good docking performance, reconstructing 50.8% of the complexes to within 2 Å RMSD.

  11. A simple method for plasma total vitamin C analysis suitable for routine clinical laboratory use.

    PubMed

    Robitaille, Line; Hoffer, L John

    2016-04-21

    In-hospital hypovitaminosis C is highly prevalent but almost completely unrecognized. Medical awareness of this potentially important disorder is hindered by the inability of most hospital laboratories to determine plasma vitamin C concentrations. The availability of a simple, reliable method for analyzing plasma vitamin C could increase opportunities for routine plasma vitamin C analysis in clinical medicine. Plasma vitamin C can be analyzed by high performance liquid chromatography (HPLC) with electrochemical (EC) or ultraviolet (UV) light detection. We modified existing UV-HPLC methods for plasma total vitamin C analysis (the sum of ascorbic and dehydroascorbic acid) to develop a simple, constant-low-pH sample reduction procedure followed by isocratic reverse-phase HPLC separation using a purely aqueous low-pH non-buffered mobile phase. Although EC-HPLC is widely recommended over UV-HPLC for plasma total vitamin C analysis, the two methods have never been directly compared. We formally compared the simplified UV-HPLC method with EC-HPLC in 80 consecutive clinical samples. The simplified UV-HPLC method was less expensive, easier to set up, required fewer reagents and no pH adjustments, and demonstrated greater sample stability than many existing methods for plasma vitamin C analysis. When compared with the gold-standard EC-HPLC method in 80 consecutive clinical samples exhibiting a wide range of plasma vitamin C concentrations, it performed equivalently. The easy set up, simplicity and sensitivity of the plasma vitamin C analysis method described here could make it practical in a normally equipped hospital laboratory. Unlike any prior UV-HPLC method for plasma total vitamin C analysis, it was rigorously compared with the gold-standard EC-HPLC method and performed equivalently. Adoption of this method could increase the availability of plasma vitamin C analysis in clinical medicine.

  12. The Effects of Transfer in Teaching Vocabulary to School Children: An Analysis of the Dependencies between Lists of Trained and Non-Trained Words

    ERIC Educational Resources Information Center

    Frost, Jørgen; Ottem, Ernst; Hagtvet, Bente E.; Snow, Catherine E.

    2016-01-01

    In the present study, 81 Norwegian students were taught the meaning of words by the Word Generation (WG) method and 51 Norwegian students were taught by an approach inspired by the Thinking Schools (TS) concept. Two sets of words were used: a set of words to be trained and a set of non-trained control words. The two teaching methods yielded no…

  13. Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function

    PubMed Central

    2009-01-01

    Background A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. Alternatively, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined. Results We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, hypergeometric test, Max-Mean and Random Sets as limiting cases. GSZ is stable against changes in class size as well as across different positions of the analysed gene list in tests with randomized data. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. Conclusion GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html. PMID:19775443

  14. GARNET--gene set analysis with exploration of annotation relations.

    PubMed

    Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

    2011-02-15

    Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
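
    As a hedged illustration of the kappa-based pairwise relationship between annotation gene sets, the sketch below computes Cohen's kappa between two binary membership vectors over a common gene universe; the gene names and set sizes are invented and are not GARNET's implementation.

```python
# Toy kappa calculation between two annotation gene sets over a gene universe.
from sklearn.metrics import cohen_kappa_score

universe = [f"gene{i}" for i in range(1000)]
set_a = set(universe[:120])            # e.g. a GO term (invented)
set_b = set(universe[60:200])          # e.g. an overlapping pathway (invented)

a = [g in set_a for g in universe]     # binary membership vectors over the universe
b = [g in set_b for g in universe]
print("kappa:", cohen_kappa_score(a, b))
```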

  15. In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development.

    PubMed

    Ozerov, Ivan V; Lezhnina, Ksenia V; Izumchenko, Evgeny; Artemov, Artem V; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N; Labat, Ivan; West, Michael D; Buzdin, Anton; Cantor, Charles R; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex

    2016-11-16

    Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.

  16. Computational aspects of helicopter trim analysis and damping levels from Floquet theory

    NASA Technical Reports Server (NTRS)

    Gaonkar, Gopal H.; Achar, N. S.

    1992-01-01

    Helicopter trim settings of periodic initial state and control inputs are investigated for convergence of Newton iteration in computing the settings sequentially and in parallel. The trim analysis uses a shooting method and a weak version of two temporal finite element methods with displacement formulation and with mixed formulation of displacements and momenta. These three methods broadly represent two main approaches of trim analysis: adaptation of initial-value and finite element boundary-value codes to periodic boundary conditions, particularly for unstable and marginally stable systems. In each method, both the sequential and in-parallel schemes are used and the resulting nonlinear algebraic equations are solved by damped Newton iteration with an optimally selected damping parameter. The impact of damped Newton iteration, including earlier-observed divergence problems in trim analysis, is demonstrated by the maximum condition number of the Jacobian matrices of the iterative scheme and by virtual elimination of divergence. The advantages of the in-parallel scheme over the conventional sequential scheme are also demonstrated.

  17. In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development

    PubMed Central

    Ozerov, Ivan V.; Lezhnina, Ksenia V.; Izumchenko, Evgeny; Artemov, Artem V.; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N.; Labat, Ivan; West, Michael D.; Buzdin, Anton; Cantor, Charles R.; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex

    2016-01-01

    Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy. PMID:27848968

  18. Different approaches in Partial Least Squares and Artificial Neural Network models applied for the analysis of a ternary mixture of Amlodipine, Valsartan and Hydrochlorothiazide

    NASA Astrophysics Data System (ADS)

    Darwish, Hany W.; Hassan, Said A.; Salem, Maissa Y.; El-Zeany, Badr A.

    2014-03-01

    Different chemometric models were applied for the quantitative analysis of Amlodipine (AML), Valsartan (VAL) and Hydrochlorothiazide (HCT) in ternary mixture, namely, Partial Least Squares (PLS) as traditional chemometric model and Artificial Neural Networks (ANN) as advanced model. PLS and ANN were applied with and without variable selection procedure (Genetic Algorithm GA) and data compression procedure (Principal Component Analysis PCA). The chemometric methods applied are PLS-1, GA-PLS, ANN, GA-ANN and PCA-ANN. The methods were used for the quantitative analysis of the drugs in raw materials and pharmaceutical dosage form via handling the UV spectral data. A 3-factor 5-level experimental design was established resulting in 25 mixtures containing different ratios of the drugs. Fifteen mixtures were used as a calibration set and the other ten mixtures were used as validation set to validate the prediction ability of the suggested methods. The validity of the proposed methods was assessed using the standard addition technique.

  19. [Research on spectra recognition method for cabbages and weeds based on PCA and SIMCA].

    PubMed

    Zu, Qin; Deng, Wei; Wang, Xiu; Zhao, Chun-Jiang

    2013-10-01

    In order to improve the accuracy and efficiency of weed identification, differences in spectral reflectance were employed to distinguish between crops and weeds. Firstly, different combinations of Savitzky-Golay (SG) convolutional derivatives and the multiplicative scattering correction (MSC) method were applied to preprocess the raw spectral data. Then cluster analysis of the various types of plants was performed using the principal component analysis (PCA) method, and the feature wavelengths which were sensitive for classifying the various types of plants were extracted according to the corresponding loading plots of the optimal principal components in the PCA results. Finally, setting the feature wavelengths as the input variables, the soft independent modeling of class analogy (SIMCA) classification method was used to identify the various types of plants. The experimental results of classifying cabbages and weeds showed that, on the basis of the optimal pretreatment by a combined application of MSC and SG convolutional derivation with the SG parameters set to a first-order derivative, a third-degree polynomial, and 51 smoothing points, 23 feature wavelengths were extracted in accordance with the top three principal components in the PCA results. When the SIMCA method was used for classification with the previously selected 23 feature wavelengths as the input variables, the classification rates of the modeling set and the prediction set were up to 98.6% and 100%, respectively.
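
    A rough sketch of the preprocessing and feature-extraction chain is shown below on synthetic spectra; it applies only the Savitzky-Golay first-derivative step (51-point window, third-degree polynomial, as in the abstract) followed by PCA, omits the MSC step, and uses an invented loading-based wavelength selection rule for illustration.

```python
# Sketch of SG first-derivative preprocessing followed by PCA on synthetic spectra.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
spectra = rng.normal(size=(40, 600))                  # samples x wavelengths (synthetic)

deriv = savgol_filter(spectra, window_length=51, polyorder=3, deriv=1, axis=1)
pca = PCA(n_components=3).fit(deriv)
loadings = pca.components_                            # inspect loadings to pick feature wavelengths
feature_idx = np.argsort(np.abs(loadings).max(axis=0))[-23:]
print("candidate feature wavelengths (indices):", np.sort(feature_idx))
```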

  20. Pathway Distiller - multisource biological pathway consolidation.

    PubMed

    Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong

    2012-01-01

    One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways that have similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity to find static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights into a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing them with a web-based pathway framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users to extract functional explanations of their genome-wide experiments.

  1. Comparison of 3D quantitative structure-activity relationship methods: Analysis of the in vitro antimalarial activity of 154 artemisinin analogues by hypothetical active-site lattice and comparative molecular field analysis

    NASA Astrophysics Data System (ADS)

    Woolfrey, John R.; Avery, Mitchell A.; Doweyko, Arthur M.

    1998-03-01

    Two three-dimensional quantitative structure-activity relationship (3D-QSAR) methods, comparative molecular field analysis (CoMFA) and hypothetical active site lattice (HASL), were compared with respect to the analysis of a training set of 154 artemisinin analogues. Five models were created, including a complete HASL and two trimmed versions, as well as two CoMFA models (leave-one-out standard CoMFA and the guided-region selection protocol). Similar r2 and q2 values were obtained by each method, although some striking differences existed between CoMFA contour maps and the HASL output. Each of the four predictive models exhibited a similar ability to predict the activity of a test set of 23 artemisinin analogues, although some differences were noted as to which compounds were described well by either model.

  2. Neural net applied to anthropological material: a methodical study on the human nasal skeleton.

    PubMed

    Prescher, Andreas; Meyers, Anne; Gerf von Keyserlingk, Diedrich

    2005-07-01

    A new information processing method, an artificial neural net, was applied to characterise the variability of anthropological features of the human nasal skeleton. The aim was to find different types of nasal skeletons. A neural net with 15*15 nodes was trained on 17 standard anthropological parameters taken from 184 skulls of the Aachen collection. The trained neural net delivers its classification in a two-dimensional map. Different types of noses were locally separated within the map. Rare and frequent types may be distinguished after one passage of the complete collection through the net. Statistical descriptive analysis, hierarchical cluster analysis, and discriminant analysis were applied to the same data set. These parallel applications allowed comparison of the new approach to the more traditional ones. In general, the classification by the neural net is in correspondence with cluster analysis and discriminant analysis. However, it goes beyond these classifications by differentiating the types along multi-dimensional dependencies. Furthermore, places in the map are kept blank for intermediate forms, which may be theoretically expected, but were not included in the training set. In conclusion, the application of a neural network is a suitable method for investigating large collections of biological material. The resulting classification may be helpful in anatomy and anthropology as well as in forensic medicine. It may be used to characterise the peculiarity of a whole set as well as to find particular cases within the set.

  3. Analysis of Duplicated Multiple-Samples Rank Data Using the Mack-Skillings Test.

    PubMed

    Carabante, Kennet Mariano; Alonso-Marenco, Jose Ramon; Chokumnoyporn, Napapan; Sriwattana, Sujinda; Prinyawiwatkul, Witoon

    2016-07-01

    Appropriate analysis for duplicated multiple-samples rank data is needed. This study compared analysis of duplicated rank preference data using the Friedman versus Mack-Skillings tests. Panelists (n = 125) twice ranked 2 orange juice sets: a different-samples set (100%, 70%, vs. 40% juice) and a similar-samples set (100%, 95%, vs. 90%). These 2 sample sets were designed to obtain contrasting differences in preference. For each sample set, rank sum data were obtained from (1) averaged rank data of each panelist from the 2 replications (n = 125), (2) rank data of all panelists from each of the 2 separate replications (n = 125 each), (3) joint rank data of all panelists from the 2 replications (n = 125), and (4) rank data of all panelists pooled from the 2 replications (n = 250); rank data (1), (2), and (4) were analyzed separately by the Friedman test, while those from (3) were analyzed by the Mack-Skillings test. The effect of sample sizes (n = 10 to 125) was evaluated. For the similar-samples set, higher variations in rank data from the 2 replications were observed; therefore, results of the main effects were more inconsistent among methods and sample sizes. Regardless of the analysis method, the larger the sample size, the higher the χ² value and the lower the P-value (testing H0: all samples are not different). Analyzing rank data (2) separately by replication yielded inconsistent conclusions across sample sizes; hence, this method is not recommended. The Mack-Skillings test was more sensitive than the Friedman test. Furthermore, it takes into account within-panelist variations and is more appropriate for analyzing duplicated rank data. © 2016 Institute of Food Technologists®
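
    The Mack-Skillings test is not available in common Python libraries; as a hedged illustration of the simpler route only, the sketch below runs the Friedman test on invented averaged-rank data (analysis route (1) above) for three samples and ten panelists.

```python
# Friedman test on averaged replicate ranks (toy data).
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(2)
# rows = panelists, columns = the three juice samples; values = averaged ranks 1..3
ranks = np.array([rng.permutation([1, 2, 3]) for _ in range(10)], dtype=float)
stat, p = friedmanchisquare(ranks[:, 0], ranks[:, 1], ranks[:, 2])
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")
```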

  4. Chemometric analysis of soil pollution data using the Tucker N-way method.

    PubMed

    Stanimirova, I; Zehl, K; Massart, D L; Vander Heyden, Y; Einax, J W

    2006-06-01

    N-way methods, particularly the Tucker method, are often the methods of choice when analyzing data sets arranged in three- (or higher) way arrays, which is the case for most environmental data sets. In the future, applying N-way methods will become an increasingly popular way to uncover hidden information in complex data sets. The reason for this is that classical two-way approaches such as principal component analysis are not as good at revealing the complex relationships present in data sets. This study describes in detail the application of a chemometric N-way approach, namely the Tucker method, in order to evaluate the level of pollution in soil from a contaminated site. The analyzed soil data set was five-way in nature. The samples were collected at different depths (way 1) from two locations (way 2) and the levels of thirteen metals (way 3) were analyzed using a four-step-sequential extraction procedure (way 4), allowing detailed information to be obtained about the bioavailability and activity of the different binding forms of the metals. Furthermore, the measurements were performed under two conditions (way 5), inert and non-inert. The preferred Tucker model of definite complexity showed that there was no significant difference in measurements analyzed under inert or non-inert conditions. It also allowed two depth horizons, characterized by different accumulation pathways, to be distinguished, and it allowed the relationships between chemical elements and their biological activities and mobilities in the soil to be described in detail.
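
    A hedged sketch of a Tucker decomposition of a synthetic multi-way array is shown below, assuming the third-party tensorly package; neither the package nor the chosen ranks are tied to the original study, and the array dimensions merely echo the description above.

```python
# Tucker decomposition of a synthetic depth x location x metal x extraction-step array.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

data = tl.tensor(np.random.default_rng(3).normal(size=(6, 2, 13, 4)))  # stand-in data
core, factors = tucker(data, rank=[2, 1, 3, 2])
print("core shape:", core.shape, "| factor shapes:", [f.shape for f in factors])
```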

  5. Flow analysis of human chromosome sets by means of mixing-stirring device

    NASA Astrophysics Data System (ADS)

    Zenin, Valeri V.; Aksenov, Nicolay D.; Shatrova, Alla N.; Klopov, Nicolay V.; Cram, L. Scott; Poletaev, Andrey I.

    1997-05-01

    A new mixing and stirring device (MSD) was used to perform flow karyotype analysis of single human mitotic chromosomes analyzed so as to maintain the identity of chromosomes derived from the same cell. An improved method for cell preparation and intracellular staining of chromosomes was developed. The method includes enzyme treatment, incubation with saponin and separation of prestained cells from debris on a sucrose gradient. Mitotic cells are injected one by one in the MSD which is located inside the flow chamber where cells are ruptured, thereby releasing chromosomes. The set of chromosomes proceeds to flow in single file fashion to the point of analysis. The device works in a stepwise manner. The concentration of cells in the sample must be kept low to ensure that only one cell at a time enters the breaking chamber. Time-gated accumulation of data in listmode files makes it possible to separate chromosome sets comprising of single cells. The software that was developed classifies chromosome sets according to different criteria: total number of chromosomes, overall DNA content in the set, and the number of chromosomes of certain types. This approach combines the high performance of flow cytometry with the advantages of image analysis. Examples obtained with different human cell lines are presented.

  6. Let them fall where they may: congruence analysis in massive phylogenetically messy data sets.

    PubMed

    Leigh, Jessica W; Schliep, Klaus; Lopez, Philippe; Bapteste, Eric

    2011-10-01

    Interest in congruence in phylogenetic data has largely focused on issues affecting multicellular organisms, and animals in particular, in which the level of incongruence is expected to be relatively low. In addition, assessment methods developed in the past have been designed for reasonably small numbers of loci and scale poorly for larger data sets. However, there are currently over a thousand complete genome sequences available and of interest to evolutionary biologists, and these sequences are predominantly from microbial organisms, whose molecular evolution is much less frequently tree-like than that of multicellular life forms. As such, the level of incongruence in these data is expected to be high. We present a congruence method that accommodates both very large numbers of genes and high degrees of incongruence. Our method uses clustering algorithms to identify subsets of genes based on similarity of phylogenetic signal. It involves only a single phylogenetic analysis per gene, and therefore, computation time scales nearly linearly with the number of genes in the data set. We show that our method performs very well with sets of sequence alignments simulated under a wide variety of conditions. In addition, we present an analysis of core genes of prokaryotes, often assumed to have been largely vertically inherited, in which we identify two highly incongruent classes of genes. This result is consistent with the complexity hypothesis.
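
    As a hedged sketch of the clustering idea (not the authors' implementation), the snippet below groups genes by average-linkage hierarchical clustering of a precomputed matrix of pairwise distances between their gene trees; the distance values and cutoff are invented.

```python
# Cluster genes by similarity of phylogenetic signal, given a distance matrix
# between per-gene trees (e.g. Robinson-Foulds distances computed elsewhere).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)
n_genes = 8
d = rng.uniform(0.1, 1.0, size=(n_genes, n_genes))
d = (d + d.T) / 2.0
np.fill_diagonal(d, 0.0)                       # symmetric gene-tree distance matrix

clusters = fcluster(linkage(squareform(d), method="average"), t=0.6, criterion="distance")
print("cluster assignment per gene:", clusters)
```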

  7. New approaches to wipe sampling methods for antineoplastic and other hazardous drugs in healthcare settings.

    PubMed

    Connor, Thomas H; Smith, Jerome P

    2016-09-01

    At the present time, the method of choice to determine surface contamination of the workplace with antineoplastic and other hazardous drugs is surface wipe sampling and subsequent sample analysis with a variety of analytical techniques. The purpose of this article is to review current methodology for determining the level of surface contamination with hazardous drugs in healthcare settings and to discuss recent advances in this area. In addition it will provide some guidance for conducting surface wipe sampling and sample analysis for these drugs in healthcare settings. Published studies on the use of wipe sampling to measure hazardous drugs on surfaces in healthcare settings drugs were reviewed. These studies include the use of well-documented chromatographic techniques for sample analysis in addition to newly evolving technology that provides rapid analysis of specific antineoplastic. Methodology for the analysis of surface wipe samples for hazardous drugs are reviewed, including the purposes, technical factors, sampling strategy, materials required, and limitations. The use of lateral flow immunoassay (LFIA) and fluorescence covalent microbead immunosorbent assay (FCMIA) for surface wipe sample evaluation is also discussed. Current recommendations are that all healthc a re settings where antineoplastic and other hazardous drugs are handled include surface wipe sampling as part of a comprehensive hazardous drug-safe handling program. Surface wipe sampling may be used as a method to characterize potential occupational dermal exposure risk and to evaluate the effectiveness of implemented controls and the overall safety program. New technology, although currently limited in scope, may make wipe sampling for hazardous drugs more routine, less costly, and provide a shorter response time than classical analytical techniques now in use.

  8. Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex.

    PubMed

    Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne

    2004-06-01

    One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and the results of ORA depended strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
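
    The toy comparison below illustrates the two approaches with invented numbers: ORA via a hypergeometric test on the overlap between a thresholded gene list and a gene ontology class, and one simple FCS-style variant via a rank test on the full distribution of per-gene scores. Neither corresponds to the exact statistics used in the study.

```python
# ORA (hypergeometric overlap test) versus one simple FCS-style statistic.
import numpy as np
from scipy.stats import hypergeom, mannwhitneyu

N, K = 10000, 200          # genes on the array, genes in the GO class
n, k = 500, 25             # genes passing the threshold, of which k are in the class

p_ora = hypergeom.sf(k - 1, N, K, n)          # P(overlap >= k)

rng = np.random.default_rng(5)
scores = rng.normal(size=N)                    # per-gene age-association scores (synthetic)
in_class = np.zeros(N, dtype=bool)
in_class[:K] = True
_, p_fcs = mannwhitneyu(scores[in_class], scores[~in_class])   # one simple FCS variant
print(f"ORA p = {p_ora:.3g}, FCS (rank-based) p = {p_fcs:.3g}")
```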

  9. Maternal psychosocial well-being in Eritrea: application of participatory methods and tools of investigation and analysis in complex emergency settings.

    PubMed Central

    Almedom, Astier M.; Tesfamichael, Berhe; Yacob, Abdu; Debretsion, Zaïd; Teklehaimanot, Kidane; Beyene, Teshome; Kuhn, Kira; Alemu, Zemui

    2003-01-01

    OBJECTIVE: To establish the context in which maternal psychosocial well-being is understood in war-affected settings in Eritrea. METHOD: Pretested and validated participatory methods and tools of investigation and analysis were employed to allow participants to engage in processes of qualitative data collection, on-site analysis, and interpretation. FINDINGS: Maternal psychosocial well-being in Eritrea is maintained primarily by traditional systems of social support that are mostly outside the domain of statutory primary care. Traditional birth attendants provide a vital link between the two. Formal training and regular supplies of sterile delivery kits appear to be worthwhile options for health policy and practice in the face of the post-conflict challenges of ruined infrastructure and an overstretched and/or ill-mannered workforce in the maternity health service. CONCLUSION: Methodological advances in health research and the dearth of data on maternal psychosocial well-being in complex emergency settings call for scholars and practitioners to collaborate in creative searches for sound evidence on which to base maternity, mental health and social care policy and practice. Participatory methods facilitate the meaningful engagement of key stakeholders and enhance data quality, reliability and usability. PMID:12856054

  10. Integrated Analysis of Pharmacologic, Clinical, and SNP Microarray Data using Projection onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing

    PubMed Central

    Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.

    2010-01-01

    Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data, incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimensional setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175
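
    As a hedged sketch of adaptive permutation testing in general (not the PROMISE procedure itself), the snippet below estimates a permutation p-value for one genotype-response association and stops early once enough exceedances have accumulated; the data, stopping rule, and thresholds are invented.

```python
# Adaptive permutation p-value for one feature-endpoint association (toy example).
import numpy as np

def adaptive_perm_p(x, y, max_perm=10000, stop_hits=20, rng=np.random.default_rng(6)):
    obs = abs(np.corrcoef(x, y)[0, 1])
    hits = 0
    for i in range(1, max_perm + 1):
        if abs(np.corrcoef(rng.permutation(x), y)[0, 1]) >= obs:
            hits += 1
            if hits >= stop_hits:          # clearly non-significant; stop permuting early
                break
    return (hits + 1) / (i + 1)

genotype = np.random.default_rng(7).integers(0, 3, size=120).astype(float)  # 0/1/2 copies
response = 0.4 * genotype + np.random.default_rng(8).normal(size=120)
print("adaptive permutation p:", adaptive_perm_p(genotype, response))
```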

  11. Application of econometric and ecology analysis methods in physics software

    NASA Astrophysics Data System (ADS)

    Han, Min Cheol; Hoff, Gabriela; Kim, Chan Hyeong; Kim, Sung Hun; Grazia Pia, Maria; Ronchieri, Elisabetta; Saracco, Paolo

    2017-10-01

    Some data analysis methods typically used in econometric studies and in ecology have been evaluated and applied in physics software environments. They concern the evolution of observables through objective identification of change points and trends, and measurements of inequality, diversity and evenness across a data set. Within each analysis area, various statistical tests and measures have been examined. This conference paper gives a brief overview of some of these methods.
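
    As a small illustration of one of the inequality measures mentioned, the snippet below computes a Gini coefficient for an invented set of values; the formula is the standard order-statistics form and is not taken from the paper.

```python
# Gini coefficient of a set of observable values (toy data).
import numpy as np

def gini(values):
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    total = v.sum()
    # Standard formula based on the order statistics of the sorted sample.
    return (2.0 * np.sum(np.arange(1, n + 1) * v) - (n + 1) * total) / (n * total)

print(gini([1, 1, 2, 3, 50]))   # strongly unequal set -> high Gini (about 0.7)
```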

  12. Statistical plant set estimation using Schroeder-phased multisinusoidal input design

    NASA Technical Reports Server (NTRS)

    Bayard, D. S.

    1992-01-01

    A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
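
    The sketch below generates a Schroeder-phased multisine with energy only at discrete multiples of a fundamental frequency, using one common Schroeder phase formula (phi_k = -pi k(k-1)/N) for a flat amplitude spectrum; the frequencies, sample rate, and record length are invented, and the exact input design of the paper may differ.

```python
# Schroeder-phased multisine: energy only at chosen discrete frequencies,
# with phases chosen to keep the crest factor low.
import numpy as np

N = 16                          # number of sinusoids
f0 = 1.0                        # fundamental frequency (Hz)
fs = 256.0                      # sample rate (Hz)
t = np.arange(0, 4.0, 1.0 / fs)

k = np.arange(1, N + 1)
phi = -np.pi * k * (k - 1) / N  # one common Schroeder phase formula
u = np.sum(np.cos(2 * np.pi * np.outer(k * f0, t) + phi[:, None]), axis=0)
print("crest factor:", np.max(np.abs(u)) / np.sqrt(np.mean(u ** 2)))
```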

  13. Baryonic effects in cosmic shear tomography: PCA parametrization and importance of extreme baryonic models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mohammed, Irshad; Gnedin, Nickolay Y.

    Baryonic effects are amongst the most severe systematics in the tomographic analysis of weak lensing data, which is the principal probe in many future generations of cosmological surveys such as LSST and Euclid. Modeling or parameterizing these effects is essential in order to extract valuable constraints on cosmological parameters. In a recent paper, Eifler et al. (2015) suggested a reduction technique for baryonic effects by conducting a principal component analysis (PCA) and removing the largest baryonic eigenmodes from the data. In this article, we carried the investigation further and addressed two critical aspects. Firstly, we performed the analysis by separating the simulations into training and test sets, computing a minimal set of principal components from the training set and examining the fits on the test set. We found that using only four parameters, corresponding to the four largest eigenmodes of the training set, the test sets can be fitted thoroughly with an RMS of ~0.0011. Secondly, we explored the significance of outliers, the most exotic/extreme baryonic scenarios, in this method. We found that excluding the outliers from the training set results in a relatively bad fit and degrades the RMS by nearly a factor of 3. Therefore, for direct application of this method to the tomographic analysis of weak lensing data, the principal components should be derived from a training set that comprises adequately exotic but reasonable models, such that reality is included inside the parameter domain sampled by the training set. The baryonic effects can be parameterized as the coefficients of these principal components and should be marginalized over the cosmological parameter space.
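
    A hedged sketch of the train/test PCA step is given below with synthetic stand-in data (random matrices in place of the baryonic power-spectrum ratios); the four-component fit mirrors the description above, but the dimensions and values are invented.

```python
# Derive principal components from a training set and fit held-out models with four of them.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(9)
train = rng.normal(size=(12, 200))      # 12 baryonic models x 200 bins (synthetic)
test = rng.normal(size=(3, 200))        # held-out "test" models (synthetic)

pca = PCA(n_components=4).fit(train)
coeffs = pca.transform(test)                         # 4 coefficients per test model
recon = pca.inverse_transform(coeffs)                # best 4-component fit
rms = np.sqrt(np.mean((recon - test) ** 2, axis=1))
print("per-model RMS of the 4-component fit:", rms)
```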

  14. An efficient graph theory based method to identify every minimal reaction set in a metabolic network

    PubMed Central

    2014-01-01

    Background Development of cells with minimal metabolic functionality is gaining importance due to their efficiency in producing chemicals and fuels. Existing computational methods to identify minimal reaction sets in metabolic networks are computationally expensive. Further, they identify only one of the several possible minimal reaction sets. Results In this paper, we propose an efficient graph theory based recursive optimization approach to identify all minimal reaction sets. Graph theoretical insights offer systematic methods to not only reduce the number of variables in math programming and increase its computational efficiency, but also provide efficient ways to find multiple optimal solutions. The efficacy of the proposed approach is demonstrated using case studies from Escherichia coli and Saccharomyces cerevisiae. In case study 1, the proposed method identified three minimal reaction sets, each containing 38 reactions, in the Escherichia coli central metabolic network with 77 reactions. Analysis of these three minimal reaction sets revealed that one of them is more suitable for developing a minimal-metabolism cell than the other two due to its practically achievable internal flux distribution. In case study 2, the proposed method identified 256 minimal reaction sets from the Saccharomyces cerevisiae genome-scale metabolic network with 620 reactions. The proposed method required only 4.5 hours to identify all 256 minimal reaction sets and showed a significant reduction (approximately 80%) in the solution time when compared to existing methods for finding minimal reaction sets. Conclusions Identification of all minimal reaction sets in metabolic networks is essential since different minimal reaction sets have different properties that affect bioprocess development. The proposed method correctly identified all minimal reaction sets in both case studies. The proposed method is computationally efficient compared to other methods for finding minimal reaction sets and useful to employ with genome-scale metabolic networks. PMID:24594118

  15. Quantifying and visualizing variations in sets of images using continuous linear optimal transport

    NASA Astrophysics Data System (ADS)

    Kolouri, Soheil; Rohde, Gustavo K.

    2014-03-01

    Modern advancements in imaging devices have enabled us to explore the subcellular structure of living organisms and extract vast amounts of information. However, interpreting the biological information mined in the captured images is not a trivial task. Utilizing predetermined numerical features is usually the only hope for quantifying this information. Nonetheless, direct visual or biological interpretation of results obtained from these selected features is non-intuitive and difficult. In this paper, we describe an automatic method for modeling visual variations in a set of images, which allows for direct visual interpretation of the most significant differences, without the need for predefined features. The method is based on a linearized version of the continuous optimal transport (OT) metric, which provides a natural linear embedding for the image data set, in which linear combination of images leads to a visually meaningful image. This enables us to apply linear geometric data analysis techniques such as principal component analysis and linear discriminant analysis in the linearly embedded space and visualize the most prominent modes, as well as the most discriminant modes of variations, in the dataset. Using the continuous OT framework, we are able to analyze variations in shape and texture in a set of images utilizing each image at full resolution, that otherwise cannot be done by existing methods. The proposed method is applied to a set of nuclei images segmented from Feulgen stained liver tissues in order to investigate the major visual differences in chromatin distribution of Fetal-Type Hepatoblastoma (FHB) cells compared to the normal cells.

  16. Analysis of longitudinal data from animals with missing values using SPSS.

    PubMed

    Duricki, Denise A; Soleman, Sara; Moon, Lawrence D F

    2016-06-01

    Testing of therapies for disease or injury often involves the analysis of longitudinal data from animals. Modern analytical methods have advantages over conventional methods (particularly when some data are missing), yet they are not used widely by preclinical researchers. Here we provide an easy-to-use protocol for the analysis of longitudinal data from animals, and we present a click-by-click guide for performing suitable analyses using the statistical package IBM SPSS Statistics software (SPSS). We guide readers through the analysis of a real-life data set obtained when testing a therapy for brain injury (stroke) in elderly rats. If a few data points are missing, as in this example data set (for example, because of animal dropout), repeated-measures analysis of covariance may fail to detect a treatment effect. An alternative analysis method, such as the use of linear models (with various covariance structures), and analysis using restricted maximum likelihood estimation (to include all available data) can be used to better detect treatment effects. This protocol takes 2 h to carry out.
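
    The protocol itself is a click-by-click guide for SPSS; purely as an illustration of the same idea in code, a comparable linear mixed model can be fitted by restricted maximum likelihood with the statsmodels package, using all available observations even when some time points are missing. The data frame, column names, and effect sizes below are invented.

```python
# Linear mixed model (REML) on longitudinal animal data with some missing observations.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
rows = []
for animal in range(20):
    group = "treated" if animal < 10 else "control"
    for week in range(1, 9):
        if rng.random() < 0.05:          # a few missing observations (dropout)
            continue
        score = 50 - 3 * week + (4 if group == "treated" else 0) * week / 8 + rng.normal(0, 3)
        rows.append({"animal": animal, "group": group, "week": week, "score": score})
df = pd.DataFrame(rows)

model = smf.mixedlm("score ~ week * group", df, groups=df["animal"])
print(model.fit(reml=True).summary())
```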

  17. The Impact of Being Part of an Action Learning Set for New Lecturers: A Reflective Analysis

    ERIC Educational Resources Information Center

    Haith, Mark P.; Whittingham, Katrina A.

    2012-01-01

    What is an action learning set (ALS)? An ALS is a regular, action focused peer discussion group, generally facilitated, to address work place issues. Methods of undertaking ALS: methods are flexible within a range of approaches according to the group's developing needs. Benefits of ALS: builds trust, professional development, enables action,…

  18. Discrimination of Active and Weakly Active Human BACE1 Inhibitors Using Self-Organizing Map and Support Vector Machine.

    PubMed

    Li, Hang; Wang, Maolin; Gong, Ya-Nan; Yan, Aixia

    2016-01-01

    β-secretase (BACE1) is an aspartyl protease that is considered a vital novel target in Alzheimer's disease therapy. We collected a data set of 294 BACE1 inhibitors and built six classification models to discriminate active from weakly active inhibitors using Kohonen's Self-Organizing Map (SOM) method and the Support Vector Machine (SVM) method. Molecular descriptors were calculated using the program ADRIANA.Code. We adopted two different methods for the training/test set split: a random split and a Self-Organizing Map based split. The descriptors were selected by F-score and stepwise linear regression analysis. The best SVM model, Model2C, has good prediction performance on the test set, with prediction accuracy, sensitivity (SE), specificity (SP) and Matthews correlation coefficient (MCC) of 89.02%, 90%, 88% and 0.78, respectively. Model 1A is the best SOM model, with a test set accuracy and MCC of 94.57% and 0.98, respectively. Lone-pair electronegativity and polarizability related descriptors contributed importantly to the bioactivity of the BACE1 inhibitors. The Extended-Connectivity Fingerprints_4 (ECFP_4) analysis identified key substructural features that could be helpful for further drug design research. The SOM and SVM models built in this study can be obtained from the authors by email or other contacts.
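
    The ADRIANA.Code descriptors and the actual 294-compound set are not available here; the sketch below only illustrates the SVM classification and test-set evaluation step (accuracy and Matthews correlation coefficient) with scikit-learn on a made-up descriptor matrix of the same size.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical stand-in for 294 inhibitors: random descriptors and activity labels
# (1 = active, 0 = weakly active).
rng = np.random.default_rng(0)
X = rng.normal(size=(294, 20))
y = (X[:, :3].sum(axis=1) + rng.normal(0, 0.5, 294) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(scaler.transform(X_tr), y_tr)

pred = clf.predict(scaler.transform(X_te))
print("accuracy:", round(accuracy_score(y_te, pred), 3))
print("MCC     :", round(matthews_corrcoef(y_te, pred), 3))
```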

  19. Sampling with poling-based flux balance analysis: optimal versus sub-optimal flux space analysis of Actinobacillus succinogenes.

    PubMed

    Binns, Michael; de Atauri, Pedro; Vlysidis, Anestis; Cascante, Marta; Theodoropoulos, Constantinos

    2015-02-18

    Flux balance analysis is traditionally implemented to identify the maximum theoretical flux for some specified reaction and a single distribution of flux values for all the reactions present which achieve this maximum value. However, it is well known that the uncertainty in reaction networks due to branches, cycles and experimental errors results in a large number of combinations of internal reaction fluxes which can achieve the same optimal flux value. In this work, we have modified the applied linear objective of flux balance analysis to include a poling penalty function, which pushes each new set of reaction fluxes away from previously generated solutions. Repeated poling-based flux balance analysis generates a sample of different solutions (a characteristic set), which represents all the possible functionality of the reaction network. For the purpose of generating a relatively "small" characteristic set, our new method is shown to obtain higher coverage than competing sampling methods under most conditions. The influence of the linear objective function on the sampling (the linear bias) constrains optimisation results to a subspace of optimal solutions all producing the same maximal fluxes. Visualisation of reaction fluxes plotted against each other in 2 dimensions with and without the linear bias indicates the existence of correlations between fluxes. This method of sampling is applied to the organism Actinobacillus succinogenes for the production of succinic acid from glycerol. A new method of sampling for the generation of different flux distributions (sets of individual fluxes satisfying constraints on the steady-state mass balances of intermediates) has been developed using a relatively simple modification of flux balance analysis to include a poling penalty function inside the resulting optimisation objective function. This new methodology can achieve a high coverage of the possible flux space and can be used with and without linear bias to show optimal versus sub-optimal solution spaces. Basic analysis of the Actinobacillus succinogenes system using sampling shows that in order to achieve maximal succinic acid production, CO₂ must be taken into the system. Solutions involving release of CO₂ all give sub-optimal succinic acid production.
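
    The paper's poling penalty is a specific function added to the FBA objective; the sketch below uses a simplified linear stand-in with scipy's linprog on a hypothetical two-metabolite network, only to show how penalising previously used fluxes steers repeated solves towards alternate optima with the same maximal target flux.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical): A is produced by R0 and converted to B by two
# alternative routes (R1, R2); R3 drains B and is the flux to maximise.
S = np.array([
    [ 1, -1, -1,  0],   # metabolite A
    [ 0,  1,  1, -1],   # metabolite B
])
n = S.shape[1]
bounds = [(0.0, 10.0)] * n
c_obj = np.zeros(n)
c_obj[3] = -1.0                      # linprog minimises, so -v3 maximises v3

solutions = []
for _ in range(4):
    # Poling-like term: a small linear penalty on fluxes that previous solutions
    # used heavily (a simplified stand-in for the paper's poling penalty function).
    poling = 0.01 * np.mean(solutions, axis=0) if solutions else np.zeros(n)
    res = linprog(c_obj + poling, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    if res.success:
        solutions.append(res.x)

for k, v in enumerate(solutions):
    print(f"sample {k}: fluxes = {np.round(v, 2)}")
```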

  20. Comparing Analysis Frames for Visual Data Sets: Using Pupil Views Templates to Explore Perspectives of Learning

    ERIC Educational Resources Information Center

    Wall, Kate; Higgins, Steve; Remedios, Richard; Rafferty, Victoria; Tiplady, Lucy

    2013-01-01

    A key challenge of visual methodology is how to combine large-scale qualitative data sets with epistemologically acceptable and rigorous analysis techniques. The authors argue that a pragmatic approach drawing on ideas from mixed methods is helpful to open up the full potential of visual data. However, before one starts to "mix" the…

  1. Non-monetary valuation using Multi-Criteria Decision Analysis: Sensitivity of additive aggregation methods to scaling and compensation assumptions

    EPA Science Inventory

    Analytical methods for Multi-Criteria Decision Analysis (MCDA) support the non-monetary valuation of ecosystem services for environmental decision making. Many published case studies transform ecosystem service outcomes into a common metric and aggregate the outcomes to set land ...

  2. A Technique of Two-Stage Clustering Applied to Environmental and Civil Engineering and Related Methods of Citation Analysis.

    ERIC Educational Resources Information Center

    Miyamoto, S.; Nakayama, K.

    1983-01-01

    A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to same set of articles are compared. Ten references are…

  3. Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…

  4. The Analysis of Likert Scales Using State Multipoles: An Application of Quantum Methods to Behavioral Sciences Data

    ERIC Educational Resources Information Center

    Camparo, James; Camparo, Lorinda B.

    2013-01-01

    Though ubiquitous, Likert scaling's traditional mode of analysis is often unable to uncover all of the valid information in a data set. Here, the authors discuss a solution to this problem based on methodology developed by quantum physicists: the state multipole method. The authors demonstrate the relative ease and value of this method by…

  5. Data Analysis for the Behavioral Sciences Using SPSS

    NASA Astrophysics Data System (ADS)

    Lawner Weinberg, Sharon; Knapp Abramowitz, Sarah

    2002-04-01

    This book is written from the perspective that statistics is an integrated set of tools used together to uncover the story contained in numerical data. Accordingly, the book comes with a disk containing a series of real data sets to motivate discussions of appropriate methods of analysis. The presentation is based on a conceptual approach supported by an understanding of underlying mathematical foundations. Students learn that more than one method of analysis is typically needed and that an ample characterization of results is a critical component of any data analytic plan. The use of real data and SPSS to perform computations and create graphical summaries enables a greater emphasis on conceptual understanding and interpretation.

  6. Visualizing Phylogenetic Treespace Using Cartographic Projections

    NASA Astrophysics Data System (ADS)

    Sundberg, Kenneth; Clement, Mark; Snell, Quinn

    Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger datasets.

  7. Fast and Scalable Gaussian Process Modeling with Applications to Astronomical Time Series

    NASA Astrophysics Data System (ADS)

    Foreman-Mackey, Daniel; Agol, Eric; Ambikasaran, Sivaram; Angus, Ruth

    2017-12-01

    The growing field of large-scale time domain astronomy requires methods for probabilistic data analysis that are computationally tractable, even with large data sets. Gaussian processes (GPs) are a popular class of models used for this purpose, but since the computational cost scales, in general, as the cube of the number of data points, their application has been limited to small data sets. In this paper, we present a novel method for GPs modeling in one dimension where the computational requirements scale linearly with the size of the data set. We demonstrate the method by applying it to simulated and real astronomical time series data sets. These demonstrations are examples of probabilistic inference of stellar rotation periods, asteroseismic oscillation spectra, and transiting planet parameters. The method exploits structure in the problem when the covariance function is expressed as a mixture of complex exponentials, without requiring evenly spaced observations or uniform noise. This form of covariance arises naturally when the process is a mixture of stochastically driven damped harmonic oscillators—providing a physical motivation for and interpretation of this choice—but we also demonstrate that it can be a useful effective model in some other cases. We present a mathematical description of the method and compare it to existing scalable GP methods. The method is fast and interpretable, with a range of potential applications within astronomical data analysis and beyond. We provide well-tested and documented open-source implementations of this method in C++, Python, and Julia.
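
    The linear-scaling solver itself (provided by the paper's C++, Python, and Julia packages) is not reproduced here; the sketch below only builds the "mixture of exponentials" covariance naively with numpy and evaluates the GP log-likelihood by Cholesky factorisation (O(N³)), to show the kernel form the fast method exploits. The term parameters and the simulated series are assumptions.

```python
import numpy as np

def exp_mixture_kernel(tau, a, b, c, d):
    """Covariance written as a mixture of damped, oscillating exponentials."""
    tau = np.abs(tau)
    k = np.zeros_like(tau, dtype=float)
    for aj, bj, cj, dj in zip(a, b, c, d):
        k += np.exp(-cj * tau) * (aj * np.cos(dj * tau) + bj * np.sin(dj * tau))
    return k

def gp_log_likelihood(t, y, yerr, a, b, c, d):
    """Naive O(N^3) GP log-likelihood; the paper's method achieves the same in O(N)."""
    tau = t[:, None] - t[None, :]
    K = exp_mixture_kernel(tau, a, b, c, d) + np.diag(yerr ** 2)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

# Hypothetical irregularly sampled time series
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 200))
y = np.sin(2 * np.pi * t / 3.0) + 0.1 * rng.standard_normal(t.size)
print(gp_log_likelihood(t, y, 0.1 * np.ones_like(t), a=[1.0], b=[0.1], c=[0.5], d=[2.0]))
```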

  8. Network neighborhood analysis with the multi-node topological overlap measure.

    PubMed

    Li, Ai; Horvath, Steve

    2007-01-15

    The goal of neighborhood analysis is to find a set of genes (the neighborhood) that is similar to an initial 'seed' set of genes. Neighborhood analysis methods for network data are important in systems biology. If individual network connections are susceptible to noise, it can be advantageous to define neighborhoods on the basis of a robust interconnectedness measure, e.g. the topological overlap measure. Since the use of multiple nodes in the seed set may lead to more informative neighborhoods, it can be advantageous to define multi-node similarity measures. The pairwise topological overlap measure is generalized to multiple network nodes and subsequently used in a recursive neighborhood construction method. A local permutation scheme is used to determine the neighborhood size. Using four network applications and a simulated example, we provide empirical evidence that the resulting neighborhoods are biologically meaningful, e.g. we use neighborhood analysis to identify brain cancer related genes. An executable Windows program and tutorial for multi-node topological overlap measure (MTOM) based analysis can be downloaded from the webpage (http://www.genetics.ucla.edu/labs/horvath/MTOM/).
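
    The multi-node generalisation and the MTOM program are specific to the paper; the sketch below only computes the standard pairwise topological overlap measure with numpy on a hypothetical adjacency matrix, the robust interconnectedness measure that the multi-node version builds on.

```python
import numpy as np

def topological_overlap(adj):
    """Pairwise topological overlap matrix for a symmetric adjacency matrix
    with entries in [0, 1] and a zero diagonal."""
    adj = np.asarray(adj, dtype=float)
    shared = adj @ adj                          # weight of shared neighbours
    k = adj.sum(axis=0)                         # node connectivity
    denom = np.minimum.outer(k, k) + 1.0 - adj
    tom = (shared + adj) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Toy unweighted network (hypothetical)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
print(np.round(topological_overlap(A), 2))
```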

  9. Material nonlinear analysis via mixed-iterative finite element method

    NASA Technical Reports Server (NTRS)

    Sutjahjo, Edhi; Chamis, Christos C.

    1992-01-01

    The performance of elastic-plastic mixed-iterative analysis is examined through a set of convergence studies. Membrane and bending behaviors are tested using 4-node quadrilateral finite elements. The membrane result is excellent, which indicates the implementation of elastic-plastic mixed-iterative analysis is appropriate. On the other hand, further research to improve bending performance of the method seems to be warranted.

  10. On the construction of a ground truth framework for evaluating voxel-based diffusion tensor MRI analysis methods.

    PubMed

    Van Hecke, Wim; Sijbers, Jan; De Backer, Steve; Poot, Dirk; Parizel, Paul M; Leemans, Alexander

    2009-07-01

    Although many studies are starting to use voxel-based analysis (VBA) methods to compare diffusion tensor images between healthy and diseased subjects, it has been demonstrated that VBA results depend heavily on parameter settings and implementation strategies, such as the applied coregistration technique, smoothing kernel width, statistical analysis, etc. In order to investigate the effect of different parameter settings and implementations on the accuracy and precision of the VBA results quantitatively, ground truth knowledge regarding the underlying microstructural alterations is required. To address the lack of such a gold standard, simulated diffusion tensor data sets are developed, which can model an array of anomalies in the diffusion properties of a predefined location. These data sets can be employed to evaluate the numerous parameters that characterize the pipeline of a VBA algorithm and to compare the accuracy, precision, and reproducibility of different post-processing approaches quantitatively. We are convinced that the use of these simulated data sets can improve the understanding of how different diffusion tensor image post-processing techniques affect the outcome of VBA. In turn, this may possibly lead to a more standardized and reliable evaluation of diffusion tensor data sets of large study groups with a wide range of white matter altering pathologies. The simulated DTI data sets will be made available online (http://www.dti.ua.ac.be).

  11. The EIPeptiDi tool: enhancing peptide discovery in ICAT-based LC MS/MS experiments.

    PubMed

    Cannataro, Mario; Cuda, Giovanni; Gaspari, Marco; Greco, Sergio; Tradigo, Giuseppe; Veltri, Pierangelo

    2007-07-15

    Isotope-coded affinity tags (ICAT) is a method for quantitative proteomics based on differential isotopic labeling, sample digestion and mass spectrometry (MS). The method allows the identification and relative quantification of proteins present in two samples and consists of the following phases. First, cysteine residues are either labeled using the ICAT Light or ICAT Heavy reagent (having identical chemical properties but different masses). Then, after whole sample digestion, the labeled peptides are captured selectively using the biotin tag contained in both ICAT reagents. Finally, the simplified peptide mixture is analyzed by nanoscale liquid chromatography-tandem mass spectrometry (LC-MS/MS). Nevertheless, the ICAT LC-MS/MS method still suffers from insufficient sample-to-sample reproducibility on peptide identification. In particular, the number and the type of peptides identified in different experiments can vary considerably and, thus, the statistical (comparative) analysis of sample sets is very challenging. Low information overlap at the peptide and, consequently, at the protein level, is very detrimental in situations where the number of samples to be analyzed is high. We designed a method for improving the data processing and peptide identification in sample sets subjected to ICAT labeling and LC-MS/MS analysis, based on cross validating MS/MS results. Such a method has been implemented in a tool, called EIPeptiDi, which boosts the ICAT data analysis software improving peptide identification throughout the input data set. Heavy/Light (H/L) pairs quantified but not identified by the MS/MS routine, are assigned to peptide sequences identified in other samples, by using similarity criteria based on chromatographic retention time and Heavy/Light mass attributes. EIPeptiDi significantly improves the number of identified peptides per sample, proving that the proposed method has a considerable impact on the protein identification process and, consequently, on the amount of potentially critical information in clinical studies. The EIPeptiDi tool is available at http://bioingegneria.unicz.it/~veltri/projects/eipeptidi/ with a demo data set. EIPeptiDi significantly increases the number of peptides identified and quantified in analyzed samples, thus reducing the number of unassigned H/L pairs and allowing a better comparative analysis of sample data sets.

  12. GazeAppraise v. 0.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, Andrew; Haass, Michael; Rintoul, Mark Daniel

    GazeAppraise advances the state of the art of gaze pattern analysis using methods that simultaneously analyze spatial and temporal characteristics of gaze patterns. GazeAppraise enables novel research in visual perception and cognition; for example, using shape features as distinguishing elements to assess individual differences in visual search strategy. Given a set of point-to-point gaze sequences, hereafter referred to as scanpaths, the method constructs multiple descriptive features for each scanpath. Once the scanpath features have been calculated, they are used to form a multidimensional vector representing each scanpath and cluster analysis is performed on the set of vectors from all scanpaths. An additional benefit of this method is the identification of causal or correlated characteristics of the stimuli, subjects, and visual task through statistical analysis of descriptive metadata distributions within and across clusters.
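
    GazeAppraise's own feature definitions are not listed in this record; the sketch below only mimics the shape of its pipeline on made-up scanpaths: compute a few descriptive features per scanpath, stack them into vectors, and cluster. The particular features and the cluster count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def scanpath_features(points):
    """Simple shape descriptors for one scanpath (sequence of x, y gaze points):
    total path length, bounding-box aspect ratio, and net drift direction."""
    d = np.diff(points, axis=0)
    extent = points.max(axis=0) - points.min(axis=0) + 1e-9
    drift = points[-1] - points[0]
    return [np.linalg.norm(d, axis=1).sum(), extent[0] / extent[1], np.arctan2(drift[1], drift[0])]

rng = np.random.default_rng(0)
# Hypothetical scanpaths: half sweep left to right, half wander randomly.
paths = [np.column_stack([np.linspace(0, 100, 20), rng.normal(50, 2, 20)]) for _ in range(10)]
paths += [rng.uniform(0, 100, size=(20, 2)) for _ in range(10)]

features = np.array([scanpath_features(p) for p in paths])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)
```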

  13. Hybrid Wavelet De-noising and Rank-Set Pair Analysis approach for forecasting hydro-meteorological time series

    NASA Astrophysics Data System (ADS)

    Wang, D.; Wang, Y.; Zeng, X.

    2017-12-01

    Accurate, fast forecasting of hydro-meteorological time series is presently a major challenge in drought and flood mitigation. This paper proposes a hybrid approach, Wavelet De-noising (WD) and Rank-Set Pair Analysis (RSPA), that takes full advantage of a combination of the two approaches to improve forecasts of hydro-meteorological time series. WD allows decomposition and reconstruction of a time series by the wavelet transform, and hence separation of the noise from the original series. RSPA, a more reliable and efficient version of Set Pair Analysis, is integrated with WD to form the hybrid WD-RSPA approach. Two types of hydro-meteorological data sets with different characteristics and different levels of human influences at some representative stations are used to illustrate the WD-RSPA approach. The approach is also compared to three other generic methods: the conventional Auto Regressive Integrated Moving Average (ARIMA) method, Artificial Neural Networks (ANNs) (BP-error Back Propagation, MLP-Multilayer Perceptron and RBF-Radial Basis Function), and RSPA alone. Nine error metrics are used to evaluate the model performance. The results show that WD-RSPA is accurate, feasible, and effective. In particular, WD-RSPA is found to be the best among the various generic methods compared in this paper, even when the extreme events are included within a time series.
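
    The rank-set pair analysis forecasting stage is specific to the paper, but the wavelet de-noising front end can be sketched with the PyWavelets package (assumed to be installed); the wavelet, threshold rule, and synthetic runoff series below are illustrative assumptions.

```python
import numpy as np
import pywt

def wavelet_denoise(series, wavelet="db4", level=3):
    """Decompose, soft-threshold the detail coefficients, and reconstruct;
    the forecasting model would then be fitted to the de-noised series."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745           # noise scale estimate
    thresh = sigma * np.sqrt(2 * np.log(len(series)))        # universal threshold
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(series)]

# Hypothetical noisy monthly runoff series (20 years, seasonal cycle)
rng = np.random.default_rng(0)
t = np.arange(240)
runoff = 50 + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)
print(np.round(wavelet_denoise(runoff)[:5], 2))
```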

  14. Alternatives to current flow cytometry data analysis for clinical and research studies.

    PubMed

    Gondhalekar, Carmen; Rajwa, Bartek; Patsekin, Valery; Ragheb, Kathy; Sturgis, Jennifer; Robinson, J Paul

    2018-02-01

    Flow cytometry has well-established methods for data analysis based on traditional data collection techniques. These techniques typically involved manual insertion of tube samples into an instrument that, historically, could only measure 1-3 colors. The field has since evolved to incorporate new technologies for faster and highly automated sample preparation and data collection. For example, the use of microwell plates on benchtop instruments is now a standard on virtually every new instrument, and so users can easily accumulate multiple data sets quickly. Further, because the user must carefully define the layout of the plate, this information is already defined when considering the analytical process, expanding the opportunities for automated analysis. Advances in multi-parametric data collection, as demonstrated by the development of hyperspectral flow-cytometry, 20-40 color polychromatic flow cytometry, and mass cytometry (CyTOF), are game-changing. As data and assay complexity increase, so too does the complexity of data analysis. Complex data analysis is already a challenge to traditional flow cytometry software. New methods for reviewing large and complex data sets can provide rapid insight into processes difficult to define without more advanced analytical tools. In settings such as clinical labs where rapid and accurate data analysis is a priority, rapid, efficient and intuitive software is needed. This paper outlines opportunities for analysis of complex data sets using examples of multiplexed bead-based assays, drug screens and cell cycle analysis - any of which could become integrated into the clinical environment. Copyright © 2017. Published by Elsevier Inc.

  15. System and method for chromatography and electrophoresis using circular optical scanning

    DOEpatents

    Balch, Joseph W.; Brewer, Laurence R.; Davidson, James C.; Kimbrough, Joseph R.

    2001-01-01

    A system and method is disclosed for chromatography and electrophoresis using circular optical scanning. One or more rectangular microchannel plates or radial microchannel plates have a set of analysis channels for insertion of molecular samples. One or more scanning devices repeatedly pass over the analysis channels in one direction at a predetermined rotational velocity and with a predetermined rotational radius. The rotational radius may be dynamically varied so as to monitor the molecular sample at various positions along an analysis channel. Sample loading robots may also be used to input molecular samples into the analysis channels. Radial microchannel plates are built from a substrate whose analysis channels are disposed at a non-parallel angle with respect to each other. A first step in the method accesses either a rectangular or radial microchannel plate, having a set of analysis channels, and a second step passes a scanning device repeatedly in one direction over the analysis channels. As a third step, the scanning device is passed over the analysis channels at dynamically varying distances from a centerpoint of the scanning device. As a fourth step, molecular samples are loaded into the analysis channels with a robot.

  16. Analysis of regional rainfall-runoff parameters for the Lake Michigan Diversion hydrological modeling

    USGS Publications Warehouse

    Soong, David T.; Over, Thomas M.

    2015-01-01

    Recalibration of the HSPF parameters to the updated inputs and land covers was completed on two representative watershed models selected from the nine by using a manual method (HSPEXP) and an automatic method (PEST). The objective of the recalibration was to develop a regional parameter set that improves the accuracy in runoff volume prediction for the nine study watersheds. Knowledge about flow and watershed characteristics plays a vital role for validating the calibration in both manual and automatic methods. The best performing parameter set was determined by the automatic calibration method on a two-watershed model. Applying this newly determined parameter set to the nine watersheds for runoff volume simulation resulted in “very good” ratings in five watersheds, an improvement as compared to “very good” ratings achieved for three watersheds by the North Branch parameter set.

  17. Evaluation of variable selection methods for random forests and omics data sets.

    PubMed

    Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke

    2017-10-16

    Machine learning methods, in particular random forests, are promising approaches for prediction based on high-dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta. In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings. © The Author 2017. Published by Oxford University Press.
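
    Neither the Boruta package nor the Vita method is reproduced here; the sketch below illustrates only the shadow-feature idea behind Boruta, in a single pass with scikit-learn on made-up data: permuted copies of the predictors provide an importance benchmark that real predictors must beat (the full algorithm repeats this with statistical tests).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical omics-like data: 100 samples, 50 predictors, only the first 5 informative.
X = rng.normal(size=(100, 50))
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 100) > 0).astype(int)

# Shadow features: independently permuted copies of every real predictor.
X_shadow = np.apply_along_axis(rng.permutation, 0, X)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(np.hstack([X, X_shadow]), y)

imp_real = rf.feature_importances_[:X.shape[1]]
imp_shadow_max = rf.feature_importances_[X.shape[1]:].max()
selected = np.where(imp_real > imp_shadow_max)[0]
print("selected predictors:", selected)
```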

  18. Network-based differential gene expression analysis suggests cell cycle related genes regulated by E2F1 underlie the molecular difference between smoker and non-smoker lung adenocarcinoma

    PubMed Central

    2013-01-01

    Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: they first identify gene-gene relationships under the studied phenotype and then integrate them with gene expression changes to prioritize signature genes, or vice versa. A method that can simultaneously consider gene-gene co-expression strength and the corresponding expression-level changes is therefore warranted, so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that the top differentially regulated genes identified by the rank sum test in different sets are not consistent, while the top-ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma. Conclusions In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432

  19. Analysis of the VIDAS® Staph Enterotoxin III (SET3) for Detection of Staphylococcal Enterotoxins G, H, and I in Foods.

    PubMed

    Hait, Jennifer M; Nguyen, Angela T; Tallent, Sandra M

    2018-04-20

    Background: Staphylococcal food poisoning (SFP) frequently causes illnesses worldwide. SFP occurs from the ingestion of staphylococcal enterotoxins (SEs) preformed in foods by enterotoxigenic strains of Staphylococcus species, primarily S. aureus. SEG, SEH, and SEI induce emesis and have been implicated in outbreaks. Immunological methods are deemed the most practical for the routine analysis of SEs in foods given their ease of use, sensitivity, specificity, and commercial availability. These kits are routinely used to test for SEA-SEE. However, only recently has a kit been developed to detect SEG, SEH, and SEI. Objective: Our research examined the performance of the novel VIDAS® Staph Enterotoxin III (SET3) for the detection of staphylococcal enterotoxins SEG, SEH, and SEI in foods. Methods: Here we assess the sensitivity and specificity of SET3 using duplicate test portions of six foods at varying concentrations of inclusivity and exclusivity inocula: pure SEG, SEH, and SEI; S. aureus strain extracts positive for seg, seh, and sei; as well as SEA, SEB, SEC, SED, and SEE. Results: The overall detection limit was less than 2.09 ng/mL for foods inoculated with SEG, SEH, and SEI, with no cross-reactivity observed. Highlights: Integrating concurrent testing to detect the presence of SEA-SEE and SEG-SEI utilizing the SET3 along with the VIDAS SET2, Ridascreen® SET total, or other comparable kits will be instrumental for future food assessments in our laboratory and may become the new standard for SE analysis of foods.

  20. Effects of imputation on correlation: implications for analysis of mass spectrometry data from multiple biological matrices.

    PubMed

    Taylor, Sandra L; Ruhaak, L Renee; Kelly, Karen; Weiss, Robert H; Kim, Kyoungmi

    2017-03-01

    With expanded access to, and decreased costs of, mass spectrometry, investigators are collecting and analyzing multiple biological matrices from the same subject such as serum, plasma, tissue and urine to enhance biomarker discoveries, understanding of disease processes and identification of therapeutic targets. Commonly, each biological matrix is analyzed separately, but multivariate methods such as MANOVAs that combine information from multiple biological matrices are potentially more powerful. However, mass spectrometric data typically contain large amounts of missing values, and imputation is often used to create complete data sets for analysis. The effects of imputation on multiple biological matrix analyses have not been studied. We investigated the effects of seven imputation methods (half minimum substitution, mean substitution, k-nearest neighbors, local least squares regression, Bayesian principal components analysis, singular value decomposition and random forest), on the within-subject correlation of compounds between biological matrices and its consequences on MANOVA results. Through analysis of three real omics data sets and simulation studies, we found the amount of missing data and imputation method to substantially change the between-matrix correlation structure. The magnitude of the correlations was generally reduced in imputed data sets, and this effect increased with the amount of missing data. Significant results from MANOVA testing also were substantially affected. In particular, the number of false positives increased with the level of missing data for all imputation methods. No one imputation method was universally the best, but the simple substitution methods (Half Minimum and Mean) consistently performed poorly. © The Author 2016. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
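
    A toy numpy/scikit-learn sketch of the paper's central observation: imputing missing values can shrink the within-subject correlation between measurements from two biological matrices relative to the true or complete-case correlation. The two "matrices", the missingness rate, and the two imputers are illustrative stand-ins for the seven methods compared in the paper.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(0)
n = 100
# Hypothetical paired measurements of one compound in serum and urine.
serum = rng.normal(10, 2, n)
urine = 0.8 * serum + rng.normal(0, 1, n)
X = np.column_stack([serum, urine])

X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.3] = np.nan     # 30% of values missing at random

def corr(a):
    return np.corrcoef(a[:, 0], a[:, 1])[0, 1]

complete = X[~np.isnan(X_miss).any(axis=1)]
print("true correlation     :", round(corr(X), 3))
print("complete cases only  :", round(corr(complete), 3))
for name, imputer in [("mean substitution   ", SimpleImputer(strategy="mean")),
                      ("k-nearest neighbours", KNNImputer(n_neighbors=5))]:
    print(f"{name} :", round(corr(imputer.fit_transform(X_miss)), 3))
```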

  1. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    PubMed

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes the genes' weights on the two metagenes as an additive combination of all genes, while the second learned factor represents the expression values of the two metagenes. In the gene ranking stage, all genes are ranked in descending order according to the differences of their metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost gene ranking performance. Area-under-the-curve analysis of differential expression on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, Gene Set Enrichment Analysis also showed that DNMF outperforms the other methods. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
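
    The discriminant NMF in the paper uses Fisher's criterion and sample labels; as a rough, unsupervised stand-in, the sketch below runs plain rank-2 NMF from scikit-learn on made-up counts and then performs the ranking stage (ordering genes by the difference of their two metagene weights). The data are hypothetical and, without the discriminant term, the order and sign of the components are arbitrary.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical count matrix: 200 genes x 20 samples (10 vs 10), with the first
# 20 genes up-regulated in the second group.
rng = np.random.default_rng(0)
X = rng.poisson(20, size=(200, 20)).astype(float)
X[:20, 10:] *= 3

model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)        # gene weights on the two metagenes
H = model.components_             # metagene expression across samples

# Rank genes by the (absolute) difference of their weights on the two metagenes.
ranking = np.argsort(-np.abs(W[:, 0] - W[:, 1]))
print("top-ranked genes:", ranking[:10])
```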

  2. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.

    PubMed

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E

    2018-03-01

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.

  3. An improved set of standards for finding cost for cost-effectiveness analysis.

    PubMed

    Barnett, Paul G

    2009-07-01

    Guidelines have helped standardize methods of cost-effectiveness analysis, allowing different interventions to be compared and enhancing the generalizability of study findings. There is agreement that all relevant services be valued from the societal perspective using a long-term time horizon and that more exact methods be used to cost services most affected by the study intervention. Guidelines are not specific enough with respect to costing methods, however. The literature was reviewed to identify the problems associated with the 4 principal methods of cost determination. Microcosting requires direct measurement and is ordinarily reserved to cost novel interventions. Analysts should include nonwage labor cost, person-level and institutional overhead, and the cost of development, set-up activities, supplies, space, and screening. Activity-based cost systems have promise of finding accurate costs of all services provided, but are not widely adopted. Quality must be evaluated and the generalizability of cost estimates to other settings must be considered. Administrative cost estimates, chiefly cost-adjusted charges, are widely used, but the analyst must consider items excluded from the available system. Gross costing methods determine quantity of services used and employ a unit cost. If the intervention will affect the characteristics of a service, the method should not assume that the service is homogeneous. Questions are posed for future reviews of the quality of costing methods. The analyst must avoid inappropriate assumptions, especially those that bias the analysis by exclusion of costs that are affected by the intervention under study.

  4. The effect of mixing method on tricalcium silicate-based cement.

    PubMed

    Duque, J A; Fernandes, S L; Bubola, J P; Duarte, M A H; Camilleri, J; Marciano, M A

    2018-01-01

    To evaluate the effect of three methods of mixing on the physical and chemical properties of tricalcium silicate-based cements. The materials evaluated were MTA Angelus and Portland cement with 20% zirconium oxide (PC-20-Zr). The cements were mixed using a 3 : 1 powder-to-liquid ratio. The mixing methods were manual (m), trituration (tr) and ultrasonic (us) activation. The materials were characterized by means of scanning electron microscope (SEM) and energy dispersive X-ray spectroscopy. Flowability was analysed according to ANSI/ADA 57/2012. Initial and final setting times were assessed following ASTM C266/08. Volume change was evaluated using a micro-CT volumetric method. Solubility was analysed according to ADA 57/2012. pH and calcium ion release were measured after 3, 24, 72 and 168 h. Statistical analysis was performed using two-way analysis of variance. The level of significance was set at P = 0.05. The SEM analysis revealed that ultrasonic activation was associated with a homogeneous distribution of particles. Flowability, volume change and initial setting time were not influenced by the mixing method (P > 0.05). Solubility was influenced by the mixing method (P < 0.05). For pH, at 168 h, significant differences were found between MTA-m and PC-20-Zr-m (P < 0.05). For calcium ion release, PC-20-Zr-tr had higher values than MTA-m at 3 h, and MTA-tr had higher values than PC-20-Zr-m at 168 h (P < 0.05). The ultrasonic and trituration methods led to higher calcium ion release and pH compared with manual mixing for all cements, whilst the ultrasonic method produced smaller particles for the PC-20-Zr cement. Flow, setting times and volume change were not influenced by the mixing method used; however, it did have an impact on solubility. © 2017 International Endodontic Journal. Published by John Wiley & Sons Ltd.

  5. Using Set Covering with Item Sampling to Analyze the Infeasibility of Linear Programming Test Assembly Models

    ERIC Educational Resources Information Center

    Huitzing, Hiddo A.

    2004-01-01

    This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests, fulfilling a set of constraints set by the test assembler. Sometimes, no solution to the LPTA model exists. The model is then said to be…

  6. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets.

    PubMed

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected "signal"; (5) using several machine learning methods to test the "signal's" sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation.

  7. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets

    PubMed Central

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected “signal”; (5) using several machine learning methods to test the “signal’s” sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation. PMID:28068401

  8. Robust boundary detection of left ventricles on ultrasound images using ASM-level set method.

    PubMed

    Zhang, Yaonan; Gao, Yuan; Li, Hong; Teng, Yueyang; Kang, Yan

    2015-01-01

    The level set method has been widely used in medical image analysis, but it has difficulties when used for the segmentation of left ventricular (LV) boundaries on echocardiography images because the boundaries are not very distinct and the signal-to-noise ratio of echocardiography images is not very high. In this paper, we introduce the Active Shape Model (ASM) into the traditional level set method to enforce shape constraints. It improves the accuracy of boundary detection and makes the evolution more efficient. Experiments conducted on real cardiac ultrasound image sequences show positive and promising results.
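
    The ASM shape prior is the paper's contribution and is not reproduced here; the sketch below runs only a plain morphological Chan-Vese level set from scikit-image on a synthetic speckled image, the kind of unconstrained evolution that the shape constraint is meant to regularise. The image, iteration count, and smoothing value are assumptions.

```python
import numpy as np
from skimage.segmentation import morphological_chan_vese

# Hypothetical noisy "ultrasound-like" image: a bright elliptical region on a
# speckled background, standing in for a left-ventricle frame.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:128, 0:128]
cavity = ((yy - 64) ** 2 / 40 ** 2 + (xx - 64) ** 2 / 25 ** 2) < 1
image = 0.3 + 0.5 * cavity + 0.15 * rng.standard_normal((128, 128))

# Unconstrained level-set evolution (80 iterations, light curvature smoothing).
segmentation = morphological_chan_vese(image, 80, smoothing=2)
print("segmented pixels:", int(segmentation.sum()))
```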

  9. Intelligent process mapping through systematic improvement of heuristics

    NASA Technical Reports Server (NTRS)

    Ieumwananonthachai, Arthur; Aizawa, Akiko N.; Schwartz, Steven R.; Wah, Benjamin W.; Yan, Jerry C.

    1992-01-01

    The present system for automatic learning/evaluation of novel heuristic methods applicable to the mapping of communication-process sets on a computer network has its basis in the testing of a population of competing heuristic methods within a fixed time constraint. The TEACHER 4.1 prototype learning system, implemented for learning new postgame-analysis heuristic methods, iteratively generates and refines the mappings of a set of communicating processes on a computer network. A systematic exploration of the space of possible heuristic methods is shown to promise significant improvement.

  10. Introduction and Effectiveness of New Methods of Instruction Using Literature in a Japanese High School Setting

    ERIC Educational Resources Information Center

    Richings, Vicky Ann; Nishimuro, Masateru

    2017-01-01

    This paper reports on findings from a classroom study on the introduction and effectiveness of new methods of instruction using English literature in a Japanese high school setting. It is based on data compiled during a two-year research project. In this paper, we will detail the investigation and findings from an analysis of student questionnaire…

  11. Using Narrative as a Data Source and Analytic Method to Investigate Learning Outside of Traditional School Settings with Diverse Youth

    ERIC Educational Resources Information Center

    Martell, Sandra Toro; Antrop-Gonzalez, Rene

    2008-01-01

    Narrative is used to describe and understand how people construct meaning of their lives and experiences and how they think about their own and others' identities. We examined narrative as both data source and method of analysis for investigating learning in non-traditional school settings with students from diverse socio-economic status and…

  12. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View

    PubMed Central

    2016-01-01

    Background As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. PMID:27986644

  13. Assessing differential expression in two-color microarrays: a resampling-based empirical Bayes approach.

    PubMed

    Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D

    2013-01-01

    Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the Significance Analysis of Microarrays method fold change criteria are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across the Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rates controls between each approach are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offers higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.

  14. [Quantitative spectrum analysis of characteristic gases of spontaneous combustion coal].

    PubMed

    Liang, Yun-Tao; Tang, Xiao-Jun; Luo, Hai-Zhu; Sun, Yong

    2011-09-01

    Aimed at the characteristics of spontaneous combustion gases, such as the variety of gases involved, the low detection limits required, and the critical safety requirements, Fourier transform infrared (FTIR) spectral analysis is presented for analyzing the characteristic gases of coal spontaneous combustion. In this paper, the analysis method is first introduced by combining the characteristics of the analytes' absorption spectra with the analysis requirements. The parameter setting method, sample preparation, feature variable extraction and analysis model building are taken into consideration, and the methods of sample preparation, feature extraction and analysis modeling are introduced in detail. Eleven gases were then tested with a Tensor 27 spectrometer: CH4, C2H6, C3H8, iC4H10, nC4H10, C2H4, C3H6, C3H2, SF6, CO and CO2. The optical path length was 10 cm and the spectral resolution was set to 1 cm(-1). The test results show that the detection limit for all analytes is less than 2 x 10(-6). All detection limits meet the measurement requirements for spontaneous combustion gases, which means that FTIR may be an ideal instrument and the analysis method used in this paper is suitable for online measurement of spontaneous combustion gases.

  15. Data analysis software for the autoradiographic enhancement process. Volumes 1, 2, and 3, and appendix

    NASA Technical Reports Server (NTRS)

    Singh, S. P.

    1979-01-01

    The computer software developed to set up a method for Wiener spectrum analysis of photographic films is presented. This method is used for the quantitative analysis of the autoradiographic enhancement process. The software requirements and design for the autoradiographic enhancement process are given along with the program listings and the users manual. A software description and program listings modification of the data analysis software are included.

  16. Integrative omics analysis. A study based on Plasmodium falciparum mRNA and protein data.

    PubMed

    Tomescu, Oana A; Mattanovich, Diethard; Thallinger, Gerhard G

    2014-01-01

    Technological improvements have shifted the focus from data generation to data analysis. The availability of large amounts of data from transcriptomics, proteomics and metabolomics experiments raises new questions concerning suitable integrative analysis methods. We compare three integrative analysis techniques (co-inertia analysis, generalized singular value decomposition and integrative biclustering) by applying them to gene and protein abundance data from the six life cycle stages of Plasmodium falciparum. Co-inertia analysis is an analysis method used to visualize and explore gene and protein data. The generalized singular value decomposition has shown its potential in the analysis of two transcriptome data sets. Integrative Biclustering applies biclustering to gene and protein data. Using CIA, we visualize the six life cycle stages of Plasmodium falciparum, as well as GO terms, in a 2D plane and interpret the spatial configuration. With GSVD, we decompose the transcriptomic and proteomic data sets into matrices with biologically meaningful interpretations and explore the processes captured by the data sets. IBC identifies groups of genes, proteins, GO terms and life cycle stages of Plasmodium falciparum. We show method-specific results as well as a network view of the life cycle stages based on the results common to all three methods. Additionally, by combining the results of the three methods, we create a three-fold validated network of life cycle stage specific GO terms: Sporozoites are associated with transcription and transport; merozoites with entry into the host cell as well as biosynthetic and metabolic processes; rings with oxidation-reduction processes; trophozoites with glycolysis and energy production; schizonts with antigenic variation and immune response; gametocytes with DNA packaging and mitochondrial transport. Furthermore, the network connectivity underlines the separation of the intraerythrocytic cycle from the gametocyte and sporozoite stages. Using integrative analysis techniques, we can integrate knowledge from different levels and obtain a wider view of the system under study. The overlap between method-specific and common results is considerable, even if the basic mathematical assumptions are very different. The three-fold validated network of life cycle stage characteristics of Plasmodium falciparum could identify a large number of the known associations from the literature in only one study.

  17. Integrative omics analysis. A study based on Plasmodium falciparum mRNA and protein data

    PubMed Central

    2014-01-01

    Background Technological improvements have shifted the focus from data generation to data analysis. The availability of large amounts of data from transcriptomics, proteomics and metabolomics experiments raises new questions concerning suitable integrative analysis methods. We compare three integrative analysis techniques (co-inertia analysis, generalized singular value decomposition and integrative biclustering) by applying them to gene and protein abundance data from the six life cycle stages of Plasmodium falciparum. Co-inertia analysis is an analysis method used to visualize and explore gene and protein data. The generalized singular value decomposition has shown its potential in the analysis of two transcriptome data sets. Integrative Biclustering applies biclustering to gene and protein data. Results Using CIA, we visualize the six life cycle stages of Plasmodium falciparum, as well as GO terms, in a 2D plane and interpret the spatial configuration. With GSVD, we decompose the transcriptomic and proteomic data sets into matrices with biologically meaningful interpretations and explore the processes captured by the data sets. IBC identifies groups of genes, proteins, GO terms and life cycle stages of Plasmodium falciparum. We show method-specific results as well as a network view of the life cycle stages based on the results common to all three methods. Additionally, by combining the results of the three methods, we create a three-fold validated network of life cycle stage-specific GO terms: Sporozoites are associated with transcription and transport; merozoites with entry into the host cell as well as biosynthetic and metabolic processes; rings with oxidation-reduction processes; trophozoites with glycolysis and energy production; schizonts with antigenic variation and immune response; gametocytes with DNA packaging and mitochondrial transport. Furthermore, the network connectivity underlines the separation of the intraerythrocytic cycle from the gametocyte and sporozoite stages. Conclusion Using integrative analysis techniques, we can integrate knowledge from different levels and obtain a wider view of the system under study. The overlap between method-specific and common results is considerable, even if the basic mathematical assumptions are very different. The three-fold validated network of life cycle stage characteristics of Plasmodium falciparum could identify a large number of the known associations from the literature in only one study. PMID:25033389
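
    None of the three methods is reproduced here; as a loose, simplified illustration of this kind of paired-table analysis (closer in spirit to co-inertia analysis than to GSVD or IBC, with small random matrices standing in for the real gene and protein tables), the shared structure between two data sets measured over the same life cycle stages can be explored through an SVD of their cross-covariance:

      import numpy as np

      rng = np.random.default_rng(1)
      stages = 6                                      # life cycle stages (shared dimension)
      genes = rng.standard_normal((stages, 50))       # stage x gene abundance (assumed)
      prots = rng.standard_normal((stages, 30))       # stage x protein abundance (assumed)

      # Column-centre each table so the cross-product is a covariance.
      G = genes - genes.mean(axis=0)
      P = prots - prots.mean(axis=0)

      # SVD of the cross-covariance captures axes of shared (co-inertia-like) variation.
      U, s, Vt = np.linalg.svd(G.T @ P / (stages - 1), full_matrices=False)

      gene_scores = G @ U[:, :2]      # stage coordinates driven by genes
      prot_scores = P @ Vt.T[:, :2]   # stage coordinates driven by proteins
      print("shared-variance singular values:", np.round(s[:2], 3))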

  18. QCL spectroscopy combined with the least squares method for substance analysis

    NASA Astrophysics Data System (ADS)

    Samsonov, D. A.; Tabalina, A. S.; Fufurin, I. L.

    2017-11-01

    The article briefly describes the distinctive features of quantum cascade lasers (QCLs). It also describes an experimental set-up for acquiring mid-infrared absorption spectra using a QCL. The paper presents experimental results in the form of normalized spectra. We tested the application of the least squares method for spectrum analysis, using it for substance identification and for extraction of concentration data. We compare the results with more common methods of absorption spectroscopy and demonstrate the feasibility of using this simple method for quantitative and qualitative analysis of experimental data acquired with a QCL.
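
    A minimal sketch of such a least squares spectral fit (the reference bands, substance names, and noise level below are assumptions, not the authors' data) expresses a measured absorbance spectrum as a linear combination of library spectra and reads the fitted coefficients as relative concentrations:

      import numpy as np

      wavenumbers = np.linspace(1000, 1200, 400)          # cm^-1 grid (assumed)

      def band(center, width):
          return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

      # Assumed unit-concentration reference spectra of two library substances.
      library = {"substance_A": band(1050, 8), "substance_B": band(1120, 12)}
      K = np.column_stack(list(library.values()))          # design matrix

      # Simulated measurement: 0.7 parts A, 0.3 parts B, plus noise.
      noise = 0.01 * np.random.default_rng(2).standard_normal(len(wavenumbers))
      measured = 0.7 * K[:, 0] + 0.3 * K[:, 1] + noise

      # Ordinary least squares solution for the concentration vector.
      conc, residual, *_ = np.linalg.lstsq(K, measured, rcond=None)
      for name, c in zip(library, conc):
          print(f"{name}: {c:.3f}")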

  19. 16 CFR 309.10 - Alternative vehicle fuel rating.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Analysis of Natural Gas by Gas Chromatography.” For the purposes of this section, fuel ratings for the... methods set forth in ASTM D 1946-90, “Standard Practice for Analysis of Reformed Gas by Gas Chromatography... the principal component of compressed natural gas are to be determined in accordance with test methods...

  20. 16 CFR 309.10 - Alternative vehicle fuel rating.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Analysis of Natural Gas by Gas Chromatography.” For the purposes of this section, fuel ratings for the... methods set forth in ASTM D 1946-90, “Standard Practice for Analysis of Reformed Gas by Gas Chromatography... the principal component of compressed natural gas are to be determined in accordance with test methods...

  1. 21 CFR 177.2450 - Polyamide-imide resins.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... Determination as set forth in the “Official Methods of Analysis of the Association of Official Analytical... chromatography method titled “Amide-Imide Polymer Analysis—Analysis of Monomer Content,” is incorporated by... of films of 1 mil uniform thickness after coating and heat curing at 600 °F for 15 minutes on...

  2. A SAS Interface for Bayesian Analysis with WinBUGS

    ERIC Educational Resources Information Center

    Zhang, Zhiyong; McArdle, John J.; Wang, Lijuan; Hamagami, Fumiaki

    2008-01-01

    Bayesian methods are becoming very popular despite some practical difficulties in implementation. To assist in the practical application of Bayesian methods, we show how to implement Bayesian analysis with WinBUGS as part of a standard set of SAS routines. This implementation procedure is first illustrated by fitting a multiple regression model…

  3. La pronunciacion espanola y los metodos de investigacion. (Spanish Pronunciation and Methods of Investigation.)

    ERIC Educational Resources Information Center

    Torreblanca, Maximo

    1988-01-01

    Discusses the validity of studies of Spanish pronunciation in terms of research methods employed. Topics include data collection in the laboratory vs. in a natural setting; recorded vs. non-recorded data; quality of the recording; aural analysis vs. spectrographic analysis; and transcriber reliability. Suggestions for improving data collection are…

  4. SECIMTools: a suite of metabolomics data analysis tools.

    PubMed

    Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M

    2018-04-20

    Metabolomics promises to transform the area of personalized medicine with the rapid development of high-throughput technology for untargeted analysis of metabolites. Open-access, easy-to-use analytic tools that are broadly accessible to the biological community need to be developed. While the technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open-access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing of workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator (LASSO) and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
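
    SECIMTools itself is distributed as Galaxy-wrapped Python tools; as a loose standalone illustration of two of the listed building blocks (PCA for visualization and LASSO for variable selection), with a simulated feature table standing in for a real metabolomics data set, one might write:

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LassoCV
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(3)
      n_samples, n_features = 40, 200
      X = rng.standard_normal((n_samples, n_features))      # assumed peak intensities
      y = (X[:, 0] - 0.8 * X[:, 1] + 0.3 * rng.standard_normal(n_samples) > 0).astype(float)

      X_std = StandardScaler().fit_transform(X)

      # Quality-control style visualization: project samples onto the first two PCs.
      scores = PCA(n_components=2).fit_transform(X_std)

      # Variable selection: LASSO with a cross-validated penalty keeps few features.
      lasso = LassoCV(cv=5).fit(X_std, y)
      selected = np.flatnonzero(lasso.coef_)
      print("PC score range:", scores.min().round(2), scores.max().round(2))
      print("features retained by LASSO:", selected[:10])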

  5. Independent component analysis decomposition of hospital emergency department throughput measures

    NASA Astrophysics Data System (ADS)

    He, Qiang; Chu, Henry

    2016-05-01

    We present a method adapted from medical sensor data analysis, viz. independent component analysis of electroencephalography data, to health system analysis. Timely and effective care in a hospital emergency department is measured by throughput measures such as the median time patients spent before being admitted as inpatients, before being sent home, or before being seen by a healthcare professional. We consider a set of five such measures collected at 3,086 hospitals distributed across the U.S. One model of the performance of an emergency department is that these correlated throughput measures are linear combinations of some underlying sources. The independent component analysis decomposition of the data set can thus be viewed as transforming a set of performance measures collected at a site into a collection of outputs of spatial filters applied to the whole multi-measure data. We compare the independent component sources with the output of the conventional principal component analysis to show that the independent components are more suitable for understanding the data sets through visualizations.
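
    A hedged sketch of the comparison described above, using scikit-learn's FastICA and PCA on simulated throughput measures (the mixing structure and sources are assumptions, not the hospital data):

      import numpy as np
      from sklearn.decomposition import FastICA, PCA

      rng = np.random.default_rng(4)
      n_hospitals = 3086

      # Assumed latent sources driving throughput (e.g. staffing load, case mix).
      sources = np.column_stack([rng.exponential(1.0, n_hospitals),
                                 rng.standard_normal(n_hospitals)])
      mixing = np.array([[1.0, 0.5], [0.8, -0.3], [0.4, 0.9], [1.2, 0.1], [0.6, 0.7]])
      measures = sources @ mixing.T + 0.1 * rng.standard_normal((n_hospitals, 5))

      # PCA finds uncorrelated directions of maximal variance;
      # ICA instead searches for statistically independent source estimates.
      pca_components = PCA(n_components=2).fit_transform(measures)
      ica_components = FastICA(n_components=2, random_state=0).fit_transform(measures)
      print(pca_components.shape, ica_components.shape)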

  6. Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

    DOE PAGES

    Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

    2018-01-01

    Tensor-rank decomposition methods have been applied to variable contact time 29Si{1H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.

  7. Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

    Tensor-rank decomposition methods have been applied to variable contact time 29Si{1H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.

  8. A standards-based method for compositional analysis by energy dispersive X-ray spectrometry using multivariate statistical analysis: application to multicomponent alloys.

    PubMed

    Rathi, Monika; Ahrenkiel, S P; Carapella, J J; Wanlass, M W

    2013-02-01

    Given an unknown multicomponent alloy, and a set of standard compounds or alloys of known composition, can one improve upon popular standards-based methods for energy dispersive X-ray (EDX) spectrometry to quantify the elemental composition of the unknown specimen? A method is presented here for determining elemental composition of alloys using transmission electron microscopy-based EDX with appropriate standards. The method begins with a discrete set of related reference standards of known composition, applies multivariate statistical analysis to those spectra, and evaluates the compositions with a linear matrix algebra method to relate the spectra to elemental composition. By using associated standards, only limited assumptions about the physical origins of the EDX spectra are needed. Spectral absorption corrections can be performed by providing an estimate of the foil thickness of one or more reference standards. The technique was applied to III-V multicomponent alloy thin films: composition and foil thickness were determined for various III-V alloys. The results were then validated by comparing with X-ray diffraction and photoluminescence analysis, demonstrating accuracy of approximately 1% in atomic fraction.

  9. Combined target factor analysis and Bayesian soft-classification of interference-contaminated samples: forensic fire debris analysis.

    PubMed

    Williams, Mary R; Sigman, Michael E; Lewis, Jennifer; Pitan, Kelly McHugh

    2012-10-10

    A Bayesian soft classification method combined with target factor analysis (TFA) is described and tested for the analysis of fire debris data. The method relies on analysis of the average mass spectrum across the chromatographic profile (i.e., the total ion spectrum, TIS) from multiple samples taken from a single fire scene. A library of TIS from reference ignitable liquids with assigned ASTM classification is used as the target factors in TFA. The class-conditional distributions of correlations between the target and predicted factors for each ASTM class are represented by kernel functions and analyzed by Bayesian decision theory. The soft classification approach assists in assessing the probability that ignitable liquid residue from a specific ASTM E1618 class is present in a set of samples from a single fire scene, even in the presence of unspecified background contributions from pyrolysis products. The method is demonstrated with sample data sets and then tested on laboratory-scale burn data and large-scale field test burns. The overall performance achieved in laboratory and field tests of the method is approximately 80% correct classification of fire debris samples. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
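
    A simplified sketch of the soft classification step (kernel density estimates of class-conditional correlation scores combined through Bayes' rule; all scores and the prior below are assumptions, not the forensic data):

      import numpy as np
      from scipy.stats import gaussian_kde

      rng = np.random.default_rng(5)

      # Assumed training data: correlation scores between predicted and target factors
      # for samples known to contain (pos) or not contain (neg) a given ASTM class.
      pos_scores = rng.normal(0.85, 0.07, 200).clip(-1, 1)
      neg_scores = rng.normal(0.40, 0.15, 200).clip(-1, 1)

      # Kernel density estimates of the class-conditional score distributions.
      p_score_given_pos = gaussian_kde(pos_scores)
      p_score_given_neg = gaussian_kde(neg_scores)
      prior_pos = 0.5                                  # assumed prior

      def posterior_positive(score):
          """Bayes rule: soft class membership for a new correlation score."""
          num = p_score_given_pos(score) * prior_pos
          den = num + p_score_given_neg(score) * (1 - prior_pos)
          return float(num[0] / den[0])

      print(f"P(ignitable liquid | score=0.75) = {posterior_positive(0.75):.2f}")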

  10. Detecting concerted demographic response across community assemblages using hierarchical approximate Bayesian computation.

    PubMed

    Chan, Yvonne L; Schanzenbach, David; Hickerson, Michael J

    2014-09-01

    Methods that integrate population-level sampling from multiple taxa into a single community-level analysis are an essential addition to the comparative phylogeographic toolkit. Detecting how species within communities have demographically tracked each other in space and time is important for understanding the effects of future climate and landscape changes and the resulting acceleration of extinctions, biological invasions, and potential surges in adaptive evolution. Here, we present a statistical framework for such an analysis based on hierarchical approximate Bayesian computation (hABC) with the goal of detecting concerted demographic histories across an ecological assemblage. Our method combines population genetic data sets from multiple taxa into a single analysis to estimate: 1) the proportion of a community sample that demographically expanded in a temporally clustered pulse and 2) when the pulse occurred. To validate the accuracy and utility of this new approach, we use simulation cross-validation experiments and subsequently analyze an empirical data set of 32 avian populations from Australia that are hypothesized to have expanded from smaller refugia populations in the late Pleistocene. The method can accommodate data set heterogeneity such as variability in effective population size, mutation rates, and sample sizes across species and exploits the statistical strength from the simultaneous analysis of multiple species. This hABC framework used in a multitaxa demographic context can increase our understanding of the impact of historical climate change by determining what proportion of the community responded in concert or independently and can be used with a wide variety of comparative phylogeographic data sets as biota-wide DNA barcoding data sets accumulate. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. Local regression type methods applied to the study of geophysics and high frequency financial data

    NASA Astrophysics Data System (ADS)

    Mariani, M. C.; Basu, K.

    2014-09-01

    In this work we applied locally weighted scatterplot smoothing techniques (Lowess/Loess) to geophysical and high-frequency financial data. We first analyze and apply this technique to California earthquake geological data. A spatial analysis was performed to show that the estimation of the earthquake magnitude at a fixed location is accurate to within a relative error of 0.01%. We also applied the same method to a high-frequency data set arising in the financial sector and obtained similarly satisfactory results. The application of this approach to the two different data sets demonstrates that the overall method is accurate and efficient, and that the Lowess approach is much more desirable than the Loess method. Previous works studied time series analysis; in this paper, our local regression models perform a spatial analysis of the geophysical data, providing different information. For the high-frequency data, our models estimate the curve of best fit where the data depend on time.
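
    A minimal Lowess example using the statsmodels implementation (the spatial coordinate, the underlying curve, and the noise level are assumptions standing in for the earthquake data):

      import numpy as np
      from statsmodels.nonparametric.smoothers_lowess import lowess

      rng = np.random.default_rng(6)

      # Assumed stand-in data: noisy magnitudes observed along one spatial coordinate.
      x = np.sort(rng.uniform(0, 100, 300))            # e.g. distance along a fault (km)
      y = 3.0 + 0.02 * x + 0.5 * np.sin(x / 8) + 0.2 * rng.standard_normal(x.size)

      # frac controls the local window: smaller values follow local structure more closely.
      smoothed = lowess(y, x, frac=0.2, return_sorted=True)
      x_fit, y_fit = smoothed[:, 0], smoothed[:, 1]

      deviation = np.abs(y_fit - (3.0 + 0.02 * x_fit + 0.5 * np.sin(x_fit / 8))).mean()
      print(f"mean absolute deviation from the noise-free curve: {deviation:.3f}")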

  12. How High is that Dune? A Comparison of Methods Used to Constrain the Morphometry of Aeolian Bedforms on Mars

    NASA Technical Reports Server (NTRS)

    Bourke, M.; Balme, M.; Beyer, R. A.; Williams, K. K.

    2004-01-01

    Methods traditionally used to estimate the relative height of surface features on Mars include photoclinometry, shadow length, and stereography. The MOLA data set enables a more accurate assessment of the surface topography of Mars. However, many small-scale aeolian bedforms remain below the sample resolution of the MOLA data set. In response, a number of research teams have adopted and refined existing methods and applied them to high resolution (2-6 m/pixel) narrow angle MOC satellite images. Collectively, the methods provide data on a range of morphometric parameters, many not previously available for dunes on Mars, including dune height, width, length, surface area, volume, and longitudinal and cross profiles. These data will facilitate a more accurate analysis of aeolian bedforms on Mars. In this paper we undertake a comparative analysis of methods used to determine the height of aeolian dunes and ripples.

  13. A Fast Multiple-Kernel Method With Applications to Detect Gene-Environment Interaction.

    PubMed

    Marceau, Rachel; Lu, Wenbin; Holloway, Shannon; Sale, Michèle M; Worrall, Bradford B; Williams, Stephen R; Hsu, Fang-Chi; Tzeng, Jung-Ying

    2015-09-01

    Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene-gene or gene-environment interactions, incorporating variance-component based methods for population substructure into rare-variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust, powerful, and provides efficient dimension reduction for multifactor analyses, but requires the estimation of high dimensional nuisance parameters. Traditional estimation techniques, including regularization and the "expectation-maximization (EM)" algorithm, have a large computational cost and are not scalable to large sample sizes needed for rare variant analysis. Therefore, under the context of gene-environment interaction, we propose a computationally efficient and statistically rigorous "fastKM" algorithm for multikernel analysis that is based on a low-rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single-kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM-based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene-by-vitamin effects on recurrent stroke risk and gene-by-age effects on change in homocysteine level. © 2015 WILEY PERIODICALS, INC.
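
    The fastKM algorithm itself is not reproduced here; as a generic sketch of the underlying idea of replacing a large nuisance kernel matrix with a low-rank factor (a simple Nystrom approximation on assumed data, not the authors' exact construction):

      import numpy as np

      rng = np.random.default_rng(7)
      n, p, rank = 2000, 50, 100
      G = rng.standard_normal((n, p))                   # assumed genotype-like matrix

      def rbf_kernel(A, B, gamma=0.02):
          sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
          return np.exp(-gamma * sq)

      # Nystrom approximation: evaluate the kernel only against a landmark subset.
      idx = rng.choice(n, rank, replace=False)
      C = rbf_kernel(G, G[idx])                         # n x rank
      W = C[idx]                                        # rank x rank
      L = np.linalg.cholesky(W + 1e-8 * np.eye(rank))
      K_factor = C @ np.linalg.pinv(L).T                # K ~ K_factor @ K_factor.T

      # Downstream algebra can work with an n x rank factor
      # instead of the full n x n kernel matrix.
      print(K_factor.shape)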

  14. A Comparison of the Incidence of Cricothyrotomy in the Deployed Setting to the Emergency Department at a Level 1 Military Trauma Center: A Descriptive Analysis

    DTIC Science & Technology

    2015-03-01

    the providers in the deployed setting and include the Tactical Combat Casualty Care casualty card. Data are then coded for query and analysis. All...intubate, can’t ventilate” and disruption of head/neck anatomy. Of the four procedures performed in the ED setting, three patients survived to hospital...data from SAMMC are limited by the search methods and data extraction. We searched by Current Procedural Terminology code, which requires that the

  15. A Parallel Genetic Algorithm to Discover Patterns in Genetic Markers that Indicate Predisposition to Multifactorial Disease

    PubMed Central

    Rausch, Tobias; Thomas, Alun; Camp, Nicola J.; Cannon-Albright, Lisa A.; Facelli, Julio C.

    2008-01-01

    This paper describes a novel algorithm to analyze genetic linkage data using pattern recognition techniques and genetic algorithms (GA). The method allows a search for regions of the chromosome that may contain genetic variations that jointly predispose individuals for a particular disease. The method uses correlation analysis, filtering theory and genetic algorithms (GA) to achieve this goal. Because current genome scans use from hundreds to hundreds of thousands of markers, two versions of the method have been implemented. The first is an exhaustive analysis version that can be used to visualize, explore, and analyze small genetic data sets for two marker correlations; the second is a GA version, which uses a parallel implementation allowing searches of higher-order correlations in large data sets. Results on simulated data sets indicate that the method can be informative in the identification of major disease loci and gene-gene interactions in genome-wide linkage data and that further exploration of these techniques is justified. The results presented for both variants of the method show that it can help genetic epidemiologists to identify promising combinations of genetic factors that might predispose to complex disorders. In particular, the correlation analysis of IBD expression patterns might hint to possible gene-gene interactions and the filtering might be a fruitful approach to distinguish true correlation signals from noise. PMID:18547558

  16. A time-series method for automated measurement of changes in mitotic and interphase duration from time-lapse movies.

    PubMed

    Sigoillot, Frederic D; Huckins, Jeremy F; Li, Fuhai; Zhou, Xiaobo; Wong, Stephen T C; King, Randall W

    2011-01-01

    Automated time-lapse microscopy can visualize proliferation of large numbers of individual cells, enabling accurate measurement of the frequency of cell division and the duration of interphase and mitosis. However, extraction of quantitative information by manual inspection of time-lapse movies is too time-consuming to be useful for analysis of large experiments. Here we present an automated time-series approach that can measure changes in the duration of mitosis and interphase in individual cells expressing fluorescent histone 2B. The approach requires analysis of only 2 features, nuclear area and average intensity. Compared to supervised learning approaches, this method reduces processing time and does not require generation of training data sets. We demonstrate that this method is as sensitive as manual analysis in identifying small changes in interphase or mitotic duration induced by drug or siRNA treatment. This approach should facilitate automated analysis of high-throughput time-lapse data sets to identify small molecules or gene products that influence timing of cell division.

  17. Binary tree eigen solver in finite element analysis

    NASA Technical Reports Server (NTRS)

    Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.

    1993-01-01

    This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, in which the parallel implementation of an associative operation on an arbitrary set of N elements takes on the order of O(log2 N) steps, compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM, a high-level language developed with the transputer to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of lowest-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
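
    A small sketch of the recursive doubling idea (a serial Python simulation of the parallel combine rounds, not OCCAM transputer code):

      import math

      def recursive_doubling_sum(values):
          """Simulate recursive doubling for an associative operation (here, addition).
          Each round combines pairs in parallel, so N values need ceil(log2(N)) rounds
          instead of N-1 sequential combinations."""
          work = list(values)
          rounds = 0
          while len(work) > 1:
              # All pair combinations in this round could execute concurrently,
              # e.g. one per transputer in a binary tree network.
              work = [work[i] + work[i + 1] if i + 1 < len(work) else work[i]
                      for i in range(0, len(work), 2)]
              rounds += 1
          return work[0], rounds

      total, rounds = recursive_doubling_sum(range(32))
      print(total, rounds, math.ceil(math.log2(32)))   # 496 5 5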

  18. The Fungal Frontier: A Comparative Analysis of Methods Used in the Study of the Human Gut Mycobiome.

    PubMed

    Huseyin, Chloe E; Rubio, Raul Cabrera; O'Sullivan, Orla; Cotter, Paul D; Scanlan, Pauline D

    2017-01-01

    The human gut is host to a diverse range of fungal species, collectively referred to as the gut “mycobiome”. The gut mycobiome is emerging as an area of considerable research interest due to the potential roles of these fungi in human health and disease. However, there is no consensus on the best or most suitable methodologies for characterizing the human gut mycobiome. The aim of this study is to provide a comparative analysis of several previously published mycobiome-specific culture-dependent and -independent methodologies, including choice of culture media, incubation conditions (aerobic versus anaerobic), DNA extraction method, primer set, and freezing of fecal samples, to assess their relative merits and suitability for gut mycobiome analysis. There was no significant effect of media type or aeration on culture-dependent results. However, freezing was found to have a significant effect on fungal viability, with significantly lower fungal numbers recovered from frozen samples. DNA extraction method had a significant effect on DNA yield and quality. However, freezing and extraction method did not have any impact on either α or β diversity. There was also considerable variation in the ability of different fungal-specific primer sets to generate PCR products for subsequent sequence analysis. Through this investigation, two DNA extraction methods and one primer set were identified that facilitated the analysis of the mycobiome for all samples in this study. Ultimately, a diverse range of fungal species were recovered using both approaches, with Candida and Saccharomyces identified as the most common fungal species recovered using culture-dependent and culture-independent methods, respectively. As has been apparent from ecological surveys of the bacterial fraction of the gut microbiota, the use of different methodologies can also impact our understanding of gut mycobiome composition and therefore requires careful consideration. Future research into the gut mycobiome needs to adopt a common strategy to minimize the potentially confounding effects of methodological choice and to facilitate comparative analysis of datasets.

  19. A Mass Spectrometric Analysis Method Based on PPCA and SVM for Early Detection of Ovarian Cancer.

    PubMed

    Wu, Jiang; Ji, Yanju; Zhao, Ling; Ji, Mengying; Ye, Zhuang; Li, Suyi

    2016-01-01

    Background. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) technology plays an important role in the early diagnosis of ovarian cancer. However, the raw MS data are high-dimensional and redundant. Therefore, it is necessary to develop rapid and accurate detection methods for such massive MS data. Methods. The clinical data set used in the experiments for early cancer detection consisted of 216 SELDI-TOF-MS samples. An MS analysis method based on probabilistic principal components analysis (PPCA) and support vector machine (SVM) was proposed and applied to early ovarian cancer classification in the data set. Using the same data set, we also established a traditional PCA-SVM model. Finally, we compared the two models in detection accuracy, specificity, and sensitivity. Results. Using 10 rounds of independent training and testing experiments to evaluate the ovarian cancer detection models, the average prediction accuracy, sensitivity, and specificity of the PCA-SVM model were 83.34%, 82.70%, and 83.88%, respectively. In contrast, those of the PPCA-SVM model were 90.80%, 92.98%, and 88.97%, respectively. Conclusions. The PPCA-SVM model had better detection performance, and the model combined with SELDI-TOF-MS technology shows promise for early clinical detection and diagnosis of ovarian cancer.
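
    A hedged sketch of the baseline PCA-SVM pipeline using scikit-learn (scikit-learn has no probabilistic PCA transformer, so ordinary PCA stands in; the spectra and labels are simulated, and only the sample count of 216 is taken from the record):

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      rng = np.random.default_rng(8)
      n_samples, n_mz = 216, 5000                      # m/z grid size assumed
      X = rng.standard_normal((n_samples, n_mz))       # stand-in spectra
      y = rng.integers(0, 2, n_samples)                # 0 = control, 1 = cancer (assumed labels)

      model = make_pipeline(StandardScaler(),
                            PCA(n_components=20),      # dimensionality reduction of the spectra
                            SVC(kernel="rbf", C=1.0))

      acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
      print(f"cross-validated accuracy: {acc.mean():.2f}")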

  20. Parameter motivated mutual correlation analysis: Application to the study of currency exchange rates based on intermittency parameter and Hurst exponent

    NASA Astrophysics Data System (ADS)

    Cristescu, Constantin P.; Stan, Cristina; Scarlat, Eugen I.; Minea, Teofil; Cristescu, Cristina M.

    2012-04-01

    We present a novel method for the parameter oriented analysis of mutual correlation between independent time series or between equivalent structures such as ordered data sets. The proposed method is based on the sliding window technique, defines a new type of correlation measure and can be applied to time series from all domains of science and technology, experimental or simulated. A specific parameter that can characterize the time series is computed for each window and a cross correlation analysis is carried out on the set of values obtained for the time series under investigation. We apply this method to the study of some currency daily exchange rates from the point of view of the Hurst exponent and the intermittency parameter. Interesting correlation relationships are revealed and a tentative crisis prediction is presented.

  1. The anisotropic Hooke's law for cancellous bone and wood.

    PubMed

    Yang, G; Kabel, J; van Rietbergen, B; Odgaard, A; Huiskes, R; Cowin, S C

    A method of data analysis for a set of elastic constant measurements is applied to data bases for wood and cancellous bone. For these materials the identification of the type of elastic symmetry is complicated by the variable composition of the material. The data analysis method permits the identification of the type of elastic symmetry to be accomplished independent of the examination of the variable composition. This method of analysis may be applied to any set of elastic constant measurements, but is illustrated here by application to hardwoods and softwoods, and to an extraordinary data base of cancellous bone elastic constants. The solid volume fraction or bulk density is the compositional variable for the elastic constants of these natural materials. The final results are the solid volume fraction dependent orthotropic Hooke's law for cancellous bone and a bulk density dependent one for hardwoods and softwoods.

  2. A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set

    PubMed Central

    Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong

    2012-01-01

    Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes an MCDM-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods consider different numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined by an experimental study using three MCDM methods, the well-known clustering algorithm k-means, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
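
    The MCDM machinery itself is not sketched here; as a simplified single-criterion stand-in, the same idea of treating each candidate number of clusters as an alternative and scoring it with a validity measure can be written as:

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.datasets import make_blobs
      from sklearn.metrics import silhouette_score

      X, _ = make_blobs(n_samples=300, centers=4, random_state=9)   # assumed data

      # Treat each candidate number of clusters as an "alternative" and score it.
      scores = {}
      for k in range(2, 9):
          labels = KMeans(n_clusters=k, n_init=10, random_state=9).fit_predict(X)
          scores[k] = silhouette_score(X, labels)      # one validity criterion

      best_k = max(scores, key=scores.get)
      print("silhouette by k:", {k: round(v, 2) for k, v in scores.items()})
      print("selected number of clusters:", best_k)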

  3. A strategy to apply quantitative epistasis analysis on developmental traits.

    PubMed

    Labocha, Marta K; Yuan, Wang; Aleman-Meza, Boanerges; Zhong, Weiwei

    2017-05-15

    Genetic interactions are key to understanding complex traits and evolution. Epistasis analysis is an effective method to map genetic interactions. Large-scale quantitative epistasis analysis has been well established for single cells. However, there is a substantial lack of such studies in multicellular organisms and their complex phenotypes such as development. Here we present a method to extend quantitative epistasis analysis to developmental traits. In the nematode Caenorhabditis elegans, we applied RNA interference on mutants to inactivate two genes, used an imaging system to quantitatively measure phenotypes, and developed a set of statistical methods to extract genetic interactions from the phenotypic measurements. Using two different C. elegans developmental phenotypes, body length and sex ratio, as examples, we showed that this method could accommodate various metazoan phenotypes with performance comparable to that of methods used in single-cell growth studies. Compared with qualitative observations, this method of quantitative epistasis enabled detection of new interactions involving subtle phenotypes. For example, several sex-ratio genes were found to interact with brc-1 and brd-1, the orthologs of the human breast cancer genes BRCA1 and BARD1, respectively. We confirmed the brc-1 interactions with the following genes in DNA damage response: C34F6.1, him-3 (ortholog of HORMAD1, HORMAD2), sdc-1, and set-2 (ortholog of SETD1A, SETD1B, KMT2C, KMT2D), validating the effectiveness of our method in detecting genetic interactions. We developed a reliable, high-throughput method for quantitative epistasis analysis of developmental phenotypes.

  4. Topological chaos, braiding and bifurcation of almost-cyclic sets.

    PubMed

    Grover, Piyush; Ross, Shane D; Stremler, Mark A; Kumar, Pankaj

    2012-12-01

    In certain two-dimensional time-dependent flows, the braiding of periodic orbits provides a way to analyze chaos in the system through application of the Thurston-Nielsen classification theorem (TNCT). We expand upon earlier work that introduced the application of the TNCT to braiding of almost-cyclic sets, which are individual components of almost-invariant sets [Stremler et al., "Topological chaos and periodic braiding of almost-cyclic sets," Phys. Rev. Lett. 106, 114101 (2011)]. In this context, almost-cyclic sets are periodic regions in the flow with high local residence time that act as stirrers or "ghost rods" around which the surrounding fluid appears to be stretched and folded. In the present work, we discuss the bifurcation of the almost-cyclic sets as a system parameter is varied, which results in a sequence of topologically distinct braids. We show that, for Stokes' flow in a lid-driven cavity, these various braids give good lower bounds on the topological entropy over the respective parameter regimes in which they exist. We make the case that a topological analysis based on spatiotemporal braiding of almost-cyclic sets can be used for analyzing chaos in fluid flows. Hence, we further develop a connection between set-oriented statistical methods and topological methods, which promises to be an important analysis tool in the study of complex systems.

  5. Pathway Analysis in Attention Deficit Hyperactivity Disorder: An Ensemble Approach

    PubMed Central

    Mooney, Michael A.; McWeeney, Shannon K.; Faraone, Stephen V.; Hinney, Anke; Hebebrand, Johannes; Nigg, Joel T.; Wilmot, Beth

    2016-01-01

    Despite a wealth of evidence for the role of genetics in attention deficit hyperactivity disorder (ADHD), specific and definitive genetic mechanisms have not been identified. Pathway analyses, a subset of gene-set analyses, extend the knowledge gained from genome-wide association studies (GWAS) by providing functional context for genetic associations. However, there are numerous methods for association testing of gene sets and no real consensus regarding the best approach. The present study applied six pathway analysis methods to identify pathways associated with ADHD in two GWAS datasets from the Psychiatric Genomics Consortium. Methods that utilize genotypes to model pathway-level effects identified more replicable pathway associations than methods using summary statistics. In addition, pathways implicated by more than one method were significantly more likely to replicate. A number of brain-relevant pathways, such as RhoA signaling, glycosaminoglycan biosynthesis, fibroblast growth factor receptor activity, and pathways containing potassium channel genes, were nominally significant by multiple methods in both datasets. These results support previous hypotheses about the role of regulation of neurotransmitter release, neurite outgrowth and axon guidance in contributing to the ADHD phenotype and suggest the value of cross-method convergence in evaluating pathway analysis results. PMID:27004716

  6. Interdisciplinary research on patient-provider communication: a cross-method comparison.

    PubMed

    Chou, Wen-Ying Sylvia; Han, Paul; Pilsner, Alison; Coa, Kisha; Greenberg, Larrie; Blatt, Benjamin

    2011-01-01

    Patient-provider communication, a key aspect of healthcare delivery, has been assessed through multiple methods for purposes of research, education, and quality control. Common techniques include satisfaction ratings and quantitatively- and qualitatively-oriented direct observations. Identifying the strengths and weaknesses of different approaches is critically important in determining the appropriate assessment method for a specific research or practical goal. Analyzing ten videotaped simulated encounters between medical students and Standardized Patients (SPs), this study compared three existing assessment methods through the same data set. Methods included: (1) dichotomized SP ratings on students' communication skills; (2) Roter Interaction Analysis System (RIAS) analysis; and (3) inductive discourse analysis informed by sociolinguistic theories. The large dichotomous contrast between good and poor ratings in (1) was not evidenced in any of the other methods. Following a discussion of strengths and weaknesses of each approach, we pilot-tested a combined assessment done by coders blinded to results of (1)-(3). This type of integrative approach has the potential of adding a quantifiable dimension to qualitative, discourse-based observations. Subjecting the same data set to separate analytic methods provides an excellent opportunity for methodological comparisons with the goal of informing future assessment of clinical encounters.

  7. Synthesis of linear regression coefficients by recovering the within-study covariance matrix from summary statistics.

    PubMed

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-06-01

    Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed at a commensurate pace. One of the difficulties hindering this development is the disparity in the sets of covariates among literature models. If the sets of covariates differ across models, the interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because the covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. In particular, we also propose an approach to recover (at most) three correlations of covariates, which are required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd.

  8. An improved level set method for brain MR images segmentation and bias correction.

    PubMed

    Chen, Yunjie; Zhang, Jianwei; Macione, Jim

    2009-10-01

    Intensity inhomogeneities cause considerable difficulty in the quantitative analysis of magnetic resonance (MR) images. Thus, bias field estimation is a necessary step before quantitative analysis of MR data can be undertaken. This paper presents a variational level set approach to bias correction and segmentation for images with intensity inhomogeneities. Our method is based on the observation that intensities in a relatively small local region are separable, despite the inseparability of the intensities in the whole image caused by the overall intensity inhomogeneity. We first define a localized K-means-type clustering objective function for image intensities in a neighborhood around each point. The cluster centers in this objective function have a multiplicative factor that estimates the bias within the neighborhood. The objective function is then integrated over the entire domain to define the data term within the level set framework. Our method is able to capture bias fields of quite general profiles. Moreover, it is robust to initialization, and thereby allows fully automated application. The proposed method has been used for images of various modalities with promising results.

  9. Comparing the performance of biomedical clustering methods.

    PubMed

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-11-01

    Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.

  10. Fourier spatial frequency analysis for image classification: training the training set

    NASA Astrophysics Data System (ADS)

    Johnson, Timothy H.; Lhamo, Yigah; Shi, Lingyan; Alfano, Robert R.; Russell, Stewart

    2016-04-01

    The Directional Fourier Spatial Frequencies (DFSF) of a 2D image can identify similarity in spatial patterns within groups of related images. A Support Vector Machine (SVM) can then be used to classify images if the inter-image variance of the FSF in the training set is bounded. However, if variation in FSF increases with training set size, accuracy may decrease as the size of the training set increases. This calls for a method to identify a set of training images from among the originals that can form a vector basis for the entire class. Applying the Cauchy product method we extract the DFSF spectrum from radiographs of osteoporotic bone, and use it as a matched filter set to eliminate noise and image specific frequencies, and demonstrate that selection of a subset of superclassifiers from within a set of training images improves SVM accuracy. Central to this challenge is that the size of the search space can become computationally prohibitive for all but the smallest training sets. We are investigating methods to reduce the search space to identify an optimal subset of basis training images.

  11. Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data

    PubMed Central

    Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark

    2010-01-01

    Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development; however, this approach suffers from several disadvantages, including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski’s F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol, including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski’s F-test for rank estimation of in vivo training sets allowed noise to be removed more accurately. Malinowski’s F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets, while dopamine prediction was more sensitive to noise. PMID:20527815
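
    As a minimal illustration of the traditional cumulative variance criterion discussed above (Malinowski's F-test is not implemented here; the voltammetric matrix is simulated with an assumed two-component structure):

      import numpy as np

      rng = np.random.default_rng(10)

      # Assumed training set: background-subtracted voltammograms (rows) generated
      # from two true underlying components plus measurement noise.
      n_scans, n_points = 60, 500
      basis = rng.standard_normal((2, n_points))
      A = rng.standard_normal((n_scans, 2)) @ basis + 0.05 * rng.standard_normal((n_scans, n_points))

      # Singular values of the centred matrix give the variance per principal component.
      A_c = A - A.mean(axis=0)
      s = np.linalg.svd(A_c, compute_uv=False)
      explained = s ** 2 / np.sum(s ** 2)

      # Traditional criterion: keep the smallest rank reaching e.g. 99% cumulative variance.
      rank = int(np.searchsorted(np.cumsum(explained), 0.99)) + 1
      print("estimated rank:", rank)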

  12. Multivariate missing data in hydrology - Review and applications

    NASA Astrophysics Data System (ADS)

    Ben Aissia, Mohamed-Aymen; Chebana, Fateh; Ouarda, Taha B. M. J.

    2017-12-01

    Water resources planning and management require complete data sets of a number of hydrological variables, such as flood peaks and volumes. However, hydrologists are often faced with the problem of missing data (MD) in hydrological databases. Several methods are used to deal with the imputation of MD. During the last decade, multivariate approaches have gained popularity in the field of hydrology, especially in hydrological frequency analysis (HFA). However, the treatment of MD remains neglected in the multivariate HFA literature, where the focus has been mainly on the modeling component. For a complete analysis, and in order to optimize the use of data, MD should also be treated in the multivariate setting prior to modeling and inference. Imputation of MD in the multivariate hydrological framework can have direct implications on the quality of the estimation. Indeed, the dependence between the series represents important additional information that can be included in the imputation process. The objective of the present paper is to highlight the importance of treating MD in multivariate hydrological frequency analysis by reviewing and applying multivariate imputation methods and by comparing univariate and multivariate imputation methods. An application is carried out for multiple flood attributes on three sites in order to evaluate the performance of the different methods based on the leave-one-out procedure. The results indicate that the performance of imputation methods can be improved by adopting the multivariate setting, compared to mean substitution and interpolation methods, especially when using the copula-based approach.
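
    As a generic illustration of the gain from multivariate over univariate imputation (scikit-learn's IterativeImputer on simulated correlated flood attributes; this is not the copula-based method evaluated in the paper):

      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer, SimpleImputer

      rng = np.random.default_rng(11)
      n = 200

      # Assumed correlated flood attributes at one site: peak and volume.
      peak = rng.gamma(shape=3.0, scale=100.0, size=n)
      volume = 5.0 * peak + rng.normal(0, 50, size=n)
      X = np.column_stack([peak, volume])

      # Remove 20% of the volume values to mimic missing data.
      X_missing = X.copy()
      miss = rng.choice(n, size=n // 5, replace=False)
      X_missing[miss, 1] = np.nan

      X_mean = SimpleImputer(strategy="mean").fit_transform(X_missing)
      X_multi = IterativeImputer(random_state=0).fit_transform(X_missing)

      def rmse(est):
          return np.sqrt(np.mean((est[miss, 1] - X[miss, 1]) ** 2))

      print(f"mean substitution RMSE: {rmse(X_mean):.1f}")
      print(f"multivariate imputation RMSE: {rmse(X_multi):.1f}")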

  13. GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge

    PubMed Central

    Wagner, Florian

    2015-01-01

    Method Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. Results I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets. PMID:26575370

  14. [Tobacco quality analysis of producing areas of Yunnan tobacco using near-infrared (NIR) spectrum].

    PubMed

    Wang, Yi; Ma, Xiang; Wen, Ya-Dong; Yu, Chun-Xia; Wang, Luo-Ping; Zhao, Long-Lian; Li, Jun-Hui

    2013-01-01

    In the present study, tobacco quality analysis of different producing areas was carried out using spectrum projection and correlation methods. The data set consisted of near-infrared (NIR) spectra from the 2010 industrial classification of middle-position tobacco leaves supplied by Hongta Tobacco (Group) Co., Ltd. A total of 1276 superior tobacco leaf samples were collected from four producing areas: three areas in Yunnan province (Yuxi, Chuxiong and Zhaotong) all growing the tobacco variety K326, and one area (Dali) growing the variety Hongda. When the samples were randomly divided into analysis and verification sets at a ratio of 2:1, the verification set corresponded well with the analysis set under spectrum projection, as the correlation coefficients of the first and second projection dimensions were all above 0.99. The study also discusses a method to obtain quantitative similarity values between samples from different producing areas. These similarity values are instructive for tobacco planting planning, quality management, acquisition of raw tobacco materials, and tobacco leaf blending.

  15. Visualization of time series statistical data by shape analysis (GDP ratio changes among Asia countries)

    NASA Astrophysics Data System (ADS)

    Shirota, Yukari; Hashimoto, Takako; Fitri Sari, Riri

    2018-03-01

    Visualizing time series big data has become very important. In this paper we discuss a new analysis method called “statistical shape analysis” or “geometry driven statistics” applied to time series statistical data in economics. We analyse changes in agriculture, value added and industry, value added (as a percentage of GDP) from 2000 to 2010 in Asia. We handle the data as a set of landmarks on a two-dimensional image and examine the deformation using the principal components. The key quantities of the analysis method are the principal components of the given configuration, which are the eigenvectors of its bending energy matrix. The local deformation can be expressed as a set of non-affine transformations, which give us information about the local differences between 2000 and 2010. Because a non-affine transformation can be decomposed into a set of partial warps, we present the partial warps visually. Statistical shape analysis is widely used in biology, but no applications can be found in economics. In this paper, we investigate its potential for analysing economic data.

  16. A method of hidden Markov model optimization for use with geophysical data sets

    NASA Technical Reports Server (NTRS)

    Granat, R. A.

    2003-01-01

    Geophysics research has been faced with a growing need for automated techniques with which to process large quantities of data. A successful tool must meet a number of requirements: it should be consistent, require minimal parameter tuning, and produce scientifically meaningful results in reasonable time. We introduce a hidden Markov model (HMM)-based method for analysis of geophysical data sets that attempts to address these issues.
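
    A minimal sketch of fitting an HMM to a one-dimensional series, assuming the third-party hmmlearn package (this is generic usage, not the authors' implementation):

      import numpy as np
      from hmmlearn.hmm import GaussianHMM   # assumes the hmmlearn package is installed

      rng = np.random.default_rng(12)

      # Assumed stand-in for a geophysical time series: a signal that alternates
      # between a quiet regime and an active regime with larger fluctuations.
      quiet = rng.normal(0.0, 0.2, (300, 1))
      active = rng.normal(1.5, 0.8, (200, 1))
      series = np.vstack([quiet, active, quiet])

      # Fit a two-state HMM and recover the most likely hidden regime sequence.
      model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=100, random_state=0)
      model.fit(series)
      states = model.predict(series)
      print("estimated state means:", model.means_.ravel().round(2))
      print("state changes detected at indices:", np.flatnonzero(np.diff(states))[:5])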

  17. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis

    USGS Publications Warehouse

    McKenna, J.E.

    2003-01-01

    The biosphere is filled with complex living patterns and important questions about biodiversity and community and ecosystem ecology are concerned with structure and function of multispecies systems that are responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.

  18. Model for spectral and chromatographic data

    DOEpatents

    Jarman, Kristin [Richland, WA; Willse, Alan [Richland, WA; Wahl, Karen [Richland, WA; Wahl, Jon [Richland, WA

    2002-11-26

    A method and apparatus using a spectral analysis technique are disclosed. In one form of the invention, probabilities are selected to characterize the presence (and in another form, also a quantification of a characteristic) of peaks in an indexed data set for samples that match a reference species, and other probabilities are selected for samples that do not match the reference species. An indexed data set is acquired for a sample, and a determination is made according to techniques exemplified herein as to whether the sample matches or does not match the reference species. When quantification of peak characteristics is undertaken, the model is appropriately expanded, and the analysis accounts for the characteristic model and data. Further techniques are provided to apply the methods and apparatuses to process control, cluster analysis, hypothesis testing, analysis of variance, and other procedures involving multiple comparisons of indexed data.

  19. Method for factor analysis of GC/MS data

    DOEpatents

    Van Benthem, Mark H; Kotula, Paul G; Keenan, Michael R

    2012-09-11

    The method of the present invention provides a fast, robust, and automated multivariate statistical analysis of gas chromatography/mass spectroscopy (GC/MS) data sets. The method can involve systematic elimination of undesired, saturated peak masses to yield data that follow a linear, additive model. The cleaned data can then be subjected to a combination of PCA and orthogonal factor rotation followed by refinement with MCR-ALS to yield highly interpretable results.

  20. HiQuant: Rapid Postquantification Analysis of Large-Scale MS-Generated Proteomics Data.

    PubMed

    Bryan, Kenneth; Jarboui, Mohamed-Ali; Raso, Cinzia; Bernal-Llinares, Manuel; McCann, Brendan; Rauch, Jens; Boldt, Karsten; Lynn, David J

    2016-06-03

    Recent advances in mass-spectrometry-based proteomics are now facilitating ambitious large-scale investigations of the spatial and temporal dynamics of the proteome; however, the increasing size and complexity of these data sets are overwhelming current downstream computational methods, specifically those that support the postquantification analysis pipeline. Here we present HiQuant, a novel application that enables the design and execution of a postquantification workflow, including common data-processing steps, such as assay normalization and grouping, and experimental replicate quality control and statistical analysis. HiQuant also enables the interpretation of results generated from large-scale data sets by supporting interactive heatmap analysis and also the direct export to Cytoscape and Gephi, two leading network analysis platforms. HiQuant may be run via a user-friendly graphical interface and also supports complete one-touch automation via a command-line mode. We evaluate HiQuant's performance by analyzing a large-scale, complex interactome mapping data set and demonstrate a 200-fold improvement in the execution time over current methods. We also demonstrate HiQuant's general utility by analyzing proteome-wide quantification data generated from both a large-scale public tyrosine kinase siRNA knock-down study and an in-house investigation into the temporal dynamics of the KSR1 and KSR2 interactomes. Download HiQuant, sample data sets, and supporting documentation at http://hiquant.primesdb.eu.

  1. Influence of Protein Abundance on High-Throughput Protein-Protein Interaction Detection

    DTIC Science & Technology

    2009-06-05

    the interaction data sets we determined, via comparisons with strict randomized simulations, the propensity for essential proteins to selectively...and analysis of high-quality PPI data sets. Materials and Methods We analyzed protein interaction networks for yeast and E. coli determined from Y2H...we reinvestigated the centrality-lethality rule, which implies that proteins having more interactions are more likely to be essential. From analysis

  2. Analysis to Inform CA Grid Integration Rules for PV: Final Report on Inverter Settings for Transmission and Distribution System Performance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Jeff; Rylander, Matthew; Boemer, Jens

    The fourth solicitation of the California Solar Initiative (CSI) Research, Development, Demonstration and Deployment (RD&D) Program, established by the California Public Utilities Commission (CPUC), supported research by the Electric Power Research Institute (EPRI), the National Renewable Energy Laboratory (NREL), and Sandia National Laboratories (SNL), with data provided by Pacific Gas and Electric (PG&E), Southern California Edison (SCE), and San Diego Gas and Electric (SDG&E), to determine optimal default settings for distributed energy resource advanced inverter controls. The inverter functions studied are aligned with those developed by the California Smart Inverter Working Group (SIWG) and those being considered by the IEEE 1547 Working Group. The advanced inverter controls examined to improve the distribution system response included power factor, volt-var, and volt-watt. The advanced inverter controls examined to improve the transmission system response included frequency and voltage ride-through as well as Dynamic Voltage Support. This CSI RD&D project accomplished the task of developing methods to derive distribution-focused advanced inverter control settings, selecting a diverse set of feeders to evaluate the methods through detailed analysis, and evaluating the effectiveness of each method developed. Inverter settings focused on transmission system performance were also evaluated and verified. Based on the findings of this work, the suggested advanced inverter settings, and the methods to determine settings, can be used to improve the accommodation of distributed energy resources (PV specifically). The voltage impact from PV can be mitigated using power factor, volt-var, or volt-watt control, while the bulk system impact can be improved with frequency/voltage ride-through.

  3. Spiraling between qualitative and quantitative data on women's health behaviors: a double helix model for mixed methods.

    PubMed

    Mendlinger, Sheryl; Cwikel, Julie

    2008-02-01

    A double helix spiral model is presented which demonstrates how to combine qualitative and quantitative methods of inquiry in an interactive fashion over time. Using findings on women's health behaviors (e.g., menstruation, breast-feeding, coping strategies), we show how qualitative and quantitative methods highlight the theory of knowledge acquisition in women's health decisions. A rich data set of 48 semistructured, in-depth ethnographic interviews with mother-daughter dyads from six ethnic groups (Israeli, European, North African, Former Soviet Union [FSU], American/Canadian, and Ethiopian), plus seven focus groups, provided the qualitative sources for analysis. This data set formed the basis of research questions used in a quantitative telephone survey of 302 Israeli women from the ages of 25 to 42 from four ethnic groups. We employed multiple cycles of data analysis from both data sets to produce a more detailed and multidimensional picture of women's health behavior decisions through a spiraling process.

  4. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    PubMed

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones, with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification, which suggests the utility of embedded feature selection for this problem given its good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.
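
    A minimal sketch of this kind of split-half resampling comparison is given below, assuming scikit-learn and toy data in place of the MRI features: a filter-plus-SVM pipeline and an embedded L1-penalized classifier are refit on repeated half-splits, and the spread of test accuracies is reported. The specific estimators, the value k=50, and the 20 splits are illustrative assumptions.

      import numpy as np
      from sklearn.model_selection import StratifiedShuffleSplit
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.svm import LinearSVC
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      X = rng.normal(size=(120, 500))      # 120 subjects x 500 voxel features (toy stand-in)
      y = rng.integers(0, 2, size=120)     # diagnostic labels

      filter_svm = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC(dual=False))
      embedded = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

      splitter = StratifiedShuffleSplit(n_splits=20, test_size=0.5, random_state=0)
      acc = {"filter+SVM": [], "embedded L1": []}
      for train, test in splitter.split(X, y):
          for name, model in (("filter+SVM", filter_svm), ("embedded L1", embedded)):
              model.fit(X[train], y[train])
              acc[name].append(model.score(X[test], y[test]))

      for name, scores in acc.items():
          print(name, np.mean(scores), np.std(scores))   # mean accuracy and its variability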

  5. Determination of thickness of thin turbid painted over-layers using micro-scale spatially offset Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Conti, Claudia; Realini, Marco; Colombo, Chiara; Botteon, Alessandra; Bertasa, Moira; Striova, Jana; Barucci, Marco; Matousek, Pavel

    2016-12-01

    We present a method for estimating the thickness of thin turbid layers using defocusing micro-spatially offset Raman spectroscopy (micro-SORS). The approach, applicable to highly turbid systems, enables one to predict depths in excess of those accessible with conventional Raman microscopy. The technique can be used, for example, to establish the paint layer thickness on cultural heritage objects, such as panel canvases, mural paintings, painted statues and decorated objects. Other applications include analysis in polymer, biological and biomedical disciplines, catalytic and forensic sciences where highly turbid overlayers are often present and where invasive probing may not be possible or is undesirable. The method comprises two stages: (i) a calibration step for training the method on a well characterized sample set with a known thickness, and (ii) a prediction step where the prediction of layer thickness is carried out non-invasively on samples of unknown thickness of the same chemical and physical make up as the calibration set. An illustrative example of a practical deployment of this method is the analysis of larger areas of paintings. In this case, first, a calibration would be performed on a fragment of painting of a known thickness (e.g. derived from cross-sectional analysis) and subsequently the analysis of thickness across larger areas of painting could then be carried out non-invasively. The performance of the method is compared with that of the more established optical coherence tomography (OCT) technique on an identical sample set. This article is part of the themed issue "Raman spectroscopy in art and archaeology".

  6. Correlation set analysis: detecting active regulators in disease populations using prior causal knowledge

    PubMed Central

    2012-01-01

    Background Identification of active causal regulators is a crucial problem in understanding the mechanisms of diseases or finding drug targets. Methods that infer causal regulators directly from primary data have been proposed and successfully validated in some cases. These methods necessarily require very large sample sizes or a mix of different data types. Recent studies have shown that prior biological knowledge can successfully boost a method's ability to find regulators. Results We present a simple data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis and a specific type of literature-derived causal relationships. Instead of investigating the co-expression level between regulators and their regulatees, we focus on the coherence of the regulatees of a regulator. Using simulated datasets we show that our method performs very well at recovering even weak regulatory relationships with a low false discovery rate. Using three separate real biological datasets we were able to recover both well-known and as-yet-undescribed active regulators for each disease population. The results are represented as a rank-ordered list of regulators, and reveal both single and higher-order regulatory relationships. Conclusions CSA is an intuitive data-driven way of selecting directed perturbation experiments that are relevant to a disease population of interest and represents a starting point for further investigation. Our findings demonstrate that combining co-expression analysis on regulatee sets with a literature-derived network can successfully identify causal regulators and help develop possible hypotheses to explain disease progression. PMID:22443377
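
    The core idea of scoring the coherence of a regulator's regulatees can be sketched in a few lines of Python: the mean pairwise correlation of the regulatee expression profiles is compared against randomly drawn gene sets of the same size. The function names, the permutation count, and the use of Pearson correlation are assumptions made for illustration, not the exact CSA statistic.

      import numpy as np

      def regulatee_coherence(expr, regulatee_idx):
          # Mean pairwise correlation among the regulatees of one regulator.
          C = np.corrcoef(expr[regulatee_idx])
          iu = np.triu_indices_from(C, k=1)
          return C[iu].mean()

      def csa_pvalue(expr, regulatee_idx, n_perm=1000, seed=0):
          # Empirical p-value against random gene sets of the same size.
          rng = np.random.default_rng(seed)
          observed = regulatee_coherence(expr, regulatee_idx)
          null = [regulatee_coherence(expr, rng.choice(expr.shape[0], len(regulatee_idx), replace=False))
                  for _ in range(n_perm)]
          return observed, (np.sum(np.array(null) >= observed) + 1) / (n_perm + 1)

      expr = np.random.default_rng(1).normal(size=(2000, 60))   # genes x disease samples (toy)
      obs, p = csa_pvalue(expr, regulatee_idx=[5, 17, 203, 412, 999])
      print(obs, p)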

  7. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
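
    The statistical step in tools of this kind is typically a hypergeometric enrichment test of the input gene set against each Gene Ontology category; a minimal sketch with made-up counts is shown below (the exact test used by GOTM is not asserted here).

      from scipy.stats import hypergeom

      def go_enrichment(n_genome, n_category, n_input, n_overlap):
          # P(X >= n_overlap) when drawing n_input genes from a genome of n_genome,
          # of which n_category are annotated to the GO category of interest.
          return hypergeom.sf(n_overlap - 1, n_genome, n_category, n_input)

      # Toy numbers: 18000 annotated genes, 250 in the category, 400 input genes, 18 overlap.
      print(go_enrichment(18000, 250, 400, 18))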

  8. A new method of linkage analysis using LOD scores for quantitative traits supports linkage of monoamine oxidase activity to D17S250 in the Collaborative Study on the Genetics of Alcoholism pedigrees.

    PubMed

    Curtis, David; Knight, Jo; Sham, Pak C

    2005-09-01

    Although LOD score methods have been applied to diseases with complex modes of inheritance, linkage analysis of quantitative traits has tended to rely on non-parametric methods based on regression or variance components analysis. Here, we describe a new method for LOD score analysis of quantitative traits which does not require specification of a mode of inheritance. The technique is derived from the MFLINK method for dichotomous traits. A range of plausible transmission models is constructed, constrained to yield the correct population mean and variance for the trait but differing with respect to the contribution to the variance due to the locus under consideration. Maximized LOD scores under homogeneity and admixture are calculated, as is a model-free LOD score which compares the maximized likelihoods under admixture assuming linkage and no linkage. These LOD scores have known asymptotic distributions and hence can be used to provide a statistical test for linkage. The method has been implemented in a program called QMFLINK. It was applied to data sets simulated using a variety of transmission models and to a measure of monoamine oxidase activity in 105 pedigrees from the Collaborative Study on the Genetics of Alcoholism. With the simulated data, the results showed that the new method could detect linkage well if the true allele frequency for the trait was close to that specified. However, it performed poorly on models in which the true allele frequency was much rarer. For the Collaborative Study on the Genetics of Alcoholism data set only a modest overlap was observed between the results obtained from the new method and those obtained when the same data were analysed previously using regression and variance components analysis. Of interest is that D17S250 produced a maximized LOD score under homogeneity and admixture of 2.6 but did not indicate linkage using the previous methods. However, this region did produce evidence for linkage in a separate data set, suggesting that QMFLINK may have been able to detect a true linkage which was not picked up by the other methods. The application of model-free LOD score analysis to quantitative traits is novel and deserves further evaluation of its merits and disadvantages relative to other methods.

  9. A meta-data based method for DNA microarray imputation.

    PubMed

    Jörnsten, Rebecka; Ouyang, Ming; Wang, Hui-Yu

    2007-03-29

    DNA microarray experiments are conducted in logical sets, such as time course profiling after a treatment is applied to the samples, or comparisons of the samples under two or more conditions. Due to cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite the vast improvement of the microarray technology in recent years, missing values are prevalent. Intuitively, imputation of missing values is best done using many replicates within the same logical set. In practice, there are few replicates and thus reliable imputation within logical sets is difficult. However, it is in the case of few replicates that the presence of missing values, and how they are imputed, can have the most profound impact on the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the vast amount of publicly available microarray data to improve imputation reliability in the small sample size setting. We download all cDNA microarray data of Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, our proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes to an astonishing degree. Furthermore, the imputation root mean square error for significant genes is generally a lot less than that of non-significant ones. Since downstream analysis of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of estimated gene effects, it is highly recommended that researchers apply reliable data imputation prior to further analysis. Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.
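
    The cross-validation idea, hiding observed values and scoring how well an imputation scheme that borrows information from a large external compendium recovers them, can be sketched as follows. The correlation-based nearest-gene imputation and the toy dimensions are illustrative assumptions, not the authors' exact procedure.

      import numpy as np

      def knn_impute(target_row, mask, reference, k=10):
          # Fill masked entries of one gene's profile using the k most similar
          # gene profiles (by correlation on the observed entries) in a large compendium.
          obs = ~mask
          sims = np.array([np.corrcoef(target_row[obs], ref[obs])[0, 1] for ref in reference])
          nearest = reference[np.argsort(-sims)[:k]]
          filled = target_row.copy()
          filled[mask] = nearest[:, mask].mean(axis=0)
          return filled

      rng = np.random.default_rng(0)
      reference = rng.normal(size=(5000, 8))       # public compendium: genes x arrays (toy)
      truth = rng.normal(size=8)                   # one gene in a small logical set
      mask = np.zeros(8, dtype=bool)
      mask[[2, 5]] = True                          # hide two values, then try to recover them
      est = knn_impute(truth, mask, reference)
      print(np.sqrt(np.mean((est[mask] - truth[mask]) ** 2)))   # imputation RMSE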

  10. Spectroscopic studies (FTIR, FT-Raman and UV-Visible), normal coordinate analysis, NBO analysis, first order hyper polarizability, HOMO and LUMO analysis of (1R)-N-(Prop-2-yn-1-yl)-2,3-dihydro-1H-inden-1-amine molecule by ab initio HF and density functional methods.

    PubMed

    Muthu, S; Ramachandran, G

    2014-01-01

    The Fourier transform infrared (FT-IR) and FT-Raman spectra of (1R)-N-(Prop-2-yn-1-yl)-2,3-dihydro-1H-inden-1-amine (1RNPDA) were recorded in the regions 4000-400 cm(-1) and 4000-100 cm(-1), respectively. A complete assignment and analysis of the fundamental vibrational modes of the molecule were carried out. The observed fundamental modes have been compared with the harmonic vibrational frequencies computed using the HF method with the 6-31G(d,p) basis set and the DFT (B3LYP) method with the 6-31G(d,p) basis set. The vibrational studies were interpreted in terms of Potential Energy Distribution (PED). The complete vibrational frequency assignments were made by Normal Co-ordinate Analysis (NCA) following the scaled quantum mechanical force field methodology (SQMFF). The first order hyperpolarizability (β0) of this molecular system and related properties (α, μ, and Δα) were calculated using the B3LYP/6-31G(d,p) method based on the finite-field approach. The thermodynamic functions of the title compound were also computed with the above methods and basis set. A detailed interpretation of the infrared and Raman spectra of 1RNPDA is reported. The (1)H and (13)C nuclear magnetic resonance (NMR) chemical shifts of the molecule were calculated using the GIAO method and agree with the experimental values. Stability of the molecule arising from hyper-conjugative interactions and charge delocalization has been analyzed using Natural Bond Orbital (NBO) analysis. The UV-vis spectrum of the compound was recorded, and electronic properties such as excitation energies, oscillator strengths and wavelengths were computed by TD-DFT/B3LYP using the 6-31G(d,p) basis set. The HOMO-LUMO energy gap reflects the chemical activity of the molecule. The observed and calculated wave numbers are found to be in good agreement, and the experimental spectra coincide satisfactorily with the theoretically constructed spectra. Copyright © 2013 Elsevier B.V. All rights reserved.

  11. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

    Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes a need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
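
    A stripped-down sketch of the dose-response filtering step described above is given below: a threshold is chosen from a receiver operator characteristic (ROC) curve, after which Fisher exact, Welch t, and Kolmogorov-Smirnov tests are applied. The toy dose metric, the logistic outcome model, and the use of Youden's J to pick the threshold are assumptions for illustration only.

      import numpy as np
      from sklearn.metrics import roc_curve
      from scipy.stats import fisher_exact, ttest_ind, ks_2samp

      rng = np.random.default_rng(0)
      dose = rng.uniform(0, 60, 300)                       # dose metric per patient (toy)
      outcome = (rng.uniform(size=300) < 1 / (1 + np.exp(-(dose - 30) / 5))).astype(int)

      # ROC-based threshold: the dose value maximizing Youden's J (tpr - fpr).
      fpr, tpr, thr = roc_curve(outcome, dose)
      threshold = thr[np.argmax(tpr - fpr)]

      high = dose >= threshold
      table = [[np.sum(high & (outcome == 1)), np.sum(high & (outcome == 0))],
               [np.sum(~high & (outcome == 1)), np.sum(~high & (outcome == 0))]]
      print("threshold", threshold)
      print("Fisher exact p =", fisher_exact(table)[1])
      print("Welch t-test p =", ttest_ind(dose[outcome == 1], dose[outcome == 0], equal_var=False)[1])
      print("KS test      p =", ks_2samp(dose[outcome == 1], dose[outcome == 0])[1])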

  12. Toward a Richer View of the Scientific Method: The Role of Conceptual Analysis

    ERIC Educational Resources Information Center

    Machado, Armando; Silva, Francisco J.

    2007-01-01

    Within the complex set of activities that comprise the scientific method, three clusters of activities can be recognized: experimentation, mathematization, and conceptual analysis. In psychology, the first two of these clusters are well-known and valued, but the third seems less known and valued. The authors show the value of these three clusters…

  13. Heuristics for Understanding the Concepts of Interaction, Polynomial Trend, and the General Linear Model.

    ERIC Educational Resources Information Center

    Thompson, Bruce

    The relationship between analysis of variance (ANOVA) methods and their analogs (analysis of covariance and multiple analyses of variance and covariance--collectively referred to as OVA methods) and the more general analytic case is explored. A small heuristic data set is used, with a hypothetical sample of 20 subjects, randomly assigned to five…

  14. Formative Research on the Simplifying Conditions Method (SCM) for Task Analysis and Sequencing.

    ERIC Educational Resources Information Center

    Kim, YoungHwan; Reigeluth, Charles M.

    The Simplifying Conditions Method (SCM) is a set of guidelines for task analysis and sequencing of instructional content under the Elaboration Theory (ET). This article introduces the fundamentals of SCM and presents the findings from a formative research study on SCM. It was conducted in two distinct phases: design and instruction. In the first…

  15. A Guide to Analyzing Message-Response Sequences and Group Interaction Patterns in Computer-Mediated Communication

    ERIC Educational Resources Information Center

    Jeong, Allan

    2005-01-01

    This paper proposes a set of methods and a framework for evaluating, modeling, and predicting group interactions in computer-mediated communication. The method of sequential analysis is described along with specific software tools and techniques to facilitate the analysis of message-response sequences. In addition, the Dialogic Theory and its…

  16. The Use of Time Series Analysis and t Tests with Serially Correlated Data Tests.

    ERIC Educational Resources Information Center

    Nicolich, Mark J.; Weinstein, Carol S.

    1981-01-01

    Results of three methods of analysis applied to simulated autocorrelated data sets with an intervention point (varying in autocorrelation degree, variance of error term, and magnitude of intervention effect) are compared and presented. The three methods are: t tests; maximum likelihood Box-Jenkins (ARIMA); and Bayesian Box Jenkins. (Author/AEF)

  17. Decision modeling for fire incident analysis

    Treesearch

    Donald G. MacGregor; Armando González-Cabán

    2009-01-01

    This paper reports on methods for representing and modeling fire incidents based on concepts and models from the decision and risk sciences. A set of modeling techniques are used to characterize key fire management decision processes and provide a basis for incident analysis. The results of these methods can be used to provide insights into the structure of fire...

  18. Development of the mathematical model for design and verification of acoustic modal analysis methods

    NASA Astrophysics Data System (ADS)

    Siner, Alexander; Startseva, Maria

    2016-10-01

    To reduce turbofan noise it is necessary to develop methods for the analysis of the sound field generated by the blade machinery, collectively referred to as modal analysis. Because modal analysis methods are complex, and testing them against full-scale measurements is expensive and tedious, it is necessary to construct mathematical models that allow modal analysis algorithms to be tested quickly and cheaply. In this work, a model is presented that allows single modes to be set in the channel and the generated sound field to be analyzed. Modal analysis of the sound generated by a ring array of point sound sources is performed, and a comparison of experimental and numerical modal analysis results is presented.

  19. Delay correlation analysis and representation for VITAL compliant VHDL models

    DOEpatents

    Rich, Marvin J.; Misra, Ashutosh

    2004-11-09

    A method and system unbind a rise/fall tuple of a VHDL generic variable and create rise time and fall time generics of each generic variable that are independent of each other. Then, according to a predetermined correlation policy, the method and system collect delay values in a VHDL standard delay file, sort the delay values, remove duplicate delay values, group the delay values into correlation sets, and output an analysis file. The correlation policy may include collecting all generic variables in a VHDL standard delay file, selecting each generic variable, and performing reductions on the set of delay values associated with each selected generic variable.

  20. Three novel approaches to structural identifiability analysis in mixed-effects models.

    PubMed

    Janzén, David L I; Jirstrand, Mats; Chappell, Michael J; Evans, Neil D

    2016-05-06

    Structural identifiability is a concept that considers whether the structure of a model together with a set of input-output relations uniquely determines the model parameters. In the mathematical modelling of biological systems, structural identifiability is an important concept since biological interpretations are typically made from the parameter estimates. For a system defined by ordinary differential equations, several methods have been developed to analyse whether the model is structurally identifiable or otherwise. Another well-used modelling framework, which is particularly useful when the experimental data are sparsely sampled and the population variance is of interest, is mixed-effects modelling. However, established identifiability analysis techniques for ordinary differential equations are not directly applicable to such models. In this paper, we present and apply three different methods that can be used to study structural identifiability in mixed-effects models. The first method, called the repeated measurement approach, is based on applying a set of previously established statistical theorems. The second method, called the augmented system approach, is based on augmenting the mixed-effects model to an extended state-space form. The third method, called the Laplace transform mixed-effects extension, is based on considering the moment invariants of the system's transfer function as functions of random variables. To illustrate, compare and contrast the application of the three methods, they are applied to a set of mixed-effects models. Three structural identifiability analysis methods applicable to mixed-effects models have been presented in this paper. As method development of structural identifiability techniques for mixed-effects models has been given very little attention, despite mixed-effects models being widely used, the methods presented in this paper provide a way of handling structural identifiability in mixed-effects models that was previously not possible. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  1. The Gaussian Graphical Model in Cross-Sectional and Time-Series Data.

    PubMed

    Epskamp, Sacha; Waldorp, Lourens J; Mõttus, René; Borsboom, Denny

    2018-04-16

    We discuss the Gaussian graphical model (GGM; an undirected network of partial correlation coefficients) and detail its utility as an exploratory data analysis tool. The GGM shows which variables predict one another, allows for sparse modeling of covariance structures, and may highlight potential causal relationships between observed variables. We describe the utility in three kinds of psychological data sets: data sets in which consecutive cases are assumed independent (e.g., cross-sectional data), temporally ordered data sets (e.g., n = 1 time series), and a mixture of the 2 (e.g., n > 1 time series). In time-series analysis, the GGM can be used to model the residual structure of a vector-autoregression analysis (VAR), also termed graphical VAR. Two network models can then be obtained: a temporal network and a contemporaneous network. When analyzing data from multiple subjects, a GGM can also be formed on the covariance structure of stationary means: the between-subjects network. We discuss the interpretation of these models and propose estimation methods to obtain these networks, which we implement in the R packages graphicalVAR and mlVAR. The methods are showcased in two empirical examples, and simulation studies on these methods are included in the supplementary materials.
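
    The paper's estimation is done with regularized methods in the R packages named above; as a hedged, unregularized illustration of what a GGM encodes, the Python sketch below derives partial correlations from the inverse covariance (precision) matrix of a toy data set.

      import numpy as np

      def partial_correlations(X):
          # GGM edge weights: standardized off-diagonal elements of the precision matrix.
          prec = np.linalg.pinv(np.cov(X, rowvar=False))
          d = np.sqrt(np.diag(prec))
          pcor = -prec / np.outer(d, d)
          np.fill_diagonal(pcor, 0.0)
          return pcor

      X = np.random.default_rng(0).normal(size=(200, 6))   # 200 observations x 6 variables (toy)
      print(np.round(partial_correlations(X), 2))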

  2. Non-linear principal component analysis applied to Lorenz models and to North Atlantic SLP

    NASA Astrophysics Data System (ADS)

    Russo, A.; Trigo, R. M.

    2003-04-01

    A non-linear generalisation of Principal Component Analysis (PCA), denoted Non-Linear Principal Component Analysis (NLPCA), is introduced and applied to the analysis of three data sets. Non-Linear Principal Component Analysis allows for the detection and characterisation of low-dimensional non-linear structure in multivariate data sets. This method is implemented using a 5-layer feed-forward neural network introduced originally in the chemical engineering literature (Kramer, 1991). The method is described and details of its implementation are addressed. Non-Linear Principal Component Analysis is first applied to a data set sampled from the Lorenz attractor (1963). It is found that the NLPCA approximations are more representative of the data than are the corresponding PCA approximations. The same methodology was applied to the less known Lorenz attractor (1984). However, the results obtained were not as good as those attained with the famous 'Butterfly' attractor. Further work with this model is underway in order to assess whether NLPCA techniques can be more representative of the data characteristics than are the corresponding PCA approximations. The application of NLPCA to relatively 'simple' dynamical systems, such as those proposed by Lorenz, is well understood. However, the application of NLPCA to a large climatic data set is much more challenging. Here, we have applied NLPCA to the sea level pressure (SLP) field for the entire North Atlantic area and the results show a slight increase in the associated explained variance. Finally, directions for future work are presented.

  3. Monochloramine Disinfection Kinetics of Nitrosomonas europaea by Propidium Monoazide Quantitative PCR and Live/Dead BacLight Methods

    PubMed Central

    Wahman, David G.; Wulfeck-Kleier, Karen A.; Pressman, Jonathan G.

    2009-01-01

    Monochloramine disinfection kinetics were determined for the pure-culture ammonia-oxidizing bacterium Nitrosomonas europaea (ATCC 19718) by two culture-independent methods, namely, Live/Dead BacLight (LD) and propidium monoazide quantitative PCR (PMA-qPCR). Both methods were first verified with mixtures of heat-killed (nonviable) and non-heat-killed (viable) cells before a series of batch disinfection experiments with stationary-phase cultures (batch grown for 7 days) at pH 8.0, 25°C, and 5, 10, and 20 mg Cl2/liter monochloramine. Two data sets were generated based on the viability method used, either (i) LD or (ii) PMA-qPCR. These two data sets were used to estimate kinetic parameters for the delayed Chick-Watson disinfection model through a Bayesian analysis implemented in WinBUGS. This analysis provided parameter estimates of 490 mg Cl2-min/liter for the lag coefficient (b) and 1.6 × 10−3 to 4.0 × 10−3 liter/mg Cl2-min for the Chick-Watson disinfection rate constant (k). While estimates of b were similar for both data sets, the LD data set resulted in a greater k estimate than that obtained with the PMA-qPCR data set, implying that the PMA-qPCR viability measure was more conservative than LD. For N. europaea, the lag phase was not previously reported for culture-independent methods and may have implications for nitrification in drinking water distribution systems. This is the first published application of a PMA-qPCR method for disinfection kinetic model parameter estimation as well as its application to N. europaea or monochloramine. Ultimately, this PMA-qPCR method will allow evaluation of monochloramine disinfection kinetics for mixed-culture bacteria in drinking water distribution systems. PMID:19561179
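
    The delayed Chick-Watson model referred to above is a lag followed by first-order decay in Ct (the product of disinfectant concentration and contact time). The paper estimates its parameters with a Bayesian analysis in WinBUGS; the sketch below instead uses a simple least-squares fit on synthetic data, so the data values and the fitting route are assumptions for illustration only.

      import numpy as np
      from scipy.optimize import curve_fit

      def delayed_chick_watson(ct, b, k):
          # log10 survival ratio: zero during the lag (Ct <= b), then -k * (Ct - b).
          return np.where(ct <= b, 0.0, -k * (ct - b))

      ct = np.linspace(0, 2000, 25)                        # exposure, mg Cl2-min/liter (toy)
      log_surv = (delayed_chick_watson(ct, 490.0, 2.5e-3)
                  + np.random.default_rng(0).normal(0, 0.1, ct.size))
      (b_hat, k_hat), _ = curve_fit(delayed_chick_watson, ct, log_surv, p0=[300.0, 1e-3])
      print(b_hat, k_hat)                                  # recovered lag coefficient and rate constant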

  4. Do regional methods really help reduce uncertainties in flood frequency analyses?

    NASA Astrophysics Data System (ADS)

    Cong Nguyen, Chi; Payrastre, Olivier; Gaume, Eric

    2013-04-01

    Flood frequency analyses are often based on continuous measured series at gauge sites. However, the length of the available data sets is usually too short to provide reliable estimates of extreme design floods. To reduce the estimation uncertainties, the analyzed data sets have to be extended either in time, making use of historical and paleoflood data, or in space, merging data sets considered as statistically homogeneous to build large regional data samples. Nevertheless, the advantage of the regional analyses, the important increase of the size of the studied data sets, may be counterbalanced by the possible heterogeneities of the merged sets. The application and comparison of four different flood frequency analysis methods to two regions affected by flash floods in the south of France (Ardèche and Var) illustrates how this balance between the number of records and possible heterogeneities plays out in real-world applications. The four tested methods are: (1) a local statistical analysis based on the existing series of measured discharges, (2) a local analysis incorporating the existing information on historical floods, (3) a standard regional flood frequency analysis based on existing measured series at gauged sites and (4) a modified regional analysis including estimated extreme peak discharges at ungauged sites. Monte Carlo simulations are conducted to simulate a large number of discharge series with characteristics similar to the observed ones (type of statistical distributions, number of sites and records) to evaluate to which extent the results obtained on these case studies can be generalized. These two case studies indicate that even small statistical heterogeneities, which are not detected by the standard homogeneity tests implemented in regional flood frequency studies, may drastically limit the usefulness of such approaches. On the other hand, these results show that incorporating information on extreme events, either historical flood events at gauged sites or estimated extremes at ungauged sites in the considered region, is an efficient way to reduce uncertainties in flood frequency studies.

  5. Trajectory Optimization for Spacecraft Collision Avoidance

    DTIC Science & Technology

    2013-09-01

    Modified Set of Equinoctial Orbit Elements. AAS/AIAA 91-524," in Astrodynamics Specialist Conference, Durango, CO, 1991. [18] D. E. Kirk...these singularities, the COE are not necessarily the best set of states for numerical analysis. 2.3.3 Equinoctial Orbital Elements A third method of...completely defining an orbit is by the use of the Equinoctial Orbital Elements. This element set maintains the

  6. Functional Module Search in Protein Networks based on Semantic Similarity Improves the Analysis of Proteomics Data*

    PubMed Central

    Boyanova, Desislava; Nilla, Santosh; Klau, Gunnar W.; Dandekar, Thomas; Müller, Tobias; Dittrich, Marcus

    2014-01-01

    The continuously evolving field of proteomics produces increasing amounts of data while improving the quality of protein identifications. Although quantitative measurements are becoming more popular, many proteomic studies are still based on non-quantitative methods for protein identification. These studies result in potentially large sets of identified proteins, where the biological interpretation of proteins can be challenging. Systems biology develops innovative network-based methods, which allow an integrated analysis of these data. Here we present a novel approach, which combines prior knowledge of protein-protein interactions (PPI) with proteomics data using functional similarity measurements of interacting proteins. This integrated network analysis exactly identifies network modules with a maximal consistent functional similarity reflecting biological processes of the investigated cells. We validated our approach on small (H9N2 virus-infected gastric cells) and large (blood constituents) proteomic data sets. Using this novel algorithm, we identified characteristic functional modules in virus-infected cells, comprising key signaling proteins (e.g. the stress-related kinase RAF1) and demonstrate that this method allows a module-based functional characterization of cell types. Analysis of a large proteome data set of blood constituents resulted in clear separation of blood cells according to their developmental origin. A detailed investigation of the T-cell proteome further illustrates how the algorithm partitions large networks into functional subnetworks each representing specific cellular functions. These results demonstrate that the integrated network approach not only allows a detailed analysis of proteome networks but also yields a functional decomposition of complex proteomic data sets and thereby provides deeper insights into the underlying cellular processes of the investigated system. PMID:24807868

  7. [Study on discrimination of varieties of fire resistive coating for steel structure based on near-infrared spectroscopy].

    PubMed

    Xue, Gang; Song, Wen-qi; Li, Shu-chao

    2015-01-01

    In order to achieve the rapid identification of fire resistive coating for steel structure of different brands in circulation, a new method for the fast discrimination of varieties of fire resistive coating for steel structure by means of near infrared spectroscopy was proposed. A raster scanning near infrared spectroscopy instrument and near infrared diffuse reflectance spectroscopy were applied to collect the spectral curves of different brands of fire resistive coating for steel structure, and the spectral data were preprocessed with standard normal variate (SNV) transformation and the Norris second derivative. Principal component analysis (PCA) was applied to the near infrared spectra for cluster analysis. The analysis results showed that the cumulative reliability of PC1 to PC5 was 99.791%. A 3-dimensional plot was drawn with the scores of PC1, PC2 and PC3 × 10, which appeared to provide the best clustering of the varieties of fire resistive coating for steel structure. A total of 150 fire resistive coating samples were divided randomly into a calibration set and a validation set; the calibration set had 125 samples with 25 samples of each variety, and the validation set had 25 samples with 5 samples of each variety. According to the principal component scores of unknown samples, Mahalanobis distance values between each variety and the unknown samples were calculated to realize the discrimination of different varieties. The qualitative analysis model gave a 10% recognition ratio in external verification of unknown samples. The results demonstrated that this identification method can be used as a rapid, accurate method to identify the classification of fire resistive coating for steel structure and provide a technical reference for market regulation.
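
    A schematic version of the SNV / PCA / Mahalanobis-distance classification chain described above is sketched below in Python with synthetic spectra; the band positions, the five retained components, and the nearest-class decision rule are illustrative assumptions rather than the published calibration.

      import numpy as np
      from sklearn.decomposition import PCA
      from scipy.spatial.distance import mahalanobis

      def snv(spectra):
          # Standard normal variate: center and scale each spectrum individually.
          return (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)

      rng = np.random.default_rng(0)
      wav = np.arange(300)
      bandA = np.exp(-(wav - 100) ** 2 / 200.0)            # brand A: absorption band near channel 100
      bandB = np.exp(-(wav - 200) ** 2 / 200.0)            # brand B: band near channel 200
      train = {"brandA": snv(bandA + 0.05 * rng.normal(size=(25, 300))),
               "brandB": snv(bandB + 0.05 * rng.normal(size=(25, 300)))}
      pca = PCA(n_components=5).fit(np.vstack(list(train.values())))

      def classify(spectrum):
          # Assign the brand whose PCA-score cloud is closest in Mahalanobis distance.
          s = pca.transform(snv(spectrum[None, :]))[0]
          best, best_d = None, np.inf
          for brand, spectra in train.items():
              scores = pca.transform(spectra)
              d = mahalanobis(s, scores.mean(axis=0), np.linalg.pinv(np.cov(scores, rowvar=False)))
              if d < best_d:
                  best, best_d = brand, d
          return best

      print(classify(bandB + 0.05 * rng.normal(size=300)))   # expected: brandB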

  8. An industry consensus study on an HPLC fluorescence method for the determination of (±)-catechin and (±)-epicatechin in cocoa and chocolate products.

    PubMed

    Shumow, Laura; Bodor, Alison

    2011-07-05

    This manuscript describes the results of an HPLC study for the determination of the flavan-3-ol monomers, (±)-catechin and (±)-epicatechin, in cocoa and plain dark and milk chocolate products. The study was performed under the auspices of the National Confectioners Association (NCA) and involved the analysis of a series of samples by laboratories of five member companies using a common method. The method reported in this paper uses reversed phase HPLC with fluorescence detection to analyze (±)-epicatechin and (±)-catechin extracted with an acidic solvent from defatted cocoa and chocolate. In addition to a variety of cocoa and chocolate products, the sample set included a blind duplicate used to assess method reproducibility. All data were subjected to statistical analysis with outliers eliminated from the data set. The percent coefficient of variation (%CV) of the sample set ranged from approximately 7 to 15%. Further experimental details are described in the body of the manuscript and the results indicate the method is suitable for the determination of (±)-catechin and (±)-epicatechin in cocoa and chocolate products and represents the first collaborative study of this HPLC method for these compounds in these matrices.

  9. ADGO: analysis of differentially expressed gene sets using composite GO annotation.

    PubMed

    Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun

    2006-09-15

    Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. chu@kribb.re.kr http://array.kobic.re.kr/ADGO.

  10. A Review On Missing Value Estimation Using Imputation Algorithm

    NASA Astrophysics Data System (ADS)

    Armina, Roslan; Zain, Azlan Mohd; Azizah Ali, Nor; Sallehuddin, Roselina

    2017-09-01

    The presence of missing values in a data set has always been a major problem for precise prediction. A method for imputing missing values needs to minimize the effect of incomplete data sets on the prediction model. Many algorithms have been proposed as countermeasures for the missing value problem. In this review, we provide a comprehensive analysis of existing imputation algorithms, focusing on the technique used and on whether global or local information of the data set is used for missing value estimation. In addition, validation methods for imputation results and ways to measure the performance of imputation algorithms are also described. The objective of this review is to highlight possible improvements to existing methods, and it is hoped that this review gives readers a better understanding of imputation method trends.

  11. Examining organizational change in primary care practices: experiences from using ethnographic methods.

    PubMed

    Russell, Grant; Advocat, Jenny; Geneau, Robert; Farrell, Barbara; Thille, Patricia; Ward, Natalie; Evans, Samantha

    2012-08-01

    Qualitative methods are an important part of the primary care researcher's toolkit, providing a nuanced view of the complexity in primary care reform and delivery. Ethnographic research is a comprehensive approach to qualitative data collection, including observation, in-depth interviews and document analysis. Few studies have been published outlining methodological issues related to ethnography in this setting. This paper examines some of the challenges of conducting an ethnographic study in a primary care setting in Canada, where there recently have been major reforms to traditional methods of organizing primary care services. This paper is based on an ethnographic study set in primary care practices in Ontario, Canada, designed to investigate changes to organizational and clinical routines in practices undergoing transition to new, interdisciplinary Family Health Teams (FHTs). The study was set in six new FHTs in Ontario. This paper is a reflexive examination of some of the challenges encountered while conducting an ethnographic study in a primary care setting. Our experiences in this study highlight some potential benefits of and difficulties in conducting an ethnographic study in family practice. Our study design gave us an opportunity to highlight the changes in routines within an organization in transition. A study with a clinical perspective requires training, support, a mixture of backgrounds and perspectives and ongoing communication. Despite some of the difficulties, the richness of this method has allowed the exploration of a number of additional research questions that emerged during data analysis.

  12. Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models

    USGS Publications Warehouse

    Anderson, Ryan; Clegg, Samuel M.; Frydenvang, Jens; Wiens, Roger C.; McLennan, Scott M.; Morris, Richard V.; Ehlmann, Bethany L.; Dyar, M. Darby

    2017-01-01

    Accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibrations methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. The sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
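
    A toy version of the sub-model idea, training regression models on overlapping composition ranges and blending their outputs according to a first-pass full-range prediction, is sketched below with partial least squares (PLS) regression. The ranges, weights, and data are assumptions chosen only to illustrate the structure, not the ChemCam calibration itself.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(0)
      X = rng.random((120, 400))                     # toy LIBS spectra
      y = rng.uniform(0, 80, 120)                    # e.g. SiO2 wt.% of calibration targets

      full = PLSRegression(n_components=8).fit(X, y)
      low = PLSRegression(n_components=8).fit(X[y < 50], y[y < 50])
      high = PLSRegression(n_components=8).fit(X[y >= 30], y[y >= 30])   # overlapping ranges

      def blended_predict(x):
          # First-pass estimate from the full-range model decides how to weight the sub-models.
          first = full.predict(x.reshape(1, -1))[0, 0]
          w_high = np.clip((first - 30) / 20, 0, 1)          # linear ramp between the two ranges
          return ((1 - w_high) * low.predict(x.reshape(1, -1))[0, 0]
                  + w_high * high.predict(x.reshape(1, -1))[0, 0])

      print(blended_predict(X[0]))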

  13. Use of direct gradient analysis to uncover biological hypotheses in 16s survey data and beyond.

    PubMed

    Erb-Downward, John R; Sadighi Akha, Amir A; Wang, Juan; Shen, Ning; He, Bei; Martinez, Fernando J; Gyetko, Margaret R; Curtis, Jeffrey L; Huffnagle, Gary B

    2012-01-01

    This study investigated the use of direct gradient analysis of bacterial 16S pyrosequencing surveys to identify relevant bacterial community signals in the midst of a "noisy" background, and to facilitate hypothesis-testing both within and beyond the realm of ecological surveys. The results, utilizing 3 different real world data sets, demonstrate the utility of adding direct gradient analysis to any analysis that draws conclusions from indirect methods such as Principal Component Analysis (PCA) and Principal Coordinates Analysis (PCoA). Direct gradient analysis produces testable models, and can identify significant patterns in the midst of noisy data. Additionally, we demonstrate that direct gradient analysis can be used with other kinds of multivariate data sets, such as flow cytometric data, to identify differentially expressed populations. The results of this study demonstrate the utility of direct gradient analysis in microbial ecology and in other areas of research where large multivariate data sets are involved.

  14. Sample entropy analysis of cervical neoplasia gene-expression signatures

    PubMed Central

    Botting, Shaleen K; Trzeciakowski, Jerome P; Benoit, Michelle F; Salama, Salama A; Diaz-Arrastia, Concepcion R

    2009-01-01

    Background We introduce Approximate Entropy as a mathematical method of analysis for microarray data. Approximate entropy is applied here as a method to classify the complex gene expression patterns resultant of a clinical sample set. Since entropy is a measure of disorder in a system, we believe that by choosing genes which display minimum entropy in normal controls and maximum entropy in the cancerous sample set we will be able to distinguish those genes which display the greatest variability in the cancerous set. Here we describe a method of utilizing Approximate Sample Entropy (ApSE) analysis to identify genes of interest with the highest probability of producing an accurate, predictive, classification model from our data set. Results In the development of a diagnostic gene-expression profile for cervical intraepithelial neoplasia (CIN) and squamous cell carcinoma of the cervix, we identified 208 genes which are unchanging in all normal tissue samples, yet exhibit a random pattern indicative of the genetic instability and heterogeneity of malignant cells. This may be measured in terms of the ApSE when compared to normal tissue. We have validated 10 of these genes on 10 normal and 20 cancer and CIN3 samples. We report that the predictive value of the sample entropy calculation for these 10 genes of interest is promising (75% sensitivity, 80% specificity for prediction of cervical cancer over CIN3). Conclusion The success of the Approximate Sample Entropy approach in discerning alterations in complexity from a biological system with such a relatively small sample set, and in extracting biologically relevant genes of interest, holds great promise. PMID:19232110
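
    For readers unfamiliar with the statistic, a compact (and deliberately unoptimized) sample entropy implementation is sketched below; the template length m=2 and tolerance r=0.2 are conventional defaults assumed here, and the toy signals are not the expression data analyzed in the paper.

      import numpy as np

      def sample_entropy(x, m=2, r=0.2):
          # Sample entropy: -log(A/B), where B counts template matches of length m
          # and A those of length m+1, within tolerance r times the series' std.
          x = np.asarray(x, float)
          tol = r * x.std()

          def count(mm):
              templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
              d = np.max(np.abs(templ[:, None, :] - templ[None, :, :]), axis=2)
              return np.sum(d <= tol) - len(templ)          # exclude self-matches
          B, A = count(m), count(m + 1)
          return -np.log(A / B) if A > 0 and B > 0 else np.inf

      rng = np.random.default_rng(0)
      print(sample_entropy(np.sin(np.linspace(0, 20, 200))))   # regular signal: low entropy
      print(sample_entropy(rng.normal(size=200)))              # noise: higher entropy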

  15. Heuristics to Facilitate Understanding of Discriminant Analysis.

    ERIC Educational Resources Information Center

    Van Epps, Pamela D.

    This paper discusses the principles underlying discriminant analysis and constructs a simulated data set to illustrate its methods. Discriminant analysis is a multivariate technique for identifying the best combination of variables to maximally discriminate between groups. Discriminant functions are established on existing groups and used to…

  16. Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.

    PubMed

    Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; S Liu, Jun; Zhong, Wenxuan; Ma, Ping

    2016-08-26

    Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory network. However, most of the available methods treat gene expressions at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates the NBMM identifies biologically relevant genes which are well justified by gene ontology analysis. The proposed method is powerful and efficient to detect biologically relevant DE genes and gene sets in time course RNA-Seq data.

  17. A hybrid wavelet de-noising and Rank-Set Pair Analysis approach for forecasting hydro-meteorological time series.

    PubMed

    Wang, Dong; Borthwick, Alistair G; He, Handan; Wang, Yuankun; Zhu, Jieyu; Lu, Yuan; Xu, Pengcheng; Zeng, Xiankui; Wu, Jichun; Wang, Lachun; Zou, Xinqing; Liu, Jiufu; Zou, Ying; He, Ruimin

    2018-01-01

    Accurate, fast forecasting of hydro-meteorological time series is presently a major challenge in drought and flood mitigation. This paper proposes a hybrid approach, wavelet de-noising (WD) and Rank-Set Pair Analysis (RSPA), that takes full advantage of a combination of the two approaches to improve forecasts of hydro-meteorological time series. WD allows decomposition and reconstruction of a time series by the wavelet transform, and hence separation of the noise from the original series. RSPA, a more reliable and efficient version of Set Pair Analysis, is integrated with WD to form the hybrid WD-RSPA approach. Two types of hydro-meteorological data sets with different characteristics and different levels of human influence at some representative stations are used to illustrate the WD-RSPA approach. The approach is also compared to three other generic methods: the conventional Auto Regressive Integrated Moving Average (ARIMA) method, Artificial Neural Networks (ANNs) (BP-error Back Propagation, MLP-Multilayer Perceptron and RBF-Radial Basis Function), and RSPA alone. Nine error metrics are used to evaluate the model performance. Compared to the three other generic methods, the WD-RSPA model invariably produced smaller error measures, which means its forecasting capability is better than that of the other models. The results show that WD-RSPA is accurate, feasible, and effective. In particular, WD-RSPA is found to be the best among the various generic methods compared in this paper, even when extreme events are included within a time series. Copyright © 2017 Elsevier Inc. All rights reserved.
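
    The wavelet de-noising half of such a hybrid can be sketched in a few lines with PyWavelets; the db4 wavelet, three decomposition levels, and the universal soft threshold are assumptions made for illustration, and the RSPA forecasting step is not shown.

      import numpy as np
      import pywt

      def wavelet_denoise(series, wavelet="db4", level=3):
          # Decompose, soft-threshold the detail coefficients, and reconstruct.
          coeffs = pywt.wavedec(series, wavelet, level=level)
          sigma = np.median(np.abs(coeffs[-1])) / 0.6745            # noise estimate, finest scale
          thresh = sigma * np.sqrt(2 * np.log(len(series)))         # universal threshold
          coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
          return pywt.waverec(coeffs, wavelet)[:len(series)]

      rng = np.random.default_rng(0)
      t = np.arange(365)
      flow = 50 + 20 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 5, t.size)   # toy daily series
      smooth = wavelet_denoise(flow)     # de-noised series handed on to the forecasting step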

  18. Optimizing detection and analysis of slow waves in sleep EEG.

    PubMed

    Mensen, Armand; Riedner, Brady; Tononi, Giulio

    2016-12-01

    Analysis of individual slow waves in EEG recordings during sleep provides both greater sensitivity and specificity compared to spectral power measures. However, parameters for detection and analysis have not been widely explored and validated. We present a new, open-source, Matlab-based toolbox for the automatic detection and analysis of slow waves, with adjustable parameter settings as well as manual correction and exploration of the results using a multi-faceted visualization tool. We explore a large search space of parameter settings for slow wave detection and measure their effects on a selection of outcome parameters. Every choice of parameter setting had some effect on at least one outcome parameter. In general, the largest effect sizes were found when choosing the EEG reference, type of canonical waveform, and amplitude thresholding. Previously published methods accurately detect large, global waves but are conservative and miss the detection of smaller amplitude, local slow waves. The toolbox has additional benefits in terms of speed, user interface, and visualization options to compare and contrast slow waves. The exploration of parameter settings in the toolbox highlights the importance of careful selection of detection methods. The sensitivity and specificity of the automated detection can be improved by manually adding or deleting entire waves and/or specific channels using the toolbox visualization functions. The toolbox standardizes the detection procedure, sets the stage for reliable results and comparisons, and is easy to use without previous programming experience. Copyright © 2016 Elsevier B.V. All rights reserved.
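
    The toolbox itself is Matlab code, but the core detection idea it parameterizes (delta-band filtering followed by amplitude thresholding of negative half-wave troughs) can be sketched in Python; the filter order, 0.5-4 Hz band, and 40 µV threshold below are assumptions for illustration, not the toolbox defaults.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_slow_wave_troughs(eeg_uv, fs, band=(0.5, 4.0), min_amp_uv=40.0):
    """Band-pass the signal in the delta range and mark troughs deeper than
    -min_amp_uv microvolts; returns trough sample indices and the filtered trace."""
    nyq = fs / 2.0
    b, a = butter(2, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, eeg_uv)
    # A trough deeper than -min_amp_uv appears as a peak of the inverted signal
    troughs, _ = find_peaks(-filtered, height=min_amp_uv)
    return troughs, filtered
```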

  19. Comparative Analysis of a Principal Component Analysis-Based and an Artificial Neural Network-Based Method for Baseline Removal.

    PubMed

    Carvajal, Roberto C; Arias, Luis E; Garces, Hugo O; Sbarbaro, Daniel G

    2016-04-01

    This work presents a non-parametric method based on principal component analysis (PCA) and a parametric one based on artificial neural networks (ANN) to remove continuous baseline features from spectra. The non-parametric method estimates the baseline from a set of sampled basis vectors obtained by applying PCA to a previously composed continuous-spectra learning matrix. The parametric method, in contrast, uses an ANN to filter out the baseline; previous studies have demonstrated that this approach is one of the most effective for baseline removal. The evaluation of both methods was carried out using a synthetic database designed for benchmarking baseline removal algorithms, containing 100 synthetic composed spectra at different signal-to-baseline ratios (SBR), signal-to-noise ratios (SNR), and baseline slopes. In addition, to demonstrate the utility of the proposed methods and to compare them in a real application, a spectral data set measured from a flame radiation process was used. Several performance metrics such as the correlation coefficient, chi-square value, and goodness-of-fit coefficient were calculated to quantify and compare both algorithms. Results demonstrate that the PCA-based method outperforms the ANN-based one in terms of both performance and simplicity. © The Author(s) 2016.
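
    A rough sketch of the non-parametric idea, assuming scikit-learn and a library of continuum-only training spectra (`baseline_library`, a hypothetical name); the published method samples basis vectors and fits them to the measured spectrum, so this simple projection-based version is only a simplification of that procedure.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_baseline_removal(spectrum, baseline_library, n_components=3):
    """Estimate the continuous baseline of a measured spectrum as its projection
    onto a low-dimensional basis learned from continuum-only spectra."""
    pca = PCA(n_components=n_components).fit(baseline_library)
    scores = pca.transform(spectrum.reshape(1, -1))
    baseline = pca.inverse_transform(scores).ravel()
    return spectrum - baseline, baseline
```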

  20. System and method for generating a deselect mapping for a focal plane array

    DOEpatents

    Bixler, Jay V; Brandt, Timothy G; Conger, James L; Lawson, Janice K

    2013-05-21

    A method for generating a deselect mapping for a focal plane array according to one embodiment includes gathering a data set for a focal plane array when exposed to light or radiation from a first known target; analyzing the data set for determining which pixels or subpixels of the focal plane array to add to a deselect mapping; adding the pixels or subpixels to the deselect mapping based on the analysis; and storing the deselect mapping. A method for gathering data using a focal plane array according to another embodiment includes deselecting pixels or subpixels based on a deselect mapping; gathering a data set using pixels or subpixels in a focal plane array that are not deselected upon exposure thereof to light or radiation from a target of interest; and outputting the data set.
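
    For illustration only (the patent text above is the authoritative description), a deselect mask could be derived from frames gathered while viewing a known target roughly as follows; the robust z-score criterion and 5-sigma cutoff are assumptions introduced here, not part of the patented method.

```python
import numpy as np

def build_deselect_map(frames, n_sigma=5.0):
    """Flag pixels whose mean response deviates strongly from the array median.

    frames: array of shape (n_frames, rows, cols) acquired while viewing a known target.
    Returns a boolean mask in which True marks a pixel to deselect.
    """
    mean_img = frames.mean(axis=0)
    med = np.median(mean_img)
    mad = np.median(np.abs(mean_img - med)) + 1e-12
    robust_z = 0.6745 * (mean_img - med) / mad
    return np.abs(robust_z) > n_sigma
```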

  1. A systematic evaluation of normalization methods in quantitative label-free proteomics.

    PubMed

    Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

    2018-01-01

    To date, mass spectrometry (MS) data remain inherently biased as a result of factors ranging from sample handling to differences caused by the instrumentation. Normalization is the process that aims to account for this bias and make samples more comparable. The selection of a proper normalization method is a pivotal task for the reliability of the downstream analysis and results. Many normalization methods commonly used in proteomics have been adapted from DNA microarray techniques. Previous studies comparing normalization methods in proteomics have focused mainly on intragroup variation. In this study, several popular and widely used normalization methods representing different normalization strategies are evaluated using three spike-in and one experimental mouse label-free proteomic data sets. The normalization methods are evaluated in terms of their ability to reduce variation between technical replicates, their effect on differential expression analysis, and their effect on the estimation of logarithmic fold changes. Additionally, we examined whether normalizing the whole data set globally or in segments for the differential expression analysis has an effect on the performance of the normalization methods. We found that variance stabilization normalization (Vsn) reduced variation between technical replicates the most in all examined data sets. Vsn also performed consistently well in the differential expression analysis. Linear regression normalization and local regression normalization also performed systematically well. Finally, we discuss the choice of a normalization method and some qualities of a suitable normalization method in light of the results of our evaluation. © The Author 2016. Published by Oxford University Press.
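
    Vsn itself combines an affine calibration with a variance-stabilizing arcsinh transform and is distributed as an R/Bioconductor package; as a much simpler stand-in, the sketch below median-centres log2 intensities per sample, which illustrates what a normalization step does but is not the method favoured in the evaluation above.

```python
import numpy as np

def median_normalize(log_intensities):
    """Median-center each sample (column) of a log2 intensity matrix so that
    all samples share a common median, ignoring missing values."""
    col_medians = np.nanmedian(log_intensities, axis=0)
    return log_intensities - col_medians + np.nanmedian(col_medians)
```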

  2. Automated Analysis of Renewable Energy Datasets ('EE/RE Data Mining')

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bush, Brian; Elmore, Ryan; Getman, Dan

    This poster illustrates methods to substantially improve the understanding of renewable energy data sets and the depth and efficiency of their analysis through the application of statistical learning methods ('data mining') in the intelligent processing of these often large and messy information sources. The six examples apply methods for anomaly detection, data cleansing, and pattern mining to time-series data (measurements from metering points in buildings) and spatiotemporal data (renewable energy resource datasets).

  3. Broadband Studies of Seismic Sources at Regional and Teleseismic Distances Using Advanced Time Series Analysis Methods. Volume 1.

    DTIC Science & Technology

    1991-03-21

    discussion of spectral factorability and motivations for broadband analysis, the report is subdivided into four main sections. In Section 1.0, we...estimates. The motivation for developing our multi-channel deconvolution method was to gain information about seismic sources, most notably, nuclear...with complex constraints for estimating the rupture history. Such methods (applied mostly to data sets that also include strong motion data), were

  4. The NASA/industry Design Analysis Methods for Vibrations (DAMVIBS) program: McDonnell-Douglas Helicopter Company achievements

    NASA Technical Reports Server (NTRS)

    Toossi, Mostafa; Weisenburger, Richard; Hashemi-Kia, Mostafa

    1993-01-01

    This paper presents a summary of some of the work performed by McDonnell Douglas Helicopter Company under the NASA Langley-sponsored rotorcraft structural dynamics program known as DAMVIBS (Design Analysis Methods for VIBrationS). A set of guidelines applicable to dynamic modeling, analysis, testing, and correlation of both helicopter airframes and a large variety of structural finite element models is presented. Utilization of these guidelines and the key features of their application to vibration modeling of helicopter airframes are discussed. Correlation studies with the test data, together with the development and application of a set of efficient finite element model checkout procedures, are demonstrated on a large helicopter airframe finite element model. Finally, the lessons learned and the benefits resulting from this program are summarized.

  5. Research on distributed heterogeneous data PCA algorithm based on cloud platform

    NASA Astrophysics Data System (ADS)

    Zhang, Jin; Huang, Gang

    2018-05-01

    Principal component analysis (PCA) of heterogeneous data sets can address the limited scalability of centralized data. In order to reduce the generation of intermediate data and error components of distributed heterogeneous data sets, a principal component analysis algorithm for heterogeneous data sets on a cloud platform is proposed. The algorithm performs eigenvalue processing using Householder tridiagonalization and QR factorization, and calculates the error component of the heterogeneous database associated with the public key to obtain the intermediate data set and the lost information. Experiments on distributed DBM heterogeneous datasets show that the method is feasible and reliable in terms of execution time and accuracy.

  6. The Identification of Software Failure Regions

    DTIC Science & Technology

    1990-06-01

    be used to detect non-obviously redundant test cases. A preliminary examination of the manual analysis method is performed with a set of programs ...failure regions are defined and a method of failure region analysis is described in detail. The thesis describes how this analysis may be used to detect...is the termination of the ability of a functional unit to perform its required function. (Glossary, 1983) The presence of faults in program code

  7. Stability analysis of a liquid fuel annular combustion chamber. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Mcdonald, G. H.

    1978-01-01

    High frequency combustion instability problems in a liquid fuel annular combustion chamber are examined. A modified Galerkin method was used to produce a set of modal amplitude equations from the general nonlinear partial differential acoustic wave equation in order to analyze the problem of instability. From these modal amplitude equations, the two variable perturbation method was used to develop a set of approximate equations of a given order of magnitude. These equations were modeled to show the effects of velocity sensitive combustion instabilities by evaluating the effects of certain parameters in the given set of equations.

  8. Psychotherapy and despair in the prison setting.

    PubMed

    Gee, Joanna; Loewenthal, Del; Cayne, Julia

    2015-01-01

    The purpose of this paper is to outline research which aimed to explore psychotherapists' experience of working with despair in the UK prison setting, through a qualitative phenomenological approach. Within the forensic psychological literature, despair is considered a pathology, associated with suicide and self-harm, resulting from the prisoners' histories and the coercive prison setting. In turn, therapeutic writings outline the importance of therapy with despair in the prison setting in providing coping skills, containment and learning opportunities for the prisoners involved. Within the study, ten psychotherapists were interviewed about their experience of working with clients in despair in the prison setting. The data were analysed via the phenomenological research method Empirical Phenomenological Analysis (EPA), and a secondary analysis through reverie. Through the analysis by EPA, despair emerged in the prison setting as a destabilising phenomenon for which there was no protocol for working. Participants also described the prisoners' despair and the despairing prison setting, touching on their own sense of vulnerability and despair. However, drawing on the secondary analysis by reverie, the researcher also became aware of how the phenomenon of despair emerged not simply through the said, but also through the intersubjective. It was therefore through the secondary analysis by reverie that the importance of attending to aspects of intersubjectivity in prison research emerged. This paper contributes to the therapeutic writings on despair in the prison setting, alongside holding implications for qualitative research in the prison setting.

  9. Decision Support Methods and Tools

    NASA Technical Reports Server (NTRS)

    Green, Lawrence L.; Alexandrov, Natalia M.; Brown, Sherilyn A.; Cerro, Jeffrey A.; Gumbert, Clyde R.; Sorokach, Michael R.; Burg, Cecile M.

    2006-01-01

    This paper is one of a set of papers, developed simultaneously and presented within a single conference session, that are intended to highlight systems analysis and design capabilities within the Systems Analysis and Concepts Directorate (SACD) of the National Aeronautics and Space Administration (NASA) Langley Research Center (LaRC). This paper focuses on the specific capabilities of uncertainty/risk analysis, quantification, propagation, decomposition, and management, robust/reliability design methods, and extensions of these capabilities into decision analysis methods within SACD. These disciplines are discussed together herein under the name of Decision Support Methods and Tools. Several examples are discussed which highlight the application of these methods within current or recent aerospace research at NASA LaRC. Where applicable, commercially available or government-developed software tools are also discussed.

  10. Lesion Border Detection in Dermoscopy Images

    PubMed Central

    Celebi, M. Emre; Schaefer, Gerald; Iyatomi, Hitoshi; Stoecker, William V.

    2009-01-01

    Background Dermoscopy is one of the major imaging modalities used in the diagnosis of melanoma and other pigmented skin lesions. Due to the difficulty and subjectivity of human interpretation, computerized analysis of dermoscopy images has become an important research area. One of the most important steps in dermoscopy image analysis is the automated detection of lesion borders. Methods In this article, we present a systematic overview of the recent border detection methods in the literature, paying particular attention to computational issues and evaluation aspects. Conclusion Common problems with the existing approaches include the acquisition, size, and diagnostic distribution of the test image set, the evaluation of the results, and the inadequate description of the employed methods. Border determination by dermatologists appears to depend upon higher-level knowledge; therefore, it is likely that the incorporation of domain knowledge in automated methods will enable them to perform better, especially in sets of images with a variety of diagnoses. PMID:19121917

  11. A method to eliminate the influence of incident light variations in spectral analysis

    NASA Astrophysics Data System (ADS)

    Luo, Yongshun; Li, Gang; Fu, Zhigang; Guan, Yang; Zhang, Shengzhao; Lin, Ling

    2018-06-01

    The intensity of the light source and consistency of the spectrum are the most important factors influencing the accuracy in quantitative spectrometric analysis. An efficient "measuring in layer" method was proposed in this paper to limit the influence of inconsistencies in the intensity and spectrum of the light source. In order to verify the effectiveness of this method, a light source with a variable intensity and spectrum was designed according to Planck's law and Wien's displacement law. Intralipid samples with 12 different concentrations were prepared and divided into modeling sets and prediction sets according to different incident lights and solution concentrations. The spectra of each sample were measured with five different light intensities. The experimental results showed that the proposed method was effective in eliminating the influence caused by incident light changes and was more effective than normalized processing.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Weizhou; Zhang, Yu; Sun, Tao

    High-level coupled cluster singles, doubles, and perturbative triples [CCSD(T)] computations with up to the aug-cc-pVQZ basis set (1924 basis functions) and various extrapolations toward the complete basis set (CBS) limit are presented for the sandwich, T-shaped, and parallel-displaced benzene⋯naphthalene complex. Using the CCSD(T)/CBS interaction energies as a benchmark, the performance of some newly developed wave function and density functional theory methods has been evaluated. The best performing methods were found to be the dispersion-corrected PBE0 functional (PBE0-D3) and spin-component scaled zeroth-order symmetry-adapted perturbation theory (SCS-SAPT0). The success of SCS-SAPT0 is very encouraging because it provides one method for energy component analysis of π-stacked complexes with 200 atoms or more. Most newly developed methods do, however, overestimate the interaction energies. The results of energy component analysis show that interaction energies are overestimated mainly due to the overestimation of dispersion energy.

  13. Advanced spectrophotometric chemometric methods for resolving the binary mixture of doxylamine succinate and pyridoxine hydrochloride.

    PubMed

    Katsarov, Plamen; Gergov, Georgi; Alin, Aylin; Pilicheva, Bissera; Al-Degs, Yahya; Simeonov, Vasil; Kassarova, Margarita

    2018-03-01

    The prediction power of partial least squares (PLS) and multivariate curve resolution-alternating least squares (MCR-ALS) methods has been studied for simultaneous quantitative analysis of the binary drug combination doxylamine succinate and pyridoxine hydrochloride. Analysis of first-order UV overlapped spectra was performed using different PLS models - classical PLS1 and PLS2 as well as partial robust M-regression (PRM). These linear models were compared to MCR-ALS with equality and correlation constraints (MCR-ALS-CC). All techniques operated within the full spectral region and extracted maximum information for the drugs analysed. The developed chemometric methods were validated on external sample sets and were applied to the analysis of pharmaceutical formulations. The obtained statistical parameters were satisfactory for both calibration and validation sets. All developed methods can be successfully applied for simultaneous spectrophotometric determination of doxylamine and pyridoxine both in laboratory-prepared mixtures and commercial dosage forms.
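
    A minimal PLS2 calibration sketch using scikit-learn, assuming `spectra` (mixtures x wavelengths) and `concentrations` (mixtures x 2 analytes) as hypothetical inputs; the number of latent variables is illustrative and would normally be chosen by cross-validation, and PRM and MCR-ALS-CC are not covered by this fragment.

```python
from sklearn.cross_decomposition import PLSRegression

def fit_pls2(spectra, concentrations, n_components=4):
    """PLS2 calibration: predict both analyte concentrations jointly from the
    full first-order UV spectral region."""
    model = PLSRegression(n_components=n_components)
    model.fit(spectra, concentrations)
    return model

# Usage (illustrative): model = fit_pls2(X_train, Y_train); Y_pred = model.predict(X_test)
```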

  14. Statistical Methods in Physical Oceanography: Proceedings of ’Aha Huliko’a Hawaiian Winter Workshop Held in Manoa, Hawaii on January 12-15, 1993

    DTIC Science & Technology

    1993-11-01

    field X(t) at time t. T_i is the set of all times when both p_i and p_j have been observed, and n_i is the number of elements in T_i. Definition Eq. (22) is...termed contour analysis, for melding of oceanic data and for space-time interpolation of gappy frontal data sets. The key elements of contour analysis...plane and let Ω(F) be the set of all straight lines intersecting F. Directly measuring the number of intersections between a random element W ∈ Ω(F) and

  15. Container-Based Clinical Solutions for Portable and Reproducible Image Analysis.

    PubMed

    Matelsky, Jordan; Kiar, Gregory; Johnson, Erik; Rivera, Corban; Toma, Michael; Gray-Roncal, William

    2018-05-08

    Medical imaging analysis depends on the reproducibility of complex computation. Linux containers enable the abstraction, installation, and configuration of environments so that software can be both distributed in self-contained images and used repeatably by tool consumers. While several initiatives in neuroimaging have adopted approaches for creating and sharing more reliable scientific methods and findings, Linux containers are not yet mainstream in clinical settings. We explore related technologies and their efficacy in this setting, highlight important shortcomings, demonstrate a simple use-case, and endorse the use of Linux containers for medical image analysis.

  16. Multipole moments in the effective fragment potential method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertoni, Colleen; Slipchenko, Lyudmila V.; Misquitta, Alston J.

    In the effective fragment potential (EFP) method the Coulomb potential is represented using a set of multipole moments generated by the distributed multipole analysis (DMA) method. Misquitta, Stone, and Fazeli recently developed a basis space-iterated stockholder atom (BS-ISA) method to generate multipole moments. This study assesses the accuracy of the EFP interaction energies using sets of multipole moments generated from the BS-ISA method, and from several versions of the DMA method (such as analytic and numeric grid-based), with varying basis sets. Both methods lead to reasonable results, although using certain implementations of the DMA method can result in large errors. With respect to the CCSD(T)/CBS interaction energies, the mean unsigned error (MUE) of the EFP method for the S22 data set using BS-ISA–generated multipole moments and DMA-generated multipole moments (using a small basis set and the analytic DMA procedure) is 0.78 and 0.72 kcal/mol, respectively. Here, the MUE accuracy is on the same order as MP2 and SCS-MP2. The MUEs are lower than in a previous study benchmarking the EFP method without the EFP charge transfer term, demonstrating that the charge transfer term increases the accuracy of the EFP method. Regardless of the multipole moment method used, it is likely that much of the error is due to an insufficient short-range electrostatic term (i.e., charge penetration term), as shown by comparisons with symmetry-adapted perturbation theory.

  17. Multipole moments in the effective fragment potential method

    DOE PAGES

    Bertoni, Colleen; Slipchenko, Lyudmila V.; Misquitta, Alston J.; ...

    2017-02-17

    In the effective fragment potential (EFP) method the Coulomb potential is represented using a set of multipole moments generated by the distributed multipole analysis (DMA) method. Misquitta, Stone, and Fazeli recently developed a basis space-iterated stockholder atom (BS-ISA) method to generate multipole moments. This study assesses the accuracy of the EFP interaction energies using sets of multipole moments generated from the BS-ISA method, and from several versions of the DMA method (such as analytic and numeric grid-based), with varying basis sets. Both methods lead to reasonable results, although using certain implementations of the DMA method can result in large errors. With respect to the CCSD(T)/CBS interaction energies, the mean unsigned error (MUE) of the EFP method for the S22 data set using BS-ISA–generated multipole moments and DMA-generated multipole moments (using a small basis set and the analytic DMA procedure) is 0.78 and 0.72 kcal/mol, respectively. Here, the MUE accuracy is on the same order as MP2 and SCS-MP2. The MUEs are lower than in a previous study benchmarking the EFP method without the EFP charge transfer term, demonstrating that the charge transfer term increases the accuracy of the EFP method. Regardless of the multipole moment method used, it is likely that much of the error is due to an insufficient short-range electrostatic term (i.e., charge penetration term), as shown by comparisons with symmetry-adapted perturbation theory.

  18. Eigenvalue and eigenvector sensitivity and approximate analysis for repeated eigenvalue problems

    NASA Technical Reports Server (NTRS)

    Hou, Gene J. W.; Kenny, Sean P.

    1991-01-01

    A set of computationally efficient equations for eigenvalue and eigenvector sensitivity analysis is derived, and a method for eigenvalue and eigenvector approximate analysis in the presence of repeated eigenvalues is presented. The method developed for approximate analysis involves a reparameterization of the multivariable structural eigenvalue problem in terms of a single positive-valued parameter. The resulting equations yield first-order approximations of changes in both the eigenvalues and eigenvectors associated with the repeated eigenvalue problem. Examples are given to demonstrate the application of such equations for sensitivity and approximate analysis.
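
    For the simple case of distinct eigenvalues of a symmetric matrix, the familiar first-order result dλ_i ≈ v_iᵀ dA v_i can be sketched as below; the repeated-eigenvalue case, which is the subject of the paper, requires the reparameterized subspace treatment it derives and is not handled by this fragment.

```python
import numpy as np

def eigenvalue_sensitivities(A, dA):
    """First-order eigenvalue changes of a symmetric matrix A under a symmetric
    perturbation dA, valid only when the eigenvalues of A are distinct."""
    _, V = np.linalg.eigh(A)                 # orthonormal eigenvectors as columns
    return np.array([V[:, i] @ dA @ V[:, i] for i in range(V.shape[1])])
```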

  19. Picture of All Solutions of Successive 2-Block Maxbet Problems

    ERIC Educational Resources Information Center

    Choulakian, Vartan

    2011-01-01

    The Maxbet method is a generalized principal components analysis of a data set, where the group structure of the variables is taken into account. Similarly, 3-block[12,13] partial Maxdiff method is a generalization of covariance analysis, where only the covariances between blocks (1, 2) and (1, 3) are taken into account. The aim of this paper is…

  20. The error and bias of supplementing a short, arid climate, rainfall record with regional vs. global frequency analysis

    NASA Astrophysics Data System (ADS)

    Endreny, Theodore A.; Pashiardis, Stelios

    2007-02-01

    Robust and accurate estimates of rainfall frequencies are difficult to make with short, arid-climate rainfall records; however, new regional and global methods were used to supplement such a constrained 15-34 yr record in Cyprus. The impact of supplementing rainfall frequency analysis with the regional and global approaches was measured with relative bias and root mean square error (RMSE) values. The analysis considered 42 stations with 8 time intervals (5-360 min) in four regions delineated by proximity to sea and elevation. Regional statistical algorithms found the sites passed discordancy tests of coefficient of variation, skewness and kurtosis, while heterogeneity tests revealed the regions were homogeneous to mildly heterogeneous. Rainfall depths were simulated in the regional analysis method 500 times, and goodness-of-fit tests then identified the best candidate distribution as the generalized extreme value (GEV) Type II. In the regional analysis, the method of L-moments was used to estimate location, shape, and scale parameters. In the global analysis, the distribution was prescribed a priori as GEV Type II, the shape parameter was set a priori to 0.15, and a time interval term was constructed to use one set of parameters for all time intervals. Relative RMSE values were approximately equal at 10% for the regional and global methods when regions were compared, but when time intervals were compared the global method RMSE had a parabolic-shaped time interval trend. Relative bias values were also approximately equal for both methods when regions were compared, but again a parabolic-shaped time interval trend was found for the global method. The global method relative RMSE and bias trended with time interval, which may be caused by fitting a single scale value for all time intervals.
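
    A hedged sketch of the frequency-analysis step using SciPy: fit a GEV to annual maxima and read off a return level. SciPy fits by maximum likelihood rather than L-moments, and its shape parameter `c` follows the opposite sign convention to the usual ξ, so the fixed value shown is purely illustrative and not a reproduction of the study's parameterization.

```python
from scipy.stats import genextreme

def rainfall_return_level(annual_maxima, return_period_years, shape=None):
    """Fit a GEV to annual rainfall maxima and return the depth exceeded on
    average once per `return_period_years`. If `shape` is given, it is held
    fixed (note SciPy's c = -xi sign convention); otherwise it is estimated."""
    if shape is None:
        c, loc, scale = genextreme.fit(annual_maxima)
    else:
        c, loc, scale = genextreme.fit(annual_maxima, f0=shape)
    p_non_exceed = 1.0 - 1.0 / return_period_years
    return genextreme.ppf(p_non_exceed, c, loc=loc, scale=scale)
```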

  1. A statistically harmonized alignment-classification in image space enables accurate and robust alignment of noisy images in single particle analysis.

    PubMed

    Kawata, Masaaki; Sato, Chikara

    2007-06-01

    In determining the three-dimensional (3D) structure of macromolecular assemblies in single particle analysis, a large representative dataset of two-dimensional (2D) average images from a huge number of raw images is key to high resolution. Because alignments prior to averaging are computationally intensive, currently available multireference alignment (MRA) software does not survey every possible alignment. This leads to misaligned images, creating blurred averages and reducing the quality of the final 3D reconstruction. We present a new method in which multireference alignment is harmonized with classification (multireference multiple alignment: MRMA). This method enables a statistical comparison of multiple alignment peaks, reflecting the similarities between each raw image and a set of reference images. Among the selected alignment candidates for each raw image, misaligned images are statistically excluded, based on the principle that aligned raw images of similar projections have a dense distribution around the correctly aligned coordinates in image space. This newly developed method was examined for accuracy and speed using model image sets with various signal-to-noise ratios, and with electron microscope images of the Transient Receptor Potential C3 and the sodium channel. In every data set, the newly developed method outperformed conventional methods in robustness against noise and in speed, creating 2D average images of higher quality. This statistically harmonized alignment-classification combination should greatly improve the quality of single particle analysis.

  2. GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge.

    PubMed

    Wagner, Florian

    2015-01-01

    Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.

  3. Cluster analysis as a prediction tool for pregnancy outcomes.

    PubMed

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering the specific physiological changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen in order to determine the possibility of classifying pregnant women in early pregnancy and to analyze unknown correlations between different variables so that certain outcomes could be predicted. 222 pregnant women from two general obstetric offices were recruited. The main focus was on the characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering babies of higher birth weight but gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and a higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women in early pregnancy to predict certain outcomes.
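
    A generic clustering sketch in the spirit of the study, using K-means as a stand-in (the paper does not necessarily use this exact algorithm or these settings); the feature values below are invented purely to make the fragment runnable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per pregnant woman with
# [age, pre-pregnancy BMI, haemoglobin]; values are illustrative only.
X = np.array([
    [24, 21.5, 128],
    [31, 27.8, 118],
    [36, 30.2, 112],
    [28, 22.0, 131],
])

# Standardize the three identifying variables, then form three groups
X_std = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)
print(labels)
```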

  4. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics.

    PubMed

    Dutheil, Julien; Gaillard, Sylvain; Bazin, Eric; Glémin, Sylvain; Ranwez, Vincent; Galtier, Nicolas; Belkhir, Khalid

    2006-04-04

    A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. We present Bio++, a set of Object Oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, Markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus. The implementation of methods aims at being both efficient and user-friendly. Special concern was given to the library design to enable easy extension and new method development. We defined a general hierarchy of classes that allows developers to implement their own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website http://kimura.univ-montp2.fr/BioPP.

  5. Two-dimensional wavelet analysis based classification of gas chromatogram differential mobility spectrometry signals.

    PubMed

    Zhao, Weixiang; Sankaran, Shankar; Ibáñez, Ana M; Dandekar, Abhaya M; Davis, Cristina E

    2009-08-04

    This study introduces two-dimensional (2-D) wavelet analysis to the classification of gas chromatogram differential mobility spectrometry (GC/DMS) data, which are composed of retention time, compensation voltage, and corresponding intensities. One reported method to process such large data sets is to convert 2-D signals to 1-D signals by summing intensities across either retention time or compensation voltage, but this can lose important signal information in one data dimension. A 2-D wavelet analysis approach keeps the 2-D structure of the original signals while significantly reducing data size. We applied this feature extraction method to 2-D GC/DMS signals measured from control and disordered fruit and then employed two typical classification algorithms to test the effects of the resultant features on chemical pattern recognition. Yielding 93.3% accuracy in separating data from control and disordered fruit samples, 2-D wavelet analysis not only proves its feasibility for extracting features from the original 2-D signals but also shows its superiority over conventional feature extraction methods, including converting 2-D to 1-D and selecting distinguishable pixels from the training set. Furthermore, this process does not require coupling with specific pattern recognition methods, which may help ensure wide applications of this method to 2-D spectrometry data.
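
    A minimal sketch of 2-D wavelet feature extraction using PyWavelets; the wavelet family and decomposition level are assumptions, and a real pipeline would pass the resulting feature vectors to a classifier such as the ones compared in the study.

```python
import pywt

def wavelet2d_features(dms_map, wavelet="db2", level=2):
    """Compress a 2-D GC/DMS intensity map (retention time x compensation voltage)
    into a small feature vector. The coarsest approximation coefficients retain
    the 2-D structure while greatly reducing the data size before classification."""
    coeffs = pywt.wavedec2(dms_map, wavelet, level=level)
    approx = coeffs[0]          # coarse-scale approximation sub-image
    return approx.ravel()
```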

  6. A method for the automated detection of phishing websites through both site characteristics and image analysis

    NASA Astrophysics Data System (ADS)

    White, Joshua S.; Matthews, Jeanna N.; Stacy, John L.

    2012-06-01

    Phishing website analysis is largely still a time-consuming manual process of discovering potential phishing sites, verifying if suspicious sites truly are malicious spoofs and, if so, distributing their URLs to the appropriate blacklisting services. Attackers increasingly use sophisticated systems for bringing phishing sites up and down rapidly at new locations, making automated response essential. In this paper, we present a method for rapid, automated detection and analysis of phishing websites. Our method relies on near real-time gathering and analysis of URLs posted on social media sites. We fetch the pages pointed to by each URL and characterize each page with a set of easily computed values such as number of images and links. We also capture a screen-shot of the rendered page image, compute a hash of the image and use the Hamming distance between these image hashes as a form of visual comparison. We provide initial results that demonstrate the feasibility of our techniques by comparing legitimate sites to known fraudulent versions from Phishtank.com, by actively introducing a series of minor changes to a phishing toolkit captured in a local honeypot, and by performing some initial analysis on a set of over 2.8 million URLs posted to Twitter over 4 days in August 2011. We discuss the issues encountered during our testing, such as the resolvability and legitimacy of URLs posted on Twitter, the data sets used, the characteristics of the phishing sites we discovered, and our plans for future work.
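
    The screenshot-hash comparison can be sketched with a simple average hash and a Hamming distance, assuming Pillow and NumPy; the hash size and the exact hashing scheme used by the authors are not specified here, so this is an illustrative stand-in rather than their implementation.

```python
import numpy as np
from PIL import Image

def average_hash(screenshot_path, hash_size=8):
    """Perceptual average hash of a rendered-page screenshot: downscale to a
    tiny grayscale image and threshold each pixel against the mean."""
    img = Image.open(screenshot_path).convert("L").resize((hash_size, hash_size))
    pixels = np.asarray(img, dtype=float)
    return (pixels > pixels.mean()).ravel()

def hamming_distance(hash_a, hash_b):
    """Number of differing bits; small distances suggest visually similar pages."""
    return int(np.count_nonzero(hash_a != hash_b))
```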

  7. Trend analysis of salt load and evaluation of the frequency of water-quality measurements for the Gunnison, the Colorado, and the Dolores rivers in Colorado and Utah

    USGS Publications Warehouse

    Kircher, J.E.; Dinicola, Richard S.; Middelburg, R.F.

    1984-01-01

    Monthly values were computed for water-quality constituents at four streamflow gaging stations in the Upper Colorado River basin for the determination of trends. Seasonal regression and seasonal Kendall trend analysis techniques were applied to two monthly data sets at each station site for four different time periods. A recently developed method for determining optimal water-discharge data-collection frequency was also applied to the monthly water-quality data. Trend analysis results varied with each monthly load computational method, period of record, and trend detection model used. No conclusions could be reached regarding which computational method was best to use in trend analysis. Time-period selection for analysis was found to be important with regard to intended use of the results. Seasonal Kendall procedures were found to be applicable to most data sets. Seasonal regression models were more difficult to apply and were sometimes of questionable validity; however, those results were more informative than seasonal Kendall results. The best model to use depends upon the characteristics of the data and the amount of trend information needed. The measurement-frequency optimization method had potential for application to water-quality data, but refinements are needed. (USGS)

  8. Removing cosmic spikes using a hyperspectral upper-bound spectrum method

    DOE PAGES

    Anthony, Stephen Michael; Timlin, Jerilyn A.

    2016-11-04

    Cosmic ray spikes are especially problematic for hyperspectral imaging because of the large number of spikes often present and their negative effects upon subsequent chemometric analysis. Fortunately, while the large number of spectra acquired in a hyperspectral imaging data set increases the probability and number of cosmic spikes observed, the multitude of spectra can also aid in the effective recognition and removal of the cosmic spikes. Zhang and Ben-Amotz were perhaps the first to leverage the additional spatial dimension of hyperspectral data matrices (DM). They integrated principal component analysis (PCA) into the upper bound spectrum method (UBS), resulting in a hybrid method (UBS-DM) for hyperspectral images. Here, we expand upon their use of PCA, recognizing that principal components primarily present in only a few pixels most likely correspond to cosmic spikes. Eliminating the contribution of those principal components in those pixels improves the cosmic spike removal. Both simulated and experimental hyperspectral Raman image data sets are used to test the newly developed UBS-DM-hyperspectral (UBS-DM-HS) method which extends the UBS-DM method by leveraging characteristics of hyperspectral data sets. As a result, a comparison is provided between the performance of the UBS-DM-HS method and other methods suitable for despiking hyperspectral images, evaluating both their ability to remove cosmic ray spikes and the extent to which they introduce spectral bias.

  9. Removing cosmic spikes using a hyperspectral upper-bound spectrum method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anthony, Stephen Michael; Timlin, Jerilyn A.

    Cosmic ray spikes are especially problematic for hyperspectral imaging because of the large number of spikes often present and their negative effects upon subsequent chemometric analysis. Fortunately, while the large number of spectra acquired in a hyperspectral imaging data set increases the probability and number of cosmic spikes observed, the multitude of spectra can also aid in the effective recognition and removal of the cosmic spikes. Zhang and Ben-Amotz were perhaps the first to leverage the additional spatial dimension of hyperspectral data matrices (DM). They integrated principal component analysis (PCA) into the upper bound spectrum method (UBS), resulting in a hybrid method (UBS-DM) for hyperspectral images. Here, we expand upon their use of PCA, recognizing that principal components primarily present in only a few pixels most likely correspond to cosmic spikes. Eliminating the contribution of those principal components in those pixels improves the cosmic spike removal. Both simulated and experimental hyperspectral Raman image data sets are used to test the newly developed UBS-DM-hyperspectral (UBS-DM-HS) method which extends the UBS-DM method by leveraging characteristics of hyperspectral data sets. As a result, a comparison is provided between the performance of the UBS-DM-HS method and other methods suitable for despiking hyperspectral images, evaluating both their ability to remove cosmic ray spikes and the extent to which they introduce spectral bias.

  10. Removing Cosmic Spikes Using a Hyperspectral Upper-Bound Spectrum Method.

    PubMed

    Anthony, Stephen M; Timlin, Jerilyn A

    2017-03-01

    Cosmic ray spikes are especially problematic for hyperspectral imaging because of the large number of spikes often present and their negative effects upon subsequent chemometric analysis. Fortunately, while the large number of spectra acquired in a hyperspectral imaging data set increases the probability and number of cosmic spikes observed, the multitude of spectra can also aid in the effective recognition and removal of the cosmic spikes. Zhang and Ben-Amotz were perhaps the first to leverage the additional spatial dimension of hyperspectral data matrices (DM). They integrated principal component analysis (PCA) into the upper bound spectrum method (UBS), resulting in a hybrid method (UBS-DM) for hyperspectral images. Here, we expand upon their use of PCA, recognizing that principal components primarily present in only a few pixels most likely correspond to cosmic spikes. Eliminating the contribution of those principal components in those pixels improves the cosmic spike removal. Both simulated and experimental hyperspectral Raman image data sets are used to test the newly developed UBS-DM-hyperspectral (UBS-DM-HS) method which extends the UBS-DM method by leveraging characteristics of hyperspectral data sets. A comparison is provided between the performance of the UBS-DM-HS method and other methods suitable for despiking hyperspectral images, evaluating both their ability to remove cosmic ray spikes and the extent to which they introduce spectral bias.

  11. Between-centre variability in transfer function analysis, a widely used method for linear quantification of the dynamic pressure–flow relation: The CARNet study

    PubMed Central

    Meel-van den Abeelen, Aisha S.S.; Simpson, David M.; Wang, Lotte J.Y.; Slump, Cornelis H.; Zhang, Rong; Tarumi, Takashi; Rickards, Caroline A.; Payne, Stephen; Mitsis, Georgios D.; Kostoglou, Kyriaki; Marmarelis, Vasilis; Shin, Dae; Tzeng, Yu-Chieh; Ainslie, Philip N.; Gommer, Erik; Müller, Martin; Dorado, Alexander C.; Smielewski, Peter; Yelicich, Bernardo; Puppo, Corina; Liu, Xiuyun; Czosnyka, Marek; Wang, Cheng-Yen; Novak, Vera; Panerai, Ronney B.; Claassen, Jurgen A.H.R.

    2014-01-01

    Transfer function analysis (TFA) is a frequently used method to assess dynamic cerebral autoregulation (CA) using spontaneous oscillations in blood pressure (BP) and cerebral blood flow velocity (CBFV). However, controversies and variations exist in how research groups utilise TFA, causing high variability in interpretation. The objective of this study was to evaluate between-centre variability in TFA outcome metrics. 15 centres analysed the same 70 BP and CBFV datasets from healthy subjects (n = 50 rest; n = 20 during hypercapnia); 10 additional datasets were computer-generated. Each centre used their in-house TFA methods; however, certain parameters were specified to reduce a priori between-centre variability. Hypercapnia was used to assess discriminatory performance and synthetic data to evaluate effects of parameter settings. Results were analysed using the Mann–Whitney test and logistic regression. A large non-homogeneous variation was found in TFA outcome metrics between the centres. Logistic regression demonstrated that 11 centres were able to distinguish between normal and impaired CA with an AUC > 0.85. Further analysis identified TFA settings that are associated with large variation in outcome measures. These results indicate the need for standardisation of TFA settings in order to reduce between-centre variability and to allow accurate comparison between studies. Suggestions on optimal signal processing methods are proposed. PMID:24725709
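
    A rough TFA sketch with SciPy, estimating gain, phase, and coherence between beat-to-beat blood pressure and CBFV; the window length and other settings below are illustrative assumptions, which is precisely the kind of parameter choice the study shows must be standardized across centres.

```python
import numpy as np
from scipy.signal import csd, welch, coherence

def transfer_function(bp, cbfv, fs, nperseg=1024):
    """Estimate TFA gain, phase (degrees) and coherence between BP and CBFV."""
    f, p_bp = welch(bp, fs=fs, nperseg=nperseg)        # BP auto-spectrum
    _, p_cross = csd(bp, cbfv, fs=fs, nperseg=nperseg)  # cross-spectrum
    _, coh = coherence(bp, cbfv, fs=fs, nperseg=nperseg)
    gain = np.abs(p_cross) / p_bp
    phase = np.angle(p_cross, deg=True)
    return f, gain, phase, coh
```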

  12. Bias-Free Chemically Diverse Test Sets from Machine Learning.

    PubMed

    Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S

    2017-08-14

    Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.

  13. Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification

    NASA Astrophysics Data System (ADS)

    Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng

    2013-10-01

    A quick, accurate and reliable technique for discrimination of cocoa beans according to geographical origin is essential for quality control and traceability management. This study presents the application of near infrared spectroscopy and multivariate classification for the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data and gave visible cluster trends. The performance of four multivariate classification methods was compared: linear discriminant analysis (LDA), K-nearest neighbors (KNN), back propagation artificial neural network (BPANN) and support vector machine (SVM). The performance of the models was optimized by cross validation. The results revealed that the SVM model was superior to all the other methods, with a discrimination rate of 100% in both the training and prediction sets after preprocessing with mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for the prediction set, while the LDA model had 96.15% and 90.63% for the training and prediction sets, respectively. The KNN model had 75.01% for the training set and 72.31% for the prediction set. The non-linear classification methods were superior to the linear ones. Overall, the results revealed that NIR spectroscopy coupled with an SVM model can successfully discriminate cocoa beans according to their geographical origins for effective quality assurance.
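
    A hedged scikit-learn sketch of the best-performing pipeline (mean centering followed by an SVM); the kernel and hyperparameters are assumptions rather than the values used in the study, and `X`/`y` stand for the NIR spectra and region labels.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def region_classifier():
    """Mean-centre each wavelength (StandardScaler without scaling approximates MC
    preprocessing) and classify growing region with an RBF-kernel SVM."""
    return make_pipeline(StandardScaler(with_std=False), SVC(kernel="rbf", C=10, gamma="scale"))

# Usage (illustrative): sklearn.model_selection.cross_val_score(region_classifier(), X, y, cv=5)
```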

  14. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data.

    PubMed

    Hettne, Kristina M; Boorsma, André; van Dartel, Dorien A M; Goeman, Jelle J; de Jong, Esther; Piersma, Aldert H; Stierum, Rob H; Kleinjans, Jos C; Kors, Jan A

    2013-01-29

    Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored as significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants showed a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.

  15. Methodology and software to detect viral integration site hot-spots

    PubMed Central

    2011-01-01

    Background Modern gene therapy methods have limited control over where a therapeutic viral vector inserts into the host genome. Vector integration can activate local gene expression, which can cause cancer if the vector inserts near an oncogene. Viral integration hot-spots or 'common insertion sites' (CIS) are scrutinized to evaluate and predict patient safety. CIS are typically defined by a minimum density of insertions (such as 2-4 within a 30-100 kb region), which unfortunately depends on the total number of observed VIS. This is problematic for comparing hot-spot distributions across data sets and patients, where the VIS numbers may vary. Results We develop two new methods for defining hot-spots that are relatively independent of data set size. Both methods operate on distributions of VIS across consecutive 1 Mb 'bins' of the genome. The first method 'z-threshold' tallies the number of VIS per bin, converts these counts to z-scores, and applies a threshold to define high density bins. The second method 'BCP' applies a Bayesian change-point model to the z-scores to define hot-spots. The novel hot-spot methods are compared with a conventional CIS method using simulated data sets and data sets from five published human studies, including the X-linked ALD (adrenoleukodystrophy), CGD (chronic granulomatous disease) and SCID-X1 (X-linked severe combined immunodeficiency) trials. The BCP analysis of the human X-linked ALD data for two patients separately (774 and 1627 VIS) and combined (2401 VIS) resulted in 5-6 hot-spots covering 0.17-0.251% of the genome and containing 5.56-7.74% of the total VIS. In comparison, the CIS analysis resulted in 12-110 hot-spots covering 0.018-0.246% of the genome and containing 5.81-22.7% of the VIS, corresponding to a greater number of hot-spots as the data set size increased. Our hot-spot methods enable one to evaluate the extent of VIS clustering, and formally compare data sets in terms of hot-spot overlap. Finally, we show that the BCP hot-spots from the repopulating samples coincide with greater gene and CpG island density than the median genome density. Conclusions The z-threshold and BCP methods are useful for comparing hot-spot patterns across data sets of disparate sizes. The methodology and software provided here should enable one to study hot-spot conservation across a variety of VIS data sets and evaluate vector safety for gene therapy trials. PMID:21914224
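
    The 'z-threshold' procedure as described above is straightforward to sketch: tally VIS per 1 Mb bin, convert the counts to z-scores, and flag bins above a cutoff. The bin size mirrors the description, while the cutoff value below is an assumption for illustration.

```python
import numpy as np

def zscore_hotspots(vis_positions, chrom_length, bin_size=1_000_000, z_cutoff=3.0):
    """Return the indices of 1 Mb bins whose VIS counts exceed the z-score cutoff,
    together with the per-bin counts."""
    n_bins = int(np.ceil(chrom_length / bin_size))
    counts, _ = np.histogram(vis_positions, bins=n_bins, range=(0, n_bins * bin_size))
    z = (counts - counts.mean()) / (counts.std() + 1e-12)
    return np.flatnonzero(z > z_cutoff), counts
```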

  16. Social network analysis identified central outcomes for core outcome sets using systematic reviews of HIV/AIDS.

    PubMed

    Saldanha, Ian J; Li, Tianjing; Yang, Cui; Ugarte-Gil, Cesar; Rutherford, George W; Dickersin, Kay

    2016-02-01

    Methods to develop core outcome sets, the minimum outcomes that should be measured in research in a topic area, vary. We applied social network analysis methods to understand outcome co-occurrence patterns in human immunodeficiency virus (HIV)/AIDS systematic reviews and identify outcomes central to the network of outcomes in HIV/AIDS. We examined all Cochrane reviews of HIV/AIDS as of June 2013. We defined a tie as two outcomes (nodes) co-occurring in ≥2 reviews. To identify central outcomes, we used normalized node betweenness centrality (nNBC) (the extent to which connections between other outcomes in a network rely on that outcome as an intermediary). We conducted a subgroup analysis by HIV/AIDS intervention type (i.e., clinical management, biomedical prevention, behavioral prevention, and health services). The 140 included reviews examined 1,140 outcomes, 294 of which were unique. The most central outcome overall was all-cause mortality (nNBC = 23.9). The most central and most frequent outcomes differed overall and within subgroups. For example, "adverse events (specified)" was among the most central but not among the most frequent outcomes, overall. Social network analysis methods are a novel application to identify central outcomes, which provides additional information potentially useful for developing core outcome sets. Copyright © 2016 Elsevier Inc. All rights reserved.
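
    A small illustration of the centrality computation with NetworkX; the outcome names and co-occurrence edges below are invented for the example and are not taken from the review data.

```python
import networkx as nx

# Outcomes are nodes; an edge links two outcomes that co-occur in >= 2 reviews.
G = nx.Graph()
G.add_edges_from([
    ("all-cause mortality", "viral suppression"),
    ("all-cause mortality", "adverse events (specified)"),
    ("viral suppression", "CD4 count"),
    ("adverse events (specified)", "treatment adherence"),
])

# Normalized betweenness centrality (nNBC): the extent to which connections
# between other outcomes rely on a given outcome as an intermediary.
nnbc = nx.betweenness_centrality(G, normalized=True)
print(sorted(nnbc.items(), key=lambda kv: kv[1], reverse=True))
```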

  17. A self-organizing Lagrangian particle method for adaptive-resolution advection-diffusion simulations

    NASA Astrophysics Data System (ADS)

    Reboux, Sylvain; Schrader, Birte; Sbalzarini, Ivo F.

    2012-05-01

    We present a novel adaptive-resolution particle method for continuous parabolic problems. In this method, particles self-organize in order to adapt to local resolution requirements. This is achieved by pseudo forces that are designed so as to guarantee that the solution is always well sampled and that no holes or clusters develop in the particle distribution. The particle sizes are locally adapted to the length scale of the solution. Differential operators are consistently evaluated on the evolving set of irregularly distributed particles of varying sizes using discretization-corrected operators. The method does not rely on any global transforms or mapping functions. After presenting the method and its error analysis, we demonstrate its capabilities and limitations on a set of two- and three-dimensional benchmark problems. These include advection-diffusion, the Burgers equation, the Buckley-Leverett five-spot problem, and curvature-driven level-set surface refinement.

  18. Brain medical image diagnosis based on corners with importance-values.

    PubMed

    Gao, Linlin; Pan, Haiwei; Li, Qing; Xie, Xiaoqin; Zhang, Zhiqiang; Han, Jinming; Zhai, Xiao

    2017-11-21

    Brain disorders are one of the top causes of human death. Generally, neurologists analyze brain medical images for diagnosis. In the image analysis field, corners are one of the most important features, which makes corner detection and matching studies essential. However, existing corner detection studies do not consider the domain information of the brain. This leads to many useless corners and the loss of significant information. Regarding corner matching, the uncertainty and structure of the brain are not employed in existing methods. Moreover, most corner matching studies are used for 3D image registration. They are inapplicable for 2D brain image diagnosis because of the different mechanisms. To address these problems, we propose a novel corner-based brain medical image classification method. Specifically, we automatically extract multilayer texture images (MTIs) which embody diagnostic information from neurologists. Moreover, we present a corner matching method utilizing the uncertainty and structure of brain medical images and a bipartite graph model. Finally, we propose a similarity calculation method for diagnosis. Brain CT and MRI image sets are utilized to evaluate the proposed method. First, classifiers are trained in N-fold cross-validation analysis to produce the best θ and K. Then independent brain image sets are tested to evaluate the classifiers. Moreover, the classifiers are also compared with advanced brain image classification studies. For the brain CT image set, the proposed classifier outperforms the comparison methods by at least 8% on accuracy and 2.4% on F1-score. Regarding the brain MRI image set, the proposed classifier is superior to the comparison methods by more than 7.3% on accuracy and 4.9% on F1-score. Results also demonstrate that the proposed method is robust to different intensity ranges of brain medical images. In this study, we develop a robust corner-based brain medical image classifier. Specifically, we propose a corner detection method utilizing the diagnostic information from neurologists and a corner matching method based on the uncertainty and structure of brain medical images. Additionally, we present a similarity calculation method for brain image classification. Experimental results on two brain image sets show the proposed corner-based brain medical image classifier outperforms the state-of-the-art studies.

  19. Fiber estimation and tractography in diffusion MRI: Development of simulated brain images and comparison of multi-fiber analysis methods at clinical b-values

    PubMed Central

    Wilkins, Bryce; Lee, Namgyun; Gajawelli, Niharika; Law, Meng; Leporé, Natasha

    2015-01-01

    Advances in diffusion-weighted magnetic resonance imaging (DW-MRI) have led to many alternative diffusion sampling strategies and analysis methodologies. A common objective among methods is estimation of white matter fiber orientations within each voxel, as doing so permits in-vivo fiber-tracking and the ability to study brain connectivity and networks. Knowledge of how DW-MRI sampling schemes affect fiber estimation accuracy, and consequently tractography and the ability to recover complex white-matter pathways, as well as differences between results due to choice of analysis method and which method(s) perform optimally for specific data sets, all remain important problems, especially as tractography-based studies become common. In this work we begin to address these concerns by developing sets of simulated diffusion-weighted brain images which we then use to quantitatively evaluate the performance of six DW-MRI analysis methods in terms of estimated fiber orientation accuracy, false-positive (spurious) and false-negative (missing) fiber rates, and fiber-tracking. The analysis methods studied are: 1) a two-compartment “ball and stick” model (BSM) (Behrens et al., 2003); 2) a non-negativity constrained spherical deconvolution (CSD) approach (Tournier et al., 2007); 3) analytical q-ball imaging (QBI) (Descoteaux et al., 2007); 4) q-ball imaging with Funk-Radon and Cosine Transform (FRACT) (Haldar and Leahy, 2013); 5) q-ball imaging within constant solid angle (CSA) (Aganj et al., 2010); and 6) a generalized Fourier transform approach known as generalized q-sampling imaging (GQI) (Yeh et al., 2010). We investigate these methods using 20, 30, 40, 60, 90 and 120 evenly distributed q-space samples of a single shell, and focus on a signal-to-noise ratio (SNR = 18) and diffusion-weighting (b = 1000 s/mm2) common to clinical studies. We found the BSM and CSD methods consistently yielded the least fiber orientation error and simultaneously greatest detection rate of fibers. Fiber detection rate was found to be the most distinguishing characteristic between the methods, and a significant factor for complete recovery of tractography through complex white-matter pathways. For example, while all methods recovered similar tractography of prominent white matter pathways of limited fiber crossing, CSD (which had the highest fiber detection rate, especially for voxels containing three fibers) recovered the greatest number of fibers and largest fraction of correct tractography for a complex three-fiber crossing region. The synthetic data sets, ground-truth, and tools for quantitative evaluation are publically available on the NITRC website as the project “Simulated DW-MRI Brain Data Sets for Quantitative Evaluation of Estimated Fiber Orientations” at http://www.nitrc.org/projects/sim_dwi_brain PMID:25555998

  20. Fiber estimation and tractography in diffusion MRI: development of simulated brain images and comparison of multi-fiber analysis methods at clinical b-values.

    PubMed

    Wilkins, Bryce; Lee, Namgyun; Gajawelli, Niharika; Law, Meng; Leporé, Natasha

    2015-04-01

    Advances in diffusion-weighted magnetic resonance imaging (DW-MRI) have led to many alternative diffusion sampling strategies and analysis methodologies. A common objective among methods is estimation of white matter fiber orientations within each voxel, as doing so permits in-vivo fiber-tracking and the ability to study brain connectivity and networks. Knowledge of how DW-MRI sampling schemes affect fiber estimation accuracy, tractography and the ability to recover complex white-matter pathways, differences between results due to choice of analysis method, and which method(s) perform optimally for specific data sets, all remain important problems, especially as tractography-based studies become common. In this work, we begin to address these concerns by developing sets of simulated diffusion-weighted brain images which we then use to quantitatively evaluate the performance of six DW-MRI analysis methods in terms of estimated fiber orientation accuracy, false-positive (spurious) and false-negative (missing) fiber rates, and fiber-tracking. The analysis methods studied are: 1) a two-compartment "ball and stick" model (BSM) (Behrens et al., 2003); 2) a non-negativity constrained spherical deconvolution (CSD) approach (Tournier et al., 2007); 3) analytical q-ball imaging (QBI) (Descoteaux et al., 2007); 4) q-ball imaging with Funk-Radon and Cosine Transform (FRACT) (Haldar and Leahy, 2013); 5) q-ball imaging within constant solid angle (CSA) (Aganj et al., 2010); and 6) a generalized Fourier transform approach known as generalized q-sampling imaging (GQI) (Yeh et al., 2010). We investigate these methods using 20, 30, 40, 60, 90 and 120 evenly distributed q-space samples of a single shell, and focus on a signal-to-noise ratio (SNR = 18) and diffusion-weighting (b = 1000 s/mm(2)) common to clinical studies. We found that the BSM and CSD methods consistently yielded the least fiber orientation error and simultaneously greatest detection rate of fibers. Fiber detection rate was found to be the most distinguishing characteristic between the methods, and a significant factor for complete recovery of tractography through complex white-matter pathways. For example, while all methods recovered similar tractography of prominent white matter pathways of limited fiber crossing, CSD (which had the highest fiber detection rate, especially for voxels containing three fibers) recovered the greatest number of fibers and largest fraction of correct tractography for complex three-fiber crossing regions. The synthetic data sets, ground-truth, and tools for quantitative evaluation are publically available on the NITRC website as the project "Simulated DW-MRI Brain Data Sets for Quantitative Evaluation of Estimated Fiber Orientations" at http://www.nitrc.org/projects/sim_dwi_brain. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Linear regression models and k-means clustering for statistical analysis of fNIRS data.

    PubMed

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-02-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
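
    A rough sketch of the channel-clustering step (labelling fNIRS channels as activated or not) using scikit-learn k-means; the per-channel feature (a single activation coefficient here) and the synthetic values are assumptions for illustration, not the authors' algorithm.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    # Toy per-channel features, e.g. regression coefficients of the hemodynamic
    # response for 20 channels: most near zero, a few clearly activated.
    betas = np.concatenate([rng.normal(0.0, 0.05, 15), rng.normal(0.6, 0.05, 5)])

    # Two clusters: "activated" vs "not activated" channels.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(betas.reshape(-1, 1))
    activated_label = np.argmax(km.cluster_centers_.ravel())
    print("activated channels:", np.flatnonzero(km.labels_ == activated_label))
    ```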

  2. Linear regression models and k-means clustering for statistical analysis of fNIRS data

    PubMed Central

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-01-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751

  3. IMPLICATIONS OF USING ROBUST BAYESIAN ANALYSIS TO REPRESENT DIVERSE SOURCES OF UNCERTAINTY IN INTEGRATED ASSESSMENT

    EPA Science Inventory

    In our previous research, we showed that robust Bayesian methods can be used in environmental modeling to define a set of probability distributions for key parameters that captures the effects of expert disagreement, ambiguity, or ignorance. This entire set can then be update...

  4. Determination of butter adulteration with margarine using Raman spectroscopy.

    PubMed

    Uysal, Reyhan Selin; Boyaci, Ismail Hakki; Genis, Hüseyin Efe; Tamer, Ugur

    2013-12-15

    In this study, adulteration of butter with margarine was analysed using Raman spectroscopy combined with chemometric methods (principal component analysis (PCA), principal component regression (PCR), partial least squares (PLS)) and artificial neural networks (ANNs). Different butter and margarine samples were mixed at various concentrations ranging from 0% to 100% w/w. PCA analysis was applied for the classification of butters, margarines and mixtures. PCR, PLS and ANN were used for the detection of adulteration ratios of butter. Models were created using a calibration data set and developed models were evaluated using a validation data set. The coefficient of determination (R(2)) values between actual and predicted values obtained for PCR, PLS and ANN for the validation data set were 0.968, 0.987 and 0.978, respectively. In conclusion, a combination of Raman spectroscopy with chemometrics and ANN methods can be applied for testing butter adulteration. Copyright © 2013 Elsevier Ltd. All rights reserved.
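
    To illustrate the kind of chemometric calibration described above, the sketch below fits a PLS regression to synthetic mixture spectra and evaluates it on a held-out validation set; the spectra, mixing model, and number of latent components are illustrative assumptions, not the published calibration.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(2)
    n_points = 600                                       # spectral points per sample
    pure_butter, pure_margarine = rng.random(n_points), rng.random(n_points)
    y = rng.uniform(0, 100, 60)                          # margarine fraction (% w/w)
    X = np.outer(1 - y / 100, pure_butter) + np.outer(y / 100, pure_margarine)
    X += rng.normal(0, 0.01, X.shape)                    # measurement noise

    pls = PLSRegression(n_components=2).fit(X[:40], y[:40])   # calibration set
    y_pred = pls.predict(X[40:]).ravel()                       # validation set
    print("validation R^2:", round(r2_score(y[40:], y_pred), 3))
    ```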

  5. ExAtlas: An interactive online tool for meta-analysis of gene expression data.

    PubMed

    Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H

    2015-12-01

    We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.

  6. [Proposal of a method for collective analysis of work-related accidents in the hospital setting].

    PubMed

    Osório, Claudia; Machado, Jorge Mesquita Huet; Minayo-Gomez, Carlos

    2005-01-01

    The article presents a method for the analysis of work-related accidents in hospitals, with the double aim of analyzing accidents in light of actual work activity and enhancing the vitality of the various professions that comprise hospital work. This process involves both research and intervention, combining knowledge output with training of health professionals, fostering expanded participation by workers in managing their daily work. The method consists of stimulating workers to recreate the situation in which a given accident occurred, shifting themselves to the position of observers of their own work. In the first stage of analysis, workers are asked to show the work analyst how the accident occurred; in the second stage, the work accident victim and analyst jointly record the described series of events in a diagram; in the third, the resulting record is re-discussed and further elaborated; in the fourth, the work accident victim and analyst evaluate and implement measures aimed to prevent the accident from recurring. The article concludes by discussing the method's possibilities and limitations in the hospital setting.

  7. The identification of solar wind waves at discrete frequencies and the role of the spectral analysis techniques

    NASA Astrophysics Data System (ADS)

    Di Matteo, S.; Villante, U.

    2017-05-01

    The occurrence of waves at discrete frequencies in the solar wind (SW) parameters has been reported in the scientific literature with some controversial results, mostly concerning the existence (and stability) of favored sets of frequencies. On the other hand, the experimental results might be influenced by the analytical methods adopted for the spectral analysis. We focused attention on the fluctuations of the SW dynamic pressure (PSW) occurring in the leading edges of streams following interplanetary shocks and compared the results of the Welch method (WM) with those of the multitaper method (MTM). The results of a simulation analysis demonstrate that the identification of the wave occurrence and the frequency estimate might be strongly influenced by the signal characteristics and analytical methods, especially in the presence of multicomponent signals. In SW streams, PSW oscillations are routinely detected in the entire range f ≈ 1.2-5.0 mHz; nevertheless, the WM/MTM agreement in the identification and frequency estimate occurs in ≈50% of events and different sets of favored frequencies would be proposed for the same set of events by the WM and MTM analysis. The histogram of the frequency distribution of the events identified by both methods suggests more relevant percentages between f ≈ 1.7-1.9, f ≈ 2.7-3.4, and f ≈ 3.9-4.4 (with a most relevant peak at f ≈ 4.2 mHz). Extremely severe thresholds select a small number (14) of remarkable events, with a one-to-one correspondence between WM and MTM: interestingly, these events reveal a tendency for a favored occurrence in bins centered at f ≈ 2.9 and at f ≈ 4.2 mHz.
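
    A small sketch of the Welch-method side of the comparison, assuming SciPy: it estimates the power spectrum of a synthetic dynamic-pressure-like series and picks candidate discrete frequencies in the 1-5 mHz band. The signal model and the injected 2.9 and 4.2 mHz components (values borrowed from the abstract) are illustrative only.

    ```python
    import numpy as np
    from scipy.signal import welch

    rng = np.random.default_rng(3)
    fs = 1 / 60.0                        # one sample per minute (Nyquist ~8.3 mHz)
    t = np.arange(720) * 60.0            # 12 hours of data, in seconds
    x = (0.4 * np.sin(2 * np.pi * 2.9e-3 * t)        # discrete 2.9 mHz component
         + 0.3 * np.sin(2 * np.pi * 4.2e-3 * t)      # discrete 4.2 mHz component
         + np.cumsum(rng.normal(0, 0.05, t.size)))   # slowly drifting background

    f, pxx = welch(x, fs=fs, nperseg=256, detrend="linear")
    band = (f >= 1e-3) & (f <= 5e-3)                 # the 1-5 mHz range of interest
    top = f[band][np.argsort(pxx[band])[-2:]] * 1e3
    print("candidate discrete frequencies (mHz):", np.round(np.sort(top), 2))
    ```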

  8. A method to evaluate performance reliability of individual subjects in laboratory research applied to work settings.

    DOT National Transportation Integrated Search

    1978-10-01

    This report presents a method that may be used to evaluate the reliability of performance of individual subjects, particularly in applied laboratory research. The method is based on analysis of variance of a tasks-by-subjects data matrix, with all sc...

  9. Testing Different Model Building Procedures Using Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…

  10. Segmentation and texture analysis of structural biomarkers using neighborhood-clustering-based level set in MRI of the schizophrenic brain.

    PubMed

    Latha, Manohar; Kavitha, Ganesan

    2018-02-03

    Schizophrenia (SZ) is a psychiatric disorder that especially affects individuals during their adolescence. There is a need to study the subanatomical regions of SZ brain on magnetic resonance images (MRI) based on morphometry. In this work, an attempt was made to analyze alterations in structure and texture patterns in images of the SZ brain using the level-set method and Laws texture features. T1-weighted MRI of the brain from Center of Biomedical Research Excellence (COBRE) database were considered for analysis. Segmentation was carried out using the level-set method. Geometrical and Laws texture features were extracted from the segmented brain stem, corpus callosum, cerebellum, and ventricle regions to analyze pattern changes in SZ. The level-set method segmented multiple brain regions, with higher similarity and correlation values compared with an optimized method. The geometric features obtained from regions of the corpus callosum and ventricle showed significant variation (p < 0.00001) between normal and SZ brain. Laws texture feature identified a heterogeneous appearance in the brain stem, corpus callosum and ventricular regions, and features from the brain stem were correlated with Positive and Negative Syndrome Scale (PANSS) score (p < 0.005). A framework of geometric and Laws texture features obtained from brain subregions can be used as a supplement for diagnosis of psychiatric disorders.
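
    As a hedged illustration of the Laws texture step mentioned above, the sketch below builds 2-D Laws masks from the classic 1-D kernels and computes a local texture-energy map for a synthetic region of interest; the window size and test image are assumptions, and this is not the authors' pipeline.

    ```python
    import numpy as np
    from scipy.ndimage import convolve, uniform_filter

    # Classic 1-D Laws kernels (Level, Edge, Spot, Ripple).
    L5 = np.array([1, 4, 6, 4, 1], float)
    E5 = np.array([-1, -2, 0, 2, 1], float)
    S5 = np.array([-1, 0, 2, 0, -1], float)
    R5 = np.array([1, -4, 6, -4, 1], float)

    def laws_energy(img, k1, k2, win=15):
        """Filter with the 2-D mask k1 * k2^T, then average |response| locally."""
        mask = np.outer(k1, k2)
        response = convolve(img.astype(float), mask, mode="reflect")
        return uniform_filter(np.abs(response), size=win)

    rng = np.random.default_rng(4)
    roi = rng.random((64, 64))               # stand-in for a segmented brain region
    e5l5 = laws_energy(roi, E5, L5)          # one of the usual texture-energy maps
    print("mean E5L5 texture energy:", round(float(e5l5.mean()), 4))
    ```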

  11. Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease.

    PubMed

    Taguchi, Y-h; Iwadate, Mitsuo; Umeyama, Hideaki

    2015-04-30

    Feature extraction (FE) is difficult, particularly when there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, as is usual in supervised FE, is generally harder than in the two-class problem. Developing unsupervised methods that are independent of sample classification would solve many of these problems. Two principal component analysis (PCA)-based FE approaches were tested as such sample-classification-independent unsupervised FE methods: variational Bayes PCA (VBPCA), which was extended to perform unsupervised FE, and conventional PCA (CPCA)-based unsupervised FE. VBPCA- and CPCA-based unsupervised FE both performed well when applied to simulated data and to a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs was identified that shows aberrant expression between treatment and control samples and significant, negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE were also demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data and outperformed two conventional supervised FE methods on a real data set. Thus, the two methods appear equally suitable for FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery.
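
    A minimal sketch of the general idea of PCA-based unsupervised feature extraction, i.e. embedding genes with PCA and selecting those with extreme component scores without using class labels; the synthetic expression matrix, the choice of PC1, and the 2.5% cutoff are assumptions for illustration, not the VBPCA/CPCA procedures of the paper.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(5)
    n_genes, n_samples = 1000, 20
    X = rng.normal(0, 1, (n_genes, n_samples))
    X[:30, 10:] += 3.0        # 30 genes with aberrant expression in the last 10 samples

    # Embed genes (rows) with PCA; no class labels are used at any point.
    Z = PCA(n_components=2).fit_transform(X - X.mean(axis=1, keepdims=True))

    # Select genes with extreme scores along PC1 (top 2.5% by absolute score).
    pc1 = Z[:, 0]
    selected = np.flatnonzero(np.abs(pc1) > np.quantile(np.abs(pc1), 0.975))
    print("selected genes:", selected[:10], "... total:", selected.size)
    ```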

  12. Computation and application of tissue-specific gene set weights.

    PubMed

    Frost, H Robert

    2018-04-06

    Gene set testing, or pathway analysis, has become a critical tool for the analysis of high-dimensional genomic data. Although the function and activity of many genes and higher-level processes are tissue-specific, gene set testing is typically performed in a tissue-agnostic fashion, which impacts statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses is publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/∼hrfrost/TissueSpecificGeneSets. rob.frost@dartmouth.edu.

  13. A comparison of radiometric correction techniques in the evaluation of the relationship between LST and NDVI in Landsat imagery.

    PubMed

    Tan, Kok Chooi; Lim, Hwee San; Matjafri, Mohd Zubir; Abdullah, Khiruddin

    2012-06-01

    Atmospheric corrections for multi-temporal optical satellite images are necessary, especially in change detection analyses, such as normalized difference vegetation index (NDVI) rationing. Abrupt change detection analysis using remote-sensing techniques requires radiometric congruity and atmospheric correction to monitor terrestrial surfaces over time. Two atmospheric correction methods were used for this study: relative radiometric normalization and the simplified method for atmospheric correction (SMAC) in the solar spectrum. A multi-temporal data set consisting of two sets of Landsat images from the period between 1991 and 2002 of Penang Island, Malaysia, was used to compare NDVI maps, which were generated using the proposed atmospheric correction methods. Land surface temperature (LST) was retrieved using ATCOR3_T in PCI Geomatica 10.1 image processing software. Linear regression analysis was utilized to analyze the relationship between NDVI and LST. This study reveals that both of the proposed atmospheric correction methods yielded high accuracy through examination of the linear correlation coefficients. To check for the accuracy of the equation obtained through linear regression analysis for every single satellite image, 20 points were randomly chosen. The results showed that the SMAC method yielded a constant value (in terms of error) to predict the NDVI value from linear regression analysis-derived equation. The errors (average) from both proposed atmospheric correction methods were less than 10%.
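
    For reference, the NDVI computation and the NDVI-LST linear regression described above can be sketched as follows; the reflectance values and the synthetic LST relationship are invented for illustration.

    ```python
    import numpy as np

    def ndvi(nir, red):
        """Normalized difference vegetation index, the band ratio used in the study."""
        return (nir - red) / (nir + red + 1e-12)

    rng = np.random.default_rng(6)
    nir, red = rng.uniform(0.2, 0.6, 500), rng.uniform(0.05, 0.3, 500)
    v = ndvi(nir, red)
    lst = 320.0 - 25.0 * v + rng.normal(0, 1.5, v.size)   # synthetic LST in kelvin

    # Ordinary least-squares fit of LST against NDVI and its correlation coefficient.
    slope, intercept = np.polyfit(v, lst, 1)
    r = np.corrcoef(v, lst)[0, 1]
    print(f"LST = {slope:.1f} * NDVI + {intercept:.1f}, r = {r:.2f}")
    ```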

  14. Missing value imputation in DNA microarrays based on conjugate gradient method.

    PubMed

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the best similar ones. The CG algorithm with this subset as its input is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean squares error (NRMSE) and relative NRMSE in different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
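
    A hedged sketch of the two ingredients described above, neighbor selection by absolute Pearson correlation and a conjugate-gradient solve, using NumPy/SciPy; the regression formulation, neighbor count, and synthetic data are assumptions for illustration and this is not the published CGimpute code.

    ```python
    import numpy as np
    from scipy.sparse.linalg import cg

    def impute_one(X, gene, sample, k=10):
        """Estimate X[gene, sample] from the k genes most correlated with `gene`
        (by absolute Pearson r over the observed columns), solving the regression
        via conjugate gradient on the normal equations."""
        obs = np.setdiff1d(np.arange(X.shape[1]), [sample])
        r = np.array([abs(np.corrcoef(X[gene, obs], X[g, obs])[0, 1])
                      for g in range(X.shape[0])])
        r[gene] = -1.0                                   # never pick the gene itself
        nbrs = np.argsort(r)[-k:]
        A, y = X[nbrs][:, obs].T, X[gene, obs]           # regressors and target
        w, _ = cg(A.T @ A, A.T @ y)                      # CG solve of normal equations
        return float(X[nbrs, sample] @ w)

    rng = np.random.default_rng(7)
    X = rng.normal(0, 1, (200, 30))
    X[:50] += 2 * np.linspace(-1, 1, 30)                 # a block of correlated genes
    print("true:", round(X[5, 12], 3), "imputed:", round(impute_one(X, 5, 12), 3))
    ```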

  15. Advanced stability indicating chemometric methods for quantitation of amlodipine and atorvastatin in their quinary mixture with acidic degradation products

    NASA Astrophysics Data System (ADS)

    Darwish, Hany W.; Hassan, Said A.; Salem, Maissa Y.; El-Zeany, Badr A.

    2016-02-01

    Two advanced, accurate and precise chemometric methods are developed for the simultaneous determination of amlodipine besylate (AML) and atorvastatin calcium (ATV) in the presence of their acidic degradation products in tablet dosage forms. The first method was Partial Least Squares (PLS-1) and the second was Artificial Neural Networks (ANN). PLS was compared to ANN models with and without variable selection procedure (genetic algorithm (GA)). For proper analysis, a 5-factor 5-level experimental design was established resulting in 25 mixtures containing different ratios of the interfering species. Fifteen mixtures were used as calibration set and the other ten mixtures were used as validation set to validate the prediction ability of the suggested models. The proposed methods were successfully applied to the analysis of pharmaceutical tablets containing AML and ATV. The methods indicated the ability of the mentioned models to solve the highly overlapped spectra of the quinary mixture, yet using inexpensive and easy to handle instruments like the UV-VIS spectrophotometer.

  16. Kennard-Stone combined with least square support vector machine method for noncontact discriminating human blood species

    NASA Astrophysics Data System (ADS)

    Zhang, Linna; Li, Gang; Sun, Meixiu; Li, Hongxiao; Wang, Zhennan; Li, Yingxin; Lin, Ling

    2017-11-01

    Identifying whole blood as either human or nonhuman is an important responsibility for import-export ports and inspection and quarantine departments. Analytical methods and DNA testing methods are usually destructive. Previous studies demonstrated that visible diffuse reflectance spectroscopy can achieve noncontact discrimination of human and nonhuman blood. An appropriate method for calibration set selection is very important for a robust quantitative model. In this paper, the Random Selection (RS) method and the Kennard-Stone (KS) method were applied to select samples for the calibration set. Moreover, a proper chemometric method can greatly improve the performance of a classification or quantification model. The Partial Least Squares Discriminant Analysis (PLSDA) method is commonly used to identify blood species from spectroscopic data. The Least Squares Support Vector Machine (LSSVM) has proved well suited to discrimination analysis. In this research, both the PLSDA and LSSVM methods were used for human blood discrimination. Compared with the results of the PLSDA method, LSSVM enhanced the performance of the identification models. The overall results indicate that the LSSVM method is more feasible for identifying human and animal blood species, and sufficiently demonstrate that LSSVM is a reliable and robust method for human blood identification that can be more effective and accurate.
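
    The Kennard-Stone selection step mentioned above can be sketched as the classic max-min distance procedure; the synthetic spectra and the calibration-set size below are illustrative assumptions.

    ```python
    import numpy as np

    def kennard_stone(X, n_select):
        """Classic Kennard-Stone selection: start from the two most distant samples,
        then repeatedly add the sample whose minimum distance to the already
        selected set is largest (max-min criterion)."""
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        selected = list(np.unravel_index(np.argmax(d), d.shape))
        while len(selected) < n_select:
            remaining = [i for i in range(len(X)) if i not in selected]
            min_d = d[np.ix_(remaining, selected)].min(axis=1)
            selected.append(remaining[int(np.argmax(min_d))])
        return selected

    rng = np.random.default_rng(8)
    spectra = rng.random((40, 200))          # stand-in for 40 reflectance spectra
    calib = kennard_stone(spectra, 25)       # calibration set indices
    valid = [i for i in range(40) if i not in calib]
    print(len(calib), "calibration /", len(valid), "validation samples")
    ```

    The max-min criterion spreads the calibration samples over the whole spectral space, which is why KS often yields more robust models than purely random selection.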

  17. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, K; Ivanova, N; Barry, Kerrie

    2007-01-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

  18. Use of simulated data sets to evaluate the fidelity of Metagenomicprocessing methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri

    2006-12-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

  19. Principal coordinate analysis assisted chromatographic analysis of bacterial cell wall collection: A robust classification approach.

    PubMed

    Kumar, Keshav; Cava, Felipe

    2018-04-10

    In the present work, principal coordinate analysis (PCoA) is introduced to develop a robust model to classify chromatographic data sets of peptidoglycan samples. PCoA captures the heterogeneity present in the data sets by using a dissimilarity matrix as input. Thus, in principle, it can capture even subtle differences in bacterial peptidoglycan composition and can provide a more robust and fast approach for classifying bacterial collections and identifying novel cell wall targets for further biological and clinical studies. The utility of the proposed approach is successfully demonstrated by analysing two different kinds of bacterial collections. The first set comprised peptidoglycan samples belonging to different subclasses of Alphaproteobacteria, whereas the second set, which is relatively more intricate for chemometric analysis, consists of wild-type Vibrio cholerae and mutants having subtle differences in their peptidoglycan composition. The present work proposes a useful approach that can classify chromatographic data sets of peptidoglycan samples having subtle differences, and suggests that PCoA can be a method of choice in a data analysis workflow. Copyright © 2018 Elsevier Inc. All rights reserved.
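
    A compact sketch of the PCoA computation itself (Gower double-centring of the squared dissimilarities followed by an eigendecomposition), assuming NumPy; the toy dissimilarity matrix is invented and this is not the authors' workflow.

    ```python
    import numpy as np

    def pcoa(D, n_axes=2):
        """Principal coordinate analysis: double-centre the squared dissimilarity
        matrix and keep the leading non-negative eigenvectors as coordinates."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D ** 2) @ J                 # Gower's double-centred matrix
        w, V = np.linalg.eigh(B)
        order = np.argsort(w)[::-1][:n_axes]
        w, V = np.clip(w[order], 0, None), V[:, order]
        return V * np.sqrt(w)                       # sample coordinates

    # Toy dissimilarities between 4 chromatographic profiles (symmetric, zero diagonal).
    D = np.array([[0.0, 0.2, 0.9, 1.0],
                  [0.2, 0.0, 0.8, 0.9],
                  [0.9, 0.8, 0.0, 0.3],
                  [1.0, 0.9, 0.3, 0.0]])
    print(np.round(pcoa(D), 3))                     # two well-separated groups
    ```

    Clipping negative eigenvalues to zero is one common way of handling non-Euclidean dissimilarities; other corrections (e.g. Cailliez) exist.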

  20. Modeling Eye Gaze Patterns in Clinician-Patient Interaction with Lag Sequential Analysis

    PubMed Central

    Montague, E; Xu, J; Asan, O; Chen, P; Chewning, B; Barrett, B

    2011-01-01

    Objective The aim of this study was to examine whether lag-sequential analysis could be used to describe eye gaze orientation between clinicians and patients in the medical encounter. This topic is particularly important as new technologies are implemented into multi-user health care settings where trust is critical and nonverbal cues are integral to achieving trust. This analysis method could lead to design guidelines for technologies and more effective assessments of interventions. Background Nonverbal communication patterns are important aspects of clinician-patient interactions and may impact patient outcomes. Method Eye gaze behaviors of clinicians and patients in 110-videotaped medical encounters were analyzed using the lag-sequential method to identify significant behavior sequences. Lag-sequential analysis included both event-based lag and time-based lag. Results Results from event-based lag analysis showed that the patients’ gaze followed that of clinicians, while clinicians did not follow patients. Time-based sequential analysis showed that responses from the patient usually occurred within two seconds after the initial behavior of the clinician. Conclusion Our data suggest that the clinician’s gaze significantly affects the medical encounter but not the converse. Application Findings from this research have implications for the design of clinical work systems and modeling interactions. Similar research methods could be used to identify different behavior patterns in clinical settings (physical layout, technology, etc.) to facilitate and evaluate clinical work system designs. PMID:22046723

  1. OPATs: Omnibus P-value association tests.

    PubMed

    Chen, Chia-Wei; Yang, Hsin-Chou

    2017-07-10

    Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. The software OPATs programmed in R and R graphical user interface features a user-friendly interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm. © The Author 2017. Published by Oxford University Press.
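
    As one concrete example of a P-value combination of the kind reviewed above, the sketch below implements Fisher's method with SciPy; it is a generic illustration, not code from OPATs, and the toy P-values are invented.

    ```python
    import numpy as np
    from scipy import stats

    def fisher_combine(pvals):
        """Fisher's combination: X2 = -2 * sum(log p) follows a chi-square
        distribution with 2k degrees of freedom under the null that every
        input P-value is uniform on [0, 1]."""
        pvals = np.asarray(pvals, float)
        x2 = -2.0 * np.log(pvals).sum()
        return stats.chi2.sf(x2, df=2 * pvals.size)

    # Toy set-based test: combine single-SNP P-values within one gene or window.
    print(round(fisher_combine([0.04, 0.20, 0.01, 0.33]), 4))
    ```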

  2. Measuring and Modeling Shared Visual Attention

    NASA Technical Reports Server (NTRS)

    Mulligan, Jeffrey B.; Gontar, Patrick

    2016-01-01

    Multi-person teams are sometimes responsible for critical tasks, such as flying an airliner. Here we present a method using gaze tracking data to assess shared visual attention, a term we use to describe the situation where team members are attending to a common set of elements in the environment. Gaze data are quantized with respect to a set of N areas of interest (AOIs); these are then used to construct a time series of N dimensional vectors, with each vector component representing one of the AOIs, all set to 0 except for the component corresponding to the currently fixated AOI, which is set to 1. The resulting sequence of vectors can be averaged in time, with the result that each vector component represents the proportion of time that the corresponding AOI was fixated within the given time interval. We present two methods for comparing sequences of this sort, one based on computing the time-varying correlation of the averaged vectors, and another based on a chi-square test testing the hypothesis that the observed gaze proportions are drawn from identical probability distributions. We have evaluated the method using synthetic data sets, in which the behavior was modeled as a series of "activities," each of which was modeled as a first-order Markov process. By tabulating distributions for pairs of identical and disparate activities, we are able to perform a receiver operating characteristic (ROC) analysis, allowing us to choose appropriate criteria and estimate error rates. We have applied the methods to data from airline crews, collected in a high-fidelity flight simulator (Haslbeck, Gontar & Schubert, 2014). We conclude by considering the problem of automatic (blind) discovery of activities, using methods developed for text analysis.
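
    A small sketch of the AOI-vector construction and the two comparison methods described above (windowed correlation and a chi-square test), assuming NumPy/SciPy; the AOI sequences, window length, and lag are synthetic assumptions, not the flight-simulator data.

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency

    def aoi_proportions(fixations, n_aoi, win):
        """One-hot encode the fixated AOI per time step, then average over
        non-overlapping windows to get per-window gaze proportions."""
        onehot = np.eye(n_aoi)[np.asarray(fixations)]
        n_win = len(fixations) // win
        return onehot[:n_win * win].reshape(n_win, win, n_aoi).mean(axis=1)

    rng = np.random.default_rng(9)
    pilot = rng.integers(0, 4, 600)                       # AOI index per time step
    copilot = np.roll(pilot, 2)                           # follows ~2 steps later
    P, C = aoi_proportions(pilot, 4, 60), aoi_proportions(copilot, 4, 60)

    # Per-window correlation of the two proportion vectors ...
    r = [np.corrcoef(p, c)[0, 1] for p, c in zip(P, C)]
    print("mean windowed correlation:", round(float(np.nanmean(r)), 2))

    # ... and a chi-square test that the two gaze distributions match in window 0.
    counts = np.vstack([P[0], C[0]]) * 60                 # back to counts per AOI
    chi2, p, _, _ = chi2_contingency(counts + 1e-9)       # small offset avoids zeros
    print("window-0 chi-square p-value:", round(p, 3))
    ```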

  3. Measuring and Modeling Shared Visual Attention

    NASA Technical Reports Server (NTRS)

    Mulligan, Jeffrey B.

    2016-01-01

    Multi-person teams are sometimes responsible for critical tasks, such as flying an airliner. Here we present a method using gaze tracking data to assess shared visual attention, a term we use to describe the situation where team members are attending to a common set of elements in the environment. Gaze data are quantized with respect to a set of N areas of interest (AOIs); these are then used to construct a time series of N dimensional vectors, with each vector component representing one of the AOIs, all set to 0 except for the component corresponding to the currently fixated AOI, which is set to 1. The resulting sequence of vectors can be averaged in time, with the result that each vector component represents the proportion of time that the corresponding AOI was fixated within the given time interval. We present two methods for comparing sequences of this sort, one based on computing the time-varying correlation of the averaged vectors, and another based on a chi-square test testing the hypothesis that the observed gaze proportions are drawn from identical probability distributions. We have evaluated the method using synthetic data sets, in which the behavior was modeled as a series of "activities," each of which was modeled as a first-order Markov process. By tabulating distributions for pairs of identical and disparate activities, we are able to perform a receiver operating characteristic (ROC) analysis, allowing us to choose appropriate criteria and estimate error rates. We have applied the methods to data from airline crews, collected in a high-fidelity flight simulator (Haslbeck, Gontar & Schubert, 2014). We conclude by considering the problem of automatic (blind) discovery of activities, using methods developed for text analysis.

  4. Method for detecting the signature of noise-induced structures in spatiotemporal data sets: an application to excitable media

    NASA Astrophysics Data System (ADS)

    Huett, Marc-Thorsten

    2003-05-01

    We formulate mathematical tools for analyzing spatiotemporal data sets. The tools are based on nearest-neighbor considerations similar to cellular automata. One of the analysis tools allows for reconstructing the noise intensity in a data set and is an appropriate method for detecting a variety of noise-induced phenomena in spatiotemporal data. The functioning of these methods is illustrated on sample data generated with the forest fire model and with networks of nonlinear oscillators. It is seen that these methods allow the characterization of spatiotemporal stochastic resonance (STSR) in experimental data. Application of these tools to biological spatiotemporal patterns is discussed. For one specific example, the slime mold Dictyostelium discoideum, it is seen, how transitions between different patterns are clearly marked by changes in the spatiotemporal observables.

  5. Comparative Cognitive Task Analysis

    DTIC Science & Technology

    2007-01-01

    is to perform a task analysis to determine how people operate in a specific domain on a specific task. Cognitive Task Analysis (CTA) is a set of...accomplish a task. In this chapter, we build on CTA methods by suggesting that comparative cognitive task analysis (C2TA) can help solve the aforementioned

  6. Cluster Analysis of Minnesota School Districts. A Research Report.

    ERIC Educational Resources Information Center

    Cleary, James

    The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…

  7. Assessing the impact of different satellite retrieval methods on forecast available potential energy

    NASA Technical Reports Server (NTRS)

    Whittaker, Linda M.; Horn, Lyle H.

    1990-01-01

    The effects of the inclusion of satellite temperature retrieval data, and of different satellite retrieval methods, on forecasts made with the NASA Goddard Laboratory for Atmospheres (GLA) fourth-order model were investigated using, as the parameter, the available potential energy (APE) in its isentropic form. Calculations of the APE were used to study the differences in the forecast sets both globally and in the Northern Hemisphere during the 72-h forecast period. The analysis data sets used for the forecasts included one containing the NESDIS TIROS-N retrievals, the GLA retrievals using the physical inversion method, and a third, which did not contain satellite data, used as a control; two data sets, with and without satellite data, were used for verification. For all three data sets, the Northern Hemisphere values for the total APE showed an increase throughout the forecast period, mostly due to an increase in the zonal component, in contrast to the verification sets, which showed a steady level of total APE.

  8. Motif-Synchronization: A new method for analysis of dynamic brain networks with EEG

    NASA Astrophysics Data System (ADS)

    Rosário, R. S.; Cardoso, P. T.; Muñoz, M. A.; Montoya, P.; Miranda, J. G. V.

    2015-12-01

    The major aim of this work was to propose a new association method known as Motif-Synchronization. This method was developed to provide information about the synchronization degree and direction between two nodes of a network by counting the number of occurrences of some patterns between any two time series. The second objective of this work was to present a new methodology for the analysis of dynamic brain networks, by combining the Time-Varying Graph (TVG) method with a directional association method. We further applied the new algorithms to a set of human electroencephalogram (EEG) signals to perform a dynamic analysis of the brain functional networks (BFN).

  9. [Tobacco quality analysis of industrial classification of different years using near-infrared (NIR) spectrum].

    PubMed

    Wang, Yi; Xiang, Ma; Wen, Ya-Dong; Yu, Chun-Xia; Wang, Luo-Ping; Zhao, Long-Lian; Li, Jun-Hui

    2012-11-01

    In this study, tobacco quality analysis of the main industrial classifications across different years was carried out using spectrum projection and correlation methods. The data were near-infrared (NIR) spectra from Hongta Tobacco (Group) Co., Ltd. NIR spectra of 5730 industrially classified tobacco leaf samples from Yuxi in Yunnan province, collected from 2007 to 2010, were acquired; the samples covered different stalk positions and colors and all belonged to the HONGDA tobacco variety. The results showed that when the samples from the same year were randomly divided into analysis and verification sets in a 2:1 ratio, the verification set corresponded with the analysis set under spectrum projection, with correlation coefficients above 0.98. The correlation coefficients between different years obtained by spectrum projection were above 0.97; the highest was between 2008 and 2009 and the lowest between 2007 and 2010. The study also discussed a method to obtain quantitative similarity values for different industrial classification samples. These similarity and consistency values are instructive for the combination and replacement of tobacco leaves in blending.

  10. A comparison of analysis methods to estimate contingency strength.

    PubMed

    Lloyd, Blair P; Staubitz, Johanna L; Tapp, Jon T

    2018-05-09

    To date, several data analysis methods have been used to estimate contingency strength, yet few studies have compared these methods directly. To compare the relative precision and sensitivity of four analysis methods (i.e., exhaustive event-based, nonexhaustive event-based, concurrent interval, concurrent+lag interval), we applied all methods to a simulated data set in which several response-dependent and response-independent schedules of reinforcement were programmed. We evaluated the degree to which contingency strength estimates produced from each method (a) corresponded with expected values for response-dependent schedules and (b) showed sensitivity to parametric manipulations of response-independent reinforcement. Results indicated both event-based methods produced contingency strength estimates that aligned with expected values for response-dependent schedules, but differed in sensitivity to response-independent reinforcement. The precision of interval-based methods varied by analysis method (concurrent vs. concurrent+lag) and schedule type (continuous vs. partial), and showed similar sensitivities to response-independent reinforcement. Recommendations and considerations for measuring contingencies are identified. © 2018 Society for the Experimental Analysis of Behavior.

  11. The weakest t-norm based intuitionistic fuzzy fault-tree analysis to evaluate system reliability.

    PubMed

    Kumar, Mohit; Yadav, Shiv Prasad

    2012-07-01

    In this paper, a new approach to intuitionistic fuzzy fault-tree analysis is proposed to evaluate system reliability and to find the most critical system component affecting it. A weakest t-norm based intuitionistic fuzzy fault-tree analysis is presented to calculate the fault intervals of system components by integrating experts' knowledge and experience, expressed as the possibility of failure of bottom events. It applies fault-tree analysis, α-cuts of intuitionistic fuzzy sets, and T(ω) (the weakest t-norm) based arithmetic operations on triangular intuitionistic fuzzy sets to obtain the fault interval and reliability interval of the system. This paper also modifies Tanaka et al.'s fuzzy fault-tree definition. For numerical verification, a malfunction of the weapon system "automatic gun" is presented as an example. The results of the proposed method are compared with those of existing reliability analysis approaches. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
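
    For reference, the weakest t-norm invoked above (often called the drastic product) has a very short definition; the sketch below states it in Python, without attempting to reproduce the interval arithmetic on triangular intuitionistic fuzzy sets built on top of it.

    ```python
    def t_weakest(a, b):
        """Drastic product, the weakest t-norm: returns the other argument when
        one of them is 1, and 0 otherwise (membership grades lie in [0, 1])."""
        if a == 1.0:
            return b
        if b == 1.0:
            return a
        return 0.0

    print(t_weakest(1.0, 0.7), t_weakest(0.9, 0.7))   # -> 0.7 0.0
    ```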

  12. Bayesian non-parametric inference for stochastic epidemic models using Gaussian Processes.

    PubMed

    Xu, Xiaoguang; Kypraios, Theodore; O'Neill, Philip D

    2016-10-01

    This paper considers novel Bayesian non-parametric methods for stochastic epidemic models. Many standard modeling and data analysis methods use underlying assumptions (e.g. concerning the rate at which new cases of disease will occur) which are rarely challenged or tested in practice. To relax these assumptions, we develop a Bayesian non-parametric approach using Gaussian Processes, specifically to estimate the infection process. The methods are illustrated with both simulated and real data sets, the former illustrating that the methods can recover the true infection process quite well in practice, and the latter illustrating that the methods can be successfully applied in different settings. © The Author 2016. Published by Oxford University Press.

  13. Probabilistic boundary element method

    NASA Technical Reports Server (NTRS)

    Cruse, T. A.; Raveendra, S. T.

    1989-01-01

    The purpose of the Probabilistic Structural Analysis Method (PSAM) project is to develop structural analysis capabilities for the design analysis of advanced space propulsion system hardware. The boundary element method (BEM) is used as the basis of the Probabilistic Advanced Analysis Methods (PADAM) which is discussed. The probabilistic BEM code (PBEM) is used to obtain the structural response and sensitivity results to a set of random variables. As such, PBEM performs analogous to other structural analysis codes such as finite elements in the PSAM system. For linear problems, unlike the finite element method (FEM), the BEM governing equations are written at the boundary of the body only, thus, the method eliminates the need to model the volume of the body. However, for general body force problems, a direct condensation of the governing equations to the boundary of the body is not possible and therefore volume modeling is generally required.

  14. PSEA-Quant: a protein set enrichment analysis on label-free and label-based protein quantification data.

    PubMed

    Lavallée-Adam, Mathieu; Rauniyar, Navin; McClatchy, Daniel B; Yates, John R

    2014-12-05

    The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights.

  15. PSEA-Quant: A Protein Set Enrichment Analysis on Label-Free and Label-Based Protein Quantification Data

    PubMed Central

    2015-01-01

    The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights. PMID:25177766

  16. Analysis of Closely Related Antioxidant Nutraceuticals Using the Green Analytical Methodology of ANN and Smart Spectrophotometric Methods.

    PubMed

    Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

    2017-01-01

    Two new, simple, and specific green analytical methods are proposed: zero-crossing first-derivative and chemometric-based spectrophotometric artificial neural network (ANN). The proposed methods were used for the simultaneous estimation of two closely related antioxidant nutraceuticals, coenzyme Q10 (Q10) and vitamin E, in their mixtures and pharmaceutical preparations. The first method is based on the handling of spectrophotometric data with the first-derivative technique, in which both nutraceuticals were determined in ethanol, each at the zero crossing of the other. The amplitudes of the first-derivative spectra for Q10 and vitamin E were recorded at 285 and 235 nm respectively, and correlated with their concentrations. The linearity ranges of Q10 and vitamin E were 10-60 and 5.6-70 μg⋅mL-1, respectively. The second method, ANN, is a multivariate calibration method and it was developed and applied for the simultaneous determination of both analytes. A training set of 90 different synthetic mixtures containing Q10 and vitamin E in the ranges of 0-100 and 0-556 μg⋅mL-1, respectively, was prepared in ethanol. The absorption spectra of the training set were recorded in the spectral region of 230-300 nm. By relating the concentration sets (x-block) with their corresponding absorption data (y-block), gradient-descent back-propagation ANN calibration could be computed. To validate the proposed network, a set of 45 synthetic mixtures of the two drugs was used. Both proposed methods were successfully applied for the assay of Q10 and vitamin E in their laboratory-prepared mixtures and in their pharmaceutical tablets with excellent recovery. These methods offer advantages over other methods because of low-cost equipment, time-saving measures, and environmentally friendly materials. In addition, no chemical separation prior to analysis was needed. The ANN method was superior to the derivative technique because ANN can determine both drugs under nonlinear experimental conditions. Consequently, ANN would be the method of choice in the routine analysis of Q10 and vitamin E tablets. No interference from common pharmaceutical additives was observed. Student's t-test and the F-test were used to compare the two methods. No significant difference was recorded.
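
    A toy sketch of the zero-crossing first-derivative read-out described above, assuming NumPy; the Gaussian bands standing in for the Q10 and vitamin E spectra are invented (so their true zero crossings will not fall exactly at the quoted wavelengths), and only the 285 nm and 235 nm read-out points come from the abstract.

    ```python
    import numpy as np

    wavelengths = np.arange(230, 301)                        # nm, illustrative grid

    def first_derivative_amplitude(absorbance, read_nm):
        """Differentiate the spectrum and read the derivative amplitude at the
        wavelength where the other component's derivative crosses zero."""
        d = np.gradient(absorbance, wavelengths)             # dA/dlambda
        return d[np.searchsorted(wavelengths, read_nm)]

    # Synthetic Gaussian bands standing in for Q10 and vitamin E absorbance.
    q10 = np.exp(-((wavelengths - 275) ** 2) / (2 * 12 ** 2))
    vite = np.exp(-((wavelengths - 245) ** 2) / (2 * 15 ** 2))
    mixture = 0.6 * q10 + 0.3 * vite

    print("Q10 read-out   (285 nm):", round(first_derivative_amplitude(mixture, 285), 4))
    print("Vit E read-out (235 nm):", round(first_derivative_amplitude(mixture, 235), 4))
    ```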

  17. eulerAPE: Drawing Area-Proportional 3-Venn Diagrams Using Ellipses

    PubMed Central

    Micallef, Luana; Rodgers, Peter

    2014-01-01

    Venn diagrams with three curves are used extensively in various medical and scientific disciplines to visualize relationships between data sets and facilitate data analysis. The area of the regions formed by the overlapping curves is often directly proportional to the cardinality of the depicted set relation or any other related quantitative data. Drawing these diagrams manually is difficult and current automatic drawing methods do not always produce appropriate diagrams. Most methods depict the data sets as circles, as they perceptually pop out as complete distinct objects due to their smoothness and regularity. However, circles cannot draw accurate diagrams for most 3-set data and so the generated diagrams often have misleading region areas. Other methods use polygons to draw accurate diagrams. However, polygons are non-smooth and non-symmetric, so the curves are not easily distinguishable and the diagrams are difficult to comprehend. Ellipses are more flexible than circles and are similarly smooth, but none of the current automatic drawing methods use ellipses. We present eulerAPE as the first method and software that uses ellipses for automatically drawing accurate area-proportional Venn diagrams for 3-set data. We describe the drawing method adopted by eulerAPE and we discuss our evaluation of the effectiveness of eulerAPE and ellipses for drawing random 3-set data. We compare eulerAPE and various other methods that are currently available and we discuss differences between their generated diagrams in terms of accuracy and ease of understanding for real world data. PMID:25032825

  18. eulerAPE: drawing area-proportional 3-Venn diagrams using ellipses.

    PubMed

    Micallef, Luana; Rodgers, Peter

    2014-01-01

    Venn diagrams with three curves are used extensively in various medical and scientific disciplines to visualize relationships between data sets and facilitate data analysis. The area of the regions formed by the overlapping curves is often directly proportional to the cardinality of the depicted set relation or any other related quantitative data. Drawing these diagrams manually is difficult and current automatic drawing methods do not always produce appropriate diagrams. Most methods depict the data sets as circles, as they perceptually pop out as complete distinct objects due to their smoothness and regularity. However, circles cannot draw accurate diagrams for most 3-set data and so the generated diagrams often have misleading region areas. Other methods use polygons to draw accurate diagrams. However, polygons are non-smooth and non-symmetric, so the curves are not easily distinguishable and the diagrams are difficult to comprehend. Ellipses are more flexible than circles and are similarly smooth, but none of the current automatic drawing methods use ellipses. We present eulerAPE as the first method and software that uses ellipses for automatically drawing accurate area-proportional Venn diagrams for 3-set data. We describe the drawing method adopted by eulerAPE and we discuss our evaluation of the effectiveness of eulerAPE and ellipses for drawing random 3-set data. We compare eulerAPE and various other methods that are currently available and we discuss differences between their generated diagrams in terms of accuracy and ease of understanding for real world data.

  19. Expediting Combinatorial Data Set Analysis by Combining Human and Algorithmic Analysis.

    PubMed

    Stein, Helge Sören; Jiao, Sally; Ludwig, Alfred

    2017-01-09

    A challenge in combinatorial materials science remains the efficient analysis of X-ray diffraction (XRD) data and its correlation to functional properties. Rapid identification of phase-regions and proper assignment of corresponding crystal structures is necessary to keep pace with the improved methods for synthesizing and characterizing materials libraries. Therefore, a new modular software called htAx (high-throughput analysis of X-ray and functional properties data) is presented that couples human intelligence tasks used for "ground-truth" phase-region identification with subsequent unbiased verification by an algorithm to efficiently analyze which phases are present in a materials library. Identified phases and phase-regions may then be correlated to functional properties in an expedited manner. For the functionality of htAx to be proven, two previously published XRD benchmark data sets of the materials systems Al-Cr-Fe-O and Ni-Ti-Cu are analyzed by htAx. The analysis of ∼1000 XRD patterns takes less than 1 day with htAx. The proposed method reliably identifies phase-region boundaries and robustly identifies multiphase structures. The method also addresses the problem of identifying regions with previously unpublished crystal structures using a special daisy ternary plot.

  20. ESEA: Discovering the Dysregulated Pathways based on Edge Set Enrichment Analysis

    PubMed Central

    Han, Junwei; Shi, Xinrui; Zhang, Yunpeng; Xu, Yanjun; Jiang, Ying; Zhang, Chunlong; Feng, Li; Yang, Haixiu; Shang, Desi; Sun, Zeguo; Su, Fei; Li, Chunquan; Li, Xia

    2015-01-01

    Pathway analyses are playing an increasingly important role in understanding biological mechanism, cellular function and disease states. Current pathway-identification methods generally focus on only the changes of gene expression levels; however, the biological relationships among genes are also the fundamental components of pathways, and the dysregulated relationships may also alter the pathway activities. We propose a powerful computational method, Edge Set Enrichment Analysis (ESEA), for the identification of dysregulated pathways. This provides a novel way of pathway analysis by investigating the changes of biological relationships of pathways in the context of gene expression data. Simulation studies illustrate the power and performance of ESEA under various simulated conditions. Using real datasets from p53 mutation, Type 2 diabetes and lung cancer, we validate effectiveness of ESEA in identifying dysregulated pathways. We further compare our results with five other pathway enrichment analysis methods. With these analyses, we show that ESEA is able to help uncover dysregulated biological pathways underlying complex traits and human diseases via specific use of the dysregulated biological relationships. We develop a freely available R-based tool of ESEA. Currently, ESEA can support pathway analysis of the seven public databases (KEGG; Reactome; Biocarta; NCI; SPIKE; HumanCyc; Panther). PMID:26267116

  1. Bayesian analysis of rare events

    NASA Astrophysics Data System (ADS)

    Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang

    2016-06-01

    In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.
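A minimal sketch of the rejection-sampling view of Bayesian updating that BUS builds on, applied to a toy rare-event problem, is shown below. All distributions, the measurement and the demand model are assumptions, and crude Monte Carlo stands in for FORM, importance sampling or Subset Simulation.

```python
# Toy Bayesian updating of a rare-event probability via rejection sampling.
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(1)

# Prior on an uncertain capacity parameter theta.
theta_prior = rng.normal(loc=10.0, scale=2.0, size=200_000)

# Likelihood of a hypothetical capacity measurement y_obs given theta
# (Gaussian kernel scaled so that its maximum value is 1).
y_obs, sigma_meas = 8.5, 0.5
def likelihood(theta):
    return np.exp(-0.5 * ((y_obs - theta) / sigma_meas) ** 2)

# Rejection sampling: accept theta with probability L(theta) / max(L).
accept = rng.uniform(size=theta_prior.size) < likelihood(theta_prior)
theta_post = theta_prior[accept]

# Rare event: lognormal demand S exceeds the capacity theta.
def p_failure(theta_samples):
    return np.mean(lognorm.sf(theta_samples, s=0.3, scale=np.exp(1.5)))

print(f"accepted posterior samples: {theta_post.size}")
print(f"prior P(failure)     ~ {p_failure(theta_prior):.3e}")
print(f"posterior P(failure) ~ {p_failure(theta_post):.3e}")
```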

  2. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
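The information-score idea for choosing the number of mixture components can be sketched as below with an ordinary EM Gaussian mixture and the BIC; this is only the generic concept and does not implement the genetic K-means or GEM algorithms described above. The data are synthetic.

```python
# Choosing the number of mixture components with an information score (BIC).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical multivariate data: three elongated (hyperellipsoidal) clusters.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
data = np.vstack([
    rng.multivariate_normal([0, 0], cov, 200),
    rng.multivariate_normal([6, 0], cov, 200),
    rng.multivariate_normal([3, 5], cov, 200),
])

scores = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, covariance_type="full",
                         n_init=5, random_state=0).fit(data)
    scores[k] = gm.bic(data)

best_k = min(scores, key=scores.get)
print({k: round(v, 1) for k, v in scores.items()})
print(f"number of components selected by BIC: {best_k}")
```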

  3. Data publication with the structural biology data grid supports live analysis

    DOE PAGES

    Meyer, Peter A.; Socias, Stephanie; Key, Jason; ...

    2016-03-07

    Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG; data.sbgrid.org), to preserve primary experimental data sets that support scientific publications. Data sets are accessible to researchers through a community driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures. SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets. In conclusion, it is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis.

  4. Data publication with the structural biology data grid supports live analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meyer, Peter A.; Socias, Stephanie; Key, Jason

    Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG; data.sbgrid.org), to preserve primary experimental data sets that support scientific publications. Data sets are accessible to researchers through a community driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures. SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets. In conclusion, it is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis.

  5. Data publication with the structural biology data grid supports live analysis.

    PubMed

    Meyer, Peter A; Socias, Stephanie; Key, Jason; Ransey, Elizabeth; Tjon, Emily C; Buschiazzo, Alejandro; Lei, Ming; Botka, Chris; Withrow, James; Neau, David; Rajashankar, Kanagalaghatta; Anderson, Karen S; Baxter, Richard H; Blacklow, Stephen C; Boggon, Titus J; Bonvin, Alexandre M J J; Borek, Dominika; Brett, Tom J; Caflisch, Amedeo; Chang, Chung-I; Chazin, Walter J; Corbett, Kevin D; Cosgrove, Michael S; Crosson, Sean; Dhe-Paganon, Sirano; Di Cera, Enrico; Drennan, Catherine L; Eck, Michael J; Eichman, Brandt F; Fan, Qing R; Ferré-D'Amaré, Adrian R; Fromme, J Christopher; Garcia, K Christopher; Gaudet, Rachelle; Gong, Peng; Harrison, Stephen C; Heldwein, Ekaterina E; Jia, Zongchao; Keenan, Robert J; Kruse, Andrew C; Kvansakul, Marc; McLellan, Jason S; Modis, Yorgo; Nam, Yunsun; Otwinowski, Zbyszek; Pai, Emil F; Pereira, Pedro José Barbosa; Petosa, Carlo; Raman, C S; Rapoport, Tom A; Roll-Mecak, Antonina; Rosen, Michael K; Rudenko, Gabby; Schlessinger, Joseph; Schwartz, Thomas U; Shamoo, Yousif; Sondermann, Holger; Tao, Yizhi J; Tolia, Niraj H; Tsodikov, Oleg V; Westover, Kenneth D; Wu, Hao; Foster, Ian; Fraser, James S; Maia, Filipe R N C; Gonen, Tamir; Kirchhausen, Tom; Diederichs, Kay; Crosas, Mercè; Sliz, Piotr

    2016-03-07

    Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG; data.sbgrid.org), to preserve primary experimental data sets that support scientific publications. Data sets are accessible to researchers through a community driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures. SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets. It is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis.

  6. Data publication with the structural biology data grid supports live analysis

    PubMed Central

    Meyer, Peter A.; Socias, Stephanie; Key, Jason; Ransey, Elizabeth; Tjon, Emily C.; Buschiazzo, Alejandro; Lei, Ming; Botka, Chris; Withrow, James; Neau, David; Rajashankar, Kanagalaghatta; Anderson, Karen S.; Baxter, Richard H.; Blacklow, Stephen C.; Boggon, Titus J.; Bonvin, Alexandre M. J. J.; Borek, Dominika; Brett, Tom J.; Caflisch, Amedeo; Chang, Chung-I; Chazin, Walter J.; Corbett, Kevin D.; Cosgrove, Michael S.; Crosson, Sean; Dhe-Paganon, Sirano; Di Cera, Enrico; Drennan, Catherine L.; Eck, Michael J.; Eichman, Brandt F.; Fan, Qing R.; Ferré-D'Amaré, Adrian R.; Christopher Fromme, J.; Garcia, K. Christopher; Gaudet, Rachelle; Gong, Peng; Harrison, Stephen C.; Heldwein, Ekaterina E.; Jia, Zongchao; Keenan, Robert J.; Kruse, Andrew C.; Kvansakul, Marc; McLellan, Jason S.; Modis, Yorgo; Nam, Yunsun; Otwinowski, Zbyszek; Pai, Emil F.; Pereira, Pedro José Barbosa; Petosa, Carlo; Raman, C. S.; Rapoport, Tom A.; Roll-Mecak, Antonina; Rosen, Michael K.; Rudenko, Gabby; Schlessinger, Joseph; Schwartz, Thomas U.; Shamoo, Yousif; Sondermann, Holger; Tao, Yizhi J.; Tolia, Niraj H.; Tsodikov, Oleg V.; Westover, Kenneth D.; Wu, Hao; Foster, Ian; Fraser, James S.; Maia, Filipe R. N C.; Gonen, Tamir; Kirchhausen, Tom; Diederichs, Kay; Crosas, Mercè; Sliz, Piotr

    2016-01-01

    Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG; data.sbgrid.org), to preserve primary experimental data sets that support scientific publications. Data sets are accessible to researchers through a community driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures. SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets. It is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis. PMID:26947396

  7. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    PubMed

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, it is necessary to assign that sample to its respective group. Thus, the discrimination and classification ability of coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples but not enough to be satisfactory for every group considered.
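A minimal sketch of the per-group calibration step is given below: each sample is assigned to a group and a separate PLS model is fitted per group. The spectra are synthetic, the three groups merely stand in for the ASTM clusters, and the SIMCA/LDA classification step is omitted.

```python
# Per-group PLS calibration vs. a single global calibration (synthetic data).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_wl = 120

def synth_group(n, shift):
    """Synthetic 'spectra' whose shape and property depend on the group."""
    conc = rng.uniform(0, 1, n)                       # e.g. ash-related level
    base = np.sin(np.linspace(0, 3 + shift, n_wl))
    X = np.outer(conc, base) + 0.05 * rng.normal(size=(n, n_wl))
    y = 10 + 30 * conc + 2 * shift                    # property value
    return X, y

groups = {g: synth_group(60, g) for g in range(3)}    # stand-in "ASTM" groups
X_all = np.vstack([X for X, _ in groups.values()])
y_all = np.concatenate([y for _, y in groups.values()])

def cv_rmse(X, y):
    scores = cross_val_score(PLSRegression(n_components=3), X, y,
                             scoring="neg_root_mean_squared_error", cv=5)
    return -scores.mean()

print(f"global calibration RMSE: {cv_rmse(X_all, y_all):.2f}")
for g, (X, y) in groups.items():
    print(f"group {g} calibration RMSE: {cv_rmse(X, y):.2f}")
```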

  8. Efficient visualization of urban spaces

    NASA Astrophysics Data System (ADS)

    Stamps, A. E.

    2012-10-01

    This chapter presents a new method for calculating efficiency and applies that method to the issues of selecting simulation media and evaluating the contextual fit of new buildings in urban spaces. The new method is called "meta-analysis". A meta-analytic review of 967 environments indicated that static color simulations are the most efficient media for visualizing urban spaces. For contextual fit, four original experiments are reported on how strongly five factors influence visual appeal of a street: architectural style, trees, height of a new building relative to the heights of existing buildings, setting back a third story, and distance. A meta-analysis of these four experiments and previous findings, covering 461 environments, indicated that architectural style, trees, and height had effects strong enough to warrant implementation, but the effects of setting back third stories and distance were too small to warrant implementation.

  9. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values among repeated measures ANOVA, generalized estimating equation (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportion of missing data, sample size and missing-data generation mechanism. Each data set was analyzed with three statistical methods. The influence of missing data was greater with higher proportion of missing data and smaller sample size. MMRM tended to show least changes in the statistics. When missing values were generated by 'missing not at random' mechanism, no statistical methods could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.
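A minimal sketch of fitting two of the compared models on simulated longitudinal data with missing outcomes is shown below (statsmodels); the data are simulated with missingness completely at random, the repeated-measures ANOVA arm is omitted, and the model formulas are assumptions rather than the paper's specification.

```python
# GEE and a random-intercept mixed model on simulated longitudinal data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_pat, times = 60, np.array([0, 6, 12, 24])           # months of follow-up

rows = []
for pid in range(n_pat):
    subject_effect = rng.normal(0, 2)
    for t in times:
        score = 50 + 0.8 * t + subject_effect + rng.normal(0, 3)
        rows.append({"id": pid, "time": t, "score": score})
df = pd.DataFrame(rows)

# Introduce ~20% missing outcomes (missing completely at random here).
missing = rng.uniform(size=len(df)) < 0.20
df.loc[missing, "score"] = np.nan
complete = df.dropna()

# Mixed-effects model with a random intercept per patient (MMRM-like).
mmrm = smf.mixedlm("score ~ time", complete, groups=complete["id"]).fit()
print(mmrm.params)

# GEE with an exchangeable working correlation structure.
gee = smf.gee("score ~ time", groups="id", data=complete,
              cov_struct=sm.cov_struct.Exchangeable(),
              family=sm.families.Gaussian()).fit()
print(gee.params)
```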

  10. Practical Use of Computationally Frugal Model Analysis Methods

    DOE PAGES

    Hill, Mary C.; Kavetski, Dmitri; Clark, Martyn; ...

    2015-03-21

    Computationally frugal methods of model analysis can provide substantial benefits when developing models of groundwater and other environmental systems. Model analysis includes ways to evaluate model adequacy and to perform sensitivity and uncertainty analysis. Frugal methods typically require 10s of parallelizable model runs; their convenience allows for other uses of the computational effort. We suggest that model analysis be posed as a set of questions used to organize methods that range from frugal to expensive (requiring 10,000 model runs or more). This encourages focus on method utility, even when methods have starkly different theoretical backgrounds. We note that many frugal methods are more useful when unrealistic process-model nonlinearities are reduced. Inexpensive diagnostics are identified for determining when frugal methods are advantageous. Examples from the literature are used to demonstrate local methods and the diagnostics. We suggest that the greater use of computationally frugal model analysis methods would allow questions such as those posed in this work to be addressed more routinely, allowing the environmental sciences community to obtain greater scientific insight from the many ongoing and future modeling efforts.
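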

  11. The Fungal Frontier: A Comparative Analysis of Methods Used in the Study of the Human Gut Mycobiome

    PubMed Central

    Huseyin, Chloe E.; Rubio, Raul Cabrera; O’Sullivan, Orla; Cotter, Paul D.; Scanlan, Pauline D.

    2017-01-01

    The human gut is host to a diverse range of fungal species, collectively referred to as the gut “mycobiome”. The gut mycobiome is emerging as an area of considerable research interest due to the potential roles of these fungi in human health and disease. However, there is no consensus as to what the best or most suitable methodologies available are with respect to characterizing the human gut mycobiome. The aim of this study is to provide a comparative analysis of several previously published mycobiome-specific culture-dependent and -independent methodologies, including choice of culture media, incubation conditions (aerobic versus anaerobic), DNA extraction method, primer set and freezing of fecal samples to assess their relative merits and suitability for gut mycobiome analysis. There was no significant effect of media type or aeration on culture-dependent results. However, freezing was found to have a significant effect on fungal viability, with significantly lower fungal numbers recovered from frozen samples. DNA extraction method had a significant effect on DNA yield and quality. However, freezing and extraction method did not have any impact on either α or β diversity. There was also considerable variation in the ability of different fungal-specific primer sets to generate PCR products for subsequent sequence analysis. Through this investigation two DNA extraction methods and one primer set was identified which facilitated the analysis of the mycobiome for all samples in this study. Ultimately, a diverse range of fungal species were recovered using both approaches, with Candida and Saccharomyces identified as the most common fungal species recovered using culture-dependent and culture-independent methods, respectively. As has been apparent from ecological surveys of the bacterial fraction of the gut microbiota, the use of different methodologies can also impact on our understanding of gut mycobiome composition and therefore requires careful consideration. Future research into the gut mycobiome needs to adopt a common strategy to minimize potentially confounding effects of methodological choice and to facilitate comparative analysis of datasets. PMID:28824566

  12. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    PubMed Central

    Belo, David; Gamboa, Hugo

    2017-01-01

    The paper presents an analysis of the accuracy of machine learning approaches applied to cardiac activity data. The study evaluates the possibility of diagnosing arterial hypertension by means of short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from arterial hypertension of degree II-III. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with a radial basis kernel, decision trees, and the naive Bayes classifier. Moreover, different methods of feature extraction were analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. The results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of searching for a noncorrelated feature set achieved better results than a feature set based on principal components. PMID:28831239
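A minimal sketch of comparing the named classifier families with cross-validation is given below; the features are synthetic stand-ins generated with scikit-learn, not the actual HRV data, and the sample sizes merely echo the study design.

```python
# Cross-validated comparison of the classifier families named in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 70 subjects, 53 features, two classes (healthy vs. hypertensive) as stand-ins.
X, y = make_classification(n_samples=70, n_features=53, n_informative=8,
                           n_redundant=10, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(reg_param=0.5),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(5)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:14s} accuracy = {acc.mean():.3f} +/- {acc.std():.3f}")
```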

  13. The Global Oscillation Network Group site survey. 1: Data collection and analysis methods

    NASA Technical Reports Server (NTRS)

    Hill, Frank; Fischer, George; Grier, Jennifer; Leibacher, John W.; Jones, Harrison B.; Jones, Patricia P.; Kupke, Renate; Stebbins, Robin T.

    1994-01-01

    The Global Oscillation Network Group (GONG) Project is planning to place a set of instruments around the world to observe solar oscillations as continuously as possible for at least three years. The Project has now chosen the sites that will comprise the network. This paper describes the methods of data collection and analysis that were used to make this decision. Solar irradiance data were collected with a one-minute cadence at fifteen sites around the world and analyzed to produce statistics of cloud cover, atmospheric extinction, and transparency power spectra at the individual sites. Nearly 200 reasonable six-site networks were assembled from the individual stations, and a set of statistical measures of the performance of the networks was analyzed using a principal component analysis. An accompanying paper presents the results of the survey.

  14. Axial Crushing of Thin-Walled Columns with Octagonal Section: Modeling and Design

    NASA Astrophysics Data System (ADS)

    Liu, Yucheng; Day, Michael L.

    This chapter focuses on the numerical crashworthiness analysis of straight thin-walled columns with octagonal cross sections. Two important issues in this analysis are demonstrated here: computer modeling and crashworthiness design. In the first part, this chapter introduces a method of developing simplified finite element (FE) models for the straight thin-walled octagonal columns, which can be used for the numerical crashworthiness analysis. Next, this chapter performs a crashworthiness design for such thin-walled columns in order to maximize their energy absorption capability. Specific energy absorption (SEA) is set as the design objective, the side length of the octagonal cross section and the wall thickness are selected as design variables, and the maximum crushing force (Pm) that occurs during crashes is set as the design constraint. The response surface method (RSM) is employed to formulate functions for both SEA and Pm.
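A minimal sketch of the RSM design step is given below: quadratic response surfaces for SEA and Pm are fitted over the two design variables and SEA is maximized under a Pm constraint. The "simulation" responses, variable ranges and the force limit are analytic stand-ins, not finite element results.

```python
# Response-surface fit and constrained design optimization (toy responses).
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)

# Design points: side length b [mm] and wall thickness t [mm].
b = rng.uniform(20, 60, 30)
t = rng.uniform(1.0, 3.0, 30)
X = np.column_stack([b, t])

# Hypothetical responses standing in for crash-simulation outputs.
sea = 2.0 + 0.05 * t * b - 0.0004 * b**2 + rng.normal(0, 0.05, 30)   # kJ/kg
pm = 5.0 + 1.5 * t**2 + 0.08 * b * t + rng.normal(0, 0.2, 30)        # kN

poly = PolynomialFeatures(degree=2, include_bias=True)
Z = poly.fit_transform(X)
sea_model = LinearRegression().fit(Z, sea)
pm_model = LinearRegression().fit(Z, pm)

def predict(model, x):
    return model.predict(poly.transform(np.atleast_2d(x)))[0]

pm_limit = 20.0   # assumed allowable peak crushing force
res = minimize(lambda x: -predict(sea_model, x), x0=[40.0, 2.0],
               bounds=[(20, 60), (1.0, 3.0)],
               constraints=[{"type": "ineq",
                             "fun": lambda x: pm_limit - predict(pm_model, x)}])
print(f"optimal design: b = {res.x[0]:.1f} mm, t = {res.x[1]:.2f} mm")
print(f"predicted SEA = {-res.fun:.2f}, predicted Pm = {predict(pm_model, res.x):.1f}")
```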

  15. Application of a Cloud Model-Set Pair Analysis in Hazard Assessment for Biomass Gasification Stations.

    PubMed

    Yan, Fang; Xu, Kaili

    2017-01-01

    Because a biomass gasification station includes various hazard factors, hazard assessment is needed and significant. In this article, the cloud model (CM) is employed to improve set pair analysis (SPA), and a novel hazard assessment method for a biomass gasification station is proposed based on the cloud model-set pair analysis (CM-SPA). In this method, cloud weight is proposed to be the weight of index. In contrast to the index weight of other methods, cloud weight is shown by cloud descriptors; hence, the randomness and fuzziness of cloud weight will make it effective to reflect the linguistic variables of experts. Then, the cloud connection degree (CCD) is proposed to replace the connection degree (CD); the calculation algorithm of CCD is also worked out. By utilizing the CCD, the hazard assessment results are shown by some normal clouds, and the normal clouds are reflected by cloud descriptors; meanwhile, the hazard grade is confirmed by analyzing the cloud descriptors. After that, two biomass gasification stations undergo hazard assessment via CM-SPA and AHP based SPA, respectively. The comparison of assessment results illustrates that the CM-SPA is suitable and effective for the hazard assessment of a biomass gasification station and that CM-SPA will make the assessment results more reasonable and scientific.

  16. Application of a Cloud Model-Set Pair Analysis in Hazard Assessment for Biomass Gasification Stations

    PubMed Central

    Yan, Fang; Xu, Kaili

    2017-01-01

    Because a biomass gasification station includes various hazard factors, hazard assessment is needed and significant. In this article, the cloud model (CM) is employed to improve set pair analysis (SPA), and a novel hazard assessment method for a biomass gasification station is proposed based on the cloud model-set pair analysis (CM-SPA). In this method, cloud weight is proposed to be the weight of index. In contrast to the index weight of other methods, cloud weight is shown by cloud descriptors; hence, the randomness and fuzziness of cloud weight will make it effective to reflect the linguistic variables of experts. Then, the cloud connection degree (CCD) is proposed to replace the connection degree (CD); the calculation algorithm of CCD is also worked out. By utilizing the CCD, the hazard assessment results are shown by some normal clouds, and the normal clouds are reflected by cloud descriptors; meanwhile, the hazard grade is confirmed by analyzing the cloud descriptors. After that, two biomass gasification stations undergo hazard assessment via CM-SPA and AHP based SPA, respectively. The comparison of assessment results illustrates that the CM-SPA is suitable and effective for the hazard assessment of a biomass gasification station and that CM-SPA will make the assessment results more reasonable and scientific. PMID:28076440

  17. Photothermal technique in cell microscopy studies

    NASA Astrophysics Data System (ADS)

    Lapotko, Dmitry; Chebot'ko, Igor; Kutchinsky, Georgy; Cherenkevitch, Sergey

    1995-01-01

    The photothermal (PT) method is applied to cell imaging and quantitative studies. Techniques for cell monitoring, imaging and cell viability testing are developed. The method and experimental setup for optical and PT-image acquisition and analysis are described. A dual-pulsed laser setup combined with phase-contrast illumination of a sample provides visualization of the temperature field or absorption structure of a sample with a spatial resolution of 0.5 micrometers. The experimental optics, hardware and software are designed on a modular principle, so the whole setup can be adjusted for various experiments: PT-response monitoring or photothermal spectroscopy studies. The sensitivity of the PT method enables imaging of the structural elements of live (non-stained) white blood cells. The results of experiments with normal and subnormal blood cells (red blood cells, lymphocytes, neutrophils and lymphoblasts) are reported. The obtained PT images differ from their optical analogs and deliver additional information about cell structure. Quantitative image analysis was used for comparative diagnostics of cell populations. A viability test for red blood cell differentiation is described. In a study of neutrophils from healthy subjects and from patients with sarcoidosis, differences in the PT images of the cells were found.

  18. Developing a benchmark for emotional analysis of music

    PubMed Central

    Yang, Yi-Hsuan; Soleymani, Mohammad

    2017-01-01

    The music emotion recognition (MER) field has expanded rapidly in the last decade. Many new methods and new audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, a MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2 Hz time resolution). Using DEAM, we organized the ‘Emotion in Music’ task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature-sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that the recurrent neural network based approaches combined with large feature-sets work best for dynamic MER. PMID:28282400

  19. Developing a benchmark for emotional analysis of music.

    PubMed

    Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad

    2017-01-01

    The music emotion recognition (MER) field has expanded rapidly in the last decade. Many new methods and new audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, a MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons with 2 Hz time resolution). Using DEAM, we organized the 'Emotion in Music' task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature-sets. We also describe the design of the benchmark, the evaluation procedures and the data cleaning and transformations that we suggest. The results from the benchmark suggest that the recurrent neural network based approaches combined with large feature-sets work best for dynamic MER.

  20. Hybrid least squares multivariate spectral analysis methods

    DOEpatents

    Haaland, David M.

    2002-01-01

    A set of hybrid least squares multivariate spectral analysis methods in which spectral shapes of components or effects not present in the original calibration step are added in a following estimation or calibration step to improve the accuracy of the estimation of the amount of the original components in the sampled mixture. The "hybrid" method herein means a combination of an initial classical least squares analysis calibration step with subsequent analysis by an inverse multivariate analysis method. A "spectral shape" herein means normally the spectral shape of a non-calibrated chemical component in the sample mixture but can also mean the spectral shapes of other sources of spectral variation, including temperature drift, shifts between spectrometers, spectrometer drift, etc. The "shape" can be continuous, discontinuous, or even discrete points illustrative of the particular effect.
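A minimal sketch of the "add a spectral shape" idea in a classical least squares (CLS) estimation step is shown below; the spectra are synthetic, and the full hybrid CLS/inverse-method calibration of the patent is not reproduced.

```python
# CLS estimation with and without an added spectral shape for an interferent.
import numpy as np

rng = np.random.default_rng(6)
wl = np.linspace(0, 1, 200)

def band(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

# Pure-component spectra known at calibration time.
K_cal = np.column_stack([band(0.3, 0.05), band(0.6, 0.07)])

# An interferent (or drift shape) NOT present in the calibration.
interferent = band(0.45, 0.04)

# Measured mixture spectrum: two calibrated components + interferent + noise.
true_conc = np.array([1.0, 2.0])
mixture = K_cal @ true_conc + 0.7 * interferent + 0.01 * rng.normal(size=wl.size)

# CLS estimate without the extra shape (biased by the interferent).
c_plain, *_ = np.linalg.lstsq(K_cal, mixture, rcond=None)

# CLS estimate after augmenting the spectral matrix with the extra shape.
K_aug = np.column_stack([K_cal, interferent])
c_aug, *_ = np.linalg.lstsq(K_aug, mixture, rcond=None)

print(f"true concentrations:           {true_conc}")
print(f"CLS without added shape:       {c_plain.round(3)}")
print(f"CLS with added spectral shape: {c_aug[:2].round(3)}")
```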

  1. Enhancing the Modeling of PFOA Pharmacokinetics with Bayesian Analysis

    EPA Science Inventory

    The detail sufficient to describe the pharmacokinetics (PK) for perfluorooctanoic acid (PFOA) and the methods necessary to combine information from multiple data sets are both subjects of ongoing investigation. Bayesian analysis provides tools to accommodate these goals. We exa...

  2. The Objective Borderline method (OBM): a probability-based model for setting up an objective pass/fail cut-off score in medical programme assessments.

    PubMed

    Shulruf, Boaz; Turner, Rolf; Poole, Phillippa; Wilkinson, Tim

    2013-05-01

    The decision to pass or fail a medical student is a 'high stakes' one. The aim of this study is to introduce and demonstrate the feasibility and practicality of a new objective standard-setting method for determining the pass/fail cut-off score from borderline grades. Three methods for setting up pass/fail cut-off scores were compared: the Regression Method, the Borderline Group Method, and the new Objective Borderline Method (OBM). Using Year 5 students' OSCE results from one medical school we established the pass/fail cut-off scores by the abovementioned three methods. The comparison indicated that the pass/fail cut-off scores generated by the OBM were similar to those generated by the more established methods (0.840 ≤ r ≤ 0.998; p < .0001). Based on theoretical and empirical analysis, we suggest that the OBM has advantages over existing methods in that it combines objectivity, realism, robust empirical basis and, no less importantly, is simple to use.
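A minimal sketch of the Borderline Group Method used in the comparison is given below: the cut-off is taken as the mean station score of candidates whose global rating is "borderline". The OBM itself is probability-based and is not reproduced here, and the grades and scores are simulated.

```python
# Borderline Group Method cut-off on simulated OSCE data.
import numpy as np

rng = np.random.default_rng(7)

n = 200
scores = rng.normal(65, 10, n).clip(0, 100)            # OSCE station scores
# Simulated global ratings, correlated with the score by construction.
ratings = np.where(scores < 55, "fail",
           np.where(scores < 62, "borderline",
            np.where(scores < 80, "pass", "excellent")))

cutoff = scores[ratings == "borderline"].mean()
print(f"borderline-group cut-off score: {cutoff:.1f}")
print(f"failure rate at this cut-off:   {(scores < cutoff).mean():.1%}")
```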

  3. Accuracy of pedicle screw placement based on preoperative computed tomography versus intraoperative data set acquisition for spinal navigation system.

    PubMed

    Liu, Hao; Chen, Weikai; Liu, Tao; Meng, Bin; Yang, Huilin

    2017-01-01

    To investigate the accuracy of pedicle screw placement based on preoperative computed tomography in comparison with intraoperative data set acquisition for spinal navigation systems. The PubMed (MEDLINE), EMBASE, and Web of Science databases were systematically searched for the literature published up to September 2015. This review followed the Preferred Reporting Items for Systematic Reviews and Meta-analysis guidelines. Statistical analysis was performed using Review Manager 5.3. The dichotomous data for the pedicle violation rate were summarized using relative risk (RR) and 95% confidence intervals (CIs) with the fixed-effects model. The level of significance was set at p < 0.05. For this meta-analysis, seven studies comprising a total of 579 patients and 2981 screws were included. The results revealed that the accuracy of the intraoperative data set acquisition method is significantly higher than that of the preoperative one using the 2 mm grading criteria (RR: 1.82, 95% CI: 1.09, 3.04, I² = 0%, p = 0.02). However, there was no significant difference between the two kinds of methods at the 0 mm grading criteria (RR: 1.13, 95% CI: 0.88, 1.46, I² = 17%, p = 0.34). Using the 2-mm grading criteria, pedicle screw insertion was more accurate with O-arm-assisted navigation than with the CT-based navigation method (RR: 1.96, 95% CI: 1.05, 3.64, I² = 0%, p = 0.03). The accuracy between CT-based navigation and two-dimensional-based navigation showed no significant difference (RR: 1.02, 95% CI: 0.35-3.03, I² = 0%, p = 0.97). The intraoperative data set acquisition method may decrease the incidence of perforated screws over 2 mm but not increase the number of screws fully contained within the pedicle compared to the preoperative CT-based navigation system.
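A minimal sketch of the fixed-effect pooling step is shown below: log relative risks are combined with inverse-variance weights. The 2x2 counts are invented and do not correspond to the included studies.

```python
# Fixed-effect (inverse-variance) pooling of relative risks on the log scale.
import numpy as np
from scipy.stats import norm

# (events, total) in the intraoperative and preoperative navigation arms.
studies = [((12, 200), (22, 210)),
           ((5, 150), (9, 140)),
           ((8, 180), (15, 190))]

log_rr, weights = [], []
for (e1, n1), (e0, n0) in studies:
    rr = (e1 / n1) / (e0 / n0)
    var = 1 / e1 - 1 / n1 + 1 / e0 - 1 / n0      # variance of log(RR)
    log_rr.append(np.log(rr))
    weights.append(1 / var)

log_rr, weights = np.array(log_rr), np.array(weights)
pooled = np.sum(weights * log_rr) / np.sum(weights)
se = np.sqrt(1 / np.sum(weights))
ci = np.exp(pooled + np.array([-1, 1]) * norm.ppf(0.975) * se)
p = 2 * norm.sf(abs(pooled / se))
print(f"pooled RR = {np.exp(pooled):.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, p = {p:.3f}")
```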

  4. DISCO-SCA and Properly Applied GSVD as Swinging Methods to Find Common and Distinctive Processes

    PubMed Central

    Van Deun, Katrijn; Van Mechelen, Iven; Thorrez, Lieven; Schouteden, Martijn; De Moor, Bart; van der Werf, Mariët J.; De Lathauwer, Lieven; Smilde, Age K.; Kiers, Henk A. L.

    2012-01-01

    Background In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). Results Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. Conclusions Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided. PMID:22693578

  5. Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens

    We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “submodel” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.

  6. Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models

    DOE PAGES

    Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens; ...

    2016-12-15

    We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “submodel” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
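A minimal sketch of the sub-model idea is given below: separate PLS regressions are trained on composition-restricted subsets and their predictions are blended, with the full-range model deciding where a target falls. The spectra, composition ranges and blending weights are invented and do not reproduce the ChemCam calibration.

```python
# Sub-model PLS calibration with linear blending in the overlap region.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(8)
n, n_chan = 300, 400

# Synthetic "spectra": one emission line whose response to the analyte is
# nonlinear, mimicking the matrix effects that hurt a single global model.
conc = rng.uniform(0, 100, n)                             # element abundance
line = np.exp(-0.5 * ((np.arange(n_chan) - 150) / 8) ** 2)
signal = conc * (1 + 0.008 * conc)
X = np.outer(signal, line) + 0.5 * rng.normal(size=(n, n_chan))

train, test = np.arange(250), np.arange(250, n)

def fit_pls(mask):
    return PLSRegression(n_components=4).fit(X[train][mask], conc[train][mask])

full_model = fit_pls(np.ones(train.size, dtype=bool))
low_model = fit_pls(conc[train] < 55)         # overlapping composition ranges
high_model = fit_pls(conc[train] > 45)

def blended_predict(x):
    first = full_model.predict(x).ravel()     # full model picks the sub-model
    w = np.clip((first - 45.0) / 10.0, 0.0, 1.0)
    return (1 - w) * low_model.predict(x).ravel() + w * high_model.predict(x).ravel()

rmse = lambda y, p: np.sqrt(np.mean((y - p) ** 2))
print(f"full-model RMSEP:        {rmse(conc[test], full_model.predict(X[test]).ravel()):.2f}")
print(f"blended sub-model RMSEP: {rmse(conc[test], blended_predict(X[test])):.2f}")
```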

  7. Linear Discriminant Analysis for the in Silico Discovery of Mechanism-Based Reversible Covalent Inhibitors of a Serine Protease: Application of Hydration Thermodynamics Analysis and Semi-empirical Molecular Orbital Calculation.

    PubMed

    Masuda, Yosuke; Yoshida, Tomoki; Yamaotsu, Noriyuki; Hirono, Shuichi

    2018-01-01

    We recently reported that the Gibbs free energy of hydrolytic water molecules (ΔG wat ) in acyl-trypsin intermediates calculated by hydration thermodynamics analysis could be a useful metric for estimating the catalytic rate constants (k cat ) of mechanism-based reversible covalent inhibitors. For thorough evaluation, the proposed method was tested with an increased number of covalent ligands that have no corresponding crystal structures. After modeling acyl-trypsin intermediate structures using flexible molecular superposition, ΔG wat values were calculated according to the proposed method. The orbital energies of antibonding π* molecular orbitals (MOs) of carbonyl C=O in covalently modified catalytic serine (E orb ) were also calculated by semi-empirical MO calculations. Then, linear discriminant analysis (LDA) was performed to build a model that can discriminate covalent inhibitor candidates from substrate-like ligands using ΔG wat and E orb . The model was built using a training set (10 compounds) and then validated by a test set (4 compounds). As a result, the training set and test set ligands were perfectly discriminated by the model. Hydrolysis was slower when (1) the hydrolytic water molecule has lower ΔG wat ; (2) the covalent ligand presents higher E orb (higher reaction barrier). Results also showed that the entropic term of hydrolytic water molecule (-TΔS wat ) could be used for estimating k cat and for covalent inhibitor optimization; when the rotational freedom of the hydrolytic water molecule is limited, the chance for favorable interaction with the electrophilic acyl group would also be limited. The method proposed in this study would be useful for screening and optimizing the mechanism-based reversible covalent inhibitors.
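A minimal sketch of a two-descriptor linear discriminant analysis like the one described is given below; the ΔG_wat and E_orb values and labels are invented, not the computed trypsin ligand data, and only the slower-hydrolysis trend (lower ΔG_wat, higher E_orb) is encoded.

```python
# Two-descriptor LDA separating covalent-inhibitor-like from substrate-like ligands.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Columns: [dG_wat (kcal/mol), E_orb (eV)]; label 1 = slow hydrolysis
# (inhibitor candidate), 0 = fast hydrolysis (substrate-like ligand).
X_train = np.array([[-3.5, 2.1], [-4.0, 2.4], [-3.2, 1.9], [-3.8, 2.2], [-4.2, 2.0],
                    [-1.2, 1.0], [-0.9, 0.8], [-1.5, 1.2], [-1.1, 0.9], [-1.8, 1.1]])
y_train = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

X_test = np.array([[-3.9, 2.3], [-1.0, 0.9], [-3.4, 2.0], [-1.3, 1.1]])
print("test-set predictions:", lda.predict(X_test))
print("discriminant weights:", lda.coef_.round(2), "intercept:", lda.intercept_.round(2))
```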

  8. Quantifying Effects of Pharmacological Blockers of Cardiac Autonomous Control Using Variability Parameters.

    PubMed

    Miyabara, Renata; Berg, Karsten; Kraemer, Jan F; Baltatu, Ovidiu C; Wessel, Niels; Campos, Luciana A

    2017-01-01

    Objective: The aim of this study was to identify the most sensitive heart rate and blood pressure variability (HRV and BPV) parameters from a given set of well-known methods for the quantification of cardiovascular autonomic function after several autonomic blockades. Methods: Cardiovascular sympathetic and parasympathetic functions were studied in freely moving rats following peripheral muscarinic (methylatropine), β1-adrenergic (metoprolol), muscarinic + β1-adrenergic, α1-adrenergic (prazosin), and ganglionic (hexamethonium) blockades. Time domain, frequency domain and symbolic dynamics measures for each of HRV and BPV were classified through paired Wilcoxon test for all autonomic drugs separately. In order to select those variables that have a high relevance to, and stable influence on our target measurements (HRV, BPV) we used Fisher's Method to combine the p -value of multiple tests. Results: This analysis led to the following best set of cardiovascular variability parameters: The mean normal beat-to-beat-interval/value (HRV/BPV: meanNN), the coefficient of variation (cvNN = standard deviation over meanNN) and the root mean square differences of successive (RMSSD) of the time domain analysis. In frequency domain analysis the very-low-frequency (VLF) component was selected. From symbolic dynamics Shannon entropy of the word distribution (FWSHANNON) as well as POLVAR3, the non-linear parameter to detect intermittently decreased variability, showed the best ability to discriminate between the different autonomic blockades. Conclusion: Throughout a complex comparative analysis of HRV and BPV measures altered by a set of autonomic drugs, we identified the most sensitive set of informative cardiovascular variability indexes able to pick up the modifications imposed by the autonomic challenges. These indexes may help to increase our understanding of cardiovascular sympathetic and parasympathetic functions in translational studies of experimental diseases.
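A minimal sketch of the p-value combination step with Fisher's method is shown below; the per-blockade p-values for a single index are invented.

```python
# Combining per-blockade p-values for one variability index with Fisher's method.
from scipy.stats import combine_pvalues

# Hypothetical paired-Wilcoxon p-values for one index (e.g. RMSSD).
pvals = {"methylatropine": 0.004, "metoprolol": 0.21,
         "methylatropine+metoprolol": 0.012, "prazosin": 0.35,
         "hexamethonium": 0.002}

stat, p_combined = combine_pvalues(list(pvals.values()), method="fisher")
print(f"Fisher chi-square = {stat:.2f}, combined p = {p_combined:.4g}")
```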

  9. A novel model for DNA sequence similarity analysis based on graph theory.

    PubMed

    Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan

    2011-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. As we know, during evolutionary history, not only DNA mutations for individual nucleotide but also subsequent rearrangements occurred. It has been one of major tasks of computational biologists to develop novel mathematical descriptors for similarity analysis such that various mutation phenomena information would be involved simultaneously. In this paper, different from traditional methods (eg, nucleotide frequency, geometric representations) as bases for construction of mathematical descriptors, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence, we will set up a weighted directed graph. The adjacency matrix of the directed graph will be used to induce a representative vector for DNA sequence. This new approach measures similarity based on both ordering and frequency of nucleotides so that much more information is involved. As an application, the method is tested on a set of 0.9-kb mtDNA sequences of twelve different primate species. All output phylogenetic trees with various distance estimations have the same topology, and are generally consistent with the reported results from early studies, which proves the new method's efficiency; we also test the new method on a simulated data set, which shows our new method performs better than traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
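A minimal sketch of turning a DNA sequence into a graph-based descriptor is given below: a weighted directed graph on the four nucleotides is built from successive-pair counts, its adjacency matrix is flattened into a vector, and sequences are compared by a distance between vectors. The exact weighting scheme of the paper is not reproduced, and the sequences are toy examples.

```python
# Graph-based DNA descriptor from a normalized transition (adjacency) matrix.
import numpy as np

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def adjacency_vector(seq):
    A = np.zeros((4, 4))
    for a, b in zip(seq[:-1], seq[1:]):
        A[IDX[a], IDX[b]] += 1
    A /= max(len(seq) - 1, 1)            # normalize by number of transitions
    return A.ravel()

seqs = {
    "seq1": "ATGCGTACGTTAGCATGCGTACGT",
    "seq2": "ATGCGTACGTTAGCATGCGTACGA",
    "seq3": "GGGGCCCCAAAATTTTGGGGCCCC",
}
vectors = {name: adjacency_vector(s) for name, s in seqs.items()}

names = list(seqs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        d = np.linalg.norm(vectors[a] - vectors[b])
        print(f"distance({a}, {b}) = {d:.3f}")
```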

  10. Multiway analysis methods applied to the fluorescence excitation-emission dataset for the simultaneous quantification of valsartan and amlodipine in tablets

    NASA Astrophysics Data System (ADS)

    Dinç, Erdal; Ertekin, Zehra Ceren; Büker, Eda

    2017-09-01

    In this study, excitation-emission matrix datasets, which have strong overlapping bands, were processed by using four different chemometric calibration algorithms consisting of parallel factor analysis, Tucker3, three-way partial least squares and unfolded partial least squares for the simultaneous quantitative estimation of valsartan and amlodipine besylate in tablets. In analyses, preliminary separation step was not used before the application of parallel factor analysis Tucker3, three-way partial least squares and unfolded partial least squares approaches for the analysis of the related drug substances in samples. Three-way excitation-emission matrix data array was obtained by concatenating excitation-emission matrices of the calibration set, validation set, and commercial tablet samples. The excitation-emission matrix data array was used to get parallel factor analysis, Tucker3, three-way partial least squares and unfolded partial least squares calibrations and to predict the amounts of valsartan and amlodipine besylate in samples. For all the methods, calibration and prediction of valsartan and amlodipine besylate were performed in the working concentration ranges of 0.25-4.50 μg/mL. The validity and the performance of all the proposed methods were checked by using the validation parameters. From the analysis results, it was concluded that the described two-way and three-way algorithmic methods were very useful for the simultaneous quantitative resolution and routine analysis of the related drug substances in marketed samples.
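A minimal sketch of the unfolded-PLS route is given below: each sample's excitation-emission matrix is unfolded into a long row vector and a single PLS2 model predicts both concentrations. The EEMs are synthetic, not the valsartan/amlodipine data, and the trilinear methods (PARAFAC, Tucker3, three-way PLS) are not shown.

```python
# Unfolded PLS2 calibration on synthetic excitation-emission matrices.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(9)
n_ex, n_em = 20, 40                      # excitation x emission grid

def profile(n, center, width):
    x = np.arange(n)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Pure-component EEMs (outer products of excitation and emission profiles).
eem_A = np.outer(profile(n_ex, 6, 2), profile(n_em, 15, 5))
eem_B = np.outer(profile(n_ex, 12, 3), profile(n_em, 25, 6))

def simulate(conc):
    eems = (conc[:, [0]][:, :, None] * eem_A + conc[:, [1]][:, :, None] * eem_B
            + 0.01 * rng.normal(size=(len(conc), n_ex, n_em)))
    return eems.reshape(len(conc), -1)   # unfold: samples x (ex * em)

C_cal = rng.uniform(0.25, 4.5, size=(30, 2))     # calibration concentrations
C_val = rng.uniform(0.25, 4.5, size=(10, 2))     # validation concentrations

pls = PLSRegression(n_components=4).fit(simulate(C_cal), C_cal)
pred = pls.predict(simulate(C_val))
rmsep = np.sqrt(np.mean((pred - C_val) ** 2, axis=0))
print(f"RMSEP (component A, component B): {rmsep.round(3)}")
```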

  11. A COMBINED SPECTROSCOPIC AND PHOTOMETRIC STELLAR ACTIVITY STUDY OF EPSILON ERIDANI

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giguere, Matthew J.; Fischer, Debra A.; Zhang, Cyril X. Y.

    2016-06-20

    We present simultaneous ground-based radial velocity (RV) measurements and space-based photometric measurements of the young and active K dwarf Epsilon Eridani. These measurements provide a data set for exploring methods of identifying and ultimately distinguishing stellar photospheric velocities from Keplerian motion. We compare three methods we have used in exploring this data set: Dalmatian, an MCMC spot modeling code that fits photometric and RV measurements simultaneously; the FF′ method, which uses photometric measurements to predict the stellar activity signal in simultaneous RV measurements; and Hα analysis. We show that our Hα measurements are strongly correlated with the Microvariability and Oscillations of STars telescope (MOST) photometry, which led to a promising new method based solely on the spectroscopic observations. This new method, which we refer to as the HH′ method, uses Hα measurements as input into the FF′ model. While the Dalmatian spot modeling analysis and the FF′ method with MOST space-based photometry are currently more robust, the HH′ method only makes use of one of the thousands of stellar lines in the visible spectrum. By leveraging additional spectral activity indicators, we believe the HH′ method may prove quite useful in disentangling stellar signals.

  12. The effectiveness of multi-component goal setting interventions for changing physical activity behaviour: a systematic review and meta-analysis.

    PubMed

    McEwan, Desmond; Harden, Samantha M; Zumbo, Bruno D; Sylvester, Benjamin D; Kaulius, Megan; Ruissen, Geralyn R; Dowd, A Justine; Beauchamp, Mark R

    2016-01-01

    Drawing from goal setting theory (Latham & Locke, 1991; Locke & Latham, 2002; Locke et al., 1981), the purpose of this study was to conduct a systematic review and meta-analysis of multi-component goal setting interventions for changing physical activity (PA) behaviour. A literature search returned 41,038 potential articles. Included studies consisted of controlled experimental trials wherein participants in the intervention conditions set PA goals and their PA behaviour was compared to participants in a control group who did not set goals. A meta-analysis was ultimately carried out across 45 articles (comprising 52 interventions, 126 effect sizes, n = 5912) that met eligibility criteria using a random-effects model. Overall, a medium, positive effect (Cohen's d(SE) = .552(.06), 95% CI = .43-.67, Z = 9.03, p < .001) of goal setting interventions in relation to PA behaviour was found. Moderator analyses across 20 variables revealed several noteworthy results with regard to features of the study, sample characteristics, PA goal content, and additional goal-related behaviour change techniques. In conclusion, multi-component goal setting interventions represent an effective method of fostering PA across a diverse range of populations and settings. Implications for effective goal setting interventions are discussed.
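
    The core pooling step of such a meta-analysis can be illustrated with a standard DerSimonian-Laird random-effects calculation, sketched below on made-up effect sizes. The review's actual analysis additionally handles moderators and dependent effect sizes, which are not reproduced here.

    ```python
    import numpy as np

    def random_effects_meta(d, se):
        """DerSimonian-Laird random-effects pooling of standardized mean differences.

        d, se : arrays of per-study effect sizes (Cohen's d) and their standard errors.
        Returns the pooled effect, its standard error, and a 95% confidence interval.
        """
        d, se = np.asarray(d, float), np.asarray(se, float)
        w = 1.0 / se**2                                   # fixed-effect weights
        d_fixed = np.sum(w * d) / np.sum(w)
        q = np.sum(w * (d - d_fixed) ** 2)                # Cochran's Q
        df = len(d) - 1
        c = np.sum(w) - np.sum(w**2) / np.sum(w)
        tau2 = max(0.0, (q - df) / c)                     # between-study variance
        w_star = 1.0 / (se**2 + tau2)                     # random-effects weights
        d_re = np.sum(w_star * d) / np.sum(w_star)
        se_re = np.sqrt(1.0 / np.sum(w_star))
        return d_re, se_re, (d_re - 1.96 * se_re, d_re + 1.96 * se_re)

    if __name__ == "__main__":
        effects = [0.45, 0.62, 0.30, 0.75, 0.55]          # illustrative study effects
        std_err = [0.12, 0.15, 0.10, 0.20, 0.18]
        print(random_effects_meta(effects, std_err))
    ```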

  13. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

    PubMed Central

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767
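
    The feature-selection-plus-SVM stage of such a workflow can be sketched in a few lines of scikit-learn. ReliefF is not available in scikit-learn, so a univariate ANOVA F-score filter stands in for it here, and randomly generated data stand in for the 129 lesion features; the numbers 45 and 129 follow the abstract, everything else is illustrative.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Stand-in data: 4 disease classes, 129 lesion features (texture/colour/shape).
    X, y = make_classification(n_samples=400, n_features=129, n_informative=45,
                               n_classes=4, n_clusters_per_class=1, random_state=0)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              stratify=y, random_state=0)

    # Keep the 45 highest-scoring features, then train an RBF-kernel SVM on them.
    model = make_pipeline(StandardScaler(),
                          SelectKBest(f_classif, k=45),
                          SVC(kernel="rbf", C=10, gamma="scale"))
    model.fit(X_tr, y_tr)
    print("training accuracy:", model.score(X_tr, y_tr))
    print("testing accuracy:", model.score(X_te, y_te))
    ```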

  14. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.

    PubMed

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.

  15. Quantitative analysis of Sudan dye adulteration in paprika powder using FTIR spectroscopy.

    PubMed

    Lohumi, Santosh; Joshi, Ritu; Kandpal, Lalit Mohan; Lee, Hoonsoo; Kim, Moon S; Cho, Hyunjeong; Mo, Changyeun; Seo, Young-Wook; Rahman, Anisur; Cho, Byoung-Kwan

    2017-05-01

    As adulteration of foodstuffs with Sudan dye, especially paprika- and chilli-containing products, has been reported with some frequency, this issue has become a focal point of food safety. FTIR spectroscopy has been used extensively as an analytical method for quality control and safety determination of food products. Thus, the use of FTIR spectroscopy for the rapid determination of Sudan dye in paprika powder was investigated in this study. A net analyte signal (NAS)-based methodology, named HLA/GO (hybrid linear analysis in the literature), was applied to the FTIR spectral data to predict Sudan dye concentration. Calibration and validation sets were designed to evaluate the performance of the multivariate method. The obtained results had a high coefficient of determination (R²) of 0.98 and a low root mean square error (RMSE) of 0.026% for the calibration set, and an R² of 0.97 and an RMSE of 0.05% for the validation set. The model was further validated using a second validation set and through figures of merit such as sensitivity, selectivity, and the limits of detection and quantification. The proposed technique of FTIR combined with HLA/GO is rapid, simple and low cost, making this approach advantageous compared with the main alternative methods based on liquid chromatography (LC) techniques.

  16. Seismic joint analysis for non-destructive testing of asphalt and concrete slabs

    USGS Publications Warehouse

    Ryden, N.; Park, C.B.

    2005-01-01

    A seismic approach is used to estimate the thickness and elastic stiffness constants of asphalt or concrete slabs. The overall concept of the approach utilizes the robustness of the multichannel seismic method. A multichannel-equivalent data set is compiled from multiple time series recorded from multiple hammer impacts at progressively different offsets from a fixed receiver. This multichannel simulation with one receiver (MSOR) replaces the true multichannel recording in a cost-effective and convenient manner. A recorded data set is first processed to evaluate the shear wave velocity through a wave field transformation, normally used in the multichannel analysis of surface waves (MASW) method, followed by a Lamb-wave inversion. Then, the same data set is used to evaluate compression wave velocity from a combined processing of the first-arrival picking and a linear regression. Finally, the amplitude spectra of the time series are used to evaluate the thickness by following the concepts utilized in the Impact Echo (IE) method. Due to the powerful signal extraction capabilities ensured by the multichannel processing schemes used, the entire procedure for all three evaluations can be fully automated and results can be obtained directly in the field. A field data set is used to demonstrate the proposed approach.

  17. Development of a New Optical Measuring Set-Up

    NASA Astrophysics Data System (ADS)

    Miroshnichenko, I. P.; Parinov, I. A.

    2018-06-01

    The paper describes a newly developed optical measuring set-up for the contactless recording and processing of measurements of small spatial (linear and angular) displacements of control surfaces, based on laser technologies and optical interference methods. The proposed set-up is designed to handle the measurement tasks arising in the study of the physical and mechanical properties of new materials and in diagnosing the state of structural materials by active acoustic methods of nondestructive testing. The structure of the set-up and its constituent parts are described, and the features of its construction and operation during measurements are discussed. New technical solutions for implementing the components of the set-up are presented. The purpose and description of the original specialized software are given; the software is used for a priori analysis of measurement results while measurements are performed, for a posteriori analysis of measurement results, for assessing the influence of internal and external disturbances on the measurement results, and for correcting measurement results directly as they are acquired. The technical solutions used in the set-up are protected by patents of the Russian Federation, and the software is protected by certificates of state registration of computer programs. The proposed set-up is intended for use in instrumentation, mechanical engineering, shipbuilding, aviation, the energy sector, etc.

  18. Improving information retrieval in functional analysis.

    PubMed

    Rodriguez, Juan C; González, Germán A; Fresno, Cristóbal; Llera, Andrea S; Fernández, Elmer A

    2016-12-01

    Transcriptome analysis is essential to understand the mechanisms regulating key biological processes and functions. The first step usually consists of identifying candidate genes; to find out which pathways are affected by those genes, however, functional analysis (FA) is mandatory. The most frequently used strategies for this purpose are Gene Set and Singular Enrichment Analysis (GSEA and SEA) over Gene Ontology. Several statistical methods have been developed and compared in terms of computational efficiency and/or statistical appropriateness. However, whether their results are similar or complementary, the sensitivity to parameter settings, or possible bias in the analyzed terms has not been addressed so far. Here, two GSEA and four SEA methods and their parameter combinations were evaluated in six datasets by comparing two breast cancer subtypes with well-known differences in genetic background and patient outcomes. We show that GSEA and SEA lead to different results depending on the chosen statistic, model and/or parameters. Both approaches provide complementary results from a biological perspective. Hence, an Integrative Functional Analysis (IFA) tool is proposed to improve information retrieval in FA. It provides a common gene expression analytic framework that grants a comprehensive and coherent analysis. Only a minimal user parameter setting is required, since the best SEA/GSEA alternatives are integrated. IFA utility was demonstrated by evaluating four prostate cancer and the TCGA breast cancer microarray datasets, which showed its biological generalization capabilities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Transient analysis mode participation for modal survey target mode selection using MSC/NASTRAN DMAP

    NASA Technical Reports Server (NTRS)

    Barnett, Alan R.; Ibrahim, Omar M.; Sullivan, Timothy L.; Goodnight, Thomas W.

    1994-01-01

    Many methods have been developed to aid analysts in identifying component modes which contribute significantly to component responses. These modes, typically targeted for dynamic model correlation via a modal survey, are known as target modes. Most methods used to identify target modes are based on component global dynamic behavior. It is sometimes unclear if these methods identify all modes contributing to responses important to the analyst. These responses are usually those in areas of hardware design concerns. One method used to check the completeness of target mode sets and identify modes contributing significantly to important component responses is mode participation. With this method, the participation of component modes in dynamic responses is quantified. Those modes which have high participation are likely modal survey target modes. Mode participation is most beneficial when it is used with responses from analyses simulating actual flight events. For spacecraft, these responses are generated via a structural dynamic coupled loads analysis. Using MSC/NASTRAN DMAP, a method has been developed for calculating mode participation based on transient coupled loads analysis results. The algorithm has been implemented to be compatible with an existing coupled loads methodology and has been used successfully to develop a set of modal survey target modes.

  20. Methods for Genome-Wide Analysis of Gene Expression Changes in Polyploids

    PubMed Central

    Wang, Jianlin; Lee, Jinsuk J.; Tian, Lu; Lee, Hyeon-Se; Chen, Meng; Rao, Sheetal; Wei, Edward N.; Doerge, R. W.; Comai, Luca; Jeffrey Chen, Z.

    2007-01-01

    Polyploidy is an evolutionary innovation, providing extra sets of genetic material for phenotypic variation and adaptation. It is predicted that changes of gene expression by genetic and epigenetic mechanisms are responsible for novel variation in nascent and established polyploids (Liu and Wendel, 2002; Osborn et al., 2003; Pikaard, 2001). Studying gene expression changes in allopolyploids is more complicated than in autopolyploids, because allopolyploids contain more than two sets of genomes originating from divergent, but related, species. Here we describe two methods that are applicable to the genome-wide analysis of gene expression differences resulting from genome duplication in autopolyploids or interactions between homoeologous genomes in allopolyploids. First, we describe an amplified fragment length polymorphism (AFLP)–complementary DNA (cDNA) display method that allows the discrimination of homoeologous loci based on restriction polymorphisms between the progenitors. Second, we describe microarray analyses that can be used to compare gene expression differences between the allopolyploids and respective progenitors using appropriate experimental design and statistical analysis. We demonstrate the utility of these two complementary methods and discuss the pros and cons of using the methods to analyze gene expression changes in autopolyploids and allopolyploids. Furthermore, we describe these methods in general terms to be of wider applicability for comparative gene expression in a variety of evolutionary, genetic, biological, and physiological contexts. PMID:15865985

  1. Independent Component Analysis of Textures

    NASA Technical Reports Server (NTRS)

    Manduchi, Roberto; Portilla, Javier

    2000-01-01

    A common method for texture representation is to use the marginal probability densities over the outputs of a set of multi-orientation, multi-scale filters as a description of the texture. We propose a technique, based on Independent Components Analysis, for choosing the set of filters that yield the most informative marginals, meaning that the product over the marginals most closely approximates the joint probability density function of the filter outputs. The algorithm is implemented using a steerable filter space. Experiments involving both texture classification and synthesis show that compared to Principal Components Analysis, ICA provides superior performance for modeling of natural and synthetic textures.

  2. Behavior of R-Square for Pooled Data Sets.

    ERIC Educational Resources Information Center

    Adams, Arthur J.; Shiffler, Ronald E.

    1989-01-01

    New methods of analysis--equations and graphs for iso-r² contours--were introduced and used to illustrate location effects for pooled data sets. The "r²" is the coefficient of determination. Results are used to highlight imprecise statements in the literature about the behavior of the correlation coefficient for pooled data…

  3. Multivariate analysis in the pharmaceutical industry: enabling process understanding and improvement in the PAT and QbD era.

    PubMed

    Ferreira, Ana P; Tobyn, Mike

    2015-01-01

    In the pharmaceutical industry, chemometrics is rapidly establishing itself as a tool that can be used at every step of product development and beyond: from early development to commercialization. This set of multivariate analysis methods allows the extraction of the information contained in large, complex data sets, thus contributing to increased product and process understanding, which is at the core of the Food and Drug Administration's Process Analytical Technology (PAT) Guidance for Industry and the International Conference on Harmonisation's Pharmaceutical Development guideline (Q8). This review is aimed at providing pharmaceutical industry professionals with an introduction to multivariate analysis and how it is being adopted and implemented by companies in the transition from "quality-by-testing" to "quality-by-design". It starts with an introduction to multivariate analysis and the two methods most commonly used, principal component analysis and partial least squares regression, their advantages, common pitfalls and the requirements for their effective use. That is followed by an overview of the diverse areas of application of multivariate analysis in the pharmaceutical industry: from the development of real-time analytical methods to the definition of the design space and control strategy, and from formulation optimization during development to the application of quality-by-design principles to improve the manufacture of existing commercial products.

  4. Conducting qualitative research in mental health: Thematic and content analyses.

    PubMed

    Crowe, Marie; Inder, Maree; Porter, Richard

    2015-07-01

    The objective of this paper is to describe two methods of qualitative analysis - thematic analysis and content analysis - and to examine their use in a mental health context. A description of the processes of thematic analysis and content analysis is provided. These processes are then illustrated by conducting two analyses of the same qualitative data. Transcripts of qualitative interviews are analysed using each method to illustrate these processes. The illustration of the processes highlights the different outcomes from the same set of data. Thematic and content analyses are qualitative methods that serve different research purposes. Thematic analysis provides an interpretation of participants' meanings, while content analysis is a direct representation of participants' responses. These methods provide two ways of understanding meanings and experiences and provide important knowledge in a mental health context. © The Royal Australian and New Zealand College of Psychiatrists 2015.

  5. The Integration of Linguistic Theory: Internal Reconstruction and the Comparative Method in Descriptive Linguistics.

    ERIC Educational Resources Information Center

    Bailey, Charles-James N.

    The author aims: (1) to show that generative phonology uses essentially the method of internal reconstruction which has previously been employed only in diachronic studies in setting up synchronic underlying phonological representations; (2) to show why synchronic analysis should add the comparative method to its arsenal, together with whatever…

  6. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection.

    PubMed

    Zeng, Xueqiang; Luo, Gang

    2017-12-01

    Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
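
    The progressive-sampling idea can be sketched independently of the Bayesian optimization machinery: candidate configurations are scored on growing data subsets and the weaker half is discarded at each stage, so expensive full-data evaluations are spent on few candidates. In the sketch below a small fixed candidate list and median-based filtering stand in for the paper's Bayesian search; data, fractions and candidates are all illustrative.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=4000, n_features=30, random_state=0)

    # Candidate algorithm/hyper-parameter configurations to choose between.
    candidates = [
        ("logreg C=0.1", LogisticRegression(C=0.1, max_iter=1000)),
        ("logreg C=10",  LogisticRegression(C=10, max_iter=1000)),
        ("rf 50 trees",  RandomForestClassifier(n_estimators=50, random_state=0)),
        ("rf 200 trees", RandomForestClassifier(n_estimators=200, random_state=0)),
    ]

    rng = np.random.default_rng(0)
    survivors = list(range(len(candidates)))

    # Progressive sampling: evaluate on growing subsets, dropping the weaker half
    # at each stage so full-data evaluations are only run for the best candidates.
    for frac in (0.05, 0.2, 1.0):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        scores = {i: cross_val_score(candidates[i][1], X[idx], y[idx], cv=3).mean()
                  for i in survivors}
        cutoff = np.median(list(scores.values()))
        survivors = [i for i in survivors if scores[i] >= cutoff]

    best = max(survivors, key=lambda i: scores[i])
    print("selected configuration:", candidates[best][0], "score:", round(scores[best], 3))
    ```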

  7. Level set method for image segmentation based on moment competition

    NASA Astrophysics Data System (ADS)

    Min, Hai; Wang, Xiao-Feng; Huang, De-Shuang; Jin, Jing; Wang, Hong-Zhi; Li, Hai

    2015-05-01

    We propose a level set method for image segmentation which introduces the moment competition and weakly supervised information into the energy functional construction. Different from the region-based level set methods which use force competition, the moment competition is adopted to drive the contour evolution. Here, a so-called three-point labeling scheme is proposed to manually label three independent points (weakly supervised information) on the image. Then the intensity differences between the three points and the unlabeled pixels are used to construct the force arms for each image pixel. The corresponding force is generated from the global statistical information of a region-based method and weighted by the force arm. As a result, the moment can be constructed and incorporated into the energy functional to drive the evolving contour to approach the object boundary. In our method, the force arm can take full advantage of the three-point labeling scheme to constrain the moment competition. Additionally, the global statistical information and weakly supervised information are successfully integrated, which makes the proposed method more robust than traditional methods for initial contour placement and parameter setting. Experimental results with performance analysis also show the superiority of the proposed method on segmenting different types of complicated images, such as noisy images, three-phase images, images with intensity inhomogeneity, and texture images.

  8. Study of Burn Scar Extraction Automatically Based on Level Set Method using Remote Sensing Data

    PubMed Central

    Liu, Yang; Dai, Qin; Liu, JianBo; Liu, ShiBin; Yang, Jin

    2014-01-01

    Burn scar extraction using remote sensing data is an efficient way to precisely evaluate burn area and measure vegetation recovery. Traditional burn scar extraction methodologies perform poorly on burn scar images with blurred and irregular edges. To address these issues, this paper proposes an automatic method to extract burn scars based on the Level Set Method (LSM). The method exploits the different features available in remote sensing images and considers the practical need to extract burn scars rapidly and automatically. The approach integrates Change Vector Analysis (CVA), the Normalized Difference Vegetation Index (NDVI) and the Normalized Burn Ratio (NBR) to obtain a difference image, and modifies the conventional Chan-Vese (C-V) level set model with a new initial curve derived from a binary image obtained by applying the K-means method to the fitting errors of two near-infrared band images. Landsat 5 TM and Landsat 8 OLI data sets are used to validate the proposed method. Comparisons with the conventional C-V model, the Otsu algorithm and the Fuzzy C-means (FCM) algorithm show that the proposed approach can extract the outline curve of a fire burn scar effectively and accurately. The method has higher extraction accuracy and lower algorithmic complexity than the conventional C-V model. PMID:24503563
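
    The spectral-index and initialization part of such a pipeline is easy to sketch; the level set evolution itself is omitted below. The arrays are synthetic pre-/post-fire reflectances, the difference image is a simple change-vector magnitude of NDVI and NBR changes, and K-means on that image provides the binary initial mask; none of this reproduces the paper's exact formulation.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def ndvi(nir, red):
        return (nir - red) / (nir + red + 1e-9)

    def nbr(nir, swir):
        return (nir - swir) / (nir + swir + 1e-9)

    rng = np.random.default_rng(1)
    shape = (100, 100)

    # Synthetic pre-/post-fire reflectances with a burned square in the post image.
    red_pre, nir_pre, swir_pre = (rng.uniform(0.05, 0.15, shape),
                                  rng.uniform(0.4, 0.6, shape),
                                  rng.uniform(0.1, 0.2, shape))
    red_post, nir_post, swir_post = red_pre.copy(), nir_pre.copy(), swir_pre.copy()
    nir_post[30:60, 30:60] -= 0.3          # vegetation loss in the burned patch
    swir_post[30:60, 30:60] += 0.2         # char raises short-wave infrared

    # Difference image combining the NDVI and NBR changes (change-vector magnitude).
    d_ndvi = ndvi(nir_pre, red_pre) - ndvi(nir_post, red_post)
    d_nbr = nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)
    diff = np.sqrt(d_ndvi**2 + d_nbr**2)

    # K-means with two clusters gives a binary mask usable as the initial curve
    # for a Chan-Vese evolution (the level set step itself is not shown here).
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(diff.reshape(-1, 1))
    burn_cluster = labels.reshape(shape)[45, 45]          # cluster containing the known burn
    initial_mask = (labels.reshape(shape) == burn_cluster)
    print("pixels flagged as burn scar:", int(initial_mask.sum()))
    ```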

  9. Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

    PubMed

    Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

    2015-02-01

    Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.

  10. Functional brain segmentation using inter-subject correlation in fMRI.

    PubMed

    Kauppi, Jukka-Pekka; Pajula, Juha; Niemi, Jari; Hari, Riitta; Tohka, Jussi

    2017-05-01

    The human brain continuously processes massive amounts of rich sensory information. To better understand such highly complex brain processes, modern neuroimaging studies are increasingly utilizing experimental setups that better mimic daily-life situations. A new exploratory data-analysis approach, functional segmentation inter-subject correlation analysis (FuSeISC), was proposed to facilitate the analysis of functional magnetic resonance imaging (fMRI) data sets collected in these experiments. The method provides a new type of functional segmentation of brain areas, characterizing not only areas that display similar processing across subjects but also areas in which processing across subjects is highly variable. FuSeISC was tested using fMRI data sets collected during traditional block-design stimuli (37 subjects) as well as naturalistic auditory narratives (19 subjects). The method identified spatially local and/or bilaterally symmetric clusters in several cortical areas, many of which are known to process the types of stimuli used in the experiments. The method is not only useful for spatial exploration of large fMRI data sets obtained using naturalistic stimuli, but also has other potential applications, such as the generation of functional brain atlases including both lower- and higher-order processing areas. Finally, as a part of FuSeISC, a criterion-based sparsification of the shared nearest-neighbor graph was proposed for detecting clusters in noisy data. In tests with synthetic data, this technique was superior to well-known clustering methods such as Ward's method, affinity propagation, and K-means++. Hum Brain Mapp 38:2643-2665, 2017. © 2017 Wiley Periodicals, Inc.
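
    The inter-subject correlation statistic at the heart of this approach is straightforward to compute: for each voxel, every pair of subjects' time series is correlated and the pairwise correlations are averaged. The sketch below shows only that statistic on synthetic, temporally aligned data; the functional segmentation and clustering steps of FuSeISC are not reproduced.

    ```python
    import numpy as np

    def intersubject_correlation(data):
        """Mean pairwise Pearson correlation across subjects, per voxel.

        data : array of shape (n_subjects, n_timepoints, n_voxels) of fMRI time
        series assumed to be temporally aligned across subjects.
        """
        n_sub, n_time, n_vox = data.shape
        z = (data - data.mean(axis=1, keepdims=True)) / (data.std(axis=1, keepdims=True) + 1e-12)
        isc = np.zeros(n_vox)
        n_pairs = 0
        for i in range(n_sub):
            for j in range(i + 1, n_sub):
                isc += (z[i] * z[j]).mean(axis=0)   # Pearson r for each voxel
                n_pairs += 1
        return isc / n_pairs

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        shared = rng.standard_normal((200, 1))                  # common stimulus-driven signal
        subjects = shared + 0.8 * rng.standard_normal((5, 200, 50))
        print(intersubject_correlation(subjects)[:5])
    ```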

  11. Using Trial-Based Functional Analysis to Design Effective Interventions for Students Diagnosed with Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Larkin, Wallace; Hawkins, Renee O.; Collins, Tai

    2016-01-01

    Functional behavior assessments and function-based interventions are effective methods for addressing the challenging behaviors of children; however, traditional functional analysis has limitations that impact usability in applied settings. Trial-based functional analysis addresses concerns relating to the length of time, level of expertise…

  12. Pathway analysis of high-throughput biological data within a Bayesian network framework.

    PubMed

    Isci, Senol; Ozturk, Cengizhan; Jones, Jon; Otu, Hasan H

    2011-06-15

    Most current approaches to high-throughput biological data (HTBD) analysis either perform individual gene/protein analysis or gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to causal inference. Here, we describe for the first time an algorithm that models biological pathways as BNs and identifies pathways that best explain given HTBD by scoring the fitness of each network. The proposed method takes into account the connectivity and relatedness between nodes of the pathway by factoring pathway topology into its model. Our simulations using synthetic data demonstrated the robustness of our approach. We tested the proposed method, Bayesian Pathway Analysis (BPA), on human microarray data regarding renal cell carcinoma (RCC) and compared our results with gene set enrichment analysis. BPA was able to find broader and more specific pathways related to RCC. The accompanying BPA software (BPAS) package is freely available for academic use at http://bumil.boun.edu.tr/bpa.

  13. Data preprocessing methods of FT-NIR spectral data for the classification cooking oil

    NASA Astrophysics Data System (ADS)

    Ruah, Mas Ezatul Nadia Mohd; Rasaruddin, Nor Fazila; Fong, Sim Siong; Jaafar, Mohd Zuli

    2014-12-01

    This recent work describes data pre-processing methods for FT-NIR spectroscopy datasets of cooking oils and their quality parameters using chemometric methods. Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometric modelling. Hence, this work investigates the utility and effectiveness of pre-processing algorithms, namely row scaling, column scaling and a single scaling process with Standard Normal Variate (SNV). The combinations of these scaling methods affect both exploratory analysis and classification via Principal Component Analysis (PCA) plots. The samples were divided into palm oil and non-palm cooking oils. The classification model was built using FT-NIR cooking oil spectra recorded in absorbance mode over the range 4000-14000 cm-1. A Savitzky-Golay derivative was applied before developing the classification model. The data were then separated into a training set and a test set using the Duplex method, with the number of samples in each class kept equal to 2/3 of the class with the minimum number of samples. The t-statistic was employed as a variable selection method in order to determine which variables are significant for the classification models. The data pre-processing strategies were evaluated by examining the modified silhouette width (mSW), the PCA plots and the percentage correctly classified (%CC). The results show that different data pre-processing strategies lead to substantial differences in model performance, with the effects of row scaling, column standardisation and the single scaling process with Standard Normal Variate indicated by the mSW and %CC. With a two-PC model, all five classifiers gave high %CC except Quadratic Distance Analysis.
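
    The row (SNV) and column scaling transforms that this comparison is built around are only a few lines of NumPy each; the sketch below, on toy spectra rather than the authors' FT-NIR data, also applies a Savitzky-Golay first derivative as mentioned in the abstract. Function names and parameter values are illustrative, not the authors' exact pipeline.

    ```python
    import numpy as np
    from scipy.signal import savgol_filter

    def snv(spectra):
        """Standard Normal Variate: centre and scale each spectrum (row) separately,
        which removes multiplicative scatter effects between samples."""
        mu = spectra.mean(axis=1, keepdims=True)
        sd = spectra.std(axis=1, keepdims=True)
        return (spectra - mu) / sd

    def column_standardise(spectra):
        """Column scaling: centre and scale each wavenumber (column) across samples."""
        mu = spectra.mean(axis=0, keepdims=True)
        sd = spectra.std(axis=0, keepdims=True)
        return (spectra - mu) / sd

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy "FT-NIR" spectra: 10 samples x 500 variables with sample-specific offsets.
        base = np.sin(np.linspace(0, 6, 500))
        X = base + rng.uniform(0, 0.5, (10, 1)) + 0.01 * rng.standard_normal((10, 500))

        X_d1 = savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)
        X_snv = snv(X_d1)                      # row scaling (per spectrum)
        X_both = column_standardise(X_snv)     # combined single-scaling strategy
        print(X_snv.shape, X_both.shape)
    ```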

  14. [Analyzing crude/processed root of Polygonum multiflorum from different habitats by UPLC fingerprint and mode identification methods].

    PubMed

    Xiao, Rong; Lin, Yan; Lei, Si-Min; Zhang, Ying; Huang, Jie; Xia, Bo-Hou; Li, Chun; Liao, Duan-Fang; Wu, Ping; Lin, Li-Mei

    2017-06-01

    To establish a content determination method for 2,3,5,4'-tetrahydroxy stilbene-2-O-β-D-glucoside (TSG) in the crude/processed root of Polygonum multiflorum from different habitats in China and to set up its fingerprint by using UPLC. Various samples were pretreated by macro-porous resin. UPLC analysis was then performed on a Waters ACQUITY UPLC@BEH C18 chromatographic column (2.1 mm×50 mm, 1.7 μm) at (25±5) ℃. A binary gradient elution system was composed of acetonitrile (phase A) and 0.5% acetic acid solution (phase B). Detection was performed at a wavelength of 254 nm, and the mobile phase flow rate was set at 0.3 mL•min⁻¹. The results showed that the extraction yield of 2,3,5,4'-tetrahydroxy stilbene-2-O-β-D-glucoside from the root of P. multiflorum was above 25.0% in all cases after macro-porous resin separation; an exclusive UPLC fingerprint method for the crude/processed root of P. multiflorum from different habitats was successfully set up and 17 chromatographic peaks were calibrated. Cluster analysis cannot entirely distinguish the crude root from the processed root, whereas principal component analysis can. 2,3,5,4'-tetrahydroxy stilbene-2-O-β-D-glucoside is the component with the largest difference in variable importance in projection (VIP) between the crude and processed root of P. multiflorum. The separation method yields high-purity 2,3,5,4'-tetrahydroxy stilbene-2-O-β-D-glucoside, and the determination method is simple, sensitive and reliable; it can be used for fast identification of the crude/processed root of P. multiflorum or as a method for overall quality control of the root of P. multiflorum. Copyright© by the Chinese Pharmaceutical Association.

  15. Comparison of Random Forest and Support Vector Machine classifiers using UAV remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Piragnolo, Marco; Masiero, Andrea; Pirotti, Francesco

    2017-04-01

    In recent years, surveying with unmanned aerial vehicles (UAVs) has been attracting a great amount of attention due to decreasing costs, higher precision and flexibility of use. UAVs have been applied to geomorphological investigations, forestry, precision agriculture, cultural heritage assessment and archaeological purposes, and they can be used for land use and land cover (LULC) classification. In the literature, there are two main types of approaches for the classification of remote sensing imagery: pixel-based and object-based. On the one hand, the pixel-based approach mostly uses training areas to define classes and their respective spectral signatures. On the other hand, object-based classification considers pixels, scale, spatial information and texture information to create homogeneous objects. Machine learning methods have been applied successfully for classification, and their use is increasing due to the availability of faster computing capabilities; these methods learn and train a model from previous computation. Two machine learning methods which have given good results in previous investigations are Random Forest (RF) and Support Vector Machine (SVM). The goal of this work is to compare the RF and SVM methods for classifying LULC using images collected with a fixed-wing UAV. The classification processing chain uses packages in R, an open source scripting language for data analysis, which provides all necessary algorithms. The imagery was acquired and processed in November 2015 with cameras recording red, blue, green and near-infrared reflectivity over a testing area in the Agripolis campus, Italy. Images were elaborated and ortho-rectified through Agisoft Photoscan. The ortho-rectified image is the full data set, and the test set is derived from partial sub-setting of the full data set. Different tests have been carried out, using from 2% to 20% of the total. Ten training sets and ten validation sets are obtained from each test set. The control dataset consists of an independent visual classification done by an expert over the whole area. The classes are (i) broadleaf, (ii) building, (iii) grass, (iv) headland access path, (v) road, (vi) sowed land, (vii) vegetable. RF and SVM are applied to the test set. The performance of the methods is evaluated using three accuracy metrics: the Kappa index, classification accuracy and classification error. All three are calculated in three different ways: with K-fold cross validation, using the validation test set, and using the full test set. The analysis indicates that SVM obtains better scores with K-fold cross validation or the validation test set, whereas with the full test set RF achieves a better result than SVM. It also seems that SVM performs better with smaller training sets, whereas RF performs better as training sets get larger.
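
    The study's processing chain is in R, but the comparison protocol (the two classifiers run on the same training fractions and scored with k-fold cross validation and kappa) can be mirrored in a short Python sketch. The synthetic feature matrix below merely stands in for per-pixel band values; class count, training fractions and fold number follow the abstract, everything else is illustrative.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import cohen_kappa_score, make_scorer
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.svm import SVC

    # Synthetic stand-in for per-pixel features (R, G, B, NIR and derived bands)
    # with 7 land-cover classes as in the study.
    X, y = make_classification(n_samples=5000, n_features=8, n_informative=6,
                               n_classes=7, n_clusters_per_class=1, random_state=0)

    kappa = make_scorer(cohen_kappa_score)

    for frac in (0.02, 0.10, 0.20):            # training fractions of the full set
        X_tr, _, y_tr, _ = train_test_split(X, y, train_size=frac,
                                            stratify=y, random_state=0)
        for name, clf in (("RF", RandomForestClassifier(n_estimators=200, random_state=0)),
                          ("SVM", SVC(kernel="rbf", gamma="scale"))):
            score = cross_val_score(clf, X_tr, y_tr, cv=10, scoring=kappa).mean()
            print(f"train fraction {frac:.2f}  {name}: kappa = {score:.3f}")
    ```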

  16. UpSet: Visualization of Intersecting Sets

    PubMed Central

    Lex, Alexander; Gehlenborg, Nils; Strobelt, Hendrik; Vuillemot, Romain; Pfister, Hanspeter

    2016-01-01

    Understanding relationships between sets is an important analysis task that has received widespread attention in the visualization community. The major challenge in this context is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. In this paper we introduce UpSet, a novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections. UpSet is focused on creating task-driven aggregates, communicating the size and properties of aggregates and intersections, and a duality between the visualization of the elements in a dataset and their set membership. UpSet visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes. Sorting according to various measures enables a task-driven analysis of relevant intersections and aggregates. The elements represented in the sets and their associated attributes are visualized in a separate view. Queries based on containment in specific intersections, aggregates or driven by attribute filters are propagated between both views. We also introduce several advanced visual encodings and interaction methods to overcome the problems of varying scales and to address scalability. UpSet is web-based and open source. We demonstrate its general utility in multiple use cases from various domains. PMID:26356912
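
    The combinatorial core that an UpSet matrix displays is the table of exclusive set intersections and their sizes. The short sketch below computes that table with plain Python on illustrative sets; the visualization layer (matrix layout, aggregates, queries) is what UpSet itself adds and is not reproduced here.

    ```python
    # Example sets (e.g., genes reported by different analysis methods).
    sets = {
        "A": {1, 2, 3, 4, 5},
        "B": {3, 4, 5, 6},
        "C": {5, 6, 7},
    }

    universe = set().union(*sets.values())

    # For every element, record exactly which sets contain it; the resulting
    # membership pattern defines the exclusive intersection it belongs to.
    patterns = {}
    for element in universe:
        members = frozenset(name for name, s in sets.items() if element in s)
        patterns.setdefault(members, set()).add(element)

    # This table of exclusive intersections and their sizes is what an UpSet
    # matrix layout displays (sorted here by decreasing cardinality).
    for members, elements in sorted(patterns.items(), key=lambda kv: -len(kv[1])):
        print("&".join(sorted(members)) or "(none)", "->", len(elements), sorted(elements))
    ```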

  17. Machine learning for neuroimaging with scikit-learn.

    PubMed

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
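
    A minimal decoding pipeline of the kind the review illustrates can be written in a few lines of scikit-learn. The feature matrix below is a synthetic stand-in for masked fMRI voxels with a binary behavioural label; the pipeline components and parameters are illustrative choices, not those of any particular study.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Stand-in for a subjects-x-voxels matrix with a binary behavioural label.
    X, y = make_classification(n_samples=120, n_features=2000, n_informative=50,
                               random_state=0)

    # A typical decoding pipeline: scale voxels, reduce dimension, fit a classifier.
    decoder = make_pipeline(StandardScaler(),
                            PCA(n_components=30),
                            LogisticRegression(max_iter=1000))
    scores = cross_val_score(decoder, X, y, cv=5)
    print("decoding accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
    ```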

  18. Machine learning for neuroimaging with scikit-learn

    PubMed Central

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain. PMID:24600388

  19. A Review of Feature Extraction Software for Microarray Gene Expression Data

    PubMed Central

    Tan, Ching Siang; Ting, Wai Soon; Mohamad, Mohd Saberi; Chan, Weng Howe; Deris, Safaai; Ali Shah, Zuraini

    2014-01-01

    When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. PMID:25250315

  20. Characterization of toners and inkjets by laser ablation spectrochemical methods and Scanning Electron Microscopy-Energy Dispersive X-ray Spectroscopy

    NASA Astrophysics Data System (ADS)

    Trejos, Tatiana; Corzo, Ruthmara; Subedi, Kiran; Almirall, José

    2014-02-01

    Detection and sourcing of counterfeit currency, examination of counterfeit security documents and determination of authenticity of medical records are examples of common forensic document investigations. In these cases, the physical and chemical composition of the ink entries can provide important information for the assessment of the authenticity of the document or for making inferences about common source. Previous results reported by our group have demonstrated that elemental analysis, using either Laser Ablation-Inductively Coupled Plasma-Mass Spectrometry (LA-ICP-MS) or Laser Ablation Induced Breakdown Spectroscopy (LIBS), provides an effective, practical and robust technique for the discrimination of document substrates and writing inks with minimal damage to the document. In this study, laser-based methods and Scanning Electron Microscopy-Energy Dispersive X-Ray Spectroscopy (SEM-EDS) methods were developed, optimized and validated for the forensic analysis of more complex inks such as toners and inkjets, to determine if their elemental composition can differentiate documents printed from different sources and to associate documents that originated from the same printing source. Comparison of the performance of each of these methods is presented, including the analytical figures of merit, discrimination capability and error rates. Different calibration strategies resulting in semi-quantitative and qualitative analysis, comparison methods (match criteria) and data analysis and interpretation tools were also developed. A total of 27 black laser toners originating from different manufacturing sources and/or batches were examined to evaluate the discrimination capability of each method. The results suggest that SEM-EDS offers relatively poor discrimination capability for this set (~ 70.7% discrimination of all the possible comparison pairs or a 29.3% type II error rate). Nonetheless, SEM-EDS can still be used as a complementary method of analysis since it has the advantage of being non-destructive to the sample in addition to providing imaging capabilities to further characterize toner samples by their particle morphology. Laser sampling methods resulted in an improvement of the discrimination between different sources with LIBS producing 89% discrimination and LA-ICP-MS resulting in 100% discrimination. In addition, a set of 21 black inkjet samples was examined by each method. The results show that SEM-EDS is not appropriate for inkjet examinations since their elemental composition is typically below the detection capabilities with only sulfur detected in this set, providing only 47.4% discrimination between possible comparison pairs. Laser sampling methods were shown to provide discrimination greater than 94% for this same inkjet set with false exclusion and false inclusion rates lower than 4.1% and 5.7%, for LA-ICP-MS and LIBS respectively. Overall these results confirmed the utility of the examination of printed documents by laser-based micro-spectrochemical methods. SEM-EDS analysis of toners produced a limited utility for discrimination within sources but was not an effective tool for inkjet ink discrimination. Both LA-ICP-MS and LIBS can be used in forensic laboratories to chemically characterize inks on documents and to complement the information obtained by conventional methods and enhance their evidential value.

  1. A Preliminary Rubric Design to Evaluate Mixed Methods Research

    ERIC Educational Resources Information Center

    Burrows, Timothy J.

    2013-01-01

    With the increase in frequency of the use of mixed methods, both in research publications and in externally funded grants there are increasing calls for a set of standards to assess the quality of mixed methods research. The purpose of this mixed methods study was to conduct a multi-phase analysis to create a preliminary rubric to evaluate mixed…

  2. Analyzing Interactions by an IIS-Map-Based Method in Face-to-Face Collaborative Learning: An Empirical Study

    ERIC Educational Resources Information Center

    Zheng, Lanqin; Yang, Kaicheng; Huang, Ronghuai

    2012-01-01

    This study proposes a new method named the IIS-map-based method for analyzing interactions in face-to-face collaborative learning settings. This analysis method is conducted in three steps: firstly, drawing an initial IIS-map according to collaborative tasks; secondly, coding and segmenting information flows into information items of IIS; thirdly,…

  3. MALDI-TOF mass spectrometry for quantitative gene expression analysis of acid responses in Staphylococcus aureus.

    PubMed

    Rode, Tone Mari; Berget, Ingunn; Langsrud, Solveig; Møretrø, Trond; Holck, Askild

    2009-07-01

    Microorganisms are constantly exposed to new and altered growth conditions, and respond by changing gene expression patterns. Several methods for studying gene expression exist. During the last decade, the analysis of microarrays has been one of the most common approaches applied for large scale gene expression studies. A relatively new method for gene expression analysis is MassARRAY, which combines real competitive-PCR and MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry. In contrast to microarray methods, MassARRAY technology is suitable for analysing a larger number of samples, though for a smaller set of genes. In this study we compare the results from MassARRAY with microarrays on gene expression responses of Staphylococcus aureus exposed to acid stress at pH 4.5. RNA isolated from the same stress experiments was analysed using both the MassARRAY and the microarray methods. The MassARRAY and microarray methods showed good correlation. Both MassARRAY and microarray estimated somewhat lower fold changes compared with quantitative real-time PCR (qRT-PCR). The results confirmed the up-regulation of the urease genes in acidic environments, and also indicated the importance of metal ion regulation. This study shows that the MassARRAY technology is suitable for gene expression analysis in prokaryotes, and has advantages when a set of genes is being analysed for an organism exposed to many different environmental conditions.

  4. SAFER, an Analysis Method of Quantitative Proteomic Data, Reveals New Interactors of the C. elegans Autophagic Protein LGG-1.

    PubMed

    Yi, Zhou; Manil-Ségalen, Marion; Sago, Laila; Glatigny, Annie; Redeker, Virginie; Legouis, Renaud; Mucchielli-Giorgi, Marie-Hélène

    2016-05-06

    Affinity purifications followed by mass spectrometric analysis are used to identify protein-protein interactions. Because quantitative proteomic data are noisy, it is necessary to develop statistical methods to eliminate false-positives and identify true partners. We present here a novel approach for filtering false interactors, named "SAFER" for mass Spectrometry data Analysis by Filtering of Experimental Replicates, which is based on the reproducibility of the replicates and the fold-change of the protein intensities between bait and control. To identify regulators or targets of autophagy, we characterized the interactors of LGG-1, a ubiquitin-like protein involved in autophagosome formation in C. elegans. LGG-1 partners were purified by affinity, analyzed by nanoLC-MS/MS mass spectrometry, and quantified by a label-free proteomic approach based on the mass spectrometric signal intensity of peptide precursor ions. Because the selection of confident interactions depends on the method used for statistical analysis, we compared SAFER with several statistical tests and different scoring algorithms on this set of data. We show that SAFER recovers high-confidence interactors that have been ignored by the other methods and identified new candidates involved in the autophagy process. We further validated our method on a public data set and conclude that SAFER notably improves the identification of protein interactors.
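
    A toy version of this kind of replicate-reproducibility and fold-change filter is sketched below. The thresholds, column layout and function name are illustrative assumptions and do not reproduce SAFER's actual statistic.

    ```python
    import numpy as np
    import pandas as pd

    def filter_interactors(bait, control, min_fold=2.0, max_cv=0.5):
        """Keep proteins whose bait intensities are reproducible across replicates
        and enriched over the control purification.

        bait, control : DataFrames (proteins x replicates) of label-free intensities.
        min_fold and max_cv are illustrative thresholds, not the SAFER defaults.
        """
        bait_mean = bait.mean(axis=1)
        ctrl_mean = control.mean(axis=1).replace(0, np.nan)
        fold = bait_mean / ctrl_mean
        cv = bait.std(axis=1) / bait_mean          # coefficient of variation across replicates
        keep = (fold >= min_fold) & (cv <= max_cv)
        return pd.DataFrame({"fold_change": fold, "replicate_cv": cv, "kept": keep})

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        proteins = [f"P{i}" for i in range(6)]
        bait = pd.DataFrame(rng.lognormal(6, 0.2, (6, 3)), index=proteins)
        control = pd.DataFrame(rng.lognormal(5, 0.2, (6, 3)), index=proteins)
        bait.iloc[0] *= 0.1                        # one protein clearly not enriched
        print(filter_interactors(bait, control))
    ```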

  5. A comparison of cosegregation analysis methods for the clinical setting.

    PubMed

    Rañola, John Michael O; Liu, Quanhui; Rosenthal, Elisabeth A; Shirts, Brian H

    2018-04-01

    Quantitative cosegregation analysis can help evaluate the pathogenicity of genetic variants. However, genetics professionals without statistical training often use simple methods, reporting only qualitative findings. We evaluate the potential utility of quantitative cosegregation in the clinical setting by comparing three methods. One thousand pedigrees each were simulated for benign and pathogenic variants in BRCA1 and MLH1 using United States historical demographic data to produce pedigrees similar to those seen in the clinic. These pedigrees were analyzed using two robust methods, full likelihood Bayes factors (FLB) and cosegregation likelihood ratios (CSLR), and a simpler method, counting meioses. Both FLB and CSLR outperform counting meioses when dealing with pathogenic variants, though counting meioses is not far behind. For benign variants, FLB and CSLR greatly outperform counting meioses, which is unable to generate evidence for benign variants. Comparing FLB and CSLR, we find that the two methods perform similarly, indicating that quantitative results from either of these methods could be combined in multifactorial calculations. Combining quantitative information will be important, as isolated use of cosegregation in single families will yield classification for less than 1% of variants. To encourage wider use of robust cosegregation analysis, we present a website (http://www.analyze.myvariant.org) which implements the CSLR, FLB, and counting meioses methods for ATM, BRCA1, BRCA2, CHEK2, MEN1, MLH1, MSH2, MSH6, and PMS2. We also present an R package, CoSeg, which performs the CSLR analysis on any gene with user-supplied parameters. Future variant classification guidelines should allow nuanced inclusion of cosegregation evidence against pathogenicity.
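
    Of the three approaches, only the simplest one lends itself to a one-line illustration. Under the common simplification that each informative meiosis has a 1/2 chance of showing the observed variant-phenotype combination when variant and disease are unlinked, N fully consistent meioses give a likelihood ratio of roughly 2**N in favour of co-segregation; this crude "counting meioses" rule is sketched below, whereas the FLB and CSLR methods of the paper handle penetrance and phenocopies properly and are not reproduced here.

    ```python
    def cosegregation_likelihood_ratio(informative_meioses, consistent=True):
        """Crude 'counting meioses' evidence for co-segregation.

        Assumes each informative meiosis has probability 1/2 of matching the
        observed pattern by chance, so N consistent meioses give a ratio of 2**N.
        This simplification ignores penetrance and phenocopies.
        """
        return 2 ** informative_meioses if consistent else 0.0

    if __name__ == "__main__":
        for n in (3, 5, 7):
            print(n, "informative meioses ->", cosegregation_likelihood_ratio(n))
    ```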

  6. Rapid analysis of sugars in honey by processing Raman spectrum using chemometric methods and artificial neural networks.

    PubMed

    Özbalci, Beril; Boyaci, İsmail Hakkı; Topcu, Ali; Kadılar, Cem; Tamer, Uğur

    2013-02-15

    The aim of this study was to quantify the glucose, fructose, sucrose and maltose contents of honey samples using Raman spectroscopy as a rapid method. Quantifying the individual sugars from a single measurement has been considered unfeasible because of the molecular similarities between the sugar molecules in the honey matrix. This bottleneck was overcome by coupling Raman spectroscopy with chemometric methods (principal component analysis (PCA) and partial least squares (PLS)) and an artificial neural network (ANN). Model solutions of the four sugars were processed with PCA and significant separation was observed. This operation, carried out on the spectral features using the PLS and ANN methods, allowed discriminant analysis of the sugar contents. Models/trained networks were created using a calibration data set and evaluated using a validation data set. The correlation coefficients between actual and predicted values of glucose, fructose, sucrose and maltose were 0.964, 0.965, 0.968 and 0.949 for PLS and 0.965, 0.965, 0.978 and 0.956 for ANN, respectively. The data presented in this article show that the requirement for rapid analysis of the sugar contents of commercial honeys can be met. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Application of principal component analysis to distinguish patients with schizophrenia from healthy controls based on fractional anisotropy measurements.

    PubMed

    Caprihan, A; Pearlson, G D; Calhoun, V D

    2008-08-15

    Principal component analysis (PCA) is often used to reduce the dimension of data before applying more sophisticated data analysis methods such as non-linear classification algorithms or independent component analysis. This practice is based on selecting components corresponding to the largest eigenvalues. If the ultimate goal is separation of the data into two groups, then this set of components need not have the most discriminatory power. We measured the distance between two such populations using the Mahalanobis distance and chose the eigenvectors that maximize it, a modified PCA method which we call discriminant PCA (DPCA). DPCA was applied to diffusion tensor-based fractional anisotropy images to distinguish age-matched schizophrenia subjects from healthy controls. The performance of the proposed method was evaluated by the leave-one-out method. We show that for this fractional anisotropy data set, the classification error with 60 components was close to the minimum error and that the Mahalanobis distance was twice as large with DPCA as with PCA. Finally, by masking the discriminant function with the white matter tracts of the Johns Hopkins University atlas, we identified the left superior longitudinal fasciculus as the tract giving the lowest classification error. In addition, with six optimally chosen tracts the classification error was zero.
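    A simplified sketch of the idea of selecting components for discrimination rather than variance, assuming two groups: compute the usual PCA eigenvectors, then rank them by a per-component between-group separation score instead of by eigenvalue. This is an illustrative reading of the approach, not the authors' code; the data and the separation score are stand-ins.

```python
import numpy as np

def discriminative_component_ranking(X, labels, n_keep=10):
    """Rank PCA eigenvectors by between-group separation rather than variance."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt are PCA directions
    scores = Xc @ Vt.T                                   # project onto all components
    g0, g1 = scores[labels == 0], scores[labels == 1]
    pooled_var = (g0.var(axis=0, ddof=1) + g1.var(axis=0, ddof=1)) / 2
    separation = (g0.mean(axis=0) - g1.mean(axis=0)) ** 2 / pooled_var
    order = np.argsort(separation)[::-1]                 # most discriminative first
    return Vt[order[:n_keep]], separation[order[:n_keep]]

# toy usage with hypothetical fractional-anisotropy feature vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 50)), rng.normal(0.5, 1.0, (30, 50))])
labels = np.array([0] * 30 + [1] * 30)
components, sep = discriminative_component_ranking(X, labels, n_keep=5)
print(components.shape, sep)
```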

  8. Non-matrix Matched Glass Disk Calibration Standards Improve XRF Micronutrient Analysis of Wheat Grain across Five Laboratories in India

    PubMed Central

    Guild, Georgia E.; Stangoulis, James C. R.

    2016-01-01

    Within the HarvestPlus program there are many collaborators currently using X-Ray Fluorescence (XRF) spectroscopy to measure Fe and Zn in their target crops. In India, five HarvestPlus wheat collaborators have laboratories that conduct this analysis and their throughput has increased significantly. The benefits of using XRF are its ease of use, minimal sample preparation and high throughput analysis. The lack of commercially available calibration standards has led to a need for alternative calibration arrangements for many of the instruments. Consequently, the majority of instruments have either been installed with an electronic transfer of an original grain calibration set developed by a preferred lab, or a locally supplied calibration. Unfortunately, neither of these methods has been entirely successful. The electronic transfer is unable to account for small variations between the instruments, whereas the use of a locally provided calibration set is heavily reliant on the accuracy of the reference analysis method, which is particularly difficult to achieve when analyzing low levels of micronutrient. Consequently, we have developed a calibration method that uses non-matrix matched glass disks. Here we present the validation of this method and show this calibration approach can improve the reproducibility and accuracy of whole grain wheat analysis on 5 different XRF instruments across the HarvestPlus breeding program. PMID:27375644

  9. Feature Selection for Classification of Polar Regions Using a Fuzzy Expert System

    NASA Technical Reports Server (NTRS)

    Penaloza, Mauel A.; Welch, Ronald M.

    1996-01-01

    Labeling, feature selection, and the choice of classifier are critical elements for classification of scenes and for image understanding. This study examines several methods for feature selection in polar regions, including the use of a fuzzy logic-based expert system for further refinement of a set of selected features. Six Advanced Very High Resolution Radiometer (AVHRR) Local Area Coverage (LAC) arctic scenes are classified into nine classes: water, snow / ice, ice cloud, land, thin stratus, stratus over water, cumulus over water, textured snow over water, and snow-covered mountains. Sixty-seven spectral and textural features are computed and analyzed by the feature selection algorithms. The divergence, histogram analysis, and discriminant analysis approaches are intercompared for their effectiveness in feature selection. The fuzzy expert system method is used not only to determine the effectiveness of each approach in classifying polar scenes, but also to further reduce the features into a more optimal set. For each selection method, features are ranked from best to worst, and the best half of the features are selected. Then, rules using these selected features are defined. The results of running the fuzzy expert system with these rules show that the divergence method produces the best set of features: not only does it yield the highest classification accuracy, it also has the lowest computation requirements. A reduction of the set of features produced by the divergence method using the fuzzy expert system results in an overall classification accuracy of over 95%. However, this increase in accuracy comes at a high computational cost.

  10. Experimental analysis of multi-attribute decision-making based on Atanassov intuitionistic fuzzy sets: a discussion of anchor dependency and accuracy functions

    NASA Astrophysics Data System (ADS)

    Chen, Ting-Yu

    2012-06-01

    This article presents a useful method for relating anchor dependency and accuracy functions to multiple attribute decision-making (MADM) problems in the context of Atanassov intuitionistic fuzzy sets (A-IFSs). Considering anchored judgement with displaced ideals and solution precision with minimal hesitation, several auxiliary optimisation models have been proposed to obtain the optimal weights of the attributes and to acquire the corresponding TOPSIS (the technique for order preference by similarity to the ideal solution) index for alternative rankings. Aside from the TOPSIS index, as a decision-maker's personal characteristics and perception of self may also influence the direction of choice, the evaluation of alternatives is also conducted based on the distances of each alternative from the positive and negative ideal alternatives, respectively. This article builds on Li's [Li, D.-F. (2005), 'Multiattribute Decision Making Models and Methods Using Intuitionistic Fuzzy Sets', Journal of Computer and System Sciences, 70, 73-85] work, a seminal study of intuitionistic fuzzy decision analysis using deduced auxiliary programming models, and treats it as a benchmark method for comparative studies on anchor dependency and accuracy functions. The feasibility and effectiveness of the proposed methods are illustrated by a numerical example. Finally, a comparative analysis is illustrated with computational experiments on averaging accuracy functions, TOPSIS indices, separation measures from positive and negative ideal alternatives, consistency rates of ranking orders, contradiction rates of the top alternative and average Spearman correlation coefficients.

  11. PyPathway: Python Package for Biological Network Analysis and Visualization.

    PubMed

    Xu, Yang; Luo, Xiao-Chun

    2018-05-01

    Life science studies represent one of the biggest generators of large data sets, mainly because of rapid sequencing technological advances. Biological networks, including interaction networks and human-curated pathways, are essential to understand these high-throughput data sets. Biological network analysis offers a method to explore systematically not only the molecular complexity of a particular disease but also the molecular relationships among apparently distinct phenotypes. Currently, several packages for the Python community have been developed, such as BioPython and Goatools. However, tools to perform comprehensive network analysis and visualization are still needed. Here, we have developed PyPathway, an extensible, free and open-source Python package for functional enrichment analysis, network modeling, and network visualization. The network process module supports various interaction-network and pathway databases such as Reactome, WikiPathway, STRING, and BioGRID. The network analysis module implements overrepresentation analysis, gene set enrichment analysis, network-based enrichment, and de novo network modeling. Finally, the visualization and data publishing modules enable users to share their analysis by using an easy web application. For package availability, see the first Reference.
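    Overrepresentation analysis of the kind mentioned above is conventionally a hypergeometric test. The sketch below shows that computation with SciPy rather than PyPathway's own API (which is not reproduced here); the gene counts are purely illustrative.

```python
from scipy.stats import hypergeom

def overrepresentation_p(study_hits, study_size, set_size, background_size):
    """P-value that a gene set contains at least `study_hits` of the study genes,
    under hypergeometric sampling from the background (the standard ORA test)."""
    # sf(k - 1) = P(X >= k) for X ~ Hypergeom(M=background, n=set, N=study)
    return hypergeom.sf(study_hits - 1, background_size, set_size, study_size)

# e.g. 12 of 200 differentially expressed genes fall in a 150-gene pathway
# drawn from a 20,000-gene background (illustrative numbers only)
print(overrepresentation_p(12, 200, 150, 20000))
```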

  12. High-order interactions observed in multi-task intrinsic networks are dominant indicators of aberrant brain function in schizophrenia

    PubMed Central

    Plis, Sergey M; Sui, Jing; Lane, Terran; Roy, Sushmita; Clark, Vincent P; Potluru, Vamsi K; Huster, Rene J; Michael, Andrew; Sponheim, Scott R; Weisend, Michael P; Calhoun, Vince D

    2013-01-01

    Identifying the complex activity relationships present in rich, modern neuroimaging data sets remains a key challenge for neuroscience. The problem is hard because (a) the underlying spatial and temporal networks may be nonlinear and multivariate and (b) the observed data may be driven by numerous latent factors. Further, modern experiments often produce data sets containing multiple stimulus contexts or tasks processed by the same subjects. Fusing such multi-session data sets may reveal additional structure, but raises further statistical challenges. We present a novel analysis method for extracting complex activity networks from such multifaceted imaging data sets. Compared to previous methods, we choose a new point in the trade-off space, sacrificing detailed generative probability models and explicit latent variable inference in order to achieve robust estimation of multivariate, nonlinear group factors (“network clusters”). We apply our method to identify relationships of task-specific intrinsic networks in schizophrenia patients and control subjects from a large fMRI study. After identifying network-clusters characterized by within- and between-task interactions, we find significant differences between patient and control groups in interaction strength among networks. Our results are consistent with known findings of brain regions exhibiting deviations in schizophrenic patients. However, we also find high-order, nonlinear interactions that discriminate groups but that are not detected by linear, pair-wise methods. We additionally identify high-order relationships that provide new insights into schizophrenia but that have not been found by traditional univariate or second-order methods. Overall, our approach can identify key relationships that are missed by existing analysis methods, without losing the ability to find relationships that are known to be important. PMID:23876245

  13. Computer-assisted qualitative data analysis software.

    PubMed

    Cope, Diane G

    2014-05-01

    Advances in technology have provided new approaches for data collection methods and analysis for researchers. Data collection is no longer limited to paper-and-pencil format, and numerous methods are now available through Internet and electronic resources. With these techniques, researchers are not burdened with entering data manually, and data analysis is facilitated by software programs. Quantitative research is supported by the use of computer software and provides ease in the management of large data sets and rapid analysis of numeric statistical methods. New technologies are emerging to support qualitative research with the availability of computer-assisted qualitative data analysis software (CAQDAS). CAQDAS will be presented with a discussion of advantages, limitations, controversial issues, and recommendations for this type of software use.

  14. Learning representative features for facial images based on a modified principal component analysis

    NASA Astrophysics Data System (ADS)

    Averkin, Anton; Potapov, Alexey

    2013-05-01

    The paper is devoted to facial image analysis and particularly deals with the problem of automatic evaluation of the attractiveness of human faces. We propose a new approach for automatic construction of a feature space based on a modified principal component analysis. Input data sets for the algorithm are learning data sets of facial images rated by one person. The proposed approach allows one to extract features of that individual's subjective perception of face beauty and to predict attractiveness values for new facial images that were not included in the learning data set. The Pearson correlation coefficient between the values predicted by our method for new facial images and the personal attractiveness estimation values equals 0.89. This means that the proposed approach is promising and can be used for predicting subjective face attractiveness values in real facial image analysis systems.

  15. Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice

    PubMed Central

    Aberer, Andre J.; Krompass, Denis; Stamatakis, Alexandros

    2013-01-01

    Abstract The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees. PMID:22962004

  16. Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.

    PubMed

    Aberer, Andre J; Krompass, Denis; Stamatakis, Alexandros

    2013-01-01

    The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.

  17. Mapping loci influencing blood pressure in the Framingham pedigrees using model-free LOD score analysis of a quantitative trait.

    PubMed

    Knight, Jo; North, Bernard V; Sham, Pak C; Curtis, David

    2003-12-31

    This paper presents a method of performing model-free LOD-score based linkage analysis on quantitative traits. It is implemented in the QMFLINK program. The method is used to perform a genome screen on the Framingham Heart Study data. A number of markers that show some support for linkage in our study coincide substantially with those implicated in other linkage studies of hypertension. Although the new method needs further testing on additional real and simulated data sets we can already say that it is straightforward to apply and may offer a useful complementary approach to previously available methods for the linkage analysis of quantitative traits.

  18. Mapping loci influencing blood pressure in the Framingham pedigrees using model-free LOD score analysis of a quantitative trait

    PubMed Central

    Knight, Jo; North, Bernard V; Sham, Pak C; Curtis, David

    2003-01-01

    This paper presents a method of performing model-free LOD-score based linkage analysis on quantitative traits. It is implemented in the QMFLINK program. The method is used to perform a genome screen on the Framingham Heart Study data. A number of markers that show some support for linkage in our study coincide substantially with those implicated in other linkage studies of hypertension. Although the new method needs further testing on additional real and simulated data sets we can already say that it is straightforward to apply and may offer a useful complementary approach to previously available methods for the linkage analysis of quantitative traits. PMID:14975142

  19. Curved planar reformation and optimal path tracing (CROP) method for false positive reduction in computer-aided detection of pulmonary embolism in CTPA

    NASA Astrophysics Data System (ADS)

    Zhou, Chuan; Chan, Heang-Ping; Guo, Yanhui; Wei, Jun; Chughtai, Aamer; Hadjiiski, Lubomir M.; Sundaram, Baskaran; Patel, Smita; Kuriakose, Jean W.; Kazerooni, Ella A.

    2013-03-01

    The curved planar reformation (CPR) method re-samples the vascular structures along the vessel centerline to generate longitudinal cross-section views. The CPR technique has been commonly used in coronary CTA workstations to facilitate radiologists' visual assessment of coronary diseases, but has not yet been used for pulmonary vessel analysis in CTPA due to the complicated tree structures and the vast network of pulmonary vasculature. In this study, a new curved planar reformation and optimal path tracing (CROP) method was developed to facilitate feature extraction and false positive (FP) reduction and improve our PE detection system. PE candidates are first identified in the segmented pulmonary vessels at prescreening. Based on Dijkstra's algorithm, the optimal path (OP) is traced from the pulmonary trunk bifurcation point to each PE candidate. The traced vessel is then straightened and a reformatted volume is generated using CPR. Eleven new features that characterize the intensity, gradient, and topology are extracted from the PE candidate in the CPR volume and combined with the previously developed 9 features to form a new feature space for FP classification. With IRB approval, CTPA scans of 59 PE cases were retrospectively collected from our patient files (UM set) and 69 PE cases from the PIOPED II data set with access permission. 595 and 800 PEs were manually marked by experienced radiologists as the reference standard for the UM and PIOPED sets, respectively. At a test sensitivity of 80%, the average FP rate was improved from 18.9 to 11.9 FPs/case with the new method for the PIOPED set when the UM set was used for training. The FP rate was improved from 22.6 to 14.2 FPs/case for the UM set when the PIOPED set was used for training. The improvement in the free response receiver operating characteristic (FROC) curves was statistically significant (p<0.05) by JAFROC analysis, indicating that the new features extracted from the CROP method are useful for FP reduction.

  20. Computerized multiple image analysis on mammograms: performance improvement of nipple identification for registration of multiple views using texture convergence analyses

    NASA Astrophysics Data System (ADS)

    Zhou, Chuan; Chan, Heang-Ping; Sahiner, Berkman; Hadjiiski, Lubomir M.; Paramagul, Chintana

    2004-05-01

    Automated registration of multiple mammograms for CAD depends on accurate nipple identification. We developed two new image analysis techniques based on geometric and texture convergence analyses to improve the performance of our previously developed nipple identification method. A gradient-based algorithm is used to automatically track the breast boundary. The nipple search region along the boundary is then defined by geometric convergence analysis of the breast shape. Three nipple candidates are identified by detecting the changes along the gray level profiles inside and outside the boundary and the changes in the boundary direction. A texture orientation-field analysis method is developed to estimate the fourth nipple candidate based on the convergence of the tissue texture pattern towards the nipple. The final nipple location is determined from the four nipple candidates by a confidence analysis. Our training and test data sets consisted of 419 and 368 randomly selected mammograms, respectively. The nipple location identified on each image by an experienced radiologist was used as the ground truth. For 118 of the training and 70 of the test images, the radiologist could not positively identify the nipple, but provided an estimate of its location. These were referred to as invisible nipple images. In the training data set, 89.37% (269/301) of the visible nipples and 81.36% (96/118) of the invisible nipples could be detected within 1 cm of the truth. In the test data set, 92.28% (275/298) of the visible nipples and 67.14% (47/70) of the invisible nipples were identified within 1 cm of the truth. In comparison, our previous nipple identification method without using the two convergence analysis techniques detected 82.39% (248/301), 77.12% (91/118), 89.93% (268/298) and 54.29% (38/70) of the nipples within 1 cm of the truth for the visible and invisible nipples in the training and test sets, respectively. The results indicate that the nipple on mammograms can be detected accurately. This will be an important step towards automatic multiple image analysis for CAD techniques.

  1. Validation and Application of a PCR Primer Set to Quantify Fungal Communities in the Soil Environment by Real-Time Quantitative PCR

    PubMed Central

    Chemidlin Prévost-Bouré, Nicolas; Christen, Richard; Dequiedt, Samuel; Mougel, Christophe; Lelièvre, Mélanie; Jolivet, Claudy; Shahbazkia, Hamid Reza; Guillou, Laure; Arrouays, Dominique; Ranjard, Lionel

    2011-01-01

    Fungi constitute an important group in soil biological diversity and functioning. However, characterization and knowledge of fungal communities is hampered because few primer sets are available to quantify fungal abundance by real-time quantitative PCR (real-time Q-PCR). The aim in this study was to quantify fungal abundance in soils by incorporating, into a real-time Q-PCR using the SYBRGreen® method, a primer set already used to study the genetic structure of soil fungal communities. To satisfy the real-time Q-PCR requirements to enhance the accuracy and reproducibility of the detection technique, this study focused on the 18S rRNA gene conserved regions. These regions are little affected by length polymorphism and may provide sufficiently small targets, a crucial criterion for enhancing accuracy and reproducibility of the detection technique. An in silico analysis of 33 primer sets targeting the 18S rRNA gene was performed to select the primer set with the best potential for real-time Q-PCR: short amplicon length; good fungal specificity and coverage. The best consensus between specificity, coverage and amplicon length among the 33 sets tested was the primer set FR1 / FF390. This in silico analysis of the specificity of FR1 / FF390 also provided additional information to the previously published analysis on this primer set. The specificity of the primer set FR1 / FF390 for Fungi was validated in vitro by cloning and sequencing the amplicons obtained from a real-time Q-PCR assay performed on five independent soil samples. This assay was also used to evaluate the sensitivity and reproducibility of the method. Finally, fungal abundance in samples from 24 soils with contrasting physico-chemical and environmental characteristics was examined and ranked to determine the importance of soil texture, organic carbon content, C:N ratio and land use in determining fungal abundance in soils. PMID:21931659

  2. A cluster analysis method for identification of subpopulations of cells in flow cytometric list-mode arrays

    NASA Technical Reports Server (NTRS)

    Li, Z. K.

    1985-01-01

    A specialized program was developed for flow cytometric list-mode data using a hierarchical tree method for identifying and enumerating individual subpopulations, the method of principal components for a two-dimensional display of the 6-parameter data array, and a standard sorting algorithm for characterizing subpopulations. The program was tested against a published data set subjected to cluster analysis and experimental data sets from controlled flow cytometry experiments using a Coulter Electronics EPICS V Cell Sorter. A version of the program in compiled BASIC is usable on a 16-bit microcomputer with the MS-DOS operating system. It is specialized for 6 parameters and up to 20,000 cells. Its two-dimensional display of Euclidean distances reveals clusters clearly, as does its one-dimensional display. The identified subpopulations can, in suitable experiments, be related to functional subpopulations of cells.
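    A minimal sketch of the hierarchical-tree step on list-mode-like data using SciPy; the cell counts, parameter values, and number of clusters are hypothetical, and the original program's principal-component display and sorting steps are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# hypothetical list-mode data: 500 cells x 6 parameters (scatter + fluorescence channels)
rng = np.random.default_rng(7)
cells = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(250, 6)),
    rng.normal(loc=1.5, scale=0.3, size=(250, 6)),
])

# build a hierarchical tree on the cells, then cut it to enumerate subpopulations
tree = linkage(cells, method="ward")
subpop = fcluster(tree, t=2, criterion="maxclust")

for label in np.unique(subpop):
    print(f"subpopulation {label}: {np.sum(subpop == label)} cells")
```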

  3. Pathway-based analyses.

    PubMed

    Kent, Jack W

    2016-02-03

    New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing. The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of the multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge. Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate the power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data. The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.

  4. Ergodic theory and visualization. II. Fourier mesochronic plots visualize (quasi)periodic sets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levnajić, Zoran; Mezić, Igor

    We present an application and analysis of a visualization method for measure-preserving dynamical systems introduced by I. Mezić and A. Banaszuk [Physica D 197, 101 (2004)], based on frequency analysis and Koopman operator theory. This extends our earlier work on visualization of ergodic partition [Z. Levnajić and I. Mezić, Chaos 20, 033114 (2010)]. Our method employs the concept of the Fourier time average [I. Mezić and A. Banaszuk, Physica D 197, 101 (2004)], and is realized as a computational algorithm for visualization of periodic and quasi-periodic sets in the phase space. The complement of the periodic phase space partition contains the chaotic zone, and we show how to identify it. The range of the method's applicability is illustrated using the well-known Chirikov standard map, while its potential in illuminating higher-dimensional dynamics is presented by studying the Froeschlé map and the Extended Standard Map.

  5. Ergodic theory and visualization. II. Fourier mesochronic plots visualize (quasi)periodic sets.

    PubMed

    Levnajić, Zoran; Mezić, Igor

    2015-05-01

    We present an application and analysis of a visualization method for measure-preserving dynamical systems introduced by I. Mezić and A. Banaszuk [Physica D 197, 101 (2004)], based on frequency analysis and Koopman operator theory. This extends our earlier work on visualization of ergodic partition [Z. Levnajić and I. Mezić, Chaos 20, 033114 (2010)]. Our method employs the concept of the Fourier time average [I. Mezić and A. Banaszuk, Physica D 197, 101 (2004)], and is realized as a computational algorithm for visualization of periodic and quasi-periodic sets in the phase space. The complement of the periodic phase space partition contains the chaotic zone, and we show how to identify it. The range of the method's applicability is illustrated using the well-known Chirikov standard map, while its potential in illuminating higher-dimensional dynamics is presented by studying the Froeschlé map and the Extended Standard Map.
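    A small sketch of the Fourier time average underlying these plots, applied to the Chirikov standard map: average e^{i 2πωn} g(x_n) along each orbit and inspect the magnitude over a grid of initial conditions. The map parameter K, the probe frequency ω, and the observable g are assumed values for illustration, not those of the paper.

```python
import numpy as np

K = 0.6        # standard-map nonlinearity parameter (assumed value)
OMEGA = 0.25   # probe frequency for the Fourier time average (assumed value)

def standard_map(x, p):
    """One iteration of the Chirikov standard map on the torus [0, 1)^2 (vectorized)."""
    p_new = (p + K / (2.0 * np.pi) * np.sin(2.0 * np.pi * x)) % 1.0
    x_new = (x + p_new) % 1.0
    return x_new, p_new

# grid of initial conditions covering the phase space
grid = np.linspace(0.0, 1.0, 100, endpoint=False)
x, p = np.meshgrid(grid, grid)

# Fourier time average (1/N) * sum_n exp(i 2 pi OMEGA n) g(x_n, p_n) with g = exp(i 2 pi x).
# Orbits on which g oscillates at frequency OMEGA give a large magnitude (periodic or
# quasi-periodic sets); magnitudes near zero flag the chaotic zone.
n_steps = 2000
avg = np.zeros_like(x, dtype=complex)
for n in range(n_steps):
    avg += np.exp(2j * np.pi * OMEGA * n) * np.exp(2j * np.pi * x)
    x, p = standard_map(x, p)
avg = np.abs(avg) / n_steps

print(avg.shape)   # a (100, 100) field one could plot as a mesochronic-style picture
```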

  6. New insights into old methods for identifying causal rare variants.

    PubMed

    Wang, Haitian; Huang, Chien-Hsun; Lo, Shaw-Hwa; Zheng, Tian; Hu, Inchi

    2011-11-29

    The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.

  7. The bench scientist's guide to RNA-Seq analysis

    USDA-ARS?s Scientific Manuscript database

    RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatic specialists. Here we outline a methods strategy desi...

  8. A Program for Automatic Generation of Dimensionless Parameters.

    ERIC Educational Resources Information Center

    Hundal, M. S.

    1982-01-01

    Following a review of the theory of dimensional analysis, presents a method for generating all of the possible sets of nondimensional parameters for a given problem, a digital computer program to implement the method, and a mechanical design problem to illustrate its use. (Author/JN)
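    One standard way to generate a complete set of independent dimensionless groups is to take a basis of the nullspace of the dimension matrix. The sketch below does this with SymPy for a hypothetical drag problem; the variables and exponents are illustrative, and the cited program's own algorithm may differ.

```python
from sympy import Matrix

# Dimension matrix for a hypothetical drag problem: columns are the variables
# F, rho, v, L, mu and rows are the exponents of the base dimensions M, L, T.
dim_matrix = Matrix([
    [1,  1,  0, 0,  1],   # mass
    [1, -3,  1, 1, -1],   # length
    [-2, 0, -1, 0, -1],   # time
])

# Each nullspace vector holds the exponents of one dimensionless (pi) group;
# any basis of the nullspace is a valid set (here 5 variables minus rank 3
# gives 2 groups, e.g. a drag coefficient and a Reynolds number).
for vec in dim_matrix.nullspace():
    print(vec.T)
```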

  9. Costing Alternative Birth Settings for Women at Low Risk of Complications: A Systematic Review

    PubMed Central

    Scarf, Vanessa; Catling, Christine; Viney, Rosalie; Homer, Caroline

    2016-01-01

    Background There is demand from women for alternatives to giving birth in a standard hospital setting; however, access to these services is limited. This systematic review examines the literature relating to economic evaluations of birth settings for women at low risk of complications. Methods Searches to identify economic evaluations of different birth settings were conducted in the following electronic databases: MEDLINE, CINAHL, EconLit, Business Source Complete and Maternity and Infant Care. Relevant English language publications between 1995 and 2015 were chosen using keywords and MeSH terms. Inclusion criteria included studies focussing on the comparison of birth settings. Data were extracted with respect to study design, perspective, PICO principles, and resource use and cost data. Results Eleven studies were included from Australia, Canada, the Netherlands, Norway, the USA, and the UK. Four studies compared costs between homebirth and the hospital setting and the remaining seven focussed on the cost of birth centre care versus the hospital setting. Six studies used a cost-effectiveness analysis and the remaining five used cost analysis and cost comparison methods. Eight of the 11 studies found a cost saving in the alternative settings. Two found no difference in cost between the settings and one found an increased cost for birth centre care. Conclusions There are few studies that compare the cost of birth settings. The variation in the results may be attributable to the cost data collection processes, differences in health systems and differences in which costs were included. A better understanding of the cost of birth settings is needed to inform policy makers and service providers. PMID:26891444

  10. Spectral Regression Discriminant Analysis for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Pan, Y.; Wu, J.; Huang, H.; Liu, J.

    2012-08-01

    Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for hyperspectral image classification. Manifold learning methods such as Locally Linear Embedding, Isomap, and Laplacian Eigenmaps are popular for dimensionality reduction. However, a disadvantage of many manifold learning methods is that their computations usually involve eigen-decomposition of dense matrices, which is expensive in both time and memory. In this paper, we introduce a new dimensionality reduction method, called Spectral Regression Discriminant Analysis (SRDA). SRDA casts the problem of learning an embedding function into a regression framework, which avoids eigen-decomposition of dense matrices. Also, within the regression-based framework, different kinds of regularizers can be naturally incorporated into our algorithm, which makes it more flexible. It can make efficient use of data points to discover the intrinsic discriminant structure in the data. Experimental results on the Washington DC Mall and AVIRIS Indian Pines hyperspectral data sets demonstrate the effectiveness of the proposed method.
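    A simplified sketch of the spectral-regression idea, assuming labeled training pixels: construct class-indicator response vectors and fit them with ridge regression, so no dense eigen-decomposition is needed. The data, the regularizer, and the response construction are illustrative simplifications, not the SRDA algorithm verbatim.

```python
import numpy as np
from sklearn.linear_model import Ridge

def spectral_regression_projections(X, labels, alpha=1.0):
    """Fit centered class-indicator responses with ridge regression; the fitted
    coefficient vectors act as discriminant projection axes."""
    classes = np.unique(labels)
    Y = np.stack([(labels == c).astype(float) for c in classes], axis=1)
    Y -= Y.mean(axis=0)                      # remove the constant component
    ridge = Ridge(alpha=alpha, fit_intercept=True)
    ridge.fit(X, Y)
    return ridge.coef_.T                     # (n_features, n_classes) projection matrix

# toy hyperspectral-like data: 100 pixels, 50 bands, 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
labels = rng.integers(0, 3, size=100)
W = spectral_regression_projections(X, labels)
embedded = X @ W                              # reduced representation for a classifier
print(embedded.shape)                         # (100, 3)
```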

  11. Gene integrated set profile analysis: a context-based approach for inferring biological endpoints

    PubMed Central

    Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.

    2016-01-01

    The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis), for defining the respective genes and samples within the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710

  12. Regression with Small Data Sets: A Case Study using Code Surrogates in Additive Manufacturing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamath, C.; Fan, Y. J.

    There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. While it is easy to collect such large data sets in some application domains, there are others where collecting even a single data point can be very expensive, so the resulting data sets have only tens or hundreds of samples. For example, when complex computer simulations are used to understand a scientific phenomenon, we want to run the simulation for many different values of the input parameters and analyze the resulting output. The data set relating the simulation inputs and outputs is typically quite small, especially when each run of the simulation is expensive. However, regression techniques can still be used on such data sets to build an inexpensive "surrogate" that could provide an approximate output for a given set of inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments. In this paper, we compare different regression techniques to determine how well they predict melt-pool characteristics in the problem domain of additive manufacturing. Our analysis indicates that some of the commonly used regression methods do perform quite well even on small data sets.
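    A minimal sketch of comparing regression surrogates on a small data set, using leave-one-out cross-validation so that every sample contributes to the error estimate. The models, synthetic data, and error metric are assumptions for illustration, not those of the study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

# hypothetical small surrogate-modeling data set: 40 simulation runs,
# 5 input parameters, one output characteristic
rng = np.random.default_rng(42)
X = rng.uniform(size=(40, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=40)

models = {
    "ridge": Ridge(alpha=1.0),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gaussian process": GaussianProcessRegressor(),
}

# leave-one-out cross-validation makes the most of a small sample
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.4f}")
```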

  13. Learning to recognize rat social behavior: Novel dataset and cross-dataset application.

    PubMed

    Lorbach, Malte; Kyriakou, Elisavet I; Poppe, Ronald; van Dam, Elsbeth A; Noldus, Lucas P J J; Veltkamp, Remco C

    2018-04-15

    Social behavior is an important aspect of rodent models. Automated measuring tools that make use of video analysis and machine learning are an increasingly attractive alternative to manual annotation. Because machine learning-based methods need to be trained, it is important that they are validated using data from different experiment settings. To develop and validate automated measuring tools, there is a need for annotated rodent interaction datasets. Currently, the availability of such datasets is limited to two mouse datasets. We introduce the first, publicly available rat social interaction dataset, RatSI. We demonstrate the practical value of the novel dataset by using it as the training set for a rat interaction recognition method. We show that behavior variations induced by the experiment setting can lead to reduced performance, which illustrates the importance of cross-dataset validation. Consequently, we add a simple adaptation step to our method and improve the recognition performance. Most existing methods are trained and evaluated in one experimental setting, which limits the predictive power of the evaluation to that particular setting. We demonstrate that cross-dataset experiments provide more insight in the performance of classifiers. With our novel, public dataset we encourage the development and validation of automated recognition methods. We are convinced that cross-dataset validation enhances our understanding of rodent interactions and facilitates the development of more sophisticated recognition methods. Combining them with adaptation techniques may enable us to apply automated recognition methods to a variety of animals and experiment settings. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Signal classification using global dynamical models, Part II: SONAR data analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kremliovsky, M.; Kadtke, J.

    1996-06-01

    In Part I of this paper, we described a numerical method for nonlinear signal detection and classification which made use of techniques borrowed from dynamical systems theory. Here in Part II of the paper, we will describe an example of data analysis using this method, for data consisting of open ocean acoustic (SONAR) recordings of marine mammal transients, supplied from NUWC sources. The purpose here is two-fold: first to give a more operational description of the technique and provide rules-of-thumb for parameter choices; and second to discuss some new issues raised by the analysis of non-ideal (real-world) data sets. The particular data set considered here is quite non-stationary, relatively noisy, is not clearly localized in the background, and as such provides a difficult challenge for most detection/classification schemes. © 1996 American Institute of Physics.

  15. Pyrcca: Regularized Kernel Canonical Correlation Analysis in Python and Its Applications to Neuroimaging.

    PubMed

    Bilenko, Natalia Y; Gallant, Jack L

    2016-01-01

    In this article we introduce Pyrcca, an open-source Python package for performing canonical correlation analysis (CCA). CCA is a multivariate analysis method for identifying relationships between sets of variables. Pyrcca supports CCA with or without regularization, and with or without linear, polynomial, or Gaussian kernelization. We first use an abstract example to describe Pyrcca functionality. We then demonstrate how Pyrcca can be used to analyze neuroimaging data. Specifically, we use Pyrcca to implement cross-subject comparison in a natural movie functional magnetic resonance imaging (fMRI) experiment by finding a data-driven set of functional response patterns that are similar across individuals. We validate this cross-subject comparison method in Pyrcca by predicting responses to novel natural movies across subjects. Finally, we show how Pyrcca can reveal retinotopic organization in brain responses to natural movies without the need for an explicit model.
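    Pyrcca's own interface is not reproduced here; the sketch below uses scikit-learn's CCA on two simulated response matrices sharing a latent signal, simply to illustrate the kind of cross-set correlation the package computes. The dimensions and noise level are arbitrary choices.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# two hypothetical "subjects": response matrices (time points x voxels)
# driven by a shared latent signal plus independent noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.5 * rng.normal(size=(200, 50))
Y = latent @ rng.normal(size=(3, 40)) + 0.5 * rng.normal(size=(200, 40))

cca = CCA(n_components=3)
X_c, Y_c = cca.fit_transform(X, Y)

# canonical correlations: how similar the shared response patterns are
for i in range(3):
    print(f"component {i}: r = {np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]:.3f}")
```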

  16. Pyrcca: Regularized Kernel Canonical Correlation Analysis in Python and Its Applications to Neuroimaging

    PubMed Central

    Bilenko, Natalia Y.; Gallant, Jack L.

    2016-01-01

    In this article we introduce Pyrcca, an open-source Python package for performing canonical correlation analysis (CCA). CCA is a multivariate analysis method for identifying relationships between sets of variables. Pyrcca supports CCA with or without regularization, and with or without linear, polynomial, or Gaussian kernelization. We first use an abstract example to describe Pyrcca functionality. We then demonstrate how Pyrcca can be used to analyze neuroimaging data. Specifically, we use Pyrcca to implement cross-subject comparison in a natural movie functional magnetic resonance imaging (fMRI) experiment by finding a data-driven set of functional response patterns that are similar across individuals. We validate this cross-subject comparison method in Pyrcca by predicting responses to novel natural movies across subjects. Finally, we show how Pyrcca can reveal retinotopic organization in brain responses to natural movies without the need for an explicit model. PMID:27920675

  17. Kaplan-Meier survival analysis overestimates cumulative incidence of health-related events in competing risk settings: a meta-analysis.

    PubMed

    Lacny, Sarah; Wilson, Todd; Clement, Fiona; Roberts, Derek J; Faris, Peter; Ghali, William A; Marshall, Deborah A

    2018-01-01

    Kaplan-Meier survival analysis overestimates cumulative incidence in competing risks (CRs) settings. The extent of overestimation (or its clinical significance) has been questioned, and CRs methods are infrequently used. This meta-analysis compares the Kaplan-Meier method to the cumulative incidence function (CIF), a CRs method. We searched MEDLINE, EMBASE, BIOSIS Previews, Web of Science (1992-2016), and article bibliographies for studies estimating cumulative incidence using the Kaplan-Meier method and CIF. For studies with sufficient data, we calculated pooled risk ratios (RRs) comparing Kaplan-Meier and CIF estimates using DerSimonian and Laird random effects models. We performed stratified meta-analyses by clinical area, rate of CRs (CRs/events of interest), and follow-up time. Of 2,192 identified abstracts, we included 77 studies in the systematic review and meta-analyzed 55. The pooled RR demonstrated the Kaplan-Meier estimate was 1.41 [95% confidence interval (CI): 1.36, 1.47] times higher than the CIF. Overestimation was highest among studies with high rates of CRs [RR = 2.36 (95% CI: 1.79, 3.12)], studies related to hepatology [RR = 2.60 (95% CI: 2.12, 3.19)], and obstetrics and gynecology [RR = 1.84 (95% CI: 1.52, 2.23)]. The Kaplan-Meier method overestimated the cumulative incidence across 10 clinical areas. Using CRs methods will ensure that accurate results inform clinical and policy decisions. Copyright © 2017 Elsevier Inc. All rights reserved.
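    The discrepancy this meta-analysis quantifies can be reproduced on toy data by comparing 1 − KM (with competing events treated as censored) against the cumulative incidence function. The hand-rolled estimators and event times below are illustrative only and ignore ties.

```python
import numpy as np

def km_vs_cif(times, events):
    """Compare 1 - KM (competing events censored) with the cumulative incidence
    function (CIF) for event type 1.  events: 0 = censored, 1 = event of
    interest, 2 = competing event.  Assumes untied event times."""
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    n_at_risk = len(times)
    surv_all = 1.0     # overall KM survival (any event), feeds the CIF
    surv_cause = 1.0   # cause-specific KM that censors competing events
    cif = 0.0
    for e in events:
        if e == 1:
            cif += surv_all / n_at_risk           # S(t-) * hazard of cause 1 at t
            surv_cause *= 1.0 - 1.0 / n_at_risk
        if e in (1, 2):
            surv_all *= 1.0 - 1.0 / n_at_risk
        n_at_risk -= 1
    return 1.0 - surv_cause, cif

# illustrative data in which competing events (2) are common
times  = [2, 3, 5, 6, 7, 8, 9, 10, 11, 12]
events = [1, 2, 2, 1, 2, 0, 1, 2,  0,  0]
km_est, cif_est = km_vs_cif(times, events)
print(f"1 - KM = {km_est:.3f}, CIF = {cif_est:.3f}")   # the KM estimate is larger
```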

  18. Pathway results from the chicken data set using GOTM, Pathway Studio and Ingenuity softwares

    PubMed Central

    Bonnet, Agnès; Lagarrigue, Sandrine; Liaubet, Laurence; Robert-Granié, Christèle; SanCristobal, Magali; Tosser-Klopp, Gwenola

    2009-01-01

    Background As presented in the introduction paper, three sets of differentially regulated genes were found after the analysis of the chicken infection data set from EADGENE. Different methods were used to interpret these results. Results GOTM, Pathway Studio and Ingenuity software were used to investigate the three lists of genes. The three software packages allowed analysis of the data and highlighted different networks. However, only one set of genes, showing differential expression between the primary and secondary response, gave a significant biological interpretation. Conclusion Combining these databases, which were developed independently from different annotation sources, supplies a useful tool for a global biological interpretation of microarray data, even if they may contain some imperfections (e.g. genes that are not annotated or are poorly annotated). PMID:19615111

  19. Small vulnerable sets determine large network cascades in power grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Yang; Nishikawa, Takashi; Motter, Adilson E.

    The understanding of cascading failures in complex systems has been hindered by the lack of realistic large-scale modeling and analysis that can account for variable system conditions. By using the North American power grid, we identified, quantified, and analyzed the set of network components that are vulnerable to cascading failures under any out of multiple conditions. We show that the vulnerable set consists of a small but topologically central portion of the network and that large cascades are disproportionately more likely to be triggered by initial failures close to this set. These results elucidate aspects of the origins and causes of cascading failures relevant for grid design and operation and demonstrate vulnerability analysis methods that are applicable to a wider class of cascade-prone networks.

  20. Eigenvalue-eigenvector decomposition (EED) analysis of dissimilarity and covariance matrix obtained from total synchronous fluorescence spectral (TSFS) data sets of herbal preparations: Optimizing the classification approach

    NASA Astrophysics Data System (ADS)

    Tarai, Madhumita; Kumar, Keshav; Divya, O.; Bairi, Partha; Mishra, Kishor Kumar; Mishra, Ashok Kumar

    2017-09-01

    The present work compares the dissimilarity- and covariance-based unsupervised chemometric classification approaches by taking the total synchronous fluorescence spectroscopy data sets acquired for the cumin and non-cumin-based herbal preparations. The conventional decomposition method involves eigenvalue-eigenvector analysis of the covariance of the data set and finds the factors that can explain the overall major sources of variation present in the data set. The conventional approach does this irrespective of the fact that the samples belong to intrinsically different groups and hence leads to poor class separation. The present work shows that classification of such samples can be optimized by performing the eigenvalue-eigenvector decomposition on the pair-wise dissimilarity matrix.
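    A minimal sketch of the two decomposition routes compared above: eigen-decompose the sample-space covariance of the mean-centered data versus the pairwise dissimilarity matrix, and inspect the leading sample scores. The data and the choice of squared-Euclidean dissimilarity are assumptions for illustration, not the paper's exact preprocessing.

```python
import numpy as np

def leading_eigvecs(M, k=2):
    """Eigenvectors of a symmetric matrix with the k largest eigenvalues."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argsort(vals)[::-1][:k]]

# hypothetical TSFS-like data: 20 samples x 300 spectral variables, two groups
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 300))
X[10:] += 0.4                      # a subtle group difference
Xc = X - X.mean(axis=0)

# conventional route: decompose the sample-space covariance matrix
cov_scores = leading_eigvecs(Xc @ Xc.T / len(X))

# alternative route: decompose the pairwise squared-Euclidean dissimilarity matrix
sq_norms = (Xc ** 2).sum(axis=1)
dissim = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Xc @ Xc.T
dissim_scores = leading_eigvecs(dissim)

# each gives a (20, 2) set of sample scores; plotting them shows which
# decomposition separates the two groups more cleanly
print(cov_scores.shape, dissim_scores.shape)
```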
