Sample records for identified statistical analysis

  1. Dealing with missing standard deviation and mean values in meta-analysis of continuous outcomes: a systematic review.

    PubMed

    Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C

    2018-03-07

    Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
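
    A minimal sketch of two such imputations, assuming the common range/4 rule for a missing SD and the (lower quartile + median + upper quartile)/3 estimate for a missing mean; the exact formulas evaluated in the review may differ from these stand-ins.

        def sd_from_range(minimum, maximum):
            """Approximate a missing SD as range/4 (a common rule of thumb)."""
            return (maximum - minimum) / 4.0

        def mean_from_quartiles(q1, median, q3):
            """Approximate a missing mean from the median and quartiles."""
            return (q1 + median + q3) / 3.0

        # Example: a trial reporting min=2.0, max=14.0 and quartiles 4.0/6.0/9.0
        print(sd_from_range(2.0, 14.0))            # 3.0
        print(mean_from_quartiles(4.0, 6.0, 9.0))  # approx. 6.33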

  2. Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure

    EPA Pesticide Factsheets

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.

  3. DEIVA: a web application for interactive visual analysis of differential gene expression profiles.

    PubMed

    Harshbarger, Jayson; Kratz, Anton; Carninci, Piero

    2017-01-07

    Differential gene expression (DGE) analysis is a technique to identify statistically significant differences in RNA abundance for genes or arbitrary features between different biological states. The result of a DGE test is typically further analyzed using statistical software, spreadsheets or custom ad hoc algorithms. We identified a need for a web-based system to share DGE statistical test results, and locate and identify genes in DGE statistical test results with a very low barrier of entry. We have developed DEIVA, a free and open source, browser-based single page application (SPA) with a strong emphasis on being user friendly that enables locating and identifying single or multiple genes in an immediate, interactive, and intuitive manner. By design, DEIVA scales with very large numbers of users and datasets. Compared to existing software, DEIVA offers a unique combination of design decisions that enable inspection and analysis of DGE statistical test results with an emphasis on ease of use.

  4. A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis

    ERIC Educational Resources Information Center

    Gonzalez, Oscar; MacKinnon, David P.

    2018-01-01

    Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to…

  5. Local sensitivity analysis for inverse problems solved by singular value decomposition

    USGS Publications Warehouse

    Hill, M.C.; Nolan, B.T.

    2010-01-01

    Local sensitivity analysis provides computationally frugal ways to evaluate models commonly used for resource management, risk assessment, and so on. This includes diagnosing inverse model convergence problems caused by parameter insensitivity and(or) parameter interdependence (correlation), understanding what aspects of the model and data contribute to measures of uncertainty, and identifying new data likely to reduce model uncertainty. Here, we consider sensitivity statistics relevant to models in which the process model parameters are transformed using singular value decomposition (SVD) to create SVD parameters for model calibration. The statistics considered include the PEST identifiability statistic, and combined use of the process-model parameter statistics composite scaled sensitivities and parameter correlation coefficients (CSS and PCC). The statistics are complementary in that the identifiability statistic integrates the effects of parameter sensitivity and interdependence, while CSS and PCC provide individual measures of sensitivity and interdependence. PCC quantifies correlations between pairs or larger sets of parameters; when a set of parameters is intercorrelated, the absolute value of PCC is close to 1.00 for all pairs in the set. The number of singular vectors to include in the calculation of the identifiability statistic is somewhat subjective and influences the statistic. To demonstrate the statistics, we use the USDA’s Root Zone Water Quality Model to simulate nitrogen fate and transport in the unsaturated zone of the Merced River Basin, CA. There are 16 log-transformed process-model parameters, including water content at field capacity (WFC) and bulk density (BD) for each of five soil layers. Calibration data consisted of 1,670 observations comprising soil moisture, soil water tension, aqueous nitrate and bromide concentrations, soil nitrate concentration, and organic matter content. All 16 of the SVD parameters could be estimated by regression based on the range of singular values. Identifiability statistic results varied based on the number of SVD parameters included. Identifiability statistics calculated for four SVD parameters indicate the same three most important process-model parameters as CSS/PCC (WFC1, WFC2, and BD2), but the order differed. Additionally, the identifiability statistic showed that BD1 was almost as dominant as WFC1. The CSS/PCC analysis showed that this results from its high correlation with WFC1 (-0.94), and not its individual sensitivity. Such distinctions, combined with analysis of how high correlations and(or) sensitivities result from the constructed model, can produce important insights into, for example, the use of sensitivity analysis to design monitoring networks. In conclusion, the statistics considered identified similar important parameters. They differ in that (1) CSS/PCC can be more awkward because sensitivity and interdependence are considered separately and (2) the identifiability statistic requires consideration of how many SVD parameters to include. A continuing challenge is to understand how these computationally efficient methods compare with computationally demanding global methods like Markov chain Monte Carlo given common nonlinear processes and the often even more nonlinear models.
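
    A sketch of how CSS and PCC can be computed from a weighted sensitivity (Jacobian) matrix, assuming the standard definitions (CSS as the root-mean-square of the scaled sensitivities, PCC from the inverse of the normal matrix); this is illustrative, not the code behind the paper.

        import numpy as np

        def css_and_pcc(jac, params, weights):
            """jac: (n_obs, n_par) matrix of dy_i/dp_j; weights: (n_obs,)."""
            scaled = jac * params[None, :] * np.sqrt(weights)[:, None]
            css = np.sqrt((scaled ** 2).mean(axis=0))              # composite scaled sensitivity
            cov = np.linalg.inv(jac.T @ (weights[:, None] * jac))  # parameter covariance (to a constant)
            sd = np.sqrt(np.diag(cov))
            pcc = cov / np.outer(sd, sd)                           # parameter correlation coefficients
            return css, pcc

        # Two nearly interdependent parameters yield |PCC| close to 1.00
        rng = np.random.default_rng(1)
        x = rng.normal(size=50)
        jac = np.column_stack([x, x + 0.01 * rng.normal(size=50)])
        css, pcc = css_and_pcc(jac, np.array([1.0, 1.0]), np.ones(50))
        print(css, pcc[0, 1])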

  6. The Automated Microarray Image Analysis (AMIA) Toolbox for MATLAB

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Amanda M.; Daly, Don S.; Willse, Alan R.

    The Automated Microarray Image Analysis (AMIA) Toolbox for MATLAB is a flexible, open-source microarray image analysis tool that allows the user to customize analysis of sets of microarray images. This tool provides several methods of identifying and quantifying spot statistics, as well as extensive diagnostic statistics and images to identify poor data quality or processing. The open nature of this software allows researchers to understand the algorithms used to provide intensity estimates and to modify them easily if desired.

  7. Evaluation of The Operational Benefits Versus Costs of An Automated Cargo Mover

    DTIC Science & Technology

    2016-12-01

    Analysis of modeling and simulation results identified statistically significant differences in logistics footprint and life-cycle cost, which are presented as part of this report.

  8. The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.

    PubMed

    Christensen, G B; Knight, S; Camp, N J

    2009-11-01

    We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.

  9. Consequences of common data analysis inaccuracies in CNS trauma injury basic research.

    PubMed

    Burke, Darlene A; Whittemore, Scott R; Magnuson, David S K

    2013-05-15

    The development of successful treatments for humans after traumatic brain or spinal cord injuries (TBI and SCI, respectively) requires animal research. This effort can be hampered when promising experimental results cannot be replicated because of incorrect data analysis procedures. To identify and hopefully avoid these errors in future studies, the articles in seven journals with the highest number of basic science central nervous system TBI and SCI animal research studies published in 2010 (N=125 articles) were reviewed for their data analysis procedures. After identifying the most common statistical errors, the implications of those findings were demonstrated by reanalyzing previously published data from our laboratories using the identified inappropriate statistical procedures, then comparing the two sets of results. Overall, 70% of the articles contained at least one type of inappropriate statistical procedure. The highest percentage involved incorrect post hoc t-tests (56.4%), followed by inappropriate parametric statistics (analysis of variance and t-test; 37.6%). Repeated Measures analysis was inappropriately missing in 52.0% of all articles and, among those with behavioral assessments, 58% were analyzed incorrectly. Reanalysis of our published data using the most common inappropriate statistical procedures resulted in a 14.1% average increase in significant effects compared to the original results. Specifically, an increase of 15.5% occurred with Independent t-tests and 11.1% after incorrect post hoc t-tests. Utilizing proper statistical procedures can allow more-definitive conclusions, facilitate replicability of research results, and enable more accurate translation of those results to the clinic.
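
    A short illustration of the contrast the authors draw, assuming four groups with identical true means: unadjusted pairwise t-tests inflate apparent significance, while an omnibus ANOVA followed by an adjusted post hoc test controls it (scipy.stats.tukey_hsd requires SciPy >= 1.8).

        import itertools
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)
        groups = [rng.normal(0.0, 1.0, 12) for _ in range(4)]  # no true differences

        # Inappropriate: many unadjusted pairwise t-tests
        for i, j in itertools.combinations(range(4), 2):
            t, p = stats.ttest_ind(groups[i], groups[j])
            print(f"t-test {i} vs {j}: p = {p:.3f}")

        # Appropriate: omnibus ANOVA first, then adjusted post hoc comparisons
        f, p = stats.f_oneway(*groups)
        print(f"ANOVA p = {p:.3f}")
        if p < 0.05:
            print(stats.tukey_hsd(*groups))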

  10. Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure

    EPA Science Inventory

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss diff...

  11. Comment on “Two statistics for evaluating parameter identifiability and error reduction” by John Doherty and Randall J. Hunt

    USGS Publications Warehouse

    Hill, Mary C.

    2010-01-01

    Doherty and Hunt (2009) present important ideas for first-order-second moment sensitivity analysis, but five issues are discussed in this comment. First, considering the composite-scaled sensitivity (CSS) jointly with parameter correlation coefficients (PCC) in a CSS/PCC analysis addresses the difficulties with CSS mentioned in the introduction. Second, their new parameter identifiability statistic actually is likely to do a poor job of measuring parameter identifiability in common situations. The statistic instead performs the very useful role of showing how model parameters are included in the estimated singular value decomposition (SVD) parameters. Its close relation to CSS is shown. Third, the idea from p. 125 that a suitable truncation point for SVD parameters can be identified using the prediction variance is challenged using results from Moore and Doherty (2005). Fourth, the relative error reduction statistic of Doherty and Hunt is shown to belong to an emerging set of statistics here named perturbed calculated variance statistics. Finally, the perturbed calculated variance statistics OPR and PPR mentioned on p. 121 are shown to explicitly include the parameter null-space component of uncertainty. Indeed, OPR and PPR results that account for null-space uncertainty have appeared in the literature since 2000.

  12. The application of artificial intelligence to microarray data: identification of a novel gene signature to identify bladder cancer progression.

    PubMed

    Catto, James W F; Abbod, Maysam F; Wild, Peter J; Linkens, Derek A; Pilarsky, Christian; Rehman, Ishtiaq; Rosario, Derek J; Denzinger, Stefan; Burger, Maximilian; Stoehr, Robert; Knuechel, Ruth; Hartmann, Arndt; Hamdy, Freddie C

    2010-03-01

    New methods for identifying bladder cancer (BCa) progression are required. Gene expression microarrays can reveal insights into disease biology and identify novel biomarkers. However, these experiments produce large datasets that are difficult to interpret. To develop a novel method of microarray analysis combining two forms of artificial intelligence (AI): neurofuzzy modelling (NFM) and artificial neural networks (ANN) and validate it in a BCa cohort. We used AI and statistical analyses to identify progression-related genes in a microarray dataset (n=66 tumours, n=2800 genes). The AI-selected genes were then investigated in a second cohort (n=262 tumours) using immunohistochemistry. We compared the accuracy of AI and statistical approaches to identify tumour progression. AI identified 11 progression-associated genes (odds ratio [OR]: 0.70; 95% confidence interval [CI], 0.56-0.87; p=0.0004), and these were more discriminate than genes chosen using statistical analyses (OR: 1.24; 95% CI, 0.96-1.60; p=0.09). The expression of six AI-selected genes (LIG3, FAS, KRT18, ICAM1, DSG2, and BRCA2) was determined using commercial antibodies and successfully identified tumour progression (concordance index: 0.66; log-rank test: p=0.01). AI-selected genes were more discriminate than pathologic criteria at determining progression (Cox multivariate analysis: p=0.01). Limitations include the use of statistical correlation to identify 200 genes for AI analysis and that we did not compare regression identified genes with immunohistochemistry. AI and statistical analyses use different techniques of inference to determine gene-phenotype associations and identify distinct prognostic gene signatures that are equally valid. We have identified a prognostic gene signature whose members reflect a variety of carcinogenic pathways that could identify progression in non-muscle-invasive BCa.

  13. Sex differences in discriminative power of volleyball game-related statistics.

    PubMed

    João, Paulo Vicente; Leite, Nuno; Mesquita, Isabel; Sampaio, Jaime

    2010-12-01

    To identify sex differences in volleyball game-related statistics, the game-related statistics of several World Championships in 2007 (N=132) were analyzed using the software VIS from the International Volleyball Federation. Discriminant analysis was used to identify the game-related statistics which better discriminated performances by sex. Analysis yielded an emphasis on fault serves (SC = -.40), shot spikes (SC = .40), and reception digs (SC = .31). Despite these robust discriminators, considerable variability was evident in the game-related statistics profiles: men's volleyball games were better associated with terminal actions (errors of service), while women's volleyball games were characterized by continuous actions (in defense and attack). These differences may be related to the anthropometric and physiological differences between women and men and their influence on performance profiles.
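
    A sketch of the general approach, assuming scikit-learn's LinearDiscriminantAnalysis as a stand-in for the discriminant analysis applied to the game-related statistics; the data and variable names are hypothetical (structure coefficients like those reported are related to, but not identical to, these raw coefficients).

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        # Rows: matches; columns: hypothetical game-related statistics
        # (fault serves, shot spikes, reception digs)
        rng = np.random.default_rng(0)
        X_men = rng.normal([8.0, 30.0, 20.0], 3.0, size=(66, 3))
        X_women = rng.normal([6.0, 26.0, 24.0], 3.0, size=(66, 3))
        X = np.vstack([X_men, X_women])
        y = np.array(["men"] * 66 + ["women"] * 66)

        lda = LinearDiscriminantAnalysis().fit(X, y)
        print(lda.coef_)        # which statistics separate the sexes
        print(lda.score(X, y))  # in-sample classification accuracy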

  14. Detecting subtle hydrochemical anomalies with multivariate statistics: an example from homogeneous groundwaters in the Great Artesian Basin, Australia

    NASA Astrophysics Data System (ADS)

    O'Shea, Bethany; Jankowski, Jerzy

    2006-12-01

    The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type.
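
    A sketch of the two multivariate steps on a major-ion matrix, assuming standardized variables, Ward's linkage for the hierarchical cluster analysis, and scipy/scikit-learn implementations; the ion list and cluster count are hypothetical.

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        # Rows: groundwater samples; columns: major ions
        # (e.g. Na, K, Ca, Mg, Cl, HCO3, SO4) in meq/L
        rng = np.random.default_rng(3)
        ions = rng.lognormal(mean=1.0, sigma=0.3, size=(40, 7))

        Z = StandardScaler().fit_transform(ions)        # standardize before HCA/PCA
        water_types = fcluster(linkage(Z, method="ward"), t=3, criterion="maxclust")
        scores = PCA(n_components=2).fit_transform(Z)   # principal components analysis
        print(water_types[:10])
        print(scores[:3])       # samples far from the origin are candidate anomalies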

  15. Protein Sectors: Statistical Coupling Analysis versus Conservation

    PubMed Central

    Teşileanu, Tiberiu; Colwell, Lucy J.; Leibler, Stanislas

    2015-01-01

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. PMID:25723535

  16. Pathway analysis with next-generation sequencing data.

    PubMed

    Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

    2015-04-01

    Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.

  17. Algorithm for Identifying Erroneous Rain-Gauge Readings

    NASA Technical Reports Server (NTRS)

    Rickman, Doug

    2005-01-01

    An algorithm analyzes rain-gauge data to identify statistical outliers that could be deemed to be erroneous readings. Heretofore, analyses of this type have been performed in burdensome manual procedures that have involved subjective judgements. Sometimes, the analyses have included computational assistance for detecting values falling outside of arbitrary limits. The analyses have been performed without statistically valid knowledge of the spatial and temporal variations of precipitation within rain events. In contrast, the present algorithm makes it possible to automate such an analysis, makes the analysis objective, takes account of the spatial distribution of rain gauges in conjunction with the statistical nature of spatial variations in rainfall readings, and minimizes the use of arbitrary criteria. The algorithm implements an iterative process that involves nonparametric statistics.
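
    A minimal sketch of an iterative nonparametric screen in the spirit described, assuming a median/MAD criterion over a set of neighboring gauge readings; the algorithm's actual spatial weighting is not reproduced.

        import numpy as np

        def flag_outliers(readings, k=5.0, max_iter=10):
            """Iteratively flag readings far from the robust center (median)
            in units of the median absolute deviation (MAD)."""
            readings = np.asarray(readings, dtype=float)
            keep = np.ones(readings.size, dtype=bool)
            for _ in range(max_iter):
                med = np.median(readings[keep])
                mad = np.median(np.abs(readings[keep] - med)) or 1e-9
                new_keep = np.abs(readings - med) <= k * mad
                if np.array_equal(new_keep, keep):
                    break
                keep = new_keep
            return ~keep  # True where a reading is deemed erroneous

        gauges = [10.2, 9.8, 11.0, 10.5, 55.0, 9.9]  # one suspicious reading
        print(flag_outliers(gauges))                 # flags only the 55.0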

  18. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

    Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
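
    A sketch of one step of the pipeline described above, assuming a Youden-style threshold taken from the ROC curve and a Fisher exact test on the resulting contingency table; the data and cut-off rule are hypothetical stand-ins.

        import numpy as np
        from scipy import stats
        from sklearn.metrics import roc_curve

        rng = np.random.default_rng(7)
        metric = np.r_[rng.normal(20, 5, 80), rng.normal(30, 5, 40)]  # dose metric
        outcome = np.r_[np.zeros(80, int), np.ones(40, int)]          # toxicity 0/1

        fpr, tpr, thresholds = roc_curve(outcome, metric)
        cut = thresholds[np.argmax(tpr - fpr)]  # threshold maximizing Youden's J

        # 2x2 contingency table: metric above/below threshold vs outcome
        table = [[np.sum((metric >= cut) & (outcome == 1)), np.sum((metric >= cut) & (outcome == 0))],
                 [np.sum((metric < cut) & (outcome == 1)), np.sum((metric < cut) & (outcome == 0))]]
        odds, p = stats.fisher_exact(table)
        print(cut, p)  # a small p supports a dose-response threshold at `cut`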

  19. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets.

    PubMed

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-11-01

    With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.

  20. Statistical assessment of the learning curves of health technologies.

    PubMed

    Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T

    2001-01-01

    (1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis, generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. 
The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)

  21. Prison Radicalization: The New Extremist Training Grounds?

    DTIC Science & Technology

    2007-09-01

    The analytical methodology includes descriptive and inferential statistical methods, with statistical analysis of survey responses to identify significant correlations and relationships. Due to the exploratory nature of this small survey, data analyses were confined mostly to descriptive statistics.

  22. [The principal components analysis--method to classify the statistical variables with applications in medicine].

    PubMed

    Dascălu, Cristina Gena; Antohe, Magda Ecaterina

    2009-01-01

    Based on analysis of eigenvalues and eigenvectors, principal component analysis aims to identify the subspace of principal components from a set of parameters that is sufficient to characterize the whole set. Interpreting the data under analysis as a cloud of points, we find through geometrical transformations the directions along which the cloud's dispersion is maximal: the lines that pass through the cloud's center of gravity and have a maximal density of points around them (found by defining an appropriate criterion function and minimizing it). This method can be used successfully to simplify the statistical analysis of questionnaires, because it helps select from a set of items only the most relevant ones, which cover the variation of the whole data set. For instance, in the presented sample we started from a questionnaire with 28 items and, applying principal component analysis, we identified 7 principal components, or main items, which significantly simplifies further statistical analysis of the data.
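
    A minimal sketch of the reduction described (28 items down to 7 components), assuming scikit-learn's PCA on a synthetic questionnaire matrix; the original analysis may have used different software and rotation choices.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        answers = rng.integers(1, 6, size=(200, 28)).astype(float)  # 200 respondents, 28 items

        pca = PCA(n_components=7).fit(answers)
        print(pca.explained_variance_ratio_.cumsum())   # variance covered by the 7 components
        loadings = pca.components_                      # item loadings per component
        print(np.abs(loadings[0]).argsort()[::-1][:5])  # most relevant items for component 1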

  23. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data

    PubMed Central

    Zhu, Yun; Fan, Ruzong; Xiong, Momiao

    2017-01-01

    Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method, referred to as quadratically regularized functional CCA (QRFCCA), for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that QRFCCA has much higher power than the ten competing statistics while retaining appropriate type 1 error rates. To further evaluate performance, QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that QRFCCA substantially outperforms the ten other statistics. PMID:29040274

  24. The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison

    PubMed Central

    Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth

    2006-01-01

    Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity. PMID:16626497

  25. [Continuity of hospital identifiers in hospital discharge data - Analysis of the nationwide German DRG Statistics from 2005 to 2013].

    PubMed

    Nimptsch, Ulrike; Wengler, Annelene; Mansky, Thomas

    2016-11-01

    In Germany, nationwide hospital discharge data (DRG statistics provided by the research data centers of the Federal Statistical Office and the Statistical Offices of the 'Länder') are increasingly used as data source for health services research. Within this data hospitals can be separated via their hospital identifier ([Institutionskennzeichen] IK). However, this hospital identifier primarily designates the invoicing unit and is not necessarily equivalent to one hospital location. Aiming to investigate direction and extent of possible bias in hospital-level analyses this study examines the continuity of the hospital identifier within a cross-sectional and longitudinal approach and compares the results to official hospital census statistics. Within the DRG statistics from 2005 to 2013 the annual number of hospitals as classified by hospital identifiers was counted for each year of observation. The annual number of hospitals derived from DRG statistics was compared to the number of hospitals in the official census statistics 'Grunddaten der Krankenhäuser'. Subsequently, the temporal continuity of hospital identifiers in the DRG statistics was analyzed within cohorts of hospitals. Until 2013, the annual number of hospital identifiers in the DRG statistics fell by 175 (from 1,725 to 1,550). This decline affected only providers with small or medium case volume. The number of hospitals identified in the DRG statistics was lower than the number given in the census statistics (e.g., in 2013 1,550 IK vs. 1,668 hospitals in the census statistics). The longitudinal analyses revealed that the majority of hospital identifiers persisted in the years of observation, while one fifth of hospital identifiers changed. In cross-sectional studies of German hospital discharge data the separation of hospitals via the hospital identifier might lead to underestimating the number of hospitals and consequential overestimation of caseload per hospital. Discontinuities of hospital identifiers over time might impair the follow-up of hospital cohorts. These limitations must be taken into account in analyses of German hospital discharge data focusing on the hospital level.

  26. Identification of key micro-organisms involved in Douchi fermentation by statistical analysis and their use in an experimental fermentation.

    PubMed

    Chen, C; Xiang, J Y; Hu, W; Xie, Y B; Wang, T J; Cui, J W; Xu, Y; Liu, Z; Xiang, H; Xie, Q

    2015-11-01

    To screen and identify safe micro-organisms used during Douchi fermentation, and to verify the feasibility of producing high-quality Douchi using these identified micro-organisms. PCR-denaturing gradient gel electrophoresis (DGGE) and an automatic amino-acid analyser were used to investigate the microbial diversity and free amino acids (FAAs) content of 10 commercial Douchi samples. The correlations between microbial communities and FAAs were analysed by statistical analysis. Ten strains with significant positive correlation were identified. Then an experiment on Douchi fermentation by the identified strains was carried out, and the nutritional composition of the Douchi was analysed. Results showed that FAAs and the relative content of isoflavone aglycones in verification Douchi samples were generally higher than those in commercial Douchi samples. Our study indicated that fungi, yeasts, Bacillus and lactic acid bacteria were the key players in Douchi fermentation, and that with identified probiotic micro-organisms participating in fermentation, a higher quality Douchi product was produced. This is the first report to analyse and confirm the key micro-organisms during Douchi fermentation by statistical analysis. This work proves fermentation micro-organisms to be the key influencing factor of Douchi quality, and demonstrates the feasibility of fermenting Douchi using identified starter micro-organisms.

  27. Rumen fluid metabolomics analysis associated with feed efficiency on crossbred steers

    USDA-ARS?s Scientific Manuscript database

    The rumen has a central role in the efficiency of digestion in ruminants. To identify potential differences in rumen function that lead to differences in feed efficiency, rumen fluid metabolomic analysis by LC-MS and multivariate/univariate statistical analysis were used to identify differences in r...

  28. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  29. Probability of identification: a statistical model for the validation of qualitative botanical identification methods.

    PubMed

    LaBudde, Robert A; Harnly, James M

    2012-01-01

    A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
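
    A minimal sketch of the basic observed statistic, assuming the POI is treated as a binomial proportion reported with a 95% Wilson score interval; the report's full POD-style modeling is not reproduced.

        import math

        def poi_wilson(identified, replicates, z=1.96):
            """POI point estimate with an approximate 95% Wilson interval."""
            p = identified / replicates
            denom = 1 + z ** 2 / replicates
            center = (p + z ** 2 / (2 * replicates)) / denom
            half = z * math.sqrt(p * (1 - p) / replicates
                                 + z ** 2 / (4 * replicates ** 2)) / denom
            return p, center - half, center + half

        print(poi_wilson(18, 20))  # 18/20 identified -> 0.90, CI approx. (0.70, 0.97)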

  30. Statistical dependency in visual scanning

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Stark, Lawrence

    1986-01-01

    A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.
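
    A sketch of one way to test for such dependency, assuming fixations are coded as a sequence over points of interest and first-order dependence is tested with a chi-square test on the transition table; this illustrates the idea, not the authors' exact procedure.

        import numpy as np
        from scipy.stats import chi2_contingency

        # Hypothetical sequence of fixated points of interest, coded 0..2
        seq = [0, 1, 0, 1, 2, 0, 1, 0, 2, 1, 0, 1, 0, 1, 2, 0, 1, 0]

        k = 3
        transitions = np.zeros((k, k))
        for a, b in zip(seq[:-1], seq[1:]):
            transitions[a, b] += 1  # count a -> b transitions

        chi2, p, dof, _ = chi2_contingency(transitions)
        print(transitions)
        print(f"p = {p:.3f}")  # a small p suggests statistically dependent scanning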

  31. A spatial cluster analysis of tractor overturns in Kentucky from 1960 to 2002

    USGS Publications Warehouse

    Saman, D.M.; Cole, H.P.; Odoi, A.; Myers, M.L.; Carey, D.I.; Westneat, S.C.

    2012-01-01

    Background: Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods: A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results: The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (p<0.0001). Conclusions: This study reveals that geographic hotspots of tractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky.
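
    A compact sketch of the purely spatial Poisson scan idea behind Kulldorff's statistic: for circular windows of increasing size, compare observed with expected counts via a likelihood ratio and keep the maximum. Monte Carlo significance testing is omitted; the county data here are hypothetical.

        import numpy as np

        def poisson_llr(c, e, C):
            """Log likelihood ratio for a window with c observed and e expected
            cases out of C total (purely spatial Poisson scan form)."""
            if c <= e or c >= C:
                return 0.0
            return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

        # Hypothetical county centroids, case counts, and population-based expecteds
        xy = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [5, 6], [6, 5]], float)
        cases = np.array([2, 1, 2, 9, 8, 7])
        expected = np.full(6, cases.sum() / 6.0)
        C = cases.sum()

        best = (0.0, None)
        for i in range(len(xy)):                  # windows centered on each county
            order = np.argsort(np.linalg.norm(xy - xy[i], axis=1))
            for m in range(1, len(xy)):           # grow the circular window
                inside = order[:m]
                llr = poisson_llr(cases[inside].sum(), expected[inside].sum(), C)
                if llr > best[0]:
                    best = (llr, inside)
        print(best)  # highest-LLR window: the three mutually close high-count counties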

  32. On the Use of Principal Component and Spectral Density Analysis to Evaluate the Community Multiscale Air Quality (CMAQ) Model

    EPA Science Inventory

    A 5 year (2002-2006) simulation of CMAQ covering the eastern United States is evaluated using principal component analysis in order to identify and characterize statistically significant patterns of model bias. Such analysis is useful in that it can identify areas of poor model ...

  33. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power to identify these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available, and the sample sizes associated with these summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increase statistical power for identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease data from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS .

  34. Functional Path Analysis as a Multivariate Technique in Developing a Theory of Participation in Adult Education.

    ERIC Educational Resources Information Center

    Martin, James L.

    This paper reports on attempts by the author to construct a theoretical framework of adult education participation using a theory development process and the corresponding multivariate statistical techniques. Two problems are identified: the lack of theoretical framework in studying problems, and the limiting of statistical analysis to univariate…

  35. Implementation of quality by design principles in the development of microsponges as drug delivery carriers: Identification and optimization of critical factors using multivariate statistical analyses and design of experiments studies.

    PubMed

    Simonoska Crcarevska, Maja; Dimitrovska, Aneta; Sibinovska, Nadica; Mladenovska, Kristina; Slavevska Raicki, Renata; Glavas Dodov, Marija

    2015-07-15

    Microsponges drug delivery system (MDDC) was prepared by a double emulsion-solvent-diffusion technique using rotor-stator homogenization. The quality by design (QbD) concept was implemented for the development of MDDC with potential to be incorporated into a semisolid dosage form (gel). The quality target product profile (QTPP) and critical quality attributes (CQA) were defined and identified accordingly. Critical material attributes (CMA) and critical process parameters (CPP) were identified using a quality risk management (QRM) tool, failure mode, effects and criticality analysis (FMECA). CMA and CPP were identified based on results obtained from principal component analysis (PCA-X&Y) and partial least squares (PLS) statistical analysis, along with literature data and product and process knowledge and understanding. FMECA identified the amounts of ethylcellulose, chitosan, acetone, dichloromethane, span 80, tween 80 and the water ratio in primary/multiple emulsions as CMA, and the rotation speed and stirrer type used for organic solvent removal as CPP. The relationship between the identified CPP and particle size as CQA was described in the design space using a design of experiments - one-factor response surface method. Results obtained from the statistically designed experiments enabled establishment of mathematical models and equations that were used for detailed characterization of the influence of the identified CPP upon MDDC particle size and particle size distribution and their subsequent optimization.

  36. CADDIS Volume 4. Data Analysis: Basic Principles & Issues

    EPA Pesticide Factsheets

    Use of inferential statistics in causal analysis, introduction to data independence and autocorrelation, methods for identifying and controlling for confounding variables, and references for the Basic Principles section of Data Analysis.

  37. Meteorological regimes for the classification of aerospace air quality predictions for NASA-Kennedy Space Center

    NASA Technical Reports Server (NTRS)

    Stephens, J. B.; Sloan, J. C.

    1976-01-01

    A method is described for developing a statistical air quality assessment for the launch of an aerospace vehicle from the Kennedy Space Center in terms of existing climatological data sets. The procedure can be refined as developing meteorological conditions are identified for use with the NASA-Marshall Space Flight Center Rocket Exhaust Effluent Diffusion (REED) description. Classical climatological regimes for the long range analysis can be narrowed as the synoptic and mesoscale structure is identified. Only broad synoptic regimes are identified at this stage of analysis. As the statistical data matrix is developed, synoptic regimes will be refined in terms of the resulting eigenvectors as applicable to aerospace air quality predictions.

  18. Automation method to identify the geological structure of seabed using spatial statistic analysis of echo sounding data

    NASA Astrophysics Data System (ADS)

    Kwon, O.; Kim, W.; Kim, J.

    2017-12-01

    Construction of subsea tunnels has recently increased globally. For the safe construction of a subsea tunnel, identifying geological structures, including faults, at the design and construction stages is critically important. Unlike tunnels on land, however, it is very difficult to obtain data on geological structure because of the limits of geological surveying at sea. This study addresses these difficulties by developing technology to identify the geological structure of the seabed automatically using echo sounding data. When investigating a potential site for a deep subsea tunnel, boreholes and geophysical investigation face technical and economic limits. By contrast, echo sounding data are easily obtainable, and their reliability is high compared with the above approaches. This study aims to develop an algorithm that identifies large-scale geological structures of the seabed using a geostatistical approach, and it is grounded in the structural-geology principle that topographic features reflect geological structure. The basic concept of the algorithm is as follows: (1) convert the seabed topography to grid data using echo sounding data, (2) apply a moving window of optimal size to the grid data, (3) estimate the spatial statistics of the grid data in the window area, (4) set a percentile standard for the spatial statistics, (5) display the values satisfying the standard on the map, and (6) visualize the geological structure on the map. The important elements of this study include the optimal size of the moving window, the choice of optimal spatial statistics, and the determination of an optimal percentile standard. To determine these optimal elements, numerous simulations were performed. Eventually, a user program based on R was developed from the optimal analysis algorithm. The program was designed so that the type of spatial statistic and the percentile standard can be easily designated, which simplifies analysis of the geological structure under different spatial statistics. This research was supported by the Korea Agency for Infrastructure Technology Advancement under the Ministry of Land, Infrastructure and Transport of the Korean government. (Project Number: 13 Construction Research T01)
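
    A minimal Python sketch of steps (2) through (5) follows (the study's own program was written in R); the gridded depths, window size, and 90th-percentile standard are invented for the example.

        # Moving-window spatial statistic on a bathymetry grid: flag cells whose
        # local variability exceeds a percentile standard (candidate structures).
        import numpy as np
        from scipy.ndimage import generic_filter

        rng = np.random.default_rng(0)
        depth = rng.normal(-50.0, 2.0, size=(100, 120))  # echo-sounding grid (m)
        depth[:, 60:] -= 15.0                            # synthetic fault-like offset

        local_sd = generic_filter(depth, np.std, size=5)  # step (3): local statistic
        threshold = np.percentile(local_sd, 90)           # step (4): percentile standard
        lineament_mask = local_sd > threshold             # step (5): flagged cells
        print(f"flagged {lineament_mask.mean():.1%} of grid cells")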

  19. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations

    PubMed Central

    Lightfoot, Emma; O’Connell, Tamsin C.

    2016-01-01

    Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals’ homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately; however, it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific homeland should not be attempted. PMID:27124001
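
    For reference, a minimal Python sketch of the two outlier rules highlighted above (IQR and median absolute deviation); the δ18O values are invented.

        import numpy as np

        d18o = np.array([26.1, 25.8, 26.4, 25.9, 26.0, 27.9, 25.7, 26.2])  # per mil

        # IQR rule: outliers fall more than 1.5 * IQR beyond the quartiles
        q1, q3 = np.percentile(d18o, [25, 75])
        iqr = q3 - q1
        iqr_outliers = (d18o < q1 - 1.5 * iqr) | (d18o > q3 + 1.5 * iqr)

        # MAD rule: robust z-score from the median absolute deviation
        med = np.median(d18o)
        mad = np.median(np.abs(d18o - med))
        robust_z = 0.6745 * (d18o - med) / mad   # 0.6745 rescales MAD to ~sigma
        mad_outliers = np.abs(robust_z) > 3.5    # conventional cutoff

        print(d18o[iqr_outliers], d18o[mad_outliers])  # both flag the 27.9 value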

  20. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    PubMed

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional peak differential analysis with concurrent statistical-test-based false discovery rate (FDR) control. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution that provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online upload and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.

  1. GUIDANCE FOR STATISTICAL DETERMINATION OF APPROPRIATE PERCENT MINORITY AND PERCENT POVERTY DISTRIBUTIONAL CUTOFF VALUES USING CENSUS DATA FOR AN EPA REGION II ENVIRONMENTAL JUSTICE PROJECT

    EPA Science Inventory

    The purpose of this report is to assist Region II by providing a statistical analysis identifying areas with minority and below-poverty populations, known as "Communities of Concern" (COC). The aim was to find a cutoff value as a threshold to identify a COC using demographic data...

  2. Content analysis to detect high stress in oral interviews and text documents

    NASA Technical Reports Server (NTRS)

    Thirumalainambi, Rajkumar (Inventor); Jorgensen, Charles C. (Inventor)

    2012-01-01

    A system of interrogation for estimating whether a subject of interrogation is likely experiencing high stress, emotional volatility and/or internal conflict in the subject's responses to an interviewer's questions. The system applies one or more of four procedures (a first statistical analysis, a second statistical analysis, a third analysis, and a heat map analysis) to identify one or more documents containing the subject's responses for which further examination is recommended. Words in the documents are characterized in terms of dimensions representing different classes of emotions and states of mind, in which the subject's responses that manifest high stress, emotional volatility and/or internal conflict are identified. A heat map visually displays the dimensions manifested by the subject's responses in different colors, textures, geometric shapes or other visually distinguishable indicia.

  3. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE PAGES

    Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...

    2014-12-02

    Atomic level spatial variability of electronic structure in the Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  4. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi

    2014-12-01

    Atomic level spatial variability of electronic structure in the Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  5. Survey of editors and reviewers of high-impact psychology journals: statistical and research design problems in submitted manuscripts.

    PubMed

    Harris, Alex; Reeder, Rachelle; Hyun, Jenny

    2011-01-01

    The authors surveyed 21 editors and reviewers from major psychology journals to identify and describe the statistical and design errors they encounter most often and to get their advice regarding prevention of these problems. Content analysis of the text responses revealed themes in 3 major areas: (a) problems with research design and reporting (e.g., lack of an a priori power analysis, lack of congruence between research questions and study design/analysis, failure to adequately describe statistical procedures); (b) inappropriate data analysis (e.g., improper use of analysis of variance, too many statistical tests without adjustments, inadequate strategy for addressing missing data); and (c) misinterpretation of results. If researchers attended to these common methodological and analytic issues, the scientific quality of manuscripts submitted to high-impact psychology journals might be significantly improved.

  6. The Other Twenty Percent: A Statistical Analysis of Poverty in the South.

    ERIC Educational Resources Information Center

    MacLachlan, Gretchen

    Of the 27 million poor people in the United States in 1970, 10 million lived in the 11 Southern states. This was 38% of the nation's poverty population, making the South's poverty rate twice that of the remaining 39 states. This study, essentially a statistical analysis of regional poverty data derived from the 1970 Census, identifies the South's…

  7. Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

    PubMed

    Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

    2014-06-28

    Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.

  8. SAFER, an Analysis Method of Quantitative Proteomic Data, Reveals New Interactors of the C. elegans Autophagic Protein LGG-1.

    PubMed

    Yi, Zhou; Manil-Ségalen, Marion; Sago, Laila; Glatigny, Annie; Redeker, Virginie; Legouis, Renaud; Mucchielli-Giorgi, Marie-Hélène

    2016-05-06

    Affinity purifications followed by mass spectrometric analysis are used to identify protein-protein interactions. Because quantitative proteomic data are noisy, it is necessary to develop statistical methods to eliminate false-positives and identify true partners. We present here a novel approach for filtering false interactors, named "SAFER" for mass Spectrometry data Analysis by Filtering of Experimental Replicates, which is based on the reproducibility of the replicates and the fold-change of the protein intensities between bait and control. To identify regulators or targets of autophagy, we characterized the interactors of LGG-1, a ubiquitin-like protein involved in autophagosome formation in C. elegans. LGG-1 partners were purified by affinity, analyzed by nanoLC-MS/MS mass spectrometry, and quantified by a label-free proteomic approach based on the mass spectrometric signal intensity of peptide precursor ions. Because the selection of confident interactions depends on the method used for statistical analysis, we compared SAFER with several statistical tests and different scoring algorithms on this set of data. We show that SAFER recovers high-confidence interactors that have been ignored by the other methods and identifies new candidates involved in the autophagy process. We further validated our method on a public data set and conclude that SAFER notably improves the identification of protein interactors.

  9. Autocorrelation and cross-correlation in time series of homicide and attempted homicide

    NASA Astrophysics Data System (ADS)

    Machado Filho, A.; da Silva, M. F.; Zebende, G. F.

    2014-04-01

    We propose in this paper to establish the relationship between homicides and attempted homicides by non-stationary time-series analysis. This analysis is carried out by Detrended Fluctuation Analysis (DFA), Detrended Cross-Correlation Analysis (DCCA), and the DCCA cross-correlation coefficient, ρ(n). Through this analysis we can identify a positive cross-correlation between homicides and attempted homicides. At the same time, seen from the point of view of autocorrelation (DFA), the analysis is more informative depending on the time scale: at short scales (days) we cannot identify autocorrelations; on the scale of weeks DFA presents anti-persistent behavior; and at long time scales (n > 90 days) DFA presents persistent behavior. Finally, the application of this new type of statistical analysis proved to be efficient and, in this sense, this paper can contribute to more accurate descriptive statistics of crime.
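
    A compact first-order DFA sketch in Python follows; the daily-count series is simulated and the box sizes are illustrative.

        import numpy as np

        def dfa(x, scales):
            """First-order DFA: RMS fluctuation of the detrended integrated profile."""
            y = np.cumsum(x - np.mean(x))                 # integrated profile
            fluct = []
            for n in scales:
                rms = []
                for i in range(len(y) // n):              # non-overlapping boxes
                    seg = y[i * n:(i + 1) * n]
                    t = np.arange(n)
                    trend = np.polyval(np.polyfit(t, seg, 1), t)  # local linear fit
                    rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
                fluct.append(np.mean(rms))
            return np.array(fluct)

        x = np.random.default_rng(1).normal(size=2000)    # stand-in daily counts
        scales = np.array([8, 16, 32, 64, 128, 256])
        alpha = np.polyfit(np.log(scales), np.log(dfa(x, scales)), 1)[0]
        print(f"DFA exponent alpha = {alpha:.2f}")  # ~0.5 uncorrelated; <0.5 anti-persistent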

  10. Spatio-temporal analysis of annual rainfall in Crete, Greece

    NASA Astrophysics Data System (ADS)

    Varouchakis, Emmanouil A.; Corzo, Gerald A.; Karatzas, George P.; Kotsopoulou, Anastasia

    2018-03-01

    Analysis of rainfall data from the island of Crete, Greece, was performed to identify key hydrological years and return periods, and to analyze the inter-annual behavior of rainfall variability during the period 1981-2014. The rainfall spatial distribution was also examined in detail to identify vulnerable areas of the island. Statistical tools and spectral analysis were applied to investigate and interpret the temporal course of the available rainfall data set. In addition, spatial analysis techniques were applied and compared to determine the rainfall spatial distribution on the island of Crete. The analysis showed that, in contrast to Regional Climate Model estimations, rainfall rates have not decreased, while return periods vary depending on seasonality and geographic location. A small but statistically significant increasing trend was detected in the inter-annual rainfall variations, as well as a significant rainfall cycle of almost eight years. In addition, a statistically significant correlation of the island's rainfall variability with the North Atlantic Oscillation was identified for the examined period. On the other hand, the regression kriging method, combining surface elevation as secondary information, improved the estimation of the annual rainfall spatial variability on the island of Crete by 70% compared to ordinary kriging. The rainfall spatial and temporal trends on the island of Crete have variable characteristics that depend on the geographical area and on the hydrological period.

  11. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

    PubMed

    Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan

    2015-01-08

    Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systematically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10^-8) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10^-7) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
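
    For illustration only, a sample-size-weighted Stouffer combination of one variant's z-scores across traits is sketched below; the authors' method additionally models trait correlation and heterogeneity, and all numbers here are invented.

        import numpy as np
        from scipy.stats import norm

        z = np.array([3.1, 2.4, 2.8])          # one variant's z-scores in three traits
        n = np.array([29000, 29000, 28000])    # per-trait sample sizes (hypothetical)

        w = np.sqrt(n)                         # weight each trait by sqrt(N)
        z_comb = np.sum(w * z) / np.sqrt(np.sum(w ** 2))
        p_comb = 2 * norm.sf(abs(z_comb))      # two-sided combined p-value
        print(f"combined z = {z_comb:.2f}, p = {p_comb:.1e}")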

  12. Multi-trait analysis of genome-wide association summary statistics using MTAG.

    PubMed

    Turley, Patrick; Walters, Raymond K; Maghzian, Omeed; Okbay, Aysu; Lee, James J; Fontana, Mark Alan; Nguyen-Viet, Tuan Anh; Wedow, Robbee; Zacher, Meghan; Furlotte, Nicholas A; Magnusson, Patrik; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Laibson, David; Cesarini, David; Neale, Benjamin M; Benjamin, Daniel J

    2018-02-01

    We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (Neff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

  13. Multivariate statistical analysis: Principles and applications to coorbital streams of meteorite falls

    NASA Technical Reports Server (NTRS)

    Wolf, S. F.; Lipschutz, M. E.

    1993-01-01

    Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May during 1855-1895 and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion, the labile trace element contents (hence thermal histories) of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.

  14. Detection of crossover time scales in multifractal detrended fluctuation analysis

    NASA Astrophysics Data System (ADS)

    Ge, Erjia; Leung, Yee

    2013-04-01

    Fractal analysis is employed in this paper as a scale-based method for identifying the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors; a common method for estimating such behavior is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective, since it has been made without rigorous statistical procedures and has generally been determined by visual inspection. Crossover time scales so determined may be spurious and problematic, and may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe the multi-scaling behaviors of fractals. Through regression analysis and statistical inference, we can (1) identify crossover time scales that cannot be detected by visual inspection, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish a statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall in Hong Kong. Through the proposed model, we can gain a deeper understanding of fractals in general and a statistical approach to identifying multi-scaling behavior under MF-DFA in particular.
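
    A minimal sketch of the statistical idea, locating a crossover as the breakpoint of a two-segment regression on the log-log fluctuation function; the F(n) values are synthetic and the paper's model is more general.

        import numpy as np

        log_n = np.log(np.array([8, 16, 32, 64, 128, 256, 512, 1024]))
        true_cross = np.log(100)                          # synthetic crossover at n = 100
        log_f = np.where(log_n < true_cross,
                         0.9 * log_n,                     # persistent regime
                         0.5 * log_n + 0.4 * true_cross)  # weaker-scaling regime
        log_f += np.random.default_rng(2).normal(0, 0.02, log_f.size)

        def sse_at(k):
            """Residual sum of squares of two straight lines split before index k."""
            sse = 0.0
            for xs, ys in ((log_n[:k], log_f[:k]), (log_n[k:], log_f[k:])):
                coef = np.polyfit(xs, ys, 1)
                sse += np.sum((ys - np.polyval(coef, xs)) ** 2)
            return sse

        best_k = min(range(3, len(log_n) - 2), key=sse_at)
        print(f"estimated crossover near n = {np.exp(log_n[best_k]):.0f}")  # nearest grid point, ~128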

  15. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. Two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from the analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
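
    Part of this battery can be sketched in a few lines of Python, applied to one sampled input/output pair (the data are synthetic stand-ins for Latin hypercube model output):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        x = rng.uniform(0, 1, 300)                   # sampled input variable
        y = np.sqrt(x) + rng.normal(0, 0.1, 300)     # monotonic, nonlinear response

        r, p_r = stats.pearsonr(x, y)                # (1) linear relationship
        rho, p_rho = stats.spearmanr(x, y)           # (2) monotonic relationship

        # (3) trend in central tendency: Kruskal-Wallis across five x-classes
        classes = np.digitize(x, np.percentile(x, [20, 40, 60, 80]))
        h, p_kw = stats.kruskal(*[y[classes == c] for c in range(5)])

        print(f"pearson r={r:.2f} (p={p_r:.1e}); spearman rho={rho:.2f}; KW p={p_kw:.1e}")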

  16. A retrospective analysis of the role of proton pump inhibitors in colorectal cancer disease survival

    PubMed Central

    Graham, C.; Orr, C.; Bricks, C.S.; Hopman, W.M.; Hammad, N.; Ramjeesingh, R.

    2016-01-01

    Background Proton pump inhibitors (ppis) are a commonly used medication. A limited number of studies have identified a weak-to-moderate association between ppi use and colorectal cancer (crc) risk, but none to date have identified an effect of ppi use on crc survival. We therefore postulated that an association between ppi use and crc survival might exist. Methods We performed a retrospective chart review of 1304 crc patients diagnosed from January 2005 to December 2011 and treated at the Cancer Centre of Southeastern Ontario. Kaplan–Meier analysis and Cox proportional hazards regression models were used to evaluate overall survival (os). Results We identified 117 patients (9.0%) who were taking ppis at the time of oncology consult. Those taking a ppi were also more often taking asa or statins (or both) and had a statistically significantly increased rate of cardiac disease. No identifiable difference in tumour characteristics was evident between the two groups, including tumour location, differentiation, lymph node status, and stage. Univariate analysis identified a statistically nonsignificant difference in survival, with those taking a ppi experiencing lower 1-year (82.1% vs. 86.7%, p = 0.161), 2-year (70.1% vs. 76.8%, p = 0.111), and 5-year os (55.2% vs. 62.9%, p = 0.165). When controlling for patient demographics and tumour characteristics, multivariate Cox regression analysis identified a statistically significant effect of ppi use in our patient population (hazard ratio: 1.343; 95% confidence interval: 1.011 to 1.785; p = 0.042). Conclusions Our results suggest a potential adverse effect of ppi use on os in crc patients. These results need further evaluation in prospective analyses. PMID:28050148
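
    A minimal sketch of the named survival analyses using the Python lifelines package; the dataframe and its columns (os_months, death, ppi, age) are hypothetical stand-ins for the chart-review data.

        import pandas as pd
        from lifelines import KaplanMeierFitter, CoxPHFitter

        df = pd.DataFrame({
            "os_months": [12, 60, 34, 7, 48, 22, 15, 41],
            "death":     [1, 0, 1, 1, 0, 1, 1, 0],   # 1 = died, 0 = censored
            "ppi":       [1, 0, 0, 1, 0, 1, 1, 0],
            "age":       [71, 58, 66, 79, 62, 75, 69, 60],
        })

        kmf = KaplanMeierFitter()                    # Kaplan-Meier curve, PPI group
        kmf.fit(df.loc[df.ppi == 1, "os_months"], df.loc[df.ppi == 1, "death"])

        cph = CoxPHFitter()                          # multivariate Cox model
        cph.fit(df, duration_col="os_months", event_col="death")
        print(cph.hazard_ratios_)                    # ppi effect adjusted for age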

  17. Preparing systems engineering and computing science students in disciplined methods, quantitative, and advanced statistical techniques to improve process performance

    NASA Astrophysics Data System (ADS)

    McCray, Wilmon Wil L., Jr.

    The research was prompted by the need to assess the process improvement, quality management, and analytical techniques taught to students in undergraduate and graduate systems engineering and computing science (e.g., software engineering, computer science, and information technology) degree programs at U.S. colleges and universities that can be applied to quantitatively manage processes for performance. Everyone involved in executing repeatable processes in the software and systems development lifecycle needs to become familiar with the concepts of quantitative management, statistical thinking, and process improvement methods, and with how they relate to process performance. Organizations are starting to embrace the de facto Software Engineering Institute (SEI) Capability Maturity Model Integration (CMMI) models as process improvement frameworks to improve business process performance. High-maturity process areas in the CMMI model imply the use of analytical, statistical, and quantitative management techniques and process performance modeling to identify and eliminate sources of variation, continually improve process performance, reduce cost, and predict future outcomes. The study identifies and discusses in detail the gaps between the process improvement and quantitative analysis techniques taught in U.S. university systems engineering and computing science degree programs and the SEI's "healthy ingredients" of a process performance model, together with gaps that exist in the literature. The research also heightens awareness that academicians have conducted little research on applicable statistics and quantitative techniques that can be used to demonstrate high maturity as implied in the CMMI models. The research also includes a Monte Carlo simulation optimization model and dashboard that demonstrates the use of statistical methods, statistical process control, sensitivity analysis, and quantitative and optimization techniques to establish a baseline and predict future customer satisfaction index scores (outcomes). The American Customer Satisfaction Index (ACSI) model and industry benchmarks were used as a framework for the simulation model.

  18. From Combat to Campus

    ERIC Educational Resources Information Center

    Bellafiore, Margaret

    2012-01-01

    Soldiers are returning from war to college. The number of veterans enrolled nationally is hard to find. Data from the National Center for Veterans Analysis and Statistics identify nearly 924,000 veterans as "total education program beneficiaries" for 2011. These statistics combine many categories, including dependents and survivors. The…

  19. An ANOVA approach for statistical comparisons of brain networks.

    PubMed

    Fraiman, Daniel; Fraiman, Ricardo

    2018-03-16

    The study of brain networks has developed extensively over the last couple of decades. By contrast, techniques for the statistical analysis of these networks are less developed. In this paper, we focus on the statistical comparison of brain networks in a nonparametric framework and discuss the associated detection and identification problems. We tested network differences between groups with an analysis of variance (ANOVA) test we developed specifically for networks. We also propose and analyse the behaviour of a new statistical procedure designed to identify different subnetworks. As an example, we show the application of this tool in resting-state fMRI data obtained from the Human Connectome Project. We identify, among other variables, that the amount of sleep the days before the scan is a relevant variable that must be controlled. Finally, we discuss the potential bias in neuroimaging findings that is generated by some behavioural and brain structure variables. Our method can also be applied to other kinds of networks, such as protein interaction networks, gene networks or social networks.

  20. Statistical software applications used in health services research: analysis of published studies in the U.S

    PubMed Central

    2011-01-01

    Background This study aims to identify the statistical software applications most commonly employed for data analysis in health services research (HSR) studies in the U.S. The study also examines the extent to which information describing the specific analytical software utilized is provided in published articles reporting on HSR studies. Methods Data were extracted from a sample of 1,139 articles (including 877 original research articles) published between 2007 and 2009 in three U.S. HSR journals that were considered to be representative of the field based upon a set of selection criteria. Descriptive analyses were conducted to categorize patterns in statistical software usage in those articles. The data were stratified by calendar year to detect trends in software use over time. Results Only 61.0% of original research articles in prominent U.S. HSR journals identified the particular type of statistical software application used for data analysis. Stata and SAS were overwhelmingly the most commonly employed software applications (in 46.0% and 42.6% of articles, respectively). However, SAS use grew considerably during the study period compared to other applications. Stratification of the data revealed that the type of statistical software used varied considerably by whether authors were from the U.S. or from other countries. Conclusions The findings highlight a need for HSR investigators to identify more consistently the specific analytical software used in their studies. That information can be important, because different software packages might produce varying results owing to differences in their underlying estimation methods. PMID:21977990

  1. graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture.

    PubMed

    Chung, Dongjun; Kim, Hang J; Zhao, Hongyu

    2017-02-01

    Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients through novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging, as such diseases are often affected by many genetic variants with small or moderate effects. There is accumulating evidence suggesting that different complex traits share a common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods have been shown to improve statistical power for association mapping compared with separate analyses, they remain limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared with statistical methods based on smaller numbers of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.

  2. Identifying What Student Affairs Professionals Value: A Mixed Methods Analysis of Professional Competencies Listed in Job Descriptions

    ERIC Educational Resources Information Center

    Hoffman, John L.; Bresciani, Marilee J.

    2012-01-01

    This mixed method study explored the professional competencies that administrators expect from entry-, mid-, and senior-level professionals as reflected in 1,759 job openings posted in 2008. Knowledge, skill, and dispositional competencies were identified during the qualitative phase of the study. Statistical analysis of the prevalence of…

  3. Integrated Analysis of Pharmacologic, Clinical, and SNP Microarray Data using Projection onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing

    PubMed Central

    Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.

    2010-01-01

    Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data, incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimensional setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175
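
    A minimal sketch of adaptive permutation testing in the spirit described above: permutations stop early once enough exceedances accumulate, so clearly non-significant features exit cheaply. The statistic, stopping rule, and data are illustrative, not PROMISE's exact algorithm.

        import numpy as np

        def adaptive_perm_p(x, y, max_perm=10000, stop_hits=25, seed=4):
            rng = np.random.default_rng(seed)
            obs = abs(np.corrcoef(x, y)[0, 1])        # observed association statistic
            hits, done = 0, 0
            for done in range(1, max_perm + 1):
                perm = abs(np.corrcoef(rng.permutation(x), y)[0, 1])
                hits += perm >= obs
                if hits >= stop_hits:                 # early exit: clearly not significant
                    break
            return (hits + 1) / (done + 1)            # conservative p estimate

        g = np.random.default_rng(5).normal(size=100)               # e.g. genotype dosage
        resp = 0.3 * g + np.random.default_rng(6).normal(size=100)  # e.g. drug response
        print(f"p ~ {adaptive_perm_p(g, resp):.4f}")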

  4. Identification of the isomers using principal component analysis (PCA) method

    NASA Astrophysics Data System (ADS)

    Kepceoǧlu, Abdullah; Gündoǧdu, Yasemin; Ledingham, Kenneth William David; Kilic, Hamdi Sukur

    2016-03-01

    In this work, we have carried out a detailed statistical analysis of experimental mass spectra of xylene isomers. Principal Component Analysis (PCA) was used to identify the isomers, which cannot be distinguished using conventional statistical methods for the interpretation of their mass spectra. Experiments were carried out using a linear TOF-MS coupled to a femtosecond laser system as the energy source for the ionisation processes. The collected data were analysed and interpreted using PCA as a multivariate analysis of these spectra. This demonstrates the strength of the method in distinguishing isomers that cannot be identified through conventional mass analysis of the dissociative ionisation of these molecules. PCA results depending on the laser pulse energy and the background pressure in the spectrometers are presented in this work.

  5. Statistical tables and charts showing geochemical variation in the Mesoproterozoic Big Creek, Apple Creek, and Gunsight formations, Lemhi group, Salmon River Mountains and Lemhi Range, central Idaho

    USGS Publications Warehouse

    Lindsey, David A.; Tysdal, Russell G.; Taggart, Joseph E.

    2002-01-01

    The principal purpose of this report is to provide a reference archive for results of a statistical analysis of geochemical data for metasedimentary rocks of Mesoproterozoic age of the Salmon River Mountains and Lemhi Range, central Idaho. Descriptions of geochemical data sets, statistical methods, rationale for interpretations, and references to the literature are provided. Three methods of analysis are used: R-mode factor analysis of major oxide and trace element data for identifying petrochemical processes, analysis of variance for effects of rock type and stratigraphic position on chemical composition, and major-oxide ratio plots for comparison with the chemical composition of common clastic sedimentary rocks.

  6. Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.

    PubMed

    Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V

    2007-01-01

    The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategy analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques make it possible i) to determine natural groups or clusters of control strategies with similar behaviour, ii) to find and interpret hidden, complex and causal relational features in the data set, and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
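
    A brief Python sketch of the CA / PCA / DA sequence (scikit-learn, with KMeans standing in for the clustering step); the 30 x 8 evaluation matrix is random stand-in data, not BSM2 output.

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        X = np.random.default_rng(7).normal(size=(30, 8))  # strategies x criteria
        Xs = StandardScaler().fit_transform(X)

        scores = PCA(n_components=2).fit_transform(Xs)     # ii) hidden structure (plot to inspect)
        clusters = KMeans(n_clusters=3, n_init=10).fit_predict(Xs)  # i) natural groups
        lda = LinearDiscriminantAnalysis().fit(Xs, clusters)        # iii) discriminant variables

        print(np.bincount(clusters))                       # cluster sizes
        print(np.abs(lda.coef_).mean(axis=0).round(2))     # criteria that separate the groups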

  7. [Statistical analysis of German radiologic periodicals: developmental trends in the last 10 years].

    PubMed

    Golder, W

    1999-09-01

    To identify which statistical tests are applied in German radiological publications, to what extent their use has changed during the last decade, and which factors might be responsible for this development. The major articles published in "ROFO" and "DER RADIOLOGE" during 1988, 1993 and 1998 were reviewed for statistical content. The contributions were classified by principal focus and radiological subspecialty. The methods used were assigned to descriptive, basal and advanced statistics. Sample size, significance level and power were established. The use of experts' assistance was monitored. Finally, we calculated the so-called cumulative accessibility of the publications. A total of 525 contributions were eligible. In 1988, 87% used descriptive statistics only, 12.5% basal, and 0.5% advanced statistics. The corresponding figures for 1993 and 1998 are 62 and 49%, 32 and 41%, and 6 and 10%, respectively. Statistical techniques were most likely to be used in research on musculoskeletal imaging and in articles dedicated to MRI. Six basic categories of statistical methods account for the complete statistical analysis appearing in 90% of the articles. ROC analysis is the single most common advanced technique. Authors made increasing use of statistical experts' advice and statistical programs. During the last decade, the use of statistical methods in German radiological journals has fundamentally improved, both quantitatively and qualitatively. Presently, advanced techniques account for 20% of the pertinent statistical tests. This development seems to be promoted by the increasing availability of statistical analysis software.

  8. Assessing the Kansas water-level monitoring program: An example of the application of classical statistics to a geological problem

    USGS Publications Warehouse

    Davis, J.C.

    2000-01-01

    Geologists may feel that geological data are not amenable to statistical analysis, or at best require specialized approaches such as nonparametric statistics and geostatistics. However, there are many circumstances, particularly in systematic studies conducted for environmental or regulatory purposes, where traditional parametric statistical procedures can be beneficial. An example is the application of analysis of variance to data collected in an annual program of measuring groundwater levels in Kansas. Influences such as well conditions, operator effects, and use of the water can be assessed and wells that yield less reliable measurements can be identified. Such statistical studies have resulted in yearly improvements in the quality and reliability of the collected hydrologic data. Similar benefits may be achieved in other geological studies by the appropriate use of classical statistical tools.
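
    A one-way ANOVA of the kind described can be run in a few lines of Python; the three groups of repeat measurements below (by measuring crew) are invented.

        from scipy import stats

        crew_a = [0.3, -0.1, 0.2, 0.4, 0.0]   # water-level change (ft), crew A
        crew_b = [0.2, 0.1, 0.3, -0.2, 0.1]   # crew B
        crew_c = [1.1, 0.9, 1.4, 1.0, 1.2]    # crew C: a suspect operator effect

        f_stat, p_value = stats.f_oneway(crew_a, crew_b, crew_c)
        print(f"F = {f_stat:.1f}, p = {p_value:.2e}")  # small p flags an operator effect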

  9. Participant Interaction in Asynchronous Learning Environments: Evaluating Interaction Analysis Methods

    ERIC Educational Resources Information Center

    Blanchette, Judith

    2012-01-01

    The purpose of this empirical study was to determine the extent to which three different objective analytical methods--sequence analysis, surface cohesion analysis, and lexical cohesion analysis--can most accurately identify specific characteristics of online interaction. Statistically significant differences were found in all points of…

  10. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins, and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
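
    Two of the reviewed ingredients can be sketched with scikit-learn, assuming shrinkage covariance estimation and sparse inverse-covariance (network) estimation as the chosen estimators; the expression matrix below is simulated.

        import numpy as np
        from sklearn.covariance import LedoitWolf, GraphicalLassoCV

        X = np.random.default_rng(8).normal(size=(40, 25))   # samples x genes

        cov = LedoitWolf().fit(X).covariance_                # shrinkage covariance estimate
        precision = GraphicalLassoCV().fit(X).precision_     # sparse network estimate
        edges = (np.abs(precision) > 1e-3).sum() - 25        # off-diagonal "edges"
        print(cov.shape, edges)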

  11. BaTMAn: Bayesian Technique for Multi-image Analysis

    NASA Astrophysics Data System (ADS)

    Casado, J.; Ascasibar, Y.; García-Benito, R.; Guidi, G.; Choudhury, O. S.; Bellocchi, E.; Sánchez, S. F.; Díaz, A. I.

    2016-12-01

    Bayesian Technique for Multi-image Analysis (BaTMAn) characterizes any astronomical dataset containing spatial information and performs a tessellation based on the measurements and errors provided as input. The algorithm iteratively merges spatial elements as long as they are statistically consistent with carrying the same information (i.e. identical signal within the errors). The output segmentations successfully adapt to the underlying spatial structure, regardless of its morphology and/or the statistical properties of the noise. BaTMAn identifies (and keeps) all the statistically-significant information contained in the input multi-image (e.g. an IFS datacube). The main aim of the algorithm is to characterize spatially-resolved data prior to their analysis.

  12. Automatic identification of bacterial types using statistical imaging methods

    NASA Astrophysics Data System (ADS)

    Trattner, Sigal; Greenspan, Hayit; Tepper, Gapi; Abboud, Shimon

    2003-05-01

    The objective of the current study is to develop an automatic tool to identify bacterial types using computer-vision and statistical modeling techniques. Bacteriophage (phage)-typing methods are used to identify and extract representative profiles of bacterial types, such as Staphylococcus aureus. Current systems rely on the subjective reading of plaque profiles by a human expert. This process is time-consuming and prone to errors, especially as technology enables an increase in the number of phages used for typing. The statistical methodology presented in this work provides for an automated, objective and robust analysis of visual data, along with the ability to cope with increasing data volumes.

  13. An application of statistics to comparative metagenomics

    PubMed Central

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-01-01

    Background Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Results Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. Conclusion The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems. PMID:16549025

  14. An application of statistics to comparative metagenomics.

    PubMed

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-03-20

    Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.
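
    As a hedged illustration of the comparison (the paper's own procedure differs in detail), one subsystem's representation in two metagenomes can be compared with a two-proportion z-test; the counts are invented.

        from statsmodels.stats.proportion import proportions_ztest

        hits = [142, 58]         # reads assigned to one subsystem in metagenomes A, B
        totals = [20000, 15000]  # total annotated reads in each metagenome

        z, p = proportions_ztest(count=hits, nobs=totals)
        print(f"z = {z:.2f}, p = {p:.1e}")  # small p: subsystem differentially represented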

  15. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
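
    A minimal sketch of forming a hybrid-sum sequence: two maximum-length sequences generated by linear feedback shift registers and combined by modulo-two addition. The tap sets correspond to primitive polynomials of degree 5; the register lengths are illustrative.

        def lfsr(taps, state, length):
            """Fibonacci LFSR: feedback is the XOR of the tapped stages."""
            out = []
            for _ in range(length):
                out.append(state[-1])                   # output leaves the last stage
                fb = 0
                for t in taps:
                    fb ^= state[t - 1]
                state = [fb] + state[:-1]               # shift; feedback enters stage 1
            return out

        m1 = lfsr([5, 3], [1, 0, 0, 0, 0], 31)          # x^5 + x^3 + 1
        m2 = lfsr([5, 4, 3, 2], [1, 0, 0, 0, 0], 31)    # x^5 + x^4 + x^3 + x^2 + 1
        hybrid = [a ^ b for a, b in zip(m1, m2)]        # modulo-two (hybrid) sum
        print(sum(hybrid), "ones in", len(hybrid), "chips")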

  16. EEG Correlates of Fluctuation in Cognitive Performance in an Air Traffic Control Task

    DTIC Science & Technology

    2014-11-01

    Non-parametric statistical analysis was used to identify neurophysiological patterns due to the time-on-task effect; significant changes in EEG power were observed. Keywords: EEG, Cognitive Performance, Power Spectral Analysis, Non-Parametric Analysis. The document is available to the public through the Internet.

  17. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
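
    A minimal sketch of the approach (TF-IDF features with regularized logistic regression, via scikit-learn); the four reports and labels are toy stand-ins for MAUDE text.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        reports = [
            "software froze and the infusion record was lost",
            "display rebooted during order entry",
            "catheter tubing cracked at the hub",
            "lead wire insulation was frayed",
        ]
        is_hit = [1, 1, 0, 0]   # 1 = health IT incident

        clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                            LogisticRegression(C=1.0))   # C sets regularization strength
        clf.fit(reports, is_hit)
        print(clf.predict(["system crashed while saving the medication record"]))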

  18. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  19. Identifying drought response of semi-arid aeolian systems using near-surface luminescence profiles and changepoint analysis, Nebraska Sandhills.

    NASA Astrophysics Data System (ADS)

    Buckland, Catherine; Bailey, Richard; Thomas, David

    2017-04-01

    Two billion people living in drylands are affected by land degradation. Sediment erosion by wind and water removes fertile soil and destabilises landscapes. Vegetation disturbance is a key driver of dryland erosion, caused by both natural and human forcings: drought, fire, land use and grazing pressure. A quantified understanding of vegetation cover sensitivities, and of the resulting surface change, is needed if the vegetation and landscape responses to future climate change and human pressure are to be better predicted. Using quartz luminescence dating and statistical changepoint analysis (Killick & Eckley, 2014), this study demonstrates the ability to identify step-changes in the depositional age of near-surface sediments. Lx/Tx luminescence profiles coupled with statistical analysis show the use of near-surface sediments in providing a high-resolution record of recent system response and aeolian system thresholds. This research determines how the environment has recorded and retained sedimentary evidence of drought response and land use disturbance over the last two hundred years, across both individual landforms and the wider Nebraska Sandhills. Identifying surface deposition and comparing it with records of climate, fire and land use change allows us to assess the sensitivity and stability of the surface sediment to a range of forcing factors. Killick, R. and Eckley, I.A. (2014) "changepoint: An R Package for Changepoint Analysis." Journal of Statistical Software, 58(3), 1-19.
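
    The study used the R changepoint package of Killick and Eckley; as a language-neutral illustration, the sketch below locates a single mean-shift changepoint in a simulated profile by minimising the two-segment squared error, the core operation such packages apply recursively. The profile values are invented.

```python
# A toy analogue of mean-shift changepoint detection (the study itself used the
# R `changepoint` package). Depth/age values here are simulated.
import numpy as np

def best_split(x):
    """Return the split index minimising the summed squared error of a
    two-segment piecewise-constant fit, plus the cost reduction achieved."""
    n = len(x)
    total = np.sum((x - x.mean()) ** 2)
    best_i, best_cost = None, total
    for i in range(2, n - 1):
        cost = (np.sum((x[:i] - x[:i].mean()) ** 2)
                + np.sum((x[i:] - x[i:].mean()) ** 2))
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i, total - best_cost

rng = np.random.default_rng(1)
# Simulated near-surface profile: a step change in apparent age partway down.
profile = np.concatenate([rng.normal(50, 5, 30), rng.normal(120, 5, 20)])

idx, gain = best_split(profile)
print(f"changepoint at sample {idx}, cost reduction {gain:.1f}")
```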

  20. Mild cognitive impairment and fMRI studies of brain functional connectivity: the state of the art

    PubMed Central

    Farràs-Permanyer, Laia; Guàrdia-Olmos, Joan; Peró-Cebollero, Maribel

    2015-01-01

    In the last 15 years, many articles have studied brain connectivity in Mild Cognitive Impairment (MCI) patients with fMRI techniques, with each investigation seemingly using a different statistical model of connectivity to identify complex connectivity structures and recognize typical behavior in this type of patient. This diversity of statistical approaches may hamper the comparison of results. This paper describes how researchers approached the study of brain connectivity in MCI patients using fMRI techniques from 2002 to 2014. The focus is on the statistical analysis proposed by each research group, with reference to the limitations and possibilities of those techniques, in order to identify recommendations for improving the study of functional connectivity. The included articles came from a search of Web of Science and PsycINFO using the following keywords: fMRI, MCI, and functional connectivity. Eighty-one papers were found, but two were discarded for lack of statistical analysis; accordingly, 79 articles were included in this review. We summarized key aspects of the articles, including the goal of each investigation, the cognitive paradigm and methods used, the brain regions involved, and the use of ROI analysis and statistical analysis, with emphasis on the connectivity estimation model used in each investigation. The present analysis confirmed the remarkable variability of the statistical analysis methods found. Additionally, the study of brain connectivity in this population is not, at the moment, providing significant information or results related to clinical aspects relevant to prediction and treatment. We propose following guidelines for publishing fMRI data as a solution to the problem of study replication. The latter aspect could be important for future publications, because greater homogeneity would benefit comparison between publications and the generalization of results. PMID:26300802

  1. Chemical discrimination of lubricant marketing types using direct analysis in real time time-of-flight mass spectrometry.

    PubMed

    Maric, Mark; Harvey, Lauren; Tomcsak, Maren; Solano, Angelique; Bridge, Candice

    2017-06-30

    In comparison to other violent crimes, sexual assaults suffer from very low prosecution and conviction rates, especially in the absence of DNA evidence. As a result, the forensic community needs to utilize other forms of trace contact evidence, such as lubricant evidence, to provide a link between the victim and the assailant. In this study, 90 personal bottled and condom lubricants from the three main marketing types (silicone-based, water-based and condoms) were characterized by direct analysis in real time time-of-flight mass spectrometry (DART-TOFMS). The instrumental data were analyzed by multivariate statistics, including hierarchical cluster analysis, principal component analysis, and linear discriminant analysis. Interpretation of the mass spectral data with multivariate statistics identified 12 discrete groupings, indicating inherent chemical diversity not only between but also within the three main marketing groups. A number of unique chemical markers, both major and minor, were identified beyond the three main chemical components (i.e. PEG, PDMS and nonoxynol-9) currently used for lubricant classification. The model was validated by a stratified 20% withheld cross-validation, which demonstrated minimal overlap between the groupings. Based on the groupings identified and the unique features of each group, a highly discriminating statistical model was then developed that aims to provide the foundation for a forensic lubricant database that may eventually be applied to casework. Copyright © 2017 John Wiley & Sons, Ltd.
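
    A minimal sketch of that chemometric workflow (PCA for dimension reduction, LDA for classification, validation on a stratified 20% holdout) in scikit-learn. The simulated spectra and class means are placeholders, not DART-TOFMS measurements.

```python
# Sketch of the chemometric workflow described above: PCA for dimension
# reduction, LDA for classification, validated on a stratified 20% holdout.
# The "spectra" here are random placeholders, not DART-TOFMS data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_per_class, n_mz = 30, 200        # 30 lubricants per type, 200 m/z bins
X = np.vstack([rng.normal(loc=mu, size=(n_per_class, n_mz))
               for mu in (0.0, 0.5, 1.0)])   # 3 marketing types
y = np.repeat(["silicone", "water", "condom"], n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
model.fit(X_tr, y_tr)
print(f"holdout accuracy: {model.score(X_te, y_te):.2f}")
```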

  2. Analysis of the sleep quality of elderly people using biomedical signals.

    PubMed

    Moreno-Alsasua, L; Garcia-Zapirain, B; Mendez-Zorrilla, A

    2015-01-01

    This paper presents a technical solution that analyses sleep signals captured by biomedical sensors to find possible disorders during rest. Specifically, the method evaluates electrooculogram (EOG) signals, skin conductance (GSR), air flow (AS), and body temperature. Next, a quantitative sleep quality analysis determines significant changes in the biological signals, and any similarities between them in a given time period. Filtering techniques such as the Fourier transform method and IIR filters process the signals and identify significant variations. Once these changes have been identified, all significant data are compared and a quantitative and statistical analysis is carried out to determine the level of a person's rest. To evaluate correlations and significant differences, a statistical analysis was performed, showing correlations between the EOG and AS signals (p=0.005), the EOG and GSR signals (p=0.037) and, finally, the EOG and body temperature signals (p=0.04). Doctors could use this information to monitor changes within a patient.
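
    To make the signal-conditioning step concrete, here is a small SciPy example applying a Butterworth IIR band-pass filter to a simulated EOG trace. The sampling rate and cutoff frequencies are assumptions chosen for illustration, not values from the paper.

```python
# Minimal example of the signal-conditioning step: a Butterworth IIR band-pass
# applied to a simulated EOG trace. Cutoffs are illustrative, not the paper's.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100.0                          # sampling rate, Hz (assumed)
t = np.arange(0, 10, 1 / fs)
eog = np.sin(2 * np.pi * 0.8 * t) + 0.3 * np.random.randn(t.size)  # signal + noise

# 4th-order band-pass keeping 0.3-8 Hz, roughly where slow eye movements live.
b, a = butter(4, [0.3, 8.0], btype="band", fs=fs)
filtered = filtfilt(b, a, eog)      # zero-phase filtering
print(filtered[:5])
```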

  3. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

    PubMed

    Zheng, Qi; Wang, Xiu-Jie

    2008-07-01

    Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although many software tools offer GO-related analysis functions, new tools are still needed to meet the requirements of data generated by newly developed technologies and of advanced analysis purposes. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), thereby providing a better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis of data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) a unique feature of GOEAST is its support for cross-comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
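
    The statistic typically underlying such enrichment tests is hypergeometric: how surprising is it that k of the n submitted genes carry a GO term annotated to K of the N background genes? A bare-bones SciPy version with invented counts (not necessarily GOEAST's exact implementation):

```python
# Core statistic behind many GO enrichment tools: the hypergeometric test.
# All counts below are made-up illustrations.
from scipy.stats import hypergeom

N = 20000   # genes in the background (e.g. on the array)
K = 150     # background genes annotated with the GO term
n = 400     # genes in the submitted list
k = 12      # submitted genes carrying the annotation

# P(X >= k): chance of seeing at least k annotated genes in a random draw of n.
p = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p-value = {p:.3e}")
```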

  4. A critique of the usefulness of inferential statistics in applied behavior analysis

    PubMed Central

    Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.

    1998-01-01

    Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics being made, perhaps with increasing frequency, by those outside behavior analysis are also discussed. These critics call for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists switch to statistics aimed at interval estimation, or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them, as a means of identifying any ways in which such statistics may be useful. PMID:22478304

  5. Substituting values for censored data from Texas, USA, reservoirs inflated and obscured trends in analyses commonly used for water quality target development.

    PubMed

    Grantz, Erin; Haggard, Brian; Scott, J Thad

    2018-06-12

    We calculated four median datasets for each of three parameters (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4), for approximately 100 Texas, USA reservoirs. Trend analyses of differences between dataset 1 and 3 medians indicated that percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a-TP and transparency-TP relationships indicated threshold differences of up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution, or missing due to limitations of statistical methods, biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in the linear regression models relating transparency to TP between datasets 1, 2, and the more statistically robust datasets 3-4. Study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include a high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low as 16% for multiple QLs. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing-data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
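
    A quick simulation makes the substitution bias tangible: censor a known distribution at a quantification limit, then compare medians computed with the 1QL and 0.5QL substitutions against the true median. The lognormal distribution and 60% censoring level are arbitrary illustrative choices.

```python
# Quick simulation of the substitution problem discussed above: estimating a
# median when values below the quantification limit (QL) are replaced by
# 1*QL or 0.5*QL. Distribution and QL are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(42)
true = rng.lognormal(mean=1.0, sigma=1.0, size=10000)  # "true" concentrations
ql = np.quantile(true, 0.6)   # QL censoring 60% of data (%Cen in the high-risk zone)

censored = true < ql
sub_1ql = np.where(censored, ql, true)           # dataset-1 style: substitute 1*QL
sub_halfql = np.where(censored, 0.5 * ql, true)  # dataset-2 style: 0.5*QL

print(f"true median      : {np.median(true):.3f}")
print(f"median, 1*QL sub : {np.median(sub_1ql):.3f}")     # biased high
print(f"median, 0.5*QL   : {np.median(sub_halfql):.3f}")  # biased low
```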

  6. Method for Identifying Probable Archaeological Sites from Remotely Sensed Data

    NASA Technical Reports Server (NTRS)

    Tilton, James C.; Comer, Douglas C.; Priebe, Carey E.; Sussman, Daniel

    2011-01-01

    Archaeological sites are being compromised or destroyed at a catastrophic rate in most regions of the world. The best solution to this problem is for archaeologists to find and study these sites before they are compromised or destroyed. One way to facilitate the necessary rapid, wide-area surveys needed to find these archaeological sites is to generate maps of probable archaeological sites from remotely sensed data. We describe an approach for identifying probable locations of archaeological sites over a wide area, based on detecting subtle anomalies in vegetative cover through a statistically based analysis of remotely sensed data from multiple sources. We further developed this approach under a recent NASA ROSES Space Archaeology Program project, refining and elaborating the statistical analysis to compensate for potential slight misregistrations between the remote sensing data sources and the archaeological site location data. We also explored data quantization approaches (required by the statistical analysis) and identified a superior quantization method based on a unique image segmentation approach. In our presentation we will summarize our refined approach and demonstrate the effectiveness of the overall approach with test data from Santa Catalina Island off the southern California coast. Finally, we discuss our future plans for further improving our approach.

  7. Application of the Statistical ICA Technique in the DANCE Data Analysis

    NASA Astrophysics Data System (ADS)

    Baramsai, Bayarbadrakh; Jandel, M.; Bredeweg, T. A.; Rusev, G.; Walker, C. L.; Couture, A.; Mosby, S.; Ullmann, J. L.; Dance Collaboration

    2015-10-01

    The Detector for Advanced Neutron Capture Experiments (DANCE) at the Los Alamos Neutron Science Center is used to improve our understanding of the neutron capture reaction. DANCE is a highly efficient 4π γ-ray detector array consisting of 160 BaF2 crystals, which makes it an ideal tool for neutron capture experiments. The (n, γ) reaction Q-value equals the summed energy of all γ-rays emitted in the de-excitation cascades from the excited capture state to the ground state. The total γ-ray energy is used to identify reactions on different isotopes as well as the background. However, it is challenging to separate contributions to the Esum spectra from isotopes with similar Q-values. Recently we have tested the applicability of modern statistical methods, such as Independent Component Analysis (ICA), to identify and separate the (n, γ) reaction yields of the different isotopes present in the target material. ICA is a recently developed computational tool for separating multidimensional data into statistically independent additive subcomponents. In this conference talk, we present results of the application of ICA algorithms, and modifications thereof, to the DANCE experimental data analysis. This research is supported by the U.S. Department of Energy, Office of Science, Nuclear Physics under the Early Career Award No. LANL20135009.
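
    The essence of ICA is recovering statistically independent sources from observed mixtures. A minimal scikit-learn FastICA example with two simulated, mixed, non-Gaussian sources standing in for overlapping isotope responses (not DANCE data):

```python
# Sketch of blind-source separation with FastICA, the kind of ICA decomposition
# described above. Two overlapping "isotope" responses are simulated and mixed.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 2000
s1 = np.sign(np.sin(np.linspace(0, 40, n))) + 0.1 * rng.standard_normal(n)
s2 = rng.laplace(size=n)               # non-Gaussian sources are essential for ICA
S = np.c_[s1, s2]

A = np.array([[1.0, 0.6], [0.4, 1.0]])  # unknown mixing of the two yields
X = S @ A.T                              # observed, mixed signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # recovered independent components
print("estimated mixing matrix:\n", ica.mixing_)
```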

  8. Hydrometeorological application of an extratropical cyclone classification scheme in the southern United States

    NASA Astrophysics Data System (ADS)

    Senkbeil, J. C.; Brommer, D. M.; Comstock, I. J.; Loyd, T.

    2012-07-01

    Extratropical cyclones (ETCs) in the southern United States are often overlooked compared with tropical cyclones in the region and ETCs in the northern United States. Although southern ETCs are significant weather events, there is currently no operational scheme for identifying and discussing these nameless storms. In this research, we classified 84 ETCs (1970-2009). We manually identified five distinct formation regions and seven unique ETC types using statistical classification. The statistical classification employed principal components analysis and two methods of cluster analysis. Both manual and statistical storm types generally showed positive (negative) relationships with El Niño (La Niña). Manual storm types displayed precipitation swaths consistent with discrete storm tracks, which further legitimizes the existence of multiple modes of southern ETCs. Statistical storm types also displayed unique precipitation intensity swaths, but these swaths were less indicative of track location. It is hoped that by classifying southern ETCs into types, forecasters, hydrologists, and broadcast meteorologists might better anticipate projected amounts of precipitation at their locations.

  9. The effect of berberine on insulin resistance in women with polycystic ovary syndrome: detailed statistical analysis plan (SAP) for a multicenter randomized controlled trial.

    PubMed

    Zhang, Ying; Sun, Jin; Zhang, Yun-Jiao; Chai, Qian-Yun; Zhang, Kang; Ma, Hong-Li; Wu, Xiao-Ke; Liu, Jian-Ping

    2016-10-21

    Although Traditional Chinese Medicine (TCM) has been widely used in clinical settings, a major remaining challenge in TCM is to evaluate its efficacy scientifically. This randomized controlled trial aims to evaluate the efficacy and safety of berberine in the treatment of patients with polycystic ovary syndrome. To improve the transparency and research quality of this clinical trial, we prepared this statistical analysis plan (SAP). The trial design, primary and secondary outcomes, and safety outcomes are declared to reduce selection biases in data analysis and result reporting. We specified detailed methods for data management and statistical analyses, and outlined the statistics to appear in the corresponding tables, listings, and graphs. The SAP provides more detailed information than the trial protocol on data management and statistical analysis methods. Any post hoc analyses can be identified by referring to this SAP, reducing possible selection and performance biases in the trial. This study is registered at ClinicalTrials.gov, NCT01138930, registered on 7 June 2010.

  10. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used to identify altered functions or pathways in disease from microarray data. In particular, simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is a highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves overall discriminatory ability. Its effect was investigated in terms of power, false-positive rate, and receiver operating characteristic curves for a number of simulated and real datasets. The performance of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests was also compared and discussed.
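
    A toy simulation of the contrast at issue: for a gene set with bidirectional shifts, a gene-sampling test on the signed mean statistic sees nothing, while the same test on the absolute statistic detects the set. The per-gene statistics are simulated z-scores; this illustrates the general idea, not the paper's exact benchmark.

```python
# Gene-sampling gene-set test on simulated per-gene z-scores: signed mean vs
# absolute mean. All data invented for illustration.
import numpy as np

rng = np.random.default_rng(7)
n_genes = 5000
z = rng.standard_normal(n_genes)          # null per-gene statistics

set_idx = np.arange(40)                    # a 40-gene set with bidirectional shifts
z[set_idx[:20]] += 2.0                     # half up-regulated
z[set_idx[20:]] -= 2.0                     # half down-regulated

def gene_sampling_p(stats, idx, score_fn, n_perm=2000):
    """Empirical p-value of the set score against random gene sets of equal size."""
    obs = score_fn(stats[idx])
    null = np.array([
        score_fn(stats[rng.choice(len(stats), len(idx), replace=False)])
        for _ in range(n_perm)])
    return (np.sum(null >= obs) + 1) / (n_perm + 1)

print("signed mean   p =", gene_sampling_p(z, set_idx, np.mean))  # ~0.5, missed
print("absolute mean p =",
      gene_sampling_p(z, set_idx, lambda s: np.mean(np.abs(s))))  # small, detected
```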

  11. Statistics used in current nursing research.

    PubMed

    Zellner, Kathleen; Boerst, Connie J; Tabb, Wil

    2007-02-01

    Undergraduate nursing research courses should emphasize the statistics most commonly used in the nursing literature to strengthen students' and beginning researchers' understanding of them. To determine the most commonly used statistics, we reviewed all quantitative research articles published in 13 nursing journals in 2000. The findings supported Beitz's categorization of kinds of statistics. Ten primary statistics used in 80% of nursing research published in 2000 were identified. We recommend that the appropriate use of those top 10 statistics be emphasized in undergraduate nursing education and that the nursing profession continue to advocate for the use of methods (e.g., power analysis, odds ratio) that may contribute to the advancement of nursing research.

  12. Computer-aided auditing of prescription drug claims.

    PubMed

    Iyengar, Vijay S; Hermiz, Keith B; Natarajan, Ramesh

    2014-09-01

    We describe a methodology for identifying and ranking candidate audit targets from a database of prescription drug claims. The relevant audit targets may include various entities, such as prescribers, patients and pharmacies, who exhibit statistical behavior indicative of potential fraud and abuse in their prescription claims during a specified period of interest. Our overall approach is consistent with related work in statistical methods for the detection of fraud and abuse, but places relative emphasis on three specific aspects: first, based on the assessment of domain experts, certain focus areas are selected and the data elements pertinent to the audit analysis in each focus area are identified; second, specialized statistical models are developed to characterize the normalized baseline behavior in each focus area; and third, statistical hypothesis testing is used to identify entities that diverge significantly from their expected behavior according to the relevant baseline model. The application of this overall methodology to a prescription claims database from a large health plan is considered in detail.
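
    A hedged sketch of the third step with a deliberately simple baseline: treat each prescriber's expected claim count as Poisson and flag entities whose observed counts are improbably high. The paper's baseline models are more specialized; all numbers below are simulated.

```python
# Illustration of hypothesis testing against a baseline model. Here the
# baseline is a simple Poisson expectation per prescriber; the real models
# in the paper are more specialized. All data are simulated.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
expected = rng.uniform(20, 60, size=1000)           # model-based expected claims
observed = rng.poisson(expected)                     # most prescribers behave
observed[:5] = (expected[:5] * 3).astype(int)        # a few inflated outliers

# One-sided p-value for "more claims than the baseline predicts".
pvals = poisson.sf(observed - 1, expected)
flagged = np.argsort(pvals)[:5]                      # rank candidate audit targets
print("top audit candidates:", flagged, pvals[flagged])
```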

  13. Economic and statistical analysis of time limitations for spotting fluids and fishing operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keller, P.S.; Brinkmann, P.E.; Taneja, P.K.

    1984-05-01

    This paper reviews the statistics of "spotting fluids" used to free stuck drill pipe, as well as the economics and statistics of drill string fishing operations. Data were taken from Mobil Oil Exploration and Producing Southeast Inc.'s (MOEPSI) records from 1970-1981. Only those events which occur after a drill string becomes stuck are discussed. The data collected were categorized as Directional Wells and Straight Wells. Bar diagrams are presented to show the success ratio vs. soaking time for each of the two categories. An analysis was made to identify the elapsed time limit for placing the spotting fluid for maximum probability of success. Also determined were the statistical minimum and maximum soaking times. For determining the time limit for fishing operations, the following criteria were used: 1. the risked "economic breakeven analysis" concept, developed based on the work of Harrison; 2. the statistical probability of success based on MOEPSI's records from 1970-1981.

  14. Systems Analysis of NASA Aviation Safety Program: Final Report

    NASA Technical Reports Server (NTRS)

    Jones, Sharon M.; Reveley, Mary S.; Withrow, Colleen A.; Evans, Joni K.; Barr, Lawrence; Leone, Karen

    2013-01-01

    A three-month study (February to April 2010) of the NASA Aviation Safety (AvSafe) program was conducted. This study comprised three components: (1) a statistical analysis of currently available civilian subsonic aircraft data from the National Transportation Safety Board (NTSB), the Federal Aviation Administration (FAA), and the Aviation Safety Information Analysis and Sharing (ASIAS) system to identify any significant or overlooked aviation safety issues; (2) a high-level qualitative identification of future safety risks, with an assessment of the potential impact of the NASA AvSafe research on the National Airspace System (NAS) based on these risks; and (3) a detailed, top-down analysis of the NASA AvSafe program using an established and peer-reviewed systems analysis methodology. The statistical analysis identified the top aviation "tall poles" based on NTSB accident and FAA incident data from 1997 to 2006. A separate examination of medical helicopter accidents in the United States was also conducted. Multiple external sources were used to develop a compilation of ten "tall poles" among future safety issues and risks. The top-down analysis of the AvSafe program was conducted using a modification of the Gibson methodology. Of the 17 challenging safety issues that were identified, 11 were directly addressed by the AvSafe program research portfolio.

  15. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of the influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models used for statistical analysis can be applied to include prior knowledge in genetic analysis. We chose the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected by methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can serve as a general framework for integrating systems biology with genetics. PMID:20958249

  16. Metabolomic analysis based on 1H-nuclear magnetic resonance spectroscopy metabolic profiles in tuberculous, malignant and transudative pleural effusion

    PubMed Central

    Wang, Cheng; Peng, Jingjin; Kuang, Yanling; Zhang, Jiaqiang; Dai, Luming

    2017-01-01

    Pleural effusion is a common clinical manifestation with various causes. Current diagnostic and therapeutic methods have exhibited numerous limitations. By analysing dynamic changes in low-molecular-weight catabolites, metabolomics has been widely applied to various types of disease and has provided platforms for identifying many novel biomarkers. However, to the best of our knowledge, there are few studies on metabolic profiling of pleural effusion. In the current study, 58 pleural effusion samples were collected, of which 20 were malignant pleural effusions, 20 were tuberculous pleural effusions and 18 were transudative pleural effusions. Small-molecule metabolite spectra were obtained by 1H nuclear magnetic resonance technology, and pattern-recognition multivariable statistical analysis was used to screen out differing metabolites. One-way analysis of variance, the Student-Newman-Keuls test and the Kruskal-Wallis test were adopted for statistical analysis. Over 400 metabolites were identified in the untargeted metabolomic analysis, and 26 metabolites differed significantly among tuberculous, malignant and transudative pleural effusions. These metabolites were predominantly involved in the metabolic pathways of amino acid metabolism, glycometabolism and lipid metabolism. Statistical analysis revealed that eight metabolites contributed to the distinction between the three groups: tuberculous, malignant and transudative pleural effusion. In the current study, the feasibility of identifying small-molecule biochemical profiles in different types of pleural effusion was investigated to reveal novel biological insights into the underlying mechanisms. The results provide specific insights into the biology of tuberculous, malignant and transudative pleural effusion and may offer novel strategies for the diagnosis and therapy of associated diseases, including tuberculosis, advanced lung cancer and congestive heart failure. PMID:28627685
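
    The group comparisons named above are one-liners in SciPy; a minimal example with simulated metabolite levels for the three effusion types (placeholder numbers, not the study's data):

```python
# Three-group comparison of a metabolite's level with one-way ANOVA and the
# Kruskal-Wallis test. Values are simulated placeholders.
import numpy as np
from scipy.stats import f_oneway, kruskal

rng = np.random.default_rng(5)
tb = rng.normal(1.0, 0.3, 20)      # tuberculous effusions
malig = rng.normal(1.4, 0.3, 20)   # malignant effusions
trans = rng.normal(0.8, 0.3, 18)   # transudative effusions

F, p_anova = f_oneway(tb, malig, trans)
H, p_kw = kruskal(tb, malig, trans)
print(f"ANOVA p={p_anova:.4f}, Kruskal-Wallis p={p_kw:.4f}")
```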

  17. Spatiotemporal Analysis of the Ebola Hemorrhagic Fever in West Africa in 2014

    NASA Astrophysics Data System (ADS)

    Xu, M.; Cao, C. X.; Guo, H. F.

    2017-09-01

    Ebola hemorrhagic fever (EHF) is an acute hemorrhagic disease caused by the Ebola virus, which is highly contagious. This paper aimed to explore possible clustering areas of EHF cases in West Africa in 2014 and to identify endemic areas and their trends by means of space-time analysis. We mapped the distribution of EHF incidence and explored statistically significant spatial, temporal and space-time disease clusters. We utilized hotspot analysis to find the spatial clustering pattern on the basis of the actual outbreak cases. Spatial-temporal cluster analysis was used to analyze the spatial and temporal distribution of the disease and to examine whether its distribution is statistically significant. Local clusters were investigated using Kulldorff's scan statistic approach. The results reveal that the epidemic mainly clustered in the western part of Africa near the North Atlantic, with an obvious regional distribution. For the current epidemic, we found areas with a high incidence of EVD by means of spatial cluster analysis.

  18. The relationship between procrastination, learning strategies and statistics anxiety among Iranian college students: a canonical correlation analysis.

    PubMed

    Vahedi, Shahrum; Farrokhi, Farahman; Gahramani, Farahnaz; Issazadegan, Ali

    2012-01-01

    Approximately 66-80% of graduate students experience statistics anxiety, and some researchers propose that many students identify statistics courses as the most anxiety-inducing courses in their academic curriculum. As such, it is likely that statistics anxiety is, in part, responsible for many students delaying enrollment in these courses for as long as possible. This paper proposes a canonical model treating academic procrastination (AP) and learning strategies (LS) as predictor variables and statistics anxiety (SA) as the explained variable. A questionnaire survey was used for data collection, and 246 female college students participated in this study. To examine the mutually independent relations between the procrastination, learning strategy and statistics anxiety variables, a canonical correlation analysis was computed. Findings show that two canonical functions were statistically significant. The set of variables (metacognitive self-regulation, source management, preparing homework, preparing for tests and preparing term papers) helped predict changes in statistics anxiety with respect to fearful behavior, attitude towards math and class, and performance, but not anxiety. These findings could be used in educational and psychological interventions in the context of statistics anxiety reduction.
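
    For readers unfamiliar with the technique, here is a compact canonical correlation analysis in scikit-learn, relating a predictor block to a criterion block. The scores and the shared latent factor are simulated stand-ins for the questionnaire data.

```python
# Canonical correlation analysis relating a predictor block (procrastination/
# learning-strategy scores) to a criterion block (anxiety facets). Simulated.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(11)
n = 246
latent = rng.standard_normal(n)            # shared factor driving both blocks
# Predictor block: e.g. self-regulation, source management, homework habits.
X = np.c_[latent + 0.8 * rng.standard_normal(n),
          latent + 0.8 * rng.standard_normal(n),
          rng.standard_normal(n)]
# Criterion block: anxiety facets (fearful behavior, attitude, performance).
Y = np.c_[-latent + 0.8 * rng.standard_normal(n),
          -latent + 0.8 * rng.standard_normal(n)]

cca = CCA(n_components=2)
U, V = cca.fit_transform(X, Y)
r1 = np.corrcoef(U[:, 0], V[:, 0])[0, 1]
print(f"first canonical correlation: {r1:.2f}")
```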

  19. New Statistics for Testing Differential Expression of Pathways from Microarray Data

    NASA Astrophysics Data System (ADS)

    Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao

    Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics, a linear combination test, a quadratic test and a de-correlation test, to identify differentially expressed pathways from gene expression profiles. We applied our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common between the two datasets. The pathways we found are meaningful for uncovering the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in the functional analysis of gene expression data.

  20. Teaching Students to Use Summary Statistics and Graphics to Clean and Analyze Data

    ERIC Educational Resources Information Center

    Holcomb, John; Spalsbury, Angela

    2005-01-01

    Textbooks and websites today abound with real data. One neglected issue is that statistical investigations often require a good deal of "cleaning" to ready data for analysis. The purpose of this dataset and exercise is to teach students to use exploratory tools to identify erroneous observations. This article discusses the merits of such…

  1. Which Variables Associated with Data-Driven Instruction Are Believed to Best Predict Urban Student Achievement?

    ERIC Educational Resources Information Center

    Greer, Wil

    2013-01-01

    This study identified the variables associated with data-driven instruction (DDI) that are perceived to best predict student achievement. Of the DDI variables discussed in the literature, 51 had a sufficient research base to warrant statistical analysis. Of these, 26 were statistically significant. Multiple regression and an…

  2. Featured Article: Transcriptional landscape analysis identifies differently expressed genes involved in follicle-stimulating hormone induced postmenopausal osteoporosis.

    PubMed

    Maasalu, Katre; Laius, Ott; Zhytnik, Lidiia; Kõks, Sulev; Prans, Ele; Reimann, Ene; Märtson, Aare

    2017-01-01

    Osteoporosis is a disorder of bone tissue reorganization associated with reduced bone mass and mineral density. Osteoporosis can severely affect postmenopausal women, causing bone fragility and osteoporotic fractures. The aim of the current study was to compare the blood mRNA profiles of postmenopausal women with and without osteoporosis, with the aim of finding differential gene expression and thus targets for future osteoporosis biomarker studies. Our study consisted of transcriptome analysis of whole blood serum from 12 elderly female osteoporotic patients and 12 non-osteoporotic elderly female controls. The transcriptome analysis was performed with RNA sequencing technology. For data analysis, the edgeR package of R Bioconductor was used. Two hundred and fourteen genes were differentially expressed in osteoporotic compared with non-osteoporotic patients. Statistical analysis revealed 20 differentially expressed genes with a false discovery rate of less than 1.47 × 10^-4 among osteoporotic patients. Ten genes were up-regulated and 10 were down-regulated. Further statistical analysis identified a potential osteoporosis mRNA biomarker pattern consisting of six genes: CACNA1G, ALG13, SBK1, GGT7, MBNL3, and RIOK3. Functional Ingenuity Pathway Analysis identified the strongest candidate genes with regard to potential involvement in a follicle-stimulating hormone-activated network of increased osteoclast activity and hypogonadal bone loss. The differentially expressed genes identified in this study may contribute to future research on postmenopausal osteoporosis blood biomarkers.

  3. Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates.

    PubMed

    Xia, Li C; Steele, Joshua A; Cram, Jacob A; Cardon, Zoe G; Simmons, Sheri L; Vallino, Joseph J; Fuhrman, Jed A; Sun, Fengzhu

    2011-01-01

    The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval. We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed the capability of eLSA to capture subinterval and time-delayed associations. We implemented the eLSA technique into an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction steps. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified. The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond that of ordinary correlation analysis. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software efficiently streamlines the analysis and is freely available from the eLSA homepage, which can be accessed at http://meta.usc.edu/softs/lsa.
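
    The heart of (e)LSA is a local-alignment-style dynamic programme over two normalised series, allowing a bounded time delay. Below is a bare-bones version of that score in Python, without the replicate handling or permutation p-values that eLSA adds; the series are simulated.

```python
# Bare-bones local similarity score: the highest locally accumulated positive
# co-variation between two normalised series, allowing a bounded time delay.
import numpy as np

def local_similarity(x, y, max_delay=3):
    """Scan each candidate delay; along a diagonal the usual LSA recursion
    reduces to s = max(0, s + x[i]*y[i+d]), resetting when the sum turns
    negative, exactly as in local sequence alignment."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    best = 0.0
    for d in range(-max_delay, max_delay + 1):   # candidate delays
        s = 0.0
        for i in range(n):
            j = i + d
            if 0 <= j < n:
                s = max(0.0, s + x[i] * y[j])
                best = max(best, s)
    return best / n

rng = np.random.default_rng(2)
a = rng.standard_normal(100)
b = np.roll(a, 2) + 0.5 * rng.standard_normal(100)  # b lags a by 2 steps
print(f"LS score: {local_similarity(a, b):.3f}")
```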

  4. Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates

    PubMed Central

    2011-01-01

    Background The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval. Results We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed the capability of eLSA to capture subinterval and time-delayed associations. We implemented the eLSA technique into an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction steps. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified. Conclusions The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond that of ordinary correlation analysis. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software efficiently streamlines the analysis and is freely available from the eLSA homepage, which can be accessed at http://meta.usc.edu/softs/lsa. PMID:22784572

  5. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo

    2018-06-01

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample, due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method to yield accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 publicly available MS/MS data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus, and often the species, level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microbes. Our updated analysis workflow is implemented in MiCId, a freely available software package for Microorganism Classification and Identification, which can be downloaded at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  6. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample, due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method to yield accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 publicly available MS/MS data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus, and often the species, level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microbes. Our updated analysis workflow is implemented in MiCId, a freely available software package for Microorganism Classification and Identification, which can be downloaded at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  7. Exploratory Visual Analysis of Statistical Results from Microarray Experiments Comparing High and Low Grade Glioma

    PubMed Central

    Reif, David M.; Israel, Mark A.; Moore, Jason H.

    2007-01-01

    The biological interpretation of gene expression microarray results is a daunting challenge. For complex diseases such as cancer, wherein the body of published research is extensive, the incorporation of expert knowledge provides a useful analytical framework. We have previously developed the Exploratory Visual Analysis (EVA) software for exploring data analysis results in the context of annotation information about each gene, as well as biologically relevant groups of genes. We present EVA as a flexible combination of statistics and biological annotation that provides a straightforward visual interface for the interpretation of microarray analyses of gene expression in the most commonly occurring class of brain tumors, glioma. We demonstrate the utility of EVA for the biological interpretation of statistical results by analyzing publicly available gene expression profiles of two important glial tumors. The results of a statistical comparison between 21 malignant, high-grade glioblastoma multiforme (GBM) tumors and 19 indolent, low-grade pilocytic astrocytomas were analyzed using EVA. By using EVA to examine the results of a relatively simple statistical analysis, we were able to identify tumor class-specific gene expression patterns having both statistical and biological significance. Our interactive analysis highlighted the potential importance of genes involved in cell cycle progression, proliferation, signaling, adhesion, migration, motility, and structure, as well as candidate gene loci on a region of Chromosome 7 that has been implicated in glioma. Because EVA does not require statistical or computational expertise and has the flexibility to accommodate any type of statistical analysis, we anticipate EVA will prove a useful addition to the repertoire of computational methods used for microarray data analysis. EVA is available at no charge to academic users and can be found at http://www.epistasis.org. PMID:19390666

  8. The Use of Citation Counting to Identify Research Trends

    ERIC Educational Resources Information Center

    Rothman, Harry; Woodhead, Michael

    1971-01-01

    The analysis and application of manpower statistics to identify some long-term international research trends in economic entomology and pest conrol are described. Movements in research interests, particularly towards biological methods of control, correlations between these sectors, and the difficulties encountered in the construction of a…

  9. Topical tranexamic acid in total knee replacement: a systematic review and meta-analysis.

    PubMed

    Panteli, Michalis; Papakostidis, Costas; Dahabreh, Ziad; Giannoudis, Peter V

    2013-10-01

    To examine the safety and efficacy of the topical use of tranexamic acid (TA) in total knee arthroplasty (TKA). An electronic literature search of PubMed Medline, Ovid Medline, Embase, and the Cochrane Library was performed, identifying studies published in any language from 1966 to February 2013. The included studies enrolled adults undergoing a primary TKA in which topical TA was used. The inverse variance statistical method and either a fixed- or random-effect model, depending on the absence or presence of statistical heterogeneity, were used; subgroup analysis was performed where possible. We identified a total of seven eligible reports for analysis. Our meta-analysis indicated that, compared with the control group, topical application of TA significantly limited postoperative drain output (mean difference = -268.36 ml), total blood loss (mean difference = -220.08 ml) and Hb drop (mean difference = -0.94 g/dL), and lowered the risk of transfusion (risk ratio = 0.47, 95% CI = 0.26-0.84), without an increased risk of thromboembolic events. Subgroup analysis indicated that a higher dose of topical TA (>2 g) significantly reduced transfusion requirements. Although the present meta-analysis demonstrated a statistically significant reduction in postoperative blood loss and transfusion requirements with the topical use of TA in TKA, the clinical importance of the respective estimates of effect size should be interpreted with caution. Level of evidence: I, II. Copyright © 2013 Elsevier B.V. All rights reserved.
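
    The inverse-variance, fixed-effect pooling named above takes only a few lines. The per-study mean differences and standard errors below are invented to show the arithmetic, not the review's actual data.

```python
# Fixed-effect, inverse-variance pooling of per-study mean differences.
# Study values are invented for illustration.
import numpy as np

md = np.array([-300.0, -250.0, -220.0, -180.0])  # per-study mean differences (ml)
se = np.array([40.0, 55.0, 35.0, 60.0])          # their standard errors

w = 1.0 / se**2                                   # inverse-variance weights
pooled = np.sum(w * md) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(f"pooled MD = {pooled:.1f} ml, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```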

  10. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

    PubMed

    Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

    2013-11-15

    Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic, derived using gPCA, to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in the two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu
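
    A rough sketch of the gPCA idea: guide the decomposition by the batch design, compare the variance captured by the guided direction with that of ordinary PC1, and assess significance by permuting batch labels. This simplifies the published statistic and is not the gPCA R package's exact implementation; all data are simulated.

```python
# Simplified guided-PCA test for a batch effect (illustrative only, not the
# gPCA package's exact statistic). All data are simulated.
import numpy as np

def gpca_delta(X, batch):
    """Variance along the batch-guided direction over variance along PC1."""
    Xc = X - X.mean(axis=0)
    Y = np.eye(batch.max() + 1)[batch]                        # one-hot batch design
    _, _, Vg = np.linalg.svd(Y.T @ Xc, full_matrices=False)   # guided direction
    _, _, Vp = np.linalg.svd(Xc, full_matrices=False)         # ordinary PCA
    return np.var(Xc @ Vg[0]) / np.var(Xc @ Vp[0])

rng = np.random.default_rng(9)
X = rng.standard_normal((60, 500))                 # 60 samples, 500 probes
batch = np.repeat([0, 1, 2], 20)
X[batch == 1] += 0.8                               # inject a batch effect

obs = gpca_delta(X, batch)
null = [gpca_delta(X, rng.permutation(batch)) for _ in range(200)]
p = (np.sum(np.array(null) >= obs) + 1) / 201      # permutation p-value
print(f"delta = {obs:.3f}, permutation p = {p:.3f}")
```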

  11. Statistical quality control through overall vibration analysis

    NASA Astrophysics Data System (ADS)

    Carnero, M. a. Carmen; González-Palma, Rafael; Almorza, David; Mayorga, Pedro; López-Escobar, Carlos

    2010-05-01

    The present study introduces the concept of statistical quality control in automotive wheel bearing manufacturing processes. Defects in the products under analysis can have a direct influence on passengers' safety and comfort. At present, the use of vibration analysis on machine tools for quality control purposes is not very extensive in manufacturing facilities. Noise and vibration are common quality problems in bearings. These failure modes likely occur under certain operating conditions and do not require high vibration amplitudes, but relate to certain vibration frequencies. The vibration frequencies are affected by the type of surface problem (chattering) on the ball races generated by the grinding processes. The purpose of this paper is to identify the grinding process variables that affect the quality of bearings, using statistical principles in the field of machine tools. In addition, the quality results of the finished parts under different combinations of process variables are evaluated. This paper intends to establish the foundations for predicting the quality of the products through the analysis of self-induced vibrations during contact between the grinding wheel and the parts. To achieve this goal, the overall self-induced vibration readings under different combinations of process variables are analysed using statistical tools. The analysis of data and design of experiments follow a classical approach, considering all potential interactions between variables. The analysis of data is conducted through analysis of variance (ANOVA) for data sets that meet normality and homoscedasticity criteria. This paper utilizes different statistical tools to support the conclusions, such as the chi-squared, Shapiro-Wilk, symmetry, kurtosis, Cochran, Bartlett, Hartley and Kruskal-Wallis tests. The analysis presented is the starting point for extending the use of predictive techniques (vibration analysis) to quality control. This paper demonstrates the existence of predictive variables (high-frequency vibration displacements) that are sensitive to the process setup and the quality of the products obtained. Based on the results of this overall vibration analysis, a second paper will analyse self-induced vibration spectra in order to define limit vibration bands, controllable every cycle or connected to permanent vibration-monitoring systems able to adjust the sensitive process variables identified by ANOVA once the vibration readings exceed established quality limits.

  12. Spatial statistical analysis of tree deaths using airborne digital imagery

    NASA Astrophysics Data System (ADS)

    Chang, Ya-Mei; Baddeley, Adrian; Wallace, Jeremy; Canci, Michael

    2013-04-01

    High-resolution digital airborne imagery offers unprecedented opportunities for the observation and monitoring of vegetation, providing the potential to identify, locate and track individual vegetation objects over time. Analytical tools are required to quantify the relevant information. In this paper, the locations of trees over a large area of native woodland vegetation were identified using morphological image analysis techniques. Methods of spatial point process statistics were then applied to estimate the spatially varying tree death risk, and to show that it is significantly non-uniform. [Tree deaths over the area were detected in our previous work (Wallace et al., 2008).] The study area is a major source of groundwater for the city of Perth, and the work was motivated by the need to understand and quantify vegetation changes in the context of water extraction and a drying climate. The influence of hydrological variables on tree death risk was investigated using spatial statistics (graphical exploratory methods, spatial point pattern modelling and diagnostics).

  13. Microscopic saw mark analysis: an empirical approach.

    PubMed

    Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles

    2015-01-01

    Microscopic saw mark analysis is a well-published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The present study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigating observer variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighted the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
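
    A small-scale sketch of that setup in scikit-learn: a random forest trained on per-mark feature vectors for four saw classes, with cross-validated error in the spirit of the error rates quoted above. The feature values and class separations are simulated placeholders.

```python
# Random forest classification of saw marks by saw type, with cross-validated
# error. The feature vectors (e.g. kerf width, tooth hop) are simulated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_per_saw, n_features = 15, 6
X = np.vstack([rng.normal(mu, 1.0, size=(n_per_saw, n_features))
               for mu in (0.0, 0.8, 1.6, 2.4)])       # four saw types
y = np.repeat(["saw_A", "saw_B", "saw_C", "saw_D"], n_per_saw)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f} "
      f"(error rate {(1 - scores.mean()):.2%})")
```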

  14. A Mokken scale analysis of the peer physical examination questionnaire.

    PubMed

    Vaughan, Brett; Grace, Sandra

    2018-01-01

    Peer physical examination (PPE) is a teaching and learning strategy utilised in most health profession education programs. Perceptions of participating in PPE have been described in the literature, focusing on the areas of the body students are willing, or unwilling, to examine. A small number of questionnaires exist to evaluate these perceptions; however, none has described measurement properties that would allow it to be used longitudinally. The present study undertook a Mokken scale analysis of the Peer Physical Examination Questionnaire (PPEQ) to evaluate its dimensionality and structure when used with Australian osteopathy students. Students enrolled in Year 1 of the osteopathy programs at Victoria University (Melbourne, Australia) and Southern Cross University (Lismore, Australia) were invited to complete the PPEQ prior to their first practical skills examination class. R, an open-source statistics program, was used to generate the descriptive statistics and perform the Mokken scale analysis. Mokken scale analysis is a non-parametric item response theory approach used to cluster items measuring a latent construct. Initial analysis suggested the PPEQ does not form a single scale. Further analysis identified three subscales: 'comfort', 'concern', and 'professionalism and education'. The properties of each subscale suggested they were unidimensional with variable internal structures. The 'comfort' subscale was the strongest of the three. All subscales demonstrated acceptable reliability estimation statistics (McDonald's omega > 0.75), supporting the calculation of a sum score for each subscale. The subscales identified are consistent with the literature. The 'comfort' subscale may be useful for longitudinal evaluation of student perceptions of PPE. Further research is required to evaluate changes in PPE perceptions and the utility of the questionnaire in other health profession education programs.

  15. Spatial variation of volcanic rock geochemistry in the Virunga Volcanic Province: Statistical analysis of an integrated database

    NASA Astrophysics Data System (ADS)

    Barette, Florian; Poppe, Sam; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu

    2017-10-01

    We present an integrated, spatially-explicit database of existing geochemical major-element analyses available from (post-)colonial scientific reports, PhD theses and international publications for the Virunga Volcanic Province, located in the western branch of the East African Rift System. This volcanic province is characterised by alkaline volcanism, including silica-undersaturated, alkaline and potassic lavas. The database contains a total of 908 geochemical analyses of eruptive rocks for the entire volcanic province, with location information for most samples. A preliminary analysis of the overall consistency of the database, comparing sets of geochemical analyses with contrasting analytical methods or dates, demonstrates that the database is consistent. We applied a principal component analysis and cluster analysis to the whole-rock major-element compositions included in the database to study the spatial variation of the chemical composition of eruptive products in the Virunga Volcanic Province. These statistical analyses identify spatially distributed clusters of eruptive products. The known geochemical contrasts are highlighted by the spatial analysis, such as the unique geochemical signature of Nyiragongo lavas compared to other Virunga lavas, the geochemical heterogeneity of the Bulengo area, and the trachyte flows of Karisimbi volcano. Most importantly, we identified separate clusters of eruptive products which originate from primitive magmatic sources. These lavas of primitive composition are preferentially located along NE-SW inherited rift structures, often at a distance from the central Virunga volcanoes. Our results illustrate the relevance of a spatial analysis of integrated geochemical data for a volcanic province, as a complement to classical petrological investigations. This approach helps to characterise geochemical variations within a complex of magmatic systems and to identify specific petrologic and geochemical investigations that should be tackled within a study area.
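
    A minimal sketch of this PCA-plus-cluster-analysis workflow on simulated whole-rock major-element data follows; the oxide list, compositional means and the choice of four clusters are illustrative assumptions, not values taken from the Virunga database.

    ```python
    # Minimal sketch: PCA followed by k-means clustering of major-element data.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    # Hypothetical wt% values for SiO2, TiO2, Al2O3, MgO, K2O in 908 samples
    X = rng.normal(loc=[45, 3, 14, 8, 4], scale=[4, 1, 2, 3, 2], size=(908, 5))

    Z = StandardScaler().fit_transform(X)     # put oxides on a common scale
    pca = PCA(n_components=2)
    scores = pca.fit_transform(Z)
    print("Variance explained:", pca.explained_variance_ratio_)

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
    # Mapping each sample's cluster label back to its coordinates would then
    # reveal whether the geochemical groups are spatially coherent.
    print("Cluster sizes:", np.bincount(labels))
    ```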

  16. Post-operative diffusion weighted imaging as a predictor of posterior fossa syndrome permanence in paediatric medulloblastoma.

    PubMed

    Chua, Felicia H Z; Thien, Ady; Ng, Lee Ping; Seow, Wan Tew; Low, David C Y; Chang, Kenneth T E; Lian, Derrick W Q; Loh, Eva; Low, Sharon Y Y

    2017-03-01

    Posterior fossa syndrome (PFS) is a serious complication faced by neurosurgeons and their patients, especially paediatric medulloblastoma patients. The uncertain aetiology of PFS, the myriad of cited risk factors and the therapeutic challenges make this phenomenon an elusive entity. The primary objective of this study was to identify factors associated with the development of PFS in paediatric medulloblastoma patients after tumour resection. This is a retrospective study based at a single institution. Patient data and all related information were collected from the hospital records, in accordance with a list of possible risk factors associated with PFS. These included pre-operative tumour volume, hydrocephalus, age, gender, extent of resection, metastasis, ventriculoperitoneal shunt insertion, post-operative meningitis and radiological changes on MRI. Additional variables included the molecular and histological subtypes of each patient's medulloblastoma tumour. Statistical analysis was employed to determine each variable's significance for PFS permanence. A total of 19 patients with appropriately complete data were identified. Initial univariate analysis did not show any statistical significance. However, multivariate analysis of MRI-specific changes showed that bilateral DWI restricted-diffusion changes, involving both the right and left sides of the surgical cavity, were statistically significant for PFS permanence. The authors performed a clinical study that evaluated possible risk factors for permanent PFS in paediatric medulloblastoma patients. Analysis of the collated results found that post-operative DWI restriction in bilateral regions within the surgical cavity was a statistically significant predictor of PFS permanence, a novel finding in the current literature.

  17. Configural Frequency Analysis as a Statistical Tool for Developmental Research.

    ERIC Educational Resources Information Center

    Lienert, Gustav A.; Oeveste, Hans Zur

    1985-01-01

    Configural frequency analysis (CFA) is suggested as a technique for longitudinal research in developmental psychology. Stability and change in answers to multiple choice and yes-no item patterns obtained with repeated measurements are identified by CFA and illustrated by developmental analysis of an item from Gorham's Proverb Test. (Author/DWH)
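
    To make the CFA logic concrete, the sketch below tests each response configuration of two simulated yes-no items against its expected frequency under independence, flagging 'types' (observed > expected) and 'antitypes' (observed < expected) at a Bonferroni-adjusted alpha; the data are simulated, not items from Gorham's Proverb Test.

    ```python
    # Minimal sketch: configural frequency analysis for two dichotomous items.
    import numpy as np
    from itertools import product
    from scipy.stats import binomtest

    rng = np.random.default_rng(3)
    data = rng.integers(0, 2, size=(300, 2))   # hypothetical yes/no responses
    n = len(data)
    margins = data.mean(axis=0)                # marginal endorsement rates

    configs = list(product([0, 1], repeat=2))
    alpha = 0.05 / len(configs)                # Bonferroni over configurations
    for cfg in configs:
        obs = int(np.sum(np.all(data == cfg, axis=1)))
        p_exp = np.prod([m if c else 1 - m for m, c in zip(margins, cfg)])
        test = binomtest(obs, n, p_exp)
        label = "type" if obs > n * p_exp else "antitype"
        flag = label if test.pvalue < alpha else "-"
        print(cfg, "observed:", obs, "expected:", round(n * p_exp, 1), flag)
    ```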

  18. AIDS Education for Tanzanian Youth: A Mediation Analysis

    ERIC Educational Resources Information Center

    Stigler, Melissa H.; Kugler, K. C.; Komro, K. A.; Leshabari, M. T.; Klepp, K. I.

    2006-01-01

    Mediation analysis is a statistical technique that can be used to identify mechanisms by which intervention programs achieve their effects. This paper presents the results of a mediation analysis of Ngao, an acquired immunodeficiency syndrome (AIDS) education program that was implemented with school children in Grades 6 and 7 in Tanzania in the…
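
    For readers unfamiliar with the mechanics, a minimal sketch of product-of-coefficients mediation with a bootstrap confidence interval follows; the variable names (program, knowledge, attitude) are hypothetical stand-ins, not the Ngao trial's measured mediators or outcomes.

    ```python
    # Minimal sketch: mediation via the product of coefficients, with bootstrap CI.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    program = rng.integers(0, 2, size=n)             # intervention indicator
    knowledge = 0.5 * program + rng.normal(size=n)   # hypothetical mediator
    attitude = 0.4 * knowledge + rng.normal(size=n)  # hypothetical outcome

    def indirect(idx):
        x, m, y = program[idx], knowledge[idx], attitude[idx]
        a = sm.OLS(m, sm.add_constant(x)).fit().params[1]    # X -> M path
        b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit().params[2]  # M -> Y given X
        return a * b

    idx = np.arange(n)
    boots = [indirect(rng.choice(idx, size=n)) for _ in range(2000)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"indirect effect {indirect(idx):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
    ```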

  19. Patterns of Puffery: An Analysis of Non-Fiction Blurbs

    ERIC Educational Resources Information Center

    Cronin, Blaise; La Barre, Kathryn

    2005-01-01

    The blurb is a paratextual element which has not previously been subjected to systematic analysis. We describe the nature and purpose of this publishing epiphenomenon, highlight some of the related marketing issues and ethical concerns and provide a statistical analysis of almost 2000 blurbs identified in a sample of 450 non-fiction books.…

  20. Statistical competencies for medical research learners: What is fundamental?

    PubMed

    Enders, Felicity T; Lindsell, Christopher J; Welty, Leah J; Benn, Emma K T; Perkins, Susan M; Mayo, Matthew S; Rahbar, Mohammad H; Kidwell, Kelley M; Thurston, Sally W; Spratt, Heidi; Grambow, Steven C; Larson, Joseph; Carter, Rickey E; Pollock, Brad H; Oster, Robert A

    2017-06-01

    It is increasingly essential for medical researchers to be literate in statistics, but the requisite degree of literacy is not the same for every statistical competency in translational research. Statistical competency can range from 'fundamental' (necessary for all) to 'specialized' (necessary for only some). In this study, we determine the degree to which each competency is fundamental or specialized. We surveyed members of 4 professional organizations, targeting doctorally trained biostatisticians and epidemiologists who taught statistics to medical research learners in the past 5 years. Respondents rated 24 educational competencies on a 5-point Likert scale anchored by 'fundamental' and 'specialized.' There were 112 responses. Nineteen of 24 competencies were rated fundamental. The competencies considered most fundamental were assessing sources of bias and variation (95%), recognizing one's own limits with regard to statistics (93%), and identifying the strengths and limitations of study designs (93%). The least endorsed items were meta-analysis (34%) and stopping rules (18%). We have identified the statistical competencies needed by all medical researchers. These competencies should be considered when designing statistical curricula for medical researchers and should inform which topics are taught in graduate programs and evidence-based medicine courses where learners need to read and understand the medical research literature.

  1. Statistical analysis of long-term monitoring data for persistent organic pollutants in the atmosphere at 20 monitoring stations broadly indicates declining concentrations.

    PubMed

    Kong, Deguo; MacLeod, Matthew; Hung, Hayley; Cousins, Ian T

    2014-11-04

    During recent decades concentrations of persistent organic pollutants (POPs) in the atmosphere have been monitored at multiple stations worldwide. We used three statistical methods to analyze a total of 748 time series of selected POPs in the atmosphere to determine if there are statistically significant reductions in levels of POPs that have had control actions enacted to restrict or eliminate manufacture, use and emissions. Significant decreasing trends were identified in 560 (75%) of the 748 time series collected from the Arctic, North America, and Europe, indicating that the atmospheric concentrations of these POPs are generally decreasing, consistent with the overall effectiveness of emission control actions. Statistically significant trends in synthetic time series could be reliably identified with the improved Mann-Kendall (iMK) test and the digital filtration (DF) technique in time series longer than 5 years. The temporal trends of new (or emerging) POPs in the atmosphere are often unclear because time series are too short. A statistical detrending method based on the iMK test was not able to identify abrupt changes in the rates of decline of atmospheric POP concentrations encoded into synthetic time series.
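
    For reference, the sketch below implements the classical Mann-Kendall test (normal approximation, no tie or autocorrelation correction) on a simulated declining series; the published analyses used an improved variant (iMK) alongside digital filtration, so this is only the core of the method.

    ```python
    # Minimal sketch: classical Mann-Kendall trend test on a synthetic series.
    import numpy as np
    from scipy.stats import norm

    def mann_kendall(x):
        n = len(x)
        s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18.0     # variance assuming no ties
        z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
        p = 2 * (1 - norm.cdf(abs(z)))               # two-sided p-value
        return s, z, p

    rng = np.random.default_rng(5)
    t = np.arange(120)                               # e.g., 10 years of monthly data
    conc = np.exp(-0.01 * t) + 0.1 * rng.normal(size=120)   # declining POP series
    s, z, p = mann_kendall(conc)
    trend = "significant decline" if (p < 0.05 and s < 0) else "no significant trend"
    print(f"S={s:.0f}, Z={z:.2f}, p={p:.3g} -> {trend}")
    ```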

  2. Statistical analysis for understanding and predicting battery degradations in real-life electric vehicle use

    NASA Astrophysics Data System (ADS)

    Barré, Anthony; Suard, Frédéric; Gérard, Mathias; Montaru, Maxime; Riu, Delphine

    2014-01-01

    This paper describes the statistical analysis of data parameters recorded during electric vehicle use to characterise battery ageing. These data permit traditional battery ageing investigation based on the evolution of capacity fade and resistance rise. The measured variables were examined in order to explain the correlation between battery ageing and operating conditions during the experiments, enabling the main ageing factors to be identified. Detailed statistical dependency analyses then identify the factors responsible for battery ageing phenomena, and predictive battery ageing models are built from this approach. The results demonstrate and quantify the relationship between the measured variables and global observations of battery ageing, and also allow accurate battery ageing diagnosis through the predictive models.

  3. The Essential Genome of Escherichia coli K-12

    PubMed Central

    2018-01-01

    Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. PMID:29463657

  4. Applications of Remote Sensing and GIS(Geographic Information System) in Crime Analysis of Gujranwala City.

    NASA Astrophysics Data System (ADS)

    Munawar, Iqra

    2016-07-01

    Crime mapping is a dynamic process. It can be used to assist all stages of the problem-solving process. Mapping crime can help police protect citizens more effectively. The decision to utilize a certain type of map or design element may change based on the purpose of a map, the audience or the available data. If the purpose of the crime analysis map is to assist in the identification of a particular problem, selected data may be mapped to identify patterns of activity that have previously gone undetected. The main objective of this research was to study the spatial distribution patterns of four common crimes, i.e. narcotics, arms, burglary and robbery, in Gujranwala City using spatial statistical techniques to identify the hotspots. Hotspots, or locations of clusters, were identified using the Getis-Ord Gi* statistic. Crime analysis mapping can be used to conduct a comprehensive spatial analysis of the problem. Graphic presentations of such findings provide a powerful medium to communicate conditions, patterns and trends, thus creating an avenue for analysts to bring about significant policy changes. Crime mapping can also help reduce crime rates.
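
    A minimal sketch of the Getis-Ord Gi* statistic on simulated incident counts follows, using a binary distance-band weights matrix that includes the focal location itself (the 'star' variant); the coordinates, counts and band distance are all hypothetical.

    ```python
    # Minimal sketch: Getis-Ord Gi* hotspot detection on simulated crime counts.
    import numpy as np

    def gi_star(values, coords, band):
        n = len(values)
        xbar, s = values.mean(), values.std(ddof=0)
        z = np.empty(n)
        for i in range(n):
            d = np.linalg.norm(coords - coords[i], axis=1)
            w = (d <= band).astype(float)       # binary weights, self included
            sw, sw2 = w.sum(), (w ** 2).sum()
            num = w @ values - xbar * sw
            den = s * np.sqrt((n * sw2 - sw ** 2) / (n - 1))
            z[i] = num / den                    # approx. standard normal score
        return z

    rng = np.random.default_rng(6)
    coords = rng.uniform(0, 10, size=(200, 2))  # hypothetical aggregation points
    counts = rng.poisson(3, size=200).astype(float)
    counts[coords[:, 0] > 8] += 6               # implant a hotspot in the east
    z = gi_star(counts, coords, band=1.5)
    print("hotspot locations (z > 1.96):", int((z > 1.96).sum()))
    ```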

  5. Using assemblage data in ecological indicators: A comparison and evaluation of commonly available statistical tools

    USGS Publications Warehouse

    Smith, Joseph M.; Mather, Martha E.

    2012-01-01

    Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistical groupings reflect mathematical relationships, they may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, and then interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.

  6. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data

    PubMed Central

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar

    2012-01-01

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating the mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumptions of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762
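
    The core of such a workflow, run in parallel across all features with a multiple-testing correction, can be sketched as follows; the feature matrix is simulated, and Welch's t-test is used so that homoscedasticity need not be assumed.

    ```python
    # Minimal sketch: parallel univariate tests with Benjamini-Hochberg FDR control.
    import numpy as np
    from scipy.stats import ttest_ind
    from statsmodels.stats.multitest import multipletests

    rng = np.random.default_rng(7)
    n_features = 2000
    case = rng.normal(size=(20, n_features))
    ctrl = rng.normal(size=(20, n_features))
    case[:, :50] += 1.0                        # 50 genuinely altered features

    t, p = ttest_ind(case, ctrl, axis=0, equal_var=False)   # Welch's t-test
    reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
    print("features passing 5% FDR:", int(reject.sum()))    # mostly the 50 altered
    ```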

  7. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.

    PubMed

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar

    2012-10-18

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating the mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumptions of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.

  8. A large-scale perspective on stress-induced alterations in resting-state networks

    NASA Astrophysics Data System (ADS)

    Maron-Katz, Adi; Vaisvaser, Sharon; Lin, Tamar; Hendler, Talma; Shamir, Ron

    2016-02-01

    Stress is known to induce large-scale neural modulations. However, its neural effect once the stressor is removed, and how it relates to subjective experience, are not fully understood. Here we used a statistically sound data-driven approach to investigate alterations in large-scale resting-state functional connectivity (rsFC) induced by acute social stress. We compared rsfMRI profiles of 57 healthy male subjects before and after stress induction. Using a parcellation-based univariate statistical analysis, we identified a large-scale rsFC change involving 490 parcel-pairs. Aiming to characterize this change, we employed statistical enrichment analysis, identifying anatomic structures that were significantly interconnected by these pairs. This analysis revealed strengthening of thalamo-cortical connectivity and weakening of cross-hemispheral parieto-temporal connectivity. These alterations were further found to be associated with change in subjective stress reports. Integrating report-based information on stress sustainment 20 minutes post-induction revealed a single significant rsFC change between the right amygdala and the precuneus, which inversely correlated with the level of subjective recovery. Our study demonstrates the value of enrichment analysis for exploring large-scale network reorganization patterns, and provides new insight into stress-induced neural modulations and their relation to subjective experience.

  9. Analysis and meta-analysis of single-case designs with a standardized mean difference statistic: a primer and applications.

    PubMed

    Shadish, William R; Hedges, Larry V; Pustejovsky, James E

    2014-04-01

    This article presents a d-statistic for single-case designs that is in the same metric as the d-statistic used in between-subjects designs such as randomized experiments and offers some reasons why such a statistic would be useful in SCD research. The d has a formal statistical development, is accompanied by appropriate power analyses, and can be estimated using user-friendly SPSS macros. We discuss both advantages and disadvantages of d compared to other approaches such as previous d-statistics, overlap statistics, and multilevel modeling. It requires at least three cases for computation and assumes normally distributed outcomes and stationarity, assumptions that are discussed in some detail. We also show how to test these assumptions. The core of the article then demonstrates in depth how to compute d for one study, including estimation of the autocorrelation and the ratio of between case variance to total variance (between case plus within case variance), how to compute power using a macro, and how to use the d to conduct a meta-analysis of studies using single-case designs in the free program R, including syntax in an appendix. This syntax includes how to read data, compute fixed and random effect average effect sizes, prepare a forest plot and a cumulative meta-analysis, estimate various influence statistics to identify studies contributing to heterogeneity and effect size, and do various kinds of publication bias analyses. This d may prove useful for both the analysis and meta-analysis of data from SCDs. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
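
    The meta-analytic step can be illustrated with a DerSimonian-Laird random-effects pooling of study-level d statistics; the effect sizes and variances below are made up, and the article itself demonstrates the equivalent workflow in R.

    ```python
    # Minimal sketch: DerSimonian-Laird random-effects meta-analysis of d values.
    import numpy as np

    d = np.array([0.8, 0.5, 1.1, 0.3, 0.9])       # hypothetical per-study d values
    v = np.array([0.10, 0.08, 0.15, 0.05, 0.12])  # their sampling variances

    w = 1 / v                                      # fixed-effect weights
    d_fe = np.sum(w * d) / np.sum(w)
    Q = np.sum(w * (d - d_fe) ** 2)                # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(d) - 1)) / c)        # between-study variance

    w_re = 1 / (v + tau2)                          # random-effects weights
    d_re = np.sum(w_re * d) / np.sum(w_re)
    se = np.sqrt(1 / np.sum(w_re))
    print(f"tau^2={tau2:.3f}, pooled d={d_re:.2f} "
          f"(95% CI {d_re - 1.96 * se:.2f} to {d_re + 1.96 * se:.2f})")
    ```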

  10. The Essential Genome of Escherichia coli K-12.

    PubMed

    Goodall, Emily C A; Robinson, Ashley; Johnston, Iain G; Jabbari, Sara; Turner, Keith A; Cunningham, Adam F; Lund, Peter A; Cole, Jeffrey A; Henderson, Ian R

    2018-02-20

    Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. IMPORTANCE: Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in Escherichia coli, we constructed a transposon mutant library of unprecedented density. Initial automated analysis of the resulting data revealed many discrepancies compared to the literature. We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone, and provides a new standard for the analysis of TraDIS data. Copyright © 2018 Goodall et al.

  11. Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach.

    PubMed

    Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem

    2013-01-01

    This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. A cross-sectional questionnaire-based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test), was applied. The 143 participants performed better on the soft skills front, with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study revealed a need for formal hands-on training, with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation-specific customizations, to identify the training needs of any category of healthcare staff.
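
    As an illustration of the multivariate step, the sketch below runs a MANOVA (reporting Pillai's trace, among other statistics) on simulated soft-skill and technical-skill scores grouped by a hypothetical duty post; it is not the study's data or full model.

    ```python
    # Minimal sketch: MANOVA with Pillai's trace via statsmodels.
    import numpy as np
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    rng = np.random.default_rng(8)
    df = pd.DataFrame({
        "soft": rng.normal(6.4, 1.4, 143),     # simulated soft-skill scores
        "tech": rng.normal(5.1, 1.4, 143),     # simulated technical-skill scores
        "post": rng.choice(["gate", "ward", "parking"], 143),  # hypothetical factor
    })
    df.loc[df.post == "gate", "tech"] += 1.0   # implant a group difference

    mv = MANOVA.from_formula("soft + tech ~ post", data=df)
    print(mv.mv_test())                        # includes Pillai's trace
    ```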

  12. Identifying biologically relevant differences between metagenomic communities.

    PubMed

    Parks, Donovan H; Beiko, Robert G

    2010-03-15

    Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using the statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP. Contact: beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online.
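
    The kind of per-category comparison STAMP reports (a p-value together with an effect size) can be sketched as follows: Fisher's exact test on the count table plus the difference between proportions with a normal-approximation confidence interval; the counts are invented.

    ```python
    # Minimal sketch: two-sample comparison of one functional category.
    import numpy as np
    from scipy.stats import fisher_exact

    hit1, tot1 = 120, 10000     # reads assigned to a subsystem / total, sample 1
    hit2, tot2 = 60, 9000       # same for sample 2 (hypothetical counts)

    table = [[hit1, tot1 - hit1], [hit2, tot2 - hit2]]
    odds, p = fisher_exact(table)

    p1, p2 = hit1 / tot1, hit2 / tot2
    diff = p1 - p2
    se = np.sqrt(p1 * (1 - p1) / tot1 + p2 * (1 - p2) / tot2)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    print(f"p={p:.2e}; difference={100 * diff:.2f} pp "
          f"(95% CI {100 * lo:.2f} to {100 * hi:.2f})")
    ```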

  13. Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach

    PubMed Central

    Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem

    2013-01-01

    Objectives: This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. Methodology: A cross-sectional questionnaire-based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test), was applied. Results: The 143 participants performed better on the soft skills front, with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study revealed a need for formal hands-on training, with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. Conclusion: This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation-specific customizations, to identify the training needs of any category of healthcare staff. PMID:23559904

  14. Rapid differentiation of Chinese hop varieties (Humulus lupulus) using volatile fingerprinting by HS-SPME-GC-MS combined with multivariate statistical analysis.

    PubMed

    Liu, Zechang; Wang, Liping; Liu, Yumei

    2018-01-18

    Hops impart flavor to beer, with the volatile components characterizing the various hop varieties and qualities. Fingerprinting, especially flavor fingerprinting, is often used to identify 'flavor products' because inconsistencies in the description of flavor may lead to an incorrect definition of beer quality. Compared to flavor fingerprinting, volatile fingerprinting is simpler and easier. We performed volatile fingerprinting using headspace solid-phase microextraction gas chromatography-mass spectrometry (HS-SPME-GC-MS) combined with similarity analysis and principal component analysis (PCA) for evaluating and distinguishing between three major Chinese hops. Eighty-four volatiles were identified, which were classified into seven categories. Volatile fingerprinting based on similarity analysis did not yield any obvious result. By contrast, hop varieties and qualities were identified using volatile fingerprinting based on PCA. The potential variables explained the variance in the three hop varieties. In addition, the dendrogram and principal component score plot described the differences and classifications of hops. Volatile fingerprinting plus multivariate statistical analysis can rapidly differentiate between the different varieties and qualities of the three major Chinese hops. Furthermore, this method can be used as a reference in other fields. © 2018 Society of Chemical Industry.

  15. The Relationship Between Procrastination, Learning Strategies and Statistics Anxiety Among Iranian College Students: A Canonical Correlation Analysis

    PubMed Central

    Vahedi, Shahrum; Farrokhi, Farahman; Gahramani, Farahnaz; Issazadegan, Ali

    2012-01-01

    Objective: Approximately 66-80% of graduate students experience statistics anxiety, and some researchers propose that many students identify statistics courses as the most anxiety-inducing courses in their academic curriculums. As such, it is likely that statistics anxiety is, in part, responsible for many students delaying enrollment in these courses for as long as possible. This paper proposes a canonical model treating academic procrastination (AP) and learning strategies (LS) as predictor variables and statistics anxiety (SA) as the explained variable. Methods: A questionnaire survey was used for data collection, and 246 female college students participated in this study. To examine the mutually independent relations between the procrastination, learning strategies and statistics anxiety variables, a canonical correlation analysis was computed. Results: Two canonical functions were statistically significant. The set of variables (metacognitive self-regulation, source management, preparing homework, preparing for tests and preparing term papers) helped predict changes in statistics anxiety with respect to fearful behavior, attitude towards math and class, and performance, but not anxiety. Conclusion: These findings could be used in educational and psychological interventions in the context of statistics anxiety reduction. PMID:24644468
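
    A minimal sketch of a canonical correlation analysis relating a predictor set to a criterion set follows; the simulated scores merely share one latent factor and are not the study's AP/LS/SA measures.

    ```python
    # Minimal sketch: canonical correlation analysis with scikit-learn.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(9)
    n = 246
    latent = rng.normal(size=(n, 1))
    X = latent + rng.normal(size=(n, 5))   # e.g., procrastination/strategy scores
    Y = latent + rng.normal(size=(n, 4))   # e.g., statistics-anxiety subscales

    cca = CCA(n_components=2)
    Xc, Yc = cca.fit_transform(X, Y)
    for k in range(2):
        r = np.corrcoef(Xc[:, k], Yc[:, k])[0, 1]
        print(f"canonical function {k + 1}: r = {r:.2f}")
    ```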

  16. Evaluating the utility of companion animal tick surveillance practices for monitoring spread and occurrence of human Lyme disease in West Virginia, 2014-2016.

    PubMed

    Hendricks, Brian; Mark-Carew, Miguella; Conley, Jamison

    2017-11-13

    Domestic dogs and cats are potentially effective sentinel populations for monitoring the occurrence and spread of Lyme disease. Few studies have evaluated the public health utility of sentinel programmes using geo-analytic approaches. Confirmed Lyme disease cases diagnosed by physicians and ticks submitted by veterinarians to the West Virginia State Health Department were obtained for 2014-2016. Ticks were identified to species, and only Ixodes scapularis were incorporated in the analysis. Separate ordinary least squares (OLS) and spatial lag regression models were fitted to estimate the association between average numbers of Ix. scapularis collected on pets and human Lyme disease incidence. Regression residuals were visualised using Local Moran's I as a diagnostic tool to identify spatial dependence. Statistically significant associations were identified between average numbers of Ix. scapularis collected from dogs and human Lyme disease in the OLS (β=20.7, P<0.001) and spatial lag (β=12.0, P=0.002) regressions. No significant associations were identified for cats in either regression model. Statistically significant (P≤0.05) spatial dependence was identified in all regression models. Local Moran's I maps produced for the spatial lag regression residuals indicated a decrease in model over- and under-estimation, but identified a higher number of statistically significant outliers than OLS regression. Results support previous conclusions that dogs are effective sentinel populations for monitoring the risk of human exposure to Lyme disease. Findings reinforce the utility of spatial analysis of surveillance data, and highlight West Virginia's unique position within the eastern United States with regard to Lyme disease occurrence.
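
    As a sketch of the residual diagnostic, the code below computes global Moran's I for regression residuals under a k-nearest-neighbour spatial weights matrix (the study mapped Local Moran's I, a related per-location decomposition); coordinates and residuals are simulated.

    ```python
    # Minimal sketch: global Moran's I on regression residuals.
    import numpy as np

    def morans_I(e, W):
        """e: residual vector; W: row-standardised spatial weights matrix."""
        n = len(e)
        e = e - e.mean()
        return (n / W.sum()) * (e @ W @ e) / (e @ e)

    rng = np.random.default_rng(10)
    coords = rng.uniform(0, 1, size=(55, 2))        # e.g., county centroids
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                     # no self-neighbours
    W = np.zeros_like(d)
    for i, nbrs in enumerate(np.argsort(d, axis=1)[:, :4]):
        W[i, nbrs] = 0.25                           # 4 nearest neighbours, rows sum to 1

    residuals = rng.normal(size=55)                 # stand-in for OLS residuals
    print("Moran's I:", round(float(morans_I(residuals, W)), 3))
    # Under no spatial autocorrelation the expected value is -1/(n-1);
    # a permutation test on the residuals provides a p-value.
    ```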

  17. Using Markov Chain Analyses in Counselor Education Research

    ERIC Educational Resources Information Center

    Duys, David K.; Headrick, Todd C.

    2004-01-01

    This study examined the efficacy of an infrequently used statistical analysis in counselor education research. A Markov chain analysis was used to examine hypothesized differences between students' use of counseling skills in an introductory course. Thirty graduate students participated in the study. Independent raters identified the microskills…
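
    The basic machinery of such an analysis is the estimation of a first-order transition matrix from a coded sequence of skills; the three-state coding below is a hypothetical simplification, not the study's microskill scheme.

    ```python
    # Minimal sketch: estimating a Markov transition matrix from coded skill use.
    import numpy as np

    states = ["question", "reflection", "summary"]       # hypothetical codes
    seq = [0, 0, 1, 2, 1, 1, 0, 2, 2, 1, 0, 1, 1, 2, 0, 0, 1]  # one coded session

    k = len(states)
    counts = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    P = counts / counts.sum(axis=1, keepdims=True)       # row-stochastic estimate
    print("Transition matrix:\n", np.round(P, 2))

    # Log-likelihood of the observed sequence under the fitted chain
    loglik = sum(np.log(P[a, b]) for a, b in zip(seq[:-1], seq[1:]))
    print("log-likelihood:", round(loglik, 2))
    ```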

  18. PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables.

    PubMed

    Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C; Downing, James R; Lamba, Jatinder

    2009-08-15

    In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org.
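
    The projection idea translates directly into code. Below is a minimal sketch: per-endpoint correlations are projected onto a hypothesised pattern vector and significance is assessed by permutation; the data, pattern and endpoint names are invented, and the authors' own implementation is the documented R routine.

    ```python
    # Minimal sketch: PROMISE-style projection statistic with permutation test.
    import numpy as np

    rng = np.random.default_rng(11)
    n = 100
    expr = rng.normal(size=n)                 # one genomic variable
    endpoints = rng.normal(size=(n, 3))       # e.g., response, toxicity, relapse
    endpoints[:, 0] += 0.5 * expr             # associated with higher response
    endpoints[:, 2] -= 0.4 * expr             # associated with lower relapse

    pattern = np.array([+1.0, 0.0, -1.0])     # biologically interesting pattern

    def promise_stat(x, Y):
        r = np.array([np.corrcoef(x, Y[:, j])[0, 1] for j in range(Y.shape[1])])
        return r @ pattern                    # projection onto the pattern

    obs = promise_stat(expr, endpoints)
    perm = np.array([promise_stat(rng.permutation(expr), endpoints)
                     for _ in range(2000)])
    p = np.mean(np.abs(perm) >= abs(obs))
    print(f"PROMISE statistic={obs:.3f}, permutation p={p:.4f}")
    ```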

  19. Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals.

    PubMed

    D'Addabbo, Annarita; Palmieri, Orazio; Maglietta, Rosalia; Latiano, Anna; Mukherjee, Sayan; Annese, Vito; Ancona, Nicola

    2011-08-01

    A meta-analysis re-analysing previous genome-wide association scans has definitively confirmed eleven genes and identified a further 21 new loci. However, the identified genes/loci still explain only a minority of the genetic predisposition to Crohn's disease. Our aim was to identify genes weakly involved in disease predisposition by analysing chromosomal regions enriched in single nucleotide polymorphisms with modest statistical association. We utilized the WTCCC data set, evaluating 1748 CD cases and 2938 controls. The identification of candidate genes/loci was performed by a two-step procedure: first, chromosomal regions enriched in weak association signals were localized; subsequently, weak signals clustered in gene regions were identified. Statistical significance was assessed by nonparametric permutation tests. The cytoband enrichment analysis highlighted 44 regions (P≤0.05) enriched in single nucleotide polymorphisms significantly associated with the trait, including 23 out of 31 previously confirmed and replicated genes. Importantly, we highlight a further 20 novel chromosomal regions carrying approximately one hundred genes/loci with modest association. Amongst these we find compelling functional candidate genes such as MAPT, GRB2, CREM, LCT, and IL12RB2. Our study suggests a different statistical perspective for discovering genes weakly associated with a given trait, although further confirmatory functional studies are needed. Copyright © 2011 Editrice Gastroenterologica Italiana S.r.l. All rights reserved.

  20. A Genome-Wide Association Analysis Reveals Epistatic Cancellation of Additive Genetic Variance for Root Length in Arabidopsis thaliana.

    PubMed

    Lachowiec, Jennifer; Shen, Xia; Queitsch, Christine; Carlborg, Örjan

    2015-01-01

    Efforts to identify loci underlying complex traits generally assume that most genetic variance is additive. Here, we examined the genetics of Arabidopsis thaliana root length and found that the genomic narrow-sense heritability for this trait in the examined population was statistically zero. The low amount of additive genetic variance that could be captured by the genome-wide genotypes likely explains why no associations to root length could be found using standard additive-model-based genome-wide association (GWA) approaches. However, as the broad-sense heritability for root length was significantly larger, and primarily due to epistasis, we also performed an epistatic GWA analysis to map loci contributing to the epistatic genetic variance. Four interacting pairs of loci were revealed, involving seven chromosomal loci that passed a standard multiple-testing corrected significance threshold. The genotype-phenotype maps for these pairs revealed epistasis that cancelled out the additive genetic variance, explaining why these loci were not detected in the additive GWA analysis. Small population sizes, such as in our experiment, increase the risk of identifying false epistatic interactions due to testing for associations with very large numbers of multi-marker genotypes in few phenotyped individuals. Therefore, we estimated the false-positive risk using a new statistical approach that suggested half of the associated pairs to be true positive associations. Our experimental evaluation of candidate genes within the seven associated loci suggests that this estimate is conservative; we identified functional candidate genes that affected root development in four loci that were part of three of the pairs. The statistical epistatic analyses were thus indispensable for confirming known, and identifying new, candidate genes for root length in this population of wild-collected A. thaliana accessions. We also illustrate how epistatic cancellation of the additive genetic variance explains the insignificant narrow-sense and significant broad-sense heritability by using a combination of careful statistical epistatic analyses and functional genetic experiments.

  1. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker combines prediction (detection prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and a mechanistic relationship to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, the Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  2. Evaluation of redundancy analysis to identify signatures of local adaptation.

    PubMed

    Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric

    2018-05-26

    Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful for detecting adaptive variation. This paper proposes a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that while RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage of identifying the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations with northern and continental populations on the northwestern American coast. This article is protected by copyright. All rights reserved.

  3. Statistical power analysis of cardiovascular safety pharmacology studies in conscious rats.

    PubMed

    Bhatt, Siddhartha; Li, Dingzhou; Flynn, Declan; Wisialowski, Todd; Hemkens, Michelle; Steidl-Nichols, Jill

    2016-01-01

    Cardiovascular (CV) toxicity and related attrition are a major challenge for novel therapeutic entities, and identifying CV liability early is critical for effective derisking. CV safety pharmacology studies in rats are a valuable tool for early investigation of CV risk. Thorough understanding of the data analysis techniques and statistical power of these studies is currently lacking and is imperative for enabling sound decision-making. Data from 24 crossover and 12 parallel design CV telemetry rat studies were used for statistical power calculations. Average values of telemetry parameters (heart rate, blood pressure, body temperature, and activity) were logged every 60 s (from 1 h pre-dose to 24 h post-dose) and reduced to 15-min mean values. These data were subsequently binned into super intervals for statistical analysis. A repeated measures analysis of variance was used for statistical analysis of crossover studies and a repeated measures analysis of covariance was used for parallel studies. Statistical power analysis was performed to generate power curves and establish relationships between detectable CV (blood pressure and heart rate) changes and statistical power. Additionally, data from a crossover CV study with phentolamine at 4, 20 and 100 mg/kg are reported as a representative example of data analysis methods. Phentolamine produced a CV profile characteristic of alpha adrenergic receptor antagonism, evidenced by a dose-dependent decrease in blood pressure and reflex tachycardia. Detectable blood pressure changes at 80% statistical power for crossover studies (n=8) were 4-5 mmHg. For parallel studies (n=8), detectable changes at 80% power were 6-7 mmHg. Detectable heart rate changes for both study designs were 20-22 bpm. Based on our results, the conscious rat CV model is a sensitive tool to detect and mitigate CV risk in early safety studies. Furthermore, these results will enable informed selection of appropriate models and study design for early stage CV studies. Copyright © 2016 Elsevier Inc. All rights reserved.
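
    The link between a detectable change and statistical power can be sketched with a paired-design power calculation; the 10 mmHg standard deviation of within-animal differences is an assumed value for illustration, not a figure from the paper.

    ```python
    # Minimal sketch: power for detecting blood-pressure changes, paired design.
    from statsmodels.stats.power import TTestPower

    sd_within = 10.0                   # assumed SD of within-animal differences (mmHg)
    analysis = TTestPower()
    for delta in (2, 4, 6, 8):         # candidate detectable changes (mmHg)
        power = analysis.power(effect_size=delta / sd_within, nobs=8, alpha=0.05)
        print(f"change of {delta} mmHg with n=8 -> power {power:.2f}")
    ```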

  4. Citation of previous meta-analyses on the same topic: a clue to perpetuation of incorrect methods?

    PubMed

    Li, Tianjing; Dickersin, Kay

    2013-06-01

    Systematic reviews and meta-analyses serve as a basis for decision-making and clinical practice guidelines and should be carried out using appropriate methodology to avoid incorrect inferences. We describe the characteristics, statistical methods used for meta-analyses, and citation patterns of all 21 glaucoma systematic reviews we identified pertaining to the effectiveness of prostaglandin analog eye drops in treating primary open-angle glaucoma, published between December 2000 and February 2012. We abstracted data, assessed whether appropriate statistical methods were applied in meta-analyses, and examined citation patterns of included reviews. We identified two forms of problematic statistical analyses in 9 of the 21 systematic reviews examined. Except in 1 case, none of the 9 reviews that used incorrect statistical methods cited a previously published review that used appropriate methods. Reviews that used incorrect methods were cited 2.6 times more often than reviews that used appropriate statistical methods. We speculate that by emulating the statistical methodology of previous systematic reviews, systematic review authors may have perpetuated incorrect approaches to meta-analysis. The use of incorrect statistical methods, perhaps through emulating methods described in previous research, calls conclusions of systematic reviews into question and may lead to inappropriate patient care. We urge systematic review authors and journal editors to seek the advice of experienced statisticians before undertaking or accepting for publication a systematic review and meta-analysis. The author(s) have no proprietary or commercial interest in any materials discussed in this article. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  5. Regression: The Apple Does Not Fall Far From the Tree.

    PubMed

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.

  6. Which Industries Are Sensitive to Business Cycles?

    ERIC Educational Resources Information Center

    Berman, Jay; Pfleeger, Janet

    1997-01-01

    An analysis of the 1994-2005 Bureau of Labor Statistics employment projections can be used to identify industries that are projected to move differently with business cycles in the future than with those of the past, and can be used to identify the industries and occupations that are most prone to business cycle swings. (Author)

  7. Atrial Electrogram Fractionation Distribution before and after Pulmonary Vein Isolation in Human Persistent Atrial Fibrillation-A Retrospective Multivariate Statistical Analysis.

    PubMed

    Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André

    2017-01-01

    Purpose: Complex fractionated atrial electrogram (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding left atrial (LA) regions in 18 persAF patients. Twelve attributes were measured from the AEGs before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results reveal that some LA regions are resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.

  8. Disorganization of white matter architecture in major depressive disorder: a meta-analysis of diffusion tensor imaging with tract-based spatial statistics.

    PubMed

    Chen, Guangxiang; Hu, Xinyu; Li, Lei; Huang, Xiaoqi; Lui, Su; Kuang, Weihong; Ai, Hua; Bi, Feng; Gu, Zhongwei; Gong, Qiyong

    2016-02-24

    White matter (WM) abnormalities have long been suspected in major depressive disorder (MDD). Tract-based spatial statistics (TBSS) studies have detected abnormalities in fractional anisotropy (FA) in MDD, but the available evidence has been inconsistent. We performed a quantitative meta-analysis of TBSS studies contrasting MDD patients with healthy control subjects (HCS). A total of 17 studies with 18 datasets that included 641 MDD patients and 581 HCS were identified. Anisotropic effect size-signed differential mapping (AES-SDM) meta-analysis was performed to assess FA alterations in MDD patients compared to HCS. FA reductions were identified in the genu of the corpus callosum (CC) extending to the body of the CC and left anterior limb of the internal capsule (ALIC) in MDD patients relative to HCS. Descriptive analysis of quartiles, sensitivity analysis and subgroup analysis further confirmed these findings. Meta-regression analysis revealed that individuals with more severe MDD were significantly more likely to have FA reductions in the genu of the CC. This study provides a thorough profile of WM abnormalities in MDD and evidence that interhemispheric connections and frontal-striatal-thalamic pathways are the most convergent circuits affected in MDD.

  9. Analysis and interpretation of cost data in randomised controlled trials: review of published studies

    PubMed Central

    Barber, Julie A; Thompson, Simon G

    1998-01-01

    Objective To review critically the statistical methods used for health economic evaluations in randomised controlled trials where an estimate of cost is available for each patient in the study. Design Survey of published randomised trials including an economic evaluation with cost values suitable for statistical analysis; 45 such trials published in 1995 were identified from Medline. Main outcome measures The use of statistical methods for cost data was assessed in terms of the descriptive statistics reported, use of statistical inference, and whether the reported conclusions were justified. Results Although all 45 trials reviewed apparently had cost data for each patient, only 9 (20%) reported adequate measures of variability for these data and only 25 (56%) gave results of statistical tests or a measure of precision for the comparison of costs between the randomised groups. Only 16 (36%) of the articles gave conclusions which were justified on the basis of results presented in the paper. No paper reported sample size calculations for costs. Conclusions The analysis and interpretation of cost data from published trials reveal a lack of statistical awareness. Strong and potentially misleading conclusions about the relative costs of alternative therapies have often been reported in the absence of supporting statistical evidence. Improvements in the analysis and reporting of health economic assessments are urgently required. Health economic guidelines need to be revised to incorporate more detailed statistical advice. Key messages: Health economic evaluations required for important healthcare policy decisions are often carried out in randomised controlled trials. A review of such published economic evaluations assessed whether statistical methods for cost outcomes have been appropriately used and interpreted. Few publications presented adequate descriptive information for costs or performed appropriate statistical analyses. In at least two thirds of the papers, the main conclusions regarding costs were not justified. The analysis and reporting of health economic assessments within randomised controlled trials urgently need improving. PMID:9794854

  10. Underascertainment of Child Abuse Fatalities in France: Retrospective Analysis of Judicial Data to Assess Underreporting of Infant Homicides in Mortality Statistics

    ERIC Educational Resources Information Center

    Tursz, Anne; Crost, Monique; Gerbouin-Rerolle, Pascale; Cook, Jon M.

    2010-01-01

    Objectives: Test the hypothesis of an underestimation of infant homicides in mortality statistics in France; identify its causes; examine data from the judicial system and their contribution in correcting this underestimation. Methods: A retrospective, cross-sectional study was carried out in 26 courts in three regions of France of cases of infant…

  11. Distributed lags time series analysis versus linear correlation analysis (Pearson's r) in identifying the relationship between antipseudomonal antibiotic consumption and the susceptibility of Pseudomonas aeruginosa isolates in a single Intensive Care Unit of a tertiary hospital.

    PubMed

    Erdeljić, Viktorija; Francetić, Igor; Bošnjak, Zrinka; Budimir, Ana; Kalenić, Smilja; Bielen, Luka; Makar-Aušperger, Ksenija; Likić, Robert

    2011-05-01

    The relationship between antibiotic consumption and selection of resistant strains has been studied mainly by employing conventional statistical methods. A time delay in effect must be anticipated and this has rarely been taken into account in previous studies. Therefore, distributed lags time series analysis and simple linear correlation were compared in their ability to evaluate this relationship. Data on monthly antibiotic consumption for ciprofloxacin, piperacillin/tazobactam, carbapenems and cefepime as well as Pseudomonas aeruginosa susceptibility were retrospectively collected for the period April 2006 to July 2007. Using distributed lags analysis, a significant temporal relationship was identified between ciprofloxacin, meropenem and cefepime consumption and the resistance rates of P. aeruginosa isolates to these antibiotics. This effect was lagged for ciprofloxacin and cefepime [1 month (R=0.827, P=0.039) and 2 months (R=0.962, P=0.001), respectively] and was simultaneous for meropenem (lag 0, R=0.876, P=0.002). Furthermore, a significant concomitant effect of meropenem consumption on the appearance of multidrug-resistant P. aeruginosa strains (resistant to three or more representatives of classes of antibiotics) was identified (lag 0, R=0.992, P<0.001). This effect was not delayed and it was therefore identified both by distributed lags analysis and the Pearson's correlation coefficient. Correlation coefficient analysis was not able to identify relationships between antibiotic consumption and bacterial resistance when the effect was delayed. These results indicate that the use of diverse statistical methods can yield significantly different results, thus leading to the introduction of possibly inappropriate infection control measures. Copyright © 2010 Elsevier B.V. and the International Society of Chemotherapy. All rights reserved.
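
    To make the contrast between the two approaches concrete, the sketch below computes Pearson correlations at increasing monthly lags for synthetic consumption and resistance series with a planted two-month delay; it illustrates the lagged-correlation idea only, not a full distributed-lags model.

        # Minimal sketch: Pearson correlation at increasing monthly lags between
        # consumption and resistance series; synthetic data with a 2-month delay.
        import numpy as np
        from scipy.stats import pearsonr

        rng = np.random.default_rng(1)
        consumption = rng.normal(100, 10, size=16)     # e.g., DDD per month
        # np.roll wraps around; kept simple for illustration.
        resistance = np.roll(consumption, 2) + rng.normal(0, 2, size=16)

        for lag in range(4):
            x = consumption[:len(consumption) - lag] if lag else consumption
            y = resistance[lag:]
            r, p = pearsonr(x, y)
            print(f"lag {lag} months: r={r:.2f}, p={p:.3f}")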

  12. Comparative genome analysis of a large Dutch Legionella pneumophila strain collection identifies five markers highly correlated with clinical strains

    PubMed Central

    2010-01-01

    Background Discrimination between clinical and environmental strains within many bacterial species is currently underexplored. Genomic analyses have clearly shown the enormous variability in genome composition between different strains of a bacterial species. In this study we have used Legionella pneumophila, the causative agent of Legionnaires' disease, to search for genomic markers related to pathogenicity. During a large surveillance study in The Netherlands well-characterized patient-derived strains and environmental strains were collected. We have used a mixed-genome microarray to perform comparative-genome analysis of 257 strains from this collection. Results Microarray analysis indicated that 480 DNA markers (out of 3360 markers in total) showed clear variation in presence between individual strains and these were therefore selected for further analysis. Unsupervised statistical analysis of these markers showed the enormous genomic variation within the species but did not show any correlation with a pathogenic phenotype. We therefore used supervised statistical analysis to identify discriminating markers. Genetic programming was used both to identify predictive markers and to define their interrelationships. A model consisting of five markers was developed that together correctly predicted 100% of the clinical strains and 69% of the environmental strains. Conclusions A novel approach for identifying predictive markers enabling discrimination between clinical and environmental isolates of L. pneumophila is presented. Out of over 3000 possible markers, five were selected that together enabled correct prediction of all the clinical strains included in this study. This novel approach for identifying predictive markers can be applied to all bacterial species, allowing for better discrimination between strains well equipped to cause human disease and relatively harmless strains. PMID:20630115

  13. Opportunities for Applied Behavior Analysis in the Total Quality Movement.

    ERIC Educational Resources Information Center

    Redmon, William K.

    1992-01-01

    This paper identifies critical components of recent organizational quality improvement programs and specifies how applied behavior analysis can contribute to quality technology. Statistical Process Control and Total Quality Management approaches are compared, and behavior analysts are urged to build their research base and market behavior change…

  14. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points ("outliers") in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robustness measures: the proportion of detected outliers and the masking and swamping rates.
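
    The sketch below illustrates the general idea of flagging circular outliers by their circular distance from fitted values using a median-plus-MAD cutoff; it is a generic robust rule applied to synthetic angles, not the authors' RCDxy statistic.

        # Illustrative sketch only: flag points whose circular distance from the
        # fitted angles exceeds a robust (median + 3*MAD) cutoff. Synthetic data.
        import numpy as np

        def circular_distance(a, b):
            """Shortest angular distance between angles a and b (radians)."""
            return np.pi - np.abs(np.pi - np.abs(a - b) % (2 * np.pi))

        rng = np.random.default_rng(2)
        fitted = rng.uniform(0, 2 * np.pi, 40)
        observed = (fitted + rng.normal(0, 0.1, 40)) % (2 * np.pi)
        observed[5] = (fitted[5] + np.pi) % (2 * np.pi)   # planted outlier

        d = circular_distance(observed, fitted)
        cutoff = np.median(d) + 3 * np.median(np.abs(d - np.median(d)))
        print("flagged points:", np.where(d > cutoff)[0])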

  15. Statistical Coupling Analysis-Guided Library Design for the Discovery of Mutant Luciferases.

    PubMed

    Liu, Mira D; Warner, Elliot A; Morrissey, Charlotte E; Fick, Caitlyn W; Wu, Taia S; Ornelas, Marya Y; Ochoa, Gabriela V; Zhang, Brendan S; Rathbun, Colin M; Porterfield, William B; Prescher, Jennifer A; Leconte, Aaron M

    2018-02-06

    Directed evolution has proven to be an invaluable tool for protein engineering; however, there is still a need for developing new approaches to continue to improve the efficiency and efficacy of these methods. Here, we demonstrate a new method for library design that applies a previously developed bioinformatic method, Statistical Coupling Analysis (SCA). SCA uses homologous enzymes to identify amino acid positions that are mutable and functionally important and engage in synergistic interactions between amino acids. We use SCA to guide a library of the protein luciferase and demonstrate that, in a single round of selection, we can identify luciferase mutants with several valuable properties. Specifically, we identify luciferase mutants that possess both red-shifted emission spectra and improved stability relative to those of the wild-type enzyme. We also identify luciferase mutants that possess a >50-fold change in specificity for modified luciferins. To understand the mutational origin of these improved mutants, we demonstrate the role of mutations at N229, S239, and G246 in altered function. These studies show that SCA can be used to guide library design and rapidly identify synergistic amino acid mutations from a small library.

  16. Statistical analysis of Geopotential Height (GH) timeseries based on Tsallis non-extensive statistical mechanics

    NASA Astrophysics Data System (ADS)

    Karakatsanis, L. P.; Iliopoulos, A. C.; Pavlos, E. G.; Pavlos, G. P.

    2018-02-01

    In this paper, we perform statistical analysis of time series derived from Earth's climate. The time series concern Geopotential Height (GH) and correspond to temporal and spatial components of the global distribution of monthly average values during the period 1948-2012. The analysis is based on Tsallis non-extensive statistical mechanics, in particular on the estimation of Tsallis' q-triplet, namely {qstat, qsens, qrel}, the reconstructed phase space, and the estimation of the correlation dimension and the Hurst exponent from rescaled range analysis (R/S). The deviation of the Tsallis q-triplet from unity indicates a non-Gaussian (Tsallis q-Gaussian), non-extensive character with heavy-tailed probability density functions (PDFs), multifractal behavior and long-range dependence for all time series considered. Noticeable differences in the q-triplet estimates were also found between distinct spatial or temporal regions. Moreover, the reconstructed phase space revealed a lower-dimensional fractal set in the GH dynamical phase space (strong self-organization), and the estimation of the Hurst exponent indicated multifractality, non-Gaussianity and persistence. The analysis provides significant information for identifying and characterizing the dynamical characteristics of Earth's climate.
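
    Of the quantities mentioned, the Hurst exponent from rescaled range (R/S) analysis is the simplest to sketch; the code below estimates it for synthetic white noise (expected H near 0.5) rather than the GH data.

        # Minimal sketch of a rescaled-range (R/S) Hurst exponent estimate.
        import numpy as np

        def hurst_rs(x, windows=(8, 16, 32, 64, 128)):
            rs = []
            for w in windows:
                segs = [x[i:i + w] for i in range(0, len(x) - w + 1, w)]
                vals = []
                for s in segs:
                    z = np.cumsum(s - s.mean())     # cumulative deviations
                    r = z.max() - z.min()           # range of cumulative sum
                    sd = s.std()
                    if sd > 0:
                        vals.append(r / sd)         # rescaled range R/S
                rs.append(np.mean(vals))
            # Slope of log(R/S) vs log(window) estimates the Hurst exponent H.
            return np.polyfit(np.log(windows), np.log(rs), 1)[0]

        rng = np.random.default_rng(3)
        print("H for white noise (expect ~0.5):",
              round(hurst_rs(rng.normal(size=1024)), 2))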

  17. Challenging nurse student selection policy: Using a lifeworld approach to explore the link between care experience and student values.

    PubMed

    Scammell, Janet; Tait, Desiree; White, Sara; Tait, Michael

    2017-10-01

    This study uses a lifeworld perspective to explore beginning students' values about nursing. Internationally, increasing care demand, a focus on targets and evidence of dehumanized care cultures have resulted in scrutiny of practitioner values. In England, selection policy dictates that prospective nursing students demonstrate person-centred values and care work experience. However, there is limited recent evidence exploring values at programme commencement or the effect of care experience on values. Mixed method study. A total of 161 undergraduate nursing students were recruited in 2013 from one English university. Thematic content analysis and frequency distribution to reveal descriptive statistics were used. Statistical analysis indicated that most of the values identified in student responses were not significantly affected by paid care experience. Five themes were identified: How I want care to be; Making a difference; The value of learning; Perceived characteristics of a nurse; and Respecting our humanity. Students readily drew on their experience of living to identify person-centred values about nursing.

  18. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.

  19. Statistical trends of episiotomy around the world: Comparative systematic review of changing practices.

    PubMed

    Clesse, Christophe; Lighezzolo-Alnot, Joëlle; De Lavergne, Sylvie; Hamlin, Sandrine; Scheffler, Michèle

    2018-06-01

    The authors' purpose in this article is to identify, review and interpret all publications on episiotomy rates worldwide. Based on criteria from the PRISMA guidelines, twenty databases were scrutinized. All studies which include national statistics related to episiotomy were selected, as well as studies presenting estimated data. Sixty-one papers were selected, with publication dates between 1995 and 2016. A static and dynamic analysis of all the results was carried out. The assumption of a decline in the number of episiotomies is discussed and confirmed, while noting that high rates of episiotomy remain today in less industrialized countries and East Asia. Finally, our analysis aims to investigate the potential determinants underlying the apparent statistical disparities.

  20. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  1. Identification of Chinese plague foci from long-term epidemiological data

    PubMed Central

    Ben-Ari, Tamara; Neerinckx, Simon; Agier, Lydiane; Cazelles, Bernard; Xu, Lei; Zhang, Zhibin; Fang, Xiye; Wang, Shuchun; Liu, Qiyong; Stenseth, Nils C.

    2012-01-01

    Carrying out statistical analysis over an extensive dataset of human plague reports in Chinese villages from 1772 to 1964, we identified plague endemic territories in China (i.e., plague foci). Analyses rely on (i) a clustering method that groups time series based on their time-frequency resemblances and (ii) an ecological niche model that helps identify plague-suitable territories characterized by value ranges for a set of predefined environmental variables. Results from both statistical tools indicate the existence of two disconnected plague territories corresponding to Northern and Southern China. Altogether, at least four well-defined independent foci are identified. Their contours compare favorably with field observations. The potential and limitations of inferring plague foci and dynamics from epidemiological data are discussed. PMID:22570501

  2. Statistical identification of stimulus-activated network nodes in multi-neuron voltage-sensitive dye optical recordings.

    PubMed

    Fathiazar, Elham; Anemuller, Jorn; Kretzberg, Jutta

    2016-08-01

    Voltage-Sensitive Dye (VSD) imaging is an optical imaging method that allows measuring the graded voltage changes of multiple neurons simultaneously. In neuroscience, this method is used to reveal networks of neurons involved in certain tasks. However, the recorded relative dye fluorescence changes are usually low, and signals are superimposed by noise and artifacts. Therefore, establishing a reliable method to identify which cells are activated by specific stimulus conditions is the first step towards identifying functional networks. In this paper, we present a statistical method to identify stimulus-activated network nodes as cells whose activities during sensory network stimulation differ significantly from the un-stimulated control condition. This method is demonstrated on voltage-sensitive dye recordings from up to 100 neurons in a ganglion of the medicinal leech responding to tactile skin stimulation. Without relying on any prior physiological knowledge, the network nodes identified by our statistical analysis were found to match well with published cell types involved in tactile stimulus processing and to be consistent across stimulus conditions and preparations.
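
    A minimal sketch of this kind of per-cell screening follows: each cell's activity under stimulation is compared with the control condition by a rank-sum test, with false-discovery-rate correction across cells. The specific test, data and thresholds are stand-ins, not the paper's exact procedure.

        # Hedged sketch: per-cell two-sample tests with FDR correction across
        # cells; synthetic activity with ten truly responsive cells.
        import numpy as np
        from scipy.stats import mannwhitneyu
        from statsmodels.stats.multitest import multipletests

        rng = np.random.default_rng(4)
        n_cells, n_trials = 100, 20
        control = rng.normal(0, 1, (n_cells, n_trials))
        stimulated = rng.normal(0, 1, (n_cells, n_trials))
        stimulated[:10] += 2.0          # ten truly responsive cells

        pvals = [mannwhitneyu(stimulated[i], control[i]).pvalue
                 for i in range(n_cells)]
        reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
        print("cells flagged as stimulus-activated:", np.where(reject)[0])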

  3. Identification of Major Histocompatibility Complex-Regulated Body Odorants by Statistical Analysis of a Comparative Gas Chromatography/Mass Spectrometry Experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Willse, Alan R.; Belcher, Ann; Preti, George

    2005-04-15

    Gas chromatography (GC), combined with mass spectrometry (MS) detection, is a powerful analytical technique that can be used to separate, quantify, and identify volatile compounds in complex mixtures. This paper examines the application of GC-MS in a comparative experiment to identify volatiles that differ in concentration between two groups. A complex mixture might comprise several hundred or even thousands of volatile compounds. Because their number and location in a chromatogram generally are unknown, and because components overlap in populous chromatograms, the statistical problems offer significant challenges beyond traditional two-group screening procedures. We describe a statistical procedure to compare two-dimensional GC-MS profiles between groups, which entails (1) signal processing: baseline correction and peak detection in single ion chromatograms; (2) aligning chromatograms in time; (3) normalizing differences in overall signal intensities; and (4) detecting chromatographic regions that differ between groups. Compared to existing approaches, the proposed method is robust to errors made at earlier stages of analysis, such as missed peaks or slightly misaligned chromatograms. To illustrate the method, we identify differences in GC-MS chromatograms of ether-extracted urine collected from two nearly identical inbred groups of mice, to investigate the relationship between odor and genetics of the major histocompatibility complex.

  4. Risk management in inpatient units in the Czech Republic from the point of view of nurses in leadership positions.

    PubMed

    Prokešová, Radka; Brabcová, Iva; Pokojová, Radka; Bártlová, Sylva

    2016-12-01

    The goal of this study was to assess specific features of risk management from the point of view of nurses in leadership positions in inpatient units in Czech hospitals. The study was performed using a quantitative research strategy, i.e., a questionnaire. The data sample was analyzed using SPSS v. 23.0. Pearson's chi-square test and analysis of adjusted residuals were used to identify associations between nominal and/or ordinal quantities. 315 nurses in leadership positions working in inpatient units of Czech hospitals were included in the sample. The sample was created using random selection by means of quotas. Based on the study results, statistically significant relationships between the respondents' education and the utilization of methods to identify risks were identified. Furthermore, statistically significant relationships were found between a nurse's functional role within the system and regular analysis and evaluation of risks, and between the type of healthcare facility and the degree of patient involvement in risk management. The study found statistically significant correlations that can be used to increase the effectiveness of risk management in inpatient units of Czech hospitals. From this perspective, the fact that patient involvement in risk management was reported by only 37.8% of respondents seems to be the most notable problem.

  5. Assessment of statistical methods used in library-based approaches to microbial source tracking.

    PubMed

    Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

    2003-12-01

    Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
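
    The sketch below illustrates one of the compared ingredients, nearest-neighbour (maximum similarity) matching against a source library with a threshold for excluding poorly matched isolates; the binary fingerprints and threshold value are invented for illustration.

        # Illustrative sketch: nearest-neighbour source classification of isolate
        # fingerprints with a similarity threshold. Synthetic binary patterns.
        import numpy as np

        rng = np.random.default_rng(5)
        sources = ["human", "gull", "cow", "dog"]
        library = {s: rng.integers(0, 2, (25, 40)) for s in sources}  # 25 isolates each

        def classify(isolate, threshold=0.70):
            best_source, best_sim = None, -1.0
            for source, prints in library.items():
                # Simple matching similarity: fraction of identical band positions.
                sims = (prints == isolate).mean(axis=1)
                if sims.max() > best_sim:
                    best_source, best_sim = source, sims.max()
            # Threshold criterion: leave uncertain isolates unclassified.
            return best_source if best_sim >= threshold else "unclassified"

        print(classify(library["cow"][0]))   # a library member should match itself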

  6. Association of bladder sensation measures and bladder diary in patients with urinary incontinence.

    PubMed

    King, Ashley B; Wolters, Jeff P; Klausner, Adam P; Rapp, David E

    2012-04-01

    Investigation suggests the involvement of afferent actions in the pathophysiology of urinary incontinence. Current diagnostic modalities do not allow for the accurate identification of sensory dysfunction. We previously reported urodynamic derivatives that may be useful in assessing bladder sensation. We sought to further investigate these derivatives by assessing for a relationship with a 3-day bladder diary. Subset analysis was performed in patients without stress urinary incontinence (SUI), attempting to isolate patients with urgency symptoms. No association was demonstrated between bladder diary parameters and urodynamic derivatives (r coefficient range -0.06 to 0.08; p > 0.05). However, subset analysis demonstrated an association between detrusor overactivity (DO) and bladder urgency velocity (BUV), with a lower BUV identified in patients without DO. Subset analysis of patients with isolated urgency/urge incontinence identified weak associations between voiding frequency and FSR (r = 0.39) and between daily incontinence episodes and BUV (r = 0.35). However, these associations failed to demonstrate statistical significance. No statistical association was seen between bladder diary and urodynamic derivatives. This is not unexpected, given that bladder diary parameters may reflect numerous pathologies, including not only sensory dysfunction but also SUI and DO. However, weak associations were identified in patients without SUI and, further, a statistical relationship between DO and BUV was seen. Additional research is needed to assess the utility of FSR/BUV in characterizing sensory dysfunction, especially in patients without concurrent pathology (e.g. SUI, DO).

  7. Identification of Chemical Attribution Signatures of Fentanyl Syntheses Using Multivariate Statistical Analysis of Orthogonal Analytical Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Mew, D. A.; DeHope, A.

    Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product - all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LC-MS/MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited on and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date, using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.

  8. Identifying the impact of social determinants of health on disease rates using correlation analysis of area-based summary information.

    PubMed

    Song, Ruiguang; Hall, H Irene; Harrison, Kathleen McDavid; Sharpe, Tanya Telfair; Lin, Lillian S; Dean, Hazel D

    2011-01-01

    We developed a statistical tool that brings together standard, accessible, and well-understood analytic approaches and uses area-based information and other publicly available data to identify social determinants of health (SDH) that significantly affect the morbidity of a specific disease. We specified AIDS as the disease of interest and used data from the American Community Survey and the National HIV Surveillance System. Morbidity and socioeconomic variables in the two data systems were linked through geographic areas that can be identified in both systems. Correlation and partial correlation coefficients were used to measure the impact of socioeconomic factors on AIDS diagnosis rates in certain geographic areas. We developed an easily explained approach that can be used by a data analyst with access to publicly available datasets and standard statistical software to identify the impact of SDH. We found that the AIDS diagnosis rate was highly correlated with the distribution of race/ethnicity, population density, and marital status in an area. The impact of poverty, education level, and unemployment depended on other SDH variables. Area-based measures of socioeconomic variables can be used to identify risk factors associated with a disease of interest. When correlation analysis is used to identify risk factors, potential confounding from other variables must be taken into account.
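
    The partial correlation used to adjust one socioeconomic factor for another can be computed with the standard residual-based definition, sketched below on synthetic area-level data; the variable names are hypothetical.

        # Minimal sketch: correlation vs partial correlation of a disease rate
        # with one socioeconomic variable, controlling for another. Synthetic data.
        import numpy as np
        from scipy.stats import pearsonr

        def partial_corr(x, y, z):
            """Correlation of x and y after regressing out z from both."""
            rx = x - np.polyval(np.polyfit(z, x, 1), z)
            ry = y - np.polyval(np.polyfit(z, y, 1), z)
            return pearsonr(rx, ry)

        rng = np.random.default_rng(6)
        density = rng.normal(size=200)                  # population density
        poverty = 0.6 * density + rng.normal(size=200)  # confounded with density
        rate = 1.0 * density + 0.1 * poverty + rng.normal(size=200)

        print("raw r(rate, poverty):    %.2f" % pearsonr(rate, poverty)[0])
        print("partial r given density: %.2f" % partial_corr(rate, poverty, density)[0])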

  9. Support Provided to the External Tank (ET) Project on the Use of Statistical Analysis for ET Certification Consultation Position Paper

    NASA Technical Reports Server (NTRS)

    Null, Cynthia H.

    2009-01-01

    In June 2004, the Space Flight Leadership Council (SFLC) assigned an action jointly to the NASA Engineering and Safety Center (NESC) and the External Tank (ET) project to characterize the available dataset [of defect sizes from dissections of foam], identify resultant limitations to statistical treatment of ET as-built foam as part of the overall thermal protection system (TPS) certification, and report to the Program Requirements Change Board (PRCB) and SFLC in September 2004. The NESC statistics team was formed to assist the ET statistics group in August 2004. The NESC's conclusions are presented in this report.

  10. A bibliometric analysis of statistical terms used in American Physical Therapy Association journals (2011-2012): evidence for educating physical therapists.

    PubMed

    Tilson, Julie K; Marshall, Katie; Tam, Jodi J; Fetters, Linda

    2016-04-22

    A primary barrier to the implementation of evidence based practice (EBP) in physical therapy is therapists' limited ability to understand and interpret statistics. Physical therapists demonstrate limited skills and report low self-efficacy for interpreting results of statistical procedures. While standards for physical therapist education include statistics, little empirical evidence is available to inform what should constitute such curricula. The purpose of this study was to conduct a census of the statistical terms and study designs used in physical therapy literature and to use the results to make recommendations for curricular development in physical therapist education. We conducted a bibliometric analysis of 14 peer-reviewed journals associated with the American Physical Therapy Association over 12 months (Oct 2011-Sept 2012). Trained raters recorded every statistical term appearing in identified systematic reviews, primary research reports, and case series and case reports. Investigator-reported study design was also recorded. Terms representing the same statistical test or concept were combined into a single, representative term. Cumulative percentage was used to identify the most common representative statistical terms. Common representative terms were organized into eight categories to inform curricular design. Of 485 articles reviewed, 391 met the inclusion criteria. These 391 articles used 532 different terms which were combined into 321 representative terms; 13.1 (sd = 8.0) terms per article. Eighty-one representative terms constituted 90% of all representative term occurrences. Of the remaining 240 representative terms, 105 (44%) were used in only one article. The most common study design was prospective cohort (32.5%). Physical therapy literature contains a large number of statistical terms and concepts for readers to navigate. However, in the year sampled, 81 representative terms accounted for 90% of all occurrences. These "common representative terms" can be used to inform curricula to promote physical therapists' skills, competency, and confidence in interpreting statistics in their professional literature. We make specific recommendations for curriculum development informed by our findings.
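
    The census logic of ranking representative terms and cutting at 90% cumulative occurrence can be expressed in a few lines; the term counts below are invented solely to show the mechanics.

        # Minimal sketch: rank term counts and find how many terms cover 90%
        # of all occurrences. Counts are invented for illustration.
        from collections import Counter

        counts = Counter({"mean (sd)": 300, "p-value": 250,
                          "confidence interval": 180, "ANOVA": 90,
                          "ICC": 40, "kappa": 25, "Bland-Altman": 10})
        total = sum(counts.values())

        running, common_terms = 0, []
        for term, n in counts.most_common():
            running += n
            common_terms.append(term)
            if running / total >= 0.90:
                break
        print(f"{len(common_terms)} terms cover 90%:", common_terms)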

  11. Analysis of Exhaled Breath Volatile Organic Compounds in Inflammatory Bowel Disease: A Pilot Study.

    PubMed

    Hicks, Lucy C; Huang, Juzheng; Kumar, Sacheen; Powles, Sam T; Orchard, Timothy R; Hanna, George B; Williams, Horace R T

    2015-09-01

    Distinguishing between the inflammatory bowel diseases [IBD], Crohn's disease [CD] and ulcerative colitis [UC], is important for determining management and prognosis. Selected ion flow tube mass spectrometry [SIFT-MS] may be used to analyse volatile organic compounds [VOCs] in exhaled breath: these may be altered in disease states, and distinguishing breath VOC profiles can be identified. The aim of this pilot study was to identify, quantify, and analyse VOCs present in the breath of IBD patients and controls, potentially providing insights into disease pathogenesis and complementing current diagnostic algorithms. SIFT-MS breath profiling of 56 individuals [20 UC, 18 CD, and 18 healthy controls] was undertaken. Multivariate analysis included principal components analysis and partial least squares discriminant analysis with orthogonal signal correction [OSC-PLS-DA]. Receiver operating characteristic [ROC] analysis was performed for each comparative analysis using statistically significant VOCs. OSC-PLS-DA modelling was able to distinguish both CD and UC from healthy controls and from one other with good sensitivity and specificity. ROC analysis using combinations of statistically significant VOCs [dimethyl sulphide, hydrogen sulphide, hydrogen cyanide, ammonia, butanal, and nonanal] gave integrated areas under the curve of 0.86 [CD vs healthy controls], 0.74 [UC vs healthy controls], and 0.83 [CD vs UC]. Exhaled breath VOC profiling was able to distinguish IBD patients from controls, as well as to separate UC from CD, using both multivariate and univariate statistical techniques. Copyright © 2015 European Crohn’s and Colitis Organisation (ECCO). Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  12. Specialized data analysis of SSME and advanced propulsion system vibration measurements

    NASA Technical Reports Server (NTRS)

    Coffin, Thomas; Swanson, Wayne L.; Jong, Yen-Yi

    1993-01-01

    The basic objectives of this contract were to perform detailed analysis and evaluation of dynamic data obtained during Space Shuttle Main Engine (SSME) test and flight operations, including analytical/statistical assessment of component dynamic performance, and to continue the development and implementation of analytical/statistical models to effectively define nominal component dynamic characteristics, detect anomalous behavior, and assess machinery operational conditions. This study was to provide timely assessment of engine component operational status, identify probable causes of malfunction, and define feasible engineering solutions. The work was performed under three broad tasks: (1) Analysis, Evaluation, and Documentation of SSME Dynamic Test Results; (2) Data Base and Analytical Model Development and Application; and (3) Development and Application of Vibration Signature Analysis Techniques.

  13. DNA analysis in Disaster Victim Identification.

    PubMed

    Montelius, Kerstin; Lindblom, Bertil

    2012-06-01

    DNA profiling and matching is one of the primary methods to identify missing persons in a disaster, as defined by the Interpol Disaster Victim Identification Guide. The process to identify a victim by DNA includes: the collection of the best possible ante-mortem (AM) samples, the choice of post-mortem (PM) samples, DNA-analysis, matching and statistical weighting of the genetic relationship or match. Each disaster has its own scenario, and each scenario defines its own methods for identification of the deceased.

  14. Synthesizing Single-Case Research to Identify Evidence-Based Practices: Some Brief Reflections

    ERIC Educational Resources Information Center

    Horner, Robert H.; Kratochwill, Thomas R.

    2012-01-01

    The purposes of this paper are to (a) propose an operational standard for defining a "practice," (b) encourage development of professional standards for visual and statistical analysis of single-case research, and (c) propose a standard for using single-case research results to identify practices that are "evidence-based." These topics are not new…

  15. PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables

    PubMed Central

    Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R.; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C.; Downing, James R.; Lamba, Jatinder

    2009-01-01

    Motivation: In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. Results: Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. Availability: Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org. Contact: stanley.pounds@stjude.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19528086
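
    A toy analogue of the PROMISE statistic is sketched below: correlations of a genomic variable with two endpoints are projected onto a pattern vector and assessed by permutation. The official implementation is the R package cited above; this Python version only mirrors the dot-product idea on synthetic data.

        # Hedged sketch: dot product of observed association statistics with a
        # pattern vector of "most interesting" values; permutation p-value.
        import numpy as np

        rng = np.random.default_rng(7)
        n_samples = 60
        gene = rng.normal(size=n_samples)
        endpoints = np.column_stack([gene + rng.normal(size=n_samples),    # expect +
                                     -gene + rng.normal(size=n_samples)])  # expect -
        pattern = np.array([1.0, -1.0])   # interesting: + with 1st, - with 2nd

        def promise_stat(g, E):
            corrs = np.array([np.corrcoef(g, E[:, j])[0, 1]
                              for j in range(E.shape[1])])
            return corrs @ pattern        # projection onto the pattern vector

        obs = promise_stat(gene, endpoints)
        perm = np.array([promise_stat(rng.permutation(gene), endpoints)
                         for _ in range(2000)])
        print("p =", (np.sum(perm >= obs) + 1) / (len(perm) + 1))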

  16. Model Performance Evaluation and Scenario Analysis ...

    EPA Pesticide Factsheets

    This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude-only, sequence-only, and combined magnitude and sequence errors. The performance measures include error analysis, the coefficient of determination, Nash-Sutcliffe efficiency, and a new weighted rank method. These performance metrics only provide useful information about overall model performance. Note that MPESA is based on the separation of observed and simulated time series into magnitude and sequence components. The separation of time series into magnitude and sequence components, and the reconstruction back to time series, provides diagnostic insights to modelers. For example, traditional approaches lack the capability to identify whether the source of uncertainty in the simulated data is the quality of the input data or the way the analyst adjusted the model parameters. This report presents a suite of model diagnostics that identify whether mismatches between observed and simulated data result from magnitude- or sequence-related errors. MPESA offers graphical and statistical options that allow HSPF users to compare observed and simulated time series and identify the parameter values to adjust or the input data to modify. The scenario analysis part of the too

  17. An efficient approach to identify different chemical markers between fibrous root and rhizome of Anemarrhena asphodeloides by ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry with multivariate statistical analysis.

    PubMed

    Wang, Fang-Xu; Yuan, Jian-Chao; Kang, Li-Ping; Pang, Xu; Yan, Ren-Yi; Zhao, Yang; Zhang, Jie; Sun, Xin-Guang; Ma, Bai-Ping

    2016-09-10

    An ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry approach coupled with multivariate statistical analysis was established and applied to rapidly distinguish the chemical differences between the fibrous root and rhizome of Anemarrhena asphodeloides. The datasets of tR-m/z pairs, ion intensities and sample codes were processed by principal component analysis and orthogonal partial least squares discriminant analysis. Chemical markers could be identified based on their exact mass data, fragmentation characteristics, and retention times. New compounds among the chemical markers could be rapidly isolated, guided by the ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry results, and their definitive structures further elucidated by NMR spectra. Using this approach, twenty-four markers were identified online, including nine new saponins; five of the new steroidal saponins were obtained in pure form. The study validated the proposed approach as a suitable method for identifying chemical differences between various medicinal parts, in order to expand usable medicinal parts and increase the utilization rate of resources. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Compositional Solution Space Quantification for Probabilistic Software Analysis

    NASA Technical Reports Server (NTRS)

    Borges, Mateus; Pasareanu, Corina S.; Filieri, Antonio; d'Amorim, Marcelo; Visser, Willem

    2014-01-01

    Probabilistic software analysis aims at quantifying how likely a target event is to occur during program execution. Current approaches rely on symbolic execution to identify the conditions to reach the target event and try to quantify the fraction of the input domain satisfying these conditions. Precise quantification is usually limited to linear constraints, while only approximate solutions can be provided in general through statistical approaches. However, statistical approaches may fail to converge to an acceptable accuracy within a reasonable time. We present a compositional statistical approach for the efficient quantification of solution spaces for arbitrarily complex constraints over bounded floating-point domains. The approach leverages interval constraint propagation to improve the accuracy of the estimation by focusing the sampling on the regions of the input domain containing the sought solutions. Preliminary experiments show significant improvement on previous approaches both in results accuracy and analysis time.

  19. Pathway-GPS and SIGORA: identifying relevant pathways based on the over-representation of their gene-pair signatures

    PubMed Central

    Foroushani, Amir B.K.; Brinkman, Fiona S.L.

    2013-01-01

    Motivation. Predominant pathway analysis approaches treat pathways as collections of individual genes and consider all pathway members as equally informative. As a result, at times spurious and misleading pathways are inappropriately identified as statistically significant, solely due to components that they share with the more relevant pathways. Results. We introduce the concept of Pathway Gene-Pair Signatures (Pathway-GPS) as pairs of genes that, as a combination, are specific to a single pathway. We devised and implemented a novel approach to pathway analysis, Signature Over-representation Analysis (SIGORA), which focuses on the statistically significant enrichment of Pathway-GPS in a user-specified gene list of interest. In a comparative evaluation of several published datasets, SIGORA outperformed traditional methods by delivering biologically more plausible and relevant results. Availability. An efficient implementation of SIGORA, as an R package with precompiled GPS data for several human and mouse pathway repositories is available for download from http://sigora.googlecode.com/svn/. PMID:24432194

  20. A systematic review of statistical methods used to test for reliability of medical instruments measuring continuous variables.

    PubMed

    Zaki, Rafdzah; Bulgiba, Awang; Nordin, Noorhaire; Azina Ismail, Noor

    2013-06-01

    Reliability measures precision, or the extent to which test results can be replicated. This is the first ever systematic review to identify statistical methods used to measure the reliability of equipment measuring continuous variables. This study also aims to highlight inappropriate statistical methods used in reliability analyses and their implications for medical practice. In 2010, five electronic databases were searched for reliability studies published between 2007 and 2009. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and 42 ultimately fitted the inclusion criteria. The Intra-class Correlation Coefficient (ICC) was the most popular method, used in 25 (60%) studies, followed by comparison of means (8 studies, or 19%). Of the 25 studies using the ICC, only 7 (28%) reported the confidence intervals and type of ICC used. Most studies (71%) also tested the agreement of instruments. This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue and to perform analyses in reliability studies correctly.
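
    For reference, a two-way random-effects ICC for absolute agreement, ICC(2,1), can be computed from ANOVA mean squares as sketched below on synthetic repeated measurements; in practice, the ICC type and a confidence interval should be reported alongside, as the review recommends.

        # Minimal sketch: ICC(2,1) from two-way ANOVA mean squares.
        # Rows = subjects, columns = raters/trials. Synthetic data.
        import numpy as np

        def icc_2_1(Y):
            n, k = Y.shape
            grand = Y.mean()
            ms_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)
            ms_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)
            sse = ((Y - Y.mean(axis=1, keepdims=True)
                      - Y.mean(axis=0, keepdims=True) + grand) ** 2).sum()
            ms_err = sse / ((n - 1) * (k - 1))
            # Shrout & Fleiss ICC(2,1): absolute agreement, single measurement.
            return ((ms_rows - ms_err) /
                    (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n))

        rng = np.random.default_rng(8)
        true_scores = rng.normal(50, 10, size=(30, 1))
        ratings = true_scores + rng.normal(0, 3, size=(30, 2))  # two trials
        print("ICC(2,1) =", round(icc_2_1(ratings), 2))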

  1. Multivariate Statistical Analysis of Orthogonal Mass Spectral Data for the Identification of Chemical Attribution Signatures of 3-Methylfentanyl

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Valdez, C. A.; DeHope, A. J.

    Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process relies greatly on the identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.
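
    A hedged sketch of the PLS-DA classification step follows: PLS regression against one-hot route labels, predicting the route with the largest fitted response. The feature values, dimensions and accuracy are synthetic stand-ins for the chromatographic and ICP-MS data.

        # Hedged sketch of PLS-DA for synthesis-route classification.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(9)
        n_routes, n_per_route, n_features = 6, 12, 240
        X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_route, n_features))
                       for i in range(n_routes)])
        y = np.repeat(np.arange(n_routes), n_per_route)
        Y = np.eye(n_routes)[y]                  # one-hot encoding of routes

        pls = PLSRegression(n_components=5).fit(X, Y)
        pred = pls.predict(X).argmax(axis=1)     # route with largest response
        print("training accuracy:", (pred == y).mean())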

  2. Statistical testing and power analysis for brain-wide association study.

    PubMed

    Gong, Weikang; Wan, Lin; Lu, Wenlian; Ma, Liang; Cheng, Fan; Cheng, Wei; Grünewald, Stefan; Feng, Jianfeng

    2018-04-05

    The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, multiple-comparison correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need for non-parametric permutation to correct for multiple comparisons and thus can efficiently tackle large datasets with high resolution fMRI images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depression disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS. Copyright © 2018 Elsevier B.V. All rights reserved.

  3. Technical Note: The Initial Stages of Statistical Data Analysis

    PubMed Central

    Tandy, Richard D.

    1998-01-01

    Objective: To provide an overview of several important data-related considerations in the design stage of a research project and to review the levels of measurement and their relationship to the statistical technique chosen for the data analysis. Background: When planning a study, the researcher must clearly define the research problem and narrow it down to specific, testable questions. The next steps are to identify the variables in the study, decide how to group and treat subjects, and determine how to measure, and the underlying level of measurement of, the dependent variables. Then the appropriate statistical technique can be selected for data analysis. Description: The four levels of measurement in increasing complexity are nominal, ordinal, interval, and ratio. Nominal data are categorical or “count” data, and the numbers are treated as labels. Ordinal data can be ranked in a meaningful order by magnitude. Interval data possess the characteristics of ordinal data and also have equal distances between levels. Ratio data have a natural zero point. Nominal and ordinal data are analyzed with nonparametric statistical techniques and interval and ratio data with parametric statistical techniques. Advantages: Understanding the four levels of measurement and when it is appropriate to use each is important in determining which statistical technique to use when analyzing data. PMID:16558489

  4. Identification of Intensity Ratio Break Points from Photon Arrival Trajectories in Ratiometric Single Molecule Spectroscopy

    PubMed Central

    Bingemann, Dieter; Allen, Rachel M.

    2012-01-01

    We describe a model-free statistical method to analyze dual-channel photon arrival trajectories from single molecule spectroscopy and identify break points in the intensity ratio. Photons are binned with a short bin size to calculate the logarithm of the intensity ratio for each bin. Stochastic photon counting noise leads to a near-normal distribution of this logarithm, and the standard Student t-test is used to find statistically significant changes in this quantity. In stochastic simulations we determine the significance threshold for the t-test's p-value at a given level of confidence. We test the method's sensitivity and accuracy, finding that the analysis reliably locates break points with significant changes in the intensity ratio with little or no error in realistic trajectories with large numbers of small change points, while still identifying a large fraction of the frequent break points with small intensity changes. Based on these results we present an approach to estimate confidence intervals for the identified break point locations and recommend a bin size to choose for the analysis. The method proves powerful and reliable in the analysis of simulated and actual data of single molecule reorientation in a glassy matrix. PMID:22837704
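
    The core of the method is easy to sketch: bin the two channels, take the log intensity ratio per bin, and scan candidate break points with a two-sample t-test. The bin counts and change magnitude below are synthetic, and the simulation-based calibration of the significance threshold is omitted.

        # Hedged sketch: scan for a break point in the binned log intensity ratio
        # of two photon channels using a two-sample t-test. Synthetic counts.
        import numpy as np
        from scipy.stats import ttest_ind

        rng = np.random.default_rng(10)
        n_bins, change_at = 200, 120
        ch1 = rng.poisson(50, n_bins).astype(float)
        ch2 = np.concatenate([rng.poisson(50, change_at),
                              rng.poisson(80, n_bins - change_at)]).astype(float)
        log_ratio = np.log(ch2 / ch1)

        pvals = [ttest_ind(log_ratio[:i], log_ratio[i:]).pvalue
                 for i in range(10, n_bins - 10)]
        best = int(np.argmin(pvals)) + 10
        print(f"candidate break point at bin {best}, p = {min(pvals):.2e}")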

  5. Managing Complexity in Evidence Analysis: A Worked Example in Pediatric Weight Management.

    PubMed

    Parrott, James Scott; Henry, Beverly; Thompson, Kyle L; Ziegler, Jane; Handu, Deepa

    2018-05-02

    Nutrition interventions are often complex and multicomponent. Typical approaches to meta-analyses that focus on individual causal relationships to provide guideline recommendations are not sufficient to capture this complexity. The objective of this study is to describe the method of meta-analysis used for the Pediatric Weight Management (PWM) Guidelines update and provide a worked example that can be applied in other areas of dietetics practice. The effects of PWM interventions were examined for body mass index (BMI), body mass index z-score (BMIZ), and waist circumference at four different time periods. For intervention-level effects, intervention types were identified empirically using multiple correspondence analysis paired with cluster analysis. Pooled effects of identified types were examined using random effects meta-analysis models. Differences in effects among types were examined using meta-regression. Context-level effects are examined using qualitative comparative analysis. Three distinct types (or families) of PWM interventions were identified: medical nutrition, behavioral, and missing components. Medical nutrition and behavioral types showed statistically significant improvements in BMIZ across all time points. Results were less consistent for BMI and waist circumference, although four distinct patterns of weight status change were identified. These varied by intervention type as well as outcome measure. Meta-regression indicated statistically significant differences between the medical nutrition and behavioral types vs the missing component type for both BMIZ and BMI, although the pattern varied by time period and intervention type. Qualitative comparative analysis identified distinct configurations of context characteristics at each time point that were consistent with positive outcomes among the intervention types. Although analysis of individual causal relationships is invaluable, this approach is inadequate to capture the complexity of dietetics practice. An alternative approach that integrates intervention-level with context-level meta-analyses may provide deeper understanding in the development of practice guidelines. Copyright © 2018 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.

  6. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-08

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
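
    A minimal sketch of the underlying insight, assuming signed per-variant z-scores (effect estimate divided by its standard error) and a correlation matrix of the single-variant test statistics estimated from one participating study or a reference panel. The weighted burden statistic below is an illustrative stand-in for the gene-level tests the authors recover, not their software:

    ```python
    import numpy as np
    from scipy import stats

    def burden_from_summary(z_scores, corr, weights=None):
        """Gene-level burden test recovered from single-variant summary statistics.

        z_scores : per-variant z statistics (signed)
        corr     : variant-by-variant correlation matrix of the test statistics
        weights  : optional per-variant weights (default: equal weights)
        """
        z = np.asarray(z_scores, dtype=float)
        w = np.ones_like(z) if weights is None else np.asarray(weights, float)
        # Under H0, w'z is normal with variance w'Rw, so the ratio is a z-score
        z_gene = (w @ z) / np.sqrt(w @ corr @ w)
        p = 2 * stats.norm.sf(abs(z_gene))
        return z_gene, p
    ```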

  7. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixtures of multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models, using the Bayesian information criterion to compare multiple models and identify the…
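
    The procedure described can be sketched as fitting Gaussian mixture models of increasing size and keeping the one with the lowest Bayesian information criterion; the scikit-learn-based helper below is an illustrative stand-in, not the software used in the article:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def best_gmm_by_bic(X, max_components=6, random_state=0):
        """Fit finite mixtures of multivariate normals and select the number of
        clusters whose model minimizes the Bayesian information criterion."""
        models = [
            GaussianMixture(n_components=k, covariance_type="full",
                            random_state=random_state).fit(X)
            for k in range(1, max_components + 1)
        ]
        bics = [m.bic(X) for m in models]
        best = models[int(np.argmin(bics))]
        return best, best.predict(X), bics
    ```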

  8. Using Rasch Analysis to Identify Uncharacteristic Responses to Undergraduate Assessments

    ERIC Educational Resources Information Center

    Edwards, Antony; Alcock, Lara

    2010-01-01

    Rasch Analysis is a statistical technique that is commonly used to analyse both test data and Likert survey data, to construct and evaluate question item banks, and to evaluate change in longitudinal studies. In this article, we introduce the dichotomous Rasch model, briefly discussing its assumptions. Then, using data collected in an…
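
    For reference, the dichotomous Rasch model gives the probability of a correct response as a logistic function of person ability minus item difficulty. The sketch below treats abilities and difficulties as known (in practice they are estimated) and also computes the standardized residuals commonly used to flag uncharacteristic responses; the function names are our own:

    ```python
    import numpy as np

    def rasch_prob(theta, b):
        """Dichotomous Rasch model: P(X=1) = exp(theta - b) / (1 + exp(theta - b))."""
        return 1.0 / (1.0 + np.exp(-(theta - b)))

    def standardized_residuals(responses, theta, b):
        """Standardized residuals (observed minus expected, scaled by the binomial
        SD); large absolute values flag responses that are surprising given the
        person's ability and the item's difficulty."""
        theta = np.asarray(theta, float)[:, None]   # persons down the rows
        b = np.asarray(b, float)[None, :]           # items across the columns
        p = rasch_prob(theta, b)
        x = np.asarray(responses, float)
        return (x - p) / np.sqrt(p * (1.0 - p))
    ```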

  9. Introducing Undergraduate Students to Metabolomics Using a NMR-Based Analysis of Coffee Beans

    ERIC Educational Resources Information Center

    Sandusky, Peter Olaf

    2017-01-01

    Metabolomics applies multivariate statistical analysis to sets of high-resolution spectra taken over a population of biologically derived samples. The objective is to distinguish subpopulations within the overall sample population, and possibly also to identify biomarkers. While metabolomics has become part of the standard analytical toolbox in…

  10. A Meta-Analysis of Writing Instruction for Students in the Elementary Grades

    ERIC Educational Resources Information Center

    Graham, Steve; McKeown, Debra; Kiuhara, Sharlene; Harris, Karen R.

    2012-01-01

    In an effort to identify effective instructional practices for teaching writing to elementary grade students, we conducted a meta-analysis of the writing intervention literature, focusing our efforts on true and quasi-experiments. We located 115 documents that included the statistics for computing an effect size (ES). We calculated an average…
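
    A common way to compute such an effect size from reported group summaries is the standardized mean difference with a small-sample correction; the helper below is a generic sketch, not necessarily the exact procedure used in this meta-analysis:

    ```python
    import math

    def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
        """Standardized mean difference between treatment and control groups,
        with Hedges' small-sample correction applied before averaging."""
        pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                              / (n_t + n_c - 2))
        d = (mean_t - mean_c) / pooled_sd
        j = 1 - 3 / (4 * (n_t + n_c) - 9)   # correction factor
        return d * j
    ```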

  11. Development of Consistency between Marketing and Planning.

    ERIC Educational Resources Information Center

    Williford, A. Michael

    1986-01-01

    Examined descriptive information about marketing, enrollment management, institutional planning and factors affecting them. A factor analysis of statistically appropriate variables identified factors associated with a state of symbiosis between marketing and institutional planning. (Author/BL)

  12. Multivariate analysis for stormwater quality characteristics identification from different urban surface types in macau.

    PubMed

    Huang, J; Du, P; Ao, C; Ho, M; Lei, M; Zhao, D; Wang, Z

    2007-12-01

    Statistical analysis of stormwater runoff data enables general identification of runoff characteristics. Six catchments with different urban surface types, including roofs, roadway, park, and residential/commercial areas in Macau, were selected for sampling and study during the period from June 2005 to September 2006. Based on univariate statistical analysis of the sampled data, the major pollutants discharged from the different urban surface types were identified. For iron roof runoff, Zn is the most significant pollutant. The major pollutants from urban roadway runoff are TSS and COD. Stormwater runoff from the commercial/residential and park catchments shows high levels of COD, TN, and TP concentrations. Principal component analysis was then performed to identify linkages between stormwater quality and urban surface types. Two potential pollution sources were identified for the study catchments with different urban surface types: the first is attributed to nutrient losses, soil losses, and organic pollutant discharges; the second is related to heavy metal losses. PCA proved to be a viable tool for explaining the types of pollution sources and their mechanisms across catchments with different urban surface types.

  13. Meta-Analysis of Placental Transcriptome Data Identifies a Novel Molecular Pathway Related to Preeclampsia.

    PubMed

    van Uitert, Miranda; Moerland, Perry D; Enquobahrie, Daniel A; Laivuori, Hannele; van der Post, Joris A M; Ris-Stalpers, Carrie; Afink, Gijs B

    2015-01-01

    Studies using the placental transcriptome to identify key molecules relevant for preeclampsia are hampered by a relatively small sample size. In addition, they use a variety of bioinformatics and statistical methods, making comparison of findings challenging. To generate a more robust preeclampsia gene expression signature, we performed a meta-analysis on the original data of 11 placenta RNA microarray experiments, representing 139 normotensive and 116 preeclamptic pregnancies. Microarray data were pre-processed and analyzed using standardized bioinformatics and statistical procedures and the effect sizes were combined using an inverse-variance random-effects model. Interactions between genes in the resulting gene expression signature were identified by pathway analysis (Ingenuity Pathway Analysis, Gene Set Enrichment Analysis, Graphite) and protein-protein associations (STRING). This approach has resulted in a comprehensive list of differentially expressed genes that led to a 388-gene meta-signature of preeclamptic placenta. Pathway analysis highlights the involvement of the previously identified hypoxia/HIF1A pathway in the establishment of the preeclamptic gene expression profile, while analysis of protein interaction networks indicates CREBBP/EP300 as a novel element central to the preeclamptic placental transcriptome. In addition, there is an apparent high incidence of preeclampsia in women carrying a child with a mutation in CREBBP/EP300 (Rubinstein-Taybi Syndrome). The 388-gene preeclampsia meta-signature offers a vital starting point for further studies into the relevance of these genes (in particular CREBBP/EP300) and their concomitant pathways as biomarkers or functional molecules in preeclampsia. This will result in a better understanding of the molecular basis of this disease and opens up the opportunity to develop rational therapies targeting the placental dysfunction causal to preeclampsia.
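
    The inverse-variance random-effects combination named here is commonly implemented with the DerSimonian-Laird estimator of between-study variance. The self-contained sketch below is a generic implementation of that estimator, not the authors' pipeline:

    ```python
    import numpy as np

    def dersimonian_laird(effects, variances):
        """Inverse-variance random-effects pooling of per-study effect sizes."""
        y, v = np.asarray(effects, float), np.asarray(variances, float)
        w = 1.0 / v                               # fixed-effect weights
        y_fixed = np.sum(w * y) / np.sum(w)
        q = np.sum(w * (y - y_fixed) ** 2)        # Cochran's Q heterogeneity
        df = len(y) - 1
        c = np.sum(w) - np.sum(w**2) / np.sum(w)
        tau2 = max(0.0, (q - df) / c)             # between-study variance
        w_re = 1.0 / (v + tau2)                   # random-effects weights
        pooled = np.sum(w_re * y) / np.sum(w_re)
        se = np.sqrt(1.0 / np.sum(w_re))
        return pooled, se, tau2
    ```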

  14. Comparison of untreated adolescent idiopathic scoliosis with normal controls: a review and statistical analysis of the literature.

    PubMed

    Rushton, Paul R P; Grevitt, Michael P

    2013-04-20

    Review and statistical analysis of studies evaluating health-related quality of life (HRQOL) in adolescents with untreated adolescent idiopathic scoliosis (AIS) using Scoliosis Research Society (SRS) outcomes. The objectives were to apply normative values and minimum clinically important differences for the SRS-22r to the literature, and to identify whether the HRQOL of adolescents with untreated AIS differs from that of unaffected peers and whether any differences are clinically relevant. The effect of untreated AIS on adolescent HRQOL is uncertain. The lack of published normative values and minimum clinically important differences for the SRS-22r has so far hindered interpretation of previous studies; the publication of these background data allows those studies to be re-examined. Using suitable inclusion criteria, a literature search identified studies examining HRQOL in untreated adolescents with AIS. Each cohort was analyzed individually. Statistically significant differences were identified using 95% confidence intervals for the difference in SRS-22r domain mean scores between the cohorts with AIS and the published data for unaffected adolescents. If the lower bound of the confidence interval was greater than the minimum clinically important difference, the difference was considered clinically significant. Of the 21 included patient cohorts, 81% reported statistically worse pain than unaffected adolescents, yet in only 5% of cohorts was this difference clinically important. Of the 11 included cohorts examining patient self-image, 91% reported statistically worse scores than unaffected adolescents, and in 73% of cohorts this difference was clinically significant. Affected cohorts tended to score well in the function/activity and mental health domains, and differences from unaffected adolescents rarely reached clinically significant values. Pain and self-image scores tend to be statistically worse among cohorts with AIS than among the unaffected. The literature to date suggests that only self-image consistently differs to a clinically important extent. This should be considered when assessing the possible benefits of surgery.
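
    The decision rule described, comparing the lower 95% confidence bound for the difference in domain means against the minimum clinically important difference (MCID), can be sketched as follows. This is a simplified two-sample illustration; the normative summaries and the MCID value are inputs the reader must supply:

    ```python
    import math

    def clinically_significant(mean_ais, sd_ais, n_ais,
                               mean_norm, sd_norm, n_norm, mcid):
        """Difference in SRS-22r domain means (unaffected minus AIS cohort) with
        a 95% CI; 'clinically significant' when the lower bound exceeds the MCID."""
        diff = mean_norm - mean_ais
        se = math.sqrt(sd_ais**2 / n_ais + sd_norm**2 / n_norm)
        lower, upper = diff - 1.96 * se, diff + 1.96 * se
        return diff, (lower, upper), lower > mcid
    ```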

  15. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify, from family data, genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departures from normality assumptions. Regression-based approaches are more robust, but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic, which, in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  16. Rare-Variant Association Analysis: Study Designs and Statistical Tests

    PubMed Central

    Lee, Seunggeung; Abecasis, Gonçalo R.; Boehnke, Michael; Lin, Xihong

    2014-01-01

    Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions. PMID:24995866

  17. Searching for molecular markers in head and neck squamous cell carcinomas (HNSCC) by statistical and bioinformatic analysis of larynx-derived SAGE libraries

    PubMed Central

    Silveira, Nelson JF; Varuzza, Leonardo; Machado-Lima, Ariane; Lauretto, Marcelo S; Pinheiro, Daniel G; Rodrigues, Rodrigo V; Severino, Patrícia; Nobrega, Francisco G; Silva, Wilson A; de B Pereira, Carlos A; Tajara, Eloiza H

    2008-01-01

    Background Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries. Results Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues. Conclusion To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor. PMID:19014460

  18. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280

  19. Using statistical process control for monitoring the prevalence of hospital-acquired pressure ulcers.

    PubMed

    Kottner, Jan; Halfens, Ruud

    2010-05-01

    Institutionally acquired pressure ulcers are used as outcome indicators to assess the quality of pressure ulcer prevention programs. Determining whether quality improvement projects that aim to decrease the proportions of institutionally acquired pressure ulcers lead to real changes in clinical practice depends on the measurement method and statistical analysis used. To examine whether nosocomial pressure ulcer prevalence rates in hospitals in the Netherlands changed, a secondary data analysis using different statistical approaches was conducted of annual (1998-2008) nationwide nursing-sensitive health problem prevalence studies. Institutions that participated regularly in all survey years were identified. Risk-adjusted nosocomial pressure ulcer prevalence rates, grades 2 to 4 (European Pressure Ulcer Advisory Panel system), were calculated per year and hospital. Descriptive statistics, chi-square trend tests, and P charts based on statistical process control (SPC) were applied and compared. Six of the 905 healthcare institutions participated in every survey year, and 11,444 patients in these six hospitals were identified as being at risk for pressure ulcers. Prevalence rates per year ranged from 0.05 to 0.22. Chi-square trend tests revealed statistically significant downward trends in four hospitals, but based on SPC methods the prevalence rates of five hospitals varied by chance only. Results of chi-square trend tests and SPC methods were not comparable, making it impossible to decide which approach is more appropriate. P charts provide more valuable information than single P values and are more helpful for monitoring institutional performance. Empirical evidence about the decrease of nosocomial pressure ulcer prevalence rates in the Netherlands is contradictory and limited.
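
    A P chart of the kind applied here places 3-sigma control limits around the overall prevalence proportion, with limits that widen for smaller samples; points outside the limits signal special-cause variation, points inside vary by chance only. The sketch below is a generic implementation, not the SPC software used in the study:

    ```python
    import numpy as np

    def p_chart_limits(events, sample_sizes):
        """Control limits for a P chart of per-survey prevalence proportions."""
        x = np.asarray(events, float)
        n = np.asarray(sample_sizes, float)
        p = x / n
        p_bar = x.sum() / n.sum()                   # centre line
        sigma = np.sqrt(p_bar * (1 - p_bar) / n)    # per-point, since n varies
        ucl = np.clip(p_bar + 3 * sigma, 0, 1)      # upper control limit
        lcl = np.clip(p_bar - 3 * sigma, 0, 1)      # lower control limit
        return p, p_bar, lcl, ucl
    ```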

  20. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study.

    PubMed

    Nour-Eldein, Hebatallah

    2016-01-01

    Given the limited statistical knowledge of most physicians, it is not uncommon to find statistical errors in research articles. The objectives were to determine the statistical methods used and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables, 38/66 (57.6%); regression (logistic, linear), 26/66 (39.4%); and t-tests, 17/66 (25.8%), were the most commonly used inferential tests. Within the 60 FM articles with identified inferential statistics, the errors and deficiencies were: no prior sample size calculation, 19/60 (31.7%); application of the wrong statistical test, 17/60 (28.3%); incomplete documentation of statistics, 59/60 (98.3%); reporting a P value without the test statistic, 32/60 (53.3%); no confidence interval reported with effect size measures, 12/60 (20.0%); use of the mean (standard deviation) to describe ordinal/nonnormal data, 8/60 (13.3%); and interpretation errors, mainly conclusions unsupported by the study data, 5/60 (8.3%). Inferential statistics were used in the majority of FM articles. Data analysis and reporting of statistics are areas for improvement in FM research articles.

  1. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study

    PubMed Central

    Nour-Eldein, Hebatallah

    2016-01-01

    Background: Given the limited statistical knowledge of most physicians, it is not uncommon to find statistical errors in research articles. Objectives: To determine the statistical methods used and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. Methods: This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Results: Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables, 38/66 (57.6%); regression (logistic, linear), 26/66 (39.4%); and t-tests, 17/66 (25.8%), were the most commonly used inferential tests. Within the 60 FM articles with identified inferential statistics, the errors and deficiencies were: no prior sample size calculation, 19/60 (31.7%); application of the wrong statistical test, 17/60 (28.3%); incomplete documentation of statistics, 59/60 (98.3%); reporting a P value without the test statistic, 32/60 (53.3%); no confidence interval reported with effect size measures, 12/60 (20.0%); use of the mean (standard deviation) to describe ordinal/nonnormal data, 8/60 (13.3%); and interpretation errors, mainly conclusions unsupported by the study data, 5/60 (8.3%). Conclusion: Inferential statistics were used in the majority of FM articles. Data analysis and reporting of statistics are areas for improvement in FM research articles. PMID:27453839

  2. Use of multivariate statistics to identify unreliable data obtained using CASA.

    PubMed

    Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón

    2013-06-01

    In order to identify unreliable data in a dataset of motility parameters from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer-assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted and then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were drawn for each measurement, and principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of the principal components for each individual measurement were mapped to identify differences among treatments or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects of treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained the evaluations of the first two samples in each treatment, each from a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as those observed in abnormal Chernoff faces. The unreliable data in cluster 1 are probably related to the operator's inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of a CASA system operator, which may be particularly useful in the quality control of semen analysis using CASA systems.
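
    The standardize-then-PCA-then-cluster pipeline described can be sketched as follows; the helper name and the choice of Ward linkage are our assumptions, not a reconstruction of the authors' code:

    ```python
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from scipy.cluster.hierarchy import linkage, fcluster

    def cluster_casa_measurements(X, n_clusters=3):
        """Standardize CASA motility parameters, project onto the first two
        principal components, then cut a hierarchical tree into clusters."""
        Z = StandardScaler().fit_transform(X)       # zero mean, unit variance
        pcs = PCA(n_components=2).fit_transform(Z)  # first two principal components
        tree = linkage(pcs, method="ward")          # hierarchical clustering
        labels = fcluster(tree, t=n_clusters, criterion="maxclust")
        return pcs, labels
    ```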

  3. Biomarkers identified by urinary metabonomics for noninvasive diagnosis of nutritional rickets.

    PubMed

    Wang, Maoqing; Yang, Xue; Ren, Lihong; Li, Songtao; He, Xuan; Wu, Xiaoyan; Liu, Tingting; Lin, Liqun; Li, Ying; Sun, Changhao

    2014-09-05

    Nutritional rickets is a worldwide public health problem; however, current diagnostic methods have shortcomings that limit accurate diagnosis of nutritional rickets. To identify urinary biomarkers associated with nutritional rickets and establish a noninvasive diagnostic method, urinary metabonomics analysis by ultra-performance liquid chromatography/quadrupole time-of-flight tandem mass spectrometry and multivariate statistical analysis were employed to investigate the metabolic alterations associated with nutritional rickets in 200 children with or without the disease. The pathophysiological changes and pathogenesis of nutritional rickets were illustrated by the identified biomarkers. By urinary metabolic profiling, 31 biomarkers of nutritional rickets were identified, and five candidate biomarkers for clinical diagnosis were screened and identified by quantitative analysis and receiver operating characteristic (ROC) curve analysis. Urinary levels of the five candidate biomarkers were measured using mass spectrometry or commercial kits. In the validation step, the combination of phosphate and sebacic acid gave a noninvasive and accurate diagnosis with high sensitivity (94.0%) and specificity (71.2%). Furthermore, on the basis of the pathway analysis of the biomarkers, our urinary metabonomics analysis gives new insight into the pathogenesis and pathophysiology of nutritional rickets.
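
    The biomarker screening step via ROC analysis can be illustrated as below; combining markers with logistic regression is an assumption on our part, and an in-sample AUC is optimistic compared with the separate validation cohort the authors used:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score

    def evaluate_biomarker_panel(X, y):
        """Score a candidate biomarker combination by ROC analysis and report
        the operating point maximizing Youden's J (sensitivity + specificity - 1)."""
        score = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        fpr, tpr, thresholds = roc_curve(y, score)
        k = int(np.argmax(tpr - fpr))           # Youden-optimal threshold index
        sensitivity, specificity = tpr[k], 1 - fpr[k]
        return roc_auc_score(y, score), sensitivity, specificity, thresholds[k]
    ```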

  4. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are illustrated using several medical examples. We present two clustering examples with ordinal variables, which are more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: By using a clustering algorithm appropriate to the measurement scale of the variables in the study, high performance is achieved. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  5. Statistical flaws in design and analysis of fertility treatment studies on cryopreservation raise doubts on the conclusions

    PubMed Central

    van Gelder, P.H.A.J.M.; Nijs, M.

    2011-01-01

    Decisions about pharmacotherapy are taken by medical doctors and authorities based on comparative studies on the use of medications. In studies on fertility treatments in particular, methodological quality is of utmost importance in the application of evidence-based medicine and systematic reviews. Nevertheless, flaws and omissions appear quite regularly in these types of studies. The current study aims to present an overview of some typical statistical flaws, illustrated by a number of example studies that have been published in peer-reviewed journals. Based on an investigation of eleven randomly selected studies on fertility treatments with cryopreservation, it appeared that the methodological quality of these studies often did not fulfil the required statistical criteria. The following statistical flaws were identified: flaws in study design, patient selection, and units of analysis, or in the definition of the primary endpoints. Other errors could be found in p-value and power calculations or in critical p-value definitions. Proper interpretation of the results and/or use of these study results in a meta-analysis should therefore be conducted with care. PMID:24753877

  6. Statistical flaws in design and analysis of fertility treatment -studies on cryopreservation raise doubts on the conclusions.

    PubMed

    van Gelder, P H A J M; Nijs, M

    2011-01-01

    Decisions about pharmacotherapy are taken by medical doctors and authorities based on comparative studies on the use of medications. In studies on fertility treatments in particular, methodological quality is of utmost importance in the application of evidence-based medicine and systematic reviews. Nevertheless, flaws and omissions appear quite regularly in these types of studies. The current study aims to present an overview of some typical statistical flaws, illustrated by a number of example studies that have been published in peer-reviewed journals. Based on an investigation of eleven randomly selected studies on fertility treatments with cryopreservation, it appeared that the methodological quality of these studies often did not fulfil the required statistical criteria. The following statistical flaws were identified: flaws in study design, patient selection, and units of analysis, or in the definition of the primary endpoints. Other errors could be found in p-value and power calculations or in critical p-value definitions. Proper interpretation of the results and/or use of these study results in a meta-analysis should therefore be conducted with care.

  7. Optimizing construction quality management of pavements using mechanistic performance analysis.

    DOT National Transportation Integrated Search

    2004-08-01

    This report presents a statistical-based algorithm that was developed to reconcile the results from several pavement performance models used in the state of practice with systematic process control techniques. These algorithms identify project-specif...

  8. Spatio-temporal surveillance of water based infectious disease (malaria) in Rawalpindi, Pakistan using geostatistical modeling techniques.

    PubMed

    Ahmad, Sheikh Saeed; Aziz, Neelam; Butt, Amna; Shabbir, Rabia; Erum, Summra

    2015-09-01

    One of the features of medical geography that has made it so useful in health research is statistical spatial analysis, which enables the quantification and qualification of health events. The main objective of this research was to study the spatial distribution patterns of malaria in Rawalpindi district using spatial statistical techniques to identify the hot spots and possible risk factors. Spatial statistical analyses were done in ArcGIS, and satellite images for land use classification were processed in ERDAS Imagine. Four hundred and fifty water samples were also collected from the study area to identify the presence or absence of any microbial contamination. The results of this study indicated that malaria incidence varied according to geographical location and eco-climatic conditions and showed significant positive spatial autocorrelation. Hot spots, or locations of clusters, were identified using the Getis-Ord Gi* statistic. Significant clustering of malaria incidence occurred in the rural central part of the study area, including Gujar Khan, Kaller Syedan, and parts of Kahuta and Rawalpindi Tehsil. Ordinary least squares (OLS) regression analysis was conducted to analyze the relationship of risk factors with the disease cases. The relationship of different land cover classes with the disease cases indicated that malaria was more strongly related to the agriculture, low vegetation, and water classes. Temporal variation of malaria cases showed a significant positive association with meteorological variables, including average monthly rainfall and temperature. The results further suggest that the water supply, sewage system, and solid waste collection system need serious attention to prevent any outbreak in the study area.
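
    The Getis-Ord Gi* statistic used for hot spot detection can be computed directly from an incidence vector and a spatial weights matrix. The sketch below follows the standard z-score form of Gi* and is a plain reimplementation for illustration, not the ArcGIS tool used in the study:

    ```python
    import numpy as np

    def getis_ord_gi_star(x, w):
        """Getis-Ord Gi* hot spot z-scores for each location.

        x : incidence values, shape (n,)
        w : spatial weights matrix, shape (n, n), with w[i, i] = 1 so each
            location is included in its own neighbourhood (the * variant)

        Large positive values mark hot spots; large negative values, cold spots."""
        x = np.asarray(x, float)
        n = x.size
        x_bar, s = x.mean(), x.std()            # global mean and std
        wx = w @ x                              # weighted local sums
        w_sum = w.sum(axis=1)
        w_sq = (w**2).sum(axis=1)
        num = wx - x_bar * w_sum
        den = s * np.sqrt((n * w_sq - w_sum**2) / (n - 1))
        return num / den
    ```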

  9. Using spatial statistics to identify emerging hot spots of forest loss

    NASA Astrophysics Data System (ADS)

    Harris, Nancy L.; Goldman, Elizabeth; Gabris, Christopher; Nordling, Jon; Minnemeyer, Susan; Ansari, Stephen; Lippmann, Michael; Bennett, Lauren; Raad, Mansour; Hansen, Matthew; Potapov, Peter

    2017-02-01

    As sources of data for global forest monitoring grow larger, more complex and numerous, data analysis and interpretation become critical bottlenecks for effectively using them to inform land use policy discussions. In this paper, we present a method that combines big data analytical tools with Emerging Hot Spot Analysis (ArcGIS) to identify statistically significant spatiotemporal trends of forest loss in Brazil, Indonesia and the Democratic Republic of Congo (DRC) between 2000 and 2014. Results indicate that while the overall rate of forest loss in Brazil declined over the 14-year time period, spatiotemporal patterns of loss shifted, with forest loss significantly diminishing within the Amazonian states of Mato Grosso and Rondônia and intensifying within the cerrado biome. In Indonesia, forest loss intensified in Riau province in Sumatra and in Sukamara and West Kotawaringin regencies in Central Kalimantan. Substantial portions of West Kalimantan became new and statistically significant hot spots of forest loss in the years 2013 and 2014. Similarly, vast areas of DRC emerged as significant new hot spots of forest loss, with intensified loss radiating out from city centers such as Beni and Kisangani. While our results focus on identifying significant trends at the national scale, we also demonstrate the scalability of our approach to smaller or larger regions depending on the area of interest and specific research question involved. When combined with other contextual information, these statistical data models can help isolate the most significant clusters of loss occurring over dynamic forest landscapes and provide more coherent guidance for the allocation of resources for forest monitoring and enforcement efforts.

  10. New insights into old methods for identifying causal rare variants.

    PubMed

    Wang, Haitian; Huang, Chien-Hsun; Lo, Shaw-Hwa; Zheng, Tian; Hu, Inchi

    2011-11-29

    The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.
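
    The first stage of the procedure, ranking markers by their F-statistic values, can be sketched with a standard one-way ANOVA F test per marker. The helper below is illustrative (the sliced inverse regression second stage is not shown, and the top-k cutoff is an assumption):

    ```python
    import numpy as np
    from sklearn.feature_selection import f_classif

    def rank_markers_by_f(X, y, top_k=100):
        """Rank genetic markers by the ANOVA F statistic of genotype against the
        trait and keep the top candidates; the selected subset would then be
        passed to sliced inverse regression in the second stage."""
        f_vals, _ = f_classif(X, y)             # per-marker F statistics
        order = np.argsort(f_vals)[::-1]        # descending by F value
        return order[:top_k], f_vals[order[:top_k]]
    ```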

  11. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.

    PubMed

    Giambartolomei, Claudia; Vukcevic, Damjan; Schadt, Eric E; Franke, Lude; Hingorani, Aroon D; Wallace, Chris; Plagnol, Vincent

    2014-05-01

    Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.

  12. A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.

    PubMed

    Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S

    2016-01-01

    Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings is validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1. We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.

  13. Fundamental frequency and voice perturbation measures in smokers and non-smokers: An acoustic and perceptual study

    NASA Astrophysics Data System (ADS)

    Freeman, Allison

    This research examined fundamental frequency and perturbation (jitter % and shimmer %) measures in young adult (20-30 year-old) and middle-aged adult (40-55 year-old) smokers and non-smokers; there were 36 smokers and 36 non-smokers. Acoustic analysis was carried out utilizing one task: production of a sustained /a/. These voice samples were analyzed utilizing Multi-Dimensional Voice Program (MDVP) software, which provided values for fundamental frequency, jitter %, and shimmer %. These values were analyzed for trends regarding smoking status, age, and gender. Statistical significance was found for fundamental frequency, jitter %, and shimmer % in smokers as compared to non-smokers; smokers were found to have significantly lower fundamental frequency values and significantly higher jitter % and shimmer % values. Statistical significance was not found regarding fundamental frequency, jitter %, and shimmer % for age group comparisons. With regard to gender, statistical significance was found for fundamental frequency; females were found to have statistically higher fundamental frequencies than males. However, the relationships between gender and jitter % and shimmer % lacked statistical significance. These results indicate that smoking negatively affects voice quality. This study also examined the ability of untrained listeners to identify smokers and non-smokers based on their voices. Results of this voice perception task suggest that listeners are not able to accurately identify smokers and non-smokers, as statistical significance was not reached. However, despite the lack of significance, trends in the data suggest that listeners are able to utilize voice quality to identify smokers and non-smokers.

  14. Quasi-experimental Studies in the Fields of Infection Control and Antibiotic Resistance, Ten Years Later: A Systematic Review.

    PubMed

    Alsaggaf, Rotana; O'Hara, Lyndsay M; Stafford, Kristen A; Leekha, Surbhi; Harris, Anthony D

    2018-02-01

    OBJECTIVE A systematic review of quasi-experimental studies in the field of infectious diseases was published in 2005. The aim of this study was to assess improvements in the design and reporting of quasi-experiments 10 years after the initial review. We also aimed to report the statistical methods used to analyze quasi-experimental data. DESIGN Systematic review of articles published from January 1, 2013, to December 31, 2014, in 4 major infectious disease journals. METHODS Quasi-experimental studies focused on infection control and antibiotic resistance were identified and classified based on 4 criteria: (1) type of quasi-experimental design used, (2) justification of the use of the design, (3) use of correct nomenclature to describe the design, and (4) statistical methods used. RESULTS Of 2,600 articles, 173 (7%) featured a quasi-experimental design, compared to 73 of 2,320 articles (3%) in the previous review (P<.01). Moreover, 21 articles (12%) utilized a study design with a control group; 6 (3.5%) justified the use of a quasi-experimental design; and 68 (39%) identified their design using the correct nomenclature. In addition, 2-group statistical tests were used in 75 studies (43%); 58 studies (34%) used standard regression analysis; 18 (10%) used segmented regression analysis; 7 (4%) used standard time-series analysis; 5 (3%) used segmented time-series analysis; and 10 (6%) did not utilize statistical methods for comparisons. CONCLUSIONS While some progress occurred over the decade, it is crucial to continue improving the design and reporting of quasi-experimental studies in the fields of infection control and antibiotic resistance to better evaluate the effectiveness of important interventions. Infect Control Hosp Epidemiol 2018;39:170-176.

  15. Statistical Characteristics of Single Sort of Grape Bulgarian Wines

    NASA Astrophysics Data System (ADS)

    Boyadzhiev, D.

    2008-10-01

    The aim of this paper is to evaluate the differences in the values of the eight basic physicochemical indices of single-sort Bulgarian grape wines (white and red), which are obligatory for the standardization of finished production in the winery. Statistically significant differences in the values across sorts and vintages are established, and possibilities for identifying the sort and the vintage on the basis of these indices by applying discriminant analysis are discussed.
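
    Discriminant analysis for identifying sort or vintage from the eight indices can be sketched as a cross-validated classifier; the helper below is an illustrative stand-in, not the authors' analysis:

    ```python
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    def sort_identification_accuracy(X, sorts, cv=5):
        """Estimate how well the eight physicochemical indices identify the
        grape sort (or vintage) via cross-validated linear discriminant analysis."""
        lda = LinearDiscriminantAnalysis()
        scores = cross_val_score(lda, X, sorts, cv=cv)  # per-fold accuracies
        return scores.mean(), scores.std()
    ```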

  16. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility.

    PubMed

    Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H

    2014-04-11

    Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.

  17. Identifying unusual performance in Australian and New Zealand intensive care units from 2000 to 2010.

    PubMed

    Solomon, Patricia J; Kasza, Jessica; Moran, John L

    2014-04-22

    The Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database (APD) collects voluntary data on patient admissions to Australian and New Zealand intensive care units (ICUs). This paper presents an in-depth statistical analysis of risk-adjusted mortality of ICU admissions from 2000 to 2010 for the purpose of identifying ICUs with unusual performance. A cohort of 523,462 patients from 144 ICUs was analysed. For each ICU, the natural logarithm of the standardised mortality ratio (log-SMR) was estimated from a risk-adjusted, three-level hierarchical model. This is the first time a three-level model has been fitted to such a large ICU database anywhere. The analysis was conducted in three stages which included the estimation of a null distribution to describe usual ICU performance. Log-SMRs with appropriate estimates of standard errors are presented in a funnel plot using 5% false discovery rate thresholds. False coverage-statement rate confidence intervals are also presented. The observed numbers of deaths for ICUs identified as unusual are compared to the predicted true worst numbers of deaths under the model for usual ICU performance. Seven ICUs were identified as performing unusually over the period 2000 to 2010, in particular, demonstrating high risk-adjusted mortality compared to the majority of ICUs. Four of the seven were ICUs in private hospitals. Our three-stage approach to the analysis detected outlying ICUs which were not identified in a conventional (single) risk-adjusted model for mortality using SMRs to compare ICUs. We also observed a significant linear decline in mortality over the decade. Distinct yearly and weekly respiratory seasonal effects were observed across regions of Australia and New Zealand for the first time. The statistical approach proposed in this paper is intended to be used for the review of observed ICU and hospital mortality. Two important messages from our study are firstly, that comprehensive risk-adjustment is essential in modelling patient mortality for comparing performance, and secondly, that the appropriate statistical analysis is complicated.
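
    A drastically simplified version of the funnel-plot construction is sketched below, with naive z-based limits around a null log-SMR of zero under a Poisson assumption for observed deaths. The study itself derives log-SMRs from a three-level hierarchical model and flags units with 5% false discovery rate thresholds, neither of which is reproduced here:

    ```python
    import numpy as np

    def funnel_points(observed, expected, z=1.96):
        """Per-unit log-SMRs against precision, with naive z-based control limits.
        Under H0, observed ~ Poisson(expected), so SE(log-SMR) is roughly
        1/sqrt(expected); units outside the limits are flagged for review."""
        o = np.asarray(observed, float)
        e = np.asarray(expected, float)
        log_smr = np.log(o / e)
        se = 1.0 / np.sqrt(e)               # approximate null standard error
        precision = 1.0 / se                # x-axis of the funnel plot
        flagged = np.abs(log_smr) > z * se  # outside the funnel limits
        return log_smr, precision, flagged
    ```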

  18. Geospatial Characterization of Fluvial Wood Arrangement in a Semi-confined Alluvial River

    NASA Astrophysics Data System (ADS)

    Martin, D. J.; Harden, C. P.; Pavlowsky, R. T.

    2014-12-01

    Large woody debris (LWD) has become universally recognized as an integral component of fluvial systems, and as a result, has become increasingly common as a river restoration tool. However, "natural" processes of wood recruitment and the subsequent arrangement of LWD within the river network are poorly understood. This research used a suite of spatial statistics to investigate longitudinal arrangement patterns of LWD in a low-gradient, Midwestern river. First, a large-scale GPS inventory of LWD, performed on the Big River in the eastern Missouri Ozarks, resulted in over 4,000 logged positions of LWD along seven river segments that covered nearly 100 km of the 237 km river system. A global Moran's I analysis indicates that LWD density is spatially autocorrelated and displays a clustering tendency within all seven river segments (P-value range = 0.000 to 0.054). A local Moran's I analysis identified specific locations along the segments where clustering occurs and revealed that, on average, clusters of LWD density (high or low) spanned 400 m. Spectral analyses revealed that, in some segments, LWD density is spatially periodic. Two segments displayed strong periodicity, while the remaining segments displayed varying degrees of noisiness. Periodicity showed a positive association with gravel bar spacing and meander wavelength, although there were insufficient data to statistically confirm the relationship. A wavelet analysis was then performed to investigate periodicity relative to location along the segment. The wavelet analysis identified significant (α = 0.05) periodicity at discrete locations along each of the segments. Those reaches yielding strong periodicity showed stronger relationships between LWD density and the geomorphic/riparian independent variables tested. Analyses consistently identified valley width and sinuosity as being associated with LWD density. The results of these analyses contribute a new perspective on the longitudinal distribution of LWD in a river system, which should help identify physical and/or riparian control mechanisms of LWD arrangement and support the development of models of LWD arrangement. Additionally, the spatial statistical tools presented here have shown to be valuable for identifying longitudinal patterns in river system components.
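
    Global Moran's I, the statistic used to establish clustering of LWD density, has a compact closed form; the sketch below is a generic implementation over a user-supplied spatial weights matrix, not the GIS software used in the study:

    ```python
    import numpy as np

    def morans_i(x, w):
        """Global Moran's I for spatial autocorrelation of LWD density.

        x : density per river segment cell, shape (n,)
        w : spatial weights matrix, shape (n, n), zero diagonal

        Values near +1 indicate clustering, near -1 dispersion, near the
        expectation -1/(n-1) spatial randomness."""
        x = np.asarray(x, float)
        z = x - x.mean()                     # deviations from the mean
        n, s0 = x.size, w.sum()              # sum of all weights
        return (n / s0) * (z @ w @ z) / (z @ z)
    ```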

  19. Identifying unusual performance in Australian and New Zealand intensive care units from 2000 to 2010

    PubMed Central

    2014-01-01

    Background The Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database (APD) collects voluntary data on patient admissions to Australian and New Zealand intensive care units (ICUs). This paper presents an in-depth statistical analysis of risk-adjusted mortality of ICU admissions from 2000 to 2010 for the purpose of identifying ICUs with unusual performance. Methods A cohort of 523,462 patients from 144 ICUs was analysed. For each ICU, the natural logarithm of the standardised mortality ratio (log-SMR) was estimated from a risk-adjusted, three-level hierarchical model. This is the first time a three-level model has been fitted to such a large ICU database anywhere. The analysis was conducted in three stages which included the estimation of a null distribution to describe usual ICU performance. Log-SMRs with appropriate estimates of standard errors are presented in a funnel plot using 5% false discovery rate thresholds. False coverage-statement rate confidence intervals are also presented. The observed numbers of deaths for ICUs identified as unusual are compared to the predicted true worst numbers of deaths under the model for usual ICU performance. Results Seven ICUs were identified as performing unusually over the period 2000 to 2010, in particular, demonstrating high risk-adjusted mortality compared to the majority of ICUs. Four of the seven were ICUs in private hospitals. Our three-stage approach to the analysis detected outlying ICUs which were not identified in a conventional (single) risk-adjusted model for mortality using SMRs to compare ICUs. We also observed a significant linear decline in mortality over the decade. Distinct yearly and weekly respiratory seasonal effects were observed across regions of Australia and New Zealand for the first time. Conclusions The statistical approach proposed in this paper is intended to be used for the review of observed ICU and hospital mortality. Two important messages from our study are firstly, that comprehensive risk-adjustment is essential in modelling patient mortality for comparing performance, and secondly, that the appropriate statistical analysis is complicated. PMID:24755369

  20. Statistical analysis plan of the head position in acute ischemic stroke trial pilot (HEADPOST pilot).

    PubMed

    Olavarría, Verónica V; Arima, Hisatomi; Anderson, Craig S; Brunser, Alejandro; Muñoz-Venturelli, Paula; Billot, Laurent; Lavados, Pablo M

    2017-02-01

    Background The HEADPOST Pilot is a proof-of-concept, open, prospective, multicenter, international, cluster randomized, phase IIb controlled trial, with masked outcome assessment. The trial will test whether a lying-flat head position, initiated in patients within 12 h of onset of acute ischemic stroke involving the anterior circulation, increases cerebral blood flow in the middle cerebral arteries, as measured by transcranial Doppler. The study will also assess the safety and feasibility of patients lying flat for ≥24 h. The trial was conducted in centers in three countries with the ability to perform early transcranial Doppler. A feature of this trial was that patients were randomized to a certain position according to the month of admission to hospital. Objective To outline in detail the predetermined statistical analysis plan for the HEADPOST Pilot study. Methods All data collected by participating researchers will be reviewed and formally assessed. Information pertaining to the baseline characteristics of patients, their process of care, and the delivery of treatments will be classified, and for each item, appropriate descriptive statistical analyses are planned with comparisons made between randomized groups. For the outcomes, statistical comparisons to be made between groups are planned and described. Results This statistical analysis plan was developed for the analysis of the results of the HEADPOST Pilot study to be transparent, available, verifiable, and predetermined before data lock. Conclusions We have developed a statistical analysis plan for the HEADPOST Pilot study which is to be followed to avoid analysis bias arising from prior knowledge of the study findings. Trial registration The study is registered under HEADPOST-Pilot, ClinicalTrials.gov Identifier NCT01706094.

  1. Differentiation of chocolates according to the cocoa's geographical origin using chemometrics.

    PubMed

    Cambrai, Amandine; Marcic, Christophe; Morville, Stéphane; Sae Houer, Pierre; Bindler, Françoise; Marchioni, Eric

    2010-02-10

    The determination of the geographical origin of the cocoa used to produce chocolate was assessed through analysis of the volatile compounds of chocolate samples. Multivariate statistical processing of the volatile profiles tended to form independent groups for both African and Madagascan samples, even if some of the chocolates analyzed appeared in a mixed zone together with those from America. This analysis also allowed a clear separation between Caribbean chocolates and those from other origins. Eight compounds (such as linalool or (E,E)-2,4-decadienal) characteristic of chocolates of different geographical origins were also identified. The method described in this work (hydrodistillation, GC analysis, and statistical treatment) may improve control of the geographical origin of chocolate throughout its long production process.

  2. Students' attitudes towards learning statistics

    NASA Astrophysics Data System (ADS)

    Ghulami, Hassan Rahnaward; Hamid, Mohd Rashid Ab; Zakaria, Roslinazairimah

    2015-05-01

    A positive attitude towards learning is vital in order to master the core content of the subject matter under study, and learning statistics, especially at the university level, is no exception. This study therefore investigates students' attitudes towards learning statistics. Six constructs were identified: affect, cognitive competence, value, difficulty, interest, and effort. The instrument used was a questionnaire adopted and adapted from the well-established Survey of Attitudes Towards Statistics (SATS©). The study was conducted with engineering undergraduate students at a university on the East Coast of Malaysia; the respondents were students from different faculties taking the applied statistics course. The results are analysed descriptively and contribute to a descriptive understanding of students' attitudes towards the teaching and learning process of statistics.

  3. Avalanche Statistics Identify Intrinsic Stellar Processes near Criticality in KIC 8462852

    NASA Astrophysics Data System (ADS)

    Sheikh, Mohammed A.; Weaver, Richard L.; Dahmen, Karin A.

    2016-12-01

    The star KIC 8462852 (Tabby's star) has shown anomalous drops in light flux. We perform a statistical analysis of the more numerous smaller dimming events by using methods found useful for avalanches in ferromagnetism and plastic flow. Scaling exponents for avalanche statistics and temporal profiles of the flux during the dimming events are close to mean-field predictions. Scaling collapses suggest that this star may be near a nonequilibrium critical point. The large events are interpreted as avalanches marked by modified dynamics, limited by the system size, and not within the scaling regime.

  4. Multilevel Latent Class Analysis: An Application of Adolescent Smoking Typologies with Individual and Contextual Predictors

    ERIC Educational Resources Information Center

    Henry, Kimberly L.; Muthen, Bengt

    2010-01-01

    Latent class analysis (LCA) is a statistical method used to identify subtypes of related cases using a set of categorical or continuous observed variables. Traditional LCA assumes that observations are independent. However, multilevel data structures are common in social and behavioral research and alternative strategies are needed. In this…

  5. Explore the Usefulness of Person-Fit Analysis on Large-Scale Assessment

    ERIC Educational Resources Information Center

    Cui, Ying; Mousavi, Amin

    2015-01-01

    The current study applied the person-fit statistic, l[subscript z], to data from a Canadian provincial achievement test to explore the usefulness of conducting person-fit analysis on large-scale assessments. Item parameter estimates were compared before and after the misfitting student responses, as identified by l[subscript z], were removed. The…

  6. An Analysis of Construction Contractor Performance Evaluation System

    DTIC Science & Technology

    2009-03-01

    Summary of Determinant and KMO Values for Finalized...A key principal component analysis output is the KMO and Bartlett's test. KMO, or the Kaiser-Meyer-Olkin measure of sampling adequacy, is used to identify whether a...set of variables, when factored together, yields distinct and reliable factors (Field, 2005). KMO statistics vary between values of 0 and 1.
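
    For readers wanting to reproduce the KMO computation this snippet mentions, a minimal sketch follows. It derives partial correlations from the inverse correlation matrix and applies Kaiser's formula; the simulated data and its 200 x 6 shape are assumptions for illustration only.

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure of sampling adequacy, in [0, 1]."""
    R = np.corrcoef(X, rowvar=False)                 # correlation matrix
    inv = np.linalg.inv(R)                           # precision matrix
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                               # partial correlations
    np.fill_diagonal(partial, 0.0)                   # off-diagonals only
    np.fill_diagonal(R, 0.0)
    r2, p2 = (R ** 2).sum(), (partial ** 2).sum()
    return r2 / (r2 + p2)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:, 1] += X[:, 0]                                   # induce shared variance
print(round(kmo(X), 3))
```

    Values closer to 1 indicate that factoring the variables is likely to yield distinct, reliable factors; Kaiser's conventional floor for adequacy is around 0.5.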

  7. Using Multi-Group Confirmatory Factor Analysis to Evaluate Cross-Cultural Research: Identifying and Understanding Non-Invariance

    ERIC Educational Resources Information Center

    Brown, Gavin T. L.; Harris, Lois R.; O'Quin, Chrissie; Lane, Kenneth E.

    2017-01-01

    Multi-group confirmatory factor analysis (MGCFA) allows researchers to determine whether a research inventory elicits similar response patterns across samples. If statistical equivalence in responding is found, then scale score comparisons become possible and samples can be said to be from the same population. This paper illustrates the use of…

  8. Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Singh, Renu; Steinley, Douglas

    2009-01-01

    The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the…

  9. Chemical Attribution of Fentanyl Using Multivariate Statistical Analysis of Orthogonal Mass Spectral Data

    DOE PAGES

    Mayer, Brian P.; DeHope, Alan J.; Mew, Daniel A.; ...

    2016-03-24

    Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of such studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (gas chromatography/mass spectrometry (GC/MS) and liquid chromatography–tandem mass spectrometry-time-of-flight (LC–MS/MS-TOF)) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS). The complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least-squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified, and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited on and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date, using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
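
    The PLS-DA step can be sketched with scikit-learn by regressing a one-hot route matrix on the signature intensities and predicting the route as the arg-max column. Everything below (sample counts, injected signal, component number) is a made-up stand-in for the paper's data, not its actual model.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical 60 products x 160 signature intensities, 6 synthesis routes.
X = rng.normal(size=(60, 160))
route = np.repeat(np.arange(6), 10)
X[np.arange(60), route] += 3.0         # inject a route-specific signature

Xs = StandardScaler().fit_transform(X)
Y = np.eye(6)[route]                   # one-hot encoding of the routes
pls = PLSRegression(n_components=5).fit(Xs, Y)

pred = pls.predict(Xs).argmax(axis=1)
print("training accuracy:", (pred == route).mean())
```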

  10. Effects of Inaccurate Identification of Interictal Epileptiform Discharges in Concurrent EEG-fMRI

    NASA Astrophysics Data System (ADS)

    Gkiatis, K.; Bromis, K.; Kakkos, I.; Karanasiou, I. S.; Matsopoulos, G. K.; Garganis, K.

    2017-11-01

    Concurrent continuous EEG-fMRI is a novel multimodal technique that is finding its way into clinical practice in epilepsy. EEG timeseries are used to identify the timing of interictal epileptiform discharges (IEDs), which is then included in a GLM analysis of the fMRI data to localize the epileptic onset zone. Nevertheless, there are still some concerns about its reliability concerning BOLD changes correlated with IEDs. Even though IEDs are identified by an experienced neurologist-epileptologist, the reliability and concordance of the mark-ups depend on many factors, including the level of fatigue, the amount of time spent or, in some cases, even the screen used for the display of the timeseries. This investigation aims to unravel the effect of misidentification or inaccuracy in the mark-ups of IEDs on the fMRI statistical parametric maps. Concurrent EEG-fMRI was conducted in six subjects with various types of epilepsy. IEDs were identified by an experienced neurologist-epileptologist. Analysis of the EEG was performed with EEGLAB and analysis of the fMRI was conducted in FSL. Preliminary results revealed lower statistical significance when events were missed or when marked IED durations were longer than the actual ones, and the introduction of false positives and false negatives in the statistical parametric maps when random events were included in the GLM on top of the IEDs. Our results suggest that mark-ups in EEG for simultaneous EEG-fMRI should be done with caution by an experienced and rested neurologist, as inaccuracies affect the fMRI results in various and unpredictable ways.

  11. Chemical Attribution of Fentanyl Using Multivariate Statistical Analysis of Orthogonal Mass Spectral Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, Brian P.; DeHope, Alan J.; Mew, Daniel A.

    Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of such studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (gas chromatography/mass spectrometry (GC/MS) and liquid chromatography–tandem mass spectrometry-time-of-flight (LC–MS/MS-TOF)) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS). The complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least-squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified, and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited on and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date, using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.

  12. A risk-based approach to management of leachables utilizing statistical analysis of extractables.

    PubMed

    Stults, Cheryl L M; Mikl, Jaromir; Whelehan, Oliver; Morrical, Bradley; Duffield, William; Nagao, Lee M

    2015-04-01

    To incorporate quality by design concepts into the management of leachables, an emphasis is often put on understanding the extractable profile for the materials of construction for manufacturing disposables, container-closure, or delivery systems. Component manufacturing processes may also impact the extractable profile. An approach was developed to (1) identify critical components that may be sources of leachables, (2) enable an understanding of manufacturing process factors that affect extractable profiles, (3) determine if quantitative models can be developed that predict the effect of those key factors, and (4) evaluate the practical impact of the key factors on the product. A risk evaluation for an inhalation product identified injection molding as a key process. Designed experiments were performed to evaluate the impact of molding process parameters on the extractable profile from an ABS inhaler component. Statistical analysis of the resulting GC chromatographic profiles identified processing factors that were correlated with peak levels in the extractable profiles. The combination of statistically significant molding process parameters was different for different types of extractable compounds. ANOVA models were used to obtain optimal process settings and predict extractable levels for a selected number of compounds. The proposed paradigm may be applied to evaluate the impact of material composition and processing parameters on extractable profiles and utilized to manage product leachables early in the development process and throughout the product lifecycle.

  13. Statistical analysis of soil geochemical data to identify pathfinders associated with mineral deposits: An example from the Coles Hill uranium deposit, Virginia, USA

    USGS Publications Warehouse

    Levitan, Denise M.; Zipper, Carl E.; Donovan, Patricia; Schreiber, Madeline E.; Seal, Robert; Engle, Mark A.; Chermak, John A.; Bodnar, Robert J.; Johnson, Daniel K.; Aylor, Joseph G.

    2015-01-01

    Soil geochemical anomalies can be used to identify pathfinders in exploration for ore deposits. In this study, compositional data analysis is used with multivariate statistical methods to analyse soil geochemical data collected from the Coles Hill uranium deposit, Virginia, USA, to identify pathfinders associated with this deposit. Elemental compositions and relationships were compared between the collected Coles Hill soil and reference soil samples extracted from a regional subset of a national-scale geochemical survey. Results show that pathfinders for the Coles Hill deposit include light rare earth elements (La and Ce), which, when normalised by their Al content, are correlated with U/Al, and elevated Th/Al values, which are not correlated with U/Al, supporting decoupling of U from Th during soil generation. These results can be used in genetic and weathering models of the Coles Hill deposit, and can also be applied to future prospecting for similar U deposits in the eastern United States, and in regions with similar geological/climatic conditions.
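
    Compositional data analysis of this kind typically starts by moving element concentrations out of the closed simplex, for example with a centered log-ratio (clr) transform, before applying multivariate statistics. A minimal sketch with invented concentrations follows; the element list and values are illustrative, not the study's data.

```python
import numpy as np

def clr(comp):
    """Centered log-ratio transform: log of each part over the geometric mean."""
    comp = np.asarray(comp, dtype=float)
    gmean = np.exp(np.log(comp).mean(axis=1, keepdims=True))
    return np.log(comp / gmean)

# Hypothetical soil concentrations (ppm) for U, Th, La, Ce, Al.
soils = np.array([[12.0,  9.0, 40.0, 85.0, 70000.0],
                  [ 3.0, 11.0, 30.0, 60.0, 80000.0]])
print(clr(soils).round(2))
```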

  14. Identifying hearing loss by means of iridology.

    PubMed

    Stearn, Natalie; Swanepoel, De Wet

    2006-11-13

    Isolated reports of hearing loss presenting as markings on the iris exist, but to date the effectiveness of iridology in identifying hearing loss has not been investigated. This study therefore aimed to determine the efficacy of iridological analysis in the identification of moderate to profound sensorineural hearing loss in adolescents. A controlled trial was conducted with an iridologist, blind to the actual hearing status of participants, analyzing the irises of participants with and without hearing loss. Fifty hearing-impaired and fifty normal-hearing subjects, between the ages of 15 and 19 years, controlled for gender, participated in the study. An experienced iridologist analyzed the randomised set of participants' irises. A 70% correct identification of hearing status was obtained by iridological analysis, with a false negative rate of 41% compared to a 19% false positive rate. The respective sensitivity and specificity rates therefore came to 59% and 81%. Iridological analysis of hearing status indicated a statistically significant relationship to actual hearing status (P < 0.05). Although statistically significant, the sensitivity and specificity rates for identifying hearing loss by iridology were not comparable to those of traditional audiological screening procedures.
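
    The reported rates are internally consistent, as a quick check shows; the cohort sizes come from the abstract and the arithmetic is the only content.

```python
# 50 hearing-impaired and 50 normal-hearing participants.
# False-negative rate 41% among the impaired -> sensitivity = 0.59.
# False-positive rate 19% among the normal   -> specificity = 0.81.
tp, fn = 50 * 0.59, 50 * 0.41
tn, fp = 50 * 0.81, 50 * 0.19
print((tp + tn) / 100)   # 0.70: the reported 70% correct identification
```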

  15. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze collective frequency differences of multiple variants between cases and controls have shifted the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812
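
    As background for the comparison, the collapsing-style group tests the abstract contrasts with can be sketched in a few lines: individuals are reduced to a rare-allele carrier indicator and carrier proportions are compared between cases and controls. The counts and allele frequencies below are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)
# Hypothetical genotypes (individuals x rare variants), coded 0/1/2.
cases = rng.binomial(2, 0.010, size=(1000, 30))
ctrls = rng.binomial(2, 0.005, size=(1000, 30))

# Collapsing step: carrier = at least one rare allele at any site.
cc = (cases.sum(axis=1) > 0).sum()
uc = (ctrls.sum(axis=1) > 0).sum()

chi2, p, _, _ = chi2_contingency([[cc, 1000 - cc], [uc, 1000 - uc]])
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```

    Such tests weight all variants identically regardless of genomic location, which is exactly the limitation the genome-information content-based statistic is designed to address.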

  16. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze collective frequency differences of multiple variants between cases and controls have shifted the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  17. Combined data preprocessing and multivariate statistical analysis characterizes fed-batch culture of mouse hybridoma cells for rational medium design.

    PubMed

    Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup

    2010-10-01

    We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis techniques. Initially, specific rates of cell growth, glucose/amino acid consumption and mAb/metabolite production were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine; (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine; and (iii) lysine, valine and isoleucine. Further analysis using partial least squares (PLS) regression identified key amino acids which were positively or negatively correlated with cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.

  18. Causal network analysis of head and neck keloid tissue identifies potential master regulators.

    PubMed

    Garcia-Rodriguez, Laura; Jones, Lamont; Chen, Kang Mei; Datta, Indrani; Divine, George; Worsham, Maria J

    2016-10-01

    To generate novel insights and hypotheses in keloid development from potential master regulators. Prospective cohort. Six fresh keloid and six normal skin samples from 12 anonymous donors were used in a prospective cohort study. Genome-wide profiling was done previously on the cohort using the Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA). The 190 statistically significant CpG islands between keloid and normal tissue mapped to 152 genes (P < .05). The top 10 statistically significant genes (VAMP5, ACTR3C, GALNT3, KCNAB2, LRRC61, SCML4, SYNGR1, TNS1, PLEKHG5, PPP1R13-α; false discovery rate <.015) were uploaded into the Ingenuity Pathway Analysis software's Causal Network Analysis (QIAGEN, Redwood City, CA). To reflect expected gene expression direction in the context of methylation changes, the inverse of the methylation ratio from keloid versus normal tissue was used for the analysis. Causal Network Analysis identified disease-specific master regulator molecules based on downstream differentially expressed keloid-specific genes and expected directionality of expression (hypermethylated vs. hypomethylated). Causal Network Analysis software identified four hierarchical networks that included four master regulators (pyroxamide, tributyrin, PRKG2, and PENK) and 19 intermediate regulators. Causal Network Analysis of differentially methylated gene data from keloid versus normal skin demonstrated four causal networks with four master regulators. These hierarchical networks suggest potential driver roles for their downstream keloid gene targets in the pathogenesis of the keloid phenotype, likely triggered by perturbation/injury to normal tissue. Laryngoscope, 126:E319-E324, 2016. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.

  19. Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies

    PubMed Central

    Vatcheva, Kristina P.; Lee, MinJae; McCormick, Joseph B.; Rahbar, Mohammad H.

    2016-01-01

    The adverse impact of ignoring multicollinearity on findings and data interpretation in regression analysis is very well documented in the statistical literature. The failure to identify and report multicollinearity could result in misleading interpretations of the results. A review of epidemiological literature in PubMed from January 2004 to December 2013 illustrated the need for greater attention to identifying and minimizing the effect of multicollinearity in analysis of data from epidemiologic studies. We used simulated datasets and real-life data from the Cameron County Hispanic Cohort to demonstrate the adverse effects of multicollinearity in regression analysis and encourage researchers to consider diagnostics for multicollinearity as one of the steps in regression analysis. PMID:27274911

  20. Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies.

    PubMed

    Vatcheva, Kristina P; Lee, MinJae; McCormick, Joseph B; Rahbar, Mohammad H

    2016-04-01

    The adverse impact of ignoring multicollinearity on findings and data interpretation in regression analysis is very well documented in the statistical literature. The failure to identify and report multicollinearity could result in misleading interpretations of the results. A review of epidemiological literature in PubMed from January 2004 to December 2013 illustrated the need for greater attention to identifying and minimizing the effect of multicollinearity in analysis of data from epidemiologic studies. We used simulated datasets and real-life data from the Cameron County Hispanic Cohort to demonstrate the adverse effects of multicollinearity in regression analysis and encourage researchers to consider diagnostics for multicollinearity as one of the steps in regression analysis.
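
    The standard diagnostic the authors encourage is the variance inflation factor; a minimal sketch with deliberately collinear simulated predictors follows (all variable names and values are illustrative).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.2, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):            # skip the intercept column
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```

    Here x1 and x2 show inflated values (a common rule of thumb flags VIF above 5 to 10), while x3 stays near 1.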

  1. Spiritual and ceremonial plants in North America: an assessment of Moerman's ethnobotanical database comparing Residual, Binomial, Bayesian and Imprecise Dirichlet Model (IDM) analysis.

    PubMed

    Turi, Christina E; Murch, Susan J

    2013-07-09

    Ethnobotanical research and the study of plants used for rituals, ceremonies and to connect with the spirit world have led to the discovery of many novel psychoactive compounds such as nicotine, caffeine, and cocaine. In North America, spiritual and ceremonial uses of plants are well documented and can be accessed online via the University of Michigan's Native American Ethnobotany Database. The objective of the study was to compare Residual, Bayesian, Binomial and Imprecise Dirichlet Model (IDM) analyses of ritual, ceremonial and spiritual plants in Moerman's ethnobotanical database and to identify genera that may be good candidates for the discovery of novel psychoactive compounds. The database was queried with the following format "Family Name AND Ceremonial OR Spiritual" for 263 North American botanical families. Spiritual and ceremonial flora consisted of 86 families with 517 species belonging to 292 genera. Spiritual taxa were then grouped further into ceremonial medicines and items categories. Residual, Bayesian, Binomial and IDM analyses were performed to identify over- and under-utilized families. The 4 statistical approaches were in good agreement when identifying under-utilized families, but large families (>393 species) were underemphasized by the Binomial, Bayesian and IDM approaches for over-utilization. Residual, Binomial, and IDM analyses identified similar families as over-utilized in the medium (92-392 species) and small (<92 species) classes. The families Apiaceae, Asteraceae, Ericaceae, Pinaceae and Salicaceae were identified as significantly over-utilized as ceremonial medicines among medium and large families. Analysis of genera within the Apiaceae and Asteraceae suggests that the genera Ligusticum and Artemisia are good candidates for facilitating the discovery of novel psychoactive compounds. The 4 statistical approaches were not consistent in the selection of over-utilized flora. Residual analysis revealed overall trends that were supported by Binomial analysis when families were separated into small, medium and large classes. The Bayesian, Binomial and IDM approaches identified different genera as potentially important. Species belonging to the genera Artemisia and Ligusticum were most consistently identified and may be valuable in future ethnopharmacological studies. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
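
    The binomial approach in this comparison reduces to testing, for each family, whether its count of ceremonial species is surprising given the overall rate. A sketch under invented totals follows; the flora size and the example family's counts are assumptions, not figures from the paper.

```python
from scipy.stats import binomtest

total_ceremonial, flora_size = 517, 18000     # flora size is hypothetical
overall_rate = total_ceremonial / flora_size

# A hypothetical family with 120 species, 9 of them used ceremonially.
res = binomtest(k=9, n=120, p=overall_rate, alternative='greater')
print(f"one-sided p for over-utilization: {res.pvalue:.3f}")
```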

  2. GWAMA: software for genome-wide association meta-analysis.

    PubMed

    Mägi, Reedik; Morris, Andrew P

    2010-05-28

    Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way of improving power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill-equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error-trapping facilities and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files are freely available online at http://www.well.ox.ac.uk/GWAMA.
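
    The core of such a meta-analysis is the inverse-variance fixed-effect combination of per-study estimates, one of the models this class of software implements. A self-contained sketch with invented per-study effects:

```python
import numpy as np
from scipy import stats

# Hypothetical per-study log odds ratios and standard errors for one SNP.
beta = np.array([0.12, 0.08, 0.15, 0.05])
se   = np.array([0.05, 0.04, 0.07, 0.06])

w = 1.0 / se**2                              # inverse-variance weights
beta_meta = (w * beta).sum() / w.sum()
se_meta = np.sqrt(1.0 / w.sum())
p = 2 * stats.norm.sf(abs(beta_meta / se_meta))
print(f"beta = {beta_meta:.3f}, se = {se_meta:.3f}, p = {p:.2g}")
```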

  3. Analysis of select Dalbergia and trade timber using direct analysis in real time and time-of-flight mass spectrometry for CITES enforcement.

    PubMed

    Lancaster, Cady; Espinoza, Edgard

    2012-05-15

    International trade of several Dalbergia wood species is regulated by The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). In order to supplement morphological identification of these species, a rapid chemical method of analysis was developed. Using Direct Analysis in Real Time (DART) ionization coupled with Time-of-Flight (TOF) Mass Spectrometry (MS), selected Dalbergia and common trade species were analyzed. Each of the 13 wood species was classified using principal component analysis and linear discriminant analysis (LDA). These statistical data clusters served as reliable anchors for species identification of unknowns. Analysis of 20 or more samples from the 13 species studied in this research indicates that the DART-TOFMS results are reproducible. Statistical analysis of the most abundant ions gave good classifications that were useful for identifying unknown wood samples. DART-TOFMS and LDA analysis of 13 species of selected timber samples and the statistical classification allowed for the correct assignment of unknown wood samples. This method is rapid and can be useful when anatomical identification is difficult but needed in order to support CITES enforcement. Published 2012. This article is a US Government work and is in the public domain in the USA.

  4. A statistical analysis of cervical auscultation signals from adults with unsafe airway protection.

    PubMed

    Dudik, Joshua M; Kurosu, Atsuko; Coyle, James L; Sejdić, Ervin

    2016-01-22

    Aspiration, where food or liquid is allowed to enter the larynx during a swallow, is recognized as the most clinically salient feature of oropharyngeal dysphagia. This event can lead to short-term harm via airway obstruction or more long-term effects such as pneumonia. In order to non-invasively identify this event using high-resolution cervical auscultation, there is a need to characterize cervical auscultation signals from subjects with dysphagia who aspirate. In this study, we collected swallowing sound and vibration data from 76 adults (50 men, 26 women, mean age 62) who underwent a routine videofluoroscopy swallowing examination. The analysis was limited to swallows of liquid with either thin (<5 cps) or viscous (≈300 cps) consistency and was divided into those with deep laryngeal penetration or aspiration (unsafe airway protection), and those with either shallow or no laryngeal penetration (safe airway protection), using a standardized scale. After calculating a selection of time, frequency, and time-frequency features for each swallow, the safe and unsafe categories were compared using Wilcoxon rank-sum statistical tests. Our analysis found that few of our chosen features varied in magnitude between safe and unsafe swallows, with thin swallows demonstrating no statistical variation. We also supported our past findings with regard to the effects of sex and the presence or absence of stroke on cervical auscultation signals, but noticed certain discrepancies with regard to bolus viscosity. Overall, our results support the necessity of using multiple statistical features concurrently to identify laryngeal penetration of swallowed boluses in future work with high-resolution cervical auscultation.

  5. Integrating statistical and clinical research elements in intervention-related grant applications: summary from an NIMH workshop.

    PubMed

    Sherrill, Joel T; Sommers, David I; Nierenberg, Andrew A; Leon, Andrew C; Arndt, Stephan; Bandeen-Roche, Karen; Greenhouse, Joel; Guthrie, Donald; Normand, Sharon-Lise; Phillips, Katharine A; Shear, M Katherine; Woolson, Robert

    2009-01-01

    The authors summarize points for consideration generated in a National Institute of Mental Health (NIMH) workshop convened to provide an opportunity for reviewers from different disciplines, specifically clinical researchers and statisticians, to discuss how their differing and complementary expertise can be well integrated in the review of intervention-related grant applications. A 1-day workshop was convened in October 2004. The workshop featured panel presentations on key topics followed by interactive discussion. This article summarizes the workshop and subsequent discussions, which centered on topics including weighting the statistics/data analysis elements of an application in the assessment of the application's overall merit; the level of statistical sophistication appropriate to different stages of research and for different funding mechanisms; some key considerations in the design and analysis portions of applications; appropriate statistical methods for addressing essential questions posed by an application; and the role of the statistician in the application's development, study conduct, and interpretation and dissemination of results. A number of key elements crucial to the construction and review of grant applications were identified. It was acknowledged that intervention-related studies unavoidably involve trade-offs. Reviewers are helped when applications acknowledge such trade-offs and provide good rationale for their choices. Clear linkage among the design, aims, hypotheses, and data analysis plan, and avoidance of disconnections among these elements, also strengthens applications. The authors identify multiple points to consider when constructing intervention-related grant applications. The points are presented here as questions and do not reflect institute policy or comprise a list of best practices, but rather represent points for consideration.

  6. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

    PubMed

    Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima

    2017-01-01

    Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistics-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data: the more efficient enrichment employed in Cappable-seq compared with dRNA-seq could affect the data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5' ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at the ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER).
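
    The central statistical idea, transforming enrichment ratios toward normality with the Box-Cox procedure and then calling outlying sites, can be sketched as follows. This is not ToNER itself: the counts, the spike-in, and the 0.999 quantile cutoff are all invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical per-position read counts from enriched/unenriched libraries.
enriched = rng.poisson(5, size=10000) + 1    # +1 keeps ratios positive
control  = rng.poisson(5, size=10000) + 1
enriched[:20] *= 15                          # a few true 5' ends

transformed, lam = stats.boxcox(enriched / control)
z = (transformed - transformed.mean()) / transformed.std()
hits = np.nonzero(z > stats.norm.ppf(0.999))[0]
print(f"fitted lambda = {lam:.2f}, candidate 5' ends: {len(hits)}")
```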

  7. Validation of a Delirium Risk Assessment Using Electronic Medical Record Information.

    PubMed

    Rudolph, James L; Doherty, Kelly; Kelly, Brittany; Driver, Jane A; Archambault, Elizabeth

    2016-03-01

    Identifying patients at risk for delirium allows prompt application of prevention, diagnostic, and treatment strategies; but is rarely done. Once delirium develops, patients are more likely to need posthospitalization skilled care. This study developed an a priori electronic prediction rule using independent risk factors identified in a National Center of Clinical Excellence meta-analysis and validated the ability to predict delirium in 2 cohorts. Retrospective analysis followed by prospective validation. Tertiary VA Hospital in New England. A total of 27,625 medical records of hospitalized patients and 246 prospectively enrolled patients admitted to the hospital. The electronic delirium risk prediction rule was created using data obtained from the patient electronic medical record (EMR). The primary outcome, delirium, was identified 2 ways: (1) from the EMR (retrospective cohort) and (2) clinical assessment on enrollment and daily thereafter (prospective participants). We assessed discrimination of the delirium prediction rule with the C-statistic. Secondary outcomes were length of stay and discharge to rehabilitation. Retrospectively, delirium was identified in 8% of medical records (n = 2343); prospectively, delirium during hospitalization was present in 26% of participants (n = 64). In the retrospective cohort, medical record delirium was identified in 2%, 3%, 11%, and 38% of the low, intermediate, high, and very high-risk groups, respectively (C-statistic = 0.81; 95% confidence interval 0.80-0.82). Prospectively, the electronic prediction rule identified delirium in 15%, 18%, 31%, and 55% of these groups (C-statistic = 0.69; 95% confidence interval 0.61-0.77). Compared with low-risk patients, those at high- or very high delirium risk had increased length of stay (5.7 ± 5.6 vs 3.7 ± 2.7 days; P = .001) and higher rates of discharge to rehabilitation (8.9% vs 20.8%; P = .02). Automatic calculation of delirium risk using an EMR algorithm identifies patients at risk for delirium, which creates a critical opportunity for gaining clinical efficiencies and improving delirium identification, including those needing skilled care. Published by Elsevier Inc.

  8. A Baseline for the Multivariate Comparison of Resting-State Networks

    PubMed Central

    Allen, Elena A.; Erhardt, Erik B.; Damaraju, Eswar; Gruner, William; Segall, Judith M.; Silva, Rogers F.; Havlicek, Martin; Rachakonda, Srinivas; Fries, Jill; Kalyanam, Ravi; Michael, Andrew M.; Caprihan, Arvind; Turner, Jessica A.; Eichele, Tom; Adelsheim, Steven; Bryan, Angela D.; Bustillo, Juan; Clark, Vincent P.; Feldstein Ewing, Sarah W.; Filbey, Francesca; Ford, Corey C.; Hutchison, Kent; Jung, Rex E.; Kiehl, Kent A.; Kodituwakku, Piyadasa; Komesu, Yuko M.; Mayer, Andrew R.; Pearlson, Godfrey D.; Phillips, John P.; Sadek, Joseph R.; Stevens, Michael; Teuscher, Ursina; Thoma, Robert J.; Calhoun, Vince D.

    2011-01-01

    As the size of functional and structural MRI datasets expands, it becomes increasingly important to establish a baseline from which diagnostic relevance may be determined, a processing strategy that efficiently prepares data for analysis, and a statistical approach that identifies important effects in a manner that is both robust and reproducible. In this paper, we introduce a multivariate analytic approach that optimizes sensitivity and reduces unnecessary testing. We demonstrate the utility of this mega-analytic approach by identifying the effects of age and gender on the resting-state networks (RSNs) of 603 healthy adolescents and adults (mean age: 23.4 years, range: 12–71 years). Data were collected on the same scanner, preprocessed using an automated analysis pipeline based in SPM, and studied using group independent component analysis. RSNs were identified and evaluated in terms of three primary outcome measures: time course spectral power, spatial map intensity, and functional network connectivity. Results revealed robust effects of age on all three outcome measures, largely indicating decreases in network coherence and connectivity with increasing age. Gender effects were of smaller magnitude but suggested stronger intra-network connectivity in females and more inter-network connectivity in males, particularly with regard to sensorimotor networks. These findings, along with the analysis approach and statistical framework described here, provide a useful baseline for future investigations of brain networks in health and disease. PMID:21442040

  9. Expression Profiling of Nonpolar Lipids in Meibum From Patients With Dry Eye: A Pilot Study

    PubMed Central

    Chen, Jianzhong; Keirsey, Jeremy K.; Green, Kari B.; Nichols, Kelly K.

    2017-01-01

    Purpose The purpose of this investigation was to characterize differentially expressed lipids in meibum samples from patients with dry eye disease (DED) in order to better understand the underlying pathologic mechanisms. Methods Meibum samples were collected from postmenopausal women with DED (PW-DED; n = 5) and a control group of postmenopausal women without DED (n = 4). Lipid profiles were analyzed by direct infusion full-scan electrospray ionization mass spectrometry (ESI-MS). An initial analysis of 145 representative peaks from four classes of lipids in PW-DED samples revealed that additional manual corrections for peak overlap and isotopes only slightly affected the statistical analysis. Therefore, analysis of uncorrected data, which can be applied to a greater number of peaks, was used to compare more than 500 lipid peaks common to PW-DED and control samples. Statistical analysis of peak intensities identified several lipid species that differed significantly between the two groups. Data from contact lens wearers with DED (CL-DED; n = 5) were also analyzed. Results Many species of the two types of diesters (DE) and very long chain wax esters (WE) were decreased by ∼20% in PW-DED, whereas levels of triacylglycerols were increased by an average of 39% ± 3% in meibum from PW-DED compared to that in the control group. Approximately the same reduction (20%) of similar DE and WE was observed for CL-DED. Conclusions Statistical analysis of peak intensities from direct infusion ESI-MS results identified differentially expressed lipids in meibum from dry eye patients. Further studies are warranted to support these findings. PMID:28426869

  10. Quantitative investigation of inappropriate regression model construction and the importance of medical statistics experts in observational medical research: a cross-sectional study.

    PubMed

    Nojima, Masanori; Tokunaga, Mutsumi; Nagamura, Fumitaka

    2018-05-05

    To investigate under what circumstances inappropriate use of 'multivariate analysis' is likely to occur and to identify the population that needs more support with medical statistics. The frequency of inappropriate regression model construction in multivariate analysis and related factors were investigated in observational medical research publications. The inappropriate algorithm of using only variables that were significant in univariate analysis was estimated to occur in 6.4% (95% CI 4.8% to 8.5%) of publications. This was observed in 1.1% of the publications with a medical statistics expert (hereinafter 'expert') as the first author, in 3.5% if an expert was included as a coauthor, and in 12.2% if experts were not involved. In publications where the number of cases was 50 or fewer and the study did not include experts, inappropriate algorithm usage was observed at a high proportion of 20.2%. The OR of the involvement of experts for this outcome was 0.28 (95% CI 0.15 to 0.53). A further analysis showed that the involvement of experts and the implementation of inappropriate multivariate analysis are associated at the nation level (R = -0.652). Based on the results of this study, the benefit of participation of medical statistics experts is obvious. Experts should be involved for proper confounding adjustment and interpretation of statistical models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  11. qFeature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-09-14

    This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic, but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
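
    The idea of summarizing moving-window regression fits can be sketched in Python (qFeature itself is an R package; the window width, polynomial degree, and non-overlapping windows below are simplifying assumptions, not the package's defaults).

```python
import numpy as np

def window_features(y, width=32, deg=2):
    """Fit a local polynomial to consecutive windows of a series and
    return the fitted coefficients as features, one row per window."""
    t = np.arange(width) - width / 2.0        # centered time index
    rows = []
    for start in range(0, len(y) - width + 1, width):
        rows.append(np.polyfit(t, y[start:start + width], deg))
    return np.asarray(rows)                   # columns: curvature, slope, level

rng = np.random.default_rng(5)
series = np.sin(np.linspace(0, 20, 1024)) + rng.normal(scale=0.1, size=1024)
print(window_features(series).shape)          # (32, 3)
```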

  12. Statistical analysis of microgravity experiment performance using the degrees of success scale

    NASA Technical Reports Server (NTRS)

    Upshaw, Bernadette; Liou, Ying-Hsin Andrew; Morilak, Daniel P.

    1994-01-01

    This paper describes an approach to identify factors that significantly influence microgravity experiment performance. Investigators developed the 'degrees of success' scale to provide a numerical representation of success. A degree of success was assigned to 293 microgravity experiments. Experiment information including the degree of success rankings and factors for analysis was compiled into a database. Through an analysis of variance, nine significant factors in microgravity experiment performance were identified. The frequencies of these factors are presented along with the average degree of success at each level. A preliminary discussion of the relationship between the significant factors and the degree of success is presented.

  13. Transition-Region Ultraviolet Explosive Events in IRIS Si IV: A Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Bartz, Allison

    2018-01-01

    Explosive events (EEs) in the solar transition region are characterized by broad, non-Gaussian line profiles with wings at Doppler velocities exceeding the speed of sound. We present a statistical analysis of 23 IRIS (Interface Region Imaging Spectrograph) sit-and-stare observations, obtained between April 2014 and March 2017. Using the IRIS Si IV 1394 Å and 1403 Å spectral windows and the 1400 Å slit-jaw images, we have identified 581 EEs. We found that most EEs last less than 20 min and have a spatial scale on the slit less than 10”, agreeing with measurements in previous work. We observed most EEs in active regions, regardless of date of observation, but selection bias of IRIS observations cannot be ruled out. We also present preliminary findings of optical depth effects from our statistical study.

  14. Kepler AutoRegressive Planet Search

    NASA Astrophysics Data System (ADS)

    Feigelson, Eric

    NASA's Kepler mission is the source of more exoplanets than any other instrument, but the discovery depends on complex statistical analysis procedures embedded in the Kepler pipeline. A particular challenge is mitigating irregular stellar variability without loss of sensitivity to faint periodic planetary transits. This proposal presents a two-stage alternative analysis procedure. First, parametric autoregressive ARFIMA models, commonly used in econometrics, remove most of the stellar variations. Second, a novel matched filter is used to create a periodogram from which transit-like periodicities are identified. This analysis procedure, the Kepler AutoRegressive Planet Search (KARPS), is confirming most of the Kepler Objects of Interest and is expected to identify additional planetary candidates. The proposed research will complete application of the KARPS methodology to the prime Kepler mission light curves of 200,000 stars and compare the results with Kepler Objects of Interest obtained with the Kepler pipeline. We will then conduct a variety of astronomical studies based on the KARPS results. Important subsamples will be extracted, including Habitable Zone planets, hot super-Earths, grazing-transit hot Jupiters, and multi-planet systems. Ground-based spectroscopy of poorly studied candidates will be performed to better characterize the host stars. Studies of stellar variability will then be pursued based on KARPS analysis. The autocorrelation function and nonstationarity measures will be used to identify spotted stars at different stages of autoregressive modeling. Periodic variables with folded light curves inconsistent with planetary transits will be identified; they may be eclipsing or mutually-illuminating binary star systems. Classification of stellar variables with KARPS-derived statistical properties will be attempted. KARPS procedures will then be applied to archived K2 data to identify planetary transits and characterize stellar variability.
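
    A toy version of the two-stage idea, prewhitening with an autoregressive model and scanning the residuals with a transit-shaped matched filter, can be written with statsmodels. This is a deliberately simplified stand-in for KARPS: the ARIMA order, the box kernel, and the synthetic light curve are all assumptions for demonstration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(6)
n = 2000
flux = np.cumsum(rng.normal(scale=0.05, size=n))   # slow stellar drift
flux[500:510] -= 1.0                               # two box-shaped transits
flux[1500:1510] -= 1.0

# Stage 1: an autoregressive model absorbs the stellar variability.
resid = ARIMA(flux, order=(2, 1, 0)).fit().resid

# Stage 2: matched filter -- correlate residuals with a transit-like dip.
score = np.correlate(resid, -np.ones(10), mode='same')
print("top matched-filter peaks near:", np.sort(np.argsort(score)[-2:]))
```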

  15. A systematic review of the quality of statistical methods employed for analysing quality of life data in cancer randomised controlled trials.

    PubMed

    Hamel, Jean-Francois; Saulnier, Patrick; Pe, Madeline; Zikos, Efstathios; Musoro, Jammbe; Coens, Corneel; Bottomley, Andrew

    2017-09-01

    Over the last decades, health-related quality of life (HRQoL) end-points have become an important outcome of randomised controlled trials (RCTs). HRQoL methodology in RCTs has improved following international consensus recommendations. However, no international recommendations exist concerning the statistical analysis of such data. The aim of our study was to identify and characterise the quality of the statistical methods commonly used for analysing HRQoL data in cancer RCTs. Building on our recently published systematic review, we analysed a total of 33 published RCTs, examining the HRQoL analysis methods reported since 1991. We focussed on the ability of the methods to deal with the three major problems commonly encountered when analysing HRQoL data: its multidimensional structure, its longitudinal structure and the commonly high rate of missing data. All studies reported HRQoL being assessed repeatedly over time, for periods ranging from 2 to 36 months. Missing data were common, with compliance rates ranging from 45% to 90%. From the 33 studies considered, 12 different statistical methods were identified. Twenty-nine studies analysed each of the questionnaire sub-dimensions without type I error adjustment. Thirteen studies repeated the HRQoL analysis at each assessment time, again without type I error adjustment. Only 8 studies used methods suitable for repeated measurements. Our findings show a lack of consistency in statistical methods for analysing HRQoL data. Problems related to multiple comparisons were rarely considered, leading to a high risk of false positive results. It is therefore critical that international recommendations for improving such statistical practices are developed. Copyright © 2017. Published by Elsevier Ltd.
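
    The most common lapse identified, testing every sub-dimension and time point without type I error control, has an easy remedy; a minimal sketch with invented p-values:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from separate tests of HRQoL sub-dimensions.
pvals = [0.040, 0.012, 0.300, 0.008, 0.049, 0.210]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='holm')
for raw, adj, r in zip(pvals, p_adj, reject):
    print(f"raw {raw:.3f} -> adjusted {adj:.3f} significant={r}")
```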

  16. The fragility of statistically significant findings from randomized trials in head and neck surgery.

    PubMed

    Noel, Christopher W; McMullen, Caitlin; Yao, Christopher; Monteiro, Eric; Goldstein, David P; Eskander, Antoine; de Almeida, John R

    2018-04-23

    The Fragility Index (FI) is a novel tool for evaluating the robustness of statistically significant findings in a randomized controlled trial (RCT). It measures the number of events upon which statistical significance depends. We sought to calculate FI scores for RCTs in the head and neck cancer literature in which surgery was a primary intervention. Potential articles were identified in PubMed (MEDLINE), Embase, and Cochrane without publication date restrictions. Two reviewers independently screened eligible RCTs reporting at least one dichotomous and statistically significant outcome. The data from each trial were extracted and the FI scores were calculated. Associations between trial characteristics and FI were determined. In total, 27 articles were identified. The median sample size was 67.5 (interquartile range [IQR] = 42-143) and the median number of events per trial was 8 (IQR = 2.25-18.25). The median FI score was 1 (IQR = 0-2.5), meaning that changing one patient from a nonevent to an event in the treatment arm would change the result to a statistically nonsignificant result, or P > .05. The FI score was less than the number of patients lost to follow-up in 71% of cases. The FI score was moderately correlated with the P value (ρ = -0.52, P = .007) and with journal impact factor (ρ = 0.49, P = .009) on univariable analysis. On multivariable analysis, only the P value was found to be a predictor of FI score (P = .001). Randomized trials in the head and neck cancer literature in which surgery is a primary modality are statistically fragile, with low FI scores. Laryngoscope, 2018. © 2018 The American Laryngological, Rhinological and Otological Society, Inc.
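
    The FI itself is simple to compute: move patients one at a time from non-event to event in the low-event arm and recompute Fisher's exact test until P exceeds .05. A sketch follows; the trial counts are invented, not taken from any of the reviewed studies.

```python
from scipy.stats import fisher_exact

def fragility_index(ev_a, n_a, ev_b, n_b):
    """Event flips in the low-event arm needed to lose significance."""
    fi, e = 0, ev_a
    while e <= n_a:
        _, p = fisher_exact([[e, n_a - e], [ev_b, n_b - ev_b]])
        if p > 0.05:
            return fi
        e, fi = e + 1, fi + 1
    return fi

# Hypothetical trial: 2/30 vs 10/30 events, nominally significant.
print(fragility_index(2, 30, 10, 30))
```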

  17. Power analysis as a tool to identify statistically informative indicators for monitoring coral reef disturbances.

    PubMed

    Van Wynsberge, Simon; Gilbert, Antoine; Guillemot, Nicolas; Heintz, Tom; Tremblay-Boyer, Laura

    2017-07-01

    Extensive biological field surveys are costly and time consuming. To optimize sampling and ensure regular monitoring over the long term, identifying informative indicators of anthropogenic disturbances is a priority. In this study, we derived 1800 candidate indicators by combining metrics measured from coral, fish, and macro-invertebrate assemblages surveyed from 2006 to 2012 in the vicinity of an ongoing mining project in the Voh-Koné-Pouembout lagoon, New Caledonia. We performed a power analysis to identify a subset of indicators which would best discriminate temporal changes due to a simulated chronic anthropogenic impact. Only 4% of tested indicators were likely to detect a 10% annual decrease of values with sufficient power (>0.80). Corals generally provided higher statistical power than macro-invertebrates and fishes because of lower natural variability and higher occurrence. For the same reasons, higher taxonomic ranks provided higher power than lower taxonomic ranks. Nevertheless, a number of families of common sedentary or sessile macro-invertebrates and fishes also performed well in detecting changes: Echinometridae, Isognomidae, Muricidae, Tridacninae, Arcidae, and Turbinidae for macro-invertebrates and Pomacentridae, Labridae, and Chaetodontidae for fishes. Interestingly, these families did not provide high power in all geomorphological strata, suggesting that the ability of indicators to detect anthropogenic impacts is closely linked to reef geomorphology. This study provides a first operational step toward identifying statistically relevant indicators of anthropogenic disturbances in New Caledonia's coral reefs, which can be useful in similar tropical reef ecosystems where little information is available regarding the responses of ecological indicators to anthropogenic disturbances.

  18. The Job Dimensions Underlying the Job Elements of the Position Analysis Questionnaire (PAQ) (Form B).

    DTIC Science & Technology

    The study was concerned with the identification of the job dimensions underlying the job elements of the Position Analysis Questionnaire (PAQ), Form B... The PAQ is a structured job analysis instrument consisting of 187 worker-oriented job elements which are divided into six a priori major divisions... The statistical procedure of principal components analysis was used to identify the job dimensions of the PAQ. Forty-five job dimensions were...

  19. A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

    PubMed

    Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A

    2012-01-01

    High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer (contact: chris.spencer@well.ox.ac.uk). Supplementary data are available at Bioinformatics online.
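
    The published algorithm is more elaborate, but its core idea, flagging individuals whose genome-wide summary statistics are atypical, can be sketched with robust median/MAD z-scores. The data, summaries, and threshold below are illustrative assumptions, not the paper's procedure.

      # Simplified per-sample outlier flagging on genome-wide summary
      # statistics (e.g., heterozygosity, missingness) via robust z-scores.
      import numpy as np

      rng = np.random.default_rng(1)
      summaries = rng.normal(size=(1000, 3))   # rows = samples, cols = summaries
      summaries[:5] += 6.0                     # inject five problematic samples

      med = np.median(summaries, axis=0)
      mad = np.median(np.abs(summaries - med), axis=0)
      robust_z = 0.6745 * (summaries - med) / mad   # ~N(0,1) scale under normality
      flagged = np.where(np.any(np.abs(robust_z) > 5, axis=1))[0]
      print("samples flagged for removal:", flagged)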

  20. Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

    NASA Astrophysics Data System (ADS)

    Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.

    2018-07-01

    A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.
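
    A minimal sketch of the two-step scheme, a global significance test on the frequency shift followed by LDA-based localization, is given below. The frequencies, noise level, and class shift patterns are simulated stand-ins for the identified modal data and finite element sensitivities.

      # Step 1: chi-square test for a significant shift of the identified
      # natural frequencies. Step 2: LDA classification of the shift pattern
      # into location classes. All numbers are simulated stand-ins.
      import numpy as np
      from scipy import stats
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      rng = np.random.default_rng(2)
      n_modes, n_classes, sigma = 6, 10, 0.002
      baseline = np.array([0.9, 2.1, 3.4, 5.0, 6.8, 8.9])   # Hz, illustrative

      observed = baseline + rng.normal(0, sigma, n_modes) + 0.01  # mass added
      chi2_stat = np.sum(((observed - baseline) / sigma) ** 2)
      if chi2_stat > stats.chi2.ppf(0.99, df=n_modes):      # significant change?
          # each location class has a characteristic mean shift pattern
          patterns = rng.normal(0.01, 0.003, size=(n_classes, n_modes))
          X = np.vstack([p + rng.normal(0, sigma, (50, n_modes)) for p in patterns])
          y = np.repeat(np.arange(n_classes), 50)
          lda = LinearDiscriminantAnalysis().fit(X, y)
          print("predicted location class:",
                lda.predict((observed - baseline)[None, :])[0])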

  1. Scheduled intercity transportation : rural service areas in the United States

    DOT National Transportation Integrated Search

    2005-06-01

    To identify how many of the country's 82.4 million rural residents are within the reasonable coverage radius of at least one intercity transportation facility, in 2003 the Bureau of Transportation Statistics (BTS) undertook a geospatial analysis us...

  2. Finding the Root Causes of Statistical Inconsistency in Community Earth System Model Output

    NASA Astrophysics Data System (ADS)

    Milroy, D.; Hammerling, D.; Baker, A. H.

    2017-12-01

    Baker et al. (2015) developed the Community Earth System Model Ensemble Consistency Test (CESM-ECT) to provide a metric for software quality assurance by determining statistical consistency between an ensemble of CESM outputs and new test runs. The test has proved useful for detecting statistical differences caused by compiler bugs and errors in physical modules. However, detection is only the necessary first step in finding the causes of statistical difference. The CESM is a vastly complex model composed of millions of lines of code, developed and maintained by a large community of software engineers and scientists. Any root cause analysis is correspondingly challenging. We propose a new capability for CESM-ECT: identifying the sections of code that cause statistical distinguishability. The first step is to discover CESM variables that cause CESM-ECT to classify new runs as statistically distinct, which we achieve via Randomized Logistic Regression. Next we use a tool developed to identify CESM components that define or compute the variables found in the first step. Finally, we employ the Kernel GENerator (KGEN) application created in Kim et al. (2016) to detect fine-grained floating point differences. We demonstrate an example of the procedure and advance a plan to automate this process in our future work.
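
    Randomized Logistic Regression in this context is a stability-selection style procedure; a generic sketch on simulated data (not CESM output) is below: repeatedly subsample runs, fit an L1-penalized logistic regression of the consistency label on per-run variable summaries, and rank variables by how often they receive nonzero coefficients.

      # Stability-selection sketch for finding variables that drive
      # statistical distinguishability; data are simulated, not CESM output.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(3)
      n_runs, n_vars = 200, 50
      X = rng.normal(size=(n_runs, n_vars))     # per-run variable summaries
      y = (X[:, 0] - X[:, 3] + 0.5 * rng.normal(size=n_runs)) > 0  # "distinct?"

      freq = np.zeros(n_vars)
      for _ in range(100):
          idx = rng.choice(n_runs, n_runs // 2, replace=False)
          clf = LogisticRegression(penalty="l1", C=0.3, solver="liblinear")
          clf.fit(X[idx], y[idx])
          freq += clf.coef_[0] != 0
      print("most frequently selected variables:", np.argsort(freq)[::-1][:5])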

  3. Improvements to an earth observing statistical performance model with applications to LWIR spectral variability

    NASA Astrophysics Data System (ADS)

    Zhao, Runchen; Ientilucci, Emmett J.

    2017-05-01

    Hyperspectral remote sensing systems provide spectral data composed of hundreds of narrow spectral bands. Spectral remote sensing systems can be used, for example, to identify targets without physical interaction. Often it is of interest to characterize the spectral variability of targets or objects. The purpose of this paper is to identify and characterize the LWIR spectral variability of targets based on an improved earth observing statistical performance model, known as the Forecasting and Analysis of Spectroradiometric System Performance (FASSP) model. FASSP contains three basic modules: a scene model, a sensor model, and a processing model. Instead of using mean surface reflectance only as input to the model, FASSP transfers user-defined statistical characteristics of a scene through the image chain (i.e., from source to sensor). The radiative transfer model MODTRAN is used to simulate the radiative transfer based on user-defined atmospheric parameters. To retrieve class emissivity and temperature statistics, or temperature/emissivity separation (TES), an LWIR atmospheric compensation method is necessary. The FASSP model has a method to transform statistics in the visible (i.e., ELM) but currently does not have an LWIR TES algorithm in place. This paper addresses the implementation of such a TES algorithm and its associated transformation of statistics.

  4. SBCDDB: Sleeping Beauty Cancer Driver Database for gene discovery in mouse models of human cancers

    PubMed Central

    Mann, Michael B

    2018-01-01

    Large-scale oncogenomic studies have identified few frequently mutated cancer drivers and hundreds of infrequently mutated drivers. Defining the biological context for rare driving events is fundamentally important to increasing our understanding of the druggable pathways in cancer. Sleeping Beauty (SB) insertional mutagenesis is a powerful gene discovery tool used to model human cancers in mice. Our lab and others have published a number of studies that identify cancer drivers from these models using various statistical and computational approaches. Here, we have integrated SB data from primary tumor models into an analysis and reporting framework, the Sleeping Beauty Cancer Driver DataBase (SBCDDB, http://sbcddb.moffitt.org), which identifies drivers in individual tumors or tumor populations. Unique to this effort, the SBCDDB utilizes a single, scalable, statistical analysis method that enables data to be grouped by different biological properties. This allows for SB drivers to be evaluated (and re-evaluated) under different contexts. The SBCDDB provides visual representations highlighting the spatial attributes of transposon mutagenesis and couples this functionality with analysis of gene sets, enabling users to interrogate relationships between drivers. The SBCDDB is a powerful resource for comparative oncogenomic analyses with human cancer genomics datasets for driver prioritization. PMID:29059366

  5. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations.

    PubMed

    Zhang, Han; Wheeler, William; Hyland, Paula L; Yang, Yifan; Shi, Jianxin; Chatterjee, Nilanjan; Yu, Kai

    2016-06-01

    Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls of European ancestry. Among 4,713 candidate pathways, from which genes in the neighborhoods of 170 GWAS-established T2D loci were excluded, we detected 43 globally significant T2D pathways (with Bonferroni-corrected p-values < 0.05), including the insulin signaling pathway and the T2D pathway defined by KEGG, as well as pathways defined according to specific gene expression patterns in pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 of the 43 pathways identified in European populations remained significant in eastern Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.
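
    sARTP itself operates on summary statistics plus a reference panel; the underlying adaptive rank truncated product idea can be sketched with individual-level data and permutations. The sketch below is a simplification with simulated genotypes, not the authors' implementation.

      # Simplified ARTP sketch: combine the k smallest SNP p-values at several
      # truncation points and calibrate the minimum by permutation.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)
      n, m, ks = 500, 40, (1, 5, 10)
      geno = rng.binomial(2, 0.3, size=(n, m))
      pheno = 0.3 * geno[:, 0] + rng.normal(size=n)   # SNP 0 is associated

      def part_stats(y):
          p = np.sort([stats.pearsonr(geno[:, j], y)[1] for j in range(m)])
          return np.array([np.sum(np.log(p[:k])) for k in ks])  # smaller = stronger

      obs = part_stats(pheno)
      null = np.array([part_stats(rng.permutation(pheno)) for _ in range(500)])
      part_p = (null <= obs).mean(axis=0)              # p-value per truncation point
      # recalibrate min-p over truncation points against its own permutation null
      null_part_p = (null[:, None, :] <= null[None, :, :]).mean(axis=0)
      artp_p = (null_part_p.min(axis=1) <= part_p.min()).mean()
      print("ARTP pathway p-value:", artp_p)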

  7. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application

    PubMed Central

    Cantor, Rita M.; Lange, Kenneth; Sinsheimer, Janet S.

    2010-01-01

    Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS study can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach. PMID:20074509

  8. Population data of five genetic markers in the Turkish population: comparison with four American population groups.

    PubMed

    Kurtuluş-Ulküer, M; Ulküer, U; Kesici, T; Menevşe, S

    2002-09-01

    In this study, the phenotype and allele frequencies of five enzyme systems were determined in a total of 611 unrelated Turkish individuals and analyzed using the exact test and the chi-squared test. The following five red cell enzymes were identified by cellulose acetate electrophoresis: phosphoglucomutase (PGM), adenosine deaminase (ADA), phosphoglucose isomerase (PGI), adenylate kinase (AK), and 6-phosphogluconate dehydrogenase (6-PGD). The ADA, PGM, and AK enzymes were found to be polymorphic in the Turkish population. The results of the statistical analysis showed that the phenotype frequencies of the five enzymes under study are in Hardy-Weinberg equilibrium. Statistical analysis was performed to examine whether there are significant differences in phenotype frequencies between the Turkish population and four American population groups. This analysis showed that there are some statistically significant differences between the Turkish and the other groups. Moreover, the observed phenotype and allele frequencies were compared with those obtained in other population groups of Turkey.
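
    The Hardy-Weinberg check used here reduces to comparing observed genotype counts with those expected from the estimated allele frequency; a one-locus chi-squared sketch with hypothetical counts (not the published Turkish data) is:

      # One-locus Hardy-Weinberg equilibrium chi-squared test.
      import numpy as np
      from scipy.stats import chi2

      obs = np.array([180, 250, 100])              # hypothetical AA, Aa, aa counts
      n = obs.sum()
      p = (2 * obs[0] + obs[1]) / (2 * n)          # estimated frequency of allele A
      exp = n * np.array([p ** 2, 2 * p * (1 - p), (1 - p) ** 2])

      stat = np.sum((obs - exp) ** 2 / exp)
      pval = chi2.sf(stat, df=1)                   # 3 classes - 1 - 1 estimated allele
      print(f"chi-squared = {stat:.2f}, P = {pval:.3f}")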

  9. The mediating effect of calling on the relationship between medical school students' academic burnout and empathy.

    PubMed

    Chae, Su Jin; Jeong, So Mi; Chung, Yoon-Sok

    2017-09-01

    This study aimed to identify the relationships between medical school students' academic burnout, empathy, and calling, and to determine whether calling has a mediating effect on the relationship between academic burnout and empathy. A mixed-methods study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students' empathy, and calling were utilized. Correlation analysis, descriptive statistics, and hierarchical multiple regression analyses were conducted. For the qualitative component, eight medical students participated in a focus group interview. The study found that empathy has a statistically significant negative correlation with academic burnout and a significant positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. This result demonstrates that calling is a key variable that mediates the relationship between medical students' academic burnout and empathy. As such, this study provides baseline data for education that could improve medical students' empathy skills.

  10. Traumatic injury among drywall installers, 1992 to 1995.

    PubMed

    Chiou, S S; Pan, C S; Keane, P

    2000-11-01

    This study examined the traumatic-injury characteristics associated with one of the high-risk occupations in the construction industry--drywall installers--through an analysis of the traumatic-injury data obtained from the Bureau of Labor Statistics. An additional objective was to demonstrate a feasible and economic approach to identify risk factors associated with a specific occupation by using an existing database. An analysis of nonfatal traumatic injuries with days away from work among wage-and-salary drywall installers was performed for 1992 through 1995 using the Occupational Injury and Illness Survey conducted by the Bureau of Labor Statistics. Results from this study indicate that drywall installers are at a high risk of overexertion and falls to a lower level. More than 40% of the injured drywall installers suffered sprains, strains, and/or tears. The most frequently injured body part was the trunk. More than one-third of the trunk injuries occurred while handling solid building materials, mainly drywall. In addition, the database analysis used in this study is valid in identifying overall risk factors for specific occupations.

  11. How Will DSM-5 Affect Autism Diagnosis? A Systematic Literature Review and Meta-Analysis

    ERIC Educational Resources Information Center

    Kulage, Kristine M.; Smaldone, Arlene M.; Cohn, Elizabeth G.

    2014-01-01

    We conducted a systematic review and meta-analysis to determine the effect of changes to the Diagnostic and Statistical Manual (DSM)-5 on autism spectrum disorder (ASD) and explore policy implications. We identified 418 studies; 14 met inclusion criteria. Studies consistently reported decreases in ASD diagnosis (range 7.3-68.4%) using DSM-5…

  12. A Secondary Analysis of the Impact of School Management Practices on School Performance

    ERIC Educational Resources Information Center

    Talbert, Dale A.

    2009-01-01

    The purpose of this study was to conduct a secondary analysis of the impact of school management practices on school performance utilizing a survey design of School and Staffing (SASS) data collected by the National Center for Education Statistics (NCES) of the U.S. Department of Education, 1999-2000. The study identifies those school management…

  13. Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…

  14. Analysis of Parasite and Other Skewed Counts

    PubMed Central

    Alexander, Neal

    2012-01-01

    Objective: To review methods for the statistical analysis of parasite and other skewed count data. Methods: Statistical methods for skewed count data are described and compared, with reference to those used over a ten-year period of Tropical Medicine and International Health. Two parasitological datasets are used for illustration. Results: Ninety papers were identified, 89 with descriptive and 60 with inferential analysis. A lack of clarity is noted in identifying measures of location, in particular the Williams and geometric mean. The different measures are compared, emphasizing the legitimacy of the arithmetic mean for skewed data. In the published papers, the t test and related methods were often used on untransformed data, which is likely to be invalid. Several approaches to inferential analysis are described, emphasizing 1) non-parametric methods, while noting that they are not simply comparisons of medians, and 2) generalized linear modelling, in particular with the negative binomial distribution. Additional methods, such as the bootstrap, with potential for greater use are described. Conclusions: Clarity is recommended when describing transformations and measures of location. It is suggested that non-parametric methods and generalized linear models are likely to be sufficient for most analyses. PMID:22943299
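
    For the negative binomial modelling recommended above, a minimal sketch (simulated counts and a single hypothetical treatment covariate) is:

      # Negative binomial GLM for skewed count data; simulated egg counts.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(5)
      n, r = 200, 1.2                               # r = dispersion parameter
      treated = rng.integers(0, 2, n)
      mu = np.exp(3.0 - 0.7 * treated)              # treatment lowers mean count
      counts = rng.negative_binomial(r, r / (r + mu))

      X = sm.add_constant(treated.astype(float))
      fit = sm.GLM(counts, X, family=sm.families.NegativeBinomial(alpha=1 / r)).fit()
      print(fit.params)                             # intercept and treatment effect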

  15. Characterizing and locating air pollution sources in a complex industrial district using optical remote sensing technology and multivariate statistical modeling.

    PubMed

    Chang, Pao-Erh Paul; Yang, Jen-Chih Rena; Den, Walter; Wu, Chang-Fu

    2014-09-01

    Emissions of volatile organic compounds (VOCs) are among the most frequent causes of environmental nuisance complaints in urban areas, especially where industrial districts are nearby. Unfortunately, identifying the emission sources responsible for VOCs is inherently difficult. In this study, we proposed a dynamic approach to gradually confine the location of potential VOC emission sources in an industrial complex by combining multi-path open-path Fourier transform infrared spectrometry (OP-FTIR) measurement and the statistical method of principal component analysis (PCA). Closed-cell FTIR was further used to verify the VOC emission sources by measuring emitted VOCs from selected exhaust stacks at factories in the confined areas. Multiple open-path monitoring lines were deployed during a 3-month monitoring campaign in a complex industrial district. The emission patterns were identified and the locations of emissions were confined using wind data collected simultaneously. N,N-dimethylformamide (DMF), 2-butanone, toluene, and ethyl acetate, with mean concentrations of 80.0 ± 1.8, 34.5 ± 0.8, 103.7 ± 2.8, and 26.6 ± 0.7 ppbv, respectively, were identified as the major VOC mixture at all times of the day around the receptor site. The concentrations of the toxic air pollutant DMF in air samples were found to exceed the ambient standard despite the path-averaging effect of OP-FTIR on concentration levels. The PCA identified three major emission sources: the PU coating, chemical packaging, and lithographic printing industries. Applying instrumental measurement and statistical modeling, this study has established a systematic approach for locating emission sources. Statistical modeling (PCA) plays an important role in reducing the dimensionality of a large measured dataset and identifying underlying emission sources; instrumental measurement helps verify the outcomes of the statistical modeling. The field study has demonstrated the feasibility of using multi-path OP-FTIR measurement, with wind data incorporated into the statistical modeling (PCA), to identify the major emission sources in a complex industrial district.
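
    A sketch of the PCA step, extracting dominant co-varying VOC factors from a (time x compound) concentration matrix, is shown below; the concentrations are simulated stand-ins for the OP-FTIR measurements.

      # PCA on a time-by-compound concentration matrix to suggest groups of
      # co-emitted VOCs; two hidden sources are simulated.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(6)
      hours = 720
      sources = rng.lognormal(size=(hours, 2))       # two hidden emitters
      mixing = np.array([[1.0, 0.1],                 # compound emission profiles
                         [0.8, 0.2],
                         [0.1, 1.0],
                         [0.2, 0.9]])
      conc = sources @ mixing.T + 0.1 * rng.normal(size=(hours, 4))

      pca = PCA(n_components=2)
      pca.fit(StandardScaler().fit_transform(conc))
      print("explained variance ratio:", pca.explained_variance_ratio_)
      print("loadings (compound x component):\n", pca.components_.T)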

  16. SigTree: A Microbial Community Analysis Tool to Identify and Visualize Significantly Responsive Branches in a Phylogenetic Tree.

    PubMed

    Stevens, John R; Jones, Todd R; Lefevre, Michael; Ganesan, Balasubramanian; Weimer, Bart C

    2017-01-01

    Microbial community analysis experiments to assess the effect of a treatment intervention (or environmental change) on the relative abundance levels of multiple related microbial species (or operational taxonomic units) simultaneously using high throughput genomics are becoming increasingly common. Within the framework of the evolutionary phylogeny of all species considered in the experiment, this translates to a statistical need to identify the phylogenetic branches that exhibit a significant consensus response (in terms of operational taxonomic unit abundance) to the intervention. We present the R software package SigTree, a collection of flexible tools that make use of meta-analysis methods and regular expressions to identify and visualize significantly responsive branches in a phylogenetic tree, while appropriately adjusting for multiple comparisons.

  17. An evaluation of intraoperative and postoperative outcomes of torsional mode versus longitudinal ultrasound mode phacoemulsification: a Meta-analysis.

    PubMed

    Leon, Pia; Umari, Ingrid; Mangogna, Alessandro; Zanei, Andrea; Tognetto, Daniele

    2016-01-01

    To evaluate and compare the intraoperative parameters and postoperative outcomes of the torsional and longitudinal modes of phacoemulsification. Pertinent studies were identified by a computerized MEDLINE search from January 2002 to September 2013. The meta-analysis is composed of two parts. In the first part, the intraoperative parameters were considered: ultrasound time (UST) and cumulative dissipated energy (CDE). The intraoperative values were also considered separately for two categories (moderate and hard cataract groups) depending on the nuclear opacity grade. In the second part of the study, postoperative outcomes such as best corrected visual acuity (BCVA) and endothelial cell loss (ECL) were taken into consideration. The UST and CDE values were statistically significantly in favour of the torsional mode for both the moderate and hard cataract groups. The analysis of BCVA did not show a statistically significant difference between the two surgical modalities. The ECL count was statistically significantly in favour of the torsional mode (P<0.001). The meta-analysis shows the superiority of the torsional mode for intraoperative parameters (UST, CDE) and the postoperative ECL outcome.

  19. Incorporating Information of microRNAs into Pathway Analysis in a Genome-Wide Association Study of Bipolar Disorder

    PubMed Central

    Shih, Wei-Liang; Kao, Chung-Feng; Chuang, Li-Chung; Kuo, Po-Hsiu

    2012-01-01

    MicroRNAs (miRNAs) are known to be important post-transcriptional regulators that are involved in the etiology of complex psychiatric traits. The present study aimed to incorporate miRNA information into pathway analysis using a genome-wide association dataset to identify relevant biological pathways for bipolar disorder (BPD). We selected psychiatric- and neurological-associated miRNAs (N = 157) from the PhenomiR database. The miRNA target gene (miTG) predictions were obtained from microRNA.org. Canonical pathways (N = 4,051) were downloaded from the Molecular Signatures Database. We employed a novel weighting scheme for miTGs in pathway analysis using gene set enrichment analysis and sum-statistic methods. Under four statistical scenarios, 38 significantly enriched pathways (P-value < 0.01 after multiple testing correction) were identified for the risk of developing BPD, including ion channel-associated pathways (e.g., gated channel activity, ion transmembrane transporter activity, and ion channel activity) and nervous system-related biological processes (e.g., nervous system development, cytoskeleton, and neuroactive ligand receptor interaction). Among them, 19 were identified only when the weighting scheme was applied. Many miRNA-targeted genes were functionally related to ion channels, collagen, and axonal growth and guidance, which have previously been suggested to be associated with BPD. Some of these genes are linked to the regulation of miRNA machinery in the literature. Our findings provide support for the potential involvement of miRNAs in the psychopathology of BPD. Further investigations to elucidate the functions and mechanisms of identified candidate pathways are needed. PMID:23264780

  20. Faith-adapted psychological therapies for depression and anxiety: Systematic review and meta-analysis.

    PubMed

    Anderson, Naomi; Heywood-Everett, Suzanne; Siddiqi, Najma; Wright, Judy; Meredith, Jodi; McMillan, Dean

    2015-05-01

    Incorporating faith (religious or spiritual) perspectives into psychological treatments has attracted significant interest in recent years. However, previous suggestions that good psychiatric care should include spiritual components have provoked controversy. To address ongoing uncertainty in this field, we present a systematic review and meta-analysis assessing the efficacy of faith-based adaptations of bona fide psychological therapies for depression or anxiety. A systematic review and meta-analysis of randomised controlled trials were performed. The literature search yielded 2274 citations, of which 16 studies were eligible for inclusion. All studies used cognitive or cognitive behavioural models as the basis for their faith-adapted treatment (F-CBT). We identified statistically significant benefits of using F-CBT. However, quality assessment using the Cochrane risk of bias tool revealed methodological limitations that reduce the apparent strength of these findings. Whilst the effect sizes identified here were statistically significant, relatively few relevant RCTs were available, and those included were typically small and susceptible to significant biases. Biases associated with researcher or therapist allegiance were identified as a particular concern. Despite some suggestion that faith-adapted CBT may out-perform both standard CBT and control conditions (waiting list or "treatment as usual"), the effect sizes identified in this meta-analysis must be considered in the light of the substantial methodological limitations that affect the primary research data. Before firm recommendations about the value of faith-adapted treatments can be made, further large-scale, rigorously performed trials are required.

  1. Evaluation of the Feasibility of Screening Patients for Early Signs of Lung Carcinoma in Web Search Logs.

    PubMed

    White, Ryen W; Horvitz, Eric

    2017-03-01

    We developed a statistical model that predicts the appearance of strong evidence of a lung carcinoma diagnosis via analysis of large-scale anonymized logs of web search queries from millions of people across the United States, with the objective of evaluating the feasibility of screening patients at risk of lung carcinoma via signals from online search activity. We identified people who issue special queries that provide strong evidence of a recent diagnosis of lung carcinoma. We then considered patterns of symptoms expressed as searches about concerning symptoms over several months prior to the appearance of the landmark web queries. We built statistical classifiers that predict the future appearance of landmark queries based on the search log signals. This was a retrospective log analysis of the online activity of millions of web searchers seeking health-related information online. Of web searchers who queried for symptoms related to lung carcinoma, some (n = 5443 of 4 813 985) later issued queries that provide strong evidence of recent clinical diagnosis of lung carcinoma and are regarded as positive cases in our analysis. Additional evidence on the reliability of these queries as representing clinical diagnoses is based on the significant increase in follow-on searches for treatments and medications among these searchers and on the correlation between lung carcinoma incidence rates and our log-based statistics. The remaining symptom searchers (n = 4 808 542) are regarded as negative cases. The main outcome was the performance of the statistical model for early detection from online search behavior, for different lead times, different sets of signals, and different cohorts of searchers stratified by potential risk. The statistical classifier predicting the future appearance of landmark web queries based on search log signals identified searchers who later input queries consistent with a lung carcinoma diagnosis, with a true-positive rate ranging from 3% to 57% for false-positive rates ranging from 0.00001 to 0.001, respectively. The methods can be used to identify people at highest risk up to a year in advance of the inferred diagnosis time. The 5 factors associated with the highest relative risk (RR) were evidence of family history (RR = 7.548; 95% CI, 3.937-14.470), age (RR = 3.558; 95% CI, 3.357-3.772), radon (RR = 2.529; 95% CI, 1.137-5.624), primary location (RR = 2.463; 95% CI, 1.364-4.446), and occupation (RR = 1.969; 95% CI, 1.143-3.391). Evidence of smoking (RR = 1.646; 95% CI, 1.032-2.260) was important but not top-ranked, owing to the difficulty of identifying smoking history from search terms. Pattern recognition based on data drawn from large-scale web search queries holds opportunity for identifying risk factors and frames new directions for early detection of lung carcinoma.
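
    The reported operating points (true-positive rate at false-positive rates of 0.00001 to 0.001) amount to reading off the extreme low-FPR region of an ROC curve. A generic sketch with simulated classifier scores, not the search-log model, is:

      # True-positive rate at very low false-positive rates from an ROC curve.
      import numpy as np
      from sklearn.metrics import roc_curve

      rng = np.random.default_rng(7)
      y = np.r_[np.ones(5000), np.zeros(50000)]       # rare positive class
      scores = np.r_[rng.normal(2.0, 1.0, 5000),      # simulated model scores
                     rng.normal(0.0, 1.0, 50000)]

      fpr, tpr, _ = roc_curve(y, scores)
      for target in (1e-4, 1e-3, 1e-2):
          print(f"TPR at FPR <= {target:g}: {tpr[fpr <= target].max():.2f}")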

  2. Statistical Approaches Used to Assess the Equity of Access to Food Outlets: A Systematic Review

    PubMed Central

    Lamb, Karen E.; Thornton, Lukar E.; Cerin, Ester; Ball, Kylie

    2015-01-01

    Background: Inequalities in eating behaviours are often linked to the types of food retailers accessible in neighbourhood environments. Numerous studies have aimed to identify if access to healthy and unhealthy food retailers is socioeconomically patterned across neighbourhoods, and thus a potential risk factor for dietary inequalities. Existing reviews have examined differences between methodologies, particularly focussing on neighbourhood and food outlet access measure definitions. However, no review has informatively discussed the suitability of the statistical methodologies employed; a key issue determining the validity of study findings. Our aim was to examine the suitability of statistical approaches adopted in these analyses. Methods: Searches were conducted for articles published from 2000-2014. Eligible studies included objective measures of the neighbourhood food environment and neighbourhood-level socio-economic status, with a statistical analysis of the association between food outlet access and socio-economic status. Results: Fifty-four papers were included. Outlet accessibility was typically defined as the distance to the nearest outlet from the neighbourhood centroid, or as the number of food outlets within a neighbourhood (or buffer). To assess if these measures were linked to neighbourhood disadvantage, common statistical methods included ANOVA, correlation, and Poisson or negative binomial regression. Although all studies involved spatial data, few considered spatial analysis techniques or spatial autocorrelation. Conclusions: With advances in GIS software, sophisticated measures of neighbourhood outlet accessibility can be considered. However, approaches to statistical analysis often appear less sophisticated. Care should be taken to consider assumptions underlying the analysis and the possibility of spatially correlated residuals which could affect the results. PMID:29546115
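
    One of the spatial checks the review finds lacking, testing values or residuals for spatial autocorrelation, can be sketched with Moran's I computed directly from a k-nearest-neighbour weight matrix. Coordinates and values below are synthetic.

      # Moran's I for spatial autocorrelation with a row-standardized
      # 5-nearest-neighbour weight matrix; synthetic neighbourhood data.
      import numpy as np

      rng = np.random.default_rng(8)
      n = 200
      xy = rng.uniform(size=(n, 2))                  # neighbourhood centroids
      z = xy[:, 0] + 0.3 * rng.normal(size=n)        # spatially patterned values

      d = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
      W = np.zeros((n, n))
      for i in range(n):
          W[i, np.argsort(d[i])[1:6]] = 1.0          # 5 nearest neighbours
      W /= W.sum(axis=1, keepdims=True)

      zc = z - z.mean()
      I = (n / W.sum()) * (zc @ W @ zc) / (zc @ zc)
      print(f"Moran's I = {I:.3f} (null expectation {-1 / (n - 1):.4f})")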

  3. Evaluating the Effects of Heavy Sugarcane Truck Operations on Repair Cost of Low Volume Highways.

    DOT National Transportation Integrated Search

    2008-11-01

    This study assesses the economic impact of overweight permitted vehicles hauling sugarcane on Louisiana highways. The highway routes being used to haul these commodities were identified, and statistically selected samples were used in the analysi...

  4. Statistical model specification and power: recommendations on the use of test-qualified pooling in analysis of experimental data

    PubMed Central

    Colegrave, Nick

    2017-01-01

    A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term into the error term used to test hypotheses (or estimate effect sizes). This pooling is carried out only if statistical testing of a previous, more complicated model fitted to the data motivates the simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and the reliability of the statistical procedures is likely to be much reduced, with type I error rates deviating from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure. PMID:28330912
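
    The paper's type I error concern can be checked directly by simulation. The sketch below, with all effects null and an assumed pooling rule of alpha = 0.25 for the interaction, estimates the realized size of the test for one main effect after test-qualified pooling.

      # Empirical type I error for a main effect when the interaction term is
      # pooled into error whenever its p-value exceeds 0.25 (all effects null).
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf
      from statsmodels.stats.anova import anova_lm

      rng = np.random.default_rng(9)
      n_sims, n_rep, rejections = 2000, 3, 0

      for _ in range(n_sims):
          df = pd.DataFrame({
              "a": np.repeat(["a1", "a2"], 2 * n_rep),
              "b": np.tile(np.repeat(["b1", "b2"], n_rep), 2),
              "y": rng.normal(size=4 * n_rep),       # null: no real effects
          })
          tab = anova_lm(smf.ols("y ~ a * b", data=df).fit())
          if tab.loc["a:b", "PR(>F)"] > 0.25:        # test-qualified pooling rule
              tab = anova_lm(smf.ols("y ~ a + b", data=df).fit())
          rejections += tab.loc["a", "PR(>F)"] < 0.05

      print("realized size for factor a:", rejections / n_sims)  # nominal 0.05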

  5. Estimation of trends

    NASA Technical Reports Server (NTRS)

    1981-01-01

    This report concerns the application of statistical methods to recorded ozone measurements. Long-term depletion of ozone at the magnitudes predicted by the NAS would be harmful to most forms of life. Empirical prewhitening filters, whose derivation is independent of the underlying physical mechanisms, were analyzed. Statistical analysis provides a check on such results: time-series filtering separates variation into systematic and random parts, ensures errors are uncorrelated, and identifies significant phase-lag dependencies. The use of time-series modeling to enhance the capability of detecting trends is discussed.

  6. Statistics, Uncertainty, and Transmitted Variation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendelberger, Joanne Roth

    2014-11-05

    The field of Statistics provides methods for modeling and understanding data and making decisions in the presence of uncertainty. When examining response functions, variation present in the input variables will be transmitted via the response function to the output variables. This phenomenon can potentially have significant impacts on the uncertainty associated with results from subsequent analysis. This presentation will examine the concept of transmitted variation, its impact on designed experiments, and a method for identifying and estimating sources of transmitted variation in certain settings.
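
    To first order, transmitted variation follows the delta method, Var[f(X)] ≈ f'(μ)² Var[X]. The sketch below compares that approximation with Monte Carlo for an assumed nonlinear response function.

      # Delta-method (first-order) transmitted variance vs. Monte Carlo,
      # for an assumed response f(x) = exp(0.3 x) with X ~ N(mu, sigma^2).
      import numpy as np

      rng = np.random.default_rng(10)
      mu, sigma = 2.0, 0.5

      f = lambda x: np.exp(0.3 * x)
      fprime = lambda x: 0.3 * np.exp(0.3 * x)

      analytic = fprime(mu) ** 2 * sigma ** 2        # delta-method variance
      monte_carlo = f(rng.normal(mu, sigma, 1_000_000)).var()
      print(f"delta method: {analytic:.4f}, Monte Carlo: {monte_carlo:.4f}")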

  7. Archival Legacy Investigations of Circumstellar Environments (ALICE): Statistical assessment of point source detections

    NASA Astrophysics Data System (ADS)

    Choquet, Élodie; Pueyo, Laurent; Soummer, Rémi; Perrin, Marshall D.; Hagan, J. Brendan; Gofas-Salas, Elena; Rajan, Abhijith; Aguilar, Jonathan

    2015-09-01

    The ALICE program, for Archival Legacy Investigation of Circumstellar Environment, is currently conducting a virtual survey of about 400 stars by re-analyzing the HST-NICMOS coronagraphic archive with advanced post-processing techniques. We present here the strategy that we adopted to identify detections and potential candidates for follow-up observations, and we give a preliminary overview of our detections. We present a statistical analysis conducted to evaluate the confidence level of these detections and the completeness of our candidate search.

  8. Differences in game-related statistics of basketball performance by game location for men's winning and losing teams.

    PubMed

    Gómez, Miguel A; Lorenzo, Alberto; Barakat, Rubén; Ortega, Enrique; Palao, José M

    2008-02-01

    The aim of the present study was to identify game-related statistics that differentiate winning and losing teams according to game location. The sample included 306 games of the 2004-2005 regular season of the Spanish professional men's league (ACB League). The independent variables were game location (home or away) and game result (win or loss). The game-related statistics registered were free throws (successful and unsuccessful), 2- and 3-point field goals (successful and unsuccessful), offensive and defensive rebounds, blocks, assists, fouls, steals, and turnovers. Descriptive and inferential analyses were performed (one-way analysis of variance and discriminant analysis). The multivariate analysis showed that winning teams differ from losing teams in defensive rebounds (SC = .42) and in assists (SC = .38). Similarly, winning teams differ from losing teams when they play at home in defensive rebounds (SC = .40) and in assists (SC = .41). On the other hand, winning teams differ from losing teams when they play away in defensive rebounds (SC = .44), assists (SC = .30), successful 2-point field goals (SC = .31), and unsuccessful 3-point field goals (SC = -.35). Defensive rebounds and assists were the only game-related statistics common to all three analyses.

  9. Differential protein-coding gene and long noncoding RNA expression in smoking-related lung squamous cell carcinoma.

    PubMed

    Li, Shicheng; Sun, Xiao; Miao, Shuncheng; Liu, Jia; Jiao, Wenjie

    2017-11-01

    Cigarette smoking is one of the greatest preventable risk factors for developing cancer, and most cases of lung squamous cell carcinoma (lung SCC) are associated with smoking, yet the pathogenic mechanisms underlying tumor progression remain unclear. This study aimed to identify biomarkers in smoking-related lung cancer, including protein-coding genes, long noncoding RNAs, and transcription factors. We obtained messenger RNA microarray datasets and clinical data from the Gene Expression Omnibus database to identify gene expression altered by cigarette smoking. Integrated bioinformatic analysis was used to clarify the biological functions of the identified genes, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses, construction of a protein-protein interaction network, transcription factor prediction, and statistical analyses. Subsequent quantitative real-time PCR was utilized to verify these bioinformatic analyses. Five hundred and ninety-eight differentially expressed genes and 21 long noncoding RNAs were identified in smoking-related lung SCC. GO and KEGG pathway analysis showed that the identified genes were enriched in cancer-related functions and pathways. The protein-protein interaction network revealed seven hub genes in lung SCC. Several transcription factors and their binding sites were predicted. The results of real-time quantitative PCR revealed that AURKA and BIRC5 were significantly upregulated and LINC00094 was downregulated in the tumor tissues of smoking patients. Further statistical analysis indicated that dysregulation of AURKA, BIRC5, and LINC00094 indicated poor prognosis in lung SCC. The protein-coding genes AURKA and BIRC5 and the long noncoding RNA LINC00094 could be biomarkers or therapeutic targets for smoking-related lung SCC.

  10. Statistical Analysis of the Links between Blocking and Nor'easters

    NASA Astrophysics Data System (ADS)

    Booth, J. F.; Pfahl, S.

    2015-12-01

    Nor'easters can be loosely defined as extratropical cyclones that develop as they progress northward along the eastern coast of North America. This path makes it possible for these storms to generate storm surge along the coastline and/or heavy precipitation or snow inland. In the present analysis, the paths of the storms are investigated relative to the behavior of upstream blocking events over the North Atlantic Ocean. Two separate Lagrangian tracking methods are used to identify the extratropical cyclone paths and the blocking events. Using the cyclone paths, Nor'easters are identified and blocking statistics are calculated for the days prior to, during, and following the occurrence of the Nor'easters. The path, strength, and intensification rates of the cyclones are compared with the strength and location of the blocks. When a Nor'easter occurs, the likelihood of the presence of a block at the southeast tip of Greenland is statistically significantly increased, i.e., a block concurrent with a Nor'easter happens more often than by random coincidence. However, no significant link between the strength of the storms and the strength of the block is identified. These results suggest that the presence of the block mainly affects the path of the Nor'easters. On the other hand, in the event of blocking at the southeast tip of Greenland, the likelihood of a Nor'easter, as opposed to a different type of storm, is no greater than what one might expect from randomly sampling cyclone tracks. The results confirm a long-held understanding in forecast meteorology that upstream blocking is a necessary but not sufficient condition for generating a Nor'easter.

  11. An expert panel-based study on recognition of gastro-esophageal reflux in difficult esophageal pH-impedance tracings.

    PubMed

    Smits, M J; Loots, C M; van Wijk, M P; Bredenoord, A J; Benninga, M A; Smout, A J P M

    2015-05-01

    Despite existing criteria for scoring gastro-esophageal reflux (GER) in esophageal multichannel pH-impedance (pH-I) tracings, inter- and intra-rater variability is large and agreement with automated analysis is poor. We aimed to identify parameters of difficult-to-analyze pH-I patterns and combine these into a statistical model that can identify GER episodes, with an international consensus as the gold standard. Twenty-one experts from 10 countries were asked to mark the presence of GER for adult and pediatric pH-I patterns in an online pre-assessment. During a consensus meeting, experts voted on patterns not reaching majority consensus (>70% agreement). Agreement was calculated between raters, between consensus and individual raters, and between consensus and software-generated automated analysis. With eight selected parameters, multiple logistic regression analysis was performed to derive an algorithm sensitive and specific for the detection of GER. Majority consensus was reached for 35/79 episodes in the online pre-assessment (interrater κ = 0.332). Mean agreement between pre-assessment scores and final consensus was moderate (κ = 0.466). Combining eight pH-I parameters did not result in a statistically significant model able to identify the presence of GER. Recognizing a pattern as retrograde is the best indicator of GER, with 100% sensitivity and 81% specificity against expert consensus as the gold standard. Agreement between experts scoring difficult impedance patterns for the presence or absence of GER is poor. Combining several characteristics into a statistical model did not improve diagnostic accuracy. Only the parameter 'retrograde propagation pattern' is an indicator of GER in difficult pH-I patterns.

  12. Pooled Genome-Wide Analysis to Identify Novel Risk Loci for Pediatric Allergic Asthma

    PubMed Central

    Ricci, Giampaolo; Astolfi, Annalisa; Remondini, Daniel; Cipriani, Francesca; Formica, Serena; Dondi, Arianna; Pession, Andrea

    2011-01-01

    Background: Genome-wide association studies of pooled DNA samples have been shown to be a valuable tool for identifying candidate SNPs associated with a phenotype. No such study had previously been applied to childhood allergic asthma, even though the very high complexity of asthma genetics makes it an appropriate field in which to explore the potential of the pooled GWAS approach. Methodology/Principal Findings: We performed a pooled GWAS and individual genotyping in 269 children with allergic respiratory diseases, comparing allergic children with and without asthma. We used a modular approach to identify the most significant loci associated with asthma by combining silhouette statistics and a physical-distance method with cluster-adapted thresholding. We found 97% concordance between pooled GWAS and individual genotyping, with 36 out of 37 top-scoring SNPs significant at the individual genotyping level. The most significant SNP is located inside the coding sequence of C5, an already identified asthma susceptibility gene, while the other loci regulate functions that are relevant to bronchial physiopathology, such as immune- or inflammation-mediated mechanisms and airway smooth muscle contraction. Integration with gene expression data showed that almost half of the putative susceptibility genes are differentially expressed in experimental asthma mouse models. Conclusion/Significance: Combined silhouette statistics and cluster-adapted physical-distance threshold analysis of pooled GWAS data is an efficient method to identify candidate SNPs associated with asthma development in an allergic pediatric population. PMID:21359210

  13. Quantitative trait nucleotide analysis using Bayesian model selection.

    PubMed

    Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D

    2005-10-01

    Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
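
    The Bayesian model selection and averaging step can be crudely approximated with BIC weights: enumerate variant subsets, weight each model by exp(-BIC/2), and sum weights over models containing each variant. The sketch below uses simulated unrelated individuals, a simplified stand-in for the full BQTN machinery in SOLAR.

      # BIC-weight approximation to posterior probabilities of effect for
      # each variant; simulated genotypes and phenotype, not GAW12 data.
      import itertools
      import numpy as np

      rng = np.random.default_rng(11)
      n, m = 400, 5
      G = rng.binomial(2, 0.4, size=(n, m)).astype(float)
      y = 0.4 * G[:, 2] + rng.normal(size=n)         # variant 2 is functional

      def bic(cols):
          X = np.column_stack([np.ones(n)] + [G[:, j] for j in cols])
          beta = np.linalg.lstsq(X, y, rcond=None)[0]
          rss = np.sum((y - X @ beta) ** 2)
          return n * np.log(rss / n) + X.shape[1] * np.log(n)

      models = [c for k in range(m + 1)
                for c in itertools.combinations(range(m), k)]
      bics = np.array([bic(c) for c in models])
      w = np.exp(-0.5 * (bics - bics.min()))
      w /= w.sum()
      post = [sum(wi for wi, c in zip(w, models) if j in c) for j in range(m)]
      print("posterior probability of effect per variant:", np.round(post, 3))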

  14. Genome-wide association study identifies multiple loci associated with bladder cancer risk

    PubMed Central

    Figueroa, Jonine D.; Ye, Yuanqing; Siddiq, Afshan; Garcia-Closas, Montserrat; Chatterjee, Nilanjan; Prokunina-Olsson, Ludmila; Cortessis, Victoria K.; Kooperberg, Charles; Cussenot, Olivier; Benhamou, Simone; Prescott, Jennifer; Porru, Stefano; Dinney, Colin P.; Malats, Núria; Baris, Dalsu; Purdue, Mark; Jacobs, Eric J.; Albanes, Demetrius; Wang, Zhaoming; Deng, Xiang; Chung, Charles C.; Tang, Wei; Bas Bueno-de-Mesquita, H.; Trichopoulos, Dimitrios; Ljungberg, Börje; Clavel-Chapelon, Françoise; Weiderpass, Elisabete; Krogh, Vittorio; Dorronsoro, Miren; Travis, Ruth; Tjønneland, Anne; Brenan, Paul; Chang-Claude, Jenny; Riboli, Elio; Conti, David; Gago-Dominguez, Manuela; Stern, Mariana C.; Pike, Malcolm C.; Van Den Berg, David; Yuan, Jian-Min; Hohensee, Chancellor; Rodabough, Rebecca; Cancel-Tassin, Geraldine; Roupret, Morgan; Comperat, Eva; Chen, Constance; De Vivo, Immaculata; Giovannucci, Edward; Hunter, David J.; Kraft, Peter; Lindstrom, Sara; Carta, Angela; Pavanello, Sofia; Arici, Cecilia; Mastrangelo, Giuseppe; Kamat, Ashish M.; Lerner, Seth P.; Barton Grossman, H.; Lin, Jie; Gu, Jian; Pu, Xia; Hutchinson, Amy; Burdette, Laurie; Wheeler, William; Kogevinas, Manolis; Tardón, Adonina; Serra, Consol; Carrato, Alfredo; García-Closas, Reina; Lloreta, Josep; Schwenn, Molly; Karagas, Margaret R.; Johnson, Alison; Schned, Alan; Armenti, Karla R.; Hosain, G.M.; Andriole, Gerald; Grubb, Robert; Black, Amanda; Ryan Diver, W.; Gapstur, Susan M.; Weinstein, Stephanie J.; Virtamo, Jarmo; Haiman, Chris A.; Landi, Maria T.; Caporaso, Neil; Fraumeni, Joseph F.; Vineis, Paolo; Wu, Xifeng; Silverman, Debra T.; Chanock, Stephen; Rothman, Nathaniel

    2014-01-01

    Candidate gene and genome-wide association studies (GWAS) have identified 11 independent susceptibility loci associated with bladder cancer risk. To discover additional risk variants, we conducted a new GWAS of 2422 bladder cancer cases and 5751 controls, followed by a meta-analysis with two independently published bladder cancer GWAS, resulting in a combined analysis of 6911 cases and 11 814 controls of European descent. TaqMan genotyping of 13 promising single nucleotide polymorphisms with P < 1 × 10−5 was pursued in a follow-up set of 801 cases and 1307 controls. Two new loci achieved genome-wide statistical significance: rs10936599 on 3q26.2 (P = 4.53 × 10−9) and rs907611 on 11p15.5 (P = 4.11 × 10−8). Two notable loci were also identified that approached genome-wide statistical significance: rs6104690 on 20p12.2 (P = 7.13 × 10−7) and rs4510656 on 6p22.3 (P = 6.98 × 10−7); these require further studies for confirmation. In conclusion, our study has identified new susceptibility alleles for bladder cancer risk that require fine-mapping and laboratory investigation, which could further our understanding of the biological underpinnings of bladder carcinogenesis. PMID:24163127

  15. Do climate extreme events foster violent civil conflicts? A coincidence analysis

    NASA Astrophysics Data System (ADS)

    Schleussner, Carl-Friedrich; Donges, Jonathan F.; Donner, Reik V.

    2014-05-01

    Civil conflicts promoted by adverse environmental conditions represent one of the most important potential feedbacks in the global socio-environmental nexus. While the role of climate extremes as a triggering factor is often discussed, no consensus has yet been reached about the cause-and-effect relation in the observed data record. Here we present results of a rigorous statistical coincidence analysis based on the Munich Re Inc. extreme events database and the Uppsala Conflict Data Program. We report evidence for statistically significant synchronicity between climate extremes with high economic impact and violent conflicts for various regions, although no coherent global signal emerges from our analysis. Our results indicate the importance of regional vulnerability and may help to identify hot-spot regions for potential climate-triggered violent social conflicts.
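
    Event coincidence analysis of this kind can be sketched as counting conflict onsets preceded within a tolerance window by a climate extreme and comparing the count against shuffled surrogates. Both event series below are synthetic.

      # Event coincidence sketch: observed precursor-coincidence rate vs. a
      # surrogate null built by redrawing the extreme-event series at random.
      import numpy as np

      rng = np.random.default_rng(12)
      months, window = 600, 3
      extremes = np.sort(rng.choice(months, 40, replace=False))
      conflicts = np.sort(rng.choice(months, 25, replace=False))

      def coincidence_rate(a, b):
          """Fraction of events in b preceded by an event in a within window."""
          return np.mean([np.any((t - a >= 0) & (t - a <= window)) for t in b])

      obs = coincidence_rate(extremes, conflicts)
      null = np.array([coincidence_rate(
          np.sort(rng.choice(months, 40, replace=False)), conflicts)
          for _ in range(2000)])
      print(f"observed rate {obs:.2f}, P = {(null >= obs).mean():.3f}")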

  16. Cosmology constraints from shear peak statistics in Dark Energy Survey Science Verification data

    NASA Astrophysics Data System (ADS)

    Kacprzak, T.; Kirk, D.; Friedrich, O.; Amara, A.; Refregier, A.; Marian, L.; Dietrich, J. P.; Suchyta, E.; Aleksić, J.; Bacon, D.; Becker, M. R.; Bonnett, C.; Bridle, S. L.; Chang, C.; Eifler, T. F.; Hartley, W. G.; Huff, E. M.; Krause, E.; MacCrann, N.; Melchior, P.; Nicola, A.; Samuroff, S.; Sheldon, E.; Troxel, M. A.; Weller, J.; Zuntz, J.; Abbott, T. M. C.; Abdalla, F. B.; Armstrong, R.; Benoit-Lévy, A.; Bernstein, G. M.; Bernstein, R. A.; Bertin, E.; Brooks, D.; Burke, D. L.; Carnero Rosell, A.; Carrasco Kind, M.; Carretero, J.; Castander, F. J.; Crocce, M.; D'Andrea, C. B.; da Costa, L. N.; Desai, S.; Diehl, H. T.; Evrard, A. E.; Neto, A. Fausti; Flaugher, B.; Fosalba, P.; Frieman, J.; Gerdes, D. W.; Goldstein, D. A.; Gruen, D.; Gruendl, R. A.; Gutierrez, G.; Honscheid, K.; Jain, B.; James, D. J.; Jarvis, M.; Kuehn, K.; Kuropatkin, N.; Lahav, O.; Lima, M.; March, M.; Marshall, J. L.; Martini, P.; Miller, C. J.; Miquel, R.; Mohr, J. J.; Nichol, R. C.; Nord, B.; Plazas, A. A.; Romer, A. K.; Roodman, A.; Rykoff, E. S.; Sanchez, E.; Scarpine, V.; Schubnell, M.; Sevilla-Noarbe, I.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Vikram, V.; Walker, A. R.; Zhang, Y.; DES Collaboration

    2016-12-01

    Shear peak statistics has gained a lot of attention recently as a practical alternative to the two-point statistics for constraining cosmological parameters. We perform a shear peak statistics analysis of the Dark Energy Survey (DES) Science Verification (SV) data, using weak gravitational lensing measurements from a 139 deg² field. We measure the abundance of peaks identified in aperture mass maps, as a function of their signal-to-noise ratio, in the signal-to-noise range 0 < S/N < 4. Peaks with S/N > 4 would require significant corrections, which is why we do not include them in our analysis. We compare our results to the cosmological constraints from the two-point analysis on the SV field and find them to be in good agreement in both the central value and its uncertainty. We discuss prospects for future peak statistics analysis with upcoming DES data.

  17. Gender subordination in the vulnerability of women to domestic violence.

    PubMed

    Macedo Piosiadlo, Laura Christina; Godoy Serpa da Fonseca, Rosa Maria

    2016-06-01

    To create and validate an instrument that identifies women's vulnerability to domestic violence through gender subordination indicators in the family. An instrument consisting of 61 phrases indicating gender subordination in the family was created. After assessment by ten judges, 34 phrases were validated. The approved version was administered to 321 health service users in São José dos Pinhais (State of Paraná, Brazil), along with the validated Portuguese version of the Abuse Assessment Screen (AAS), used to separate the sample: the "YES" group comprised women who had suffered violence and the "NO" group women who had not. Data were transferred into the Statistical Package for the Social Sciences (SPSS) software, version 22, and quantitatively analyzed using exploratory and factor analysis and tests for internal consistency. After analysis (Kaiser-Meyer-Olkin (KMO) statistics, Monte Carlo Principal Components Analysis (PCA), and diagram segmentation), two factors were identified: F1, consisting of phrases related to home maintenance and family structure; and F2, phrases intrinsic to the couple's relationship. For the statements that reinforce gender subordination, the means of the factors were higher in the group that answered YES to one of the violence-identifying questions. The instrument was able to identify women who were vulnerable to domestic violence using gender subordination indicators. It could be an important tool for nurses and other professionals in multidisciplinary teams for organizing and planning actions to prevent violence against women.
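
    The KMO statistic and factor extraction reported above can be reproduced in outline with the factor_analyzer Python package (assuming it is installed); the input file name and the factor labels below are hypothetical placeholders.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

# `items` is assumed to be a DataFrame of the 34 validated phrases
# (one column per item, one row per respondent); hypothetical file.
items = pd.read_csv("subordination_items.csv")

kmo_per_item, kmo_overall = calculate_kmo(items)
print(f"overall KMO = {kmo_overall:.2f}")   # > 0.6 is usually deemed adequate

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=items.columns,
                        columns=["F1_home_family", "F2_couple"])
print(loadings.round(2))
```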

  18. Principal component analysis of normalized full spectrum mass spectrometry data in multiMS-toolbox: An effective tool to identify important factors for classification of different metabolic patterns and bacterial strains.

    PubMed

    Cejnar, Pavel; Kuckova, Stepanka; Prochazka, Ales; Karamonova, Ludmila; Svobodova, Barbora

    2018-06-15

    Explorative statistical analysis of mass spectrometry data is still a time-consuming step. We analyzed critical factors for the application of principal component analysis (PCA) in mass spectrometry, focusing on two whole-spectrum-based normalization techniques and their use both in the analysis of registered peak data and, for comparison, in full spectrum data analysis. We used this technique to identify different metabolic patterns in the bacterial culture of Cronobacter sakazakii, an important foodborne pathogen. Two software utilities were implemented: ms-alone, a Python-based utility for mass spectrometry data preprocessing and peak extraction, and multiMS-toolbox, an R tool for advanced peak registration and detailed explorative statistical analysis. The bacterial culture of Cronobacter sakazakii was cultivated on Enterobacter sakazakii Isolation Agar, Blood Agar Base and Tryptone Soya Agar for 24 h and 48 h and applied by the smear method on an Autoflex speed MALDI-TOF mass spectrometer. For the three tested cultivation media, only two different metabolic patterns of Cronobacter sakazakii were identified using PCA applied to data normalized by the two different normalization techniques. Results from matched peak data and subsequent detailed full spectrum analysis likewise identified only two different metabolic patterns: cultivation on Enterobacter sakazakii Isolation Agar showed significant differences from cultivation on the other two tested media. The metabolic patterns for all tested cultivation media also proved to depend on cultivation time. Both whole-spectrum-based normalization techniques, together with full spectrum PCA, allow identification of important discriminative factors in experiments with several variable condition factors, while avoiding problems with improper identification of peaks or undue emphasis on below-threshold peak data. The amount of processed data remains manageable. Both implemented software utilities are available free of charge from http://uprt.vscht.cz/ms. Copyright © 2018 John Wiley & Sons, Ltd.
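
    A minimal sketch of the whole-spectrum workflow: normalize full spectra (here by total ion current, one plausible whole-spectrum normalization, not necessarily the paper's exact choice) and apply PCA. The spectra are stand-in random data.

```python
import numpy as np
from sklearn.decomposition import PCA

# `spectra` is assumed to be an (n_samples, n_mz_bins) array of raw
# MALDI-TOF intensities resampled onto a common m/z axis (stand-in data).
rng = np.random.default_rng(1)
spectra = np.abs(rng.normal(size=(24, 5000)))

# Whole-spectrum normalization: total ion current (TIC) scaling.
tic = spectra.sum(axis=1, keepdims=True)
norm = spectra / tic

pca = PCA(n_components=3)
scores = pca.fit_transform(norm)
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
# Plotting PC1 vs PC2 scores, colored by medium and cultivation time,
# would show whether conditions separate into distinct patterns.
```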

  19. Identifying natural flow regimes using fish communities

    NASA Astrophysics Data System (ADS)

    Chang, Fi-John; Tsai, Wen-Ping; Wu, Tzu-Ching; Chen, Hung-kwai; Herricks, Edwin E.

    2011-10-01

    Modern water resources management has adopted natural flow regimes as reasonable targets for river restoration and conservation. The characterization of a natural flow regime begins with the development of hydrologic statistics from flow records. However, little guidance exists for defining the period of record needed for regime determination. In Taiwan, the Taiwan Eco-hydrological Indicator System (TEIS), a group of hydrologic statistics selected for fisheries relevance, is being used to evaluate ecological flows. The TEIS consists of a group of hydrologic statistics selected to characterize the relationships between flow and the life history of indigenous species. Using the TEIS and biosurvey data for Taiwan, this paper identifies the length of hydrologic record sufficient for natural flow regime characterization. To define the ecological hydrology of fish communities, this study connected hydrologic statistics to fish communities by using methods to define antecedent conditions that influence existing community composition. A moving average method was applied to TEIS statistics to reflect the effects of antecedent flow condition and a point-biserial correlation method was used to relate fisheries collections with TEIS statistics. The resulting fish species-TEIS (FISH-TEIS) hydrologic statistics matrix takes full advantage of historical flows and fisheries data. The analysis indicates that, in the watersheds analyzed, averaging TEIS statistics for the present year and 3 years prior to the sampling date, termed MA(4), is sufficient to develop a natural flow regime. This result suggests that flow regimes based on hydrologic statistics for the period of record can be replaced by regimes developed for sampled fish communities.
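
    A hedged sketch of the MA(4) construction and the point-biserial correlation between a TEIS-style hydrologic statistic and species presence; the data are simulated placeholders, not Taiwanese flow or fisheries records.

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(2)

# Hypothetical data: one TEIS hydrologic statistic per year (30 years)
# and binary presence/absence of a fish species in yearly surveys.
teis_stat = rng.normal(size=30)
presence = rng.integers(0, 2, size=30)

# MA(4): average of the sampling year and the 3 years prior.
ma4 = np.convolve(teis_stat, np.ones(4) / 4, mode="valid")  # length 27

# Align: the first MA(4) value corresponds to year index 3.
r, p = pointbiserialr(presence[3:], ma4)
print(f"point-biserial r = {r:.2f}, P = {p:.3f}")
```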

  20. Transfusion Indication Threshold Reduction (TITRe2) randomized controlled trial in cardiac surgery: statistical analysis plan.

    PubMed

    Pike, Katie; Nash, Rachel L; Murphy, Gavin J; Reeves, Barnaby C; Rogers, Chris A

    2015-02-22

    The Transfusion Indication Threshold Reduction (TITRe2) trial is the largest randomized controlled trial to date to compare red blood cell transfusion strategies following cardiac surgery. This update presents the statistical analysis plan, detailing how the study will be analyzed and presented. The statistical analysis plan has been written following recommendations from the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, prior to database lock and the final analysis of trial data. Outlined analyses are in line with the Consolidated Standards of Reporting Trials (CONSORT). The study aims to randomize 2000 patients from 17 UK centres. Patients are randomized to either a restrictive (transfuse if haemoglobin concentration <7.5 g/dl) or liberal (transfuse if haemoglobin concentration <9 g/dl) transfusion strategy. The primary outcome is a binary composite outcome of any serious infectious or ischaemic event in the first 3 months following randomization. The statistical analysis plan details how non-adherence with the intervention, withdrawals from the study, and the study population will be derived and dealt with in the analysis. The planned analyses of the trial primary and secondary outcome measures are described in detail, including approaches taken to deal with multiple testing, model assumptions not being met and missing data. Details of planned subgroup and sensitivity analyses and pre-specified ancillary analyses are given, along with potential issues that have been identified with such analyses and possible approaches to overcome such issues. ISRCTN70923932.

  1. A Method for Gene-Based Pathway Analysis Using Genomewide Association Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations

    PubMed Central

    Evangelou, Marina; Smyth, Deborah J; Fortune, Mary D; Burren, Oliver S; Walker, Neil M; Guo, Hui; Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick; Rich, Stephen S; Todd, John A; Wallace, Chris

    2014-01-01

    Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty-one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P-values were skewed, with 12 of the 22 SNPs showing evidence of association. Support, including replication evidence, was obtained for nine T1D associated variants in the genes ITGB7 (rs11170466), NRP1 (rs722988), BAD (rs694739), CTSB (rs1296023), FYN (rs11964650), UBE2G1 (rs9906760), MAP3K14 (rs17759555), ITGB1 (rs1557150), and IL7R (rs1445898). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available. PMID:25371288
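
    As a simplified illustration of gene-based testing from summary statistics, the sketch below combines SNP-level P-values for one gene using Fisher's method. Unlike the paper's approach, it ignores LD between SNPs, which the authors handle via reference genotype data; the P-values are made up.

```python
from scipy.stats import combine_pvalues

# Hypothetical SNP-level GWAS P-values mapped to one gene. Fisher's
# method gives a simple gene-level statistic assuming independence,
# an assumption LD-aware methods relax.
snp_pvalues = [0.003, 0.04, 0.2, 0.6]
stat, gene_p = combine_pvalues(snp_pvalues, method="fisher")
print(f"gene-level chi2 = {stat:.2f}, P = {gene_p:.4f}")
```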

  2. Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques

    NASA Astrophysics Data System (ADS)

    Gulgundi, Mohammad Shahid; Shetty, Amba

    2018-03-01

    Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of Bengaluru city using multivariate statistical techniques. Water quality index rating was calculated for pre and post monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show poorer quality for drinking purposes than the pre-monsoon samples. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between the two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which yielded three factors in each cluster, explaining 85.4% and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock-water interactions in the aquifer, the effect of anthropogenic activities and ion exchange processes in water.

  3. A phylogenetic transform enhances analysis of compositional microbiota data.

    PubMed

    Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

    2017-02-15

    Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
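
    The log-ratio machinery underlying PhILR can be illustrated with a small sketch: a centered log-ratio transform plus one ILR balance for a hypothetical tree node separating two clades. Counts, the pseudocount, and the partition are illustrative, and this omits PhILR's phylogenetic weighting.

```python
import numpy as np

# Relative-abundance table: samples x taxa. A pseudocount replaces
# zeros, since log-ratios of zero are undefined.
counts = np.array([[10, 5, 1, 0],
                   [3, 12, 4, 2]], dtype=float) + 0.5
rel = counts / counts.sum(axis=1, keepdims=True)

# Centered log-ratio (CLR): the simplest member of the log-ratio family
# to which ILR (and hence PhILR) belongs.
clr = np.log(rel) - np.log(rel).mean(axis=1, keepdims=True)

# One ILR balance for an internal tree node separating taxa {0,1}
# from {2,3}: a normalized log-ratio of geometric means.
r, s = 2, 2                                   # clade sizes
gm = lambda x: np.exp(np.log(x).mean(axis=1))  # per-sample geometric mean
balance = np.sqrt(r * s / (r + s)) * np.log(gm(rel[:, :2]) / gm(rel[:, 2:]))
print(clr.round(2), balance.round(2), sep="\n")
```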

  4. Urban-Induced Rainfall Anomalies in an Arid Regime: Evidence from a 108-Year Data Record and Satellite Measurements

    NASA Technical Reports Server (NTRS)

    Shepherd, J. Marshall

    2004-01-01

    The study employs a 108-year precipitation data record to identify statistically significant anomalies in rainfall downwind of the Phoenix urban region. The analysis reveals that during the monsoon season, locations in the northeastern suburbs and exurbs of the Phoenix metropolitan area have experienced statistically significant increases in mean precipitation of 12 to 14 percent from a pre-urban (1895-1949) to post-urban (1950-2003) period. Mean and median post-urban precipitation totals in the anomaly region are significantly greater, in the statistical sense, than in regions west of the city and in nearby mountainous regions of similar or greater topography. Further analysis of satellite-based rainfall totals for the summer of 2003 also reveals the existence of the anomaly region during a severe drought period. The anomaly cannot simply be attributed to maximum topographic relief and is hypothesized to be related to urban-topographic interactions.

  5. Identifying Node Role in Social Network Based on Multiple Indicators

    PubMed Central

    Huang, Shaobin; Lv, Tianyang; Zhang, Xizhe; Yang, Yange; Zheng, Weimin; Wen, Chao

    2014-01-01

    It is a classic topic of social network analysis to evaluate the importance of nodes and identify the node that takes on the role of core or bridge in a network. Because a single indicator is not sufficient to analyze multiple characteristics of a node, it is a natural solution to apply multiple indicators that should be selected carefully. An intuitive idea is to select some indicators with weak correlations to efficiently assess different characteristics of a node. However, this paper shows that it is much better to select indicators with strong correlations. Because indicator correlation is based on the statistical analysis of a large number of nodes, the particularity of an important node is outlined if its indicator relationship does not comply with the statistical correlation. Therefore, the paper selects multiple indicators, namely degree, ego-betweenness centrality and eigenvector centrality, to evaluate the importance and the role of a node; a sketch of this scoring appears below. The importance of a node is the normalized sum of its three indicators. A candidate for core or bridge is selected from the nodes with high degree or with high ego-betweenness centrality, respectively. Then, the role of a candidate is determined according to how its indicator relationship deviates from the statistical correlation of the overall network. Based on 18 real networks and 3 kinds of model networks, the experimental results show that the proposed methods perform quite well in evaluating the importance of nodes and in identifying the node role. PMID:25089823
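
    A minimal sketch of the scoring described above, using networkx on a stand-in graph: compute degree, eigenvector centrality and ego-betweenness, then sum min-max-normalized values. The graph and the normalization choice are assumptions for illustration.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()  # stand-in network

degree = dict(G.degree())
eigen = nx.eigenvector_centrality(G, max_iter=1000)

def ego_betweenness(G, n):
    """Betweenness of node n computed within its ego network."""
    ego = nx.ego_graph(G, n)
    return nx.betweenness_centrality(ego)[n]

ego_btw = {n: ego_betweenness(G, n) for n in G}

def normalize(d):
    v = np.array(list(d.values()), dtype=float)
    return {k: (x - v.min()) / (v.max() - v.min()) for k, x in zip(d, v)}

deg_n, eig_n, ego_n = normalize(degree), normalize(eigen), normalize(ego_btw)
importance = {n: deg_n[n] + eig_n[n] + ego_n[n] for n in G}
top = sorted(importance, key=importance.get, reverse=True)[:5]
print("most important nodes:", top)
```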

  6. [Effect of vinegar-processed Curcumae Rhizoma on bile metabolism in rats].

    PubMed

    Gu, Wei; Lu, Tu-Lin; Li, Jin-Ci; Wang, Qiao-Han; Pan, Zi-Hao; Ji, De; Li, Lin; Zhang, Ji; Mao, Chun-Qin

    2016-04-01

    To explore the effect of vinegar-processed Curcumae Rhizoma on endogenous metabolites in bile by investigating the differences in endogenous bile metabolites before and after Curcumae Rhizoma was processed with vinegar. Alcohol extracts of crude and vinegar-processed Curcumae Rhizoma, as well as normal saline, were prepared and given to rats by intragastric administration; 0.5 h later, common bile duct intubation and drainage were conducted to collect the rats' bile for 12 h. UPLC-TOF-MS analysis of bile samples was applied after 1:3 acetonitrile protein precipitation; univariate statistics were combined with multivariate statistics, and PeakView software results were compared with network databases to identify potential biomarkers. Vinegar-processed Curcumae Rhizoma extracts had significant effects on the metabolite spectrum in rat bile. With a threshold of P<0.05, 13 metabolites with significant differences were found between the bile of the crude and vinegar-processed Curcumae Rhizoma groups, and 8 of them were identified by reference to the network databases. Univariate t-test analysis between the administration groups and the blank group yielded 7 metabolites with significant differences, which were identified as potential biomarkers. Six of the potential biomarkers were up-regulated in the vinegar-processed group; they are related to the regulation of phospholipid metabolism, fat metabolism, bile acid metabolism, and the N-acylethanolamine hydrolysis reaction balance, indicating the mechanism of vinegar-processed Curcumae Rhizoma on endogenous bile metabolites in rats. Copyright© by the Chinese Pharmaceutical Association.

  7. Statistical Significance of Optical Map Alignments

    PubMed Central

    Sarkar, Deepayan; Goldstein, Steve; Schwartz, David C.

    2012-01-01

    The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities. PMID:22506568

  8. The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-07-08

    In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA >48 hours following hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified, with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated that years 2009-2011, compared to 2006, and the months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05). The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital personnel. The identification of specific years and months with increased MRSA rates may be attributable to several hospital level factors including the presence of other pathogens. Within hospitals, the incorporation of the temporal scan statistic into standard surveillance techniques is a valuable tool for healthcare workers to evaluate surveillance strategies and aid in the identification of MRSA clusters.
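
    A hedged sketch of the regression component: a negative binomial GLM of monthly case counts on year and month indicators using statsmodels, with the dispersion parameter left at its library default rather than estimated as the study would. The counts are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Hypothetical monthly MRSA case counts over five years (simulated).
df = pd.DataFrame({
    "cases": rng.poisson(10, size=60),
    "year": np.repeat([2006, 2007, 2008, 2009, 2010], 12),
    "month": np.tile(np.arange(1, 13), 5),
})

# Indicator coding with year 2006 and month 1 (January) as baselines.
X = pd.get_dummies(df[["year", "month"]].astype("category"),
                   drop_first=True).astype(float)
X = sm.add_constant(X)

# Negative binomial GLM of counts on year and month indicators.
fit = sm.GLM(df["cases"], X, family=sm.families.NegativeBinomial()).fit()
print(np.exp(fit.params).round(2))  # rate ratios vs. the baselines
```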

  9. Best practices from WisDOT mega and ARRA projects : statistical analysis and % time vs. % cost metrics.

    DOT National Transportation Integrated Search

    2012-03-01

    This study was undertaken to: 1) apply a benchmarking process to identify best practices within four areas of Wisconsin Department of Transportation (WisDOT) construction management and 2) analyze two performance metrics, % Cost vs. % Time, tracked by t...

  10. Increasing Army Supply Chain Performance: Using an Integrated End to End Metrics System

    DTIC Science & Technology

    2017-01-01

    Supply chain metrics (scheduled deliveries, delinquent contracts, PQDR/SDRs, forecasting accuracy, reliability, demand management, asset management strategies, pipeline) are identified and characterized by statistical analysis. The study proposed a framework and tool for inventory management based on factors such as

  11. Data Analysis and Instrumentation Requirements for Evaluating Rail Joints and Rail Fasteners in Urban Track

    DOT National Transportation Integrated Search

    1975-02-01

    Rail fasteners for concrete ties and direct fixation and bolted rail joints have been identified as key components for improving track performance. However, the lack of statistical load data limits the development of improved design criteria and eval...

  12. Statistical and clustering analysis for disturbances: A case study of voltage dips in wind farms

    DOE PAGES

    Garcia-Sanchez, Tania; Gomez-Lazaro, Emilio; Muljadi, Eduard; ...

    2016-01-28

    This study proposes and evaluates an alternative statistical methodology to analyze a large number of voltage dips. For a given voltage dip, a set of lengths is first identified to characterize the root mean square (rms) voltage evolution along the disturbance, deduced from partial linearized time intervals and trajectories. Principal component analysis and K-means clustering processes are then applied to identify rms-voltage patterns and propose a reduced number of representative rms-voltage profiles from the linearized trajectories. This reduced group of averaged rms-voltage profiles enables the representation of a large number of disturbances, offering a visual and graphical representation of their evolution along the events, aspects that were not previously considered in other contributions. The complete process is evaluated on real voltage dips collected in intensive field-measurement campaigns carried out in a wind farm in Spain over several years. The results are included in this paper.

  13. Data on xylem sap proteins from Mn- and Fe-deficient tomato plants obtained using shotgun proteomics.

    PubMed

    Ceballos-Laita, Laura; Gutierrez-Carbonell, Elain; Takahashi, Daisuke; Abadía, Anunciación; Uemura, Matsuo; Abadía, Javier; López-Millán, Ana Flor

    2018-04-01

    This article contains consolidated proteomic data obtained from xylem sap collected from tomato plants grown in Fe- and Mn-sufficient control, as well as Fe-deficient and Mn-deficient conditions. Data presented here cover proteins identified and quantified by shotgun proteomics and Progenesis LC-MS analyses: proteins identified with at least two peptides and showing changes statistically significant (ANOVA; p ≤ 0.05) and above a biologically relevant selected threshold (fold ≥ 2) between treatments are listed. The comparison between Fe-deficient, Mn-deficient and control xylem sap samples using a multivariate statistical data analysis (Principal Component Analysis, PCA) is also included. Data included in this article are discussed in depth in the research article entitled "Effects of Fe and Mn deficiencies on the protein profiles of tomato (Solanum lycopersicum) xylem sap as revealed by shotgun analyses" [1]. This dataset is made available to support the cited study as well as to extend analyses at a later stage.

  14. Countermeasures for Reducing Unsteady Aerodynamic Force Acting on High-Speed Train in Tunnel by Use of Modifications of Train Shapes

    NASA Astrophysics Data System (ADS)

    Suzuki, Masahiro; Nakade, Koji; Ido, Atsushi

    As the maximum speed of high-speed trains increases, flow-induced vibration of trains in tunnels has become a subject of discussion in Japan. In this paper, we report the results of a study on the use of modifications of train shapes as a countermeasure for reducing unsteady aerodynamic force, based on on-track tests and a wind tunnel test. First, we conduct a statistical analysis of on-track test data to identify the exterior parts of a train that cause the unsteady aerodynamic force. Next, we carry out a wind tunnel test to measure the unsteady aerodynamic force acting on a train in a tunnel and examine train shapes with a particular emphasis on the exterior parts identified by the statistical analysis. The wind tunnel test shows that fins under the car body are effective in reducing the unsteady aerodynamic force. Finally, we test the fins in an on-track test and confirm their effectiveness.

  15. Applying social network analysis to understand the knowledge sharing behaviour of practitioners in a clinical online discussion forum.

    PubMed

    Stewart, Samuel Alan; Abidi, Syed Sibte Raza

    2012-12-04

    Knowledge Translation (KT) plays a vital role in the modern health care community, facilitating the incorporation of new evidence into practice. Web 2.0 tools provide a useful mechanism for establishing an online KT environment in which health practitioners share their practice-related knowledge and experiences with an online community of practice. We have implemented a Web 2.0 based KT environment--an online discussion forum--for pediatric pain practitioners across seven different hospitals in Thailand. The online discussion forum enabled the pediatric pain practitioners to share and translate their experiential knowledge to help improve the management of pediatric pain in hospitals. The goal of this research is to investigate the knowledge sharing dynamics of a community of practice through an online discussion forum. We evaluated the communication patterns of the community members using statistical and social network analysis methods in order to better understand how the online community engages to share experiential knowledge. Statistical analyses and visualizations provide a broad overview of the communication patterns within the discussion forum. Social network analysis provides the tools to delve deeper into the social network, identifying the most active members of the community, reporting the overall health of the social network, isolating the potential core members of the social network, and exploring the inter-group relationships that exist across institutions and professions. The statistical analyses revealed a network dominated by a single institution and a single profession, and found a varied relationship between reading and posting content to the discussion forum. The social network analysis discovered a healthy network with strong communication patterns, while identifying which users are at the center of the community in terms of facilitating communication. The group-level analysis suggests that there is strong interprofessional and interregional communication, but a dearth of non-nurse participants has been identified as a shortcoming. The results of the analysis suggest that the discussion forum is active and healthy, and that, though few, the interprofessional and interinstitutional ties are strong.

  16. Symptom Clusters in Advanced Cancer Patients: An Empirical Comparison of Statistical Methods and the Impact on Quality of Life.

    PubMed

    Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M

    2016-01-01

    Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
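
    A brief sketch of why linkage and similarity choices matter: cluster symptoms under different hierarchical linkages and compare the resulting partitions with the adjusted Rand index. The severity data, distance choice and cluster count are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(4)
# Hypothetical severity scores: patients x symptoms (columns would be
# items such as fatigue, pain, nausea, worry, ...).
X = rng.normal(size=(200, 12))

# Cluster symptoms (variables), as symptom-cluster studies do, using
# correlation distance between symptom profiles.
d = pdist(X.T, metric="correlation")

labels = {}
for method in ("complete", "average", "single"):
    Z = linkage(d, method=method)
    labels[method] = fcluster(Z, t=4, criterion="maxclust")

# Agreement between linkage choices: ARI of 1 means identical partitions.
print(adjusted_rand_score(labels["complete"], labels["average"]))
print(adjusted_rand_score(labels["complete"], labels["single"]))
```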

  17. Typhoid fever acquired in the United States, 1999–2010: epidemiology, microbiology, and use of a space–time scan statistic for outbreak detection

    PubMed Central

    IMANISHI, M.; NEWTON, A. E.; VIEIRA, A. R.; GONZALEZ-AVILES, G.; KENDALL SCOTT, M. E.; MANIKONDA, K.; MAXWELL, T. N.; HALPIN, J. L.; FREEMAN, M. M.; MEDALLA, F.; AYERS, T. L.; DERADO, G.; MAHON, B. E.; MINTZ, E. D.

    2016-01-01

    Although rare, typhoid fever cases acquired in the United States continue to be reported. Detection and investigation of outbreaks in these domestically acquired cases offer opportunities to identify chronic carriers. We searched surveillance and laboratory databases for domestically acquired typhoid fever cases, used a space–time scan statistic to identify clusters, and classified clusters as outbreaks or non-outbreaks. From 1999 to 2010, domestically acquired cases accounted for 18% of 3373 reported typhoid fever cases; their isolates were less often multidrug-resistant (2% vs. 15%) compared to isolates from travel-associated cases. We identified 28 outbreaks and two possible outbreaks within 45 space–time clusters of ⩾2 domestically acquired cases, including three outbreaks involving ⩾2 molecular subtypes. The approach detected seven of the ten outbreaks published in the literature or reported to CDC. Although this approach did not definitively identify any previously unrecognized outbreaks, it showed the potential to detect outbreaks of typhoid fever that may escape detection by routine analysis of surveillance data. Sixteen outbreaks had been linked to a carrier. Every case of typhoid fever acquired in a non-endemic country warrants thorough investigation. Space–time scan statistics, together with shoe-leather epidemiology and molecular subtyping, may improve outbreak detection. PMID:25427666

  18. Typhoid fever acquired in the United States, 1999-2010: epidemiology, microbiology, and use of a space-time scan statistic for outbreak detection.

    PubMed

    Imanishi, M; Newton, A E; Vieira, A R; Gonzalez-Aviles, G; Kendall Scott, M E; Manikonda, K; Maxwell, T N; Halpin, J L; Freeman, M M; Medalla, F; Ayers, T L; Derado, G; Mahon, B E; Mintz, E D

    2015-08-01

    Although rare, typhoid fever cases acquired in the United States continue to be reported. Detection and investigation of outbreaks in these domestically acquired cases offer opportunities to identify chronic carriers. We searched surveillance and laboratory databases for domestically acquired typhoid fever cases, used a space-time scan statistic to identify clusters, and classified clusters as outbreaks or non-outbreaks. From 1999 to 2010, domestically acquired cases accounted for 18% of 3373 reported typhoid fever cases; their isolates were less often multidrug-resistant (2% vs. 15%) compared to isolates from travel-associated cases. We identified 28 outbreaks and two possible outbreaks within 45 space-time clusters of ⩾2 domestically acquired cases, including three outbreaks involving ⩾2 molecular subtypes. The approach detected seven of the ten outbreaks published in the literature or reported to CDC. Although this approach did not definitively identify any previously unrecognized outbreaks, it showed the potential to detect outbreaks of typhoid fever that may escape detection by routine analysis of surveillance data. Sixteen outbreaks had been linked to a carrier. Every case of typhoid fever acquired in a non-endemic country warrants thorough investigation. Space-time scan statistics, together with shoe-leather epidemiology and molecular subtyping, may improve outbreak detection.

  19. 28 CFR 22.25 - Final disposition of identifiable materials.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... RESEARCH AND STATISTICAL INFORMATION § 22.25 Final disposition of identifiable materials. Upon completion of a research or statistical project the security of identifiable research or statistical information...

  20. 28 CFR 22.25 - Final disposition of identifiable materials.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... RESEARCH AND STATISTICAL INFORMATION § 22.25 Final disposition of identifiable materials. Upon completion of a research or statistical project the security of identifiable research or statistical information...

  1. Comparison of Salmonella enteritidis phage types isolated from layers and humans in Belgium in 2005.

    PubMed

    Welby, Sarah; Imberechts, Hein; Riocreux, Flavien; Bertrand, Sophie; Dierick, Katelijne; Wildemauwe, Christa; Hooyberghs, Jozef; Van der Stede, Yves

    2011-08-01

    The aim of this study was to investigate the available results for Belgium of the European Union coordinated monitoring program (2004/665 EC) on Salmonella in layers in 2005, as well as the results of the monthly outbreak reports of Salmonella Enteritidis in humans in 2005, to identify possible statistically significant trends in both populations. Separate descriptive statistics and univariate analyses were carried out, and parametric and/or non-parametric hypothesis tests were conducted. A time cluster analysis was performed for all Salmonella Enteritidis phage types (PTs) isolated. The proportions of each Salmonella Enteritidis PT in layers and in humans were compared and the monthly distribution of the most common PT isolated in both populations was evaluated. The time cluster analysis revealed significant clusters during the months of May and June for layers and May, July, August, and September for humans. PT21, the most frequently isolated PT in both populations in 2005, seemed to be responsible for these significant clusters. PT4 was the second most frequently isolated PT. No significant difference was found in the monthly trend evolution of either PT between the two populations, based on parametric and non-parametric methods. A similar monthly trend of PT distribution in humans and layers during 2005 was observed. The time cluster analysis and the statistical significance testing confirmed these results. Moreover, the time cluster analysis showed significant clusters during the summer, slightly shifted in time (humans after layers). These results suggest a common link between the prevalence of Salmonella Enteritidis in layers and the occurrence of the pathogen in humans. Phage typing was confirmed to be a useful tool for identifying temporal trends.

  2. Geostatistical applications in environmental remediation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stewart, R.N.; Purucker, S.T.; Lyon, B.F.

    1995-02-01

    Geostatistical analysis refers to a collection of statistical methods for addressing data that vary in space. By incorporating spatial information into the analysis, geostatistics has advantages over traditional statistical analysis for problems with a spatial context. Geostatistics has a history of success in earth science applications, and its popularity is increasing in other areas, including environmental remediation. Due to recent advances in computer technology, geostatistical algorithms can be executed at a speed comparable to many standard statistical software packages. When used responsibly, geostatistics is a systematic and defensible tool that can be used in various decision frameworks, such as the Data Quality Objectives (DQO) process. At every point in the site, geostatistics can estimate both the concentration level and the probability or risk of exceeding a given value. Using these probability maps can assist in identifying clean-up zones. Given any decision threshold and an acceptable level of risk, the probability maps identify those areas that are estimated to be above or below the acceptable risk. Those areas that are above the threshold are of the most concern with regard to remediation. In addition to estimating clean-up zones, geostatistics can assist in designing cost-effective secondary sampling schemes. Those areas of the probability map with high levels of estimated uncertainty are areas where more secondary sampling should occur. In addition, geostatistics has the ability to incorporate soft data directly into the analysis. These data include historical records, a highly correlated secondary contaminant, or expert judgment. In environmental remediation, geostatistics is a tool that, in conjunction with other methods, can provide a common forum for building consensus.
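
    A sketch of the probability-map idea under stated assumptions: ordinary kriging (here via the pykrige package, assuming it is installed) yields estimates and kriging variances, from which an exceedance-probability map follows if the kriging errors are taken as approximately Gaussian. Sample values, grid, and threshold are all hypothetical.

```python
import numpy as np
from scipy.stats import norm
from pykrige.ok import OrdinaryKriging  # assumes pykrige is available

rng = np.random.default_rng(5)
# Hypothetical contaminant samples: coordinates and log-concentrations.
x, y = rng.uniform(0, 100, 40), rng.uniform(0, 100, 40)
z = rng.normal(loc=2.0, scale=0.5, size=40)

ok = OrdinaryKriging(x, y, z, variogram_model="spherical")
gridx = gridy = np.linspace(0, 100, 50)
zhat, var = ok.execute("grid", gridx, gridy)  # estimates, kriging variance

# Probability map of exceeding a cleanup threshold, assuming roughly
# Gaussian kriging errors at each grid node.
threshold = 2.5
p_exceed = norm.sf((threshold - zhat) / np.sqrt(var))
print("fraction of grid above 50% exceedance risk:",
      float((p_exceed > 0.5).mean()))
```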

  3. Vedolizumab Compared with Certolizumab in the Therapy of Crohn Disease: A Systematic Review and Indirect Comparison.

    PubMed

    Kawalec, Paweł; Moćko, Pawel; Pilc, Andrzej; Radziwon-Zalewska, Maria; Malinowska-Lipień, Iwona

    2016-08-01

    The increasing prevalence of Crohn disease (CD) underscores the need to identify new effective drugs, which is particularly important for patients who do not respond to or do not tolerate standard biologic therapies. The purpose of this analysis was to compare the efficacy and safety of vedolizumab and certolizumab pegol in patients with active moderate to severe CD. This analysis was prepared according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A systematic literature search of Medline (PubMed), Embase, and the Cochrane Library was conducted through March 5, 2016. Studies included were randomized controlled trials (RCTs) that enrolled patients treated for CD with vedolizumab or certolizumab pegol. All studies were critically appraised; indirect comparison was performed with the Bucher method. Eight RCTs were identified, and four were homogeneous enough to be included in the indirect comparison of the induction phase of treatment. No statistically significant differences were found in clinical response (relative risk [RR] 1.23, 95% confidence interval [CI] 0.81-1.88) or remission (RR 1.35, 95% CI 0.89-2.07) between vedolizumab and certolizumab pegol in the overall population. Similar nonstatistically significant differences in response and remission were noted in a subgroup analysis of anti-tumor necrosis factor-naive patients (RR 1.10, 95% CI 0.72-1.66 and RR 1.98, 95% CI 0.95-4.11, respectively). In addition, there were no statistically significant differences in safety profiles. This indirect comparison analysis demonstrated no statistically significant differences in efficacy or safety between vedolizumab and certolizumab pegol. © 2016 Pharmacotherapy Publications, Inc.
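
    The Bucher indirect comparison reduces to simple arithmetic on log relative risks against the common comparator. The sketch below uses made-up inputs, not the trial results.

```python
import math

# Bucher indirect comparison of A (e.g., vedolizumab) vs B (e.g.,
# certolizumab pegol) through a common placebo arm; the numbers are
# illustrative, not the review's values.
log_rr_a_placebo, se_a = math.log(1.50), 0.15   # A vs placebo
log_rr_b_placebo, se_b = math.log(1.25), 0.18   # B vs placebo

log_rr_ab = log_rr_a_placebo - log_rr_b_placebo
se_ab = math.sqrt(se_a**2 + se_b**2)            # variances add

rr = math.exp(log_rr_ab)
lo = math.exp(log_rr_ab - 1.96 * se_ab)
hi = math.exp(log_rr_ab + 1.96 * se_ab)
print(f"indirect RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# A CI spanning 1.0, as here, corresponds to the paper's conclusion of
# no statistically significant difference.
```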

  4. Testing alternative ground water models using cross-validation and other methods

    USGS Publications Warehouse

    Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.

    2007-01-01

    Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS; the poor performance was expected because DSS does not include the effects of parameter correlation and PCC reveals large parameter correlations. © 2007 National Ground Water Association.
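
    For the information criteria named above, here is a minimal sketch under a least-squares formulation (conventions for AICc and BIC vary; this is one common form). The model list and fit statistics are hypothetical.

```python
import math

def aicc(n, k, sse):
    """Corrected Akaike information criterion for a least-squares fit
    with n observations, k estimated parameters, and error sum of
    squares sse (one common formulation)."""
    aic = n * math.log(sse / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def bic(n, k, sse):
    return n * math.log(sse / n) + k * math.log(n)

# Hypothetical alternative models: (parameters, SSE of fit to heads/flows).
models = {"uniform K": (2, 45.0),
          "two-zone K": (4, 30.0),
          "five-zone K": (7, 27.5)}
n = 60  # observations
for name, (k, sse) in models.items():
    print(f"{name:12s} AICc={aicc(n, k, sse):7.2f} BIC={bic(n, k, sse):7.2f}")
# Lower values indicate better support; the extra parameters of the
# five-zone model must buy enough SSE reduction to justify themselves.
```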

  5. Exploring the validity and statistical utility of a racism scale among Black men who have sex with men: a pilot study.

    PubMed

    Smith, William Pastor

    2013-09-01

    The primary purpose of this two-phased study was to examine the structural validity and statistical utility of a racism scale specific to Black men who have sex with men (MSM) who resided in the Washington, DC, metropolitan area and Baltimore, Maryland. Phase I involved pretesting a 10-item racism measure with 20 Black MSM. Based on pretest findings, the scale was adapted into a 21-item racism scale for use in collecting data on 166 respondents in Phase II. Exploratory factor analysis of the 21-item racism scale resulted in a 19-item, two-factor solution. The two factors or subscales were the following: General Racism and Relationships and Racism. Confirmatory factor analysis was used in testing construct validity of the factored racism scale. Specifically, the two racism factors were combined with three homophobia factors into a confirmatory factor analysis model. Both the comparative and incremental fit indices were equal to .90, suggesting adequate convergence of the racism and homophobia dimensions into a single social oppression construct. Statistical utility of the two racism subscales was demonstrated when regression analysis revealed that the gay-identified men versus bisexual-identified men in the sample were more likely to experience increased racism within the context of intimate relationships and less likely to be exposed to repeated experiences of general racism. Overall, the findings in this study highlight the importance of continuing to explore the psychometric properties of a racism scale that accounts for the unique psychosocial concerns experienced by Black MSM.

  6. Nitrate source identification in groundwater of multiple land-use areas by combining isotopes and multivariate statistical analysis: A case study of Asopos basin (Central Greece).

    PubMed

    Matiatos, Ioannis

    2016-01-15

    Nitrate (NO3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ(15)N-NO3 and δ(18)O-NO3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC).

  7. Estimating short-run and long-run interaction mechanisms in interictal state.

    PubMed

    Ozkaya, Ata; Korürek, Mehmet

    2010-04-01

    We address the issue of analyzing electroencephalograms (EEG) from seizure patients in order to test, model and determine the statistical properties that distinguish between EEG states (interictal, pre-ictal, ictal), by introducing a new class of time series analysis methods. In the present study, we first employ statistical methods to determine the non-stationary behavior of focal interictal epileptiform series within very short time intervals; second, for intervals deemed non-stationary, we apply Autoregressive Integrated Moving Average (ARIMA) process modelling, well known in time series analysis. We finally address the question of causal relationships between epileptic states and between brain areas during epileptiform activity. We estimate the interaction between different EEG series (channels) in short time intervals by performing Granger-causality analysis, and estimate such interaction in long time intervals by employing cointegration analysis; both methods are well known in econometrics. We find, first, that the causal relationship between neuronal assemblies can be identified according to the duration and direction of their possible mutual influences; second, that although the estimated bidirectional causality in short time intervals indicates that the neuronal ensembles positively affect each other, in long time intervals neither of them is affected (in the sense of increasing amplitudes) by this relationship. Moreover, cointegration analysis of the EEG series enables us to identify whether there is a causal link from the interictal state to the ictal state.
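
    A hedged sketch of the two econometric steps using statsmodels: Granger causality on differenced (short-run) series and an Engle-Granger cointegration test on the levels (long-run). The series are simulated stand-ins for EEG channels, not patient data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests, coint

rng = np.random.default_rng(6)
# Stand-in series for two EEG channels; a real analysis would work on
# short interictal segments after stationarity checks.
x = rng.normal(size=500).cumsum()
y = x + rng.normal(scale=0.5, size=500)          # y shares x's trend

# Granger causality on differenced (short-run) data: does x's past help
# predict y beyond y's own past?
data = pd.DataFrame({"y": np.diff(y), "x": np.diff(x)})
res = grangercausalitytests(data[["y", "x"]], maxlag=4, verbose=False)
print("lag-4 F-test P =", round(res[4][0]["ssr_ftest"][1], 4))

# Engle-Granger cointegration on the levels (long-run relationship).
t_stat, p_value, _ = coint(y, x)
print(f"cointegration P = {p_value:.4f}")
```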

  8. A method for screening active components from Chinese herbs by cell membrane chromatography-offline-high performance liquid chromatography/mass spectrometry and an online statistical tool for data processing.

    PubMed

    Cao, Yan; Wang, Shaozhan; Li, Yinghua; Chen, Xiaofei; Chen, Langdong; Wang, Dongyao; Zhu, Zhenyu; Yuan, Yongfang; Lv, Diya

    2018-03-09

    Cell membrane chromatography (CMC) has been successfully applied to screen bioactive compounds from Chinese herbs for many years, and some offline and online two-dimensional (2D) CMC-high performance liquid chromatography (HPLC) hyphenated systems have been established to perform screening assays. However, the requirement of sample preparation steps for the second-dimensional analysis in offline systems and the need for an interface device and technical expertise in the online system limit their extensive use. In the present study, an offline 2D CMC-HPLC analysis combined with the XCMS (various forms of chromatography coupled to mass spectrometry) Online statistical tool for data processing was established. First, our previously reported online 2D screening system was used to analyze three Chinese herbs that were reported to have potential anti-inflammatory effects, and two binding components were identified. By contrast, the proposed offline 2D screening method with XCMS Online analysis was applied, and three more ingredients were discovered in addition to the two compounds revealed by the online system. Then, cross-validation of the three compounds was performed, and they were confirmed to be included in the online data as well, but were not identified there because of their low concentrations and lack of credible statistical approaches. Last, pharmacological experiments showed that these five ingredients could inhibit IL-6 release and IL-6 gene expression on LPS-induced RAW cells in a dose-dependent manner. Compared with previous 2D CMC screening systems, this newly developed offline 2D method needs no sample preparation steps for the second-dimensional analysis, and it is sensitive, efficient, and convenient. It will be applicable in identifying active components from Chinese herbs and practical in discovery of lead compounds derived from herbs. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. Comparative analysis of targeted metabolomics: dominance-based rough set approach versus orthogonal partial least square-discriminant analysis.

    PubMed

    Blasco, H; Błaszczyński, J; Billaut, J C; Nadal-Desbarats, L; Pradat, P F; Devos, D; Moreau, C; Andres, C R; Emond, P; Corcia, P; Słowiński, R

    2015-02-01

    Metabolomics is an emerging field that includes ascertaining a metabolic profile from a combination of small molecules, and which has health applications. Metabolomic methods are currently applied to discover diagnostic biomarkers and to identify pathophysiological pathways involved in pathology. However, metabolomic data are complex and are usually analyzed by statistical methods. Although the methods have been widely described, most have not been either standardized or validated. Data analysis is the foundation of a robust methodology, so new mathematical methods need to be developed to assess and complement current methods. We therefore applied, for the first time, the dominance-based rough set approach (DRSA) to metabolomics data; we also assessed the complementarity of this method with standard statistical methods. Some attributes were transformed in a way allowing us to discover global and local monotonic relationships between condition and decision attributes. We used previously published metabolomics data (18 variables) for amyotrophic lateral sclerosis (ALS) and non-ALS patients. Principal Component Analysis (PCA) and Orthogonal Partial Least Square-Discriminant Analysis (OPLS-DA) allowed satisfactory discrimination (72.7%) between ALS and non-ALS patients. Some discriminant metabolites were identified: acetate, acetone, pyruvate and glutamine. The concentrations of acetate and pyruvate were also identified by univariate analysis as significantly different between ALS and non-ALS patients. DRSA correctly classified 68.7% of the cases and established rules involving some of the metabolites highlighted by OPLS-DA (acetate and acetone). Some rules identified potential biomarkers not revealed by OPLS-DA (beta-hydroxybutyrate). We also found a large number of common discriminating metabolites after Bayesian confirmation measures, particularly acetate, pyruvate, acetone and ascorbate, consistent with the pathophysiological pathways involved in ALS. DRSA provides a complementary method for improving the predictive performance of the multivariate data analysis usually used in metabolomics. This method could help in the identification of metabolites involved in disease pathogenesis. Interestingly, these different strategies mostly identified the same metabolites as being discriminant. The selection of strong decision rules with high value of Bayesian confirmation provides useful information about relevant condition-decision relationships not otherwise revealed in metabolomics data. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. MEASURE: An integrated data-analysis and model identification facility

    NASA Technical Reports Server (NTRS)

    Singh, Jaidip; Iyer, Ravi K.

    1990-01-01

    The first phase of the development of MEASURE, an integrated data-analysis and model-identification facility, is described. The facility takes system activity data as input and produces as output representative behavioral models of the system in near real time. In addition, a wide range of statistical characteristics of the measured system is also available. Use of the system is illustrated with data collected via software instrumentation of a network of SUN workstations at the University of Illinois. Initially, statistical clustering is used to identify high-density regions of resource usage in a given environment. The identified regions form the states for building a state-transition model to evaluate system and program performance in real time. The model is then solved to obtain useful parameters such as the response-time distribution and the mean waiting time in each state. A graphical interface which displays the identified models and their characteristics (with real-time updates) was also developed. The results provide an understanding of the resource usage in the system under various workload conditions. This work is targeted for a testbed of UNIX workstations, with the initial phase ported to SUN workstations on the NASA Ames Research Center Advanced Automation Testbed.
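
    A minimal sketch of the MEASURE-style pipeline follows, assuming k-means as the clustering step and a discrete-time Markov chain as the state-transition model; the data and cluster count are illustrative, not the instrumentation traces described above.

    ```python
    # Cluster resource-usage samples into states, estimate a state-transition
    # matrix, and derive occupancy and mean dwell times (illustrative sketch).
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    usage = rng.random((500, 3))                  # synthetic CPU/mem/IO samples
    k = 4
    states = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(usage)

    P = np.zeros((k, k))
    for a, b in zip(states[:-1], states[1:]):     # count observed transitions
        P[a, b] += 1
    P /= P.sum(axis=1, keepdims=True)             # row-normalize to probabilities

    # Stationary distribution = left eigenvector of P for eigenvalue 1
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi /= pi.sum()
    print("stationary state occupancy:", np.round(pi, 3))
    print("mean dwell times:", np.round(1 / (1 - np.diag(P)), 2))
    ```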

  11. Second primary malignancies after treatment for malignant lymphoma

    PubMed Central

    Okines, A; Thomson, C S; Radstone, C R; Horsman, J M; Hancock, B W

    2005-01-01

    To determine the incidence and possible causes of second primary malignancies after treatment for Hodgkin's and non-Hodgkin's lymphoma (HL and NHL), a cohort of 3764 consecutive patients diagnosed with HL or NHL between January 1970 and July 2001 was identified using the Sheffield Lymphoma Group database. A search was undertaken for all patients diagnosed with a subsequent primary malignancy. Two matched controls were identified for each case. Odds ratios were calculated to detect and quantify any risk factors in the cases compared with their matched controls. Mean follow-up for the cohort was 5.2 years. A total of 68 patients who developed second cancers at least 6 months after their primary diagnosis were identified, giving a crude incidence of 1.89% overall: 3.21% among the patients treated for HL and 1.32% in those treated for NHL. Most common were bronchial, breast, colorectal and haematological malignancies. High stage at diagnosis almost reached statistical significance in the analysis restricted to the NHL patients (odds ratio = 3.48; P = 0.068) after adjustment for other factors. Treatment modality was not statistically significant in any analysis. High stage at diagnosis of NHL may be a risk factor for developing a second primary cancer. PMID:16106249

  12. Combination of pharmacotherapy and psychotherapy in the treatment of chronic depression: A systematic review and meta-analysis

    PubMed Central

    2012-01-01

    Background Chronic depression represents a substantial portion of depressive disorders and is associated with severe consequences. This review examined, via meta-analysis, whether the combination of pharmacological treatments and psychotherapy is associated with higher effectiveness than pharmacotherapy alone, and identified possible treatment effect modifiers via meta-regression analysis. Methods A systematic search was conducted in the following databases: Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, EMBASE, ISI Web of Science, BIOSIS, PsycINFO, and CINAHL. The primary efficacy outcome was response to treatment; the primary acceptance outcome was dropping out of the study. Only randomized controlled trials were considered. Results We identified 8 studies with a total of 9 relevant comparisons. Our analysis revealed small but not statistically significant effects of combined therapies on outcomes directly related to depression (BR = 1.20), with substantial heterogeneity between studies (I² = 67%). Three treatment effect modifiers were identified: target disorders, the type of psychotherapy and the type of pharmacotherapy. Small but statistically significant effects of combined therapies on quality of life (SMD = 0.18) were revealed. No differences in acceptance rates or long-term effects between combined treatments and purely pharmacological interventions were observed. Conclusions This systematic review could not provide clear evidence in favor of combining pharmacotherapy and psychotherapy. However, given the small number of primary studies, further research is needed for a conclusive decision. PMID:22694751
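
    For readers unfamiliar with the quoted I² statistic, the following hedged sketch shows a standard DerSimonian-Laird random-effects pooling with Cochran's Q and I²; the effect sizes and variances are invented, not the review's data.

    ```python
    # DerSimonian-Laird random-effects meta-analysis with I² (illustrative data).
    import numpy as np

    yi = np.array([0.10, 0.55, 0.22, -0.15])   # per-study effect sizes (made up)
    vi = np.array([0.02, 0.03, 0.015, 0.04])   # per-study variances (made up)

    w = 1 / vi                                  # fixed-effect weights
    ybar = (w * yi).sum() / w.sum()
    Q = (w * (yi - ybar) ** 2).sum()            # Cochran's Q
    df = len(yi) - 1
    I2 = max(0.0, (Q - df) / Q) * 100           # % variation from heterogeneity
    tau2 = max(0.0, (Q - df) / (w.sum() - (w ** 2).sum() / w.sum()))
    wr = 1 / (vi + tau2)                        # random-effects weights
    pooled = (wr * yi).sum() / wr.sum()
    print(f"I² = {I2:.1f}%, pooled effect = {pooled:.3f}")
    ```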

  13. Factors influencing initiation and duration of breast feeding in Ireland.

    PubMed

    Leahy-Warren, Patricia; Mulcahy, Helen; Phelan, Agnes; Corcoran, Paul

    2014-03-01

    The aim of this research was to identify factors associated with mothers breast feeding and to identify, for those who breast fed, factors associated with breast feeding for as long as planned. Breast feeding rates in Ireland are amongst the lowest in Europe. Research evidence indicates that in order for mothers to be successful at breast feeding, a multiplicity of supports is necessary for both initiation and duration. The nature of these supports, in tandem with other influencing factors, requires analysis from an Irish perspective. This was a cross-sectional study involving public health nurses and mothers in Ireland; this paper presents the results of the mothers' evaluation. Mothers (n=1715) with children less than three years old were offered a choice of completing the self-report questionnaires online or by mail. Data were analysed and reported using descriptive and inferential statistics. Four in every five participants breast fed their infant, and two thirds of them breast fed as long as planned. The multivariate logistic regression analysis identified that third-level education, being a first-time mother or having previously breast fed, participating online, having more than two public health nurse visits, and having a positive infant feeding attitude were independently and statistically significantly associated with breast feeding. Among mothers who breast fed, being aged at least 35 years, participating online, having a positive infant feeding attitude and high breast feeding self-efficacy were independently and statistically significantly associated with breast feeding for as long as planned. Findings from this study reinforce the evidence on health inequalities; there therefore needs to be a renewed commitment to reducing health inequalities in relation to breast feeding. This study has identified factors associated with initiation and duration of breast feeding that are potentially modifiable through public health interventions. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies.

    PubMed

    Rollins, Derrick K; Teh, Ailing

    2010-12-17

    Microarray data sets provide relative expression levels for thousands of genes for a comparatively small number of experimental conditions, called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to ranking the genes of microarray data sets that are expressed most differently between two biologically different groupings of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant, and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.

  15. Random forests for classification in ecology

    USGS Publications Warehouse

    Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.

    2007-01-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. © 2007 by the Ecological Society of America.
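
    A minimal example of the RF workflow described above, using scikit-learn on synthetic stand-in covariates (the ecological data sets themselves are not reproduced here):

    ```python
    # Random-forest classification with cross-validated accuracy and variable
    # importance, mirroring advantages (1) and (2) listed in the abstract.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)
    X = rng.normal(size=(300, 6))              # e.g., habitat covariates (synthetic)
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

    rf = RandomForestClassifier(n_estimators=500, random_state=7)
    print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
    rf.fit(X, y)
    print("variable importances:", np.round(rf.feature_importances_, 3))
    ```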

  16. Evaluating the statistical methodology of randomized trials on dentin hypersensitivity management.

    PubMed

    Matranga, Domenica; Matera, Federico; Pizzo, Giuseppe

    2017-12-27

    The present study aimed to evaluate the characteristics and quality of the statistical methodology used in clinical studies on dentin hypersensitivity management. An electronic search was performed for data published from 2009 to 2014 by using PubMed, Ovid/MEDLINE, and Cochrane Library databases. The primary search terms were used in combination. Eligibility criteria included randomized clinical trials that evaluated the efficacy of desensitizing agents in terms of reducing dentin hypersensitivity. A total of 40 studies were considered eligible for assessment of the quality of their statistical methodology. The four main concerns identified were i) use of nonparametric tests in the presence of large samples, coupled with lack of information about normality and equality of variances of the response; ii) lack of P-value adjustment for multiple comparisons; iii) failure to account for interactions between treatment and follow-up time; and iv) no information about the number of teeth examined per patient and the consequent lack of a cluster-specific approach in data analysis. Owing to these concerns, the statistical methodology was judged inappropriate in 77.1% of the 35 studies that used parametric methods. Additional studies with appropriate statistical analysis are required to obtain an appropriate assessment of the efficacy of desensitizing agents.

  17. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
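
    As a simplified illustration of testing multiple phenotypes from summary statistics while accounting for their correlation, the sketch below applies the classical omnibus statistic z'R⁻¹z, which is chi-squared with K degrees of freedom under the null; this is an analogue, not the authors' common-mean-plus-variance-component mixed-model test, and the numbers are made up.

    ```python
    # Omnibus multi-phenotype test from per-phenotype Z-scores and the
    # between-phenotype correlation matrix R (estimable from genome-wide
    # summary data, as the abstract describes).
    import numpy as np
    from scipy.stats import chi2

    z = np.array([2.1, 1.8, -0.4])              # Z-scores for 3 phenotypes (made up)
    R = np.array([[1.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])             # between-phenotype correlation
    stat = z @ np.linalg.solve(R, z)            # z' R^{-1} z
    p = chi2.sf(stat, df=len(z))
    print(f"chi² = {stat:.2f}, p = {p:.4f}")
    ```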

  18. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.

  19. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
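
    The two information-theoretic quantities named above are easy to compute once a methylation-level distribution is in hand. A hedged sketch with toy distributions (not an Ising-model fit):

    ```python
    # Shannon entropy of a methylation-level distribution and the
    # Jensen-Shannon distance between a test and a reference sample.
    import numpy as np
    from scipy.stats import entropy
    from scipy.spatial.distance import jensenshannon

    # Probabilities of discrete methylation levels (hypothetical)
    ref = np.array([0.70, 0.20, 0.10])     # reference sample
    tst = np.array([0.30, 0.30, 0.40])     # test sample

    print("entropy (bits):", entropy(ref, base=2), entropy(tst, base=2))
    print("Jensen-Shannon distance:", jensenshannon(ref, tst, base=2))
    ```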

  20. 28 CFR 22.21 - Use of identifiable data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    28 CFR, Judicial Administration (revised as of 2010-07-01), § 22.21 Use of identifiable data: research or statistical information identifiable to a private person may be used only for research or statistical purposes.

  1. Statistical Analysis of 30 Years Rainfall Data: A Case Study

    NASA Astrophysics Data System (ADS)

    Arvind, G.; Ashok Kumar, P.; Girish Karthi, S.; Suribabu, C. R.

    2017-07-01

    Rainfall is a prime input for various engineering designs such as hydraulic structures, bridges and culverts, canals, storm water sewers and road drainage systems. A detailed statistical analysis of each region is essential to estimate the relevant input values for the design and analysis of engineering structures and also for crop planning. A rain gauge station in Trichy district, where agriculture is the prime occupation, is selected for statistical analysis. The daily rainfall data for a period of 30 years are used to understand the normal, deficit, excess and seasonal rainfall of the selected circle headquarters. Further, the various plotting position formulae available are used to evaluate the return period of monthly, seasonal and annual rainfall. This analysis will provide useful information for water resources planners, farmers and urban engineers to assess the availability of water and create storage accordingly. The mean, standard deviation and coefficient of variation of monthly and annual rainfall were calculated to check the rainfall variability. From the calculated results, the rainfall pattern is found to be erratic. The best-fit probability distribution was identified based on the minimum deviation between actual and estimated values. The results and analysis paved the way to determine the proper onset and withdrawal of the monsoon, information which was used for land preparation and sowing.
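
    A minimal illustration of the return-period computation via the Weibull plotting position P = m/(n+1), one of the commonly used formulae of the kind compared in the study (annual totals below are invented):

    ```python
    # Exceedance probability and return period from ranked annual rainfall.
    import numpy as np

    annual = np.array([812, 1040, 935, 1210, 760, 990, 1105, 870])  # mm, made up
    n = len(annual)
    ranked = np.sort(annual)[::-1]              # descending: rank 1 = largest
    m = np.arange(1, n + 1)
    P = m / (n + 1)                             # Weibull exceedance probability
    T = 1 / P                                   # return period in years
    for r, p, t in zip(ranked, P, T):
        print(f"{r:5d} mm  P={p:.3f}  T={t:4.1f} yr")
    ```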

  2. Randomized Trial of Plaque-Identifying Toothpaste: Decreasing Plaque and Inflammation.

    PubMed

    Fasula, Kim; Evans, Carla A; Boyd, Linda; Giblin, Lori; Belavsky, Benjamin Z; Hetzel, Scott; McBride, Patrick; DeMets, David L; Hennekens, Charles H

    2017-06-01

    Randomized data are sparse about whether a plaque-identifying toothpaste reduces dental plaque and nonexistent for inflammation. Inflammation is intimately involved in the pathogenesis of atherosclerosis and is accurately measured by high-sensitivity C-reactive protein (hs-CRP), a sensitive marker for cardiovascular disease. The hypotheses that Plaque HD (TJA Health LLC, Joliet, Ill), a plaque-identifying toothpaste, produces statistically significant reductions in dental plaque and hs-CRP were tested in this randomized trial. Sixty-one apparently healthy subjects aged 19 to 44 years were assigned at random to this plaque-identifying (n = 31) or placebo toothpaste (n = 30) for 60 days. Changes from baseline to follow-up in dental plaque and hs-CRP were assessed. In an intention-to-treat analysis, the plaque-identifying toothpaste reduced mean plaque score by 49%, compared with a 24% reduction in placebo (P = .001). In a prespecified subgroup analysis of 38 subjects with baseline levels >0.5 mg/L, the plaque-identifying toothpaste reduced hs-CRP by 29%, compared with a 25% increase in placebo toothpaste (P = .041). This plaque-identifying toothpaste produced statistically significant reductions in dental plaque and hs-CRP. The observed reduction in dental plaque confirms and extends a previous observation. The observed reduction in inflammation supports the hypothesis of a reduction in risks of cardiovascular disease. The direct test of this hypothesis requires a large-scale randomized trial of sufficient size and duration designed a priori to do so. Such a finding would have major clinical and public health implications. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Methodological quality of behavioural weight loss studies: a systematic review

    PubMed Central

    Lemon, S. C.; Wang, M. L.; Haughton, C. F.; Estabrook, D. P.; Frisard, C. F.; Pagoto, S. L.

    2018-01-01

    This systematic review assessed the methodological quality of behavioural weight loss intervention studies conducted among adults, and associations between quality and a statistically significant weight loss outcome, strength of intervention effectiveness and sample size. Searches for trials published between January 2009 and December 2014 were conducted using PUBMED, MEDLINE and PSYCINFO and identified ninety studies. Methodological quality indicators included study design, anthropometric measurement approach, sample size calculations, intent-to-treat (ITT) analysis, loss to follow-up rate, missing data strategy, sampling strategy, report of treatment receipt and report of intervention fidelity (mean number of indicators met = 6.3). Indicators most commonly utilized included randomized design (100%), objectively measured anthropometrics (96.7%), ITT analysis (86.7%) and reporting treatment adherence (76.7%). Most studies (62.2%) had a follow-up rate >75% and reported a loss to follow-up analytic strategy or minimal missing data (69.9%). Describing intervention fidelity (34.4%) and sampling from a known population (41.1%) were least common. Methodological quality was not associated with reporting a statistically significant result, effect size or sample size. This review found the published literature on behavioural weight loss trials to be of high quality for specific indicators, including study design and measurement. Areas identified for improvement include utilization of more rigorous statistical approaches to loss to follow-up and better fidelity reporting. PMID:27071775

  4. Direction dependence analysis: A framework to test the direction of effects in linear models with an implementation in SPSS.

    PubMed

    Wiedermann, Wolfgang; Li, Xintong

    2018-04-16

    In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.
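
    A heuristic flavor of DDA can be shown with one of its higher-moment criteria: for y = bx + e with non-normal x and normal error, skew(y) = ρ³·skew(x), so the putative cause carries the larger absolute skewness. The sketch below is a toy check of that relation, not the SPSS macros described in the article.

    ```python
    # Direction-of-dependence heuristic on a simulated causal pair.
    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(3)
    x = rng.exponential(size=5000)               # non-normal cause
    y = 0.7 * x + rng.normal(size=5000)          # linear effect + normal error

    gx, gy = skew(x), skew(y)
    rho = np.corrcoef(x, y)[0, 1]
    print(f"skew(x)={gx:.2f}, skew(y)={gy:.2f}, rho³·skew(x)={rho**3 * gx:.2f}")
    print("tentative direction:", "x -> y" if abs(gx) > abs(gy) else "y -> x")
    ```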

  5. Identifying Pleiotropic Genes in Genome-Wide Association Studies for Multivariate Phenotypes with Mixed Measurement Scales

    PubMed Central

    Williams, L. Keoki; Buu, Anne

    2017-01-01

    We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of Fisher's combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains power at a level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches, such as dichotomizing all observed phenotypes or treating them as continuous variables, could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis of the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power to identify markers that may not otherwise be chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
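
    The null-calibration idea, estimating the empirical distribution of Fisher's combination statistic under between-phenotype correlation, can be sketched as follows; the p-values and correlation matrix are placeholders, and the simulation-based null is a simplified stand-in for the authors' latent-model procedure.

    ```python
    # Fisher's combination statistic with an empirical null that respects
    # phenotype correlation (simplified illustration).
    import numpy as np
    from scipy.stats import combine_pvalues, norm

    p_obs = np.array([0.03, 0.20, 0.01])                    # observed p-values
    R = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])                         # assumed correlation
    stat, _ = combine_pvalues(p_obs, method='fisher')       # -2 * sum(log p)

    rng = np.random.default_rng(0)
    Z = rng.multivariate_normal(np.zeros(3), R, size=20000) # correlated null Z's
    null_p = 2 * norm.sf(np.abs(Z))
    null_stat = -2 * np.log(null_p).sum(axis=1)
    print("empirical p =", (null_stat >= stat).mean())
    ```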

  6. Research Analysis on MOOC Course Dropout and Retention Rates

    ERIC Educational Resources Information Center

    Gomez-Zermeno, Marcela Gerogina; Aleman de La Garza, Lorena

    2016-01-01

    This research's objective was to identify the terminal efficiency of the Massive Online Open Course "Educational Innovation with Open Resources" offered by a Mexican private university. A quantitative methodology was used, combining descriptive statistics and probabilistic models to analyze the levels of retention, completion, and…

  7. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    PubMed Central

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018

  8. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    PubMed

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.
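
    The FR statistic at the core of FlowMap-FR can be sketched directly from its definition: pool the two samples, build a minimum spanning tree on pairwise distances, and count edges joining points from different samples (fewer cross-edges than expected under the null indicates distinct distributions). Synthetic populations stand in for FCM events below.

    ```python
    # Friedman-Rafsky cross-edge count on a pooled minimum spanning tree.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    rng = np.random.default_rng(5)
    A = rng.normal(0.0, 1.0, size=(60, 3))       # cell population A (synthetic)
    B = rng.normal(0.4, 1.0, size=(60, 3))       # cell population B (synthetic)
    pooled = np.vstack([A, B])
    labels = np.repeat([0, 1], 60)

    D = squareform(pdist(pooled))                # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).tocoo()       # n-1 edges
    cross = sum(labels[i] != labels[j] for i, j in zip(mst.row, mst.col))
    print("cross-sample MST edges:", cross, "of", mst.nnz)
    # A 'cross' count well below its null expectation suggests the two
    # populations are drawn from different distributions.
    ```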

  9. Information-dependent enrichment analysis reveals time-dependent transcriptional regulation of the estrogen pathway of toxicity.

    PubMed

    Pendse, Salil N; Maertens, Alexandra; Rosenberg, Michael; Roy, Dipanwita; Fasani, Rick A; Vantangoli, Marguerite M; Madnick, Samantha J; Boekelheide, Kim; Fornace, Albert J; Odwin, Shelly-Ann; Yager, James D; Hartung, Thomas; Andersen, Melvin E; McMullen, Patrick D

    2017-04-01

    The twenty-first century vision for toxicology involves a transition away from high-dose animal studies to in vitro and computational models (NRC in Toxicity testing in the 21st century: a vision and a strategy, The National Academies Press, Washington, DC, 2007). This transition requires mapping pathways of toxicity by understanding how in vitro systems respond to chemical perturbation. Uncovering transcription factors/signaling networks responsible for gene expression patterns is essential for defining pathways of toxicity, and ultimately, for determining the chemical modes of action through which a toxicant acts. Traditionally, transcription factor identification is achieved via chromatin immunoprecipitation studies and summarized by calculating which transcription factors are statistically associated with up- and downregulated genes. These lists are commonly determined via statistical or fold-change cutoffs, a procedure that is sensitive to statistical power and may not be as useful for determining transcription factor associations. To move away from an arbitrary statistical or fold-change-based cutoff, we developed, in the context of the Mapping the Human Toxome project, an enrichment paradigm called information-dependent enrichment analysis (IDEA) to guide identification of the transcription factor network. We used as a test case the activation of MCF-7 cells by 17β-estradiol (E2). Using this new approach, we established a time course for transcriptional and functional responses to E2. ERα and ERβ were associated with short-term transcriptional changes in response to E2. Sustained exposure led to recruitment of additional transcription factors and alteration of cell cycle machinery. TFAP2C and SOX2 were the transcription factors most highly correlated with dose. E2F7, E2F1, and Foxm1, which are involved in cell proliferation, were enriched only at 24 h. IDEA should be useful for identifying candidate pathways of toxicity. IDEA outperforms gene set enrichment analysis (GSEA) and provides similar results to weighted gene correlation network analysis, a platform that helps to identify genes not annotated to pathways.

  10. Retrospective space-time cluster analysis of whooping cough, re-emergence in Barcelona, Spain, 2000-2011.

    PubMed

    Solano, Rubén; Gómez-Barroso, Diana; Simón, Fernando; Lafuente, Sarah; Simón, Pere; Rius, Cristina; Gorrindo, Pilar; Toledo, Diana; Caylà, Joan A

    2014-05-01

    A retrospective, space-time study of whooping cough cases reported to the Public Health Agency of Barcelona, Spain between the years 2000 and 2011 is presented. It is based on 633 individual whooping cough cases and the 2006 population census from the Spanish National Statistics Institute, stratified by age and sex at the census tract level. Cluster identification was attempted using the space-time scan statistic, assuming a Poisson distribution and restricting the temporal extent to 7 days and the spatial distance to 500 m. Statistical calculations were performed with Stata 11 and SaTScan, and mapping was performed with ArcGIS 10.0. Only clusters showing statistical significance (P <0.05) were mapped. The most likely cluster identified included five census tracts located in three neighbourhoods in central Barcelona during the week from 17 to 23 August 2011. This cluster included five cases compared with the expected level of 0.0021 (relative risk = 2436, P <0.001). In addition, 11 secondary significant space-time clusters were detected, occurring at different times and localizations. Spatial statistics are considered useful for complementing epidemiological surveillance systems by visualizing excesses in the number of cases in space and time, thus increasing the possibility of identifying outbreaks not reported by the surveillance system.
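
    The core of the scan statistic used here is a Poisson likelihood ratio evaluated over candidate space-time cylinders. A hedged sketch of that single evaluation (SaTScan additionally maximizes over all cylinders and assesses significance by Monte Carlo randomization):

    ```python
    # Poisson log-likelihood ratio for one candidate cluster: c observed and
    # e expected cases inside the cylinder, out of C total cases.
    import numpy as np

    def scan_llr(c, e, C):
        if c <= e:
            return 0.0
        return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

    # The most likely cluster reported above: 5 cases vs 0.0021 expected
    print(scan_llr(c=5, e=0.0021, C=633))
    # Significance: re-randomize cases many times and compare the max LLR.
    ```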

  11. Detection of Anomalies in Hydrometric Data Using Artificial Intelligence Techniques

    NASA Astrophysics Data System (ADS)

    Lauzon, N.; Lence, B. J.

    2002-12-01

    This work focuses on the detection of anomalies in hydrometric data sequences, such as 1) outliers, which are individual data having statistical properties that differ from those of the overall population; 2) shifts, which are sudden changes over time in the statistical properties of the historical records of data; and 3) trends, which are systematic changes over time in the statistical properties. For the purpose of the design and management of water resources systems, it is important to be aware of these anomalies in hydrometric data, for they can induce a bias in the estimation of water quantity and quality parameters. These anomalies may be viewed as specific patterns affecting the data, and therefore pattern recognition techniques can be used for identifying them. However, the number of possible patterns is very large for each type of anomaly and consequently large computing capacities are required to account for all possibilities using the standard statistical techniques, such as cluster analysis. Artificial intelligence techniques, such as the Kohonen neural network and fuzzy c-means, are clustering techniques commonly used for pattern recognition in several areas of engineering and have recently begun to be used for the analysis of natural systems. They require much less computing capacity than the standard statistical techniques, and therefore are well suited for the identification of outliers, shifts and trends in hydrometric data. This work constitutes a preliminary study, using synthetic data representing hydrometric data that can be found in Canada. The analysis of the results obtained shows that the Kohonen neural network and fuzzy c-means are reasonably successful in identifying anomalies. This work also addresses the problem of uncertainties inherent to the calibration procedures that fit the clusters to the possible patterns for both the Kohonen neural network and fuzzy c-means. Indeed, for the same database, different sets of clusters can be established with these calibration procedures. A simple method for analyzing uncertainties associated with the Kohonen neural network and fuzzy c-means is developed here. The method combines the results from several sets of clusters, either from the Kohonen neural network or fuzzy c-means, so as to provide an overall diagnosis as to the identification of outliers, shifts and trends. The results indicate an improvement in the performance for identifying anomalies when the method of combining cluster sets is used, compared with when only one cluster set is used.
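
    A minimal fuzzy c-means sketch for flagging outliers follows: after convergence, points that remain far from every cluster prototype are candidate anomalies. The one-dimensional series, cluster count, and threshold rule are illustrative assumptions, not the paper's calibration procedure.

    ```python
    # Fuzzy c-means (fuzzifier m = 2) on a synthetic sequence with one outlier.
    import numpy as np

    rng = np.random.default_rng(9)
    x = np.concatenate([rng.normal(10, 1, 200), rng.normal(20, 1, 200), [45.0]])
    X = x.reshape(-1, 1)

    k, m = 2, 2.0
    centers = np.array([[x.min()], [x.max()]])       # crude initialization
    for _ in range(100):
        d = np.abs(X - centers.T) + 1e-12            # (n, k) point-center distances
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)     # membership update
        Um = U ** m
        centers = (Um.T @ X) / Um.T.sum(axis=1, keepdims=True)

    d = np.abs(X - centers.T)
    dmin = d.min(axis=1)                             # distance to nearest prototype
    flags = np.where(dmin > dmin.mean() + 4 * dmin.std())[0]
    print("centers:", centers.ravel().round(2), "outlier indices:", flags)
    ```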

  12. Efficiency of the Bethesda System for Thyroid Cytopathology.

    PubMed

    Mora-Guzmán, Ismael; Muñoz de Nova, José Luis; Marín-Campos, Cristina; Jiménez-Heffernan, José Antonio; Cuesta Pérez, Juan Julián; Lahera Vargas, Marcos; Torres Mínguez, Emma; Martín-Pérez, Elena

    2018-03-28

    Fine-needle aspiration biopsies are a key tool for preoperative assessment of thyroid nodules, and the Bethesda system is the preferred method to report cytological analysis. The purpose of this study is to assess the efficiency of the Bethesda system in identifying the malignancy risk of thyroid nodules. Patients who underwent thyroid surgery between June 2010 and June 2017 were included. Samples were classified into 6 categories according to the rates of malignancy associated with each diagnostic category. In order to investigate the correlation between categories, a statistical analysis compared the categories with pathology reports. Diagnostic indicators were calculated for the Bethesda system as a screening test (categories IV, V and VI counted as positive) and as a method to identify malignancy (categories V and VI counted as positive). In a series of 522 patients, we found 184 (35.2%) malignant tumours, papillary carcinoma being the most prevalent with 155 cases (84.2%). Malignancy rates for the diagnostic categories were: I, 0%; II, 1.5%; III, 6.4%; IV, 31%; V, 86.5%; VI, 100%. A robust correlation was identified between categories on statistical analysis. For the «screening test» analysis, sensitivity was 98.9%, specificity 84.4%, positive predictive value 69.6%, negative predictive value 99.5%, and diagnostic accuracy 88.2%. Analysing the accuracy in detecting malignancy, the values were: sensitivity 98.6%, specificity 97.6%, positive predictive value 93.5%, negative predictive value 99.5%, and diagnostic accuracy 97.9%. The Bethesda system is a clear and reliable approach to reporting thyroid cytology and is therefore an effective tool to identify malignancy risk and guide clinical management. Copyright © 2018 AEC. Publicado por Elsevier España, S.L.U. All rights reserved.
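
    The screening-test arithmetic behind these figures is straightforward; the sketch below computes the same indicators from a 2x2 table, with placeholder counts chosen only to be roughly of the study's magnitude, not its exact contingency table.

    ```python
    # Sensitivity, specificity, PPV, NPV, and accuracy from 2x2 counts.
    def diagnostics(tp, fp, fn, tn):
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "PPV": tp / (tp + fp),
            "NPV": tn / (tn + fn),
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
        }

    print(diagnostics(tp=182, fp=80, fn=2, tn=258))   # hypothetical counts
    ```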

  13. Identifying functional reorganization of spelling networks: an individual peak probability comparison approach

    PubMed Central

    Purcell, Jeremy J.; Rapp, Brenda

    2013-01-01

    Previous research has shown that damage to the neural substrates of orthographic processing can lead to functional reorganization during reading (Tsapkini et al., 2011); in this research we ask if the same is true for spelling. To examine the functional reorganization of spelling networks we present a novel three-stage Individual Peak Probability Comparison (IPPC) analysis approach for comparing the activation patterns obtained during fMRI of spelling in a single brain-damaged individual with dysgraphia to those obtained in a set of non-impaired control participants. The first analysis stage characterizes the convergence in activations across non-impaired control participants by applying a technique typically used for characterizing activations across studies: Activation Likelihood Estimate (ALE) (Turkeltaub et al., 2002). This method was used to identify locations that have a high likelihood of yielding activation peaks in the non-impaired participants. The second stage provides a characterization of the degree to which the brain-damaged individual's activations correspond to the group pattern identified in Stage 1. This involves performing a Mahalanobis distance statistics analysis (Tsapkini et al., 2011) that compares each of a control group's peak activation locations to the nearest peak generated by the brain-damaged individual. The third stage evaluates the extent to which the brain-damaged individual's peaks are atypical relative to the range of individual variation among the control participants. This IPPC analysis allows for a quantifiable, statistically sound method for comparing an individual's activation pattern to the patterns observed in a control group and, thus, provides a valuable tool for identifying functional reorganization in a brain-damaged individual with impaired spelling. Furthermore, this approach can be applied more generally to compare any individual's activation pattern with that of a set of other individuals. PMID:24399981
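
    Stage 2 of the IPPC approach reduces to a Mahalanobis distance between an individual's activation peak and the distribution of control-group peaks at a location. A toy sketch (coordinates invented, not the study's data):

    ```python
    # Mahalanobis distance of a patient's peak from control-group peaks.
    import numpy as np
    from scipy.spatial.distance import mahalanobis

    controls = np.array([[-44, -60, 28], [-42, -58, 30], [-46, -62, 26],
                         [-40, -61, 29], [-45, -59, 27]], dtype=float)  # mm
    patient_peak = np.array([-30.0, -52.0, 34.0])

    mu = controls.mean(axis=0)
    VI = np.linalg.inv(np.cov(controls.T))   # inverse covariance of control peaks
    print("Mahalanobis distance:", mahalanobis(patient_peak, mu, VI))
    ```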

  14. A better way to evaluate remote monitoring programs in chronic disease care: receiver operating characteristic analysis.

    PubMed

    Brown Connolly, Nancy E

    2014-12-01

    This foundational study applies receiver operating characteristic (ROC) analysis to evaluate the utility and predictive value of a disease management (DM) model that uses remote monitoring (RM) devices for chronic obstructive pulmonary disease (COPD). The literature identifies a need for a more rigorous method to validate and quantify evidence-based value for RM systems being used to monitor persons with a chronic disease. ROC analysis is an engineering approach widely applied in medical testing, but one that has not been evaluated for its utility in RM. Classifiers (peripheral oxygen saturation [SPO2], blood pressure [BP], and pulse), the optimum threshold, and predictive accuracy are evaluated based on patient outcomes. Parametric and nonparametric methods were used. Event-based patient outcomes included inpatient hospitalization, accident and emergency, and home health visits. Statistical analysis tools included Microsoft (Redmond, WA) Excel® and MedCalc® (MedCalc Software, Ostend, Belgium) version 12 © 1993-2013 to generate ROC curves and statistics. Persons with COPD were monitored for a minimum of 183 days, with at least one inpatient hospitalization within 12 months prior to monitoring. Retrospective, de-identified patient data from a United Kingdom National Health System COPD program were used. Datasets included biometric readings, alerts, and resource utilization. SPO2 was identified as a predictive classifier, with an optimal average threshold setting of 85-86%. BP and pulse were failed classifiers, and areas of design were identified that may improve their utility and predictive capacity. A cost avoidance methodology was developed. Results can be applied to health services planning decisions, and the methods can be applied to system design and evaluation based on patient outcomes. This study validated the use of ROC in RM program evaluation.
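
    A hedged re-creation of the ROC step in Python rather than MedCalc: score an outcome by (negated) SPO2, trace the ROC curve, and select the threshold maximizing Youden's J. The readings are synthetic, not the NHS program data.

    ```python
    # ROC curve, AUC, and Youden-optimal threshold for SPO2 as a classifier.
    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    rng = np.random.default_rng(11)
    event = np.r_[np.ones(40), np.zeros(160)]                    # adverse event flag
    spo2 = np.r_[rng.normal(84, 3, 40), rng.normal(92, 3, 160)]  # lower = worse

    score = -spo2                                # higher score = higher risk
    fpr, tpr, thr = roc_curve(event, score)
    j = np.argmax(tpr - fpr)                     # Youden's J statistic
    print("AUC:", roc_auc_score(event, score))
    print("optimal SPO2 threshold: <", -thr[j])
    ```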

  15. Physical and genetic-interaction density reveals functional organization and informs significance cutoffs in genome-wide screens

    PubMed Central

    Dittmar, John C.; Pierce, Steven; Rothstein, Rodney; Reid, Robert J. D.

    2013-01-01

    Genome-wide experiments often measure quantitative differences between treated and untreated cells to identify affected strains. For these studies, statistical models are typically used to determine significance cutoffs. We developed a method termed “CLIK” (Cutoff Linked to Interaction Knowledge) that overlays biological knowledge from the interactome on screen results to derive a cutoff. The method takes advantage of the fact that groups of functionally related interacting genes often respond similarly to experimental conditions and, thus, cluster in a ranked list of screen results. We applied CLIK analysis to five screens of the yeast gene disruption library and found that it defined a significance cutoff that differed from traditional statistics. Importantly, verification experiments revealed that the CLIK cutoff correlated with the position in the rank order where the rate of true positives drops off significantly. In addition, the gene sets defined by CLIK analysis often provide further biological perspectives. For example, applying CLIK analysis retrospectively to a screen for cisplatin sensitivity allowed us to identify the importance of the Hrq1 helicase in DNA crosslink repair. Furthermore, we demonstrate the utility of CLIK to determine optimal treatment conditions by analyzing genome-wide screens at multiple rapamycin concentrations. We show that CLIK is an extremely useful tool for evaluating screen quality, determining screen cutoffs, and comparing results between screens. Furthermore, because CLIK uses previously annotated interaction data to determine biologically informed cutoffs, it provides additional insights into screen results, which supplement traditional statistical approaches. PMID:23589890

  16. A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis.

    PubMed

    Gonzalez, Oscar; MacKinnon, David P

    Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to an outcome. However, current methods do not allow researchers to study the relationships of the general and specific aspects of a construct to an outcome simultaneously. This study proposes a bifactor measurement model for the mediating construct as a way to parse variance and represent the general aspect and specific facets of a construct simultaneously. Monte Carlo simulation results are presented to help determine the properties of mediated effect estimation when the mediator has a bifactor structure and a specific facet of a construct is the true mediator. This study also investigates the conditions under which researchers can detect the mediated effect when the multidimensionality of the mediator is ignored and the mediator is treated as unidimensional. Simulation results indicated that the mediation model with a bifactor mediator measurement model yielded unbiased estimates and adequate power to detect the mediated effect with a sample size greater than 500 and medium a- and b-paths. Also, results indicate that parameter bias and detection of the mediated effect in both the data-generating model and the misspecified model vary as a function of the amount of facet variance represented in the mediation model. This study contributes to the largely unexplored area of measurement issues in statistical mediation analysis.

  17. Outcome predictors in the management of intramedullary classic ependymoma: An integrative survival analysis.

    PubMed

    Wang, Yinqing; Cai, Ranze; Wang, Rui; Wang, Chunhua; Chen, Chunmei

    2018-06-01

    This is a retrospective study whose aim was to illustrate the survival outcomes of patients with classic ependymoma (CE) and identify potential prognostic factors. CE is the most common category of spinal ependymoma, but few published studies have discussed predictors of survival outcome. A Boolean search of the PubMed, Embase, and OVID databases was conducted by 2 investigators independently. The objects of study were intramedullary grade II ependymomas according to the 2007 WHO classification. Univariate Kaplan-Meier analysis and log-rank tests were performed to identify variables associated with progression-free survival (PFS) or overall survival (OS). Multivariate Cox regression was performed to assess hazard ratios (HRs) with 95% confidence intervals (95% CIs). Statistical analysis was performed with SPSS version 23.0 (IBM Corp.), with statistical significance defined as P < .05. A total of 35 studies were identified, including 169 cases of CE. The mean follow-up time across cases was 64.2 ± 51.5 months. Univariate analysis showed that patients who had undergone total resection (TR) had better PFS and OS than those with subtotal resection (STR) and biopsy (P = .002 and P = .004, respectively). In both univariate and multivariate analyses (P = .000 and P = .07, respectively), histological type was an independent prognostic factor for PFS of CE [papillary type: HR 0.002, 95% CI (0.000-0.073), P = .001; tanycytic type: HR 0.010, 95% CI (0.000-0.218), P = .003]. This was the first integrative analysis of CE to elucidate the correlation between various factors and prognostic outcomes. Definitive histological typing and safe TR are the foundation of CE management. Level of evidence: 4.
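
    The Kaplan-Meier/log-rank portion of such an analysis looks as follows in Python's lifelines library, shown as an assumed alternative to the SPSS workflow the authors used; durations and event indicators are invented.

    ```python
    # Kaplan-Meier estimate and log-rank comparison of PFS by resection extent.
    import numpy as np
    from lifelines import KaplanMeierFitter
    from lifelines.statistics import logrank_test

    t_tr  = np.array([12, 30, 45, 60, 72, 80, 95, 120])   # months, total resection
    e_tr  = np.array([0, 0, 1, 0, 0, 1, 0, 0])            # 1 = progression observed
    t_str = np.array([8, 14, 20, 26, 33, 41, 55, 70])     # subtotal resection/biopsy
    e_str = np.array([1, 1, 0, 1, 1, 0, 1, 1])

    km = KaplanMeierFitter()
    km.fit(t_tr, e_tr, label="TR")
    print("TR survival at 60 months:", km.predict(60))

    res = logrank_test(t_tr, t_str, event_observed_A=e_tr, event_observed_B=e_str)
    print("log-rank p =", res.p_value)
    ```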

  18. A Serological Biopsy Using Five Stomach-Specific Circulating Biomarkers for Gastric Cancer Risk Assessment: A Multi-Phase Study.

    PubMed

    Tu, Huakang; Sun, Liping; Dong, Xiao; Gong, Yuehua; Xu, Qian; Jing, Jingjing; Bostick, Roberd M; Wu, Xifeng; Yuan, Yuan

    2017-05-01

    We aimed to assess a serological biopsy using five stomach-specific circulating biomarkers [pepsinogen I (PGI), PGII, the PGI/II ratio, anti-Helicobacter pylori (H. pylori) antibody, and gastrin-17 (G-17)] for identifying high-risk individuals and predicting the risk of developing gastric cancer (GC). Among 12,112 participants with prospective follow-up from an ongoing population-based screening program using both serology and gastroscopy in China, we conducted a multi-phase study involving a cross-sectional analysis, a follow-up analysis, and an integrative risk prediction modeling analysis. In the cross-sectional analysis, the five biomarkers (especially PGII, the PGI/II ratio, and H. pylori sero-positivity) were associated with the presence of precancerous gastric lesions or GC at enrollment. In the follow-up analysis, low PGI levels and PGI/II ratios were associated with higher risk of developing GC, and both low (<0.5 pmol/l) and high (>4.7 pmol/l) G-17 levels were associated with higher risk of developing GC, suggesting a J-shaped association. In the risk prediction modeling analysis, the five biomarkers combined yielded a C statistic of 0.803 (95% confidence interval (CI) = 0.789-0.816) and improved prediction beyond traditional risk factors (C statistic from 0.580 to 0.811, P<0.001) for identifying precancerous lesions at enrollment, and higher serological biopsy scores based on the five biomarkers at enrollment were associated with higher risk of developing GC during follow-up (P for trend <0.001). A serological biopsy composed of the five stomach-specific circulating biomarkers could be used to identify high-risk individuals for further diagnostic gastroscopy, and to stratify individuals' risk of developing GC and thus guide targeted screening and precision prevention.

  19. Geosocial process and its regularities

    NASA Astrophysics Data System (ADS)

    Vikulina, Marina; Vikulin, Alexander; Dolgaya, Anna

    2015-04-01

    Natural disasters and social events (wars, revolutions, genocides, epidemics, fires, etc.) accompany each other throughout human civilization, thus reflecting the close relationship of these phenomena, which are seemingly different in nature. In order to study this relationship, the authors compiled and analyzed a list of 2,400 natural disasters and social phenomena, weighted by magnitude, that occurred during the last 36 centuries of our history. Statistical analysis was performed separately for each aggregate (natural disasters and social phenomena), and for particular statistically representative types of events. There were 5 + 5 = 10 types. It is shown that the numbers of events in the list are distributed according to a logarithmic law: the bigger the event, the less likely it happens. For each type of event and each aggregate, the existence of periodicities with periods of 280 ± 60 years was established. Statistical analysis of the time intervals between adjacent events for both aggregates showed good agreement with the Weibull-Gnedenko distribution with a shape parameter less than 1, which is equivalent to the conclusion that events group at small time intervals. Modeling the statistics of time intervals with a Pareto distribution allowed the authors to identify an emergent property of all events in the aggregate. This result allowed the authors to conclude that natural disasters and social phenomena interact. The list of events compiled by the authors, together with the newly identified properties of cyclicity, grouping and interaction that it reflects, forms the basis for modeling an essentially unified geosocial process at a sufficiently high statistical level. Proof of interaction between "lifeless" Nature and Society is fundamental and provides a new approach to forecasting demographic crises, taking into account both natural disasters and social phenomena.
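
    The grouping test described above can be reproduced in miniature: fit a Weibull distribution to inter-event intervals and inspect the shape parameter, with k < 1 indicating clustering at small intervals. Synthetic intervals stand in for the authors' 2,400-event catalogue.

    ```python
    # Weibull fit to inter-event intervals; shape < 1 indicates grouping.
    import numpy as np
    from scipy.stats import weibull_min

    rng = np.random.default_rng(13)
    intervals = weibull_min.rvs(0.7, scale=5.0, size=400, random_state=rng)  # years

    shape, loc, scale = weibull_min.fit(intervals, floc=0)  # fix location at 0
    verdict = "clustered" if shape < 1 else "not clustered"
    print(f"shape k = {shape:.2f} ({verdict})")
    ```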

  20. The Ups and Downs of Repeated Cleavage and Internal Fragment Production in Top-Down Proteomics.

    PubMed

    Lyon, Yana A; Riggs, Dylan; Fornelli, Luca; Compton, Philip D; Julian, Ryan R

    2018-01-01

    Analysis of whole proteins by mass spectrometry, or top-down proteomics, has several advantages over methods relying on proteolysis. For example, proteoforms can be unambiguously identified and examined. However, from a gas-phase ion-chemistry perspective, proteins are enormous molecules that present novel challenges relative to peptide analysis. Herein, the statistics of cleaving the peptide backbone multiple times are examined to evaluate the inherent propensity for generating internal versus terminal ions. The raw statistics reveal an inherent bias favoring production of terminal ions, which holds true regardless of protein size. Importantly, even if the full suite of internal ions is generated by statistical dissociation, terminal ions are predicted to account for at least 50% of the total ion current, regardless of protein size, if there are three backbone dissociations or fewer. Top-down analysis should therefore be a viable approach for examining proteins of significant size. Comparison of the purely statistical analysis with actual top-down data derived from ultraviolet photodissociation (UVPD) and higher-energy collisional dissociation (HCD) reveals that terminal ions account for much of the total ion current in both experiments. Terminal ion production is more favored in UVPD relative to HCD, which is likely due to differences in the mechanisms controlling fragmentation. Importantly, internal ions are not found to dominate from either the theoretical or experimental point of view.
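
    The counting argument is compact enough to state in code: under the simplifying assumptions that all c cleavages occur on one molecule and each fragment carries equal ion current, c cleavages yield c+1 fragments of which exactly 2 retain a terminus, so the terminal fraction 2/(c+1) stays at or above 50% for c ≤ 3.

    ```python
    # Terminal-ion fraction versus number of backbone cleavages, assuming
    # equal ion current per fragment (a simplification of the full analysis).
    for c in range(1, 6):
        frac = 2 / (c + 1)
        print(f"{c} cleavage(s): {c + 1} fragments, terminal fraction = {frac:.0%}")
    ```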

  2. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

    The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein are obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected external to the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two-stage method: first utilizing signatures in the frequency domain to identify outliers, and second extracting short-duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets is combined to generate more information than is available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.
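
    The second stage of the event detector flags short-duration events as large residuals of an autoregressive time series model. A minimal sketch using a plain AR model in place of the report's autoregressive exogenous (ARX) model; the lag order and threshold are illustrative assumptions:

        import numpy as np
        from statsmodels.tsa.ar_model import AutoReg

        # Hypothetical single seismic/acoustic channel with one injected burst
        rng = np.random.default_rng(1)
        x = rng.normal(size=5000)
        x[2500:2510] += 8.0

        res = AutoReg(x, lags=20).fit()
        resid = res.resid                           # starts at sample 20
        threshold = 5 * resid.std()                 # illustrative threshold
        events = np.flatnonzero(np.abs(resid) > threshold) + 20
        print("candidate event samples:", events[:10])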

  3. Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.

    PubMed

    Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz

    2017-03-01

    Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web interface is accessible via http://cgenome.net/calypso/. The software is programmed in Java, PERL and R and the source code is available from Zenodo (https://zenodo.org/record/50931). The software is freely available for non-commercial users. Contact: l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  4. A methodological analysis of chaplaincy research: 2000-2009.

    PubMed

    Galek, Kathleen; Flannelly, Kevin J; Jankowski, Katherine R B; Handzo, George F

    2011-01-01

    The present article presents a comprehensive review and analysis of quantitative research conducted in the United States on chaplaincy and closely related topics published between 2000 and 2009. A combined search strategy identified 49 quantitative studies in 13 journals. The analysis focuses on the methodological sophistication of the studies, compared to earlier research on chaplaincy and pastoral care. Cross-sectional surveys of convenience samples still dominate the field, but sample sizes have increased somewhat over the past three decades. Reporting of the validity and reliability of measures continues to be low, although reporting of response rates has improved. Improvements in the use of inferential statistics and statistical controls were also observed, compared to previous research. The authors conclude that more experimental research is needed on chaplaincy, along with an increased use of hypothesis testing, regardless of the research designs that are used.

  5. Integration of statistical and physiological analyses of adaptation of near-isogenic barley lines.

    PubMed

    Romagosa, I; Fox, P N; García Del Moral, L F; Ramos, J M; García Del Moral, B; Roca de Togores, F; Molina-Cano, J L

    1993-08-01

    Seven near-isogenic barley lines, differing for three independent mutant genes, were grown in 15 environments in Spain. Genotype x environment interaction (G x E) for grain yield was examined with the Additive Main Effects and Multiplicative Interaction (AMMI) model. The results of this statistical analysis of multilocation yield data were compared with a morpho-physiological characterization of the lines at two sites (Molina-Cano et al. 1990). The first two principal component axes from the AMMI analysis were strongly associated with the morpho-physiological characters. The independent but parallel discrimination among genotypes reflects genetic differences and highlights the power of the AMMI analysis as a tool to investigate G x E. Characters which appear to be positively associated with yield in the germplasm under study could be identified for some environments.
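
    AMMI removes additive genotype and environment main effects and then decomposes the interaction residuals by singular value decomposition; the leading axes (IPCA1, IPCA2) summarize G x E. A minimal sketch with a stand-in genotype-by-environment yield matrix of the study's dimensions:

        import numpy as np

        # Hypothetical genotype (rows) x environment (columns) mean-yield matrix
        rng = np.random.default_rng(2)
        Y = rng.normal(5.0, 1.0, size=(7, 15))

        grand = Y.mean()
        g_eff = Y.mean(axis=1, keepdims=True) - grand     # genotype main effects
        e_eff = Y.mean(axis=0, keepdims=True) - grand     # environment main effects
        resid = Y - grand - g_eff - e_eff                 # interaction residuals

        U, s, Vt = np.linalg.svd(resid, full_matrices=False)
        ipca1_geno = U[:, 0] * np.sqrt(s[0])              # genotype scores, axis 1
        ipca1_env = Vt[0] * np.sqrt(s[0])                 # environment scores, axis 1
        print("variance explained by IPCA1:", s[0]**2 / np.sum(s**2))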

  6. Disutility analysis of oil spills: graphs and trends.

    PubMed

    Ventikos, Nikolaos P; Sotiropoulos, Foivos S

    2014-04-15

    This paper reports the results of an analysis of oil spill cost data assembled from a worldwide pollution database that mainly includes data from the International Oil Pollution Compensation Fund. The purpose of the study is to analyze the conditions of marine pollution accidents and the factors that impact the costs of oil spills worldwide. The accidents are classified into categories based on their characteristics, and the cases are compared using charts to show how the costs are affected under all conditions. This study can be used as a helpful reference for developing a detailed statistical model that is capable of reliably and realistically estimating the total costs of oil spills. To illustrate the differences identified by this statistical analysis, the results are compared with the results of previous studies, and the findings are discussed. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zdarek, J.; Pecinka, L.

    Leak-before-break (LBB) analysis of WWER type reactors in the Czech and Slovak Republics is summarized in this paper. Legislative bases, required procedures, and validation and verification of procedures are discussed. A list of significant issues identified during the application of LBB analysis is presented. The results of statistical evaluation of crack length characteristics are presented and compared for the WWER 440 Type 230 and 213 reactors and for the WWER 1000 Type 302, 320 and 338 reactors.

  8. Agile Airmen: Developing the Capacity to Quickly Create Innovative Ideas

    DTIC Science & Technology

    2011-03-23

    economic growth.26 In contrast, a 2008 statistical analysis finds a high correlation to economic growth. Eric Hanushek and Ludger Woessmann studied... Hanushek and Woessmann analysis identified STEM leaders as vital to America's long-term prosperity, but having quality teachers who can teach STEM...accessed November 10, 2010). 27 Eric A. Hanushek & Ludger Woessmann, "Do Better Schools Lead to More Growth? Cognitive Skills, Economic Outcomes

  9. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    DTIC Science & Technology

    2015-09-01

    EPICOPY to obtain reliable copy number variation (CNV) data from the methylome array data, thereby decreasing the DNA requirements by half...in the R statistical environment. Samples were assessed for good performance on the array using detection p-values, a metric implemented by...Illumina to identify probes detected with confidence. Samples with less than 90% of probes detected were removed from the analysis and probes undetected in any

  10. On Statistical Analysis of Neuroimages with Imperfect Registration

    PubMed Central

    Kim, Won Hwa; Ravi, Sathya N.; Johnson, Sterling C.; Okonkwo, Ozioma C.; Singh, Vikas

    2016-01-01

    A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on the acquired brain image scans for diagnosis as well as understanding the pathological manifestation of diseases. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful comparison of the intensities at each voxel across groups (e.g., diseased versus healthy) to evaluate the effects of the disease and/or use machine learning algorithms in a subsequent step. But errors in the underlying registration make this problematic: they either decrease the statistical power or make the follow-up inference tasks less effective/accurate. In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. By deriving a deformation invariant representation of the image, the downstream analysis can be made more robust as if one had access to a (hypothetical) far superior registration procedure. Our algorithm is based on recent work on the scattering transform. Using this as a starting point, we show how results from harmonic analysis (especially, non-Euclidean wavelets) yield strategies for designing deformation and additive noise invariant representations of large 3-D brain image volumes. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. PMID:27042168

  11. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure, because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertise to interpret gear fault signatures, which is usually not easy for ordinary users to achieve. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. Previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of the redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of the redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models, including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
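
    A minimal sketch of the pipeline described above: statistical features from wavelet packet coefficients, dimensionality reduction, then a K-nearest neighbors classifier. PyWavelets with a lower-order Daubechies wavelet (db8) stands in for db44, which the library does not ship; the segment data, feature set and PCA step are illustrative assumptions:

        import numpy as np
        import pywt
        from scipy import stats
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline

        def wp_statistical_features(signal, wavelet="db8", maxlevel=3):
            """Statistical features from every wavelet packet node up to maxlevel."""
            wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=maxlevel)
            feats = []
            for level in range(1, maxlevel + 1):
                for node in wp.get_level(level, order="natural"):
                    c = node.data
                    feats += [c.mean(), c.std(), stats.skew(c),
                              stats.kurtosis(c), np.sqrt(np.mean(c ** 2))]
            return np.array(feats)

        # Hypothetical vibration segments and crack-level labels (stand-in data)
        rng = np.random.default_rng(2)
        segments = rng.normal(size=(100, 1024))
        labels = rng.integers(0, 5, size=100)       # 5 crack levels

        X = np.vstack([wp_statistical_features(s) for s in segments])
        clf = make_pipeline(PCA(n_components=10), KNeighborsClassifier(n_neighbors=5))
        clf.fit(X, labels)
        print("training accuracy:", clf.score(X, labels))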

  12. MAGMA: analysis of two-channel microarrays made easy.

    PubMed

    Rehrauer, Hubert; Zoller, Stefan; Schlapbach, Ralph

    2007-07-01

    The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R-scripts that document MAGMA's entire data processing steps, thereby allowing the user to regenerate all results in their local R installation. The implementation of MAGMA follows the model-view-controller design pattern that strictly separates the R-based statistical data processing, the web-representation and the application logic. This modular design makes the application flexible and easily extendible by experts in one of the fields: statistical microarray analysis, web design or software development. State-of-the-art Java Server Faces technology was used to generate the web interface and to perform user input processing. MAGMA's object-oriented modular framework makes it easily extendible and applicable to other fields and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at www.magma-fgcz.uzh.ch.

  13. Effectiveness of propolis on oral health: a meta-analysis.

    PubMed

    Hwu, Yueh-Juen; Lin, Feng-Yu

    2014-12-01

    The use of propolis mouth rinse or gel as a supplementary intervention has increased during the last decade in Taiwan. However, the effect of propolis on oral health is not well understood. The purpose of this meta-analysis was to present the best available evidence regarding the effects of propolis use on oral health, including oral infection, dental plaque, and stomatitis. Researchers searched seven electronic databases for relevant articles published between 1969 and 2012. Data were collected using inclusion and exclusion criteria. The Joanna Briggs Institute Meta Analysis of Statistics Assessment and Review Instrument was used to evaluate the quality of the identified articles. Eight trials published from 1997 to 2011 with 194 participants had extractable data. The result of the meta-analysis indicated that, although propolis had an effect on reducing dental plaque, this effect was not statistically significant. The results were not statistically significant for oral infection or stomatitis. Although there are a number of promising indications, in view of the limited number and quality of studies and the variation in results among studies, this review highlights the need for additional well-designed trials to draw conclusions that are more robust.

  14. Gis-Based Spatial Statistical Analysis of College Graduates Employment

    NASA Astrophysics Data System (ADS)

    Tang, R.

    2012-07-01

    It is urgently necessary to understand the distribution and employment status of college graduates for the proper allocation of human resources and the overall arrangement of strategic industries. This study provides empirical evidence regarding the use of geocoding and spatial analysis for the distribution and employment status of college graduates, based on data from the 2004-2008 Wuhan Municipal Human Resources and Social Security Bureau, China. The spatio-temporal distribution of employment units was analyzed with geocoding using ArcGIS software, and the stepwise multiple linear regression method via SPSS software was used to predict employment and to identify spatially associated enterprise and professional demand in the future. The results show that the enterprises in the Wuhan East Lake high and new technology development zone increased dramatically from 2004 to 2008 and tended to be distributed southeastward. Furthermore, the models built by statistical analysis suggest that the specialty graduates major in has an important impact on the number employed and the number of graduates engaging in pillar industries. In conclusion, the combination of GIS and statistical analysis, which helps to simulate the spatial distribution of employment status, is a potential tool for human resource development research.

  15. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    Groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. Large numbers of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness for classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This resulted in two important clusters, viz. cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), whose constituents are released into the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From the PCA it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
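
    A minimal sketch of the HCA-plus-PCA workflow on a standardized samples-by-variables hydrochemistry matrix (stand-in data; Ward linkage and the four-cluster cut are illustrative choices echoing the paper):

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        # Hypothetical 30 sampling stations x 15 hydrochemical variables
        rng = np.random.default_rng(3)
        X = rng.normal(size=(30, 15))
        Xs = StandardScaler().fit_transform(X)     # standardize before HCA/PCA

        # Hierarchical cluster analysis, cut into 4 clusters
        Z = linkage(Xs, method="ward")
        clusters = fcluster(Z, t=4, criterion="maxclust")

        # PCA: loadings on the leading factors indicate controlling processes
        pca = PCA()
        scores = pca.fit_transform(Xs)
        print("variance explained:", pca.explained_variance_ratio_[:3])
        print("cluster sizes:", np.bincount(clusters)[1:])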

  16. Statistical Analysis of Demographic and Temporal Differences in LANL's 2014 Voluntary Protection Program Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, Adam Christopher; Booth, Steven Richard

    2015-08-20

    Voluntary Protection Program (VPP) surveys were conducted in 2013 and 2014 to assess the degree to which workers at Los Alamos National Laboratory feel that their safety is valued by their management and peers. The goal of this analysis is to determine whether the difference between the VPP survey scores in 2013 and 2014 is significant, and to present the data in a way such that it can help identify either positive changes or potential opportunities for improvement. Data for several questions intended to identify the demographic groups of the respondent are included in both the 2013 and 2014 VPP survey results. These can be used to identify any significant differences among groups of employees as well as to identify any temporal trends in these cohorts.

  17. The mediating effect of calling on the relationship between medical school students’ academic burnout and empathy

    PubMed Central

    2017-01-01

    Purpose This study is aimed at identifying the relationships between medical school students’ academic burnout, empathy, and calling, and determining whether their calling has a mediating effect on the relationship between academic burnout and empathy. Methods A mixed method study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students’ empathy, and calling were utilized. For statistical analysis, correlation analysis, descriptive statistics analysis, and hierarchical multiple regression analyses were conducted. For qualitative approach, eight medical students participated in a focus group interview. Results The study found that empathy has a statistically significant, negative correlation with academic burnout, while having a significant, positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. Conclusion This result demonstrates that calling is a key variable that mediates the relationship between medical students’ academic burnout and empathy. As such, this study provides baseline data for an education that could improve medical students’ empathy skills. PMID:28870019
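
    Testing mediation with hierarchical regression typically involves three models: outcome on predictor (total effect), mediator on predictor, and outcome on predictor plus mediator (direct effect); the product of the two mediator paths estimates the indirect effect. A minimal sketch with statsmodels (variable names and stand-in data are illustrative, not the study's):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical stand-in data: burnout (X), calling (M), empathy (Y)
        rng = np.random.default_rng(4)
        n = 127
        burnout = rng.normal(size=n)
        calling = -0.5 * burnout + rng.normal(size=n)
        empathy = -0.3 * burnout + 0.4 * calling + rng.normal(size=n)
        df = pd.DataFrame({"burnout": burnout, "calling": calling,
                           "empathy": empathy})

        total = smf.ols("empathy ~ burnout", df).fit()            # path c
        med = smf.ols("calling ~ burnout", df).fit()              # path a
        direct = smf.ols("empathy ~ burnout + calling", df).fit() # paths b, c'

        indirect = med.params["burnout"] * direct.params["calling"]
        print(f"total c = {total.params['burnout']:.3f}, "
              f"direct c' = {direct.params['burnout']:.3f}, "
              f"indirect ab = {indirect:.3f}")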

  18. Segment and fit thresholding: a new method for image analysis applied to microarray and immunofluorescence data.

    PubMed

    Ensink, Elliot; Sinha, Jessica; Sinha, Arkadeep; Tang, Huiyuan; Calderone, Heather M; Hostetter, Galen; Winter, Jordan; Cherba, David; Brand, Randall E; Allen, Peter J; Sempere, Lorenzo F; Haab, Brian B

    2015-10-06

    Experiments involving the high-throughput quantification of image data require algorithms for automation. A challenge in the development of such algorithms is to properly interpret signals over a broad range of image characteristics, without the need for manual adjustment of parameters. Here we present a new approach for locating signals in image data, called Segment and Fit Thresholding (SFT). The method assesses statistical characteristics of small segments of the image and determines the best-fit trends between the statistics. Based on the relationships, SFT identifies segments belonging to background regions; analyzes the background to determine optimal thresholds; and analyzes all segments to identify signal pixels. We optimized the initial settings for locating background and signal in antibody microarray and immunofluorescence data and found that SFT performed well over multiple, diverse image characteristics without readjustment of settings. When used for the automated analysis of multicolor, tissue-microarray images, SFT correctly found the overlap of markers with known subcellular localization, and it performed better than a fixed threshold and Otsu's method for selected images. SFT promises to advance the goal of full automation in image analysis.
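
    A much-simplified sketch of the segment-statistics idea behind SFT: compute statistics for small image segments, use their joint distribution to pick out background-like segments, and derive the signal threshold from the background alone. The segment size and background criterion below are illustrative assumptions, not the authors' implementation:

        import numpy as np

        def sft_like_threshold(img, seg=16):
            """Crude segment-statistics threshold inspired by SFT."""
            h, w = img.shape
            means, stds = [], []
            for i in range(0, h - seg + 1, seg):
                for j in range(0, w - seg + 1, seg):
                    block = img[i:i+seg, j:j+seg]
                    means.append(block.mean())
                    stds.append(block.std())
            means, stds = np.array(means), np.array(stds)
            # Background segments: low mean and low spread (illustrative rule)
            bg = (means < np.median(means)) & (stds < np.median(stds))
            mu, sigma = means[bg].mean(), stds[bg].mean()
            return mu + 3 * sigma          # signal threshold from background

        rng = np.random.default_rng(5)
        img = rng.normal(100, 5, size=(256, 256))
        img[100:120, 100:120] += 50        # injected "spot"
        mask = img > sft_like_threshold(img)
        print("signal pixels found:", int(mask.sum()))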

  19. Water quality and non-point sources of risk: the Jiulong River Watershed, P. R. of China.

    PubMed

    Zhang, Jingjing; Zhang, Luoping; Ricci, Paolo F

    2012-01-01

    Retrospective water quality assessment plays an essential role in identifying trends and causal associations between exposures and risks, and thus can guide water resources management. We have developed empirical relationships between several time-varying social and economic factors of economic development and water quality variables, such as nitrate-nitrogen, COD(Mn), BOD(5), and DO, in the Jiulong River Watershed and its main tributary, the West River. Our analyses used alternative statistical methods to first reduce the dimensionality of the analysis and then strengthen the study's causal associations. The statistical methods included factor analysis (FA), trend analysis, Monte Carlo/bootstrap simulations, robust regressions and a coupled equations model, integrated into a framework that allows an investigation and resolution of the issues that may affect the estimated results. After resolving these, we found that the concentrations of nitrogen compounds increased over time in the West River region, and that fertilizer used on agricultural fruit crops was the main risk with regard to nitrogen pollution. The relationships we developed can identify hazards and explain the impact of sources of different types of pollution, such as urbanization and agriculture.

  1. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    NASA Astrophysics Data System (ADS)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental compositions of 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify rice according to type and cultivar using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified with SRM 1568a Rice Flour. All elements were found to be in good agreement with the certified values. The precisions, in terms of %RSD, were lower than 7%. The LODs were in the range of 0.01 to 29 mg kg-1. The concentrations of the 9 elements in the Thai rice samples were evaluated and used as chemical indicators to identify the type of rice sample. The results showed that Mg, Cl, As, Br, Mn, K, Rb, and Zn concentrations in Thai jasmine rice samples differ significantly from those in Sung Yod rice samples, but there was no evidence at the 95% confidence level that Al differs. Our results may provide preliminary information for the discrimination of rice samples and may be a useful database of Thai rice.
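
    A minimal sketch of the per-element comparison implied above: a two-sample test of each element's concentration between the two rice types (stand-in concentrations, not the INAA measurements):

        import numpy as np
        from scipy import stats

        # Hypothetical element concentrations (mg/kg) for two rice types
        rng = np.random.default_rng(6)
        elements = ["As", "Mg", "Cl", "Al", "Br", "Mn", "K", "Rb", "Zn"]
        jasmine = rng.normal(10, 2, size=(60, 9))
        sung_yod = rng.normal(11, 2, size=(33, 9))

        for k, el in enumerate(elements):
            t, p = stats.ttest_ind(jasmine[:, k], sung_yod[:, k], equal_var=False)
            flag = "different" if p < 0.05 else "no evidence of difference"
            print(f"{el}: p = {p:.3f} -> {flag} at the 95% level")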

  2. Comparative inter-institutional study of stress among dentists.

    PubMed

    Pozos-Radillo, Blanca E; Galván-Ramírez, Ma Luz; Pando, Manuel; Carrión, Ma De los Angeles; González, Guillermo J

    2010-01-01

    Dentistry is considered to be a stressful profession due to different work-related factors that represent a threat to dentists' health. The objectives of this work were to identify and compare chronic stress in dentists across different health institutions and the association of stress with risk factors. The study is observational, cross-sectional and comparative; 256 dentists were included, distributed among five public health institutions in the city of Guadalajara, Jalisco, Mexico, namely: the Mexican Institute of Social Security (IMSS), the Ministry of Health (SS), the Integral Development of the Family (DIF), the Social Security Services Institute for the Workers (ISSSTE) and the University of Guadalajara (U. de G). Data were obtained by means of the census technique. Stress was identified using the Stress Symptoms Inventory, and the statistical analysis was performed using the odds ratio (OR) and the chi-square statistic. Of the total population studied, 219 subjects presented high levels of chronic stress and 37 presented low levels. In the comparative analysis, significant differences were found between IMSS and U. de G and likewise between IMSS and SS. However, in the analysis of association, only U. de G was found to be associated with a high level of chronic stress.

  3. Cross Time-Frequency Analysis of Gastrocnemius Electromyographic Signals in Hypertensive and Nonhypertensive Subjects

    NASA Astrophysics Data System (ADS)

    Mitchell, Patrick; Krotish, Debra; Shin, Yong-June; Hirth, Victor

    2010-12-01

    The effects of hypertension are chronic and continuous; it affects gait, balance, and fall risk. Therefore, it is desirable to assess gait health across hypertensive and nonhypertensive subjects in order to prevent or reduce the risk of falls. Analysis of electromyography (EMG) signals can identify age-related changes of neuromuscular activation due to various neuropathies and myopathies, but it is difficult to translate these changes to clinical diagnosis. To examine and compare geriatric patients with these gait-altering diseases, we acquire EMG muscle activation signals, and by use of a time-synchronized mat capable of recording pressure information, we localize the EMG data to the gait cycle, ensuring identical comparison across subjects. Using time-frequency analysis on the EMG signal, in conjunction with several parameters obtained from the time-frequency analyses, we can determine the statistical discrepancy between diseases. We base these parameters on physiological manifestations caused by hypertension, as well as other comorbidities that affect the geriatric community. Using these metrics in a small population, we identify a statistical discrepancy between a control group and subjects with hypertension, neuropathy, diabetes, osteoporosis, arthritis, and several other common diseases that severely affect the geriatric community.

  4. A systematic review and meta-analysis of tract-based spatial statistics studies regarding attention-deficit/hyperactivity disorder.

    PubMed

    Chen, Lizhou; Hu, Xinyu; Ouyang, Luo; He, Ning; Liao, Yi; Liu, Qi; Zhou, Ming; Wu, Min; Huang, Xiaoqi; Gong, Qiyong

    2016-09-01

    Diffusion tensor imaging (DTI) studies that use tract-based spatial statistics (TBSS) have demonstrated the microstructural abnormalities of white matter (WM) in patients with attention-deficit/hyperactivity disorder (ADHD); however, robust conclusions have not yet been drawn. The present study integrated the findings of previous TBSS studies to determine the most consistent WM alterations in ADHD via a narrative review and meta-analysis. The literature search was conducted through October 2015 to identify TBSS studies that compared fractional anisotropy (FA) between ADHD patients and healthy controls. FA reductions were identified in the splenium of the corpus callosum (CC) that extended to the right cingulum, right sagittal stratum, and left tapetum. The first two clusters retained significance in the sensitivity analysis and in all subgroup analyses. The FA reduction in the CC splenium was negatively associated with the mean age of the ADHD group. We hypothesize that, in addition to the fronto-striatal-cerebellar circuit, the disturbed WM matter tracts that integrate the bilateral hemispheres and posterior-brain circuitries play a crucial role in the pathophysiology of ADHD. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Test data analysis for concentrating photovoltaic arrays

    NASA Astrophysics Data System (ADS)

    Maish, A. B.; Cannon, J. E.

    A test data analysis approach for use with steady-state efficiency measurements taken on concentrating photovoltaic arrays is presented. The analysis procedures can be used to identify biased and erroneous data. The steps involved in analyzing the test data are screening the data, developing coefficients for the performance equation, analyzing statistics to ensure adequacy of the regression fit to the data, and plotting the data. In addition, the sources and magnitudes of precision and bias errors that affect measurement accuracy are analyzed.

  6. Race and Gender Bias in the Administration of Corporal Punishment.

    ERIC Educational Resources Information Center

    Shaw, Steven R.; Braden, Jeffery B.

    1990-01-01

    Examined disciplinary actions taken by school building administrators after receiving discipline referral to identify evidence of race and gender bias in administration of corporal punishment (CP). Analysis of discipline files (n=6,244) demonstrated statistically significant relationships between race and CP and between gender and CP. Results…

  7. The Novice-Expert Continuum in Astronomy Knowledge

    ERIC Educational Resources Information Center

    Bryce, T. G. K.; Blown, E. J.

    2012-01-01

    The nature of expertise in astronomy was investigated across a broad spectrum of ages and experience in China and New Zealand. Five hypotheses (capable of quantification and statistical analysis) were used to probe types of expertise identified by previous researchers: (a) domain-specific knowledge-skill in the use of scientific vocabulary and…

  8. Three new gamma Doradus stars from the ASAS-3 database

    NASA Astrophysics Data System (ADS)

    Bernhard, Klaus; Huemmerich, Stefan

    2016-02-01

    By analysis of data from the ASAS-3 archive, the stars HD 18011, NSV 16873 and NSV 3272 were identified as multiperiodic gamma Doradus variables. Essential information on these variables is presented, along with unwhitened frequency spectra and statistically significant frequencies, as derived with Period04.

  9. A multi-scale analysis of landscape statistics

    Treesearch

    Douglas H. Cain; Kurt H. Riitters; Kenneth Orvis

    1997-01-01

    It is now feasible to monitor some aspects of landscape ecological condition nationwide using remotely- sensed imagery and indicators of land cover pattern. Previous research showed redundancies among many reported pattern indicators and identified six unique dimensions of land cover pattern. This study tested the stability of those dimensions and representative...

  10. Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Camilleri, Liberato; Cefai, Carmel

    2013-01-01

    Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…

  11. Contributions to Statistical Problems Related to Microarray Data

    ERIC Educational Resources Information Center

    Hong, Feng

    2009-01-01

    Microarray is a high-throughput technology to measure gene expression. Analysis of microarray data brings many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes Factors to identify differentially expressed genes. Second, we…

  12. Developing and Refining the Taiwan Birth Cohort Study (TBCS): Five Years of Experience

    ERIC Educational Resources Information Center

    Lung, For-Wey; Chiang, Tung-Liang; Lin, Shio-Jean; Shu, Bih-Ching; Lee, Meng-Chih

    2011-01-01

    The Taiwan Birth Cohort Study (TBCS) is the first nationwide birth cohort database in Asia designed to establish national norms of children's development. Several challenges during database development and data analysis were identified. Challenges include sampling methods, instrument development and statistical approach to missing data. The…

  13. Research Review: Children and Poverty [Book Review].

    ERIC Educational Resources Information Center

    Holman, Bob

    1994-01-01

    This study is a careful review and analysis of recent official statistics and academic studies about children and poverty in the United Kingdom. Kumar fully and succinctly identifies the link between increasing child poverty and economic, demographic, and policy changes and the greater risks of children from ethnic minorities. (SLD)

  14. Identifying subgroups of patients using latent class analysis: should we use a single-stage or a two-stage approach? A methodological study using a cohort of patients with low back pain.

    PubMed

    Nielsen, Anne Molgaard; Kent, Peter; Hestbaek, Lise; Vach, Werner; Kongsted, Alice

    2017-02-01

    Heterogeneity in patients with low back pain (LBP) is well recognised and different approaches to subgrouping have been proposed. Latent Class Analysis (LCA) is a statistical technique that is increasingly being used to identify subgroups based on patient characteristics. However, as LBP is a complex multi-domain condition, the optimal approach when using LCA is unknown. Therefore, this paper describes the exploration of two approaches to LCA that may help improve the identification of clinically relevant and interpretable LBP subgroups. From 928 LBP patients consulting a chiropractor, baseline data were used as input to the statistical subgrouping. In a single-stage LCA, all variables were modelled simultaneously to identify patient subgroups. In a two-stage LCA, we used the latent class membership from our previously published LCA within each of six domains of health (activity, contextual factors, pain, participation, physical impairment and psychology) (first stage) as the variables entered into the second stage of the two-stage LCA to identify patient subgroups. The description of the results of the single-stage and two-stage LCA was based on a combination of statistical performance measures, qualitative evaluation of clinical interpretability (face validity) and a subgroup membership comparison. For the single-stage LCA, a model solution with seven patient subgroups was preferred, and for the two-stage LCA, a nine patient subgroup model. Both approaches identified similar, but not identical, patient subgroups characterised by (i) mild intermittent LBP, (ii) recent severe LBP and activity limitations, (iii) very recent severe LBP with both activity and participation limitations, (iv) work-related LBP, (v) LBP and several negative consequences and (vi) LBP with nerve root involvement. Both approaches identified clinically interpretable patient subgroups. The potential importance of these subgroups needs to be investigated by exploring whether they can be identified in other cohorts and by examining their possible association with patient outcomes. This may inform the selection of a preferred LCA approach.
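
    Latent class analysis with binary indicators is a Bernoulli mixture fitted by EM; the two-stage variant reruns the clustering on coded first-stage class memberships. A minimal sketch of a single-stage fit (stand-in binary indicators, not the cohort data; the seven-class count echoes the paper's preferred solution):

        import numpy as np

        def lca_bernoulli(X, k, iters=200, seed=0):
            """Minimal EM for a latent class model with binary indicators."""
            rng = np.random.default_rng(seed)
            n, d = X.shape
            pi = np.full(k, 1.0 / k)                 # class prevalences
            theta = rng.uniform(0.25, 0.75, (k, d))  # item-response probabilities
            for _ in range(iters):
                # E-step: responsibilities from Bernoulli log-likelihoods
                logp = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
                        + np.log(pi))
                logp -= logp.max(axis=1, keepdims=True)
                r = np.exp(logp)
                r /= r.sum(axis=1, keepdims=True)
                # M-step: update prevalences and item probabilities
                nk = r.sum(axis=0)
                pi = nk / n
                theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
            return pi, theta, r

        # Stand-in binary patient indicators (928 patients, 12 items)
        rng = np.random.default_rng(7)
        X = (rng.uniform(size=(928, 12)) < 0.4).astype(float)
        pi, theta, resp = lca_bernoulli(X, k=7)
        print("class prevalences:", np.round(pi, 2))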

  15. Cosmology constraints from shear peak statistics in Dark Energy Survey Science Verification data

    DOE PAGES

    Kacprzak, T.; Kirk, D.; Friedrich, O.; ...

    2016-08-19

    Shear peak statistics has gained a lot of attention recently as a practical alternative to two-point statistics for constraining cosmological parameters. We perform a shear peak statistics analysis of the Dark Energy Survey (DES) Science Verification (SV) data, using weak gravitational lensing measurements from a 139 deg^2 field. We measure the abundance of peaks identified in aperture mass maps, as a function of their signal-to-noise ratio, in the signal-to-noise range 0 < S/N < 4. To predict the peak counts as a function of cosmological parameters we use a suite of N-body simulations spanning 158 models with varying Omega_m and sigma_8, fixing w = -1, Omega_b = 0.04, h = 0.7 and n_s = 1, to which we have applied the DES SV mask and redshift distribution. In our fiducial analysis we measure sigma_8(Omega_m/0.3)^0.6 = 0.77 ± 0.07, after marginalising over the shear multiplicative bias and the error on the mean redshift of the galaxy sample. We introduce models of intrinsic alignments, blending, and source contamination by cluster members. These models indicate that peaks with S/N > 4 would require significant corrections, which is why we do not include them in our analysis. We compare our results to the cosmological constraints from the two-point analysis on the SV field and find them to be in good agreement in both the central value and its uncertainty. As a result, we discuss prospects for future peak statistics analysis with upcoming DES data.
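
    Peak statistics reduces an aperture-mass map to the counts of local maxima per signal-to-noise bin, which are then compared with simulated predictions. A minimal sketch of the counting step (stand-in map; the smoothing scale, peak window and binning are illustrative assumptions):

        import numpy as np
        from scipy import ndimage

        # Hypothetical aperture-mass signal-to-noise map (stand-in data)
        rng = np.random.default_rng(8)
        snr_map = ndimage.gaussian_filter(rng.normal(size=(512, 512)), 3)
        snr_map *= 1.0 / snr_map.std()            # normalize to unit S/N

        # Peaks = local maxima; count them in S/N bins over 0 < S/N < 4
        is_peak = (snr_map == ndimage.maximum_filter(snr_map, size=5)) \
                  & (snr_map > 0)
        peak_snr = snr_map[is_peak]
        counts, edges = np.histogram(peak_snr, bins=8, range=(0, 4))
        print("peak counts per S/N bin:", counts)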

  16. Sedimentological analysis and bed thickness statistics from a Carboniferous deep-water channel-levee complex: Myall Trough, SE Australia

    NASA Astrophysics Data System (ADS)

    Palozzi, Jason; Pantopoulos, George; Maravelis, Angelos G.; Nordsvan, Adam; Zelilidis, Avraam

    2018-02-01

    This investigation presents an outcrop-based integrated study of internal division analysis and statistical treatment of turbidite bed thickness applied to a Carboniferous deep-water channel-levee complex in the Myall Trough, southeast Australia. Turbidite beds of the studied succession are characterized by a range of sedimentary structures grouped into two main associations, a thick-bedded and a thin-bedded one, that reflect channel-fill and overbank/levee deposits, respectively. Three vertically stacked channel-levee cycles have been identified. Results of the statistical analysis of bed thickness, grain-size and internal division patterns applied to the studied channel-levee succession indicate that the turbidite bed thickness data are well characterized by a bimodal lognormal distribution, possibly reflecting the difference between deposition from lower-density flows (in a levee/overbank setting) and very high-density flows (in a channel-fill setting). Power law and exponential distributions were observed to hold only for the thick-bedded parts of the succession and cannot characterize the whole bed thickness range of the studied sediments. The succession also exhibits non-random clustering of bed thickness and grain-size measurements. The studied sediments are also characterized by the presence of statistically detected fining-upward sandstone packets. A novel quantitative approach (change-point analysis) is proposed for the detection of those packets. Markov permutation statistics also revealed the existence of order in the alternation of internal divisions in the succession, expressed by an optimal internal division cycle reflecting two main types of gravity flow events deposited within both the thick-bedded conglomeratic and thin-bedded sandstone associations. The analytical methods presented in this study can be used as additional tools for quantitative analysis and recognition of depositional environments in hydrocarbon-bearing research of ancient deep-water channel-levee settings.
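
    One way to check the bimodal-lognormal description above is to fit a single lognormal to the bed thickness data and test the fit; rejection motivates a two-component (bimodal) model. A minimal sketch with stand-in thicknesses, not the measured sections:

        import numpy as np
        from scipy import stats

        # Hypothetical turbidite bed thicknesses in cm (stand-in data)
        rng = np.random.default_rng(9)
        thin = rng.lognormal(mean=1.0, sigma=0.5, size=300)   # levee/overbank mode
        thick = rng.lognormal(mean=3.0, sigma=0.4, size=120)  # channel-fill mode
        thickness = np.concatenate([thin, thick])

        # Fit a single lognormal and test with Kolmogorov-Smirnov
        shape, loc, scale = stats.lognorm.fit(thickness, floc=0)
        ks = stats.kstest(thickness, "lognorm", args=(shape, loc, scale))
        print(f"single-lognormal KS p = {ks.pvalue:.3g}")  # low p hints at bimodality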

  17. A randomized trial in a massive online open course shows people don't know what a statistically significant relationship looks like, but they can learn.

    PubMed

    Fisher, Aaron; Anderson, G Brooke; Peng, Roger; Leek, Jeff

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%-49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%-76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/.
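
    The quiz underlying the trial amounts to judging, from a scatterplot, whether the sample correlation is significant at P < 0.05, a question answered exactly by the correlation test. A minimal sketch generating one quiz-style plot's data and testing it (the sample size and effect size are illustrative):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(10)
        n, true_r = 50, 0.25                      # small effect, illustrative
        x = rng.normal(size=n)
        y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)

        r, p = stats.pearsonr(x, y)
        print(f"r = {r:.2f}, p = {p:.3f} ->",
              "significant" if p < 0.05 else "not significant", "at P < 0.05")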

  19. Pixel Statistical Analysis of Diabetic vs. Non-diabetic Foot-Sole Spectral Terahertz Reflection Images

    NASA Astrophysics Data System (ADS)

    Hernandez-Cardoso, G. G.; Alfaro-Gomez, M.; Rojas-Landeros, S. C.; Salas-Gutierrez, I.; Castro-Camus, E.

    2018-03-01

    In this article, we present a series of hydration mapping images of the foot soles of diabetic and non-diabetic subjects measured by terahertz reflectance. In addition to the hydration images, we present a series of RYG (red-yellow-green) color-coded images in which pixels are assigned one of the three colors in order to easily identify areas at risk of ulceration. We also present the statistics of the number of pixels of each color as a potential quantitative indicator of diabetic foot syndrome deterioration.

  20. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data

    PubMed Central

    Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J.; Intarapanich, Apichart; Tongsima, Sissades

    2017-01-01

    Background Biochemical methods are available for enriching 5′ ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5′ ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. Results We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5′ ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5′ ends than TSSAR. In general, the transcript 5′ ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. Conclusion ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5′ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER). PMID:28542466
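
    The tool's core step is a global Box-Cox transformation of per-nucleotide enrichment scores toward normality, after which enriched sites are called from the upper tail of the fitted distribution. A minimal sketch of that idea with scipy (stand-in counts; the pseudocount and cutoff are illustrative assumptions):

        import numpy as np
        from scipy import stats

        # Hypothetical per-position read counts: enriched vs. unenriched libraries
        rng = np.random.default_rng(11)
        unenriched = rng.poisson(20, size=10000) + 1   # +1 pseudocount
        enriched = rng.poisson(20, size=10000) + 1
        enriched[::500] *= 8                       # a few truly enriched positions

        ratio = enriched / unenriched              # per-position enrichment score
        transformed, lam = stats.boxcox(ratio)     # global Box-Cox to normality
        z = (transformed - transformed.mean()) / transformed.std()
        hits = np.flatnonzero(z > stats.norm.ppf(0.999))   # illustrative cutoff
        print(f"lambda = {lam:.2f}, candidate 5' ends: {hits.size}")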

  1. Discriminatory power of water polo game-related statistics at the 2008 Olympic Games.

    PubMed

    Escalante, Yolanda; Saavedra, Jose M; Mansilla, Mirella; Tella, Victor

    2011-02-01

    The aims of this study were (1) to compare water polo game-related statistics by context (winning and losing teams) and sex (men and women), and (2) to identify characteristics discriminating the performances for each sex. The game-related statistics of the 64 matches (44 men's and 20 women's) played in the final phase of the Olympic Games held in Beijing in 2008 were analysed. Unpaired t-tests compared winners and losers and men and women, and confidence intervals and effect sizes of the differences were calculated. The results were subjected to a discriminant analysis to identify the differentiating game-related statistics of the winning and losing teams. The results showed the differences between winning and losing men's teams to be in both defence and offence, whereas in women's teams they were only in offence. In men's games, passing (assists), aggressive play (exclusions), centre position effectiveness (centre shots), and goalkeeper defence (goalkeeper-blocked 5-m shots) predominated, whereas in women's games the play was more dynamic (possessions). The variable that most discriminated performance in men was goalkeeper-blocked shots, and in women shooting effectiveness (shots). These results should help coaches when planning training and competition.
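
    Discriminant analysis of game-related statistics finds the linear combination of match variables that best separates winners from losers; the coefficient magnitudes indicate which variables discriminate most. A minimal sketch with scikit-learn (stand-in match statistics, not the Beijing 2008 data):

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        # Hypothetical match-level statistics: columns such as assists,
        # exclusions, centre shots, goalkeeper-blocked shots, possessions
        rng = np.random.default_rng(12)
        X_win = rng.normal(1.0, 1.0, size=(44, 5))
        X_lose = rng.normal(0.0, 1.0, size=(44, 5))
        X = np.vstack([X_win, X_lose])
        y = np.array([1] * 44 + [0] * 44)          # 1 = winner, 0 = loser

        lda = LinearDiscriminantAnalysis().fit(X, y)
        print("discriminant coefficients:", np.round(lda.coef_[0], 2))
        print("reclassification accuracy:", lda.score(X, y))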

  2. Accounting for isotopic clustering in Fourier transform mass spectrometry data analysis for clinical diagnostic studies.

    PubMed

    Kakourou, Alexia; Vach, Werner; Nicolardi, Simone; van der Burgt, Yuri; Mertens, Bart

    2016-10-01

    Mass spectrometry based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of the proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are currently a growing field of research. In this paper we present simple, yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data, collected in a case-control fashion, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of the peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures on which diagnostic rules for disease status allocation will be based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of the disease.

  3. Effect of intraoperative neuromonitoring on recurrent laryngeal nerve palsy rates after thyroid surgery--a meta-analysis.

    PubMed

    Zheng, Shixing; Xu, Zhiwen; Wei, Yuanyuan; Zeng, Manli; He, Jinnian

    2013-08-01

    Though intraoperative nerve monitoring (IONM) during thyroid surgery has gained universal acceptance for localizing and identifying the recurrent laryngeal nerve (RLN), its role in reducing the rate of RLN injury remains controversial. In order to assess the effect of IONM during thyroid surgery, its value in reducing the incidence of RLN palsy was systematically evaluated. Studies were evaluated for inclusion in this analysis by searching PubMed, Embase, the Cochrane Central Register of Controlled Trials, and the references of included studies. The initial screening of article titles and abstracts was independently performed by five reviewers based on the research protocol criteria. Each article was then read in detail and discussed before inclusion in the meta-analysis. Data were independently extracted, including the level of evidence, number of at-risk nerves, allocation method, baseline equivalence between groups, definitions of transient and permanent vocal fold palsy, systematic application of electrodes, etc. The meta-analysis was then performed. Odds ratios were pooled using a random effects model. Five randomized clinical trials and 12 comparative trials evaluating 36,487 at-risk nerves were included. Statistically significant differences in terms of total recurrent laryngeal nerve palsy (3.37% with IONM vs. 3.76% without IONM [OR: 0.74; 95% confidence interval (CI): 0.59-0.92]) and transient recurrent laryngeal nerve palsy (2.56% with IONM vs. 2.71% without IONM [OR: 0.80; 95% CI: 0.65-0.99]) were identified. The incidence of persistent recurrent laryngeal nerve palsy was 0.78% with IONM versus 0.96% with nerve identification alone (OR: 0.80; 95% CI: 0.62-1.03). Based on this meta-analysis, statistically significant differences were determined in terms of the incidences of total and transient recurrent laryngeal nerve palsy after using IONM versus recurrent laryngeal nerve identification alone during thyroidectomy. However, no statistically significant differences were identified regarding the incidence of persistent recurrent laryngeal nerve palsy between groups. Copyright © 2012. Published by Elsevier B.V.
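
    Pooling odds ratios under a random effects model is commonly done with the DerSimonian-Laird estimator: fixed-effect weights give Cochran's Q, from which the between-study variance tau^2 is estimated and folded back into the weights. A minimal sketch with stand-in 2x2 counts (not the review's data):

        import numpy as np

        def pooled_or_random_effects(events_t, n_t, events_c, n_c):
            """DerSimonian-Laird random-effects pooling of odds ratios."""
            a, b = events_t, n_t - events_t
            c, d = events_c, n_c - events_c
            log_or = np.log((a * d) / (b * c))
            var = 1/a + 1/b + 1/c + 1/d                  # per-study variance
            w = 1 / var                                  # fixed-effect weights
            mu_fe = np.sum(w * log_or) / np.sum(w)
            q = np.sum(w * (log_or - mu_fe) ** 2)        # Cochran's Q
            k = len(log_or)
            tau2 = max(0.0, (q - (k - 1))
                       / (np.sum(w) - np.sum(w**2) / np.sum(w)))
            w_re = 1 / (var + tau2)                      # random-effects weights
            mu = np.sum(w_re * log_or) / np.sum(w_re)
            se = np.sqrt(1 / np.sum(w_re))
            return np.exp(mu), np.exp(mu - 1.96 * se), np.exp(mu + 1.96 * se)

        # Hypothetical per-study palsy counts and nerve totals (stand-in numbers)
        or_, lo, hi = pooled_or_random_effects(
            np.array([30, 12, 8]), np.array([1000, 600, 400]),
            np.array([40, 15, 12]), np.array([950, 620, 380]))
        print(f"pooled OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")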

  4. Spatio-Temporal Analysis of Smear-Positive Tuberculosis in the Sidama Zone, Southern Ethiopia

    PubMed Central

    Dangisso, Mesay Hailu; Datiko, Daniel Gemechu; Lindtjørn, Bernt

    2015-01-01

    Background Tuberculosis (TB) is a disease of public health concern, with a varying distribution across settings depending on socio-economic status, HIV burden, availability and performance of the health system. Ethiopia is a country with a high burden of TB, with regional variations in TB case notification rates (CNRs). However, TB program reports are often compiled and reported at higher administrative units that do not show the burden at lower units, so there is limited information about the spatial distribution of the disease. We therefore aim to assess the spatial distribution and presence of the spatio-temporal clustering of the disease in different geographic settings over 10 years in the Sidama Zone in southern Ethiopia. Methods Retrospective space–time and spatial analyses were carried out at the kebele level (the lowest administrative unit within a district) to identify spatial and space-time clusters of smear-positive pulmonary TB (PTB). Scan statistics, Global Moran’s I, and Getis-Ord (Gi*) statistics were all used to help analyze the spatial distribution and clusters of the disease across settings. Results A total of 22,545 smear-positive PTB cases notified over 10 years were used for spatial analysis. In a purely spatial analysis, we identified the most likely cluster of smear-positive PTB in 192 kebeles in eight districts (RR = 2, p<0.001), with 12,155 observed and 8,668 expected cases. The Gi* statistic also identified the clusters in the same areas, and the spatial clusters showed stability in most areas in each year during the study period. The space-time analysis also detected the most likely cluster in 193 kebeles in the same eight districts (RR = 1.92, p<0.001), with 7,584 observed and 4,738 expected cases in 2003-2012. Conclusion The study found variations in CNRs and significant spatio-temporal clusters of smear-positive PTB in the Sidama Zone. The findings can be used to guide TB control programs to devise effective TB control strategies for the geographic areas characterized by the highest CNRs. Further studies are required to understand the factors associated with clustering based on individual level locations and investigation of cases. PMID:26030162
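
    Of the cluster statistics named above, Global Moran's I is the simplest to illustrate. A minimal sketch with a permutation p-value follows; the spatial weight matrix standing in for kebele neighbour relations is an assumption, and Kulldorff's scan statistic used for the space-time clusters is not reproduced here.

    ```python
    import numpy as np

    def morans_i(rates, w):
        """Global Moran's I for area-level rates given a spatial weight matrix w."""
        z = rates - rates.mean()
        return len(rates) / w.sum() * (w * np.outer(z, z)).sum() / (z ** 2).sum()

    def morans_i_test(rates, w, n_perm=999, seed=0):
        """Permutation p-value: shuffle rates over areas, keep the weights fixed."""
        rng = np.random.default_rng(seed)
        obs = morans_i(rates, w)
        perms = np.array([morans_i(rng.permutation(rates), w) for _ in range(n_perm)])
        return obs, (np.sum(perms >= obs) + 1) / (n_perm + 1)
    ```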

  5. Using protein-protein interactions for refining gene networks estimated from microarray data by Bayesian networks.

    PubMed

    Nariai, N; Kim, S; Imoto, S; Miyano, S

    2004-01-01

    We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.
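
    The virtual-node construction is the most code-like step in this abstract. Below is a minimal sketch of collapsing a protein complex into a single node via the first principal component of its members' expression; the function name is hypothetical, and the Bayesian network structure search itself is not reproduced.

    ```python
    import pandas as pd
    from sklearn.decomposition import PCA

    def complex_virtual_node(expr, member_genes):
        """Summarize a protein complex by the first principal component of its
        member genes' expression (expr: samples x genes DataFrame)."""
        scores = PCA(n_components=1).fit_transform(expr[member_genes].to_numpy())
        return pd.Series(scores.ravel(), index=expr.index,
                         name="+".join(member_genes))
    ```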

  6. Wavelet analysis of polarization azimuths maps for laser images of myocardial tissue for the purpose of diagnosing acute coronary insufficiency

    NASA Astrophysics Data System (ADS)

    Wanchuliak, O. Ya.; Peresunko, A. P.; Bakko, Bouzan Adel; Kushnerick, L. Ya.

    2011-09-01

    This paper presents the foundations of a large-scale, localized wavelet polarization analysis of inhomogeneous laser images of histological sections of myocardial tissue. Relations between the structure of the wavelet coefficients and the cause of death were identified. An optical model of the polycrystalline networks of myocardial protein fibrils is presented, and a technique for determining the coordinate distribution of the polarization azimuth at points of laser images of myocardial histological sections is suggested. Results are presented on the interrelation between the cause of death and the statistical parameters (statistical moments of the 1st-4th order) that characterize the distributions of wavelet coefficients of the polarization maps of myocardial layers.
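
    As a sketch of the feature extraction described above, the following computes a Mexican hat continuous wavelet transform of one row of a polarization azimuth map and the 1st-4th order statistical moments of the coefficients; the placeholder map and the scale range are assumptions, not the paper's settings.

    ```python
    import numpy as np
    import pywt
    from scipy import stats

    def moment_features(coeff_map):
        """1st- to 4th-order statistical moments of a wavelet coefficient map."""
        x = np.ravel(coeff_map)
        return {"mean": x.mean(), "variance": x.var(ddof=1),
                "skewness": stats.skew(x), "kurtosis": stats.kurtosis(x)}

    # placeholder polarization azimuth map (rows x columns of azimuth angles)
    azimuth_map = np.random.default_rng(0).uniform(0, np.pi, (64, 256))
    coeffs, _ = pywt.cwt(azimuth_map[0], np.arange(1, 33), "mexh")  # one image row
    features = moment_features(coeffs)
    ```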

  7. Quasi-experimental study designs series-paper 10: synthesizing evidence for effects collected from quasi-experimental studies presents surmountable challenges.

    PubMed

    Becker, Betsy Jane; Aloe, Ariel M; Duvendack, Maren; Stanley, T D; Valentine, Jeffrey C; Fretheim, Atle; Tugwell, Peter

    2017-09-01

    To outline issues of importance to analytic approaches to the synthesis of quasi-experiments (QEs) and to provide a statistical model for use in analysis. We drew on studies of statistics, epidemiology, and social-science methodology to outline methods for synthesis of QE studies. The design and conduct of QEs, effect sizes from QEs, and moderator variables for the analysis of those effect sizes were discussed. Biases, confounding, design complexities, and comparisons across designs offer serious challenges to syntheses of QEs. Key components of meta-analyses of QEs were identified, including the aspects of QE study design to be coded and analyzed. Of utmost importance are the design and statistical controls implemented in the QEs. Such controls and any potential sources of bias and confounding must be modeled in analyses, along with aspects of the interventions and populations studied. Because of such controls, effect sizes from QEs are more complex than those from randomized experiments. A statistical meta-regression model that incorporates important features of the QEs under review was presented. Meta-analyses of QEs provide particular challenges, but thorough coding of intervention characteristics and study methods, along with careful analysis, should allow for sound inferences. Copyright © 2017 Elsevier Inc. All rights reserved.
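
    A meta-regression of the kind proposed can be sketched as an inverse-variance weighted least-squares fit of effect sizes on coded design features. The numbers and the single moderator below are illustrative placeholders; a random-effects version would additionally estimate a between-study variance component.

    ```python
    import numpy as np
    import statsmodels.api as sm

    d = np.array([0.31, 0.12, 0.45, 0.08, 0.27])   # QE effect sizes (illustrative)
    v = np.array([0.02, 0.05, 0.03, 0.04, 0.02])   # their sampling variances
    has_control = np.array([1, 0, 1, 0, 1])        # coded design feature (moderator)

    X = sm.add_constant(has_control)
    fit = sm.WLS(d, X, weights=1.0 / v).fit()      # inverse-variance weights
    print(fit.params, fit.bse)                     # moderator effect and its SE
    ```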

  8. Multivariate analysis, mass balance techniques, and statistical tests as tools in igneous petrology: application to the Sierra de las Cruces volcanic range (Mexican Volcanic Belt).

    PubMed

    Velasco-Tapia, Fernando

    2014-01-01

    Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
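
    The cluster analysis step translates directly to code. A minimal sketch of Ward-linkage hierarchical clustering on standardized geochemical variables follows; the random matrix merely stands in for the major- and trace-element data.

    ```python
    import numpy as np
    from scipy.cluster import hierarchy

    # rows = rock samples, columns = standardized element concentrations (placeholder)
    geochem = np.random.default_rng(1).normal(size=(30, 8))
    linkage = hierarchy.linkage(geochem, method="ward")
    groups = hierarchy.fcluster(linkage, t=3, criterion="maxclust")  # 3 groups
    ```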

  9. Comprehensive analysis of yeast metabolite GC x GC-TOFMS data: combining discovery-mode and deconvolution chemometric software.

    PubMed

    Mohler, Rachel E; Dombek, Kenneth M; Hoggard, Jamin C; Pierce, Karisa M; Young, Elton T; Synovec, Robert E

    2007-08-01

    The first extensive study of yeast metabolite GC x GC-TOFMS data from cells grown under fermenting (R) and respiring (DR) conditions is reported. In this study, recently developed chemometric software for use with three-dimensional instrumentation data was implemented, using a statistically-based Fisher ratio method. The Fisher ratio method is fully automated and will rapidly reduce the data to pinpoint two-dimensional chromatographic peaks differentiating sample types while utilizing all the mass channels. The effect of lowering the Fisher ratio threshold on peak identification was studied. At the lowest threshold (just above the noise level), 73 metabolite peaks were identified, nearly three-fold greater than the number of previously reported metabolite peaks identified (26). In addition to the 73 identified metabolites, 81 unknown metabolites were also located. A Parallel Factor Analysis graphical user interface (PARAFAC GUI) was applied to selected mass channels to obtain a concentration ratio for each metabolite under the two growth conditions. Of the 73 known metabolites identified by the Fisher ratio method, 54 were statistically changing to the 95% confidence limit between the DR and R conditions according to the rigorous Student's t-test. PARAFAC determined the concentration ratio and provided a fully-deconvoluted (i.e. mathematically resolved) mass spectrum for each of the metabolites. The combination of the Fisher ratio method with the PARAFAC GUI provides high-throughput software for discovery-based metabolomics research, and is novel for GC x GC-TOFMS data due to the use of the entire data set in the analysis (640 MB x 70 runs, double precision floating point).
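
    The Fisher ratio feature selection described above reduces, per feature (e.g., per mass channel and retention-time bin), to a ratio of between-class to within-class variance. A minimal two-class sketch, with a small epsilon guarding against zero within-class variance:

    ```python
    import numpy as np

    def fisher_ratio(class_a, class_b, eps=1e-12):
        """Per-feature Fisher ratio for two sample classes (rows = runs)."""
        ma, mb = class_a.mean(axis=0), class_b.mean(axis=0)
        grand = np.vstack([class_a, class_b]).mean(axis=0)
        na, nb = len(class_a), len(class_b)
        between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
        within = ((class_a - ma) ** 2).sum(axis=0) + ((class_b - mb) ** 2).sum(axis=0)
        return between / (within + eps)

    # features whose ratio exceeds a threshold are candidate class-differentiating peaks
    ```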

  10. Probabilistic characterization of sleep architecture: home based study on healthy volunteers.

    PubMed

    Garcia-Molina, Gary; Vissapragada, Sreeram; Mahadevan, Anandi; Goodpaster, Robert; Riedner, Brady; Bellesi, Michele; Tononi, Giulio

    2016-08-01

    The quantification of sleep architecture has high clinical value for diagnostic purposes. While the clinical standard to assess sleep architecture is in-lab polysomnography, higher ecological validity can be obtained with multiple sleep recordings at home. In this paper, we use a dataset composed of fifty sleep EEG recordings at home (10 per study participant for five participants) to analyze the sleep stage transition dynamics using Markov chain based modeling. The durations of continuous sleep stage bouts are also analyzed statistically to identify the speed of transition between sleep stages. This analysis identified two types of NREM states characterized by fast and slow exit rates, which from the EEG analysis appear to correspond to shallow and deep sleep, respectively.
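
    A minimal sketch of the Markov chain estimation described above, assuming an epoch-by-epoch hypnogram coded with the usual stage labels (the labels and the toy sequence are mine):

    ```python
    import numpy as np

    STAGES = ["W", "N1", "N2", "N3", "REM"]          # assumed stage coding

    def transition_matrix(hypnogram):
        """Row-normalized epoch-to-epoch sleep stage transition probabilities."""
        idx = {s: i for i, s in enumerate(STAGES)}
        counts = np.zeros((len(STAGES), len(STAGES)))
        for a, b in zip(hypnogram, hypnogram[1:]):
            counts[idx[a], idx[b]] += 1
        rows = counts.sum(axis=1, keepdims=True)
        return counts / np.where(rows == 0, 1, rows)

    P = transition_matrix(["W", "N1", "N2", "N2", "N3", "N3", "N2", "REM", "N2"])
    exit_rates = 1 - np.diag(P)   # under a geometric bout model, mean bout = 1/exit rate
    ```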

  11. Living systematic reviews: 3. Statistical methods for updating meta-analyses.

    PubMed

    Simmonds, Mark; Salanti, Georgia; McKenzie, Joanne; Elliott, Julian

    2017-11-01

    A living systematic review (LSR) should keep the review current as new research evidence emerges. Any meta-analyses included in the review will also need updating as new material is identified. If the aim of the review is solely to present the best current evidence, standard meta-analysis may be sufficient, provided reviewers are aware that results may change at later updates. If the review is used in a decision-making context, more caution may be needed. When using standard meta-analysis methods, the chance of incorrectly concluding that any updated meta-analysis is statistically significant when there is no effect (the type I error) increases rapidly as more updates are performed. Inaccurate estimation of any heterogeneity across studies may also lead to inappropriate conclusions. This paper considers four methods to avoid some of these statistical problems when updating meta-analyses: two methods, the law of the iterated logarithm and the Shuster method, control primarily for inflation of type I error, and two other methods, trial sequential analysis and sequential meta-analysis, control for type I and II errors (failing to detect a genuine effect) and take account of heterogeneity. This paper compares the methods and considers how they could be applied to LSRs. Copyright © 2017 Elsevier Inc. All rights reserved.
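
    The type I error inflation that motivates these methods is easy to demonstrate by simulation. The sketch below repeatedly updates a fixed-effect cumulative meta-analysis of null trials and counts how often significance is ever declared; the trial sizes and update count are arbitrary, and none of the four correction methods is implemented here.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    n_sims, n_updates, n_per_arm = 2000, 10, 50
    ever_significant = 0
    for _ in range(n_sims):
        diffs, variances = [], []
        hit = False
        for _ in range(n_updates):                   # one new null trial per update
            a = rng.normal(0, 1, n_per_arm)          # true effect is zero
            b = rng.normal(0, 1, n_per_arm)
            diffs.append(a.mean() - b.mean())
            variances.append(a.var(ddof=1) / n_per_arm + b.var(ddof=1) / n_per_arm)
            w = 1 / np.array(variances)
            z = np.sum(w * np.array(diffs)) / np.sqrt(w.sum())  # cumulative z
            hit = hit or abs(z) > 1.96
        ever_significant += hit
    print(ever_significant / n_sims)   # well above the nominal 0.05
    ```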

  12. Multi-resolutional shape features via non-Euclidean wavelets: Applications to statistical analysis of cortical thickness

    PubMed Central

    Kim, Won Hwa; Singh, Vikas; Chung, Moo K.; Hinrichs, Chris; Pachauri, Deepti; Okonkwo, Ozioma C.; Johnson, Sterling C.

    2014-01-01

    Statistical analysis on arbitrary surface meshes such as the cortical surface is an important approach to understanding brain diseases such as Alzheimer’s disease (AD). Surface analysis may be able to identify specific cortical patterns that relate to certain disease characteristics or exhibit differences between groups. Our goal in this paper is to make group analysis of signals on surfaces more sensitive. To do this, we derive multi-scale shape descriptors that characterize the signal around each mesh vertex, i.e., its local context, at varying levels of resolution. In order to define such a shape descriptor, we make use of recent results from harmonic analysis that extend traditional continuous wavelet theory from the Euclidean to a non-Euclidean setting (i.e., a graph, mesh or network). Using this descriptor, we conduct experiments on two different datasets, the Alzheimer’s Disease NeuroImaging Initiative (ADNI) data and images acquired at the Wisconsin Alzheimer’s Disease Research Center (W-ADRC), focusing on individuals labeled as having Alzheimer’s disease (AD), mild cognitive impairment (MCI) and healthy controls. In particular, we contrast traditional univariate methods with our multi-resolution approach, which shows increased sensitivity and improved statistical power to detect group-level effects. We also provide an open source implementation. PMID:24614060

  13. Genetic overlap between type 2 diabetes and major depressive disorder identified by bioinformatics analysis.

    PubMed

    Ji, Hong-Fang; Zhuang, Qi-Shuai; Shen, Liang

    2016-04-05

    Our study investigated the shared genetic etiology underlying type 2 diabetes (T2D) and major depressive disorder (MDD) by analyzing large-scale genome-wide association study statistics. A total of 496 shared SNPs associated with both T2D and MDD were identified at p-value ≤ 1.0E-07. Functional enrichment analysis showed that the enriched pathways pertained to immune responses (Fc gamma R-mediated phagocytosis, T cell and B cell receptor signaling), cell signaling (MAPK, Wnt signaling), lipid metabolism, and cancer-associated pathways. The findings will have potential implications for future interventional studies of the two diseases.
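
    The SNP-overlap step can be sketched with a simple merge of two summary-statistics tables; the file names and column names below are hypothetical.

    ```python
    import pandas as pd

    t2d = pd.read_csv("t2d_gwas.tsv", sep="\t")      # columns: snp, pvalue (assumed)
    mdd = pd.read_csv("mdd_gwas.tsv", sep="\t")
    threshold = 1.0e-07
    shared = pd.merge(t2d[t2d.pvalue <= threshold],
                      mdd[mdd.pvalue <= threshold],
                      on="snp", suffixes=("_t2d", "_mdd"))
    print(len(shared), "SNPs associated with both traits")
    ```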

  14. Psychological profiling of offender characteristics from crime behaviors in serial rape offences.

    PubMed

    Kocsis, Richard N; Cooksey, Ray W; Irwin, Harvey J

    2002-04-01

    Criminal psychological profiling has progressively been incorporated into police procedures despite a dearth of empirical research. Indeed, in the study of serial violent crimes for the purpose of psychological profiling, very few original, quantitative, academically reviewed studies actually exist. This article reports on the analysis of 62 incidents of serial sexual assault. The statistical procedure of multidimensional scaling was employed in the analysis of these data, which in turn produced a five-cluster model of serial rapist behavior. First, a central cluster was identified representing behaviors common to all patterns of serial rape. Second, four distinct outlying patterns were identified as demonstrating distinct offence styles, these being assigned the following descriptive labels: brutality, intercourse, chaotic, and ritual. Furthermore, analysis of these patterns also identified distinct offender characteristics that allow for the use of empirically robust offender profiles in future serial rape investigations.

  15. An Automated Method of Scanning Probe Microscopy (SPM) Data Analysis and Reactive Site Tracking for Mineral-Water Interface Reactions Observed at the Nanometer Scale

    NASA Astrophysics Data System (ADS)

    Campbell, B. D.; Higgins, S. R.

    2008-12-01

    Developing a method for bridging the gap between macroscopic and microscopic measurements of reaction kinetics at the mineral-water interface has important implications in geological and chemical fields. Investigating these reactions on the nanometer scale with SPM is often limited by image analysis and data extraction due to the large quantity of data usually obtained in SPM experiments. Here we present a computer algorithm for automated analysis of mineral-water interface reactions. This algorithm automates the analysis of sequential SPM images by identifying the kinetically active surface sites (i.e., step edges), and by tracking the displacement of these sites from image to image. The step edge positions in each image are readily identified and tracked through time by a standard edge detection algorithm followed by statistical analysis on the Hough Transform of the edge-mapped image. By quantifying this displacement as a function of time, the rate of step edge displacement is determined. Furthermore, the total edge length, also determined from analysis of the Hough Transform, combined with the computed step speed, yields the surface area normalized rate of the reaction. The algorithm was applied to a study of the spiral growth of the calcite(104) surface from supersaturated solutions, yielding results almost 20 times faster than performing this analysis by hand, with results being statistically similar for both analysis methods. This advance in analysis of kinetic data from SPM images will facilitate the building of experimental databases on the microscopic kinetics of mineral-water interface reactions.
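
    The edge detection and Hough transform pipeline described above can be sketched with scikit-image; the synthetic frame below stands in for an SPM height image containing one straight step edge.

    ```python
    import numpy as np
    from skimage.feature import canny
    from skimage.transform import hough_line, hough_line_peaks

    frame = np.zeros((128, 128))                     # synthetic SPM height frame
    frame[:, 64:] = 1.0                              # one straight step edge
    edges = canny(frame, sigma=2.0)                  # step-edge map
    h, angles, dists = hough_line(edges)
    _, edge_angles, edge_dists = hough_line_peaks(h, angles, dists)
    # Tracking each (angle, distance) peak across sequential frames gives the
    # step displacement per frame, and hence the step speed.
    ```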

  16. Methods for evaluating temporal groundwater quality data and results of decadal-scale changes in chloride, dissolved solids, and nitrate concentrations in groundwater in the United States, 1988-2010

    USGS Publications Warehouse

    Lindsey, Bruce D.; Rupert, Michael G.

    2012-01-01

    Decadal-scale changes in groundwater quality were evaluated by the U.S. Geological Survey National Water-Quality Assessment (NAWQA) Program. Samples of groundwater collected from wells during 1988-2000 - a first sampling event representing the decade ending the 20th century - were compared on a pair-wise basis to samples from the same wells collected during 2001-2010 - a second sampling event representing the decade beginning the 21st century. The data set consists of samples from 1,236 wells in 56 well networks, representing major aquifers and urban and agricultural land-use areas, with analytical results for chloride, dissolved solids, and nitrate. Statistical analysis was done on a network basis rather than by individual wells. Although spanning slightly more or less than a 10-year period, the two-sample comparison between the first and second sampling events is referred to as an analysis of decadal-scale change based on a step-trend analysis. The 22 principal aquifers represented by these 56 networks account for nearly 80 percent of the estimated withdrawals of groundwater used for drinking-water supply in the Nation. Well networks where decadal-scale changes in concentrations were statistically significant were identified using the Wilcoxon-Pratt signed-rank test. For the statistical analysis of chloride, dissolved solids, and nitrate concentrations at the network level, more than half revealed no statistically significant change over the decadal period. However, for networks that had statistically significant changes, increased concentrations outnumbered decreased concentrations by a large margin. Statistically significant increases of chloride concentrations were identified for 43 percent of 56 networks. Dissolved solids concentrations increased significantly in 41 percent of the 54 networks with dissolved solids data, and nitrate concentrations increased significantly in 23 percent of 56 networks. At least one of the three - chloride, dissolved solids, or nitrate - had a statistically significant increase in concentration in 66 percent of the networks. Statistically significant decreases in concentrations were identified in 4 percent of the networks for chloride, 2 percent of the networks for dissolved solids, and 9 percent of the networks for nitrate. A larger percentage of urban land-use networks had statistically significant increases in chloride, dissolved solids, and nitrate concentrations than agricultural land-use networks. In order to assess the magnitude of statistically significant changes, the median of the differences between constituent concentrations from the first full-network sampling event and those from the second full-network sampling event was calculated using the Turnbull method. The largest median decadal increases in chloride concentrations were in networks in the Upper Illinois River Basin (67 mg/L) and in the New England Coastal Basins (34 mg/L), whereas the largest median decadal decrease in chloride concentrations was in the Upper Snake River Basin (1 mg/L). The largest median decadal increases in dissolved solids concentrations were in networks in the Rio Grande Valley (260 mg/L) and the Upper Illinois River Basin (160 mg/L). The largest median decadal decrease in dissolved solids concentrations was in the Apalachicola-Chattahoochee-Flint River Basin (6.0 mg/L). 
The largest median decadal increases in nitrate as nitrogen (N) concentrations were in networks in the South Platte River Basin (2.0 mg/L as N) and the San Joaquin-Tulare Basins (1.0 mg/L as N). The largest median decadal decrease in nitrate concentrations was in the Santee River Basin and Coastal Drainages (0.63 mg/L). The magnitude of change in networks with statistically significant increases typically was much larger than the magnitude of change in networks with statistically significant decreases. The magnitude of change was greatest for chloride in the urban land-use networks and greatest for dissolved solids and nitrate in the agricultural land-use networks. Analysis of data from all networks combined indicated statistically significant increases for chloride, dissolved solids, and nitrate. Although chloride, dissolved solids, and nitrate concentrations were typically less than the drinking-water standards and guidelines, a statistical test was used to determine whether or not the proportion of samples exceeding the drinking-water standard or guideline changed significantly between the first and second full-network sampling events. The proportion of samples exceeding the U.S. Environmental Protection Agency (USEPA) Secondary Maximum Contaminant Level for dissolved solids (500 milligrams per liter) increased significantly between the first and second full-network sampling events when evaluating all networks combined at the national level. Also, for all networks combined, the proportion of samples exceeding the USEPA Maximum Contaminant Level (MCL) of 10 mg/L as N for nitrate increased significantly. One network in the Delmarva Peninsula had a significant increase in the proportion of samples exceeding the MCL for nitrate. A subset of 261 wells was sampled every other year (biennially) to evaluate decadal-scale changes using a time-series analysis. The analysis of the biennial data set showed that changes were generally similar to the findings from the analysis of decadal-scale change that was based on a step-trend analysis. Because of the small number of wells in a network with biennial data (typically 4-5 wells), the time-series analysis is more useful for understanding water-quality responses to changes in site-specific conditions rather than as an indicator of the change for the entire network.
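
    The network-level step-trend test named above is available directly in SciPy; a minimal sketch on illustrative paired concentrations (not USGS data):

    ```python
    from scipy import stats

    # paired chloride concentrations (mg/L) for one network's wells, decade 1 vs 2
    first = [12.0, 8.5, 40.2, 5.1, 22.3, 15.8, 9.9]      # illustrative values
    second = [15.5, 8.5, 55.0, 6.0, 30.1, 14.2, 12.7]
    stat, p = stats.wilcoxon(first, second, zero_method="pratt")  # Wilcoxon-Pratt
    ```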

  17. Developing a Continuous Quality Improvement Assessment Using a Patient-Centered Approach in Optimizing Systemic Lupus Erythematosus Disease Control.

    PubMed

    Updyke, Katelyn Mariko; Urso, Brittany; Beg, Shazia; Solomon, James

    2017-10-09

    Systemic lupus erythematosus (SLE) is a multi-organ, autoimmune disease in which patients lose self-tolerance and develop immune complexes which deposit systemically causing multi-organ damage and inflammation. Patients often experience unpredictable flares of symptoms with poorly identified triggers. Literature suggests exogenous exposures may contribute to flares in symptoms. An online pilot survey was marketed globally through social media to self-reported SLE patients with the goal to identify specific subpopulations who are susceptible to disease state changes based on analyzed exogenous factors. The pilot survey was promoted for two weeks, 80 respondents fully completed the survey and were included in statistical analysis. Descriptive statistical analysis was performed on de-identified patient surveys and compared to previous literature studies reporting known or theorized triggers in the SLE disease state. The pilot survey identified similar exogenous triggers compared to previous literature, including antibiotics, increasing beef intake, and metal implants. The goal of the pilot survey is to utilize similar questions to develop a detailed internet-based patient interactive form that can be edited and time stamped as a method to promote continuous quality improvement assessments. The ultimate objective of the platform is to interact with SLE patients from across the globe longitudinally to optimize disease control and improve quality of care by allowing them to avoid harmful triggers.

  18. Developing a Continuous Quality Improvement Assessment Using a Patient-Centered Approach in Optimizing Systemic Lupus Erythematosus Disease Control

    PubMed Central

    Urso, Brittany; Beg, Shazia; Solomon, James

    2017-01-01

    Systemic lupus erythematosus (SLE) is a multi-organ, autoimmune disease in which patients lose self-tolerance and develop immune complexes which deposit systemically causing multi-organ damage and inflammation. Patients often experience unpredictable flares of symptoms with poorly identified triggers. Literature suggests exogenous exposures may contribute to flares in symptoms. An online pilot survey was marketed globally through social media to self-reported SLE patients with the goal to identify specific subpopulations who are susceptible to disease state changes based on analyzed exogenous factors. The pilot survey was promoted for two weeks, 80 respondents fully completed the survey and were included in statistical analysis. Descriptive statistical analysis was performed on de-identified patient surveys and compared to previous literature studies reporting known or theorized triggers in the SLE disease state. The pilot survey identified similar exogenous triggers compared to previous literature, including antibiotics, increasing beef intake, and metal implants. The goal of the pilot survey is to utilize similar questions to develop a detailed internet-based patient interactive form that can be edited and time stamped as a method to promote continuous quality improvement assessments. The ultimate objective of the platform is to interact with SLE patients from across the globe longitudinally to optimize disease control and improve quality of care by allowing them to avoid harmful triggers. PMID:29226052

  19. Prognostic factors for risk stratification of adult cancer patients with chemotherapy-induced febrile neutropenia: a systematic review and meta-analysis.

    PubMed

    Lee, Yee Mei; Lang, Dora; Lockwood, Craig

    Increasing numbers of studies identify new prognostic factors for categorising chemotherapy-induced febrile neutropenia adult cancer patients into high- or low-risk groups for adverse outcomes. These groupings are used to tailor therapy according to level of risk. However, many emerging factors with prognostic significance remain controversial, being based on single studies only. A systematic review was conducted to determine the strength of association of all identified factors associated with the outcomes of chemotherapy-induced febrile neutropenia patients. Participants were adults aged 15 years and above with a cancer diagnosis who underwent cancer treatment. The review focused on clinical factors and their association with the outcomes of cancer patients with chemotherapy-induced febrile neutropenia at presentation of fever. All quantitative studies published in English which investigated clinical factors for risk stratification of adult cancer patients with chemotherapy-induced febrile neutropenia were considered. The primary outcome of interest was to identify the clinical factors for risk stratification of adult cancer patients with chemotherapy-induced febrile neutropenia. Electronic databases searched from their respective inception date up to December 2011 include MEDLINE, EMBASE, CINAHL, Cochrane Central Register of Controlled Trials (CENTRAL), Web of Science, Science-Direct, Scopus and Mednar. The quality of the included studies was subjected to assessment by two independent reviewers. The standardised critical appraisal tool from the Joanna Briggs Institute Meta-Analysis of Statistics Assessment and Review Instrument (JBI-MAStARI) was used to assess the following criteria: representativeness of study population; clearly defined prognostic factors and outcomes; whether potential confounders were addressed and appropriate statistical analysis was undertaken for the study design. Data extraction was performed using a modified version of the standardised extraction tool from the JBI-MAStARI. Prognostic factors and the accompanying odds ratios reported for the significance of these factors identified by multivariate regression were extracted from each included study. Study results were pooled in statistical meta-analysis using Review Manager 5.1. Where statistical pooling was not possible, the findings were presented in narrative form. Seven studies (four prospective cohort and three retrospective cohort) investigating 22 factors in total were included. Fixed effects meta-analysis showed: hypotension [OR=1.66, 95%CI: 1.14-2.41, p=0.008] and thrombocytopenia [OR=3.92, 95%CI: 2.19-7.01, p<0.00001] were associated with high risk of adverse outcomes for febrile neutropenia. Other factors that were statistically significant from single studies included: age of patients, clinical presentation at fever onset, presence or absence of co-morbidities, infections, duration and severity of neutropenia state. Five prognostic factors failed to demonstrate an association between the variables and the outcomes measured and they include: presence of pneumonia, total febrile days, median days to fever, recovery from neutropenia and presence of moderate clinical symptoms in association with Gram-negative bacteraemia. Despite the overall limitations identified in the included studies, this review has provided a synthesis of the best available evidence for the prognostic factors used in risk stratification of febrile neutropenia patients.
However, the dynamic aspects of prognostic model development, validation and utilisation have not been addressed adequately thus far. Given the findings of this review, it is timely to address these issues and improve the utilisation of prognostic models in the management of febrile neutropenia patients. The identified factors are similar to the factors in current prognostic models. However, additional factors that were reported to be statistically significant in this review (thrombocytopenia, presence of central venous catheter, and duration and severity of neutropenia) have not previously been included in prognostic models. This review has found these factors may improve the performance of current models by adding or replacing some of the factors. The role of risk stratification of chemotherapy-induced febrile neutropenia patients continues to evolve as the practice of risk-based therapy has been demonstrated to be beneficial to patients, clinicians and health care organisations. Further research to identify new factors/markers is needed to develop a new model which is reliable and accurate for these patients, regardless of cancer types. A robust and well-validated prognostic model is the key to enhance patient safety in the risk-based management of cancer patients with chemotherapy-induced febrile neutropenia.
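
    The fixed effects pooling reported above can be sketched from published odds ratios and 95% confidence intervals alone, by back-calculating standard errors on the log scale; the study values below are placeholders, not figures from this review.

    ```python
    import numpy as np

    def pool_fixed_effect(or_values, ci_low, ci_high):
        """Inverse-variance fixed-effect pooling of odds ratios from reported 95% CIs."""
        log_or = np.log(or_values)
        se = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)
        w = 1 / se ** 2
        mu = np.sum(w * log_or) / w.sum()
        se_mu = np.sqrt(1 / w.sum())
        return np.exp(mu), np.exp(mu - 1.96 * se_mu), np.exp(mu + 1.96 * se_mu)

    pooled = pool_fixed_effect(np.array([1.7, 1.5]), np.array([1.1, 1.0]),
                               np.array([2.6, 2.3]))   # placeholder study ORs
    ```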

  20. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data are then discretized and post-discretized consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts from the same post-discretized data matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining the epigenetic effect (viz., the effect of methylation) on gene expression level. PMID:25830807

  1. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data are then discretized and post-discretized consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts from the same post-discretized data matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining the epigenetic effect (viz., the effect of methylation) on gene expression level.

  2. End-of-life care practices of critical care nurses: A national cross-sectional survey.

    PubMed

    Ranse, Kristen; Yates, Patsy; Coyer, Fiona

    2016-05-01

    The critical care context presents important opportunities for nurses to deliver skilled, comprehensive care to patients at the end of life and their families. Limited research has identified the actual end-of-life care practices of critical care nurses. The aim of this study was to identify the end-of-life care practices of critical care nurses, using a national cross-sectional online survey. The survey was distributed to members of an Australian critical care nursing association, and 392 critical care nurses (response rate 25%) completed the survey. Exploratory factor analysis using principal axis factoring with oblique rotation was undertaken on survey responses to identify the domains of end-of-life care practice. Descriptive statistics were calculated for individual survey items. Exploratory factor analysis identified six domains of end-of-life care practice: information sharing, environmental modification, emotional support, patient and family centred decision-making, symptom management and spiritual support. Descriptive statistics identified a high level of engagement in information sharing and environmental modification practices and less frequent engagement in items from the emotional support and symptom management practice areas. The findings of this study identified domains of end-of-life care practice, and critical care nurse engagement in these practices. The findings highlight future training and practice development opportunities, including the need for experiential learning targeting the emotional support practice domain. Further research is needed to enhance knowledge of symptom management practices during the provision of end-of-life care to inform and improve practice in this area. Copyright © 2015 Australian College of Critical Care Nurses Ltd. Published by Elsevier Ltd. All rights reserved.
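
    A sketch of the factor-analytic step using the factor_analyzer package follows; the CSV of item responses is hypothetical, and "principal" here is the package's principal-factor method, which approximates the principal axis factoring reported in the study.

    ```python
    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    responses = pd.read_csv("survey_items.csv")      # hypothetical Likert item matrix
    fa = FactorAnalyzer(n_factors=6, method="principal", rotation="oblimin")
    fa.fit(responses)                                # oblimin = oblique rotation
    loadings = pd.DataFrame(fa.loadings_, index=responses.columns)
    ```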

  3. Effects of Consecutive Basketball Games on the Game-Related Statistics that Discriminate Winner and Losing Teams

    PubMed Central

    Ibáñez, Sergio J.; García, Javier; Feu, Sebastian; Lorenzo, Alberto; Sampaio, Jaime

    2009-01-01

    The aim of the present study was to identify the game-related statistics that discriminated basketball winning and losing teams in each of the three consecutive games played in a condensed tournament format. The data were obtained from the Spanish Basketball Federation and included game-related statistics from the Under-20 league (2005-2006 and 2006-2007 seasons). A total of 223 games were analyzed with the following game-related statistics: two and three-point field goal (made and missed), free-throws (made and missed), offensive and defensive rebounds, assists, steals, turnovers, blocks (made and received), fouls committed, ball possessions and offensive rating. Results showed that winning teams in this competition had better values in all game-related statistics, with the exception of three point field goals made, free-throws missed and turnovers (p ≥ 0.05). The main effect of game number was only identified in turnovers, with a statistically significant decrease between the second and third game. No interaction was found in the analysed variables. A discriminant analysis allowed the identification of the two-point field goals made, the defensive rebounds and the assists as discriminators between winning and losing teams in all three games. In addition to these, only the three-point field goals made contributed to discriminating teams in game three, suggesting a moderate effect of fatigue. Coaches may benefit from being aware of this variation in game-determinant related statistics and, also, from using offensive and defensive strategies in the third game, allowing them to explore or hide the three-point field-goal performance. Key points Overall team performances along the three consecutive games were very similar, not confirming an accumulated fatigue effect. The results from the three-point field goals in the third game suggested that winning teams were able to shoot better from longer distances and this could be the result of exhibiting higher conditioning status and/or the losing teams’ exhibiting low conditioning in defense. PMID:24150011
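
    The discriminant step can be sketched with scikit-learn's linear discriminant analysis; the four team-game rows below are invented, and in practice one would standardize the inputs and inspect structure coefficients rather than raw weights.

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # columns: two-point FG made, defensive rebounds, assists (illustrative values)
    game_stats = np.array([[28, 24, 15], [22, 19, 10], [30, 26, 17], [21, 18, 9]])
    outcome = np.array([1, 0, 1, 0])                 # 1 = winning team, 0 = losing
    lda = LinearDiscriminantAnalysis().fit(game_stats, outcome)
    print(lda.coef_)                 # larger |weight| suggests a stronger discriminator
    ```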

  4. Demonstration of Wavelet Techniques in the Spectral Analysis of Bypass Transition Data

    NASA Technical Reports Server (NTRS)

    Lewalle, Jacques; Ashpis, David E.; Sohn, Ki-Hyeon

    1997-01-01

    A number of wavelet-based techniques for the analysis of experimental data are developed and illustrated. A multiscale analysis based on the Mexican hat wavelet is demonstrated as a tool for acquiring physical and quantitative information not obtainable by standard signal analysis methods. Experimental data for the analysis came from simultaneous hot-wire velocity traces in a bypass transition of the boundary layer on a heated flat plate. A pair of traces (two components of velocity) at one location was excerpted. A number of ensemble and conditional statistics related to dominant time scales for energy and momentum transport were calculated. The analysis revealed a lack of energy-dominant time scales inside turbulent spots but identified transport-dominant scales inside spots that account for the largest part of the Reynolds stress. Momentum transport was much more intermittent than were energetic fluctuations. This work is the first step in a continuing study of the spatial evolution of these scale-related statistics, the goal being to apply the multiscale analysis results to improve the modeling of transitional and turbulent industrial flows.
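
    A minimal sketch of a Mexican hat (Ricker) multiscale decomposition with PyWavelets follows; the synthetic velocity trace and scale range are placeholders for the hot-wire data.

    ```python
    import numpy as np
    import pywt

    t = np.linspace(0, 1, 2048)
    rng = np.random.default_rng(0)
    velocity = np.sin(40 * np.pi * t) + 0.3 * rng.normal(size=t.size)  # synthetic trace
    scales = np.arange(1, 65)
    coeffs, freqs = pywt.cwt(velocity, scales, "mexh", sampling_period=t[1] - t[0])
    energy_per_scale = (coeffs ** 2).mean(axis=1)    # peaks mark dominant time scales
    ```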

  5. The Canadian Precipitation Analysis (CaPA): Evaluation of the statistical interpolation scheme

    NASA Astrophysics Data System (ADS)

    Evans, Andrea; Rasmussen, Peter; Fortin, Vincent

    2013-04-01

    CaPA (Canadian Precipitation Analysis) is a data assimilation system which employs statistical interpolation to combine observed precipitation with gridded precipitation fields produced by Environment Canada's Global Environmental Multiscale (GEM) climate model into a final gridded precipitation analysis. Precipitation is important in many fields and applications, including agricultural water management projects, flood control programs, and hydroelectric power generation planning. Precipitation is a key input to hydrological models, and there is a desire to have access to the best available information about precipitation in time and space. The principal goal of CaPA is to produce this type of information. In order to perform the necessary statistical interpolation, CaPA requires the estimation of a semi-variogram. This semi-variogram is used to describe the spatial correlations between precipitation innovations, defined as the observed precipitation amounts minus the GEM forecasted amounts predicted at the observation locations. Currently, CaPA uses a single isotropic variogram across the entire analysis domain. The present project investigates the implications of this choice by first conducting a basic variographic analysis of precipitation innovation data across the Canadian prairies, with specific interest in identifying and quantifying potential anisotropy within the domain. This focus is further expanded by identifying the effect of storm type on the variogram. The ultimate goal of the variographic analysis is to develop improved semi-variograms for CaPA that better capture the spatial complexities of precipitation over the Canadian prairies. CaPA presently applies a Box-Cox data transformation to both the observations and the GEM data, prior to the calculation of the innovations. The data transformation is necessary to satisfy the normal distribution assumption, but introduces a significant bias. The second part of the investigation aims at devising a bias correction scheme based on a moving-window averaging technique. For both the variogram and bias correction components of this investigation, a series of trial runs are conducted to evaluate the impact of these changes on the resulting CaPA precipitation analyses.
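
    The semi-variogram estimation at the core of CaPA's statistical interpolation can be sketched as follows; the coordinates and innovations are random placeholders, and anisotropy (a focus of the study) would require binning pairs by direction as well as distance.

    ```python
    import numpy as np

    def empirical_semivariogram(coords, values, bin_edges):
        """Isotropic empirical semivariogram of precipitation innovations."""
        d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
        g = 0.5 * (values[:, None] - values[None, :]) ** 2
        iu = np.triu_indices(len(values), k=1)       # each station pair once
        d, g = d[iu], g[iu]
        which = np.digitize(d, bin_edges)
        return np.array([g[which == k].mean() if np.any(which == k) else np.nan
                         for k in range(1, len(bin_edges))])

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 100, (50, 2))            # station locations (km)
    innov = rng.normal(size=50)                      # observed minus GEM forecast
    gamma = empirical_semivariogram(coords, innov, np.linspace(0, 50, 11))
    ```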

  6. Individual risk factors for deep infection and compromised fracture healing after intramedullary nailing of tibial shaft fractures: a single centre experience of 480 patients.

    PubMed

    Metsemakers, W-J; Handojo, K; Reynders, P; Sermon, A; Vanderschot, P; Nijs, S

    2015-04-01

    Despite modern advances in the treatment of tibial shaft fractures, complications including nonunion, malunion, and infection remain relatively frequent. A better understanding of these injuries and their complications could lead to prevention rather than treatment strategies. A retrospective study was performed to identify risk factors for deep infection and compromised fracture healing after intramedullary nailing (IMN) of tibial shaft fractures. Between January 2000 and January 2012, 480 consecutive patients with 486 tibial shaft fractures were enrolled in the study. Statistical analysis was performed to determine predictors of deep infection and compromised fracture healing. Compromised fracture healing was subdivided into delayed union and nonunion. The following independent variables were selected for analysis: age, sex, smoking, obesity, diabetes, American Society of Anaesthesiologists (ASA) classification, polytrauma, fracture type, open fractures, Gustilo type, primary external fixation (EF), time to nailing (TTN) and reaming. As primary statistical evaluation we performed a univariate analysis, followed by a multiple logistic regression model. Univariate regression analysis revealed similar risk factors for delayed union and nonunion, including fracture type, open fractures and Gustilo type. Factors affecting the occurrence of deep infection in this model were primary EF, a prolonged TTN, open fractures and Gustilo type. Multiple logistic regression analysis revealed polytrauma as the single risk factor for nonunion. With respect to delayed union, no risk factors could be identified. In the same statistical model, deep infection was correlated with primary EF. The purpose of this study was to evaluate risk factors for poor outcome after IMN of tibial shaft fractures. The univariate regression analysis showed that the nature of complications after tibial shaft nailing could be multifactorial. This was not confirmed in a multiple logistic regression model, which only revealed polytrauma and primary EF as risk factors for nonunion and deep infection, respectively. Future strategies should focus on prevention in high-risk populations such as polytrauma patients treated with EF. Copyright © 2014 Elsevier Ltd. All rights reserved.
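
    The multiple logistic regression step maps directly onto statsmodels; the synthetic data frame below merely mirrors the study's variables under my own coding assumptions, with odds ratios read off as exponentiated coefficients.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)                   # synthetic stand-in data
    df = pd.DataFrame({
        "deep_infection": rng.integers(0, 2, 480),
        "primary_ef": rng.integers(0, 2, 480),
        "open_fracture": rng.integers(0, 2, 480),
        "time_to_nailing": rng.uniform(1, 30, 480),  # days (hypothetical coding)
        "polytrauma": rng.integers(0, 2, 480),
    })
    model = smf.logit("deep_infection ~ primary_ef + open_fracture"
                      " + time_to_nailing + polytrauma", data=df).fit(disp=False)
    print(np.exp(model.params))                      # odds ratios
    ```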

  7. Analysis of S-box in Image Encryption Using Root Mean Square Error Method

    NASA Astrophysics Data System (ADS)

    Hussain, Iqtadar; Shah, Tariq; Gondal, Muhammad Asif; Mahmood, Hasan

    2012-07-01

    The use of substitution boxes (S-boxes) in encryption applications has proven to be an effective nonlinear component in creating confusion and randomness. The S-box is evolving and many variants appear in the literature, which include the advanced encryption standard (AES) S-box, affine power affine (APA) S-box, Skipjack S-box, Gray S-box, Lui J S-box, residue prime number S-box, Xyi S-box, and S8 S-box. These S-boxes have algebraic and statistical properties which distinguish them from each other in terms of encryption strength. In some circumstances, the parameters from algebraic and statistical analysis yield results which do not provide clear evidence in distinguishing an S-box for an application to a particular set of data. In image encryption applications, the use of S-boxes needs special care because the visual analysis and perception of a viewer can sometimes identify artifacts embedded in the image. In addition to existing algebraic and statistical analysis already used for image encryption applications, we propose an application of the root mean square error technique, which further elaborates the results and enables the analyst to vividly distinguish between the performances of various S-boxes. While the use of the root mean square error analysis in statistics has proven to be effective in determining the difference in original data and the processed data, its use in image encryption has shown promising results in estimating the strength of the encryption method. In this paper, we show the application of the root mean square error analysis to S-box image encryption. The parameters from this analysis are used in determining the strength of S-boxes.
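
    The proposed measure itself is a one-liner; a minimal sketch, where a larger RMSE between the plaintext and ciphertext images indicates stronger disruption of pixel structure by the S-box:

    ```python
    import numpy as np

    def rmse(original, encrypted):
        """Root mean square error between original and encrypted images (same shape)."""
        diff = original.astype(np.float64) - encrypted.astype(np.float64)
        return np.sqrt(np.mean(diff ** 2))
    ```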

  8. Systems Analysis Directorate Activities Summary - September 1977. Volume 1

    DTIC Science & Technology

    1977-10-01

    Keywords: chemical agent; censor criteria; purity of the agent; statistical samples. ... chemical agent lots. Volume II (CONF) contains an analysis of the operational capability of the 105mm M101A1 and M102 howitzers. Contents include: Procedure for Determining the Serviceability Category of Chemical Agent Lots; User's Guide to the Computer...

  9. Sensitivity analysis of navy aviation readiness based sparing model

    DTIC Science & Technology

    2017-09-01

    ...variability. (See Figure 4: research design flowchart.) Figure 4 lays out the four steps of the methodology, starting in the upper left-hand... as a function of changes in key inputs. We develop NAVARM Experimental Designs (NED), a computational tool created by applying a state-of-the-art... experimental design to the NAVARM model. Statistical analysis of the resulting data identifies the most influential cost factors. Those are, in order of...

  10. Quantitative Analysis of Repertoire Scale Immunoglobulin properties in Vaccine Induced B cell Responses

    DTIC Science & Technology

    Immunosequencing now readily generates 10^3-10^5 sequences per sample; however, statistical analysis of these repertoires is challenging because of the high genetic... diversity of BCRs and the elaborate clonal relationships among them. To date, most immunosequencing analyses have focused on reporting qualitative... repertoire differences, (2) identifying how two repertoires differ, and (3) determining appropriate confidence intervals for assessing the size of the differences and their potential biological relevance.

  11. The Use of Satellite Observed Cloud Patterns in Northern Hemisphere 300 mb and 1000/300 mb Numerical Analysis.

    DTIC Science & Technology

    1984-02-01

    Keywords: prediction; extratropical cyclones; objective analysis; bogus techniques. A quasi-objective statistical method for deriving 300 mb geopotential heights and 1000/300 mb thicknesses in the vicinity of extratropical cyclones with the aid of satellite imagery is presented. The technique utilizes satellite observed extratropical spiral cloud pattern parameters in conjunction...

  12. Physics Education: A Significant Backbone of Sustainable Development in Developing Countries

    NASA Astrophysics Data System (ADS)

    Akintola, R. A.

    2006-08-01

    In the quest for technological self-reliance, many policies, programs and projects have been proposed and implemented in order to procure solutions to the problems of technological inadequacies of developing countries. It has been observed that all of these failed. This research identifies the problems, proposes lasting solutions to emancipate physics education in developing nations, and highlights possible future gains. The statistical analysis employed was based on data from questionnaires and interviews.

  13. Psychosocial work environment and mental health--a meta-analytic review.

    PubMed

    Stansfeld, Stephen; Candy, Bridget

    2006-12-01

    To clarify the associations between psychosocial work stressors and mental ill health, a meta-analysis of psychosocial work stressors and common mental disorders was undertaken using longitudinal studies identified through a systematic literature review. The review used a standardized search strategy and strict inclusion and quality criteria in seven databases in 1994-2005. Papers were identified from 24,939 citations covering social determinants of health; 50 relevant papers were identified, 38 fulfilled inclusion criteria, and 11 were suitable for a meta-analysis. The Comprehensive Meta-analysis Programme was used to analyze decision authority, decision latitude, psychological demands, and work social support (components of the job-strain and iso-strain models), the combination of effort and reward that makes up the effort-reward imbalance model, and job insecurity. Cochran's Q statistic assessed the heterogeneity of the results, and the I2 statistic determined any inconsistency between studies. Job strain, low decision latitude, low social support, high psychological demands, effort-reward imbalance, and high job insecurity predicted common mental disorders despite the heterogeneity for psychological demands and social support among men. The strongest effects were found for job strain and effort-reward imbalance. This meta-analysis provides robust consistent evidence that (combinations of) high demands and low decision latitude and (combinations of) high efforts and low rewards are prospective risk factors for common mental disorders and suggests that the psychosocial work environment is important for mental health. The associations are not merely explained by response bias. The impact of work stressors on common mental disorders differs for women and men.
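
    The heterogeneity statistics named above are short to compute; a minimal sketch, taking study effect estimates (e.g., log odds ratios) and their sampling variances:

    ```python
    import numpy as np

    def cochran_q_i2(effects, variances):
        """Cochran's Q heterogeneity statistic and the I^2 inconsistency measure."""
        effects = np.asarray(effects, float)
        w = 1.0 / np.asarray(variances, float)
        pooled = np.sum(w * effects) / w.sum()
        q = np.sum(w * (effects - pooled) ** 2)
        df = len(effects) - 1
        i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
        return q, i2
    ```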

  14. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than, or similar power to, Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.
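
    The likelihood ratio test of a nested pair of Cox models can be sketched with lifelines; the synthetic data and raw genotype coding below are mine, whereas the paper's functional regression expands variant effects in basis functions before fitting.

    ```python
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter
    from scipy import stats

    rng = np.random.default_rng(0)                   # synthetic stand-in data
    n = 300
    df = pd.DataFrame({
        "time": rng.exponential(10, n), "event": rng.integers(0, 2, n),
        "age": rng.normal(70, 8, n), "snp1": rng.integers(0, 3, n),
        "snp2": rng.integers(0, 3, n), "snp3": rng.integers(0, 3, n),
    })
    snps = ["snp1", "snp2", "snp3"]
    full = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    null = CoxPHFitter().fit(df.drop(columns=snps), duration_col="time", event_col="event")
    lrt = 2 * (full.log_likelihood_ - null.log_likelihood_)
    p_value = stats.chi2.sf(lrt, df=len(snps))       # joint test of the genetic region
    ```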

  15. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models in which the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and the sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than, or power similar to, the Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole-genome and whole-exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  16. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

    Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities from oil spills at specific points (rows) to specific landfall or target segments (columns). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, owing to the potentially large matrices generated by many spill models, this question is difficult to answer without data reduction and visualization methods. In our study we used cluster analysis, a multivariate statistical method, to group areas of similar risk based on the distribution of landfall target trajectory probabilities. We also used ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers and statistical and GIS software programmers to collaborate closely to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
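
    The grouping step described here, clustering launch points by the similarity of their landfall probability profiles, can be sketched with hierarchical clustering; the contact-probability matrix below is synthetic:

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(1)
    # Rows = spill launch points, columns = landfall/target segments;
    # entries = contact probabilities from a trajectory model (synthetic here).
    P = rng.dirichlet(np.ones(12), size=30)

    Z = linkage(P, method="ward")                    # Ward linkage on profiles
    labels = fcluster(Z, t=4, criterion="maxclust")  # cut into 4 risk groups
    for k in np.unique(labels):
        print(f"risk group {k}: launch points {np.where(labels == k)[0].tolist()}")
    ```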

  17. Investigating spousal concordance of diabetes through statistical analysis and data mining.

    PubMed

    Wang, Jong-Yi; Liu, Chiu-Shong; Lung, Chi-Hsuan; Yang, Ya-Tun; Lin, Ming-Hung

    2017-01-01

    Spousal clustering of diabetes merits attention. Whether old-age vulnerability or a shared family environment determines the concordance of diabetes is also uncertain. This study investigated the spousal concordance of diabetes and compared the risk of diabetes concordance between couples and noncouples by using nationally representative data. A total of 22,572 individuals identified from the 2002-2013 National Health Insurance Research Database of Taiwan constituted 5,643 couples and 5,643 noncouples through 1:1 dual propensity score matching (PSM). Factors associated with concordance in both spouses with diabetes were analyzed at the individual level. The risk of diabetes concordance between couples and noncouples was compared at the couple level. Logistic regression was the main statistical method, with data analyzed using SAS 9.4; C&RT and Apriori data-mining procedures in IBM SPSS Modeler 13 supplemented the statistical analysis. High odds of spousal concordance of diabetes were associated with old age, middle levels of urbanization, and high comorbidities (all P < 0.05). The dual PSM analysis revealed that the risk of diabetes concordance was significantly higher in couples (5.19%) than in noncouples (0.09%; OR = 61.743, P < 0.0001). A high concordance rate of diabetes in couples may indicate the influences of assortative mating and a shared environment. Diabetes in one spouse indicates an elevated risk in the other. Family-based diabetes care that emphasizes the screening of couples at risk of diabetes by using the identified risk factors is suggested for prospective clinical practice interventions.
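
    A minimal sketch of 1:1 propensity-score matching, using simple nearest-neighbour matching with replacement rather than the dual PSM design of the study; the covariates are hypothetical:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(2)
    n = 2000
    X = np.column_stack([rng.normal(55, 10, n),   # age
                         rng.integers(1, 5, n)])  # urbanization level
    couple = rng.integers(0, 2, n).astype(bool)   # couple vs noncouple flag

    # Propensity score: modelled probability of being in the couple group.
    ps = LogisticRegression().fit(X, couple).predict_proba(X)[:, 1]

    # 1:1 nearest-neighbour match on the propensity score (with replacement).
    nn = NearestNeighbors(n_neighbors=1).fit(ps[~couple].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[couple].reshape(-1, 1))
    controls = np.where(~couple)[0][idx.ravel()]
    print(f"{couple.sum()} couples matched to {len(set(controls))} unique controls")
    ```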

  18. Investigating spousal concordance of diabetes through statistical analysis and data mining

    PubMed Central

    Liu, Chiu-Shong; Lung, Chi-Hsuan; Yang, Ya-Tun; Lin, Ming-Hung

    2017-01-01

    Objective Spousal clustering of diabetes merits attention. Whether old-age vulnerability or a shared family environment determines the concordance of diabetes is also uncertain. This study investigated the spousal concordance of diabetes and compared the risk of diabetes concordance between couples and noncouples by using nationally representative data. Methods A total of 22,572 individuals identified from the 2002–2013 National Health Insurance Research Database of Taiwan constituted 5,643 couples and 5,643 noncouples through 1:1 dual propensity score matching (PSM). Factors associated with concordance in both spouses with diabetes were analyzed at the individual level. The risk of diabetes concordance between couples and noncouples was compared at the couple level. Logistic regression was the main statistical method, with data analyzed using SAS 9.4; C&RT and Apriori data-mining procedures in IBM SPSS Modeler 13 supplemented the statistical analysis. Results High odds of spousal concordance of diabetes were associated with old age, middle levels of urbanization, and high comorbidities (all P < 0.05). The dual PSM analysis revealed that the risk of diabetes concordance was significantly higher in couples (5.19%) than in noncouples (0.09%; OR = 61.743, P < 0.0001). Conclusions A high concordance rate of diabetes in couples may indicate the influences of assortative mating and a shared environment. Diabetes in one spouse indicates an elevated risk in the other. Family-based diabetes care that emphasizes the screening of couples at risk of diabetes by using the identified risk factors is suggested for prospective clinical practice interventions. PMID:28817654

  19. Overweight, but not obesity, paradox on mortality following coronary artery bypass grafting.

    PubMed

    Takagi, Hisato; Umemoto, Takuya

    2016-09-01

    To determine whether an "obesity paradox" on post-coronary artery bypass grafting (CABG) mortality exists, we abstracted exclusively adjusted odds ratios (ORs) and/or hazard ratios (HRs) for mortality from each study and then combined them in a meta-analysis. MEDLINE and EMBASE were searched through April 2015 using PubMed and OVID to identify comparative studies, of overweight or obese versus normal weight patients undergoing CABG, reporting adjusted relative risk estimates for short-term (30-day or in-hospital) and/or mid-to-long-term all-cause mortality. Our search identified 14 eligible studies. In total, our meta-analysis included data on 79,140 patients undergoing CABG. Pooled analyses of short-term mortality demonstrated that overweight was associated with a statistically significant 15% reduction relative to normal weight (OR, 0.85; 95% confidence interval [CI], 0.74-0.98; p=0.03) and no statistically significant differences between mild obesity, moderate/severe obesity, or overall obesity and normal weight. Pooled analyses of mid-to-long-term mortality demonstrated that overweight was associated with a statistically significant 10% reduction relative to normal weight (HR, 0.90; 95% CI, 0.84-0.96; p=0.001), and no statistically significant differences between mild obesity, moderate/severe obesity, or overall obesity and normal weight. Overweight, but not obesity, may be associated with better short-term and mid-to-long-term post-CABG survival relative to normal weight. An overweight, but not obesity, paradox on post-CABG mortality appears to exist. Copyright © 2015 Japanese College of Cardiology. Published by Elsevier Ltd. All rights reserved.
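
    Pooled figures such as "OR, 0.85; 95% CI, 0.74-0.98" typically come from inverse-variance weighting of the log-scale estimates; a sketch under that assumption, with hypothetical study values:

    ```python
    import numpy as np

    # Hypothetical adjusted ORs with 95% CIs abstracted from three studies.
    or_ = np.array([0.80, 0.91, 0.84])
    lo = np.array([0.65, 0.78, 0.70])
    hi = np.array([0.98, 1.06, 1.01])

    y = np.log(or_)
    se = (np.log(hi) - np.log(lo)) / (2 * 1.96)   # SE recovered from CI width
    w = 1 / se**2

    pooled = np.sum(w * y) / np.sum(w)            # fixed-effect pooled log OR
    se_pooled = np.sqrt(1 / np.sum(w))
    ci = np.exp(pooled + np.array([-1.96, 1.96]) * se_pooled)
    print(f"pooled OR = {np.exp(pooled):.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
    ```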

  20. Improving information retrieval in functional analysis.

    PubMed

    Rodriguez, Juan C; González, Germán A; Fresno, Cristóbal; Llera, Andrea S; Fernández, Elmer A

    2016-12-01

    Transcriptome analysis is essential to understand the mechanisms regulating key biological processes and functions. The first step usually consists of identifying candidate genes; to find out which pathways are affected by those genes, however, functional analysis (FA) is mandatory. The most frequently used strategies for this purpose are Gene Set and Singular Enrichment Analysis (GSEA and SEA) over Gene Ontology. Several statistical methods have been developed and compared in terms of computational efficiency and/or statistical appropriateness. However, whether their results are similar or complementary, their sensitivity to parameter settings, and possible bias in the analyzed terms have not been addressed so far. Here, two GSEA and four SEA methods and their parameter combinations were evaluated in six datasets by comparing two breast cancer subtypes with well-known differences in genetic background and patient outcomes. We show that GSEA and SEA lead to different results depending on the chosen statistic, model and/or parameters. Both approaches provide complementary results from a biological perspective. Hence, an Integrative Functional Analysis (IFA) tool is proposed to improve information retrieval in FA. It provides a common gene expression analytic framework that grants a comprehensive and coherent analysis. Only a minimal user parameter setting is required, since the best SEA/GSEA alternatives are integrated. IFA utility was demonstrated by evaluating four prostate cancer datasets and the TCGA breast cancer microarray dataset, which showed its biological generalization capabilities. Copyright © 2016 Elsevier Ltd. All rights reserved.
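
    SEA-style over-representation of a single term reduces to a one-sided hypergeometric (Fisher) test; a sketch with hypothetical counts:

    ```python
    from scipy.stats import hypergeom

    # Hypothetical counts for one Gene Ontology term.
    N = 20000  # genes in the background universe
    K = 300    # background genes annotated to the term
    n = 150    # candidate (e.g., differentially expressed) genes
    k = 12     # candidate genes annotated to the term

    # P(X >= k): chance of at least k annotated candidates under random draws.
    p = hypergeom.sf(k - 1, N, K, n)
    print(f"enrichment p-value = {p:.3g}")
    ```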

  1. Gene co-expression network analysis in Rhodobacter capsulatus and application to comparative expression analysis of Rhodobacter sphaeroides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pena-Castillo, Lourdes; Mercer, Ryan; Gurinovich, Anastasia

    2014-08-28

    The genus Rhodobacter contains purple nonsulfur bacteria found mostly in freshwater environments. Representative strains of two Rhodobacter species, R. capsulatus and R. sphaeroides, have had their genomes fully sequenced and both have been the subject of transcriptional profiling studies. Gene co-expression networks can be used to identify modules of genes with similar expression profiles. Functional analysis of gene modules can then associate co-expressed genes with biological pathways, and network statistics can determine the degree of module preservation in related networks. In this paper, we constructed an R. capsulatus gene co-expression network, performed functional analysis of identified gene modules, and investigated preservation of these modules in R. capsulatus proteomics data and in R. sphaeroides transcriptomics data. Results: The analysis identified 40 gene co-expression modules in R. capsulatus. Investigation of the module gene contents and expression profiles revealed patterns that were validated based on previous studies supporting the biological relevance of these modules. We identified two R. capsulatus gene modules preserved in the protein abundance data. We also identified several gene modules preserved between both Rhodobacter species, which indicate that these cellular processes are conserved between the species and are candidates for functional information transfer between species. Many gene modules were non-preserved, providing insight into processes that differentiate the two species. In addition, using Local Network Similarity (LNS), a recently proposed metric for expression divergence, we assessed the expression conservation of between-species pairs of orthologs, and within-species gene-protein expression profiles. Conclusions: Our analyses provide new sources of information for functional annotation in R. capsulatus because uncharacterized genes in modules are now connected with groups of genes that constitute a joint functional annotation. We identified R. capsulatus modules enriched with genes for ribosomal proteins, porphyrin and bacteriochlorophyll anabolism, and biosynthesis of secondary metabolites to be preserved in R. sphaeroides whereas modules related to RcGTA production and signalling showed lack of preservation in R. sphaeroides. In addition, we demonstrated that network statistics may also be applied within-species to identify congruence between mRNA expression and protein abundance data for which simple correlation measurements have previously had mixed results.
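
    The module-detection idea, a correlation-based dissimilarity followed by hierarchical clustering, can be sketched as below on a synthetic expression matrix; production pipelines such as WGCNA add soft thresholding and topological overlap on top of this:

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(3)
    expr = rng.normal(size=(100, 20))     # rows = genes, columns = samples

    corr = np.corrcoef(expr)              # gene-gene co-expression
    dist = 1 - np.abs(corr)               # unsigned co-expression dissimilarity
    np.fill_diagonal(dist, 0.0)

    Z = linkage(squareform(dist, checks=False), method="average")
    modules = fcluster(Z, t=0.8, criterion="distance")
    print(f"{modules.max()} co-expression modules")
    ```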

  2. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
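
    The information-based choice of the number of mixture components can be sketched with BIC over Gaussian mixtures fitted by standard EM (scikit-learn), rather than the genetic-algorithm variant this work proposes:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(4)
    # Synthetic two-cluster data with elongated (non-spherical) covariances.
    a = rng.multivariate_normal([0, 0], [[4.0, 1.8], [1.8, 1.0]], 300)
    b = rng.multivariate_normal([6, 3], [[1.0, -0.6], [-0.6, 2.0]], 300)
    X = np.vstack([a, b])

    bics = {k: GaussianMixture(n_components=k, covariance_type="full",
                               random_state=0).fit(X).bic(X)
            for k in range(1, 6)}
    best = min(bics, key=bics.get)
    print(f"BIC selects {best} components")
    ```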

  3. Guidelines for Genome-Scale Analysis of Biological Rhythms.

    PubMed

    Hughes, Michael E; Abruzzi, Katherine C; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M Fernanda; Chen, Zheng; Chiu, Joanna C; Cox, Juergen; Crowell, Alexander M; DeBruyne, Jason P; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J; Duffield, Giles E; Dunlap, Jay C; Eckel-Mahan, Kristin; Esser, Karyn A; FitzGerald, Garret A; Forger, Daniel B; Francey, Lauren J; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H; Herzel, Hanspeter; Herzog, Erik D; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J; Hurley, Jennifer M; de la Iglesia, Horacio O; Johnson, Carl; Kay, Steve A; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A; Li, Jiajia; Li, Xiaodong; Liu, Andrew C; Loros, Jennifer J; Martino, Tami A; Menet, Jerome S; Merrow, Martha; Millar, Andrew J; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N; Olmedo, Maria; Nusinow, Dmitri A; Ptáček, Louis J; Rand, David; Reddy, Akhilesh B; Robles, Maria S; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D; Rund, Samuel S C; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J; Storch, Kai-Florian; Takahashi, Joseph S; Ueda, Hiroki R; Wang, Han; Weitz, Charles; Westermark, Pål O; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B

    2017-10-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding "big data" that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them.

  4. Guidelines for Genome-Scale Analysis of Biological Rhythms

    PubMed Central

    Hughes, Michael E.; Abruzzi, Katherine C.; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M. Fernanda; Chen, Zheng; Chiu, Joanna C.; Cox, Juergen; Crowell, Alexander M.; DeBruyne, Jason P.; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J.; Duffield, Giles E.; Dunlap, Jay C.; Eckel-Mahan, Kristin; Esser, Karyn A.; FitzGerald, Garret A.; Forger, Daniel B.; Francey, Lauren J.; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S.; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H.; Herzel, Hanspeter; Herzog, Erik D.; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J.; Hurley, Jennifer M.; de la Iglesia, Horacio O.; Johnson, Carl; Kay, Steve A.; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A.; Li, Jiajia; Li, Xiaodong; Liu, Andrew C.; Loros, Jennifer J.; Martino, Tami A.; Menet, Jerome S.; Merrow, Martha; Millar, Andrew J.; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N.; Olmedo, Maria; Nusinow, Dmitri A.; Ptáček, Louis J.; Rand, David; Reddy, Akhilesh B.; Robles, Maria S.; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D.; Rund, Samuel S.C.; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J.; Storch, Kai-Florian; Takahashi, Joseph S.; Ueda, Hiroki R.; Wang, Han; Weitz, Charles; Westermark, Pål O.; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B.

    2017-01-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding “big data” that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them. PMID:29098954

  5. A phylogenetic transform enhances analysis of compositional microbiota data

    PubMed Central

    Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

    2017-01-01

    Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities. DOI: http://dx.doi.org/10.7554/eLife.21887.001 PMID:28198697
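
    A sketch of the isometric log-ratio transform using SciPy's default Helmert basis; PhILR's contribution is deriving the basis from a phylogenetic tree instead, which this sketch does not attempt:

    ```python
    import numpy as np
    from scipy.linalg import helmert

    def clr(x):
        """Centered log-ratio transform of compositions (rows sum to 1)."""
        logx = np.log(x)
        return logx - logx.mean(axis=-1, keepdims=True)

    def ilr(x):
        """Isometric log-ratio transform with a Helmert basis."""
        return clr(x) @ helmert(x.shape[-1]).T

    # Hypothetical relative abundances of 5 taxa in 3 samples.
    counts = np.array([[10, 40, 20, 25, 5],
                       [30, 10, 15, 40, 5],
                       [12, 33, 25, 20, 10]], dtype=float)
    comp = counts / counts.sum(axis=1, keepdims=True)
    coords = ilr(comp)   # (3, 4) unconstrained coordinates, safe for
    print(coords)        # off-the-shelf multivariate statistics
    ```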

  6. Factorial analysis of trihalomethanes formation in drinking water.

    PubMed

    Chowdhury, Shakhawat; Champagne, Pascale; McLellan, P James

    2010-06-01

    Disinfection of drinking water reduces pathogenic infection, but may pose risks to human health through the formation of disinfection byproducts. The effects of different factors on the formation of trihalomethanes were investigated using a statistically designed experimental program, and a predictive model for trihalomethanes formation was developed. Synthetic water samples with different factor levels were produced, and trihalomethanes concentrations were measured. A replicated fractional factorial design with center points was performed, and significant factors were identified through statistical analysis. A second-order trihalomethanes formation model was developed from 92 experiments, and its statistical adequacy was assessed through appropriate diagnostics. This model was validated using additional data from the Drinking Water Surveillance Program database and was applied to the Smiths Falls water supply system in Ontario, Canada. The model predictions correlated strongly with the measured trihalomethanes, with correlations of 0.95 and 0.91 for the two datasets, respectively. The resulting model can assist in analyzing risk-cost tradeoffs in the design and operation of water supply systems.
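
    A second-order model of this kind (main effects, two-way interactions, quadratic terms) can be fit with a statsmodels formula; the factor names and response below are hypothetical stand-ins for the study's design variables:

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n = 92
    df = pd.DataFrame({
        "toc": rng.uniform(2, 8, n),    # total organic carbon (hypothetical)
        "cl2": rng.uniform(1, 5, n),    # chlorine dose
        "temp": rng.uniform(5, 30, n),  # water temperature
    })
    # Synthetic THM response with an interaction and a quadratic effect.
    df["thm"] = (20 + 6*df.toc + 9*df.cl2 + 1.2*df.temp
                 + 1.5*df.toc*df.cl2 - 0.4*df.cl2**2 + rng.normal(0, 5, n))

    model = smf.ols("thm ~ (toc + cl2 + temp)**2"
                    " + I(toc**2) + I(cl2**2) + I(temp**2)", data=df).fit()
    print(model.rsquared)
    ```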

  7. Synchronization from Second Order Network Connectivity Statistics

    PubMed Central

    Zhao, Liqiong; Beverlin, Bryce; Netoff, Theoden; Nykamp, Duane Q.

    2011-01-01

    We investigate how network structure can influence the tendency for a neuronal network to synchronize, or its synchronizability, independent of the dynamical model for each neuron. The synchrony analysis takes advantage of the framework of second order networks, which defines four second order connectivity statistics based on the relative frequency of two-connection network motifs. The analysis identifies two of these statistics, convergent connections and chain connections, as strongly influencing the synchrony. Simulations verify that synchrony decreases with the frequency of convergent connections and increases with the frequency of chain connections. These trends persist with simulations of multiple models for the neuron dynamics and for different types of networks. Surprisingly, divergent connections, which determine the fraction of shared inputs, do not strongly influence the synchrony. The critical role of chains, rather than divergent connections, in influencing synchrony can be explained by their increasing the effective coupling strength. The decrease of synchrony with convergent connections is primarily due to the resulting heterogeneity in firing rates. PMID:21779239
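
    Both influential statistics can be counted directly from a directed adjacency matrix: convergent connections are pairs of edges sharing a target, chains are two-edge paths. A sketch on a random network:

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    n = 200
    A = (rng.random((n, n)) < 0.05).astype(int)  # A[i, j] = 1: edge i -> j
    np.fill_diagonal(A, 0)

    in_deg = A.sum(axis=0)
    out_deg = A.sum(axis=1)

    convergent = np.sum(in_deg * (in_deg - 1) // 2)   # edge pairs onto a target
    divergent = np.sum(out_deg * (out_deg - 1) // 2)  # edge pairs from a source
    two_paths = np.sum(in_deg * out_deg)              # all i -> j -> k paths
    chains = two_paths - np.trace(A @ A)              # drop i -> j -> i loops
    print(f"convergent: {convergent}, divergent: {divergent}, chains: {chains}")
    ```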

  8. Statistical Study of the Properties of Magnetosheath Lion Roars using MMS observations

    NASA Astrophysics Data System (ADS)

    Giagkiozis, S.; Wilson, L. B., III

    2017-12-01

    Intense whistler-mode waves of very short duration are frequently encountered in the magnetosheath. These emissions have been linked to mirror mode waves and the Earth's bow shock. They can efficiently transfer energy between different plasma populations. These electromagnetic waves are commonly referred to as Lion roars (LR), due to the sound generated when the signals are sonified. They are generally observed during dips of the magnetic field that are anti-correlated with increases of density. Using MMS data, we have identified more than 1750 individual LR burst intervals. Each emission was band-pass filtered and further split into >35,000 subintervals, for which the direction of propagation and the polarization were calculated. The analysis of subinterval properties provides a more accurate representation of their true nature than the more commonly used time- and frequency-averaged dynamic spectra analysis. The results of the statistical analysis of the wave properties will be presented.

  9. 28 CFR 22.22 - Revelation of identifiable data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... STATISTICAL INFORMATION § 22.22 Revelation of identifiable data. (a) Except as noted in paragraph (b) of this section, research and statistical information relating to a private person may be revealed in identifiable... Act. (3) Persons or organizations for research or statistical purposes. Information may only be...

  10. Statistical analysis of field data for aircraft warranties

    NASA Astrophysics Data System (ADS)

    Lakey, Mary J.

    Air Force and Navy maintenance data collection systems were researched to determine their scientific applicability to the warranty process. New and unique algorithms were developed to extract failure distributions, which were then used to characterize how selected families of equipment typically fail. Families of similar equipment were identified in terms of function, technology and failure patterns. Statistical analyses and applications such as goodness-of-fit tests, maximum likelihood estimation and derivation of confidence intervals for the probability density function parameters were applied to characterize the distributions and their failure patterns. Statistical and reliability theory, with relevance to equipment design and operational failures, were also determining factors in characterizing the failure patterns of the equipment families. Inferences about the families with relevance to warranty needs were then made.

  11. Application of principal component analysis (PCA) as a sensory assessment tool for fermented food products.

    PubMed

    Ghosh, Debasree; Chattopadhyay, Parimal

    2012-06-01

    The objective of the work was to use the method of quantitative descriptive analysis (QDA) to describe the sensory attributes of fermented food products prepared with the incorporation of lactic cultures. Panellists were selected and trained to evaluate various attributes, especially color and appearance, body texture, flavor, overall acceptability and acidity of fermented food products such as cow milk curd and soymilk curd, idli, sauerkraut and probiotic ice cream. Principal component analysis (PCA) identified six significant principal components that accounted for more than 90% of the variance in the sensory attribute data. Overall product quality was modelled as a function of the principal components using multiple least squares regression (R² = 0.8). The results from PCA were further assessed by analysis of variance (ANOVA). These findings demonstrate the utility of quantitative descriptive analysis for identifying and measuring the fermented food product attributes that are important for consumer acceptability.
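
    The "six components account for more than 90% of the variance" style of result comes from cumulative explained-variance ratios; a sketch with a hypothetical sample-by-attribute score matrix:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    # Hypothetical sensory scores: rows = product samples, columns = attributes
    # (color/appearance, body texture, flavor, acidity, acceptability, ...).
    scores = rng.normal(5, 1.5, size=(40, 10))

    pca = PCA().fit(StandardScaler().fit_transform(scores))
    cum = np.cumsum(pca.explained_variance_ratio_)
    n_keep = int(np.searchsorted(cum, 0.90)) + 1
    print(f"{n_keep} components explain {cum[n_keep - 1]:.0%} of the variance")
    ```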

  12. Methods for the evaluation of alternative disaster warning systems

    NASA Technical Reports Server (NTRS)

    Agnew, C. E.; Anderson, R. J., Jr.; Lanen, W. N.

    1977-01-01

    For each of the methods identified, a theoretical basis is provided and an illustrative example is described. The example includes sufficient realism and detail to enable an analyst to conduct an evaluation of other systems. The methods discussed in the study include equal capability cost analysis, consumers' surplus, and statistical decision theory.

  13. Teaching: An Option for Mid-Life Retirees.

    ERIC Educational Resources Information Center

    Bell, David

    This document identifies patterns of characteristics of those who have leisure as an option at mid-life. A comparison was made between individuals electing to enter teaching and those electing to pursue leisure at this life stage. Results of structured interviews, statistical results, and an analysis of a life satisfaction scale are given. In…

  14. Promoting College Students' Problem Understanding Using Schema-Emphasizing Worked Examples

    ERIC Educational Resources Information Center

    Yan, Jie; Lavigne, Nancy C.

    2014-01-01

    Statistics learners often bypass the critical step of understanding a problem before executing solutions. Worked-out examples that identify problem information (e.g., data type, number of groups, purpose of analysis) key to determining a solution (e.g., "t" test, chi-square, correlation) can address this concern. The authors examined the…

  15. Emotional Intelligence and ADHD: A Comparative Analysis in Students of Lima Metropolitan Area

    ERIC Educational Resources Information Center

    Barahona, Luciana M.; Alegre, Alberto A.

    2016-01-01

    The following study aims to identify statistically significant differences between adolescent students with and without Attention Deficit Hyperactivity Disorder (ADHD) in emotional intelligence skills. The study sample was composed of 44 students with an ADHD diagnosis and 192 students without ADHD; both groups were obtained by an intentional…

  16. Qualification and Employment Opportunities. IAB Labour Market Research Topics No. 38.

    ERIC Educational Resources Information Center

    Rauch, Angela; Reinberg, Alexander

    Official German unemployment statistics were analyzed along with data from Germany's microcensus and other published sources to identify recent labor market trends and to clarify the relationship between qualifications and employment opportunities in the new German economy. The analysis revealed that, as has been true for years, the lower the…

  17. Saturation analysis of ChIP-seq data for reproducible identification of binding peaks

    PubMed Central

    Hansen, Peter; Hecht, Jochen; Ibrahim, Daniel M.; Krannich, Alexander; Truss, Matthias; Robinson, Peter N.

    2015-01-01

    Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA binding proteins. Computational ChIP-seq peak calling infers the location of protein–DNA interactions based on various measures of enrichment of sequence reads. In this work, we introduce an algorithm, Q, that uses an assessment of the quadratic enrichment of reads to center candidate peaks followed by statistical analysis of saturation of candidate peaks by 5′ ends of reads. We show that our method not only is substantially faster than several competing methods but also demonstrates statistically significant advantages with respect to reproducibility of results and in its ability to identify peaks with reproducible binding site motifs. We show that Q has superior performance in the delineation of double RNAPII and H3K4me3 peaks surrounding transcription start sites related to a better ability to resolve individual peaks. The method is implemented in C++ and is freely available under an open source license. PMID:26163319

  18. Measuring outcome from vestibular rehabilitation, part II: refinement and validation of a new self-report measure.

    PubMed

    Morris, Anna E; Lutman, Mark E; Yardley, Lucy

    2009-01-01

    A prototype self-report measure of vestibular rehabilitation outcome is described in a previous paper. The objectives of the present work were to identify the most useful items and assess their psychometric properties. Stage 1: One hundred fifty-five participants completed a prototype 36-item Vestibular Rehabilitation Benefit Questionnaire (VRBQ). Statistical analysis demonstrated its subscale structure and identified redundant items. Stage 2: One hundred twenty-four participants completed a refined 22-item VRBQ and three established questionnaires (Dizziness Handicap Inventory, DHI; Vertigo Symptom Scale short form, VSS-sf; Medical Outcomes Study short form 36, SF-36) in a longitudinal study. Statistical analysis revealed four internally consistent subscales of the VRBQ: Dizziness, Anxiety, Motion-Provoked Dizziness, and Quality of Life. Correlations with the DHI, VSS-sf, and SF-36 support the validity of the VRBQ, and effect size estimates suggest that the VRBQ is more responsive than comparable questionnaires. Twenty participants completed the VRBQ twice in a 24-hour period, indicating excellent test-retest reliability. The VRBQ appears to be a concise and psychometrically robust questionnaire that addresses the main aspects of dizziness impact.

  19. Data integration aids understanding of butterfly-host plant networks

    NASA Astrophysics Data System (ADS)

    Muto-Fujita, Ai; Takemoto, Kazuhiro; Kanaya, Shigehiko; Nakazato, Takeru; Tokimatsu, Toshiaki; Matsumoto, Natsushi; Kono, Mayo; Chubachi, Yuko; Ozaki, Katsuhisa; Kotera, Masaaki

    2017-03-01

    Although host-plant selection is a central topic in ecology, its general underpinnings are poorly understood. Here, we performed a case study focusing on the publicly available data on Japanese butterflies. A combined statistical analysis of plant-herbivore relationships and taxonomy revealed that some butterfly subfamilies in different families feed on the same plant families, and that this phenomenon occurs more often than expected by chance, indicating the independent acquisition of adaptive phenotypes to the same hosts. We consequently integrated plant-herbivore and plant-compound relationship data and conducted a statistical analysis to identify compounds unique to host plants of specific butterfly families. Some of the identified plant compounds are known to attract certain butterfly groups while repelling others. The additional incorporation of insect-compound relationship data revealed potential metabolic processes that are related to host plant selection. Our results demonstrate that data integration enables the computational detection of compounds putatively involved in particular interspecies interactions and that further data enrichment and integration of genomic and transcriptomic data facilitates the unveiling of the molecular mechanisms involved in host plant selection.

  20. Genetic programming based models in plant tissue culture: An addendum to traditional statistical approach.

    PubMed

    Mridula, Meenu R; Nair, Ashalatha S; Kumar, K Satheesh

    2018-02-01

    In this paper, we compared the efficacy of an observation-based modeling approach using a genetic algorithm with regular statistical analysis as an alternative methodology in plant research. Preliminary experimental data on in vitro rooting were taken for this study with the aim of understanding the effect of charcoal and naphthalene acetic acid (NAA) on successful rooting and of optimizing the two variables for maximum result. Both observation-based modelling and the traditional approach identified NAA as a critical factor in rooting of the plantlets under the experimental conditions employed. Symbolic regression analysis using the software deployed here optimised the treatments studied and was successful in identifying the complex non-linear interactions among the variables with minimal preliminary data. The presence of charcoal in the culture medium has a significant impact on root generation by reducing basal callus mass formation. Such an approach is advantageous for establishing in vitro culture protocols, as these models have significant potential for saving time and expenditure in plant tissue culture laboratories, and it further reduces the need for a specialised background.

  1. Data integration aids understanding of butterfly–host plant networks

    PubMed Central

    Muto-Fujita, Ai; Takemoto, Kazuhiro; Kanaya, Shigehiko; Nakazato, Takeru; Tokimatsu, Toshiaki; Matsumoto, Natsushi; Kono, Mayo; Chubachi, Yuko; Ozaki, Katsuhisa; Kotera, Masaaki

    2017-01-01

    Although host-plant selection is a central topic in ecology, its general underpinnings are poorly understood. Here, we performed a case study focusing on the publicly available data on Japanese butterflies. A combined statistical analysis of plant–herbivore relationships and taxonomy revealed that some butterfly subfamilies in different families feed on the same plant families, and that this phenomenon occurs more often than expected by chance, indicating the independent acquisition of adaptive phenotypes to the same hosts. We consequently integrated plant–herbivore and plant–compound relationship data and conducted a statistical analysis to identify compounds unique to host plants of specific butterfly families. Some of the identified plant compounds are known to attract certain butterfly groups while repelling others. The additional incorporation of insect–compound relationship data revealed potential metabolic processes that are related to host plant selection. Our results demonstrate that data integration enables the computational detection of compounds putatively involved in particular interspecies interactions and that further data enrichment and integration of genomic and transcriptomic data facilitates the unveiling of the molecular mechanisms involved in host plant selection. PMID:28262809

  2. An Empirical Taxonomy of Hospital Governing Board Roles

    PubMed Central

    Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R

    2008-01-01

    Objective To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods One thousand three hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions The developed taxonomy provides policy makers, health care executives, and researchers a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260

  3. Observed and Simulated Radiative and Microphysical Properties of Tropical Convective Storms

    NASA Technical Reports Server (NTRS)

    DelGenio, Anthony D.; Hansen, James E. (Technical Monitor)

    2001-01-01

    Increases in the ice content, albedo and cloud cover of tropical convective storms in a warmer climate produce a large negative contribution to cloud feedback in the GISS GCM. Unfortunately, the physics of convective upward water transport, detrainment, and ice sedimentation, and the relationship of microphysical to radiative properties, are all quite uncertain. We apply a clustering algorithm to TRMM satellite microwave rainfall retrievals to identify contiguous deep precipitating storms throughout the tropics. Each storm is characterized according to its size, albedo, OLR, rain rate, microphysical structure, and presence/absence of lightning. A similar analysis is applied to ISCCP data during the TOGA/COARE experiment to identify optically thick deep cloud systems and relate them to large-scale environmental conditions just before storm onset. We examine the statistics of these storms to understand the relative climatic roles of small and large storms and the factors that regulate convective storm size and albedo. The results are compared to GISS GCM simulated statistics of tropical convective storms to identify areas of agreement and disagreement.

  4. Acute myeloid leukemia risk by industry and occupation.

    PubMed

    Tsai, Rebecca J; Luckhaupt, Sara E; Schumacher, Pam; Cress, Rosemary D; Deapen, Dennis M; Calvert, Geoffrey M

    2014-11-01

    Acute myeloid leukemia (AML) is the most common type of leukemia found in adults. Identifying jobs that pose a risk for AML may be useful for identifying new risk factors. A matched case-control analysis was conducted using California Cancer Registry data from 1988 to 2007. This study included 8,999 cases of AML and 24,822 controls. Industries with a statistically significant increased AML risk were construction (matched odds ratio [mOR] = 1.13); crop production (mOR = 1.41); support activities for agriculture and forestry (mOR = 2.05); and animal slaughtering and processing (mOR = 2.09). Among occupations with a statistically significant increased AML risk were miscellaneous agricultural workers (mOR = 1.76); fishers and related fishing workers (mOR = 2.02); nursing, psychiatric and home health aides (mOR = 1.65); and janitors and building cleaners (mOR = 1.54). Further investigation is needed to confirm study findings and to identify specific exposures responsible for the increased risks.

  5. Identification of curriculum content for a renewable energy graduate degree program

    NASA Astrophysics Data System (ADS)

    Haughery, John R.

    There currently exists a disconnect between renewable energy industry workforce needs and academic program proficiencies. This is evidenced by an absence of clear curriculum content on renewable energy graduate program websites. The purpose of this study was to identify a set of curriculum content for graduate degrees in renewable energy. At its conclusion, a clear list of 42 content items was identified and statistically ranked. The content items identified were based on a review of literature from government initiatives, professional societies' bodies of knowledge, and related research studies. Leaders and experts in the field of renewable energy and sustainability were surveyed using a five-point Likert scale, allowing each item's importance level to be analyzed and prioritized using non-parametric statistical methods. The study found seven content items to be very important, 30 to be important, and five to be somewhat important. The results are also appropriate for use as a framework in developing or improving renewable energy graduate programs.

  6. Acute myeloid leukemia risk by industry and occupation

    PubMed Central

    Tsai, Rebecca J.; Luckhaupt, Sara E.; Schumacher, Pam; Cress, Rosemary D.; Deapen, Dennis M.; Calvert, Geoffrey M.

    2015-01-01

    Acute myeloid leukemia (AML) is the most common type of leukemia found in adults. Identifying jobs that pose a risk for AML may be useful for identifying new risk factors. A matched case–control analysis was conducted using California Cancer Registry data from 1988 to 2007. This study included 8,999 cases of AML and 24,822 controls. Industries with a statistically significant increased AML risk were construction (matched odds ratio [mOR] = 1.13); crop production (mOR = 1.41); support activities for agriculture and forestry (mOR = 2.05); and animal slaughtering and processing (mOR = 2.09). Among occupations with a statistically significant increased AML risk were miscellaneous agricultural workers (mOR = 1.76); fishers and related fishing workers (mOR = 2.02); nursing, psychiatric and home health aides (mOR = 1.65); and janitors and building cleaners (mOR = 1.54). Further investigation is needed to confirm study findings and to identify specific exposures responsible for the increased risks. PMID:24547710

  7. Area estimation using multiyear designs and partial crop identification

    NASA Technical Reports Server (NTRS)

    Sielken, R. L., Jr.

    1984-01-01

    Statistical procedures were developed for large area assessments using both satellite and conventional data. Crop acreages, other ground cover indices, and measures of change were the principal characteristics of interest. These characteristics can be estimated from samples collected, possibly from several sources, at varying times and with different levels of identification. Multiyear analysis techniques were extended to include partially identified samples; the best current-year sampling design corresponding to a given sampling history was determined; weights reflecting the precision or confidence in each observation were identified and utilized; and the variation in estimates incorporating partially identified samples was quantified.

  8. Advanced LIGO low-latency searches

    NASA Astrophysics Data System (ADS)

    Kanner, Jonah; LIGO Scientific Collaboration, Virgo Collaboration

    2016-06-01

    Advanced LIGO recently made the first detection of gravitational waves from merging binary black holes. The signal was first identified by a low-latency analysis, which identifies gravitational-wave transients within a few minutes of data collection. More generally, Advanced LIGO transients are sought with a suite of automated tools, which collectively identify events, evaluate statistical significance, estimate source position, and attempt to characterize source properties. This low-latency effort is enabling a broad multi-messenger approach to the science of compact object mergers and other transients. This talk will give an overview of the low-latency methodology and recent results.

  9. AN EXPLORATION OF THE STATISTICAL SIGNATURES OF STELLAR FEEDBACK

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boyden, Ryan D.; Offner, Stella S. R.; Koch, Eric W.

    2016-12-20

    All molecular clouds are observed to be turbulent, but the origin, means of sustenance, and evolution of the turbulence remain debated. One possibility is that stellar feedback injects enough energy into the cloud to drive observed motions on parsec scales. Recent numerical studies of molecular clouds have found that feedback from stars, such as protostellar outflows and winds, injects energy and impacts turbulence. We expand upon these studies by analyzing magnetohydrodynamic simulations of molecular clouds, including stellar winds, with a range of stellar mass-loss rates and magnetic field strengths. We generate synthetic ¹²CO(1–0) maps assuming that the simulations are at the distance of the nearby Perseus molecular cloud. By comparing the outputs from different initial conditions and evolutionary times, we identify differences in the synthetic observations and characterize these using common astrostatistics. We quantify the different statistical responses using a variety of metrics proposed in the literature. We find that multiple astrostatistics, including the principal component analysis, the spectral correlation function, and the velocity coordinate spectrum (VCS), are sensitive to changes in stellar mass-loss rates and/or time evolution. A few statistics, including the Cramer statistic and VCS, are sensitive to the magnetic field strength. These findings demonstrate that stellar feedback influences molecular cloud turbulence and can be identified and quantified observationally using such statistics.

  10. Differential Adverse Event Profiles Associated with BCG as a Preventive Tuberculosis Vaccine or Therapeutic Bladder Cancer Vaccine Identified by Comparative Ontology-Based VAERS and Literature Meta-Analysis

    PubMed Central

    Xie, Jiangan; Codd, Christopher; Mo, Kevin; He, Yongqun

    2016-01-01

    M. bovis strain Bacillus Calmette–Guérin (BCG) has been the only licensed live attenuated vaccine against tuberculosis (TB) for nearly a century and has also been approved as a therapeutic vaccine for bladder cancer treatment since 1990. Over its long history of use, various adverse events (AEs) have been reported. However, the AEs associated with the BCG preventive TB vaccine and therapeutic cancer vaccine have not been systematically compared. In this study, we systematically collected various BCG AE data mined from the US VAERS database and PubMed literature reports, identified statistically significant BCG-associated AEs, and ontologically classified and compared these AEs related to these two types of BCG vaccine. From 397 VAERS BCG AE case reports, we identified 64 AEs statistically significantly associated with the BCG TB vaccine and 14 AEs with the BCG cancer vaccine. Our meta-analysis of 41 peer-reviewed journal reports identified 48 AEs associated with the BCG TB vaccine and 43 AEs associated with the BCG cancer vaccine. Among all identified AEs from VAERS and literature reports, 25 AEs belong to serious AEs. The Ontology of Adverse Events (OAE)-based ontological hierarchical analysis indicated that the AEs associated with the BCG TB vaccine were enriched in the immune system (e.g., lymphadenopathy and lymphadenitis), skin (e.g., skin ulceration and cyanosis), and respiratory system (e.g., cough and pneumonia); in contrast, the AEs associated with the BCG cancer vaccine mainly occurred in the urinary system (e.g., dysuria, pollakiuria, and hematuria). With these distinct AE profiles detected, this study also discovered three AEs (i.e., chills, pneumonia, and C-reactive protein increased) shared by the BCG TB vaccine and bladder cancer vaccine. Furthermore, our deep investigation of 24 BCG-associated death cases from VAERS identified the important effects of age, vaccine co-administration, and immunosuppressive status on the final BCG-associated death outcome. PMID:27749923

  11. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    NASA Astrophysics Data System (ADS)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km² in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
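
    Inter-amount sampling inverts the usual fixed-time-step view: instead of recording the amount per interval, one records the time needed to accumulate a fixed amount. A sketch via interpolation of the cumulative flow (the series and threshold are synthetic):

    ```python
    import numpy as np

    rng = np.random.default_rng(8)
    flow = rng.gamma(shape=0.3, scale=1.0, size=10000)  # mm per 15-min step
    t_end = (np.arange(flow.size) + 1) * 0.25           # hours at step ends

    cum = np.concatenate([[0.0], np.cumsum(flow)])
    tc = np.concatenate([[0.0], t_end])

    amount = 50.0                                # fixed inter-amount threshold
    targets = np.arange(amount, cum[-1], amount)
    crossings = np.interp(targets, cum, tc)      # times each multiple is reached
    iat = np.diff(crossings)                     # inter-amount times (hours)
    print(f"mean IAT = {iat.mean():.2f} h, CV = {iat.std() / iat.mean():.2f}")
    ```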

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kacprzak, T.; Kirk, D.; Friedrich, O.

    Shear peak statistics has gained a lot of attention recently as a practical alternative to the two-point statistics for constraining cosmological parameters. We perform a shear peak statistics analysis of the Dark Energy Survey (DES) Science Verification (SV) data, using weak gravitational lensing measurements from a 139 deg² field. We measure the abundance of peaks identified in aperture mass maps, as a function of their signal-to-noise ratio, in the range 0 < S/N < 4. To predict the peak counts as a function of cosmological parameters we use a suite of N-body simulations spanning 158 models with varying Ω_m and σ_8, fixing w = -1, Ω_b = 0.04, h = 0.7 and n_s = 1, to which we have applied the DES SV mask and redshift distribution. In our fiducial analysis we measure σ_8(Ω_m/0.3)^0.6 = 0.77 ± 0.07, after marginalising over the shear multiplicative bias and the error on the mean redshift of the galaxy sample. We introduce models of intrinsic alignments, blending, and source contamination by cluster members. These models indicate that peaks with S/N > 4 would require significant corrections, which is why we do not include them in our analysis. We compare our results to the cosmological constraints from the two-point analysis on the SV field and find them to be in good agreement in both the central value and its uncertainty. Finally, we discuss prospects for future peak statistics analysis with upcoming DES data.

  13. STAPP: Spatiotemporal analysis of plantar pressure measurements using statistical parametric mapping.

    PubMed

    Booth, Brian G; Keijsers, Noël L W; Sijbers, Jan; Huysmans, Toon

    2018-05-03

    Pedobarography produces large sets of plantar pressure samples that are routinely subsampled (e.g. using regions of interest) or aggregated (e.g. center of pressure trajectories, peak pressure images) in order to simplify statistical analysis and provide intuitive clinical measures. We hypothesize that these data reductions discard gait information that can be used to differentiate between groups or conditions. To test this hypothesis, we created an implementation of statistical parametric mapping (SPM) for dynamic plantar pressure datasets (i.e. plantar pressure videos). Our SPM software framework brings all plantar pressure videos into anatomical and temporal correspondence, then performs statistical tests at each sampling location in space and time. As a novel contribution, we introduce non-linear temporal registration into the framework in order to normalize for timing differences within the stance phase. We refer to our software framework as STAPP: spatiotemporal analysis of plantar pressure measurements. Using STAPP, we tested our hypothesis on plantar pressure videos from 33 healthy subjects walking at different speeds. As walking speed increased, STAPP was able to identify significant decreases in plantar pressure at mid-stance from the heel through the lateral forefoot. The extent of these plantar pressure decreases had not previously been observed using existing plantar pressure analysis techniques. We therefore conclude that the subsampling of plantar pressure videos - a step which discarded gait information in our study - can be avoided using STAPP. Copyright © 2018 Elsevier B.V. All rights reserved.
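
    The mass-univariate core of the SPM approach can be sketched in a few lines, assuming registered pressure videos stored as numpy arrays. The random-field-theory correction that SPM actually uses is replaced here by a naive Bonferroni bound, and the data are synthetic.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        # registered plantar pressure videos: (subjects, x, y, time)
        slow = rng.gamma(2.0, 50.0, size=(33, 32, 16, 100))
        fast = 0.9 * slow + rng.normal(0.0, 5.0, size=slow.shape)  # toy speed effect

        # one paired t-test per (x, y, t) sampling location
        t, p = stats.ttest_rel(slow, fast, axis=0)

        alpha = 0.05 / p.size        # Bonferroni stand-in for the RFT threshold
        print("significant samples:", np.sum(p < alpha))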

  14. Social media engagement analysis of U.S. Federal health agencies on Facebook.

    PubMed

    Bhattacharya, Sanmitra; Srinivasan, Padmini; Polgreen, Philip

    2017-04-21

    It is becoming increasingly common for individuals and organizations to use social media platforms such as Facebook. These are used for a wide variety of purposes, including disseminating, discussing and seeking health-related information. U.S. Federal health agencies are leveraging these platforms to 'engage' social media users to read, spread, promote and encourage health-related discussions. However, different agencies and their communications get varying levels of engagement. In this study we use statistical models to identify factors that associate with engagement. We analyze over 45,000 Facebook posts from 72 Facebook accounts belonging to 24 health agencies. Account usage, user activity, sentiment and content of these posts are studied. We use a hurdle regression model to identify factors associated with the level of engagement and a Cox proportional hazards model to identify factors associated with the duration of engagement. In our analysis we find that agencies and accounts vary widely in their usage of social media and the activity they generate. Statistical analysis shows, for instance, that Facebook posts with more visual cues such as photos or videos, or those which express positive sentiment, generate more engagement. We further find that posts on certain topics, such as occupations or organizations, negatively affect the duration of engagement. We present the first comprehensive analyses of engagement with U.S. Federal health agencies on Facebook. In addition, we briefly compare and contrast findings from this study with our earlier study of similar focus on Twitter, to show the robustness of our methods.
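
    A rough sketch of the two-part hurdle idea with statsmodels, on invented post-level data; the zero-truncated count component is approximated by a plain Poisson fit to the positive counts, which is a simplification of a true hurdle model.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        n = 2000
        X = sm.add_constant(rng.random((n, 2)))   # e.g. has_photo, positive sentiment
        engaged = rng.random(n) < 0.4
        y = np.where(engaged, rng.poisson(3.0, n) + 1, 0)

        # part 1: does a post receive any engagement at all? (logistic)
        logit = sm.Logit((y > 0).astype(float), X).fit(disp=0)

        # part 2: given engagement, how much? (count model on positives only)
        pos = y > 0
        poisson = sm.GLM(y[pos], X[pos], family=sm.families.Poisson()).fit()
        print(logit.params, poisson.params)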

  15. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

    PubMed Central

    Kim, Sung-Min; Choi, Yosoon

    2017-01-01

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168
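
    The Getis-Ord Gi* statistic is straightforward to compute directly. A self-contained numpy sketch on an invented grid of PXRF readings (the neighbourhood radius and contamination pattern are made up):

        import numpy as np

        def getis_ord_gi_star(x, w):
            """Gi* z-scores. x: (n,) values; w: (n, n) spatial weights with
            w[i, i] = 1 so each location belongs to its own neighbourhood."""
            n = x.size
            xbar, s = x.mean(), np.sqrt((x ** 2).mean() - x.mean() ** 2)
            wsum, w2sum = w.sum(axis=1), (w ** 2).sum(axis=1)
            num = w @ x - xbar * wsum
            den = s * np.sqrt((n * w2sum - wsum ** 2) / (n - 1))
            return num / den

        rng = np.random.default_rng(4)
        xy = np.array([(i, j) for i in range(10) for j in range(10)], float)
        z = rng.lognormal(0.0, 0.5, 100)
        z[:10] += 5.0                                  # a contaminated strip
        d = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
        w = (d <= 1.5).astype(float)                   # neighbours incl. self
        gi = getis_ord_gi_star(z, w)
        print("hot spot samples (z > 1.96):", int(np.sum(gi > 1.96)))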

  16. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

    PubMed

    Kim, Sung-Min; Choi, Yosoon

    2017-06-18

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.

  17. Pan-Cancer Analysis of Mutation Hotspots in Protein Domains.

    PubMed

    Miller, Martin L; Reznik, Ed; Gauthier, Nicholas P; Aksoy, Bülent Arman; Korkut, Anil; Gao, Jianjiong; Ciriello, Giovanni; Schultz, Nikolaus; Sander, Chris

    2015-09-23

    In cancer genomics, recurrence of mutations in independent tumor samples is a strong indicator of functional impact. However, rare functional mutations can escape detection by recurrence analysis owing to lack of statistical power. We enhance statistical power by extending the notion of recurrence of mutations from single genes to gene families that share homologous protein domains. Domain mutation analysis also sharpens the functional interpretation of the impact of mutations, as domains more succinctly embody function than entire genes. By mapping mutations in 22 different tumor types to equivalent positions in multiple sequence alignments of domains, we confirm well-known functional mutation hotspots, identify uncharacterized rare variants in one gene that are equivalent to well-characterized mutations in another gene, detect previously unknown mutation hotspots, and provide hypotheses about molecular mechanisms and downstream effects of domain mutations. With the rapid expansion of cancer genomics projects, protein domain hotspot analysis will likely provide many more leads linking mutations in proteins to the cancer phenotype. Copyright © 2015 Elsevier Inc. All rights reserved.
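
    The core counting step can be sketched as follows, assuming a toy mapping from (gene, residue) to columns of a shared domain alignment (real mappings come from multiple sequence alignments); recurrence at a column is compared against a uniform background with a binomial test. All names and counts below are invented.

        from collections import Counter
        from scipy.stats import binomtest

        # toy mapping from (gene, residue) to a column of the domain alignment
        to_column = {("GENE_A", 12): 7, ("GENE_B", 45): 7, ("GENE_C", 33): 7,
                     ("GENE_A", 90): 20, ("GENE_B", 51): 21}

        mutations = [("GENE_A", 12), ("GENE_B", 45), ("GENE_C", 33),
                     ("GENE_B", 45), ("GENE_A", 90), ("GENE_B", 51)]

        counts = Counter(to_column[m] for m in mutations)
        n, n_cols = len(mutations), 50                 # assumed alignment length
        for col, k in counts.items():
            # recurrence at one column vs. a uniform background over columns
            p = binomtest(k, n, 1 / n_cols, alternative="greater").pvalue
            print(col, k, round(p, 5))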

  18. Point-by-point compositional analysis for atom probe tomography.

    PubMed

    Stephenson, Leigh T; Ceguerra, Anna V; Li, Tong; Rojhirunsakool, Tanaporn; Nag, Soumya; Banerjee, Rajarshi; Cairney, Julie M; Ringer, Simon P

    2014-01-01

    This new alternative to data processing approaches that traditionally employed grid-based counting is necessary because it removes a user-imposed coordinate system that both limits an analysis and may introduce errors. We have modified the widely used "binomial" analysis for APT data by replacing grid-based counting with coordinate-independent nearest-neighbour identification, improving the measurements and the statistics obtained and allowing quantitative analysis of smaller datasets and of datasets from non-dilute solid solutions. It also allows better visualisation of compositional fluctuations in the data. Our modifications include: using spherical k-atom blocks identified by each detected atom's first k nearest neighbours; 3D data visualisation of block composition and nearest-neighbour anisotropy; and using z-statistics to directly compare experimental and expected composition curves. Similar modifications may be made to other grid-based counting analyses (contingency table, Langer-Bar-on-Miller, sinusoidal model) and could be instrumental in developing novel data visualisation options.
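
    A minimal sketch of the coordinate-independent block selection using scipy's k-d tree, with an invented dilute Al-Cu reconstruction: each detected atom defines a block of itself plus its k nearest neighbours, whose composition can then be compared with the binomial expectation.

        import numpy as np
        from scipy.spatial import cKDTree

        def knn_block_compositions(positions, species, k, solute):
            """Solute concentration of the block formed by each atom
            and its k nearest neighbours (coordinate-independent)."""
            tree = cKDTree(positions)
            _, idx = tree.query(positions, k=k + 1)    # includes the atom itself
            return (species[idx] == solute).mean(axis=1)

        rng = np.random.default_rng(5)
        pos = rng.random((20000, 3)) * 50.0            # toy reconstruction (nm)
        spec = rng.choice(np.array(["Al", "Cu"]), 20000, p=[0.96, 0.04])
        conc = knn_block_compositions(pos, spec, k=100, solute="Cu")
        print(conc.mean(), conc.std())   # compare with binomial expectation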

  19. To Identify the Important Soil Properties Affecting Dinoseb Adsorption with Statistical Analysis

    PubMed Central

    Guan, Yiqing; Wei, Jianhui; Zhang, Danrong; Zu, Mingjuan; Zhang, Liru

    2013-01-01

    Investigating the influence of soil characteristics on the dinoseb adsorption parameter with different statistical methods makes it possible to quantify the extent of these influences explicitly. The correlation coefficients and the direct and indirect effects of soil characteristic factors on the dinoseb adsorption parameter were analyzed through bivariate correlation analysis and path analysis. With stepwise regression analysis, the factors which had little influence on the adsorption parameter were excluded. Results indicate that pH and CEC had a moderate relationship with, and a lower direct effect on, the dinoseb adsorption parameter because of multicollinearity with other soil factors, while organic carbon and clay contents were found to be the most significant soil factors affecting the dinoseb adsorption process. A regression was thereby set up to explore the relationship between the dinoseb adsorption parameter and these two soil factors: the soil organic carbon and clay contents. 92% of the variation in the dinoseb sorption coefficient could be attributed to the variation in the soil organic carbon and clay contents. PMID:23737715
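
    The final regression step can be illustrated with statsmodels on synthetic soil data (coefficients and noise level invented); the fitted R² plays the role of the reported 92%.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)                 # synthetic soil data
        oc = rng.uniform(0.2, 4.0, 60)                 # organic carbon, %
        clay = rng.uniform(5.0, 40.0, 60)              # clay content, %
        kd = 2.0 * oc + 0.15 * clay + rng.normal(0.0, 0.5, 60)  # invented model

        fit = sm.OLS(kd, sm.add_constant(np.column_stack([oc, clay]))).fit()
        print(fit.rsquared, fit.params)                # cf. the reported 92%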

  20. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    PubMed

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops have significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneering studies have investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome, and to identify the specific transgene event in a sample of known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, the feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
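
    Under the simplest assumption that reads land uniformly on the sequenced material, the number of reads needed to observe a target at least once with a given confidence has a closed form. This back-of-the-envelope sketch is not the paper's full framework, and the example fractions are invented.

        import math

        def reads_needed(target_fraction, confidence=0.99):
            """Reads needed to observe a target at least once, assuming reads
            land uniformly and `target_fraction` of them hit the target."""
            return math.ceil(math.log(1.0 - confidence)
                             / math.log(1.0 - target_fraction))

        # e.g. a transgene that is 0.1% of the genome in a 10% GM mixture
        print(reads_needed(1e-3 * 0.1))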

  1. Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule

    PubMed Central

    Benitez, Kathleen; Masys, Daniel

    2010-01-01

    Objective Healthcare organizations must de-identify patient records before sharing data. Many organizations rely on the Safe Harbor Standard of the HIPAA Privacy Rule, which enumerates 18 identifiers that must be suppressed (eg, ages over 89). An alternative model in the Privacy Rule, known as the Statistical Standard, can facilitate the sharing of more detailed data, but is rarely applied because of a lack of published methodologies. The authors propose an intuitive approach to de-identifying patient demographics in accordance with the Statistical Standard. Design The authors conduct an analysis of the demographics of patient cohorts in five medical centers developed for the NIH-sponsored Electronic Medical Records and Genomics network, with respect to the US census. They report the re-identification risk of patient demographics disclosed according to the Safe Harbor policy and the relative risk rate for sharing such information via alternative policies. Measurements The re-identification risk of Safe Harbor demographics ranged from 0.01% to 0.19%. The findings show alternative de-identification models can be created with risks no greater than Safe Harbor. The authors illustrate that the disclosure of patient ages over the age of 89 is possible when other features are reduced in granularity. Limitations The de-identification approach described in this paper was evaluated with demographic data only and should be evaluated with other potential identifiers. Conclusion Alternative de-identification policies to the Safe Harbor model can be derived for patient demographics to enable the disclosure of values that were previously suppressed. The method is generalizable to any environment in which population statistics are available. PMID:21169618
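
    The underlying risk logic can be illustrated with a toy population table, assuming pandas: the re-identification risk of a disclosed demographic tuple is approximated as the reciprocal of the number of people sharing it, so coarsening one field can offset disclosing another. All counts below are invented.

        import pandas as pd

        # invented census-style bin counts
        census = pd.DataFrame({
            "age_group": ["85-89", "90+", "90+", "40-44"],
            "gender":    ["F",     "F",   "M",   "M"],
            "county":    ["A",     "A",   "A",   "A"],
            "count":     [4200,    310,   150,   52000],
        })

        # risk of a disclosed tuple ~ 1 / number of matching people
        census["risk"] = 1.0 / census["count"]
        print(census.sort_values("risk", ascending=False))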

  2. The Cognitive Effects of Antidepressants in Major Depressive Disorder: A Systematic Review and Meta-Analysis of Randomized Clinical Trials

    PubMed Central

    Rosenblat, Joshua D; Kakar, Ron

    2016-01-01

    Background: Cognitive dysfunction is often present in major depressive disorder (MDD). Several clinical trials have noted a pro-cognitive effect of antidepressants in MDD. The objective of the current systematic review and meta-analysis was to assess the pooled efficacy of antidepressants on various domains of cognition in MDD. Methods: Trials published prior to April 15, 2015, were identified through searching the Cochrane Central Register of Controlled Trials, PubMed, Embase, PsycINFO, Clinicaltrials.gov, and relevant review articles. Data from randomized clinical trials assessing the cognitive effects of antidepressants were pooled to determine standardized mean differences (SMD) using a random-effects model. Results: Nine placebo-controlled randomized trials (2,550 participants) evaluating the cognitive effects of vortioxetine (n = 728), duloxetine (n = 714), paroxetine (n = 23), citalopram (n = 84), phenelzine (n = 28), nortriptyline (n = 32), and sertraline (n = 49) were identified. Antidepressants had a positive effect on psychomotor speed (SMD 0.16; 95% confidence interval [CI] 0.05–0.27; I2 = 46%) and delayed recall (SMD 0.24; 95% CI 0.15–0.34; I2 = 0%). The effect on cognitive control and executive function did not reach statistical significance. Of note, after removal of vortioxetine from the analysis, statistical significance was lost for psychomotor speed. Eight head-to-head randomized trials comparing the effects of selective serotonin reuptake inhibitors (SSRIs; n = 371), selective serotonin and norepinephrine reuptake inhibitors (SNRIs; n = 25), tricyclic antidepressants (TCAs; n = 138), and norepinephrine and dopamine reuptake inhibitors (NDRIs; n = 46) were identified. No statistically significant difference in cognitive effects was found when pooling results from head-to-head trials of SSRIs, SNRIs, TCAs, and NDRIs. Significant limitations included the heterogeneity of results, the limited number of studies, and small sample sizes. Conclusions: Available evidence suggests that antidepressants have a significant positive effect on psychomotor speed and delayed recall. PMID:26209859
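
    Pooling standardized mean differences under a random-effects model is compact in code. A DerSimonian-Laird sketch in numpy, with invented per-trial SMDs and standard errors (not the trial data above):

        import numpy as np

        def dersimonian_laird(smd, se):
            """Pool SMDs with a DerSimonian-Laird random-effects model;
            returns the pooled SMD, its standard error, and I^2 (%)."""
            w = 1.0 / se ** 2
            fixed = np.sum(w * smd) / np.sum(w)
            q = np.sum(w * (smd - fixed) ** 2)
            df = smd.size - 1
            c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
            tau2 = max(0.0, (q - df) / c)              # between-trial variance
            w_re = 1.0 / (se ** 2 + tau2)
            pooled = np.sum(w_re * smd) / np.sum(w_re)
            se_pooled = np.sqrt(1.0 / np.sum(w_re))
            i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
            return pooled, se_pooled, i2

        smd = np.array([0.21, 0.10, 0.30, 0.05])       # invented trial SMDs
        se = np.array([0.08, 0.06, 0.12, 0.15])
        est, se_est, i2 = dersimonian_laird(smd, se)
        print(f"SMD {est:.2f} "
              f"(95% CI {est - 1.96 * se_est:.2f} to {est + 1.96 * se_est:.2f}), "
              f"I2 {i2:.0f}%")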

  3. Resection of Concomitant Hepatic and Extrahepatic Metastases from Colorectal Cancer - A Worthwhile Operation?

    PubMed

    Diaconescu, Andrei; Alexandrescu, Sorin; Ionel, Zenaida; Zlate, Cristian; Grigorie, Razvan; Brasoveanu, Vladislav; Hrehoret, Doina; Ciurea, Silviu; Botea, Florin; Tomescu, Dana; Droc, Gabriela; Croitoru, Adina; Herlea, Vlad; Boros, Mirela; Grasu, Mugur; Dumitru, Radu; Toma, Mihai; Ionescu, Mihnea; Vasilescu, Catalin; Popescu, Irinel

    2017-01-01

    Background: The benefit of hepatic resection in cases of concomitant colorectal hepatic and extrahepatic metastases (CHEHMs) is still debatable. The purpose of this study is to assess the results of resection of hepatic and extrahepatic metastases in patients with CHEHMs in a high-volume center for both hepatobiliary and colorectal surgery and to identify prognostic factors that correlate with longer survival in these patients. We performed a retrospective analysis of 678 consecutive patients with liver resection for colorectal cancer metastases operated on in a single centre between April 1996 and March 2016. Among these, 73 patients presented with CHEHMs. Univariate analysis was performed to identify risk factors for overall survival (OS) in these patients. Results: The extrahepatic metastases were located at the lymph node level in 20 patients, at the peritoneal level in 20, at the ovary and lung level in 12, as local relapses in 12, and at other sites in 9. Fifty-three curative resections (R0) were performed. The difference in overall survival between the CHEHMs group and the CHMs group is statistically significant for the entire groups (p < 0.0001), as well as in patients who underwent R0 resection (p < 0.0001). In the CHEHMs group, OS was statistically significantly higher in patients who underwent R0 resection vs. those with R1/R2 resection (p = 0.004). Three variables were identified as prognostic factors for poor OS following univariate analysis: 4 or more hepatic metastases, major hepatectomy, and performance of the operation during the first period of the study (1996–2004). There was a tendency toward better OS in patients with ovarian or pulmonary location of extrahepatic disease, although the difference was not statistically significant. In patients with concomitant hepatic and extrahepatic metastases, complete resection of the metastatic burden significantly prolongs survival. Patients with up to 4 liver metastases, resectable by minor hepatectomy, benefit most from this aggressive onco-surgical management.

  4. Identification of stress responsive genes by studying specific relationships between mRNA and protein abundance.

    PubMed

    Morimoto, Shimpei; Yahara, Koji

    2018-03-01

    Protein expression is regulated by the production and degradation of mRNAs and proteins, but the specifics of their relationship are controversial. Although technological advances have enabled genome-wide and time-series surveys of mRNA and protein abundance, recent studies have shown paradoxical results, with most statistical analyses being limited to linear correlation, or to analysis of variance applied separately to mRNA and protein datasets. Here, using recently analyzed genome-wide time-series data, we have developed a statistical analysis framework for identifying which types of genes or biological gene groups have significant correlation between mRNA and protein abundance after accounting for potential time delays. Our framework stratifies all genes in terms of the extent of time delay, conducts gene clustering in each stratum, and performs a non-parametric statistical test of the correlation between mRNA and protein abundance in a gene cluster. Consequently, we revealed stronger correlations than previously reported between mRNA and protein abundance in two metabolic pathways. Moreover, we identified a pair of stress-responsive genes (ADC17 and KIN1) that showed highly similar time series of mRNA and protein abundance. Furthermore, we confirmed the robustness of the analysis framework by applying it to another genome-wide time-series dataset and identifying a cytoskeleton-related gene cluster (keratin 18, keratin 17, and mitotic spindle positioning) that shows similar correlation. The significant correlation and highly similar changes of mRNA and protein abundance suggest a concerted role of these genes in the cellular stress response, which we consider provides an answer to the question of the specific relationship between mRNA and protein in a cell. In addition, our framework for studying the relationship between mRNAs and proteins in a cell will provide a basis for studying specific relationships between mRNA and protein abundance after accounting for potential time delays.
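
    One way to account for potential time delays, in the spirit of the framework described, is to scan lags and take the non-parametric correlation at the best one. A toy numpy/scipy sketch; the framework's stratification and clustering steps are omitted, and the data are synthetic (np.roll wraps around, which is acceptable only for this toy).

        import numpy as np
        from scipy.stats import spearmanr

        def best_lag_correlation(mrna, protein, max_lag):
            """Spearman rho between mRNA and protein after shifting protein
            by 0..max_lag time points; returns (lag, rho, p-value)."""
            best = (0, -np.inf, 1.0)
            for lag in range(max_lag + 1):
                rho, pval = spearmanr(mrna[: len(mrna) - lag], protein[lag:])
                if rho > best[1]:
                    best = (lag, rho, pval)
            return best

        rng = np.random.default_rng(7)
        m = np.sin(np.linspace(0.0, 4.0 * np.pi, 40)) + rng.normal(0.0, 0.2, 40)
        p = np.roll(m, 3) + rng.normal(0.0, 0.2, 40)   # protein lags by 3 points
        print(best_lag_correlation(m, p, max_lag=6))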

  5. Evaluation of risk communication in a mammography patient decision aid.

    PubMed

    Klein, Krystal A; Watson, Lindsey; Ash, Joan S; Eden, Karen B

    2016-07-01

    We characterized patients' comprehension, memory, and impressions of risk communication messages in a patient decision aid (PtDA), Mammopad, and clarified perceived importance of numeric risk information in medical decision making. Participants were 75 women in their forties with average risk factors for breast cancer. We used mixed methods, comprising a risk estimation problem administered within a pretest-posttest design, and semi-structured qualitative interviews with a subsample of 21 women. Participants' positive predictive value estimates of screening mammography improved after using Mammopad. Although risk information was only briefly memorable, through content analysis, we identified themes describing why participants value quantitative risk information, and obstacles to understanding. We describe ways the most complicated graphic was incompletely comprehended. Comprehension of risk information following Mammopad use could be improved. Patients valued receiving numeric statistical information, particularly in pictograph format. Obstacles to understanding risk information, including potential for confusion between statistics, should be identified and mitigated in PtDA design. Using simple pictographs accompanied by text, PtDAs may enhance a shared decision-making discussion. PtDA designers and providers should be aware of benefits and limitations of graphical risk presentations. Incorporating comprehension checks could help identify and correct misapprehensions of graphically presented statistics. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  6. Evaluation of risk communication in a mammography patient decision aid

    PubMed Central

    Klein, Krystal A.; Watson, Lindsey; Ash, Joan S.; Eden, Karen B.

    2016-01-01

    Objectives We characterized patients’ comprehension, memory, and impressions of risk communication messages in a patient decision aid (PtDA), Mammopad, and clarified perceived importance of numeric risk information in medical decision making. Methods Participants were 75 women in their forties with average risk factors for breast cancer. We used mixed methods, comprising a risk estimation problem administered within a pretest–posttest design, and semi-structured qualitative interviews with a subsample of 21 women. Results Participants’ positive predictive value estimates of screening mammography improved after using Mammopad. Although risk information was only briefly memorable, through content analysis, we identified themes describing why participants value quantitative risk information, and obstacles to understanding. We describe ways the most complicated graphic was incompletely comprehended. Conclusions Comprehension of risk information following Mammopad use could be improved. Patients valued receiving numeric statistical information, particularly in pictograph format. Obstacles to understanding risk information, including potential for confusion between statistics, should be identified and mitigated in PtDA design. Practice implications Using simple pictographs accompanied by text, PtDAs may enhance a shared decision-making discussion. PtDA designers and providers should be aware of benefits and limitations of graphical risk presentations. Incorporating comprehension checks could help identify and correct misapprehensions of graphically presented statistics. PMID:26965020

  7. Graph theory applied to noise and vibration control in statistical energy analysis models.

    PubMed

    Guasch, Oriol; Cortés, Lluís

    2009-06-01

    A fundamental aspect of noise and vibration control in statistical energy analysis (SEA) models consists in first identifying and then reducing the energy flow paths between subsystems. In this work, it is proposed to make use of some results from graph theory to address both issues. On the one hand, linear and path algebras applied to adjacency matrices of SEA graphs are used to determine the existence of any order paths between subsystems, counting and labeling them, finding extremal paths, or determining the power flow contributions from groups of paths. On the other hand, a strategy is presented that makes use of graph cut algorithms to reduce the energy flow from a source subsystem to a receiver one, modifying as few internal and coupling loss factors as possible.
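
    A toy illustration with networkx of the two graph operations described: enumerating energy flow paths between a source and a receiver subsystem, and finding a minimum edge cut whose modification would sever them. Subsystem names and coupling weights are invented, and the unweighted cut treats every coupling as unit capacity.

        import math
        import networkx as nx

        g = nx.DiGraph()    # nodes = SEA subsystems, weights ~ coupling strength
        g.add_weighted_edges_from([
            ("source", "panel", 0.30), ("source", "frame", 0.10),
            ("panel", "cavity", 0.20), ("frame", "cavity", 0.25),
            ("cavity", "receiver", 0.40), ("frame", "receiver", 0.05),
        ])

        paths = list(nx.all_simple_paths(g, "source", "receiver"))
        strongest = max(paths, key=lambda p: math.prod(
            g[u][v]["weight"] for u, v in zip(p, p[1:])))     # dominant path
        cut = nx.minimum_edge_cut(g, "source", "receiver")    # couplings to modify
        print(strongest, cut)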

  8. Comparative inference of duplicated genes produced by polyploidization in soybean genome.

    PubMed

    Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

    2013-01-01

    Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. It is important to investigate the soybean genome for its economic and scientific value. Polyploidy is a widespread and recursive phenomenon during plant evolution, and it can generate massive numbers of duplicated genes, which are an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used here to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, genes duplicated by whole-genome duplication account for 70.3% of the soybean genome. From statistical analysis of the molecular distances between duplicated genes, our study indicates that the whole-genome duplication event occurred more than once in the genome evolution of soybean, and that duplicated genes are often distributed near the ends of chromosomes.

  9. Docking studies on NSAID/COX-2 isozyme complexes using Contact Statistics analysis

    NASA Astrophysics Data System (ADS)

    Ermondi, Giuseppe; Caron, Giulia; Lawrence, Raelene; Longo, Dario

    2004-11-01

    The selective inhibition of COX-2 isozymes should lead to a new generation of NSAIDs with significantly reduced side effects, e.g. celecoxib (Celebrex®) and rofecoxib (Vioxx®). To obtain inhibitors with higher selectivity it has become essential to gain additional insight into the details of the interactions between COX isozymes and NSAIDs. Although X-ray structures of COX-2 complexed with a small number of ligands are available, experimental data are missing for two well-known selective COX-2 inhibitors (rofecoxib and nimesulide), and the docking results reported so far are controversial. We use a combination of a traditional docking procedure with a new computational tool (Contact Statistics analysis) that identifies the best orientation among a number of solutions to shed some light on this topic.

  10. Haystack, a web-based tool for metabolomics research

    PubMed Central

    2014-01-01

    Background Liquid chromatography coupled to mass spectrometry (LCMS) has become a widely used technique in metabolomics research for differential profiling, the broad screening of biomolecular constituents across multiple samples to diagnose phenotypic differences and elucidate relevant features. However, a significant limitation in LCMS-based metabolomics is the high-throughput data processing required for robust statistical analysis and data modeling for large numbers of samples with hundreds of unique chemical species. Results To address this problem, we developed Haystack, a web-based tool designed to visualize, parse, filter, and extract significant features from LCMS datasets rapidly and efficiently. Haystack runs in a browser environment with an intuitive graphical user interface that provides both display and data processing options. Total ion chromatograms (TICs) and base peak chromatograms (BPCs) are automatically displayed, along with time-resolved mass spectra and extracted ion chromatograms (EICs) over any mass range. Output files in the common .csv format can be saved for further statistical analysis or customized graphing. Haystack's core function is a flexible binning procedure that converts the mass dimension of the chromatogram into a set of interval variables that can uniquely identify a sample. Binned mass data can be analyzed by exploratory methods such as principal component analysis (PCA) to model class assignment and identify discriminatory features. The validity of this approach is demonstrated by comparison of a dataset from plants grown at two light conditions with manual and automated peak detection methods. Haystack successfully predicted class assignment based on PCA and cluster analysis, and identified discriminatory features based on analysis of EICs of significant bins. Conclusion Haystack, a new online tool for rapid processing and analysis of LCMS-based metabolomics data is described. It offers users a range of data visualization options and supports non-biased differential profiling studies through a unique and flexible binning function that provides an alternative to conventional peak deconvolution analysis methods. PMID:25350247
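
    The binning-then-PCA idea can be sketched with numpy and scikit-learn on synthetic spectra; the bin width, mass range, and planted discriminatory feature are invented.

        import numpy as np
        from sklearn.decomposition import PCA

        def bin_masses(mz, intensity, lo=100.0, hi=1000.0, width=1.0):
            """Collapse (m/z, intensity) pairs from one LCMS run into
            fixed-width mass bins (Haystack-style interval variables)."""
            edges = np.arange(lo, hi + width, width)
            return np.histogram(mz, bins=edges, weights=intensity)[0]

        rng = np.random.default_rng(8)
        samples = []
        for i in range(10):                            # two classes of 5 runs
            mz = rng.uniform(100.0, 1000.0, 5000)
            inten = rng.exponential(1.0, 5000)
            if i < 5:                                  # planted feature at m/z 350
                mz = np.concatenate([mz, np.full(200, 350.2)])
                inten = np.concatenate([inten, rng.exponential(5.0, 200)])
            samples.append(bin_masses(mz, inten))

        scores = PCA(n_components=2).fit_transform(np.array(samples))
        print(scores[:, 0])                            # classes separate on PC1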

  11. Haystack, a web-based tool for metabolomics research.

    PubMed

    Grace, Stephen C; Embry, Stephen; Luo, Heng

    2014-01-01

    Liquid chromatography coupled to mass spectrometry (LCMS) has become a widely used technique in metabolomics research for differential profiling, the broad screening of biomolecular constituents across multiple samples to diagnose phenotypic differences and elucidate relevant features. However, a significant limitation in LCMS-based metabolomics is the high-throughput data processing required for robust statistical analysis and data modeling for large numbers of samples with hundreds of unique chemical species. To address this problem, we developed Haystack, a web-based tool designed to visualize, parse, filter, and extract significant features from LCMS datasets rapidly and efficiently. Haystack runs in a browser environment with an intuitive graphical user interface that provides both display and data processing options. Total ion chromatograms (TICs) and base peak chromatograms (BPCs) are automatically displayed, along with time-resolved mass spectra and extracted ion chromatograms (EICs) over any mass range. Output files in the common .csv format can be saved for further statistical analysis or customized graphing. Haystack's core function is a flexible binning procedure that converts the mass dimension of the chromatogram into a set of interval variables that can uniquely identify a sample. Binned mass data can be analyzed by exploratory methods such as principal component analysis (PCA) to model class assignment and identify discriminatory features. The validity of this approach is demonstrated by comparison of a dataset from plants grown at two light conditions with manual and automated peak detection methods. Haystack successfully predicted class assignment based on PCA and cluster analysis, and identified discriminatory features based on analysis of EICs of significant bins. Haystack, a new online tool for rapid processing and analysis of LCMS-based metabolomics data is described. It offers users a range of data visualization options and supports non-biased differential profiling studies through a unique and flexible binning function that provides an alternative to conventional peak deconvolution analysis methods.

  12. Reliability analysis of composite structures

    NASA Technical Reports Server (NTRS)

    Kan, Han-Pin

    1992-01-01

    A probabilistic static stress analysis methodology has been developed to estimate the reliability of a composite structure. Closed form stress analysis methods are the primary analytical tools used in this methodology. These structural mechanics methods are used to identify independent variables whose variations significantly affect the performance of the structure. Once these variables are identified, scatter in their values is evaluated and statistically characterized. The scatter in applied loads and the structural parameters is then fitted to appropriate probabilistic distribution functions. Numerical integration techniques are applied to compute the structural reliability. The predicted reliability accounts for scatter due to variability in material strength, applied load, fabrication and assembly processes. The influence of structural geometry and failure mode is also considered in the evaluation. Example problems are given to illustrate various levels of analytical complexity.
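
    Once scatter has been characterized, the final probability computation reduces to P(strength > stress). A Monte Carlo stand-in for the numerical integration step, with invented Weibull strength and normal load parameters:

        import numpy as np

        rng = np.random.default_rng(9)
        n = 1_000_000
        strength = 600.0 * rng.weibull(20.0, n)        # MPa, Weibull strength
        stress = rng.normal(350.0, 40.0, n)            # MPa, normal applied load
        print("estimated reliability:", np.mean(strength > stress))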

  13. WASP (Write a Scientific Paper) using Excel - 3: Plotting data.

    PubMed

    Grech, Victor

    2018-02-01

    The plotting of data into graphs should be a mandatory step in all data analysis as part of a descriptive statistics exercise, since it gives the researcher an overview of the shape and nature of the data. Moreover, outlier values may be identified, which may be incorrect data, or true outliers, from which important findings (and publications) may arise. This exercise should always precede inferential statistics, when possible, and this paper in the Early Human Development WASP series provides some pointers for doing so in Microsoft Excel™. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. A Prototype System for Retrieval of Gene Functional Information

    PubMed Central

    Folk, Lillian C.; Patrick, Timothy B.; Pattison, James S.; Wolfinger, Russell D.; Mitchell, Joyce A.

    2003-01-01

    Microarrays allow researchers to gather data about the expression patterns of thousands of genes simultaneously. Statistical analysis can reveal which genes show statistically significant results. Making biological sense of those results requires the retrieval of functional information about the genes thus identified, typically a manual gene-by-gene retrieval of information from various on-line databases. For experiments generating thousands of genes of interest, retrieval of functional information can become a significant bottleneck. To address this issue, we are currently developing a prototype system to automate the process of retrieval of functional information from multiple on-line sources. PMID:14728346

  15. Evaluation and application of summary statistic imputation to discover new height-associated loci.

    PubMed

    Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

    2018-05-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome, as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones: we observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.
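
    The heart of summary statistics imputation is a conditional-expectation formula: z-statistics at untyped SNVs are predicted from typed ones through the LD correlation matrix. A numpy sketch under invented LD structure; the variable-sample-size refinement and quality control of the actual method are omitted, and the ridge term is an assumption.

        import numpy as np

        def impute_z(z_obs, ld_oo, ld_to, ridge=0.1):
            """Impute z at untyped SNVs: z_t = C_to (C_oo + ridge*I)^-1 z_o,
            with a small ridge regularising the observed-LD matrix."""
            a = ld_oo + ridge * np.eye(ld_oo.shape[0])
            z_t = ld_to @ np.linalg.solve(a, z_obs)
            r2 = np.einsum("ij,ji->i", ld_to, np.linalg.solve(a, ld_to.T))
            return z_t, r2                   # imputed z and imputation quality

        rng = np.random.default_rng(10)
        pos = np.array([0, 1, 2, 4, 5])                # typed SNV positions
        target = np.array([3])                         # untyped SNV to impute
        ld_oo = 0.9 ** np.abs(pos[:, None] - pos[None, :])   # toy decaying LD
        ld_to = 0.9 ** np.abs(target[:, None] - pos[None, :])
        z_obs = rng.normal(0.0, 1.0, pos.size) + 2.0
        print(impute_z(z_obs, ld_oo, ld_to))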

  16. Evaluation and application of summary statistic imputation to discover new height-associated loci

    PubMed Central

    2018-01-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome, as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones: we observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression. PMID:29782485

  17. Software Reliability, Measurement, and Testing. Volume 2. Guidebook for Software Reliability Measurement and Testing

    DTIC Science & Technology

    1992-04-01

    contractor’s existing data collection, analysis and corrective action system shall be utilized, with modification only as necessary to meet the... either from test or from analysis of field data. The procedures of MIL-STD-756B assume that the reliability of a... to generate sufficient data to report a statistically valid reliability figure for a class of software. Casual data gathering accumulates data more

  18. A Monte Carlo Analysis of the Thrust Imbalance for the RSRMV Booster During Both the Ignition Transient and Steady State Operation

    NASA Technical Reports Server (NTRS)

    Foster, Winfred A., Jr.; Crowder, Winston; Steadman, Todd E.

    2014-01-01

    This paper presents the results of statistical analyses performed to predict the thrust imbalance between two solid rocket motor boosters to be used on the Space Launch System (SLS) vehicle. Two legacy internal ballistics codes developed for the Space Shuttle program were coupled with a Monte Carlo analysis code to determine a thrust imbalance envelope for the SLS vehicle based on the performance of 1000 motor pairs. Thirty-three variables which could impact the performance of the motors during the ignition transient and thirty-eight variables which could impact the performance of the motors during steady state operation were identified and treated as statistical variables for the analyses. The effects of motor-to-motor variation as well as variations between motors of a single pair were included in the analyses. The statistical variations of the variables were defined based on data provided by NASA's Marshall Space Flight Center for the upgraded five segment booster and from the Space Shuttle booster when appropriate. The results obtained for the statistical envelope are compared with the design specification thrust imbalance limits for the SLS launch vehicle.
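
    A drastically simplified stand-in for the Monte Carlo loop, assuming numpy: the toy thrust model, the two scatter levels (pair-shared and motor-to-motor), and all distributions are invented, whereas the real analysis runs two legacy internal ballistics codes per motor.

        import numpy as np

        rng = np.random.default_rng(11)

        def thrust(burn_rate, throat_area, nominal=3.6e6):
            """Toy steady-state thrust model (N); invented exponents."""
            return nominal * burn_rate ** 1.2 / throat_area ** 0.5

        imbalance = []
        for _ in range(1000):                          # 1000 motor pairs
            lot = rng.normal(1.0, 0.01)                # pair-shared (lot) scatter
            pair = [thrust(lot * rng.normal(1.0, 0.005),   # motor-to-motor scatter
                           rng.normal(1.0, 0.003))
                    for _ in range(2)]
            imbalance.append(abs(pair[0] - pair[1]))

        print(np.percentile(imbalance, [50.0, 99.7]))  # imbalance envelope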

  19. A Monte Carlo Analysis of the Thrust Imbalance for the Space Launch System Booster During Both the Ignition Transient and Steady State Operation

    NASA Technical Reports Server (NTRS)

    Foster, Winfred A., Jr.; Crowder, Winston; Steadman, Todd E.

    2014-01-01

    This paper presents the results of statistical analyses performed to predict the thrust imbalance between two solid rocket motor boosters to be used on the Space Launch System (SLS) vehicle. Two legacy internal ballistics codes developed for the Space Shuttle program were coupled with a Monte Carlo analysis code to determine a thrust imbalance envelope for the SLS vehicle based on the performance of 1000 motor pairs. Thirty-three variables which could impact the performance of the motors during the ignition transient and thirty-eight variables which could impact the performance of the motors during steady state operation were identified and treated as statistical variables for the analyses. The effects of motor-to-motor variation as well as variations between motors of a single pair were included in the analyses. The statistical variations of the variables were defined based on data provided by NASA's Marshall Space Flight Center for the upgraded five segment booster and from the Space Shuttle booster when appropriate. The results obtained for the statistical envelope are compared with the design specification thrust imbalance limits for the SLS launch vehicle.

  20. Statistical Quality Control of Moisture Data in GEOS DAS

    NASA Technical Reports Server (NTRS)

    Dee, D. P.; Rukhovets, L.; Todling, R.

    1999-01-01

    A new statistical quality control algorithm was recently implemented in the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The final step in the algorithm consists of an adaptive buddy check that either accepts or rejects outlier observations based on a local statistical analysis of nearby data. A basic assumption in any such test is that the observed field is spatially coherent, in the sense that nearby data can be expected to confirm each other. However, the buddy check resulted in excessive rejection of moisture data, especially during the Northern Hemisphere summer. The analysis moisture variable in GEOS DAS is water vapor mixing ratio. Observational evidence shows that the distribution of mixing ratio errors is far from normal. Furthermore, spatial correlations among mixing ratio errors are highly anisotropic and difficult to identify. Both factors contribute to the poor performance of the statistical quality control algorithm. To alleviate the problem, we applied the buddy check to relative humidity data instead. This variable explicitly depends on temperature and therefore exhibits a much greater spatial coherence. As a result, reject rates of moisture data are much more reasonable and homogeneous in time and space.
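
    The accept/reject core of a buddy check is simple to state, assuming numpy: an observation is rejected when it departs too far from the local statistics of nearby data. The threshold and the equal weighting of buddies are invented here, and the operational algorithm adapts its tolerance to the local error statistics.

        import numpy as np

        def buddy_check(obs, buddies, tol=3.0):
            """Flag an observation when it departs from the mean of nearby
            data by more than `tol` local standard deviations."""
            mean = np.array([b.mean() for b in buddies])
            std = np.array([b.std(ddof=1) for b in buddies]) + 1e-6
            return np.abs(obs - mean) > tol * std

        obs = np.array([0.62, 0.60, 0.95, 0.58])       # toy relative humidity
        buddies = [np.array([0.61, 0.59, 0.63, 0.60])] * 4
        print(buddy_check(obs, buddies))               # only 0.95 is rejected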
