ERIC Educational Resources Information Center
King, James M.; And Others
The materials described here represent the conversion of a highly popular student workbook "Sets, Probability and Statistics: The Mathematics of Life Insurance" into a computer program. The program is designed to familiarize students with the concepts of sets, probability, and statistics, and to provide practice using real life examples. It also…
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-01-01
Purpose: With emergence of clinical outcomes databases as tools utilized routinely within institutions, comes need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-11-01
With emergence of clinical outcomes databases as tools utilized routinely within institutions, comes need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.
Biostatistical and medical statistics graduate education
2014-01-01
The development of graduate education in biostatistics and medical statistics is discussed in the context of training within a medical center setting. The need for medical researchers to employ a wide variety of statistical designs in clinical, genetic, basic science and translational settings justifies the ongoing integration of biostatistical training into medical center educational settings and informs its content. The integration of large data issues are a challenge. PMID:24472088
NASA Technical Reports Server (NTRS)
Howell, Leonard W., Jr.; Six, N. Frank (Technical Monitor)
2002-01-01
The Maximum Likelihood (ML) statistical theory required to estimate spectra information from an arbitrary number of astrophysics data sets produced by vastly different science instruments is developed in this paper. This theory and its successful implementation will facilitate the interpretation of spectral information from multiple astrophysics missions and thereby permit the derivation of superior spectral information based on the combination of data sets. The procedure is of significant value to both existing data sets and those to be produced by future astrophysics missions consisting of two or more detectors by allowing instrument developers to optimize each detector's design parameters through simulation studies in order to design and build complementary detectors that will maximize the precision with which the science objectives may be obtained. The benefits of this ML theory and its application is measured in terms of the reduction of the statistical errors (standard deviations) of the spectra information using the multiple data sets in concert as compared to the statistical errors of the spectra information when the data sets are considered separately, as well as any biases resulting from poor statistics in one or more of the individual data sets that might be reduced when the data sets are combined.
Statistical plant set estimation using Schroeder-phased multisinusoidal input design
NASA Technical Reports Server (NTRS)
Bayard, D. S.
1992-01-01
A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
ERIC Educational Resources Information Center
Parker, Loran Carleton; Gleichsner, Alyssa M.; Adedokun, Omolola A.; Forney, James
2016-01-01
Transformation of research in all biological fields necessitates the design, analysis and, interpretation of large data sets. Preparing students with the requisite skills in experimental design, statistical analysis, and interpretation, and mathematical reasoning will require both curricular reform and faculty who are willing and able to integrate…
Study unique artistic lopburi province for design brass tea set of bantahkrayang community
NASA Astrophysics Data System (ADS)
Pliansiri, V.; Seviset, S.
2017-07-01
The objectives of this study were as follows: 1) to study the production process of handcrafted Brass Tea Set; and 2) to design and develop the handcrafted of Brass Tea Set. The process of design was started by mutual analytical processes and conceptual framework for product design, Quality Function Deployment, Theory of Inventive Problem Solving, Principles of Craft Design, and Principle of Reverse Engineering. The experts in field of both Industrial Product Design and Brass Handicraft Product, have evaluated the Brass Tea Set design and created prototype of Brass tea set by the sample of consumers who have ever bought the Brass Tea Set of Bantahkrayang Community on this research. The statistics methods used were percentage, mean ({{{\\overline X}} = }) and standard deviation (S.D.) 3. To assess consumer satisfaction toward of handcrafted Brass tea set was at the high level.
Statistical molecular design of building blocks for combinatorial chemistry.
Linusson, A; Gottfries, J; Lindgren, F; Wold, S
2000-04-06
The reduction of the size of a combinatorial library can be made in two ways, either base the selection on the building blocks (BB's) or base it on the full set of virtually constructed products. In this paper we have investigated the effects of applying statistical designs to BB sets compared to selections based on the final products. The two sets of BB's and the virtually constructed library were described by structural parameters, and the correlation between the two characterizations was investigated. Three different selection approaches were used both for the BB sets and for the products. In the first two the selection algorithms were applied directly to the data sets (D-optimal design and space-filling design), while for the third a cluster analysis preceded the selection (cluster-based design). The selections were compared using visual inspection, the Tanimoto coefficient, the Euclidean distance, the condition number, and the determinant of the resulting data matrix. No difference in efficiency was found between selections made in the BB space and in the product space. However, it is of critical importance to investigate the BB space carefully and to select an appropriate number of BB's to result in an adequate diversity. An example from the pharmaceutical industry is then presented, where selection via BB's was made using a cluster-based design.
1993-08-01
subtitled "Simulation Data," consists of detailed infonrnation on the design parmneter variations tested, subsequent statistical analyses conducted...used with confidence during the design process. The data quality can be examined in various forms such as statistical analyses of measure of merit data...merit, such as time to capture or nmaximurn pitch rate, can be calculated from the simulation time history data. Statistical techniques are then used
The Impact of Student-Directed Projects in Introductory Statistics
ERIC Educational Resources Information Center
Spence, Dianna J.; Bailey, Brad; Sharp, Julia L.
2017-01-01
A multi-year study investigated the impact of incorporating student-directed discovery projects into introductory statistics courses. Pilot instructors at institutions across the United States taught statistics implementing student-directed projects with the help of a common set of instructional materials designed to facilitate such projects.…
Statistical issues in quality control of proteomic analyses: good experimental design and planning.
Cairns, David A
2011-03-01
Quality control is becoming increasingly important in proteomic investigations as experiments become more multivariate and quantitative. Quality control applies to all stages of an investigation and statistics can play a key role. In this review, the role of statistical ideas in the design and planning of an investigation is described. This involves the design of unbiased experiments using key concepts from statistical experimental design, the understanding of the biological and analytical variation in a system using variance components analysis and the determination of a required sample size to perform a statistically powerful investigation. These concepts are described through simple examples and an example data set from a 2-D DIGE pilot experiment. Each of these concepts can prove useful in producing better and more reproducible data. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Study/experimental/research design: much more than statistics.
Knight, Kenneth L
2010-01-01
The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.
Design of off-statistics axial-flow fans by means of vortex law optimization
NASA Astrophysics Data System (ADS)
Lazari, Andrea; Cattanei, Andrea
2014-12-01
Off-statistics input data sets are common in axial-flow fans design and may easily result in some violation of the requirements of a good aerodynamic blade design. In order to circumvent this problem, in the present paper, a solution to the radial equilibrium equation is found which minimizes the outlet kinetic energy and fulfills the aerodynamic constraints, thus ensuring that the resulting blade has acceptable aerodynamic performance. The presented method is based on the optimization of a three-parameters vortex law and of the meridional channel size. The aerodynamic quantities to be employed as constraints are individuated and their suitable ranges of variation are proposed. The method is validated by means of a design with critical input data values and CFD analysis. Then, by means of systematic computations with different input data sets, some correlations and charts are obtained which are analogous to classic correlations based on statistical investigations on existing machines. Such new correlations help size a fan of given characteristics as well as study the feasibility of a given design.
A new statistical method for design and analyses of component tolerance
NASA Astrophysics Data System (ADS)
Movahedi, Mohammad Mehdi; Khounsiavash, Mohsen; Otadi, Mahmood; Mosleh, Maryam
2017-03-01
Tolerancing conducted by design engineers to meet customers' needs is a prerequisite for producing high-quality products. Engineers use handbooks to conduct tolerancing. While use of statistical methods for tolerancing is not something new, engineers often use known distributions, including the normal distribution. Yet, if the statistical distribution of the given variable is unknown, a new statistical method will be employed to design tolerance. In this paper, we use generalized lambda distribution for design and analyses component tolerance. We use percentile method (PM) to estimate the distribution parameters. The findings indicated that, when the distribution of the component data is unknown, the proposed method can be used to expedite the design of component tolerance. Moreover, in the case of assembled sets, more extensive tolerance for each component with the same target performance can be utilized.
ERIC Educational Resources Information Center
Bliss, Leonard B.; Tashakkori, Abbas
This paper discusses the objectives that would be appropriate for statistics classes for students who are not majoring in statistics, evaluation, or quantitative research design. These "non-majors" should be able to choose appropriate analytical methods for specific sets of data based on the research question and the nature of the data, and they…
Engineering Students Designing a Statistical Procedure for Quantifying Variability
ERIC Educational Resources Information Center
Hjalmarson, Margret A.
2007-01-01
The study examined first-year engineering students' responses to a statistics task that asked them to generate a procedure for quantifying variability in a data set from an engineering context. Teams used technological tools to perform computations, and their final product was a ranking procedure. The students could use any statistical measures,…
Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello
2012-01-01
Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
Study/Experimental/Research Design: Much More Than Statistics
Knight, Kenneth L.
2010-01-01
Abstract Context: The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes “Methods” sections hard to read and understand. Objective: To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. Description: The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Advantages: Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results. PMID:20064054
Highly Efficient Design-of-Experiments Methods for Combining CFD Analysis and Experimental Data
NASA Technical Reports Server (NTRS)
Anderson, Bernhard H.; Haller, Harold S.
2009-01-01
It is the purpose of this study to examine the impact of "highly efficient" Design-of-Experiments (DOE) methods for combining sets of CFD generated analysis data with smaller sets of Experimental test data in order to accurately predict performance results where experimental test data were not obtained. The study examines the impact of micro-ramp flow control on the shock wave boundary layer (SWBL) interaction where a complete paired set of data exist from both CFD analysis and Experimental measurements By combining the complete set of CFD analysis data composed of fifteen (15) cases with a smaller subset of experimental test data containing four/five (4/5) cases, compound data sets (CFD/EXP) were generated which allows the prediction of the complete set of Experimental results No statistical difference were found to exist between the combined (CFD/EXP) generated data sets and the complete Experimental data set composed of fifteen (15) cases. The same optimal micro-ramp configuration was obtained using the (CFD/EXP) generated data as obtained with the complete set of Experimental data, and the DOE response surfaces generated by the two data sets were also not statistically different.
GLIMMPSE Lite: Calculating Power and Sample Size on Smartphone Devices
Munjal, Aarti; Sakhadeo, Uttara R.; Muller, Keith E.; Glueck, Deborah H.; Kreidler, Sarah M.
2014-01-01
Researchers seeking to develop complex statistical applications for mobile devices face a common set of difficult implementation issues. In this work, we discuss general solutions to the design challenges. We demonstrate the utility of the solutions for a free mobile application designed to provide power and sample size calculations for univariate, one-way analysis of variance (ANOVA), GLIMMPSE Lite. Our design decisions provide a guide for other scientists seeking to produce statistical software for mobile platforms. PMID:25541688
SSD for R: A Comprehensive Statistical Package to Analyze Single-System Data
ERIC Educational Resources Information Center
Auerbach, Charles; Schudrich, Wendy Zeitlin
2013-01-01
The need for statistical analysis in single-subject designs presents a challenge, as analytical methods that are applied to group comparison studies are often not appropriate in single-subject research. "SSD for R" is a robust set of statistical functions with wide applicability to single-subject research. It is a comprehensive package…
Rank-based permutation approaches for non-parametric factorial designs.
Umlauft, Maria; Konietschke, Frank; Pauly, Markus
2017-11-01
Inference methods for null hypotheses formulated in terms of distribution functions in general non-parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set-up Wald-type statistics and ANOVA-type statistics are the current state of the art. The first method is asymptotically exact but a rather liberal statistical testing procedure for small to moderate sample size, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal-Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability. © 2017 The British Psychological Society.
ERIC Educational Resources Information Center
O'Brien, Mark
2011-01-01
The appropriateness of using statistical data to inform the design of any given service development or initiative often depends upon judgements regarding scale. Large-scale data sets, perhaps national in scope, whilst potentially important in informing the design, implementation and roll-out of experimental initiatives, will often remain unused…
The microcomputer scientific software series 3: general linear model--analysis of variance.
Harold M. Rauscher
1985-01-01
A BASIC language set of programs, designed for use on microcomputers, is presented. This set of programs will perform the analysis of variance for any statistical model describing either balanced or unbalanced designs. The program computes and displays the degrees of freedom, Type I sum of squares, and the mean square for the overall model, the error, and each factor...
ERIC Educational Resources Information Center
Olsen, Robert J.
2008-01-01
I describe how data pooling and data visualization can be employed in the first-semester general chemistry laboratory to introduce core statistical concepts such as central tendency and dispersion of a data set. The pooled data are plotted as a 1-D scatterplot, a purpose-designed number line through which statistical features of the data are…
Introductory Life Science Mathematics and Quantitative Neuroscience Courses
ERIC Educational Resources Information Center
Duffus, Dwight; Olifer, Andrei
2010-01-01
We describe two sets of courses designed to enhance the mathematical, statistical, and computational training of life science undergraduates at Emory College. The first course is an introductory sequence in differential and integral calculus, modeling with differential equations, probability, and inferential statistics. The second is an…
Thompson, Steven K
2006-12-01
A flexible class of adaptive sampling designs is introduced for sampling in network and spatial settings. In the designs, selections are made sequentially with a mixture distribution based on an active set that changes as the sampling progresses, using network or spatial relationships as well as sample values. The new designs have certain advantages compared with previously existing adaptive and link-tracing designs, including control over sample sizes and of the proportion of effort allocated to adaptive selections. Efficient inference involves averaging over sample paths consistent with the minimal sufficient statistic. A Markov chain resampling method makes the inference computationally feasible. The designs are evaluated in network and spatial settings using two empirical populations: a hidden human population at high risk for HIV/AIDS and an unevenly distributed bird population.
Quantitative knowledge acquisition for expert systems
NASA Technical Reports Server (NTRS)
Belkin, Brenda L.; Stengel, Robert F.
1991-01-01
A common problem in the design of expert systems is the definition of rules from data obtained in system operation or simulation. While it is relatively easy to collect data and to log the comments of human operators engaged in experiments, generalizing such information to a set of rules has not previously been a direct task. A statistical method is presented for generating rule bases from numerical data, motivated by an example based on aircraft navigation with multiple sensors. The specific objective is to design an expert system that selects a satisfactory suite of measurements from a dissimilar, redundant set, given an arbitrary navigation geometry and possible sensor failures. The systematic development is described of a Navigation Sensor Management (NSM) Expert System from Kalman Filter convariance data. The method invokes two statistical techniques: Analysis of Variance (ANOVA) and the ID3 Algorithm. The ANOVA technique indicates whether variations of problem parameters give statistically different covariance results, and the ID3 algorithms identifies the relationships between the problem parameters using probabilistic knowledge extracted from a simulation example set. Both are detailed.
Library Statistical Data Base Formats and Definitions.
ERIC Educational Resources Information Center
Jones, Dennis; And Others
Represented are the detailed set of data structures relevant to the categorization of information, terminology, and definitions employed in the design of the library statistical data base. The data base, or management information system, provides administrators with a framework of information and standardized data for library management, planning,…
Computer Assisted Problem Solving in an Introductory Statistics Course. Technical Report No. 56.
ERIC Educational Resources Information Center
Anderson, Thomas H.; And Others
The computer assisted problem solving system (CAPS) described in this booklet administered "homework" problem sets designed to develop students' computational, estimation, and procedural skills. These skills were related to important concepts in an introductory statistics course. CAPS generated unique data, judged student performance,…
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
Statistics based sampling for controller and estimator design
NASA Astrophysics Data System (ADS)
Tenne, Dirk
The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses the aforementioned topics nonlinear estimation, target tracking and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controller which include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance of the plant over the domain of uncertainty consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.
NASA Astrophysics Data System (ADS)
Andersson, C. David; Hillgren, J. Mikael; Lindgren, Cecilia; Qian, Weixing; Akfur, Christine; Berg, Lotta; Ekström, Fredrik; Linusson, Anna
2015-03-01
Scientific disciplines such as medicinal- and environmental chemistry, pharmacology, and toxicology deal with the questions related to the effects small organic compounds exhort on biological targets and the compounds' physicochemical properties responsible for these effects. A common strategy in this endeavor is to establish structure-activity relationships (SARs). The aim of this work was to illustrate benefits of performing a statistical molecular design (SMD) and proper statistical analysis of the molecules' properties before SAR and quantitative structure-activity relationship (QSAR) analysis. Our SMD followed by synthesis yielded a set of inhibitors of the enzyme acetylcholinesterase (AChE) that had very few inherent dependencies between the substructures in the molecules. If such dependencies exist, they cause severe errors in SAR interpretation and predictions by QSAR-models, and leave a set of molecules less suitable for future decision-making. In our study, SAR- and QSAR models could show which molecular sub-structures and physicochemical features that were advantageous for the AChE inhibition. Finally, the QSAR model was used for the prediction of the inhibition of AChE by an external prediction set of molecules. The accuracy of these predictions was asserted by statistical significance tests and by comparisons to simple but relevant reference models.
Olives, Casey; Valadez, Joseph J.; Brooker, Simon J.; Pagano, Marcello
2012-01-01
Background Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. Methodology We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n = 15 and n = 25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Principle Findings Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n = 15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. Conclusion/Significance This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools. PMID:22970333
Using Design Capability Indices to Satisfy Ranged Sets of Design Requirements
NASA Technical Reports Server (NTRS)
Chen, Wei; Allen, Janet K.; Simpson, Timothy W.; Mistree, Farrokh
1996-01-01
For robust design it is desirable to allow the design requirements to vary within a certain range rather than setting point targets. This is particularly important during the early stages of design when little is known about the system and its requirements. Toward this end, design capability indices are developed in this paper to assess the capability of a family of designs, represented by a range of top-level design specifications, to satisfy a ranged set of design requirements. Design capability indices are based on process capability indices from statistical process control and provide a single objective, alternate approach to the use of Taguchi's signal-to- noise ratio which is often used for robust design. Successful implementation of design capability indices ensures that a family of designs conforms to a given ranged set of design requirements. To demonstrate an application and the usefulness of design capability indices, the design of a solar powered irrigation system is presented. Our focus in this paper is on the development and implementation of design capability indices as an alternate approach to the use of the signal-to-noise ratio and not on the results of the example problem, per se.
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2010 CFR
2010-10-01
... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2013 CFR
2013-10-01
... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2014 CFR
2014-10-01
... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2012 CFR
2012-10-01
... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2011 CFR
2011-10-01
... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wurtz, R.; Kaplan, A.
Pulse shape discrimination (PSD) is a variety of statistical classifier. Fully-realized statistical classifiers rely on a comprehensive set of tools for designing, building, and implementing. PSD advances rely on improvements to the implemented algorithm. PSD advances can be improved by using conventional statistical classifier or machine learning methods. This paper provides the reader with a glossary of classifier-building elements and their functions in a fully-designed and operational classifier framework that can be used to discover opportunities for improving PSD classifier projects. This paper recommends reporting the PSD classifier’s receiver operating characteristic (ROC) curve and its behavior at a gamma rejectionmore » rate (GRR) relevant for realistic applications.« less
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
Wolstenholme, Daniel; Downes, Tom; Leaver, Jackie; Partridge, Rebecca; Langley, Joseph
2014-01-01
Advances in surgical and medical management have significantly reduced the length of time that patients with spinal cord injury (SCI) have to stay in hospital, but has left patients with potentially less time to psychologically adjust. Following a pilot in 2012, this project was designed to test the effect of "design thinking" workshops on the self-efficacy of people undergoing rehabilitation following spinal injuries. Design thinking is about understanding the approaches and methods that designers use and then applying these to think creatively about problems and suggest ways to solve them. In this instance, design thinking is not about designing new products (although the approaches can be used to do this) but about developing a long term creative and explorative mind-set through skills such as lateral thinking, prototyping and verbal and visual communication. The principles of "design thinking" have underpinned design education and practice for many years, it is also recognised in business and innovation for example, but a literature review indicated that there was no evidence of it being used in rehabilitation or spinal injury settings. Twenty participants took part in the study; 13 (65%) were male and the average age was 37 years (range 16 to 72). Statistically significant improvements were seen for EQ-5D score (t = -3.13, p = 0.007) and Patient Activation Measure score (t = -3.85, p = 0.001). Other outcome measures improved but not statistically. There were no statistical effects on length of stay or readmission rates, but qualitative interviews indicated improved patient experience.
Learning Compositional Simulation Models
2010-01-01
techniques developed by social scientists, economists, and medical researchers over the past four decades. Quasi-experimental designs (QEDs) are...statistical techniques from the social sciences known as quasi- experimental design (QED). QEDs allow a researcher to exploit unique characteristics...can be grouped under the rubric “quasi-experimental design ” (QED), and they attempt to exploit inherent characteristics of observational data sets
An, Ming-Wen; Lu, Xin; Sargent, Daniel J; Mandrekar, Sumithra J
2015-01-01
A phase II design with an option for direct assignment (stop randomization and assign all patients to experimental treatment based on interim analysis, IA) for a predefined subgroup was previously proposed. Here, we illustrate the modularity of the direct assignment option by applying it to the setting of two predefined subgroups and testing for separate subgroup main effects. We power the 2-subgroup direct assignment option design with 1 IA (DAD-1) to test for separate subgroup main effects, with assessment of power to detect an interaction in a post-hoc test. Simulations assessed the statistical properties of this design compared to the 2-subgroup balanced randomized design with 1 IA, BRD-1. Different response rates for treatment/control in subgroup 1 (0.4/0.2) and in subgroup 2 (0.1/0.2, 0.4/0.2) were considered. The 2-subgroup DAD-1 preserves power and type I error rate compared to the 2-subgroup BRD-1, while exhibiting reasonable power in a post-hoc test for interaction. The direct assignment option is a flexible design component that can be incorporated into broader design frameworks, while maintaining desirable statistical properties, clinical appeal, and logistical simplicity.
Design of partially supervised classifiers for multispectral image data
NASA Technical Reports Server (NTRS)
Jeon, Byeungwoo; Landgrebe, David
1993-01-01
A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori only for just one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible requirements on representative prior statistical class characteristics of all classes in a given data set. Considering the effort in both time and man-power required to have a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this 'partially' supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement of prior statistical information without sacrificing significant classifying capability. The first one is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, the partially supervised classification is considered as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.
2016 American Indian/Alaska Native/Native Hawaiian Areas (AIANNH) Michigan, Minnesota, and Wisconsin
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The American Indian/Alaska Native/Native Hawaiian (AIANNH) Areas Shapefile includes the following legal entities: federally recognized American Indian reservations and off-reservation trust land areas, state-recognized American Indian reservations, and Hawaiian home lands (HHLs). The statistical entities included are Alaska Native village statistical areas (ANVSAs), Oklahoma tribal statistical areas (OTSAs), tribal designated statistical areas (TDSAs), and state designated tribal statistical areas (SDTSAs). Joint use areas are also included in this shapefile refer to areas that are administered jointly and/or claimed by two or more American Indian tribes. The Census Bureau designates both legal and statistical joint use areas as unique geographic entities for the purpose of presenting statistical data. The Bureau of Indian Affairs (BIA) within the U.S. Department of the Interior (DOI) provides the list of federally recognized tribes and only provides legal boundary infor
Masoumi, Hamid Reza Fard; Basri, Mahiran; Kassim, Anuar; Abdullah, Dzulkefly Kuang; Abdollahi, Yadollah; Abd Gani, Siti Salwa; Rezaee, Malahat
2013-01-01
Lipase-catalyzed production of triethanolamine-based esterquat by esterification of oleic acid (OA) with triethanolamine (TEA) in n-hexane was performed in 2 L stirred-tank reactor. A set of experiments was designed by central composite design to process modeling and statistically evaluate the findings. Five independent process variables, including enzyme amount, reaction time, reaction temperature, substrates molar ratio of OA to TEA, and agitation speed, were studied under the given conditions designed by Design Expert software. Experimental data were examined for normality test before data processing stage and skewness and kurtosis indices were determined. The mathematical model developed was found to be adequate and statistically accurate to predict the optimum conversion of product. Response surface methodology with central composite design gave the best performance in this study, and the methodology as a whole has been proven to be adequate for the design and optimization of the enzymatic process.
NASA Technical Reports Server (NTRS)
Howell, L. W.
2001-01-01
A simple power law model consisting of a single spectral index alpha-1 is believed to be an adequate description of the galactic cosmic-ray (GCR) proton flux at energies below 10(exp 13) eV. Two procedures for estimating alpha-1 the method of moments and maximum likelihood (ML), are developed and their statistical performance compared. It is concluded that the ML procedure attains the most desirable statistical properties and is hence the recommended statistical estimation procedure for estimating alpha-1. The ML procedure is then generalized for application to a set of real cosmic-ray data and thereby makes this approach applicable to existing cosmic-ray data sets. Several other important results, such as the relationship between collecting power and detector energy resolution, as well as inclusion of a non-Gaussian detector response function, are presented. These results have many practical benefits in the design phase of a cosmic-ray detector as they permit instrument developers to make important trade studies in design parameters as a function of one of the science objectives. This is particularly important for space-based detectors where physical parameters, such as dimension and weight, impose rigorous practical limits to the design envelope.
Optimal Testlet Pool Assembly for Multistage Testing Designs
ERIC Educational Resources Information Center
Ariel, Adelaide; Veldkamp, Bernard P.; Breithaupt, Krista
2006-01-01
Computerized multistage testing (MST) designs require sets of test questions (testlets) to be assembled to meet strict, often competing criteria. Rules that govern testlet assembly may dictate the number of questions on a particular subject or may describe desirable statistical properties for the test, such as measurement precision. In an MST…
ERIC Educational Resources Information Center
Iyioke, Ifeoma Chika
2013-01-01
This dissertation describes a design for training, in accordance with probability judgment heuristics principles, for the Angoff standard setting method. The new training with instruction, practice, and feedback tailored to the probability judgment heuristics principles was called the Heuristic training and the prevailing Angoff method training…
Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.
Rohrer, Sebastian G; Baumann, Knut
2009-02-01
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
Putting engineering back into protein engineering: bioinformatic approaches to catalyst design.
Gustafsson, Claes; Govindarajan, Sridhar; Minshull, Jeremy
2003-08-01
Complex multivariate engineering problems are commonplace and not unique to protein engineering. Mathematical and data-mining tools developed in other fields of engineering have now been applied to analyze sequence-activity relationships of peptides and proteins and to assist in the design of proteins and peptides with specified properties. Decreasing costs of DNA sequencing in conjunction with methods to quickly synthesize statistically representative sets of proteins allow modern heuristic statistics to be applied to protein engineering. This provides an alternative approach to expensive assays or unreliable high-throughput surrogate screens.
2010 Anthropometric Survey of U.S. Marine Corps Personnel: Methods and Summary Statistics
2013-06-01
models for the ergonomic design of working environments. Today, the entire production chain for a piece of clothing, beginning with the design and...Corps 382 crewstations and workstations. Digital models are increasingly used in the design process for seated and standing workstations, as well...International Standards for Ergonomic Design : These dimensions are useful for comparing data sets between nations, and are measured according to
Pareto fronts for multiobjective optimization design on materials data
NASA Astrophysics Data System (ADS)
Gopakumar, Abhijith; Balachandran, Prasanna; Gubernatis, James E.; Lookman, Turab
Optimizing multiple properties simultaneously is vital in materials design. Here we apply infor- mation driven, statistical optimization strategies blended with machine learning methods, to address multi-objective optimization tasks on materials data. These strategies aim to find the Pareto front consisting of non-dominated data points from a set of candidate compounds with known character- istics. The objective is to find the pareto front in as few additional measurements or calculations as possible. We show how exploration of the data space to find the front is achieved by using uncer- tainties in predictions from regression models. We test our proposed design strategies on multiple, independent data sets including those from computations as well as experiments. These include data sets for Max phases, piezoelectrics and multicomponent alloys.
ZERODUR: deterministic approach for strength design
NASA Astrophysics Data System (ADS)
Hartmann, Peter
2012-12-01
There is an increasing request for zero expansion glass ceramic ZERODUR substrates being capable of enduring higher operational static loads or accelerations. The integrity of structures such as optical or mechanical elements for satellites surviving rocket launches, filigree lightweight mirrors, wobbling mirrors, and reticle and wafer stages in microlithography must be guaranteed with low failure probability. Their design requires statistically relevant strength data. The traditional approach using the statistical two-parameter Weibull distribution suffered from two problems. The data sets were too small to obtain distribution parameters with sufficient accuracy and also too small to decide on the validity of the model. This holds especially for the low failure probability levels that are required for reliable applications. Extrapolation to 0.1% failure probability and below led to design strengths so low that higher load applications seemed to be not feasible. New data have been collected with numbers per set large enough to enable tests on the applicability of the three-parameter Weibull distribution. This distribution revealed to provide much better fitting of the data. Moreover it delivers a lower threshold value, which means a minimum value for breakage stress, allowing of removing statistical uncertainty by introducing a deterministic method to calculate design strength. Considerations taken from the theory of fracture mechanics as have been proven to be reliable with proof test qualifications of delicate structures made from brittle materials enable including fatigue due to stress corrosion in a straight forward way. With the formulae derived, either lifetime can be calculated from given stress or allowable stress from minimum required lifetime. The data, distributions, and design strength calculations for several practically relevant surface conditions of ZERODUR are given. The values obtained are significantly higher than those resulting from the two-parameter Weibull distribution approach and no longer subject to statistical uncertainty.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, P.; Beaudet, P.
1980-01-01
The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on apriori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classfication is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.
Cluster detection methods applied to the Upper Cape Cod cancer data.
Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann
2005-09-15
A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. The three different latency assumptions produced three different spatial patterns of cases and controls. For 20 year latency, all three methods generally concur. However, for 15 year latency and no latency assumptions, the methods produce different results when testing for global clustering. The comparative analyses of real data sets by different statistical methods provides insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.
Vanniyasingam, Thuva; Daly, Caitlin; Jin, Xuejing; Zhang, Yuan; Foster, Gary; Cunningham, Charles; Thabane, Lehana
2018-06-01
This study reviews simulation studies of discrete choice experiments to determine (i) how survey design features affect statistical efficiency, (ii) and to appraise their reporting quality. Statistical efficiency was measured using relative design (D-) efficiency, D-optimality, or D-error. For this systematic survey, we searched Journal Storage (JSTOR), Since Direct, PubMed, and OVID which included a search within EMBASE. Searches were conducted up to year 2016 for simulation studies investigating the impact of DCE design features on statistical efficiency. Studies were screened and data were extracted independently and in duplicate. Results for each included study were summarized by design characteristic. Previously developed criteria for reporting quality of simulation studies were also adapted and applied to each included study. Of 371 potentially relevant studies, 9 were found to be eligible, with several varying in study objectives. Statistical efficiency improved when increasing the number of choice tasks or alternatives; decreasing the number of attributes, attribute levels; using an unrestricted continuous "manipulator" attribute; using model-based approaches with covariates incorporating response behaviour; using sampling approaches that incorporate previous knowledge of response behaviour; incorporating heterogeneity in a model-based design; correctly specifying Bayesian priors; minimizing parameter prior variances; and using an appropriate method to create the DCE design for the research question. The simulation studies performed well in terms of reporting quality. Improvement is needed in regards to clearly specifying study objectives, number of failures, random number generators, starting seeds, and the software used. These results identify the best approaches to structure a DCE. An investigator can manipulate design characteristics to help reduce response burden and increase statistical efficiency. Since studies varied in their objectives, conclusions were made on several design characteristics, however, the validity of each conclusion was limited. Further research should be conducted to explore all conclusions in various design settings and scenarios. Additional reviews to explore other statistical efficiency outcomes and databases can also be performed to enhance the conclusions identified from this review.
Bryant, Fred B
2016-12-01
This paper introduces a special section of the current issue of the Journal of Evaluation in Clinical Practice that includes a set of 6 empirical articles showcasing a versatile, new machine-learning statistical method, known as optimal data (or discriminant) analysis (ODA), specifically designed to produce statistical models that maximize predictive accuracy. As this set of papers clearly illustrates, ODA offers numerous important advantages over traditional statistical methods-advantages that enhance the validity and reproducibility of statistical conclusions in empirical research. This issue of the journal also includes a review of a recently published book that provides a comprehensive introduction to the logic, theory, and application of ODA in empirical research. It is argued that researchers have much to gain by using ODA to analyze their data. © 2016 John Wiley & Sons, Ltd.
Statistical Methods for Rapid Aerothermal Analysis and Design Technology: Validation
NASA Technical Reports Server (NTRS)
DePriest, Douglas; Morgan, Carolyn
2003-01-01
The cost and safety goals for NASA s next generation of reusable launch vehicle (RLV) will require that rapid high-fidelity aerothermodynamic design tools be used early in the design cycle. To meet these requirements, it is desirable to identify adequate statistical models that quantify and improve the accuracy, extend the applicability, and enable combined analyses using existing prediction tools. The initial research work focused on establishing suitable candidate models for these purposes. The second phase is focused on assessing the performance of these models to accurately predict the heat rate for a given candidate data set. This validation work compared models and methods that may be useful in predicting the heat rate.
Ries, Kernell G.; Newson, Jeremy K.; Smith, Martyn J.; Guthrie, John D.; Steeves, Peter A.; Haluska, Tana L.; Kolb, Katharine R.; Thompson, Ryan F.; Santoro, Richard D.; Vraga, Hans W.
2017-10-30
IntroductionStreamStats version 4, available at https://streamstats.usgs.gov, is a map-based web application that provides an assortment of analytical tools that are useful for water-resources planning and management, and engineering purposes. Developed by the U.S. Geological Survey (USGS), the primary purpose of StreamStats is to provide estimates of streamflow statistics for user-selected ungaged sites on streams and for USGS streamgages, which are locations where streamflow data are collected.Streamflow statistics, such as the 1-percent flood, the mean flow, and the 7-day 10-year low flow, are used by engineers, land managers, biologists, and many others to help guide decisions in their everyday work. For example, estimates of the 1-percent flood (which is exceeded, on average, once in 100 years and has a 1-percent chance of exceedance in any year) are used to create flood-plain maps that form the basis for setting insurance rates and land-use zoning. This and other streamflow statistics also are used for dam, bridge, and culvert design; water-supply planning and management; permitting of water withdrawals and wastewater and industrial discharges; hydropower facility design and regulation; and setting of minimum allowed streamflows to protect freshwater ecosystems. Streamflow statistics can be computed from available data at USGS streamgages depending on the type of data collected at the stations. Most often, however, streamflow statistics are needed at ungaged sites, where no streamflow data are available to determine the statistics.
NASA Astrophysics Data System (ADS)
Williams, Arnold C.; Pachowicz, Peter W.
2004-09-01
Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms are of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently over trained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision Laboratory and Electron Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results on mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.
No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.
Li, Xuelong; Guo, Qun; Lu, Xiaoqiang
2016-05-13
It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types which are not often aware in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features are firstly extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are used to predict the perceived video quality via the efficient linear support vector regression (SVR) model afterwards. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in 3DDCT domain which has the inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
Measuring Multidimensional Latent Growth. Research Report. ETS RR-10-24
ERIC Educational Resources Information Center
Rijmen, Frank
2010-01-01
As is the case for any statistical model, a multidimensional latent growth model comes with certain requirements with respect to the data collection design. In order to measure growth, repeated measurements of the same set of individuals are required. Furthermore, the data collection design should be specified such that no individual is given the…
NASA Astrophysics Data System (ADS)
Sun, Hongyue; Luo, Shuai; Jin, Ran; He, Zhen
2017-07-01
Mathematical modeling is an important tool to investigate the performance of microbial fuel cell (MFC) towards its optimized design. To overcome the shortcoming of traditional MFC models, an ensemble model is developed through integrating both engineering model and statistical analytics for the extrapolation scenarios in this study. Such an ensemble model can reduce laboring effort in parameter calibration and require fewer measurement data to achieve comparable accuracy to traditional statistical model under both the normal and extreme operation regions. Based on different weight between current generation and organic removal efficiency, the ensemble model can give recommended input factor settings to achieve the best current generation and organic removal efficiency. The model predicts a set of optimal design factors for the present tubular MFCs including the anode flow rate of 3.47 mL min-1, organic concentration of 0.71 g L-1, and catholyte pumping flow rate of 14.74 mL min-1 to achieve the peak current at 39.2 mA. To maintain 100% organic removal efficiency, the anode flow rate and organic concentration should be controlled lower than 1.04 mL min-1 and 0.22 g L-1, respectively. The developed ensemble model can be potentially modified to model other types of MFCs or bioelectrochemical systems.
Statistics of Shared Components in Complex Component Systems
NASA Astrophysics Data System (ADS)
Mazzolini, Andrea; Gherardi, Marco; Caselle, Michele; Cosentino Lagomarsino, Marco; Osella, Matteo
2018-04-01
Many complex systems are modular. Such systems can be represented as "component systems," i.e., sets of elementary components, such as LEGO bricks in LEGO sets. The bricks found in a LEGO set reflect a target architecture, which can be built following a set-specific list of instructions. In other component systems, instead, the underlying functional design and constraints are not obvious a priori, and their detection is often a challenge of both scientific and practical importance, requiring a clear understanding of component statistics. Importantly, some quantitative invariants appear to be common to many component systems, most notably a common broad distribution of component abundances, which often resembles the well-known Zipf's law. Such "laws" affect in a general and nontrivial way the component statistics, potentially hindering the identification of system-specific functional constraints or generative processes. Here, we specifically focus on the statistics of shared components, i.e., the distribution of the number of components shared by different system realizations, such as the common bricks found in different LEGO sets. To account for the effects of component heterogeneity, we consider a simple null model, which builds system realizations by random draws from a universe of possible components. Under general assumptions on abundance heterogeneity, we provide analytical estimates of component occurrence, which quantify exhaustively the statistics of shared components. Surprisingly, this simple null model can positively explain important features of empirical component-occurrence distributions obtained from large-scale data on bacterial genomes, LEGO sets, and book chapters. Specific architectural features and functional constraints can be detected from occurrence patterns as deviations from these null predictions, as we show for the illustrative case of the "core" genome in bacteria.
Robust Lee local statistic filter for removal of mixed multiplicative and impulse noise
NASA Astrophysics Data System (ADS)
Ponomarenko, Nikolay N.; Lukin, Vladimir V.; Egiazarian, Karen O.; Astola, Jaakko T.
2004-05-01
A robust version of Lee local statistic filter able to effectively suppress the mixed multiplicative and impulse noise in images is proposed. The performance of the proposed modification is studied for a set of test images, several values of multiplicative noise variance, Gaussian and Rayleigh probability density functions of speckle, and different characteris-tics of impulse noise. The advantages of the designed filter in comparison to the conventional Lee local statistic filter and some other filters able to cope with mixed multiplicative+impulse noise are demonstrated.
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images.
Hu, Qin; Victor, Jonathan D
2016-09-01
Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study - largely because their number and variety is large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank.
Carle, C; Alexander, P; Columb, M; Johal, J
2013-04-01
We designed and internally validated an aggregate weighted early warning scoring system specific to the obstetric population that has the potential for use in the ward environment. Direct obstetric admissions from the Intensive Care National Audit and Research Centre's Case Mix Programme Database were randomly allocated to model development (n = 2240) or validation (n = 2200) sets. Physiological variables collected during the first 24 h of critical care admission were analysed. Logistic regression analysis for mortality in the model development set was initially used to create a statistically based early warning score. The statistical score was then modified to create a clinically acceptable early warning score. Important features of this clinical obstetric early warning score are that the variables are weighted according to their statistical importance, a surrogate for the FI O2 /Pa O2 relationship is included, conscious level is assessed using a simplified alert/not alert variable, and the score, trigger thresholds and response are consistent with the new non-obstetric National Early Warning Score system. The statistical and clinical early warning scores were internally validated using the validation set. The area under the receiver operating characteristic curve was 0.995 (95% CI 0.992-0.998) for the statistical score and 0.957 (95% CI 0.923-0.991) for the clinical score. Pre-existing empirically designed early warning scores were also validated in the same way for comparison. The area under the receiver operating characteristic curve was 0.955 (95% CI 0.922-0.988) for Swanton et al.'s Modified Early Obstetric Warning System, 0.937 (95% CI 0.884-0.991) for the obstetric early warning score suggested in the 2003-2005 Report on Confidential Enquiries into Maternal Deaths in the UK, and 0.973 (95% CI 0.957-0.989) for the non-obstetric National Early Warning Score. This highlights that the new clinical obstetric early warning score has an excellent ability to discriminate survivors from non-survivors in this critical care data set. Further work is needed to validate our new clinical early warning score externally in the obstetric ward environment. Anaesthesia © 2013 The Association of Anaesthetists of Great Britain and Ireland.
Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.
2017-01-01
Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237
A novel approach to simulate gene-environment interactions in complex diseases.
Amato, Roberto; Pinelli, Michele; D'Andrea, Daniel; Miele, Gennaro; Nicodemi, Mario; Raiconi, Giancarlo; Cocozza, Sergio
2010-01-05
Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with largest prevalence and mortality (cancer, heart disease, obesity, etc.). Despite a large amount of information that has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in epidemiological literature. One reason can be the incomplete knowledge of the power of statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this aim, a possible strategy is to challenge the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones. We present a mathematical approach that models gene-environment interactions. By this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors and also allowing non-linear interactions as epistasis. In particular, we implemented a simple version of this model in a Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics by using standard epidemiological measures and to implement constraints to make the simulator behaviour biologically meaningful. By the multi-logistic model implemented in GENS it is possible to simulate case-control samples of complex disease where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population and a Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of the statistical power when designing a study.
Du, Q S; Ma, Y; Xie, N Z; Huang, R B
2014-01-01
In the design of peptide inhibitors the huge possible variety of the peptide sequences is of high concern. In collaboration with the fast accumulation of the peptide experimental data and database, a statistical method is suggested for peptide inhibitor design. In the two-level peptide prediction network (2L-QSAR) one level is the physicochemical properties of amino acids and the other level is the peptide sequence position. The activity contributions of amino acids are the functions of physicochemical properties and the sequence positions. In the prediction equation two weight coefficient sets {ak} and {bl} are assigned to the physicochemical properties and to the sequence positions, respectively. After the two coefficient sets are optimized based on the experimental data of known peptide inhibitors using the iterative double least square (IDLS) procedure, the coefficients are used to evaluate the bioactivities of new designed peptide inhibitors. The two-level prediction network can be applied to the peptide inhibitor design that may aim for different target proteins, or different positions of a protein. A notable advantage of the two-level statistical algorithm is that there is no need for host protein structural information. It may also provide useful insight into the amino acid properties and the roles of sequence positions.
Sirota, Miroslav; Kostovičová, Lenka; Juanchich, Marie
2014-08-01
Knowing which properties of visual displays facilitate statistical reasoning bears practical and theoretical implications. Therefore, we studied the effect of one property of visual diplays - iconicity (i.e., the resemblance of a visual sign to its referent) - on Bayesian reasoning. Two main accounts of statistical reasoning predict different effect of iconicity on Bayesian reasoning. The ecological-rationality account predicts a positive iconicity effect, because more highly iconic signs resemble more individuated objects, which tap better into an evolutionary-designed frequency-coding mechanism that, in turn, facilitates Bayesian reasoning. The nested-sets account predicts a null iconicity effect, because iconicity does not affect the salience of a nested-sets structure-the factor facilitating Bayesian reasoning processed by a general reasoning mechanism. In two well-powered experiments (N = 577), we found no support for a positive iconicity effect across different iconicity levels that were manipulated in different visual displays (meta-analytical overall effect: log OR = -0.13, 95% CI [-0.53, 0.28]). A Bayes factor analysis provided strong evidence in favor of the null hypothesis-the null iconicity effect. Thus, these findings corroborate the nested-sets rather than the ecological-rationality account of statistical reasoning.
3D QSAR based design of novel oxindole derivative as 5HT7 inhibitors.
Chitta, Aparna; Sivan, Sree Kanth; Manga, Vijjulatha
2014-06-01
To understand the structural requirements of 5-hydroxytryptamine (5HT7) receptor inhibitors and to design new ligands against 5HT7 receptor with enhanced inhibitory potency, a three-dimensional quantitative structure-activity relationship study with comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) for a data set of 56 molecules consisting of oxindole, tetrahydronaphthalene, aryl ketone substituted arylpiperazinealkylamide derivatives was performed. Derived model showed good statistical reliability in terms of predicting 5HT7 inhibitory activity of the molecules, based on molecular property fields like steric, electrostatic, hydrophobic, hydrogen bond donor and hydrogen bond acceptor fields. This is evident from statistical parameters like conventional r2 and a cross validated (q2) values of 0.985, 0.743 for CoMFA and 0.970, 0.608 for CoMSIA, respectively. Predictive ability of the models to determine 5HT7 antagonistic activity is validated using a test set of 16 molecules that were not included in the training set. Predictive r2 obtained for the test set was 0.560 and 0.619 for CoMFA and CoMSIA, respectively. Steric, electrostatic fields majorly contributed toward activity which forms the basis for design of new molecules. Absorption, distribution, metabolism and elimination (ADME) calculation using QikProp 2.5 (Schrodinger 2010, Portland, OR) reveals that the molecules confer to Lipinski's rule of five in majority of the cases.
Ho, Lindsey A; Lange, Ethan M
2010-12-01
Genome-wide association (GWA) studies are a powerful approach for identifying novel genetic risk factors associated with human disease. A GWA study typically requires the inclusion of thousands of samples to have sufficient statistical power to detect single nucleotide polymorphisms that are associated with only modest increases in risk of disease given the heavy burden of a multiple test correction that is necessary to maintain valid statistical tests. Low statistical power and the high financial cost of performing a GWA study remains prohibitive for many scientific investigators anxious to perform such a study using their own samples. A number of remedies have been suggested to increase statistical power and decrease cost, including the utilization of free publicly available genotype data and multi-stage genotyping designs. Herein, we compare the statistical power and relative costs of alternative association study designs that use cases and screened controls to study designs that are based only on, or additionally include, free public control genotype data. We describe a novel replication-based two-stage study design, which uses free public control genotype data in the first stage and follow-up genotype data on case-matched controls in the second stage that preserves many of the advantages inherent when using only an epidemiologically matched set of controls. Specifically, we show that our proposed two-stage design can substantially increase statistical power and decrease cost of performing a GWA study while controlling the type-I error rate that can be inflated when using public controls due to differences in ancestry and batch genotype effects.
Recommendations for research design of telehealth studies.
Chumbler, Neale R; Kobb, Rita; Brennan, David M; Rabinowitz, Terry
2008-11-01
Properly designed randomized controlled trials (RCTs) are the gold standard to use when examining the effectiveness of telehealth interventions on clinical outcomes. Some published telehealth studies have employed well-designed RCTs. However, such methods are not always feasible and practical in particular settings. This white paper addresses not only the need for properly designed RCTs, but also offers alternative research designs, such as quasi-experimental designs, and statistical techniques that can be employed to rigorously assess the effectiveness of telehealth studies. This paper further offers design and measurement recommendations aimed at and relevant to administrative decision-makers, policymakers, and practicing clinicians.
Analytical procedure validation and the quality by design paradigm.
Rozet, Eric; Lebrun, Pierre; Michiels, Jean-François; Sondag, Perceval; Scherder, Tara; Boulanger, Bruno
2015-01-01
Since the adoption of the ICH Q8 document concerning the development of pharmaceutical processes following a quality by design (QbD) approach, there have been many discussions on the opportunity for analytical procedure developments to follow a similar approach. While development and optimization of analytical procedure following QbD principles have been largely discussed and described, the place of analytical procedure validation in this framework has not been clarified. This article aims at showing that analytical procedure validation is fully integrated into the QbD paradigm and is an essential step in developing analytical procedures that are effectively fit for purpose. Adequate statistical methodologies have also their role to play: such as design of experiments, statistical modeling, and probabilistic statements. The outcome of analytical procedure validation is also an analytical procedure design space, and from it, control strategy can be set.
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-01-01
Abstract To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing. PMID:24567836
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-08-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing.
Austin, Peter C; Anderson, Geoffrey M; Cigsar, Candemir; Gruneir, Andrea
2012-01-01
Purpose Observational studies using electronic administrative healthcare databases are often used to estimate the effects of treatments and exposures. Traditionally, a cohort design has been used to estimate these effects, but increasingly, studies are using a nested case–control (NCC) design. The relative statistical efficiency of these two designs has not been examined in detail. Methods We used Monte Carlo simulations to compare these two designs in terms of the bias and precision of effect estimates. We examined three different settings: (A) treatment occurred at baseline, and there was a single outcome of interest; (B) treatment was time varying, and there was a single outcome; and C treatment occurred at baseline, and there was a secondary event that competed with the primary event of interest. Comparisons were made of percentage bias, length of 95% confidence interval, and mean squared error (MSE) as a combined measure of bias and precision. Results In Setting A, bias was similar between designs, but the cohort design was more precise and had a lower MSE in all scenarios. In Settings B and C, the cohort design was more precise and had a lower MSE in all scenarios. In both Settings B and C, the NCC design tended to result in estimates with greater bias compared with the cohort design. Conclusions We conclude that in a range of settings and scenarios, the cohort design is superior in terms of precision and MSE. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22653805
ERIC Educational Resources Information Center
Li, Yuan H.; Yang, Yu N.; Tompkins, Leroy J.; Modarresi, Shahpar
2005-01-01
The statistical technique, "Zero-One Linear Programming," that has successfully been used to create multiple tests with similar characteristics (e.g., item difficulties, test information and test specifications) in the area of educational measurement, was deemed to be a suitable method for creating multiple sets of matched samples to be…
A shrinking hypersphere PSO for engineering optimisation problems
NASA Astrophysics Data System (ADS)
Yadav, Anupam; Deep, Kusum
2016-03-01
Many real-world and engineering design problems can be formulated as constrained optimisation problems (COPs). Swarm intelligence techniques are a good approach to solve COPs. In this paper an efficient shrinking hypersphere-based particle swarm optimisation (SHPSO) algorithm is proposed for constrained optimisation. The proposed SHPSO is designed in such a way that the movement of the particle is set to move under the influence of shrinking hyperspheres. A parameter-free approach is used to handle the constraints. The performance of the SHPSO is compared against the state-of-the-art algorithms for a set of 24 benchmark problems. An exhaustive comparison of the results is provided statistically as well as graphically. Moreover three engineering design problems namely welded beam design, compressed string design and pressure vessel design problems are solved using SHPSO and the results are compared with the state-of-the-art algorithms.
2016-10-31
statistical physics. Sec. IV includes several examples of the application of the stochastic method, including matching of a shape to a fixed design, and...an important part of any future application of this method. Second, re-initialization of the level set can lead to small but significant movements of...of engineering design problems [6, 17]. However, many of the relevant applications involve non-convex optimisation problems with multiple locally
Statistical Engineering in Air Traffic Management Research
NASA Technical Reports Server (NTRS)
Wilson, Sara R.
2015-01-01
NASA is working to develop an integrated set of advanced technologies to enable efficient arrival operations in high-density terminal airspace for the Next Generation Air Transportation System. This integrated arrival solution is being validated and verified in laboratories and transitioned to a field prototype for an operational demonstration at a major U.S. airport. Within NASA, this is a collaborative effort between Ames and Langley Research Centers involving a multi-year iterative experimentation process. Designing and analyzing a series of sequential batch computer simulations and human-in-the-loop experiments across multiple facilities and simulation environments involves a number of statistical challenges. Experiments conducted in separate laboratories typically have different limitations and constraints, and can take different approaches with respect to the fundamental principles of statistical design of experiments. This often makes it difficult to compare results from multiple experiments and incorporate findings into the next experiment in the series. A statistical engineering approach is being employed within this project to support risk-informed decision making and maximize the knowledge gained within the available resources. This presentation describes a statistical engineering case study from NASA, highlights statistical challenges, and discusses areas where existing statistical methodology is adapted and extended.
Statistical Approaches to Adjusting Weights for Dependent Arms in Network Meta-analysis.
Su, Yu-Xuan; Tu, Yu-Kang
2018-05-22
Network meta-analysis compares multiple treatments in terms of their efficacy and harm by including evidence from randomized controlled trials. Most clinical trials use parallel design, where patients are randomly allocated to different treatments and receive only one treatment. However, some trials use within person designs such as split-body, split-mouth and cross-over designs, where each patient may receive more than one treatment. Data from treatment arms within these trials are no longer independent, so the correlations between dependent arms need to be accounted for within the statistical analyses. Ignoring these correlations may result in incorrect conclusions. The main objective of this study is to develop statistical approaches to adjusting weights for dependent arms within special design trials. In this study, we demonstrate the following three approaches: the data augmentation approach, the adjusting variance approach, and the reducing weight approach. These three methods could be perfectly applied in current statistic tools such as R and STATA. An example of periodontal regeneration was used to demonstrate how these approaches could be undertaken and implemented within statistical software packages, and to compare results from different approaches. The adjusting variance approach can be implemented within the network package in STATA, while reducing weight approach requires computer software programming to set up the within-study variance-covariance matrix. This article is protected by copyright. All rights reserved.
Skates, Steven J.; Gillette, Michael A.; LaBaer, Joshua; Carr, Steven A.; Anderson, N. Leigh; Liebler, Daniel C.; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L.; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Boja, Emily S.
2014-01-01
Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC), with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance, and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step towards building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research. PMID:24063748
Skates, Steven J; Gillette, Michael A; LaBaer, Joshua; Carr, Steven A; Anderson, Leigh; Liebler, Daniel C; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R; Rodriguez, Henry; Boja, Emily S
2013-12-06
Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor, and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC) with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step toward building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research.
Introductory life science mathematics and quantitative neuroscience courses.
Duffus, Dwight; Olifer, Andrei
2010-01-01
We describe two sets of courses designed to enhance the mathematical, statistical, and computational training of life science undergraduates at Emory College. The first course is an introductory sequence in differential and integral calculus, modeling with differential equations, probability, and inferential statistics. The second is an upper-division course in computational neuroscience. We provide a description of each course, detailed syllabi, examples of content, and a brief discussion of the main issues encountered in developing and offering the courses.
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
Compromise decision support problems for hierarchical design involving uncertainty
NASA Astrophysics Data System (ADS)
Vadde, S.; Allen, J. K.; Mistree, F.
1994-08-01
In this paper an extension to the traditional compromise Decision Support Problem (DSP) formulation is presented. Bayesian statistics is used in the formulation to model uncertainties associated with the information being used. In an earlier paper a compromise DSP that accounts for uncertainty using fuzzy set theory was introduced. The Bayesian Decision Support Problem is described in this paper. The method for hierarchical design is demonstrated by using this formulation to design a portal frame. The results are discussed and comparisons are made with those obtained using the fuzzy DSP. Finally, the efficacy of incorporating Bayesian statistics into the traditional compromise DSP formulation is discussed and some pending research issues are described. Our emphasis in this paper is on the method rather than the results per se.
PGT: A Statistical Approach to Prediction and Mechanism Design
NASA Astrophysics Data System (ADS)
Wolpert, David H.; Bono, James W.
One of the biggest challenges facing behavioral economics is the lack of a single theoretical framework that is capable of directly utilizing all types of behavioral data. One of the biggest challenges of game theory is the lack of a framework for making predictions and designing markets in a manner that is consistent with the axioms of decision theory. An approach in which solution concepts are distribution-valued rather than set-valued (i.e. equilibrium theory) has both capabilities. We call this approach Predictive Game Theory (or PGT). This paper outlines a general Bayesian approach to PGT. It also presents one simple example to illustrate the way in which this approach differs from equilibrium approaches in both prediction and mechanism design settings.
Hong, Bonnie; Du, Yingzhou; Mukerji, Pushkor; Roper, Jason M; Appenzeller, Laura M
2017-07-12
Regulatory-compliant rodent subchronic feeding studies are compulsory regardless of a hypothesis to test, according to recent EU legislation for the safety assessment of whole food/feed produced from genetically modified (GM) crops containing a single genetic transformation event (European Union Commission Implementing Regulation No. 503/2013). The Implementing Regulation refers to guidelines set forth by the European Food Safety Authority (EFSA) for the design, conduct, and analysis of rodent subchronic feeding studies. The set of EFSA recommendations was rigorously applied to a 90-day feeding study in Sprague-Dawley rats. After study completion, the appropriateness and applicability of these recommendations were assessed using a battery of statistical analysis approaches including both retrospective and prospective statistical power analyses as well as variance-covariance decomposition. In the interest of animal welfare considerations, alternative experimental designs were investigated and evaluated in the context of informing the health risk assessment of food/feed from GM crops.
Design of experiments and data analysis challenges in calibration for forensics applications
Anderson-Cook, Christine M.; Burr, Thomas L.; Hamada, Michael S.; ...
2015-07-15
Forensic science aims to infer characteristics of source terms using measured observables. Our focus is on statistical design of experiments and data analysis challenges arising in nuclear forensics. More specifically, we focus on inferring aspects of experimental conditions (of a process to produce product Pu oxide powder), such as temperature, nitric acid concentration, and Pu concentration, using measured features of the product Pu oxide powder. The measured features, Y, include trace chemical concentrations and particle morphology such as particle size and shape of the produced Pu oxide power particles. Making inferences about the nature of inputs X that were usedmore » to create nuclear materials having particular characteristics, Y, is an inverse problem. Therefore, statistical analysis can be used to identify the best set (or sets) of Xs for a new set of observed responses Y. One can fit a model (or models) such as Υ = f(Χ) + error, for each of the responses, based on a calibration experiment and then “invert” to solve for the best set of Xs for a new set of Ys. This perspectives paper uses archived experimental data to consider aspects of data collection and experiment design for the calibration data to maximize the quality of the predicted Ys in the forward models; that is, we assume that well-estimated forward models are effective in the inverse problem. In addition, we consider how to identify a best solution for the inferred X, and evaluate the quality of the result and its robustness to a variety of initial assumptions, and different correlation structures between the responses. In addition, we also briefly review recent advances in metrology issues related to characterizing particle morphology measurements used in the response vector, Y.« less
Design of experiments and data analysis challenges in calibration for forensics applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson-Cook, Christine M.; Burr, Thomas L.; Hamada, Michael S.
Forensic science aims to infer characteristics of source terms using measured observables. Our focus is on statistical design of experiments and data analysis challenges arising in nuclear forensics. More specifically, we focus on inferring aspects of experimental conditions (of a process to produce product Pu oxide powder), such as temperature, nitric acid concentration, and Pu concentration, using measured features of the product Pu oxide powder. The measured features, Y, include trace chemical concentrations and particle morphology such as particle size and shape of the produced Pu oxide power particles. Making inferences about the nature of inputs X that were usedmore » to create nuclear materials having particular characteristics, Y, is an inverse problem. Therefore, statistical analysis can be used to identify the best set (or sets) of Xs for a new set of observed responses Y. One can fit a model (or models) such as Υ = f(Χ) + error, for each of the responses, based on a calibration experiment and then “invert” to solve for the best set of Xs for a new set of Ys. This perspectives paper uses archived experimental data to consider aspects of data collection and experiment design for the calibration data to maximize the quality of the predicted Ys in the forward models; that is, we assume that well-estimated forward models are effective in the inverse problem. In addition, we consider how to identify a best solution for the inferred X, and evaluate the quality of the result and its robustness to a variety of initial assumptions, and different correlation structures between the responses. In addition, we also briefly review recent advances in metrology issues related to characterizing particle morphology measurements used in the response vector, Y.« less
Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI)
Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur
2016-01-01
We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non–expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI’s robustness and sensitivity in capturing useful data relating to the students’ conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. PMID:26903497
Xu, G; Hughes-Oliver, J M; Brooks, J D; Yeatts, J L; Baynes, R E
2013-01-01
Quantitative structure-activity relationship (QSAR) models are being used increasingly in skin permeation studies. The main idea of QSAR modelling is to quantify the relationship between biological activities and chemical properties, and thus to predict the activity of chemical solutes. As a key step, the selection of a representative and structurally diverse training set is critical to the prediction power of a QSAR model. Early QSAR models selected training sets in a subjective way and solutes in the training set were relatively homogenous. More recently, statistical methods such as D-optimal design or space-filling design have been applied but such methods are not always ideal. This paper describes a comprehensive procedure to select training sets from a large candidate set of 4534 solutes. A newly proposed 'Baynes' rule', which is a modification of Lipinski's 'rule of five', was used to screen out solutes that were not qualified for the study. U-optimality was used as the selection criterion. A principal component analysis showed that the selected training set was representative of the chemical space. Gas chromatograph amenability was verified. A model built using the training set was shown to have greater predictive power than a model built using a previous dataset [1].
Design and implementation of fishery rescue data mart system
NASA Astrophysics Data System (ADS)
Pan, Jun; Huang, Haiguang; Liu, Yousong
A novel data mart based system for fishery rescue field was designed and implemented. The system runs ETL process to deal with original data from various databases and data warehouses, and then reorganized the data into the fishery rescue data mart. Next, online analytical processing (OLAP) are carried out and statistical reports are generated automatically. Particularly, quick configuration schemes are designed to configure query dimensions and OLAP data sets. The configuration file will be transformed into statistic interfaces automatically through a wizard-style process. The system provides various forms of reporting files, including crystal reports, flash graphical reports, and two-dimensional data grids. In addition, a wizard style interface was designed to guide users customizing inquiry processes, making it possible for nontechnical staffs to access customized reports. Characterized by quick configuration, safeness and flexibility, the system has been successfully applied in city fishery rescue department.
Ashrafi, Parivash; Sun, Yi; Davey, Neil; Adams, Roderick G; Wilkinson, Simon C; Moss, Gary Patrick
2018-03-01
The aim of this study was to investigate how to improve predictions from Gaussian Process models by optimising the model hyperparameters. Optimisation methods, including Grid Search, Conjugate Gradient, Random Search, Evolutionary Algorithm and Hyper-prior, were evaluated and applied to previously published data. Data sets were also altered in a structured manner to reduce their size, which retained the range, or 'chemical space' of the key descriptors to assess the effect of the data range on model quality. The Hyper-prior Smoothbox kernel results in the best models for the majority of data sets, and they exhibited significantly better performance than benchmark quantitative structure-permeability relationship (QSPR) models. When the data sets were systematically reduced in size, the different optimisation methods generally retained their statistical quality, whereas benchmark QSPR models performed poorly. The design of the data set, and possibly also the approach to validation of the model, is critical in the development of improved models. The size of the data set, if carefully controlled, was not generally a significant factor for these models and that models of excellent statistical quality could be produced from substantially smaller data sets. © 2018 Royal Pharmaceutical Society.
Computer Administering of the Psychological Investigations: Set-Relational Representation
NASA Astrophysics Data System (ADS)
Yordzhev, Krasimir
Computer administering of a psychological investigation is the computer representation of the entire procedure of psychological assessments - test construction, test implementation, results evaluation, storage and maintenance of the developed database, its statistical processing, analysis and interpretation. A mathematical description of psychological assessment with the aid of personality tests is discussed in this article. The set theory and the relational algebra are used in this description. A relational model of data, needed to design a computer system for automation of certain psychological assessments is given. Some finite sets and relation on them, which are necessary for creating a personality psychological test, are described. The described model could be used to develop real software for computer administering of any psychological test and there is full automation of the whole process: test construction, test implementation, result evaluation, storage of the developed database, statistical implementation, analysis and interpretation. A software project for computer administering personality psychological tests is suggested.
Paradigms for adaptive statistical information designs: practical experiences and strategies.
Wang, Sue-Jane; Hung, H M James; O'Neill, Robert
2012-11-10
In the last decade or so, interest in adaptive design clinical trials has gradually been directed towards their use in regulatory submissions by pharmaceutical drug sponsors to evaluate investigational new drugs. Methodological advances of adaptive designs are abundant in the statistical literature since the 1970s. The adaptive design paradigm has been enthusiastically perceived to increase the efficiency and to be more cost-effective than the fixed design paradigm for drug development. Much interest in adaptive designs is in those studies with two-stages, where stage 1 is exploratory and stage 2 depends upon stage 1 results, but where the data of both stages will be combined to yield statistical evidence for use as that of a pivotal registration trial. It was not until the recent release of the US Food and Drug Administration Draft Guidance for Industry on Adaptive Design Clinical Trials for Drugs and Biologics (2010) that the boundaries of flexibility for adaptive designs were specifically considered for regulatory purposes, including what are exploratory goals, and what are the goals of adequate and well-controlled (A&WC) trials (2002). The guidance carefully described these distinctions in an attempt to minimize the confusion between the goals of preliminary learning phases of drug development, which are inherently substantially uncertain, and the definitive inference-based phases of drug development. In this paper, in addition to discussing some aspects of adaptive designs in a confirmatory study setting, we underscore the value of adaptive designs when used in exploratory trials to improve planning of subsequent A&WC trials. One type of adaptation that is receiving attention is the re-estimation of the sample size during the course of the trial. We refer to this type of adaptation as an adaptive statistical information design. Specifically, a case example is used to illustrate how challenging it is to plan a confirmatory adaptive statistical information design. We highlight the substantial risk of planning the sample size for confirmatory trials when information is very uninformative and stipulate the advantages of adaptive statistical information designs for planning exploratory trials. Practical experiences and strategies as lessons learned from more recent adaptive design proposals will be discussed to pinpoint the improved utilities of adaptive design clinical trials and their potential to increase the chance of a successful drug development. Published 2012. This article is a US Government work and is in the public domain in the USA.
A product of independent beta probabilities dose escalation design for dual-agent phase I trials.
Mander, Adrian P; Sweeting, Michael J
2015-04-15
Dual-agent trials are now increasingly common in oncology research, and many proposed dose-escalation designs are available in the statistical literature. Despite this, the translation from statistical design to practical application is slow, as has been highlighted in single-agent phase I trials, where a 3 + 3 rule-based design is often still used. To expedite this process, new dose-escalation designs need to be not only scientifically beneficial but also easy to understand and implement by clinicians. In this paper, we propose a curve-free (nonparametric) design for a dual-agent trial in which the model parameters are the probabilities of toxicity at each of the dose combinations. We show that it is relatively trivial for a clinician's prior beliefs or historical information to be incorporated in the model and updating is fast and computationally simple through the use of conjugate Bayesian inference. Monotonicity is ensured by considering only a set of monotonic contours for the distribution of the maximum tolerated contour, which defines the dose-escalation decision process. Varied experimentation around the contour is achievable, and multiple dose combinations can be recommended to take forward to phase II. Code for R, Stata and Excel are available for implementation. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Color Your Classroom V: A Math Guide on the Secondary Level.
ERIC Educational Resources Information Center
Mississippi Materials & Resource Center, Gulfport.
This curriculum guide, designed for use with secondary migrant students, presents mathematics activities in the areas of whole numbers, fractions, decimals, percent, measurement, geometry, probability and statistics, and sets. Within the categories of whole numbers, fractions, and decimals are activities using addition, subtraction,…
Facilities Performance Indicators Report 2012-13: Tracking Your Facilities Vital Signs
ERIC Educational Resources Information Center
APPA: Association of Higher Education Facilities Officers, 2014
2014-01-01
This paper features an expanded Web-based "Facilities Performance Indicators (FPI) Report." The purpose of APPA's Facilities Performance Indicators is to provide a representative set of statistics about facilities in educational institutions. "The Facilities Performance Indicators Report" is designed for survey…
Performance Indicators in Education.
ERIC Educational Resources Information Center
Irvine, David J.
Evaluation of education involves assessing the effectiveness of schools and trying to determine how best to improve them. Since evaluation often deals only with the question of effectiveness, performance indicators in education are designed to make evaluation more complete. They are a set of statistical models which relate several important…
Mueller, Amy V; Hemond, Harold F
2016-05-18
Knowledge of ionic concentrations in natural waters is essential to understand watershed processes. Inorganic nitrogen, in the form of nitrate and ammonium ions, is a key nutrient as well as a participant in redox, acid-base, and photochemical processes of natural waters, leading to spatiotemporal patterns of ion concentrations at scales as small as meters or hours. Current options for measurement in situ are costly, relying primarily on instruments adapted from laboratory methods (e.g., colorimetric, UV absorption); free-standing and inexpensive ISE sensors for NO3(-) and NH4(+) could be attractive alternatives if interferences from other constituents were overcome. Multi-sensor arrays, coupled with appropriate non-linear signal processing, offer promise in this capacity but have not yet successfully achieved signal separation for NO3(-) and NH4(+)in situ at naturally occurring levels in unprocessed water samples. A novel signal processor, underpinned by an appropriate sensor array, is proposed that overcomes previous limitations by explicitly integrating basic chemical constraints (e.g., charge balance). This work further presents a rationalized process for the development of such in situ instrumentation for NO3(-) and NH4(+), including a statistical-modeling strategy for instrument design, training/calibration, and validation. Statistical analysis reveals that historical concentrations of major ionic constituents in natural waters across New England strongly covary and are multi-modal. This informs the design of a statistically appropriate training set, suggesting that the strong covariance of constituents across environmental samples can be exploited through appropriate signal processing mechanisms to further improve estimates of minor constituents. Two artificial neural network architectures, one expanded to incorporate knowledge of basic chemical constraints, were tested to process outputs of a multi-sensor array, trained using datasets of varying degrees of statistical representativeness to natural water samples. The accuracy of ANN results improves monotonically with the statistical representativeness of the training set (error decreases by ∼5×), while the expanded neural network architecture contributes a further factor of 2-3.5 decrease in error when trained with the most representative sample set. Results using the most statistically accurate set of training samples (which retain environmentally relevant ion concentrations but avoid the potential interference of humic acids) demonstrated accurate, unbiased quantification of nitrate and ammonium at natural environmental levels (±20% down to <10 μM), as well as the major ions Na(+), K(+), Ca(2+), Mg(2+), Cl(-), and SO4(2-), in unprocessed samples. These results show promise for the development of new in situ instrumentation for the support of scientific field work.
Introductory Life Science Mathematics and Quantitative Neuroscience Courses
Olifer, Andrei
2010-01-01
We describe two sets of courses designed to enhance the mathematical, statistical, and computational training of life science undergraduates at Emory College. The first course is an introductory sequence in differential and integral calculus, modeling with differential equations, probability, and inferential statistics. The second is an upper-division course in computational neuroscience. We provide a description of each course, detailed syllabi, examples of content, and a brief discussion of the main issues encountered in developing and offering the courses. PMID:20810971
Statistics, Uncertainty, and Transmitted Variation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wendelberger, Joanne Roth
2014-11-05
The field of Statistics provides methods for modeling and understanding data and making decisions in the presence of uncertainty. When examining response functions, variation present in the input variables will be transmitted via the response function to the output variables. This phenomenon can potentially have significant impacts on the uncertainty associated with results from subsequent analysis. This presentation will examine the concept of transmitted variation, its impact on designed experiments, and a method for identifying and estimating sources of transmitted variation in certain settings.
Adaptable state based control system
NASA Technical Reports Server (NTRS)
Rasmussen, Robert D. (Inventor); Dvorak, Daniel L. (Inventor); Gostelow, Kim P. (Inventor); Starbird, Thomas W. (Inventor); Gat, Erann (Inventor); Chien, Steve Ankuo (Inventor); Keller, Robert M. (Inventor)
2004-01-01
An autonomous controller, comprised of a state knowledge manager, a control executor, hardware proxies and a statistical estimator collaborates with a goal elaborator, with which it shares common models of the behavior of the system and the controller. The elaborator uses the common models to generate from temporally indeterminate sets of goals, executable goals to be executed by the controller. The controller may be updated to operate in a different system or environment than that for which it was originally designed by the replacement of shared statistical models and by the instantiation of a new set of state variable objects derived from a state variable class. The adaptation of the controller does not require substantial modification of the goal elaborator for its application to the new system or environment.
2017-01-01
Technological developments and greater rigor in the quantitative measurement of biological features in medical images have given rise to an increased interest in using quantitative imaging biomarkers (QIBs) to measure changes in these features. Critical to the performance of a QIB in preclinical or clinical settings are three primary metrology areas of interest: measurement linearity and bias, repeatability, and the ability to consistently reproduce equivalent results when conditions change, as would be expected in any clinical trial. Unfortunately, performance studies to date differ greatly in designs, analysis method and metrics used to assess a QIB for clinical use. It is therefore, difficult or not possible to integrate results from different studies or to use reported results to design studies. The Radiological Society of North America (RSNA) and the Quantitative Imaging Biomarker Alliance (QIBA) with technical, radiological and statistical experts developed a set of technical performance analysis methods, metrics and study designs that provide terminology, metrics and methods consistent with widely accepted metrological standards. This document provides a consistent framework for the conduct and evaluation of QIB performance studies so that results from multiple studies can be compared, contrasted or combined. PMID:24919831
The use and misuse of statistical methodologies in pharmacology research.
Marino, Michael J
2014-01-01
Descriptive, exploratory, and inferential statistics are necessary components of hypothesis-driven biomedical research. Despite the ubiquitous need for these tools, the emphasis on statistical methods in pharmacology has become dominated by inferential methods often chosen more by the availability of user-friendly software than by any understanding of the data set or the critical assumptions of the statistical tests. Such frank misuse of statistical methodology and the quest to reach the mystical α<0.05 criteria has hampered research via the publication of incorrect analysis driven by rudimentary statistical training. Perhaps more critically, a poor understanding of statistical tools limits the conclusions that may be drawn from a study by divorcing the investigator from their own data. The net result is a decrease in quality and confidence in research findings, fueling recent controversies over the reproducibility of high profile findings and effects that appear to diminish over time. The recent development of "omics" approaches leading to the production of massive higher dimensional data sets has amplified these issues making it clear that new approaches are needed to appropriately and effectively mine this type of data. Unfortunately, statistical education in the field has not kept pace. This commentary provides a foundation for an intuitive understanding of statistics that fosters an exploratory approach and an appreciation for the assumptions of various statistical tests that hopefully will increase the correct use of statistics, the application of exploratory data analysis, and the use of statistical study design, with the goal of increasing reproducibility and confidence in the literature. Copyright © 2013. Published by Elsevier Inc.
Zhu, H.; Braun, W.
1999-01-01
A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatial neighboring strands in beta-sheets. Pairwise potentials were derived from the frequency of residue pairs in nearest contact, second nearest and third nearest contacts across neighboring beta-strands compared to the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins, which were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low energy range. Self-correcting distance geometry (SECODG) calculations using distance constraints sets derived from possible low energy pairing of beta-strands uniquely identified the native pairing of the beta-sheet in pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets. PMID:10048326
Statistical distribution of mechanical properties for three graphite-epoxy material systems
NASA Technical Reports Server (NTRS)
Reese, C.; Sorem, J., Jr.
1981-01-01
Graphite-epoxy composites are playing an increasing role as viable alternative materials in structural applications necessitating thorough investigation into the predictability and reproducibility of their material strength properties. This investigation was concerned with tension, compression, and short beam shear coupon testing of large samples from three different material suppliers to determine their statistical strength behavior. Statistical results indicate that a two Parameter Weibull distribution model provides better overall characterization of material behavior for the graphite-epoxy systems tested than does the standard Normal distribution model that is employed for most design work. While either a Weibull or Normal distribution model provides adequate predictions for average strength values, the Weibull model provides better characterization in the lower tail region where the predictions are of maximum design interest. The two sets of the same material were found to have essentially the same material properties, and indicate that repeatability can be achieved.
Zborowsky, Terri
2014-01-01
The purpose of this paper is to explore nursing research that is focused on the impact of healthcare environments and that has resonance with the aspects of Florence Nightingale's environmental theory. Nurses have a unique ability to apply their observational skills to understand the role of the designed environment to enable healing in their patients. This affords nurses the opportunity to engage in research studies that have immediate impact on the act of nursing. Descriptive statistics were performed on 67 healthcare design-related research articles from 25 nursing journals to discover the topical areas of interest of nursing research today. Data were also analyzed to reveal the research designs, research methods, and research settings. These data are part of an ongoing study. Descriptive statistics reveal that topics and settings most frequently cited are in keeping with the current healthcare foci of patient care quality and safety in acute and intensive care environments. Research designs and methods most frequently cited are in keeping with the early progression of a knowledge area. A few assertions can be made as a result of this study. First, education is important to continue the knowledge development in this area. Second, multiple method research studies should continue to be considered as important to healthcare research. Finally, bedside nurses are in the best position possible to begin to help us all, through research, understand how the design environment impacts patients during the act of nursing. Evidence-based design, literature review, nursing.
NASA Technical Reports Server (NTRS)
Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.
1992-01-01
A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.
From Acceptance to Rejection: Food Contamination in the Classroom.
ERIC Educational Resources Information Center
Rajecki, D. W.
1989-01-01
Describes a classroom exercise to explain design and measurement principles in methodology and statistics courses. This demonstration which involves measurement of a shift from food acceptance to food rejection produces meaningful data sets. The realism of the exercise gives students a view of problems that emerge in research. (KO)
HOW 1967 AWARD WINNING SCHOOLS COMPARE.
ERIC Educational Resources Information Center
1968
THIS IS A 30 PAGE PORTFOLIO OF PHOTOS, FLOOR PLANS, AND COMPARATIVE STATISTICS ON 24 TREND-SETTING SCHOOLS. SCHOOLS INCLUDED WERE GIVEN DISTINGUISHED DESIGN AWARDS BY THE AMERICAN ASSOCIATION OF SCHOOL ADMINISTRATORS AND STATE CHAPTERS OF THE AMERICAN INSTITUTE OF ARCHITECTS. TWELVE JUNIOR AND SENIOR HIGH SCHOOLS INCLUDED HAVE SUCH FEATURES AS THE…
DNA Fingerprinting in a Forensic Teaching Experiment
ERIC Educational Resources Information Center
Wagoner, Stacy A.; Carlson, Kimberly A.
2008-01-01
This article presents an experiment designed to provide students, in a classroom laboratory setting, a hands-on demonstration of the steps used in DNA forensic analysis by performing DNA extraction, DNA fingerprinting, and statistical analysis of the data. This experiment demonstrates how DNA fingerprinting is performed and how long it takes. It…
Self-Employment among Italian Female Graduates
ERIC Educational Resources Information Center
Rosti, Luisa; Chelli, Francesco
2009-01-01
Purpose: The purpose of this paper is to investigate the gender impact of tertiary education on the probability of entering and remaining in self-employment. Design/methodology/approach: A data set on labour market flows produced by the Italian National Statistical Office is exploited by interviewing about 62,000 graduate and non-graduate…
Monitoring changes in exotic vegetation
Robert D. Sutter
1998-01-01
Ecological monitoring provides critical information for management decisions by measuring changes in managed and unmanaged populations, communities and ecological systems. It integrates ecology, goal and objective setting, sampling design, sampling methods, and statistical analysis. It is a topic that I, with a team of Nature Conservancy ecologists, teach in a six day...
A Fast Turn-Around Facility for Very Large Scale Integration (VLSI)
1982-06-01
statistics determination, the first test mask set will use the MATRIX chip design which was recently developed here at Stanford. This chip provides...reached when the basewidth is reduced to zero. Such devices, variably known as depleted- base transistors or bipolar static-induction transitors , have been
Commentary: Using Potential Outcomes to Understand Causal Mediation Analysis
ERIC Educational Resources Information Center
Imai, Kosuke; Jo, Booil; Stuart, Elizabeth A.
2011-01-01
In this commentary, we demonstrate how the potential outcomes framework can help understand the key identification assumptions underlying causal mediation analysis. We show that this framework can lead to the development of alternative research design and statistical analysis strategies applicable to the longitudinal data settings considered by…
Issues in Postsecondary Education. Financial Viability of Institutions.
ERIC Educational Resources Information Center
Jenny, Hans H.
One of three papers commissioned as part of the Postsecondary Education Core Design Project of the National Center for Education Statistics (NCES), one of this study's purposes was to identify and set priorities for postsecondary education decisionmakers in the area of financial viability in postsecondary institutions. Another purpose was to…
The Role of Design-of-Experiments in Managing Flow in Compact Air Vehicle Inlets
NASA Technical Reports Server (NTRS)
Anderson, Bernhard H.; Miller, Daniel N.; Gridley, Marvin C.; Agrell, Johan
2003-01-01
It is the purpose of this study to demonstrate the viability and economy of Design-of-Experiments methodologies to arrive at microscale secondary flow control array designs that maintain optimal inlet performance over a wide range of the mission variables and to explore how these statistical methods provide a better understanding of the management of flow in compact air vehicle inlets. These statistical design concepts were used to investigate the robustness properties of low unit strength micro-effector arrays. Low unit strength micro-effectors are micro-vanes set at very low angles-of-incidence with very long chord lengths. They were designed to influence the near wall inlet flow over an extended streamwise distance, and their advantage lies in low total pressure loss and high effectiveness in managing engine face distortion. The term robustness is used in this paper in the same sense as it is used in the industrial problem solving community. It refers to minimizing the effects of the hard-to-control factors that influence the development of a product or process. In Robustness Engineering, the effects of the hard-to-control factors are often called noise , and the hard-to-control factors themselves are referred to as the environmental variables or sometimes as the Taguchi noise variables. Hence Robust Optimization refers to minimizing the effects of the environmental or noise variables on the development (design) of a product or process. In the management of flow in compact inlets, the environmental or noise variables can be identified with the mission variables. Therefore this paper formulates a statistical design methodology that minimizes the impact of variations in the mission variables on inlet performance and demonstrates that these statistical design concepts can lead to simpler inlet flow management systems.
Vesterinen, Hanna M; Vesterinen, Hanna V; Egan, Kieren; Deister, Amelie; Schlattmann, Peter; Macleod, Malcolm R; Dirnagl, Ulrich
2011-04-01
Translating experimental findings into clinically effective therapies is one of the major bottlenecks of modern medicine. As this has been particularly true for cerebrovascular research, attention has turned to the quality and validity of experimental cerebrovascular studies. We set out to assess the study design, statistical analyses, and reporting of cerebrovascular research. We assessed all original articles published in the Journal of Cerebral Blood Flow and Metabolism during the year 2008 against a checklist designed to capture the key attributes relating to study design, statistical analyses, and reporting. A total of 156 original publications were included (animal, in vitro, human). Few studies reported a primary research hypothesis, statement of purpose, or measures to safeguard internal validity (such as randomization, blinding, exclusion or inclusion criteria). Many studies lacked sufficient information regarding methods and results to form a reasonable judgment about their validity. In nearly 20% of studies, statistical tests were either not appropriate or information to allow assessment of appropriateness was lacking. This study identifies a number of factors that should be addressed if the quality of research in basic and translational biomedicine is to be improved. We support the widespread implementation of the ARRIVE (Animal Research Reporting In Vivo Experiments) statement for the reporting of experimental studies in biomedicine, for improving training in proper study design and analysis, and that reviewers and editors adopt a more constructively critical approach in the assessment of manuscripts for publication.
Vesterinen, Hanna V; Egan, Kieren; Deister, Amelie; Schlattmann, Peter; Macleod, Malcolm R; Dirnagl, Ulrich
2011-01-01
Translating experimental findings into clinically effective therapies is one of the major bottlenecks of modern medicine. As this has been particularly true for cerebrovascular research, attention has turned to the quality and validity of experimental cerebrovascular studies. We set out to assess the study design, statistical analyses, and reporting of cerebrovascular research. We assessed all original articles published in the Journal of Cerebral Blood Flow and Metabolism during the year 2008 against a checklist designed to capture the key attributes relating to study design, statistical analyses, and reporting. A total of 156 original publications were included (animal, in vitro, human). Few studies reported a primary research hypothesis, statement of purpose, or measures to safeguard internal validity (such as randomization, blinding, exclusion or inclusion criteria). Many studies lacked sufficient information regarding methods and results to form a reasonable judgment about their validity. In nearly 20% of studies, statistical tests were either not appropriate or information to allow assessment of appropriateness was lacking. This study identifies a number of factors that should be addressed if the quality of research in basic and translational biomedicine is to be improved. We support the widespread implementation of the ARRIVE (Animal Research Reporting In Vivo Experiments) statement for the reporting of experimental studies in biomedicine, for improving training in proper study design and analysis, and that reviewers and editors adopt a more constructively critical approach in the assessment of manuscripts for publication. PMID:21157472
Bastolla, Ugo
2014-01-01
The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on the statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we will integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We will start discussing positive design, which strengthens the stability of the folded against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. On the contrary, negative design, which consists in destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement contrasts with negative design, and evolution has to trade-off between them. Finally, selection can accelerate functional movements by favoring low frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformation change. PMID:24970217
fMRI paradigm designing and post-processing tools
James, Jija S; Rajesh, PG; Chandran, Anuvitha VS; Kesavadas, Chandrasekharan
2014-01-01
In this article, we first review some aspects of functional magnetic resonance imaging (fMRI) paradigm designing for major cognitive functions by using stimulus delivery systems like Cogent, E-Prime, Presentation, etc., along with their technical aspects. We also review the stimulus presentation possibilities (block, event-related) for visual or auditory paradigms and their advantage in both clinical and research setting. The second part mainly focus on various fMRI data post-processing tools such as Statistical Parametric Mapping (SPM) and Brain Voyager, and discuss the particulars of various preprocessing steps involved (realignment, co-registration, normalization, smoothing) in these software and also the statistical analysis principles of General Linear Modeling for final interpretation of a functional activation result. PMID:24851001
Chung, Chi-Jung; Kuo, Yu-Chen; Hsieh, Yun-Yu; Li, Tsai-Chung; Lin, Cheng-Chieh; Liang, Wen-Miin; Liao, Li-Na; Li, Chia-Ing; Lin, Hsueh-Chun
2017-11-01
This study applied open source technology to establish a subject-enabled analytics model that can enhance measurement statistics of case studies with the public health data in cloud computing. The infrastructure of the proposed model comprises three domains: 1) the health measurement data warehouse (HMDW) for the case study repository, 2) the self-developed modules of online health risk information statistics (HRIStat) for cloud computing, and 3) the prototype of a Web-based process automation system in statistics (PASIS) for the health risk assessment of case studies with subject-enabled evaluation. The system design employed freeware including Java applications, MySQL, and R packages to drive a health risk expert system (HRES). In the design, the HRIStat modules enforce the typical analytics methods for biomedical statistics, and the PASIS interfaces enable process automation of the HRES for cloud computing. The Web-based model supports both modes, step-by-step analysis and auto-computing process, respectively for preliminary evaluation and real time computation. The proposed model was evaluated by computing prior researches in relation to the epidemiological measurement of diseases that were caused by either heavy metal exposures in the environment or clinical complications in hospital. The simulation validity was approved by the commercial statistics software. The model was installed in a stand-alone computer and in a cloud-server workstation to verify computing performance for a data amount of more than 230K sets. Both setups reached efficiency of about 10 5 sets per second. The Web-based PASIS interface can be used for cloud computing, and the HRIStat module can be flexibly expanded with advanced subjects for measurement statistics. The analytics procedure of the HRES prototype is capable of providing assessment criteria prior to estimating the potential risk to public health. Copyright © 2017 Elsevier B.V. All rights reserved.
Johnson, Oliver K.; Kurniawan, Christian
2018-02-03
Properties closures delineate the theoretical objective space for materials design problems, allowing designers to make informed trade-offs between competing constraints and target properties. In this paper, we present a new algorithm called hierarchical simplex sampling (HSS) that approximates properties closures more efficiently and faithfully than traditional optimization based approaches. By construction, HSS generates samples of microstructure statistics that span the corresponding microstructure hull. As a result, we also find that HSS can be coupled with synthetic polycrystal generation software to generate diverse sets of microstructures for subsequent mesoscale simulations. Finally, by more broadly sampling the space of possible microstructures, itmore » is anticipated that such diverse microstructure sets will expand our understanding of the influence of microstructure on macroscale effective properties and inform the construction of higher-fidelity mesoscale structure-property models.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Oliver K.; Kurniawan, Christian
Properties closures delineate the theoretical objective space for materials design problems, allowing designers to make informed trade-offs between competing constraints and target properties. In this paper, we present a new algorithm called hierarchical simplex sampling (HSS) that approximates properties closures more efficiently and faithfully than traditional optimization based approaches. By construction, HSS generates samples of microstructure statistics that span the corresponding microstructure hull. As a result, we also find that HSS can be coupled with synthetic polycrystal generation software to generate diverse sets of microstructures for subsequent mesoscale simulations. Finally, by more broadly sampling the space of possible microstructures, itmore » is anticipated that such diverse microstructure sets will expand our understanding of the influence of microstructure on macroscale effective properties and inform the construction of higher-fidelity mesoscale structure-property models.« less
Building a medical image processing algorithm verification database
NASA Astrophysics Data System (ADS)
Brown, C. Wayne
2000-06-01
The design of a database containing head Computed Tomography (CT) studies is presented, along with a justification for the database's composition. The database will be used to validate software algorithms that screen normal head CT studies from studies that contain pathology. The database is designed to have the following major properties: (1) a size sufficient for statistical viability, (2) inclusion of both normal (no pathology) and abnormal scans, (3) inclusion of scans due to equipment malfunction, technologist error, and uncooperative patients, (4) inclusion of data sets from multiple scanner manufacturers, (5) inclusion of data sets from different gender and age groups, and (6) three independent diagnosis of each data set. Designed correctly, the database will provide a partial basis for FDA (United States Food and Drug Administration) approval of image processing algorithms for clinical use. Our goal for the database is the proof of viability of screening head CT's for normal anatomy using computer algorithms. To put this work into context, a classification scheme for 'computer aided diagnosis' systems is proposed.
Cost Finding Principles and Procedures. Preliminary Field Review Edition. Technical Report 26.
ERIC Educational Resources Information Center
Ziemer, Gordon; And Others
This report is part of the Larger Cost Finding Principles Project designed to develop a uniform set of standards, definitions, and alternative procedures that will use accounting and statistical data to find the full cost of resources utilized in the process of producing institutional outputs. This technical report describes preliminary procedures…
ERIC Educational Resources Information Center
Wright, Hazel A.; Ironside, Joseph E.; Gwynn-Jones, Dylan
2008-01-01
Purpose: This study aims to identify the current barriers to sustainability in the bioscience laboratory setting and to determine which mechanisms are likely to increase sustainable behaviours in this specialised environment. Design/methodology/approach: The study gathers qualitative data from a sample of laboratory researchers presently…
Higher Education in Non-Standard Wage Contracts
ERIC Educational Resources Information Center
Rosti, Luisa; Chelli, Francesco
2012-01-01
Purpose: The purpose of this paper is to verify whether higher education increases the likelihood of young Italian workers moving from non-standard to standard wage contracts. Design/methodology/approach: The authors exploit a data set on labour market flows, produced by the Italian National Statistical Office, by interviewing about 85,000…
ERIC Educational Resources Information Center
Mbalamula, Yazidu Saidi
2018-01-01
The study analyzes students' performance scores in formative assessments depicting the individual and group settings. A case study design was adopted using quantitative approach to extract data of 198 undergraduate students. Data were analyzed quantitatively using descriptive statistics--means and frequencies; spearman correlations, multiple…
ERIC Educational Resources Information Center
Western Center for Drug-Free Schools and Communities.
The Drug Impact Index provides a set of indicators designed to determine the extent of the local drug problem in a community. Each indicator includes a technical note on the data sources, a graph showing comparative statistics on that indicator for the Portland area and for the State of Oregon, and brief remarks on the implications of the data.…
Gender, Marital Status, and Commercially Prepared Food Expenditure
ERIC Educational Resources Information Center
Kroshus, Emily
2008-01-01
Objective: Assess how per capita expenditure on commercially prepared food as a proportion of total food expenditure varies by the sex and marital status of the head of the household. Design: Prospective cohort study, data collected by the United States Bureau of Labor Statistics 2004 Consumer Expenditure Survey. Setting: United States.…
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based in observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. The development of multivariate methods emerged to analyze large databases and increasingly complex data. Since the best way to represent the knowledge of reality is the modeling, we should use multivariate statistical methods. Multivariate methods are designed to simultaneously analyze data sets, i.e., the analysis of different variables for each person or object studied. Keep in mind at all times that all variables must be treated accurately reflect the reality of the problem addressed. There are different types of multivariate analysis and each one should be employed according to the type of variables to analyze: dependent, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and to find the cause and effect relationships between variables; there is a wide range of analysis types that we can use.
Recommendation Systems for Geoscience Data Portals Built by Analyzing Usage Patterns
NASA Astrophysics Data System (ADS)
Crosby, C.; Nandigam, V.; Baru, C.
2009-04-01
Since its launch five years ago, the National Science Foundation-funded GEON Project (www.geongrid.org) has been providing access to a variety of geoscience data sets such as geologic maps and other geographic information system (GIS)-oriented data, paleontologic databases, gravity and magnetics data and LiDAR topography via its online portal interface. In addition to data, the GEON Portal also provides web-based tools and other resources that enable users to process and interact with data. Examples of these tools include functions to dynamically map and integrate GIS data, compute synthetic seismograms, and to produce custom digital elevation models (DEMs) with user defined parameters such as resolution. The GEON portal built on the Gridsphere-portal framework allows us to capture user interaction with the system. In addition to the site access statistics captured by tools like Google Analystics which capture hits per unit time, search key words, operating systems, browsers, and referring sites, we also record additional statistics such as which data sets are being downloaded and in what formats, processing parameters, and navigation pathways through the portal. With over four years of data now available from the GEON Portal, this record of usage is a rich resource for exploring how earth scientists discover and utilize online data sets. Furthermore, we propose that this data could ultimately be harnessed to optimize the way users interact with the data portal, design intelligent processing and data management systems, and to make recommendations on algorithm settings and other available relevant data. The paradigm of integrating popular and commonly used patterns to make recommendations to a user is well established in the world of e-commerce where users receive suggestions on books, music and other products that they may find interesting based on their website browsing and purchasing history, as well as the patterns of fellow users who have made similar selections. However, this paradigm has not yet been explored for geoscience data portals. In this presentation we will present an initial analysis of user interaction and access statistics for the GEON OpenTopography LiDAR data distribution and processing system to illustrate what they reveal about user's spatial and temporal data access patterns, data processing parameter selections, and pathways through the data portal. We also demonstrate what these usage statistics can illustrate about aspects of the data sets that are of greatest interest. Finally, we explore how these usage statistics could be used to improve the user's experience in the data portal and to optimize how data access interfaces and tools are designed and implemented.
NASA Technical Reports Server (NTRS)
Hendricks, Robert C.; Zaretsky, Erwin V.
2001-01-01
Critical component design is based on minimizing product failures that results in loss of life. Potential catastrophic failures are reduced to secondary failures where components removed for cause or operating time in the system. Issues of liability and cost of component removal become of paramount importance. Deterministic design with factors of safety and probabilistic design address but lack the essential characteristics for the design of critical components. In deterministic design and fabrication there are heuristic rules and safety factors developed over time for large sets of structural/material components. These factors did not come without cost. Many designs failed and many rules (codes) have standing committees to oversee their proper usage and enforcement. In probabilistic design, not only are failures a given, the failures are calculated; an element of risk is assumed based on empirical failure data for large classes of component operations. Failure of a class of components can be predicted, yet one can not predict when a specific component will fail. The analogy is to the life insurance industry where very careful statistics are book-kept on classes of individuals. For a specific class, life span can be predicted within statistical limits, yet life-span of a specific element of that class can not be predicted.
A Bayesian pick-the-winner design in a randomized phase II clinical trial.
Chen, Dung-Tsa; Huang, Po-Yu; Lin, Hui-Yi; Chiappori, Alberto A; Gabrilovich, Dmitry I; Haura, Eric B; Antonia, Scott J; Gray, Jhanelle E
2017-10-24
Many phase II clinical trials evaluate unique experimental drugs/combinations through multi-arm design to expedite the screening process (early termination of ineffective drugs) and to identify the most effective drug (pick the winner) to warrant a phase III trial. Various statistical approaches have been developed for the pick-the-winner design but have been criticized for lack of objective comparison among the drug agents. We developed a Bayesian pick-the-winner design by integrating a Bayesian posterior probability with Simon two-stage design in a randomized two-arm clinical trial. The Bayesian posterior probability, as the rule to pick the winner, is defined as probability of the response rate in one arm higher than in the other arm. The posterior probability aims to determine the winner when both arms pass the second stage of the Simon two-stage design. When both arms are competitive (i.e., both passing the second stage), the Bayesian posterior probability performs better to correctly identify the winner compared with the Fisher exact test in the simulation study. In comparison to a standard two-arm randomized design, the Bayesian pick-the-winner design has a higher power to determine a clear winner. In application to two studies, the approach is able to perform statistical comparison of two treatment arms and provides a winner probability (Bayesian posterior probability) to statistically justify the winning arm. We developed an integrated design that utilizes Bayesian posterior probability, Simon two-stage design, and randomization into a unique setting. It gives objective comparisons between the arms to determine the winner.
Improving the governance of patient safety in emergency care: a systematic review of interventions
Hesselink, Gijs; Berben, Sivera; Beune, Thimpe
2016-01-01
Objectives To systematically review interventions that aim to improve the governance of patient safety within emergency care on effectiveness, reliability, validity and feasibility. Design A systematic review of the literature. Methods PubMed, EMBASE, Cumulative Index to Nursing and Allied Health Literature, the Cochrane Database of Systematic Reviews and PsychInfo were searched for studies published between January 1990 and July 2014. We included studies evaluating interventions relevant for higher management to oversee and manage patient safety, in prehospital emergency medical service (EMS) organisations and hospital-based emergency departments (EDs). Two reviewers independently selected candidate studies, extracted data and assessed study quality. Studies were categorised according to study quality, setting, sample, intervention characteristics and findings. Results Of the 18 included studies, 13 (72%) were non-experimental. Nine studies (50%) reported data on the reliability and/or validity of the intervention. Eight studies (44%) reported on the feasibility of the intervention. Only 4 studies (22%) reported statistically significant effects. The use of a simulation-based training programme and well-designed incident reporting systems led to a statistically significant improvement of safety knowledge and attitudes by ED staff and an increase of incident reports within EDs, respectively. Conclusions Characteristics of the interventions included in this review (eg, anonymous incident reporting and validation of incident reports by an independent party) could provide useful input for the design of an effective tool to govern patient safety in EMS organisations and EDs. However, executives cannot rely on a robust set of evidence-based and feasible tools to govern patient safety within their emergency care organisation and in the chain of emergency care. Established strategies from other high-risk sectors need to be evaluated in emergency care settings, using an experimental design with valid outcome measures to strengthen the evidence base. PMID:26826151
NASA Astrophysics Data System (ADS)
Jorge, Flávio; Riva, Carlo; Rocha, Armando
2016-03-01
The characterization of the fade dynamics on Earth-satellite links is an important subject when designing the so called fade mitigation techniques that contribute to the proper reliability of the satellite communication systems and the customers' quality of service (QoS). The interfade duration, defined as the period between two consecutive fade events, has been only poorly analyzed using limited data sets, but its complete characterization would enable the design and optimization of the satellite communication systems by estimating the system requirements to recover in time before the next propagation impairment. Depending on this analysis, several actions can be taken ensuring the service maintenance. In this paper we present for the first time a detailed and comprehensive analysis of the interfade events statistical properties based on 9 years of in-excess attenuation measurements at Ka band (19.7 GHz) with very high availability that is required to build a reliable data set mainly for the longer interfade duration events. The number of years necessary to reach the statistical stability of interfade duration is also evaluated for the first time, providing a reference when accessing the relevance of the results published in the past. The study is carried out in Aveiro, Portugal, which is conditioned by temperate Mediterranean climate with Oceanic influences.
NASA Astrophysics Data System (ADS)
Liu, Chanjuan; van Netten, Jaap J.; Klein, Marvin E.; van Baal, Jeff G.; Bus, Sicco A.; van der Heijden, Ferdi
2013-12-01
Early detection of (pre-)signs of ulceration on a diabetic foot is valuable for clinical practice. Hyperspectral imaging is a promising technique for detection and classification of such (pre-)signs. However, the number of the spectral bands should be limited to avoid overfitting, which is critical for pixel classification with hyperspectral image data. The goal was to design a detector/classifier based on spectral imaging (SI) with a small number of optical bandpass filters. The performance and stability of the design were also investigated. The selection of the bandpass filters boils down to a feature selection problem. A dataset was built, containing reflectance spectra of 227 skin spots from 64 patients, measured with a spectrometer. Each skin spot was annotated manually by clinicians as "healthy" or a specific (pre-)sign of ulceration. Statistical analysis on the data set showed the number of required filters is between 3 and 7, depending on additional constraints on the filter set. The stability analysis revealed that shot noise was the most critical factor affecting the classification performance. It indicated that this impact could be avoided in future SI systems with a camera sensor whose saturation level is higher than 106, or by postimage processing.
Raunig, David L; McShane, Lisa M; Pennello, Gene; Gatsonis, Constantine; Carson, Paul L; Voyvodic, James T; Wahl, Richard L; Kurland, Brenda F; Schwarz, Adam J; Gönen, Mithat; Zahlmann, Gudrun; Kondratovich, Marina V; O'Donnell, Kevin; Petrick, Nicholas; Cole, Patricia E; Garra, Brian; Sullivan, Daniel C
2015-02-01
Technological developments and greater rigor in the quantitative measurement of biological features in medical images have given rise to an increased interest in using quantitative imaging biomarkers to measure changes in these features. Critical to the performance of a quantitative imaging biomarker in preclinical or clinical settings are three primary metrology areas of interest: measurement linearity and bias, repeatability, and the ability to consistently reproduce equivalent results when conditions change, as would be expected in any clinical trial. Unfortunately, performance studies to date differ greatly in designs, analysis method, and metrics used to assess a quantitative imaging biomarker for clinical use. It is therefore difficult or not possible to integrate results from different studies or to use reported results to design studies. The Radiological Society of North America and the Quantitative Imaging Biomarker Alliance with technical, radiological, and statistical experts developed a set of technical performance analysis methods, metrics, and study designs that provide terminology, metrics, and methods consistent with widely accepted metrological standards. This document provides a consistent framework for the conduct and evaluation of quantitative imaging biomarker performance studies so that results from multiple studies can be compared, contrasted, or combined. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
Measuring outcomes of type 2 diabetes disease management program in an HMO setting.
Ibrahim, Ibrahim Awad; Beich, Jeff; Sidorov, Jaan; Gabbay, Robert; Yu, Lucy
2002-01-01
There is a need to evaluate empirical disease management programs used in managing chronic diseases such as diabetes mellitus in managed care settings. We analyzed data from 252 patients with type 2 diabetes before and 1 year after enrollment in a disease management program. We examined clinical indicators such as HbA1C, HDL, LDL, total cholesterol, diastolic blood pressure, and BMI in addition to self-reported health status measured by SF-36 instrument. All clinical indicators showed statistically and clinically significant improvements. Only vitality and mental health showed statistically significant improvements in health status. Weak to moderate significant correlation between clinical indicators and health status was observed. Disease management can be effective at making significant clinical improvements for participants in a mixed-model HMO setting. No strong relationship between clinical indicators and health status was found. Future research is needed using a more specific health status measuring instrument and a randomized clinical trial design.
Experimental statistics for biological sciences.
Bang, Heejung; Davidian, Marie
2010-01-01
In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression," which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter, readers would understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text would be most useful if it is used as a supplemental material, while the readers take their own statistical courses or it would serve as a great reference text associated with a manual for any statistical software as a self-teaching guide.
Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy
2015-10-15
The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii) assessment of modelling the onset of transient and compression settling. Furthermore, the optimal level of model discretization both in 2-D and 1-D was undertaken. Results suggest that the iCFD model developed for the SST through the proposed methodology is able to predict solid distribution with high accuracy - taking a reasonable computational effort - when compared to multi-dimensional numerical experiments, under a wide range of flow and design conditions. iCFD tools could play a crucial role in reliably predicting systems' performance under normal and shock events. Copyright © 2015 Elsevier Ltd. All rights reserved.
McElreath, Richard; Bell, Adrian V; Efferson, Charles; Lubell, Mark; Richerson, Peter J; Waring, Timothy
2008-11-12
The existence of social learning has been confirmed in diverse taxa, from apes to guppies. In order to advance our understanding of the consequences of social transmission and evolution of behaviour, however, we require statistical tools that can distinguish among diverse social learning strategies. In this paper, we advance two main ideas. First, social learning is diverse, in the sense that individuals can take advantage of different kinds of information and combine them in different ways. Examining learning strategies for different information conditions illuminates the more detailed design of social learning. We construct and analyse an evolutionary model of diverse social learning heuristics, in order to generate predictions and illustrate the impact of design differences on an organism's fitness. Second, in order to eventually escape the laboratory and apply social learning models to natural behaviour, we require statistical methods that do not depend upon tight experimental control. Therefore, we examine strategic social learning in an experimental setting in which the social information itself is endogenous to the experimental group, as it is in natural settings. We develop statistical models for distinguishing among different strategic uses of social information. The experimental data strongly suggest that most participants employ a hierarchical strategy that uses both average observed pay-offs of options as well as frequency information, the same model predicted by our evolutionary analysis to dominate a wide range of conditions.
Wyman, Peter A; Henry, David; Knoblauch, Shannon; Brown, C Hendricks
2015-10-01
The dynamic wait-listed design (DWLD) and regression point displacement design (RPDD) address several challenges in evaluating group-based interventions when there is a limited number of groups. Both DWLD and RPDD utilize efficiencies that increase statistical power and can enhance balance between community needs and research priorities. The DWLD blocks on more time units than traditional wait-listed designs, thereby increasing the proportion of a study period during which intervention and control conditions can be compared, and can also improve logistics of implementing intervention across multiple sites and strengthen fidelity. We discuss DWLDs in the larger context of roll-out randomized designs and compare it with its cousin the Stepped Wedge design. The RPDD uses archival data on the population of settings from which intervention unit(s) are selected to create expected posttest scores for units receiving intervention, to which actual posttest scores are compared. High pretest-posttest correlations give the RPDD statistical power for assessing intervention impact even when one or a few settings receive intervention. RPDD works best when archival data are available over a number of years prior to and following intervention. If intervention units were not randomly selected, propensity scores can be used to control for non-random selection factors. Examples are provided of the DWLD and RPDD used to evaluate, respectively, suicide prevention training (QPR) in 32 schools and a violence prevention program (CeaseFire) in two Chicago police districts over a 10-year period. How DWLD and RPDD address common threats to internal and external validity, as well as their limitations, are discussed.
Wyman, Peter A.; Brown, C. Hendricks
2015-01-01
The dynamic wait-listed design (DWLD) and regression point displacement design (RPDD) address several challenges in evaluating group-based interventions when there is a limited number of groups. Both DWLD and RPDD utilize efficiencies that increase statistical power and can enhance balance between community needs and research priorities. The DWLD blocks on more time units than traditional wait-listed designs, thereby increasing the proportion of a study period during which intervention and control conditions can be compared, and can also improve logistics of implementing intervention across multiple sites and strengthen fidelity. We discuss DWLDs in the larger context of roll-out randomized designs and compare it with its cousin the Stepped Wedge design. The RPDD uses archival data on the population of settings from which intervention unit(s) are selected to create expected posttest scores for units receiving intervention, to which actual posttest scores are compared. High pretest-posttest correlations give the RPDD statistical power for assessing intervention impact even when one or a few settings receive intervention. RPDD works best when archival data are available over a number of years prior to and following intervention. If intervention units were not randomly selected, propensity scores can be used to control for nonrandom selection factors. Examples are provided of the DWLD and RPDD used to evaluate, respectively, suicide prevention training (QPR) in 32 schools and a violence prevention program (CeaseFire) in 2 Chicago police districts over a 10-year period. How DWLD and RPDD address common threats to internal and external validity, as well as their limitations, are discussed. PMID:25481512
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burnett, R.A.
A major goal of the Analysis of Large Data Sets (ALDS) research project at Pacific Northwest Laboratory (PNL) is to provide efficient data organization, storage, and access capabilities for statistical applications involving large amounts of data. As part of the effort to achieve this goal, a self-describing binary (SDB) data file structure has been designed and implemented together with a set of basic data manipulation functions and supporting SDB data access routines. Logical and physical data descriptors are stored in SDB files preceding the data values. SDB files thus provide a common data representation for interfacing diverse software components. Thismore » paper describes the various types of data descriptors and data structures permitted by the file design. Data buffering, file segmentation, and a segment overflow handler are also discussed.« less
Guidelines for Genome-Scale Analysis of Biological Rhythms.
Hughes, Michael E; Abruzzi, Katherine C; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M Fernanda; Chen, Zheng; Chiu, Joanna C; Cox, Juergen; Crowell, Alexander M; DeBruyne, Jason P; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J; Duffield, Giles E; Dunlap, Jay C; Eckel-Mahan, Kristin; Esser, Karyn A; FitzGerald, Garret A; Forger, Daniel B; Francey, Lauren J; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H; Herzel, Hanspeter; Herzog, Erik D; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J; Hurley, Jennifer M; de la Iglesia, Horacio O; Johnson, Carl; Kay, Steve A; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A; Li, Jiajia; Li, Xiaodong; Liu, Andrew C; Loros, Jennifer J; Martino, Tami A; Menet, Jerome S; Merrow, Martha; Millar, Andrew J; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N; Olmedo, Maria; Nusinow, Dmitri A; Ptáček, Louis J; Rand, David; Reddy, Akhilesh B; Robles, Maria S; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D; Rund, Samuel S C; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J; Storch, Kai-Florian; Takahashi, Joseph S; Ueda, Hiroki R; Wang, Han; Weitz, Charles; Westermark, Pål O; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B
2017-10-01
Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding "big data" that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them.
Guidelines for Genome-Scale Analysis of Biological Rhythms
Hughes, Michael E.; Abruzzi, Katherine C.; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M. Fernanda; Chen, Zheng; Chiu, Joanna C.; Cox, Juergen; Crowell, Alexander M.; DeBruyne, Jason P.; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J.; Duffield, Giles E.; Dunlap, Jay C.; Eckel-Mahan, Kristin; Esser, Karyn A.; FitzGerald, Garret A.; Forger, Daniel B.; Francey, Lauren J.; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S.; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H.; Herzel, Hanspeter; Herzog, Erik D.; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J.; Hurley, Jennifer M.; de la Iglesia, Horacio O.; Johnson, Carl; Kay, Steve A.; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A.; Li, Jiajia; Li, Xiaodong; Liu, Andrew C.; Loros, Jennifer J.; Martino, Tami A.; Menet, Jerome S.; Merrow, Martha; Millar, Andrew J.; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N.; Olmedo, Maria; Nusinow, Dmitri A.; Ptáček, Louis J.; Rand, David; Reddy, Akhilesh B.; Robles, Maria S.; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D.; Rund, Samuel S.C.; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J.; Storch, Kai-Florian; Takahashi, Joseph S.; Ueda, Hiroki R.; Wang, Han; Weitz, Charles; Westermark, Pål O.; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B.
2017-01-01
Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding “big data” that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them. PMID:29098954
NASA Technical Reports Server (NTRS)
Stutzman, Warren L.; Safaai-Jazi, A.; Pratt, Timothy; Nelson, B.; Laster, J.; Ajaz, H.
1993-01-01
Virginia Tech has performed a comprehensive propagation experiment using the Olympus satellite beacons at 12.5, 19.77, and 29.66 GHz (which we refer to as 12, 20, and 30 GHz). Four receive terminals were designed and constructed, one terminal at each frequency plus a portable one with 20 and 30 GHz receivers for microscale and scintillation studies. Total power radiometers were included in each terminal in order to set the clear air reference level for each beacon and also to predict path attenuation. More details on the equipment and the experiment design are found elsewhere. Statistical results for one year of data collection were analyzed. In addition, the following studies were performed: a microdiversity experiment in which two closely spaced 20 GHz receivers were used; a comparison of total power and Dicke switched radiometer measurements, frequency scaling of scintillations, and adaptive power control algorithm development. Statistical results are reported.
An adaptive two-stage dose-response design method for establishing proof of concept.
Franchetti, Yoko; Anderson, Stewart J; Sampson, Allan R
2013-01-01
We propose an adaptive two-stage dose-response design where a prespecified adaptation rule is used to add and/or drop treatment arms between the stages. We extend the multiple comparison procedures-modeling (MCP-Mod) approach into a two-stage design. In each stage, we use the same set of candidate dose-response models and test for a dose-response relationship or proof of concept (PoC) via model-associated statistics. The stage-wise test results are then combined to establish "global" PoC using a conditional error function. Our simulation studies showed good and more robust power in our design method compared to conventional and fixed designs.
Callous-Unemotional Interpersonal Style in DSM-V: What Does This Mean for the UK SEBD Population?
ERIC Educational Resources Information Center
Warren, Laura; Jones, Alice; Frederickson, Norah
2015-01-01
The definition of conduct disorder in "The Diagnostic and Statistical Manual of Mental Disorders" (5th ed.; DSMV) includes a new "limited prosocial emotions" specifier, designed to assist in the identification of those with more severe and persistent difficulties and the better targeting of interventions. This study set out to…
ERIC Educational Resources Information Center
Erford, Bradley T.; Giguere, Monica; Glenn, Kacie; Ciarlone, Hallie
2015-01-01
Patterns of articles published in "Professional School Counseling" (PSC) from the first 15 volumes were reviewed in this meta-study. Author characteristics (e.g., sex, employment setting, nation of domicile) and article characteristics (e.g., topic, type, design, sample, sample size, participant type, statistical procedures and…
Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets
2017-07-01
principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved...3.7 Semi-Supervised Clustering Methodology ...................................................................... 9 3.8 Robust Hypothesis Testing...dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for
NLS Handbook, 2005. National Longitudinal Surveys
ERIC Educational Resources Information Center
Bureau of Labor Statistics, 2006
2006-01-01
The National Longitudinal Surveys (NLS), sponsored by the U.S. Bureau of Labor Statistics (BLS), are a set of surveys designed to gather information at multiple points in time on the labor market experiences of groups of men and women. Each of the cohorts has been selected to represent all people living in the United States at the initial…
1986-06-30
features of computer aided design systems and statistical quality control procedures that are generic to chip sets and processes. RADIATION HARDNESS -The...System PSP Programmable Signal Processor SSI Small Scale Integration ." TOW Tube Launched, Optically Tracked, Wire Guided TTL Transistor Transitor Logic
ERIC Educational Resources Information Center
Stack, Sue; Watson, Jane
2013-01-01
There is considerable research on the difficulties students have in conceptualising individual concepts of probability and statistics (see for example, Bryant & Nunes, 2012; Jones, 2005). The unit of work developed for the action research project described in this article is specifically designed to address some of these in order to help…
Linking Competencies in Educational Settings and Measuring Growth. Research Report. ETS RR-06-12
ERIC Educational Resources Information Center
von Davier, Alina A.; Carstensen, Claus H.; von Davier, Matthias
2006-01-01
Measuring and linking competencies require special instruments, special data collection designs, and special statistical models. The measurement instruments are tests or tests forms, which can be used in the following situations: The same test can be given repeatedly; two or more parallel tests forms (i.e., forms intended to be similar in…
ERIC Educational Resources Information Center
Yang, Dazhi
2017-01-01
Background: Teaching online is a different experience from that of teaching in a face-to-face setting. Knowledge and skills developed for teaching face-to-face classes are not adequate preparation for teaching online. It is even more challenging to teach science, technology, engineering and math (STEM) courses completely online because these…
Statistically-Driven Visualizations of Student Interactions with a French Online Course Video
ERIC Educational Resources Information Center
Youngs, Bonnie L.; Prakash, Akhil; Nugent, Rebecca
2018-01-01
Logged tracking data for online courses are generally not available to instructors, students, and course designers and developers, and even if these data were available, most content-oriented instructors do not have the skill set to analyze them. Learning analytics, mined from logged course data and usually presented in the form of learning…
F-8C adaptive flight control extensions. [for maximum likelihood estimation
NASA Technical Reports Server (NTRS)
Stein, G.; Hartmann, G. L.
1977-01-01
An adaptive concept which combines gain-scheduled control laws with explicit maximum likelihood estimation (MLE) identification to provide the scheduling values is described. The MLE algorithm was improved by incorporating attitude data, estimating gust statistics for setting filter gains, and improving parameter tracking during changing flight conditions. A lateral MLE algorithm was designed to improve true air speed and angle of attack estimates during lateral maneuvers. Relationships between the pitch axis sensors inherent in the MLE design were examined and used for sensor failure detection. Design details and simulation performance are presented for each of the three areas investigated.
Computational Analysis for Rocket-Based Combined-Cycle Systems During Rocket-Only Operation
NASA Technical Reports Server (NTRS)
Steffen, C. J., Jr.; Smith, T. D.; Yungster, S.; Keller, D. J.
2000-01-01
A series of Reynolds-averaged Navier-Stokes calculations were employed to study the performance of rocket-based combined-cycle systems operating in an all-rocket mode. This parametric series of calculations were executed within a statistical framework, commonly known as design of experiments. The parametric design space included four geometric and two flowfield variables set at three levels each, for a total of 729 possible combinations. A D-optimal design strategy was selected. It required that only 36 separate computational fluid dynamics (CFD) solutions be performed to develop a full response surface model, which quantified the linear, bilinear, and curvilinear effects of the six experimental variables. The axisymmetric, Reynolds-averaged Navier-Stokes simulations were executed with the NPARC v3.0 code. The response used in the statistical analysis was created from Isp efficiency data integrated from the 36 CFD simulations. The influence of turbulence modeling was analyzed by using both one- and two-equation models. Careful attention was also given to quantify the influence of mesh dependence, iterative convergence, and artificial viscosity upon the resulting statistical model. Thirteen statistically significant effects were observed to have an influence on rocket-based combined-cycle nozzle performance. It was apparent that the free-expansion process, directly downstream of the rocket nozzle, can influence the Isp efficiency. Numerical schlieren images and particle traces have been used to further understand the physical phenomena behind several of the statistically significant results.
NASA Technical Reports Server (NTRS)
Worm, Jeffrey A.; Culas, Donald E.
1991-01-01
Computers are not designed to handle terms where uncertainty is present. To deal with uncertainty, techniques other than classical logic must be developed. This paper examines the concepts of statistical analysis, the Dempster-Shafer theory, rough set theory, and fuzzy set theory to solve this problem. The fundamentals of these theories are combined to provide the possible optimal solution. By incorporating principles from these theories, a decision-making process may be simulated by extracting two sets of fuzzy rules: certain rules and possible rules. From these rules a corresponding measure of how much we believe these rules is constructed. From this, the idea of how much a fuzzy diagnosis is definable in terms of its fuzzy attributes is studied.
Silva, D; Martins, O; Matos, S; Lopes, P; Rolo, T; Baptista, I
2015-05-01
An ex vivo model was designed to profilometrically and histologically assess root changes resulting from scaling with a new ultrasonic device, designed for bone piezoelectric surgery, in comparison with curettes. Three groups of 10 periodontal hopeless teeth were each subjected to different root instrumentation: Gracey curettes (CUR); ultrasonic piezoelectric device, Perio 100% setting, level 8 (P100); and ultrasonic piezoelectric device Surg 50% setting, level 1 (S50). After extraction, all teeth were photographed to visually assess the presence of dental calculus. The treated root surfaces were profilometrically evaluated (Ra, Rz, Rmax). Undecalcified histological sections were prepared to assess qualitative changes in cementum thickness. Statistical analysis was carried out using one-way anova test with a significance level of 95%. Both instruments proved to be effective in the complete removal of calculus. The CUR group presented the lowest Ra [2.28 μm (±0.58)] and S50 the highest [3.01 μm (±0.61)]. No statistically significant differences were detected among the three groups, for Ra, Rz and Rmax. Histologically, there was a cementum thickness reduction in all groups, being higher and more irregular in S50 group. Within the limits of this study, there were no statistically significant differences in roughness parameters analyzed between curettes and the ultrasonic piezoelectric unit. This new instrument removes a smaller amount of cementum, mainly at the Perio 100% power setting, which appears to be the least damaging. The ultrasonic device is effective in calculus removal, proving to be as effective as curettes. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The crossing statistic: dealing with unknown errors in the dispersion of Type Ia supernovae
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shafieloo, Arman; Clifton, Timothy; Ferreira, Pedro, E-mail: arman@ewha.ac.kr, E-mail: tclifton@astro.ox.ac.uk, E-mail: p.ferreira1@physics.ox.ac.uk
2011-08-01
We propose a new statistic that has been designed to be used in situations where the intrinsic dispersion of a data set is not well known: The Crossing Statistic. This statistic is in general less sensitive than χ{sup 2} to the intrinsic dispersion of the data, and hence allows us to make progress in distinguishing between different models using goodness of fit to the data even when the errors involved are poorly understood. The proposed statistic makes use of the shape and trends of a model's predictions in a quantifiable manner. It is applicable to a variety of circumstances, althoughmore » we consider it to be especially well suited to the task of distinguishing between different cosmological models using type Ia supernovae. We show that this statistic can easily distinguish between different models in cases where the χ{sup 2} statistic fails. We also show that the last mode of the Crossing Statistic is identical to χ{sup 2}, so that it can be considered as a generalization of χ{sup 2}.« less
Galvanin, Federico; Ballan, Carlo C; Barolo, Massimiliano; Bezzo, Fabrizio
2013-08-01
The use of pharmacokinetic (PK) and pharmacodynamic (PD) models is a common and widespread practice in the preliminary stages of drug development. However, PK-PD models may be affected by structural identifiability issues intrinsically related to their mathematical formulation. A preliminary structural identifiability analysis is usually carried out to check if the set of model parameters can be uniquely determined from experimental observations under the ideal assumptions of noise-free data and no model uncertainty. However, even for structurally identifiable models, real-life experimental conditions and model uncertainty may strongly affect the practical possibility to estimate the model parameters in a statistically sound way. A systematic procedure coupling the numerical assessment of structural identifiability with advanced model-based design of experiments formulations is presented in this paper. The objective is to propose a general approach to design experiments in an optimal way, detecting a proper set of experimental settings that ensure the practical identifiability of PK-PD models. Two simulated case studies based on in vitro bacterial growth and killing models are presented to demonstrate the applicability and generality of the methodology to tackle model identifiability issues effectively, through the design of feasible and highly informative experiments.
Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.
Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping
2015-06-07
Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
Fiori, Simone
2007-01-01
Bivariate statistical modeling from incomplete data is a useful statistical tool that allows to discover the model underlying two data sets when the data in the two sets do not correspond in size nor in ordering. Such situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure. PMID:18566641
efficient association study design via power-optimized tag SNP selection
HAN, BUHM; KANG, HYUN MIN; SEO, MYEONG SEONG; ZAITLEN, NOAH; ESKIN, ELEAZAR
2008-01-01
Discovering statistical correlation between causal genetic variation and clinical traits through association studies is an important method for identifying the genetic basis of human diseases. Since fully resequencing a cohort is prohibitively costly, genetic association studies take advantage of local correlation structure (or linkage disequilibrium) between single nucleotide polymorphisms (SNPs) by selecting a subset of SNPs to be genotyped (tag SNPs). While many current association studies are performed using commercially available high-throughput genotyping products that define a set of tag SNPs, choosing tag SNPs remains an important problem for both custom follow-up studies as well as designing the high-throughput genotyping products themselves. The most widely used tag SNP selection method optimizes over the correlation between SNPs (r2). However, tag SNPs chosen based on an r2 criterion do not necessarily maximize the statistical power of an association study. We propose a study design framework that chooses SNPs to maximize power and efficiently measures the power through empirical simulation. Empirical results based on the HapMap data show that our method gains considerable power over a widely used r2-based method, or equivalently reduces the number of tag SNPs required to attain the desired power of a study. Our power-optimized 100k whole genome tag set provides equivalent power to the Affymetrix 500k chip for the CEU population. For the design of custom follow-up studies, our method provides up to twice the power increase using the same number of tag SNPs as r2-based methods. Our method is publicly available via web server at http://design.cs.ucla.edu. PMID:18702637
Tolerancing aspheres based on manufacturing statistics
NASA Astrophysics Data System (ADS)
Wickenhagen, S.; Möhl, A.; Fuchs, U.
2017-11-01
A standard way of tolerancing optical elements or systems is to perform a Monte Carlo based analysis within a common optical design software package. Although, different weightings and distributions are assumed they are all counting on statistics, which usually means several hundreds or thousands of systems for reliable results. Thus, employing these methods for small batch sizes is unreliable, especially when aspheric surfaces are involved. The huge database of asphericon was used to investigate the correlation between the given tolerance values and measured data sets. The resulting probability distributions of these measured data were analyzed aiming for a robust optical tolerancing process.
Wolfson, Julian; Henn, Lisa
2014-01-01
In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging.
2014-01-01
In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging. PMID:25342953
Southard, Kelly; Jarrell, Ashley; Shattell, Mona M; McCoy, Thomas P; Bartlett, Robin; Judge, Christine A
2012-05-01
Specific efforts by hospital accreditation organizations encourage renovation of nursing stations, so nurses can better see, attend, and care for their patients. The purpose of this study was to examine the effect of nursing station design on the therapeutic milieu in an adult acute care psychiatric unit. A repeated cross-sectional, pretest-posttest design was used. Data were collected from a convenience sample of 81 patients and 25 nursing staff members who completed the Ward Atmosphere Scale. Pretest data were collected when the unit had an enclosed nursing station, and posttest data were collected after renovations to the unit created an open nursing station. No statistically significant differences were found in patient or staff perceptions of the therapeutic milieu. No increase in aggression toward staff was found, given patients' ease of access to the nursing station. More research is needed about the impact of unit design in acute care psychiatric settings. Copyright 2012, SLACK Incorporated.
Gichoya, Judy; Pearce, Chris; Wickramasinghe, Nilmini
2013-01-01
Kenya ranks among the twenty-two countries that collectively contribute about 80% of the world's Tuberculosis cases; with a 50-200 fold increased risk of tuberculosis in HIV infected persons versus non-HIV hosts. Contemporaneously, there is an increase in mobile penetration and its use to support healthcare throughout Africa. Many are skeptical that such m-health solutions are unsustainable and not scalable. We seek to design a scalable, pervasive m-health solution for Tuberculosis care to become a use case for sustainable and scalable health IT in limited resource settings. We combine agile design principles and user-centered design to develop the architecture needed for this initiative. Furthermore, the architecture runs on multiple devices integrated to deliver functionality critical for successful Health IT implementation in limited resource settings. It is anticipated that once fully implemented, the proposed m-health solution will facilitate superior monitoring and management of Tuberculosis and thereby reduce the alarming statistic regarding this disease in this region.
Design-of-experiments to Reduce Life-cycle Costs in Combat Aircraft Inlets
NASA Technical Reports Server (NTRS)
Anderson, Bernhard H.; Baust, Henry D.; Agrell, Johan
2003-01-01
It is the purpose of this study to demonstrate the viability and economy of Design- of-Experiments (DOE), to arrive at micro-secondary flow control installation designs that achieve optimal inlet performance for different mission strategies. These statistical design concepts were used to investigate the properties of "low unit strength" micro-effector installation. "Low unit strength" micro-effectors are micro-vanes, set a very low angle-of incidence, with very long chord lengths. They are designed to influence the neat wall inlet flow over an extended streamwise distance. In this study, however, the long chord lengths were replicated by a series of short chord length effectors arranged in series over multiple bands of effectors. In order to properly evaluate the performance differences between the single band extended chord length installation designs and the segmented multiband short chord length designs, both sets of installations must be optimal. Critical to achieving optimal micro-secondary flow control installation designs is the understanding of the factor interactions that occur between the multiple bands of micro-scale vane effectors. These factor interactions are best understood and brought together in an optimal manner through a structured DOE process, or more specifically Response Surface Methods (RSM).
The influence of narrative v. statistical information on perceiving vaccination risks.
Betsch, Cornelia; Ulshöfer, Corina; Renkewitz, Frank; Betsch, Tilmann
2011-01-01
Health-related information found on the Internet is increasing and impacts patient decision making, e.g. regarding vaccination decisions. In addition to statistical information (e.g. incidence rates of vaccine adverse events), narrative information is also widely available such as postings on online bulletin boards. Previous research has shown that narrative information can impact treatment decisions, even when statistical information is presented concurrently. As the determinants of this effect are largely unknown, we will vary features of the narratives to identify mechanisms through which narratives impact risk judgments. An online bulletin board setting provided participants with statistical information and authentic narratives about the occurrence and nonoccurrence of adverse events. Experiment 1 followed a single factorial design with 1, 2, or 4 narratives out of 10 reporting adverse events. Experiment 2 implemented a 2 (statistical risk 20% vs. 40%) × 2 (2/10 vs. 4/10 narratives reporting adverse events) × 2 (high vs. low richness) × 2 (high vs. low emotionality) between-subjects design. Dependent variables were perceived risk of side-effects and vaccination intentions. Experiment 1 shows an inverse relation between the number of narratives reporting adverse-events and vaccination intentions, which was mediated by the perceived risk of vaccinating. Experiment 2 showed a stronger influence of the number of narratives than of the statistical risk information. High (vs. low) emotional narratives had a greater impact on the perceived risk, while richness had no effect. The number of narratives influences risk judgments can potentially override statistical information about risk.
Response to Comments on "Ducklings imprint on the relational concept of 'same or different'".
Martinho, Antone; Kacelnik, Alex
2017-02-24
Two Comments by Hupé and by Langbein and Puppe address our choice of statistical analysis in assigning preference between sets of stimuli to individual ducklings in our paper. We believe that our analysis remains the most appropriate approach for our data and experimental design. Copyright © 2017, American Association for the Advancement of Science.
Selen, Arzu; Cruañes, Maria T; Müllertz, Anette; Dickinson, Paul A; Cook, Jack A; Polli, James E; Kesisoglou, Filippos; Crison, John; Johnson, Kevin C; Muirhead, Gordon T; Schofield, Timothy; Tsong, Yi
2010-09-01
A biopharmaceutics and Quality by Design (QbD) conference was held on June 10-12, 2009 in Rockville, Maryland, USA to provide a forum and identify approaches for enhancing product quality for patient benefit. Presentations concerned the current biopharmaceutical toolbox (i.e., in vitro, in silico, pre-clinical, in vivo, and statistical approaches), as well as case studies, and reflections on new paradigms. Plenary and breakout session discussions evaluated the current state and envisioned a future state that more effectively integrates QbD and biopharmaceutics. Breakout groups discussed the following four topics: Integrating Biopharmaceutical Assessment into the QbD Paradigm, Predictive Statistical Tools, Predictive Mechanistic Tools, and Predictive Analytical Tools. Nine priority areas, further described in this report, were identified for advancing integration of biopharmaceutics and support a more fundamentally based, integrated approach to setting product dissolution/release acceptance criteria. Collaboration among a broad range of disciplines and fostering a knowledge sharing environment that places the patient's needs as the focus of drug development, consistent with science- and risk-based spirit of QbD, were identified as key components of the path forward.
Validation of the Hospital Ethical Climate Survey for older people care.
Suhonen, Riitta; Stolt, Minna; Katajisto, Jouko; Charalambous, Andreas; Olson, Linda L
2015-08-01
The exploration of the ethical climate in the care settings for older people is highlighted in the literature, and it has been associated with various aspects of clinical practice and nurses' jobs. However, ethical climate is seldom studied in the older people care context. Valid, reliable, feasible measures are needed for the measurement of ethical climate. This study aimed to test the reliability, validity, and sensitivity of the Hospital Ethical Climate Survey in healthcare settings for older people. A non-experimental cross-sectional study design was employed, and a survey using questionnaires, including the Hospital Ethical Climate Survey was used for data collection. Data were analyzed using descriptive statistics, inferential statistics, and multivariable methods. Survey data were collected from a sample of nurses working in the care settings for older people in Finland (N = 1513, n = 874, response rate = 58%) in 2011. This study was conducted according to good scientific inquiry guidelines, and ethical approval was obtained from the university ethics committee. The mean score for the Hospital Ethical Climate Survey total was 3.85 (standard deviation = 0.56). Cronbach's alpha was 0.92. Principal component analysis provided evidence for factorial validity. LISREL provided evidence for construct validity based on goodness-of-fit statistics. Pearson's correlations of 0.68-0.90 were found between the sub-scales and the Hospital Ethical Climate Survey. The Hospital Ethical Climate Survey was found able to reveal discrimination across care settings and proved to be a valid and reliable tool for measuring ethical climate in care settings for older people and sensitive enough to reveal variations across various clinical settings. The Finnish version of the Hospital Ethical Climate Survey, used mainly in the hospital settings previously, proved to be a valid instrument to be used in the care settings for older people. Further studies are due to analyze the factor structure and some items of the Hospital Ethical Climate Survey. © The Author(s) 2014.
A Comparison of Two Balance Calibration Model Building Methods
NASA Technical Reports Server (NTRS)
DeLoach, Richard; Ulbrich, Norbert
2007-01-01
Simulated strain-gage balance calibration data is used to compare the accuracy of two balance calibration model building methods for different noise environments and calibration experiment designs. The first building method obtains a math model for the analysis of balance calibration data after applying a candidate math model search algorithm to the calibration data set. The second building method uses stepwise regression analysis in order to construct a model for the analysis. Four balance calibration data sets were simulated in order to compare the accuracy of the two math model building methods. The simulated data sets were prepared using the traditional One Factor At a Time (OFAT) technique and the Modern Design of Experiments (MDOE) approach. Random and systematic errors were introduced in the simulated calibration data sets in order to study their influence on the math model building methods. Residuals of the fitted calibration responses and other statistical metrics were compared in order to evaluate the calibration models developed with different combinations of noise environment, experiment design, and model building method. Overall, predicted math models and residuals of both math model building methods show very good agreement. Significant differences in model quality were attributable to noise environment, experiment design, and their interaction. Generally, the addition of systematic error significantly degraded the quality of calibration models developed from OFAT data by either method, but MDOE experiment designs were more robust with respect to the introduction of a systematic component of the unexplained variance.
Teacher Professional Development to Foster Authentic Student Research Experiences
NASA Astrophysics Data System (ADS)
Conn, K.; Iyengar, E.
2004-12-01
This presentation reports on a new teacher workshop design that encourages teachers to initiate and support long-term student-directed research projects in the classroom setting. Teachers were recruited and engaged in an intensive marine ecology learning experience at Shoals Marine Laboratory, Appledore Island, Maine. Part of the weeklong summer workshop was spent in field work, part in laboratory work, and part in learning experimental design and basic statistical analysis of experimental results. Teachers were presented with strategies to adapt their workshop learnings to formulate plans for initiating and managing authentic student research projects in their classrooms. The authors will report on the different considerations and constraints facing the teachers in their home school settings and teachers' progress in implementing their plans. Suggestions for replicating the workshop will be offered.
Carter, Eleanor; Whitworth, Adam
2015-04-01
'Creaming' and 'parking' are endemic concerns within quasi-marketised welfare-to-work (WTW) systems internationally, and the UK's flagship Work Programme for the long-term unemployed is something of an international pioneer of WTW delivery, based on outsourcing, payment by results and provider flexibility. In the Work Programme design, providers' incentives to 'cream' and 'park' differently positioned claimants are intended to be mitigated through the existence of nine payment groups (based on claimants' prior benefit type) into which different claimants are allocated and across which job outcome payments for providers differ. Evaluation evidence suggests however that 'creaming' and 'parking' practices remain common. This paper offers original quantitative insights into the extent of claimant variation within these payment groups, which, contrary to the government's intention, seem more likely to design in rather than design out 'creaming' and 'parking'. In response, a statistical approach to differential payment setting is explored and is shown to be a viable and more effective way to design a set of alternative and empirically grounded payment groups, offering greater predictive power and value-for-money than is the case in the current Work Programme design.
Do regional methods really help reduce uncertainties in flood frequency analyses?
NASA Astrophysics Data System (ADS)
Cong Nguyen, Chi; Payrastre, Olivier; Gaume, Eric
2013-04-01
Flood frequency analyses are often based on continuous measured series at gauge sites. However, the length of the available data sets is usually too short to provide reliable estimates of extreme design floods. To reduce the estimation uncertainties, the analyzed data sets have to be extended either in time, making use of historical and paleoflood data, or in space, merging data sets considered as statistically homogeneous to build large regional data samples. Nevertheless, the advantage of the regional analyses, the important increase of the size of the studied data sets, may be counterbalanced by the possible heterogeneities of the merged sets. The application and comparison of four different flood frequency analysis methods to two regions affected by flash floods in the south of France (Ardèche and Var) illustrates how this balance between the number of records and possible heterogeneities plays in real-world applications. The four tested methods are: (1) a local statistical analysis based on the existing series of measured discharges, (2) a local analysis valuating the existing information on historical floods, (3) a standard regional flood frequency analysis based on existing measured series at gauged sites and (4) a modified regional analysis including estimated extreme peak discharges at ungauged sites. Monte Carlo simulations are conducted to simulate a large number of discharge series with characteristics similar to the observed ones (type of statistical distributions, number of sites and records) to evaluate to which extent the results obtained on these case studies can be generalized. These two case studies indicate that even small statistical heterogeneities, which are not detected by the standard homogeneity tests implemented in regional flood frequency studies, may drastically limit the usefulness of such approaches. On the other hand, these result show that the valuation of information on extreme events, either historical flood events at gauged sites or estimated extremes at ungauged sites in the considered region, is an efficient way to reduce uncertainties in flood frequency studies.
Gao, Xing; He, Yao; Hu, Hongpu
2017-01-01
Allowing for the differences in economy development, informatization degree and characteristic of population served and so on among different community health service organizations, community health service precision fund appropriation system based on performance management is designed, which can provide support for the government to appropriate financial funds scientifically and rationally for primary care. The system has the characteristic of flexibility and practicability, in which there are five subsystems including data acquisition, parameter setting, fund appropriation, statistical analysis system and user management.
Measuring uncertainty by extracting fuzzy rules using rough sets
NASA Technical Reports Server (NTRS)
Worm, Jeffrey A.
1991-01-01
Despite the advancements in the computer industry in the past 30 years, there is still one major deficiency. Computers are not designed to handle terms where uncertainty is present. To deal with uncertainty, techniques other than classical logic must be developed. The methods are examined of statistical analysis, the Dempster-Shafer theory, rough set theory, and fuzzy set theory to solve this problem. The fundamentals of these theories are combined to possibly provide the optimal solution. By incorporating principles from these theories, a decision making process may be simulated by extracting two sets of fuzzy rules: certain rules and possible rules. From these rules a corresponding measure of how much these rules is believed is constructed. From this, the idea of how much a fuzzy diagnosis is definable in terms of a set of fuzzy attributes is studied.
Calibrated Noise Measurements with Induced Receiver Gain Fluctuations
NASA Technical Reports Server (NTRS)
Racette, Paul; Walker, David; Gu, Dazhen; Rajola, Marco; Spevacek, Ashly
2011-01-01
The lack of well-developed techniques for modeling changing statistical moments in our observations has stymied the application of stochastic process theory in science and engineering. These limitations were encountered when modeling the performance of radiometer calibration architectures and algorithms in the presence of non stationary receiver fluctuations. Analyses of measured signals have traditionally been limited to a single measurement series. Whereas in a radiometer that samples a set of noise references, the data collection can be treated as an ensemble set of measurements of the receiver state. Noise Assisted Data Analysis is a growing field of study with significant potential for aiding the understanding and modeling of non stationary processes. Typically, NADA entails adding noise to a signal to produce an ensemble set on which statistical analysis is performed. Alternatively as in radiometric measurements, mixing a signal with calibrated noise provides, through the calibration process, the means to detect deviations from the stationary assumption and thereby a measurement tool to characterize the signal's non stationary properties. Data sets comprised of calibrated noise measurements have been limited to those collected with naturally occurring fluctuations in the radiometer receiver. To examine the application of NADA using calibrated noise, a Receiver Gain Modulation Circuit (RGMC) was designed and built to modulate the gain of a radiometer receiver using an external signal. In 2010, an RGMC was installed and operated at the National Institute of Standards and Techniques (NIST) using their Noise Figure Radiometer (NFRad) and national standard noise references. The data collected is the first known set of calibrated noise measurements from a receiver with an externally modulated gain. As an initial step, sinusoidal and step-function signals were used to modulate the receiver gain, to evaluate the circuit characteristics and to study the performance of a variety of calibration algorithms. The receiver noise temperature and time-bandwidth product of the NFRad are calculated from the data. Statistical analysis using temporal-dependent calibration algorithms reveals that the natural occurring fluctuations in the receiver are stationary over long intervals (100s of seconds); however the receiver exhibits local non stationarity over the interval over which one set of reference measurements are collected. A variety of calibration algorithms have been applied to the data to assess algorithms' performance with the gain fluctuation signals. This presentation will describe the RGMC, experiment design and a comparative analysis of calibration algorithms.
System Synthesis in Preliminary Aircraft Design using Statistical Methods
NASA Technical Reports Server (NTRS)
DeLaurentis, Daniel; Mavris, Dimitri N.; Schrage, Daniel P.
1996-01-01
This paper documents an approach to conceptual and preliminary aircraft design in which system synthesis is achieved using statistical methods, specifically design of experiments (DOE) and response surface methodology (RSM). These methods are employed in order to more efficiently search the design space for optimum configurations. In particular, a methodology incorporating three uses of these techniques is presented. First, response surface equations are formed which represent aerodynamic analyses, in the form of regression polynomials, which are more sophisticated than generally available in early design stages. Next, a regression equation for an overall evaluation criterion is constructed for the purpose of constrained optimization at the system level. This optimization, though achieved in a innovative way, is still traditional in that it is a point design solution. The methodology put forward here remedies this by introducing uncertainty into the problem, resulting a solutions which are probabilistic in nature. DOE/RSM is used for the third time in this setting. The process is demonstrated through a detailed aero-propulsion optimization of a high speed civil transport. Fundamental goals of the methodology, then, are to introduce higher fidelity disciplinary analyses to the conceptual aircraft synthesis and provide a roadmap for transitioning from point solutions to probabalistic designs (and eventually robust ones).
Chenel, Marylore; Bouzom, François; Cazade, Fanny; Ogungbenro, Kayode; Aarons, Leon; Mentré, France
2008-12-01
To compare results of population PK analyses obtained with a full empirical design (FD) and an optimal sparse design (MD) in a Drug-Drug Interaction (DDI) study aiming to evaluate the potential CYP3A4 inhibitory effect of a drug in development, SX, on a reference substrate, midazolam (MDZ). Secondary aim was to evaluate the interaction of SX on MDZ in the in vivo study. Methods To compare designs, real data were analysed by population PK modelling technique using either FD or MD with NONMEM FOCEI for SX and with NONMEM FOCEI and MONOLIX SAEM for MDZ. When applicable a Wald test was performed to compare model parameter estimates, such as apparent clearance (CL/F), across designs. To conclude on the potential interaction of SX on MDZ PK, a Student paired test was applied to compare the individual PK parameters (i.e. log(AUC) and log(C(max))) obtained either by a non-compartmental approach (NCA) using FD or from empirical Bayes estimates (EBE) obtained after fitting the model separately on each treatment group using either FD or MD. For SX, whatever the design, CL/F was well estimated and no statistical differences were found between CL/F estimated values obtained with FD (CL/F = 8.2 l/h) and MD (CL/F = 8.2 l/h). For MDZ, only MONOLIX was able to estimate CL/F and to provide its standard error of estimation with MD. With MONOLIX, whatever the design and the administration setting, MDZ CL/F was well estimated and there were no statistical differences between CL/F estimated values obtained with FD (72 l/h and 40 l/h for MDZ alone and for MDZ with SX, respectively) and MD (77 l/h and 45 l/h for MDZ alone and for MDZ with SX, respectively). Whatever the approach, NCA or population PK modelling, and for the latter approach, whatever the design, MD or FD, comparison tests showed that there was a statistical difference (P < 0.0001) between individual MDZ log(AUC) obtained after MDZ administration alone and co-administered with SX. Regarding C(max), there was a statistical difference (P < 0.05) between individual MDZ log(C(max)) obtained under the 2 administration settings in all cases, except with the sparse design with MONOLIX. However, the effect on C(max) was small. Finally, SX was shown to be a moderate CYP3A4 inhibitor, which at therapeutic doses increased MDZ exposure by a factor of 2 in average and almost did not affect the C(max). The optimal sparse design enabled the estimation of CL/F of a CYP3A4 substrate and inhibitor when co-administered together and to show the interaction leading to the same conclusion as the full empirical design.
Chenel, Marylore; Bouzom, François; Cazade, Fanny; Ogungbenro, Kayode; Aarons, Leon; Mentré, France
2008-01-01
Purpose To compare results of population PK analyses obtained with a full empirical design (FD) and an optimal sparse design (MD) in a Drug-Drug Interaction (DDI) study aiming to evaluate the potential CYP3A4 inhibitory effect of a drug in development, SX, on a reference substrate, midazolam (MDZ). Secondary aim was to evaluate the interaction of SX on MDZ in the in vivo study. Methods To compare designs, real data were analysed by population PK modelling using either FD or MD with NONMEM FOCEI for SX and with NONMEM FOCEI and MONOLIX SAEM for MDZ. When applicable a Wald’s test was performed to compare model parameter estimates, such as apparent clearance (CL/F), across designs. To conclude on the potential interaction of SX on MDZ PK, a Student paired test was applied to compare the individual PK parameters (i.e. log(AUC) and log(Cmax)) obtained either by a non-compartmental approach (NCA) using FD or from empirical Bayes estimates (EBE) obtained after fitting the model separately on each treatment group using either FD or MD. Results For SX, whatever the design, CL/F was well estimated and no statistical differences were found between CL/F estimated values obtained with FD (CL/F = 8.2 L/h) and MD (CL/F = 8.2 L/h). For MDZ, only MONOLIX was able to estimate CL/F and to provide its standard error of estimation with MD. With MONOLIX, whatever the design and the administration setting, MDZ CL/F was well estimated and there were no statistical differences between CL/F estimated values obtained with FD (72 L/h and 40 L/h for MDZ alone and for MDZ with SX, respectively) and MD (77 L/h and 45 L/h for MDZ alone and for MDZ with SX, respectively). Whatever the approach, NCA or population PK modelling, and for the latter approach, whatever the design, MD or FD, comparison tests showed that there was a statistical difference (p<0.0001) between individual MDZ log(AUC) obtained after MDZ administration alone and co-administered with SX. Regarding Cmax, there was a statistical difference (p<0.05) between individual MDZ log(Cmax) obtained under the 2 administration settings in all cases, except with the sparse design with MONOLIX. However, the effect on Cmax was small. Finally, SX was shown to be a moderate CYP3A4 inhibitor, which at therapeutic doses increased MDZ exposure by a factor 2 in average and almost did not affect the Cmax. Conclusion The optimal sparse design enabled the estimation of CL/F of a CYP3A4 substrate and inhibitor when co-administered together and to show the interaction leading to the same conclusion than the full empirical design. PMID:19130187
SIGPI. Fault Tree Cut Set System Performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patenaude, C.J.
1992-01-13
SIGPI computes the probabilistic performance of complex systems by combining cut set or other binary product data with probability information on each basic event. SIGPI is designed to work with either coherent systems, where the system fails when certain combinations of components fail, or noncoherent systems, where at least one cut set occurs only if at least one component of the system is operating properly. The program can handle conditionally independent components, dependent components, or a combination of component types and has been used to evaluate responses to environmental threats and seismic events. The three data types that can bemore » input are cut set data in disjoint normal form, basic component probabilities for independent basic components, and mean and covariance data for statistically dependent basic components.« less
SIGPI. Fault Tree Cut Set System Performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patenaude, C.J.
1992-01-14
SIGPI computes the probabilistic performance of complex systems by combining cut set or other binary product data with probability information on each basic event. SIGPI is designed to work with either coherent systems, where the system fails when certain combinations of components fail, or noncoherent systems, where at least one cut set occurs only if at least one component of the system is operating properly. The program can handle conditionally independent components, dependent components, or a combination of component types and has been used to evaluate responses to environmental threats and seismic events. The three data types that can bemore » input are cut set data in disjoint normal form, basic component probabilities for independent basic components, and mean and covariance data for statistically dependent basic components.« less
[Triple-type theory of statistics and its application in the scientific research of biomedicine].
Hu, Liang-ping; Liu, Hui-gang
2005-07-20
To point out the crux of why so many people failed to grasp statistics and to bring forth a "triple-type theory of statistics" to solve the problem in a creative way. Based on the experience in long-time teaching and research in statistics, the "three-type theory" was raised and clarified. Examples were provided to demonstrate that the 3 types, i.e., expressive type, prototype and the standardized type are the essentials for people to apply statistics rationally both in theory and practice, and moreover, it is demonstrated by some instances that the "three types" are correlated with each other. It can help people to see the essence by interpreting and analyzing the problems of experimental designs and statistical analyses in medical research work. Investigations reveal that for some questions, the three types are mutually identical; for some questions, the prototype is their standardized type; however, for some others, the three types are distinct from each other. It has been shown that in some multifactor experimental researches, it leads to the nonexistence of the standardized type corresponding to the prototype at all, because some researchers have committed the mistake of "incomplete control" in setting experimental groups. This is a problem which should be solved by the concept and method of "division". Once the "triple-type" for each question is clarified, a proper experimental design and statistical method can be carried out easily. "Triple-type theory of statistics" can help people to avoid committing statistical mistakes or at least to decrease the misuse rate dramatically and improve the quality, level and speed of biomedical research during the process of applying statistics. It can also help people to improve the quality of statistical textbooks and the teaching effect of statistics and it has demonstrated how to advance biomedical statistics.
Estimating Uncertainties in the Multi-Instrument SBUV Profile Ozone Merged Data Set
NASA Technical Reports Server (NTRS)
Frith, Stacey; Stolarski, Richard
2015-01-01
The MOD data set is uniquely qualified for use in long-term ozone analysis because of its long record, high spatial coverage, and consistent instrument design and algorithm. The estimated MOD uncertainty term significantly increases the uncertainty over the statistical error alone. Trends in the post-2000 period are generally positive in the upper stratosphere, but only significant at 1-1.6 hPa. Remaining uncertainties not yet included in the Monte Carlo model are Smoothing Error ( 1 from 10 to 1 hPa) Relative calibration uncertainty between N11 and N17Seasonal cycle differences between SBUV records.
Park, Jangwoon; Ebert, Sheila M; Reed, Matthew P; Hallman, Jason J
2016-03-01
Previously published statistical models of driving posture have been effective for vehicle design but have not taken into account the effects of age. The present study developed new statistical models for predicting driving posture. Driving postures of 90 U.S. drivers with a wide range of age and body size were measured in laboratory mockup in nine package conditions. Posture-prediction models for female and male drivers were separately developed by employing a stepwise regression technique using age, body dimensions, vehicle package conditions, and two-way interactions, among other variables. Driving posture was significantly associated with age, and the effects of other variables depended on age. A set of posture-prediction models is presented for women and men. The results are compared with a previously developed model. The present study is the first study of driver posture to include a large cohort of older drivers and the first to report a significant effect of age. The posture-prediction models can be used to position computational human models or crash-test dummies for vehicle design and assessment. © 2015, Human Factors and Ergonomics Society.
Mutual interference between statistical summary perception and statistical learning.
Zhao, Jiaying; Ngo, Nhi; McKendrick, Ryan; Turk-Browne, Nicholas B
2011-09-01
The visual system is an efficient statistician, extracting statistical summaries over sets of objects (statistical summary perception) and statistical regularities among individual objects (statistical learning). Although these two kinds of statistical processing have been studied extensively in isolation, their relationship is not yet understood. We first examined how statistical summary perception influences statistical learning by manipulating the task that participants performed over sets of objects containing statistical regularities (Experiment 1). Participants who performed a summary task showed no statistical learning of the regularities, whereas those who performed control tasks showed robust learning. We then examined how statistical learning influences statistical summary perception by manipulating whether the sets being summarized contained regularities (Experiment 2) and whether such regularities had already been learned (Experiment 3). The accuracy of summary judgments improved when regularities were removed and when learning had occurred in advance. In sum, calculating summary statistics impeded statistical learning, and extracting statistical regularities impeded statistical summary perception. This mutual interference suggests that statistical summary perception and statistical learning are fundamentally related.
ERIC Educational Resources Information Center
Canadian Council on Learning, 2009
2009-01-01
A significant proportion of Canada's school-age population requires special educational provisions. Statistics Canada reports that 4.6% of 5- to 14-year-olds had some kind of disability in 2006. As well, recent data from the British Columbia and Ontario ministries of education indicate that students with designated special educational needs…
Reducing the Language Content in ToM Tests: A Developmental Scale
ERIC Educational Resources Information Center
Burnel, Morgane; Perrone-Bertolotti, Marcela; Reboul, Anne; Baciu, Monica; Durrleman, Stephanie
2018-01-01
The goal of the current study was to statistically evaluate the reliable scalability of a set of tasks designed to assess Theory of Mind (ToM) without language as a confounding variable. This tool might be useful to study ToM in populations where language is impaired or to study links between language and ToM. Low verbal versions of the ToM tasks…
ERIC Educational Resources Information Center
Mantri, Archana
2014-01-01
The intent of the study presented in this paper is to show that the model of problem-based learning (PBL) can be made scalable by designing curriculum around a set of open-ended problems (OEPs). The detailed statistical analysis of the data collected to measure the effects of traditional and PBL instructions for three courses in Electronics and…
Demystification of Bell inequality
NASA Astrophysics Data System (ADS)
Khrennikov, Andrei
2009-08-01
The main aim of this review is to show that the common conclusion that Bell's argument implies that any attempt to proceed beyond quantum mechanics induces a nonlocal model was not totally justified. Our analysis of Bell's argument demonstrates that violation of Bell's inequality implies neither "death of realism" nor nonlocality. This violation is just a sign of non-Kolmogorovness of statistical data - impossibility to put statistical data collected in a few different experiments (corresponding to incompatible settings of polarization beam splitters) in one probability space. This inequality was well known in theoretical probability since 19th century (from works of Boole). We couple non-Kolmogorovness of data with design of modern detectors of photons.
Bréant, C; Borst, F; Campi, D; Griesser, V; Momjian, S
1999-01-01
The use of a controlled vocabulary set in a hospital-wide clinical information system is of crucial importance for many departmental database systems to communicate and exchange information. In the absence of an internationally recognized clinical controlled vocabulary set, a new extension of the International statistical Classification of Diseases (ICD) is proposed. It expands the scope of the standard ICD beyond diagnosis and procedures to clinical terminology. In addition, the common Clinical Findings Dictionary (CFD) further records the definition of clinical entities. The construction of the vocabulary set and the CFD is incremental and manual. Tools have been implemented to facilitate the tasks of defining/maintaining/publishing dictionary versions. The design of database applications in the integrated clinical information system is driven by the CFD which is part of the Medical Questionnaire Designer tool. Several integrated clinical database applications in the field of diabetes and neuro-surgery have been developed at the HUG.
Bréant, C.; Borst, F.; Campi, D.; Griesser, V.; Momjian, S.
1999-01-01
The use of a controlled vocabulary set in a hospital-wide clinical information system is of crucial importance for many departmental database systems to communicate and exchange information. In the absence of an internationally recognized clinical controlled vocabulary set, a new extension of the International statistical Classification of Diseases (ICD) is proposed. It expands the scope of the standard ICD beyond diagnosis and procedures to clinical terminology. In addition, the common Clinical Findings Dictionary (CFD) further records the definition of clinical entities. The construction of the vocabulary set and the CFD is incremental and manual. Tools have been implemented to facilitate the tasks of defining/maintaining/publishing dictionary versions. The design of database applications in the integrated clinical information system is driven by the CFD which is part of the Medical Questionnaire Designer tool. Several integrated clinical database applications in the field of diabetes and neuro-surgery have been developed at the HUG. Images Figure 1 PMID:10566451
Neuroimaging in aphasia treatment research: Consensus and practical guidelines for data analysis
Meinzer, Marcus; Beeson, Pélagie M.; Cappa, Stefano; Crinion, Jenny; Kiran, Swathi; Saur, Dorothee; Parrish, Todd; Crosson, Bruce; Thompson, Cynthia K.
2012-01-01
Functional magnetic resonance imaging is the most widely used imaging technique to study treatment-induced recovery in post-stroke aphasia. The longitudinal design of such studies adds to the challenges researchers face when studying patient populations with brain damage in cross-sectional settings. The present review focuses on issues specifically relevant to neuroimaging data analysis in aphasia treatment research identified in discussions among international researchers at the Neuroimaging in Aphasia Treatment Research Workshop held at Northwestern University (Evanston, Illinois, USA). In particular, we aim to provide the reader with a critical review of unique problems related to the pre-processing, statistical modeling and interpretation of such data sets. Despite the fact that data analysis procedures critically depend on specific design features of a given study, we aim to discuss and communicate a basic set of practical guidelines that should be applicable to a wide range of studies and useful as a reference for researchers pursuing this line of research. PMID:22387474
Bayesian Dose-Response Modeling in Sparse Data
NASA Astrophysics Data System (ADS)
Kim, Steven B.
This book discusses Bayesian dose-response modeling in small samples applied to two different settings. The first setting is early phase clinical trials, and the second setting is toxicology studies in cancer risk assessment. In early phase clinical trials, experimental units are humans who are actual patients. Prior to a clinical trial, opinions from multiple subject area experts are generally more informative than the opinion of a single expert, but we may face a dilemma when they have disagreeing prior opinions. In this regard, we consider compromising the disagreement and compare two different approaches for making a decision. In addition to combining multiple opinions, we also address balancing two levels of ethics in early phase clinical trials. The first level is individual-level ethics which reflects the perspective of trial participants. The second level is population-level ethics which reflects the perspective of future patients. We extensively compare two existing statistical methods which focus on each perspective and propose a new method which balances the two conflicting perspectives. In toxicology studies, experimental units are living animals. Here we focus on a potential non-monotonic dose-response relationship which is known as hormesis. Briefly, hormesis is a phenomenon which can be characterized by a beneficial effect at low doses and a harmful effect at high doses. In cancer risk assessments, the estimation of a parameter, which is known as a benchmark dose, can be highly sensitive to a class of assumptions, monotonicity or hormesis. In this regard, we propose a robust approach which considers both monotonicity and hormesis as a possibility. In addition, We discuss statistical hypothesis testing for hormesis and consider various experimental designs for detecting hormesis based on Bayesian decision theory. Past experiments have not been optimally designed for testing for hormesis, and some Bayesian optimal designs may not be optimal under a wrong parametric assumption. In this regard, we consider a robust experimental design which does not require any parametric assumption.
Pinheiro, Rubiane C; Soares, Cleide M F; de Castro, Heizir F; Moraes, Flavio F; Zanin, Gisella M
2008-03-01
The conditions for maximization of the enzymatic activity of lipase entrapped in sol-gel matrix were determined for different vegetable oils using an experimental design. The effects of pH, temperature, and biocatalyst loading on lipase activity were verified using a central composite experimental design leading to a set of 13 assays and the surface response analysis. For canola oil and entrapped lipase, statistical analyses showed significant effects for pH and temperature and also the interactions between pH and temperature and temperature and biocatalyst loading. For the olive oil and entrapped lipase, it was verified that the pH was the only variable statistically significant. This study demonstrated that response surface analysis is a methodology appropriate for the maximization of the percentage of hydrolysis, as a function of pH, temperature, and lipase loading.
Design of novel quinazolinone derivatives as inhibitors for 5HT7 receptor.
Chitta, Aparna; Jatavath, Mohan Babu; Fatima, Sabiha; Manga, Vijjulatha
2012-02-01
To study the pharmacophore properties of quinazolinone derivatives as 5HT(7) inhibitors, 3D QSAR methodologies, namely Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA) were applied, partial least square (PLS) analysis was performed and QSAR models were generated. The derived model showed good statistical reliability in terms of predicting the 5HT(7) inhibitory activity of the quinazolione derivative, based on molecular property fields like steric, electrostatic, hydrophobic, hydrogen bond donor and hydrogen bond acceptor fields. This is evident from statistical parameters like q(2) (cross validated correlation coefficient) of 0.642, 0.602 and r(2) (conventional correlation coefficient) of 0.937, 0.908 for CoMFA and CoMSIA respectively. The predictive ability of the models to determine 5HT(7) antagonistic activity is validated using a test set of 26 molecules that were not included in the training set and the predictive r(2) obtained for the test set was 0.512 & 0.541. Further, the results of the derived model are illustrated by means of contour maps, which give an insight into the interaction of the drug with the receptor. The molecular fields so obtained served as the basis for the design of twenty new ligands. In addition, ADME (Adsorption, Distribution, Metabolism and Elimination) have been calculated in order to predict the relevant pharmaceutical properties, and the results are in conformity with required drug like properties.
NASA Technical Reports Server (NTRS)
Falls, L. W.; Crutcher, H. L.
1976-01-01
Transformation of statistics from a dimensional set to another dimensional set involves linear functions of the original set of statistics. Similarly, linear functions will transform statistics within a dimensional set such that the new statistics are relevant to a new set of coordinate axes. A restricted case of the latter is the rotation of axes in a coordinate system involving any two correlated random variables. A special case is the transformation for horizontal wind distributions. Wind statistics are usually provided in terms of wind speed and direction (measured clockwise from north) or in east-west and north-south components. A direct application of this technique allows the determination of appropriate wind statistics parallel and normal to any preselected flight path of a space vehicle. Among the constraints for launching space vehicles are critical values selected from the distribution of the expected winds parallel to and normal to the flight path. These procedures are applied to space vehicle launches at Cape Kennedy, Florida.
Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.
Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira
2016-01-01
Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.
NASA Technical Reports Server (NTRS)
Anderson, Bernhard H.; Baust, Henry D.; Agrell, Johan
2002-01-01
It is the purpose of this study to demonstrate the viability and economy of Response Surface Methods (RSM) and Robustness Design Concepts (RDC) to arrive at micro-secondary flow control installation designs that maintain optimal inlet performance over a range of the mission variables. These statistical design concepts were used to investigate the robustness properties of 'low unit strength' micro-effector installations. 'Low unit strength' micro-effectors are micro-vanes set at very low angles-of-incidence with very long chord lengths. They were designed to influence the near wall inlet flow over an extended streamwise distance, and their advantage lies in low total pressure loss and high effectiveness in managing engine face distortion.
NASA Astrophysics Data System (ADS)
Mantri, Archana
2014-05-01
The intent of the study presented in this paper is to show that the model of problem-based learning (PBL) can be made scalable by designing curriculum around a set of open-ended problems (OEPs). The detailed statistical analysis of the data collected to measure the effects of traditional and PBL instructions for three courses in Electronics and Communication Engineering, namely Analog Electronics, Digital Electronics and Pulse, Digital & Switching Circuits is presented here. It measures the effects of pedagogy, gender and cognitive styles on the knowledge, skill and attitude of the students. The study was conducted two times with content designed around same set of OEPs but with two different trained facilitators for all the three courses. The repeatability of results for effects of the independent parameters on dependent parameters is studied and inferences are drawn.
Methods and apparatuses for information analysis on shared and distributed computing systems
Bohn, Shawn J [Richland, WA; Krishnan, Manoj Kumar [Richland, WA; Cowley, Wendy E [Richland, WA; Nieplocha, Jarek [Richland, WA
2011-02-22
Apparatuses and computer-implemented methods for analyzing, on shared and distributed computing systems, information comprising one or more documents are disclosed according to some aspects. In one embodiment, information analysis can comprise distributing one or more distinct sets of documents among each of a plurality of processes, wherein each process performs operations on a distinct set of documents substantially in parallel with other processes. Operations by each process can further comprise computing term statistics for terms contained in each distinct set of documents, thereby generating a local set of term statistics for each distinct set of documents. Still further, operations by each process can comprise contributing the local sets of term statistics to a global set of term statistics, and participating in generating a major term set from an assigned portion of a global vocabulary.
Recommendations for evaluation of computational methods
NASA Astrophysics Data System (ADS)
Jain, Ajay N.; Nicholls, Anthony
2008-03-01
The field of computational chemistry, particularly as applied to drug design, has become increasingly important in terms of the practical application of predictive modeling to pharmaceutical research and development. Tools for exploiting protein structures or sets of ligands known to bind particular targets can be used for binding-mode prediction, virtual screening, and prediction of activity. A serious weakness within the field is a lack of standards with respect to quantitative evaluation of methods, data set preparation, and data set sharing. Our goal should be to report new methods or comparative evaluations of methods in a manner that supports decision making for practical applications. Here we propose a modest beginning, with recommendations for requirements on statistical reporting, requirements for data sharing, and best practices for benchmark preparation and usage.
Politis, Stavros N; Rekkas, Dimitrios M
2017-04-01
A novel hot melt direct pelletization method was developed, characterized and optimized, using statistical thinking and experimental design tools. Mixtures of carnauba wax (CW) and HPMC K100M were spheronized using melted gelucire 50-13 as a binding material (BM). Experimentation was performed sequentially; a fractional factorial design was set up initially to screen the factors affecting the process, namely spray rate, quantity of BM, rotor speed, type of rotor disk, lubricant-glidant presence, additional spheronization time, powder feeding rate and quantity. From the eight factors assessed, three were further studied during process optimization (spray rate, quantity of BM and powder feeding rate), at different ratios of the solid mixture of CW and HPMC K100M. The study demonstrated that the novel hot melt process is fast, efficient, reproducible and predictable. Therefore, it can be adopted in a lean and agile manufacturing setting for the production of flexible pellet dosage forms with various release rates easily customized between immediate and modified delivery.
Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks
2017-01-01
In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery. PMID:29392184
Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI).
Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur
2016-01-01
We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non-expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI's robustness and sensitivity in capturing useful data relating to the students' conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. © 2016 T. Deane et al. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berman, D.W.; Allen, B.C.; Van Landingham, C.B.
1998-12-31
The decision rules commonly employed to determine the need for cleanup are evaluated both to identify conditions under which they lead to erroneous conclusions and to quantify the rate that such errors occur. Their performance is also compared with that of other applicable decision rules. The authors based the evaluation of decision rules on simulations. Results are presented as power curves. These curves demonstrate that the degree of statistical control achieved is independent of the form of the null hypothesis. The loss of statistical control that occurs when a decision rule is applied to a data set that does notmore » satisfy the rule`s validity criteria is also clearly demonstrated. Some of the rules evaluated do not offer the formal statistical control that is an inherent design feature of other rules. Nevertheless, results indicate that such informal decision rules may provide superior overall control of error rates, when their application is restricted to data exhibiting particular characteristics. The results reported here are limited to decision rules applied to uncensored and lognormally distributed data. To optimize decision rules, it is necessary to evaluate their behavior when applied to data exhibiting a range of characteristics that bracket those common to field data. The performance of decision rules applied to data sets exhibiting a broader range of characteristics is reported in the second paper of this study.« less
How to Get Statistically Significant Effects in Any ERP Experiment (and Why You Shouldn’t)
Luck, Steven J.; Gaspelin, Nicholas
2016-01-01
Event-related potential (ERP) experiments generate massive data sets, often containing thousands of values for each participant, even after averaging. The richness of these data sets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant-but-bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand average data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multi-factor statistical analyses. Re-analyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant-but-bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. PMID:28000253
Gramatikov, Boris I
2017-04-27
Reliable detection of central fixation and eye alignment is essential in the diagnosis of amblyopia ("lazy eye"), which can lead to blindness. Our lab has developed and reported earlier a pediatric vision screener that performs scanning of the retina around the fovea and analyzes changes in the polarization state of light as the scan progresses. Depending on the direction of gaze and the instrument design, the screener produces several signal frequencies that can be utilized in the detection of central fixation. The objective of this study was to compare artificial neural networks with classical statistical methods, with respect to their ability to detect central fixation reliably. A classical feedforward, pattern recognition, two-layer neural network architecture was used, consisting of one hidden layer and one output layer. The network has four inputs, representing normalized spectral powers at four signal frequencies generated during retinal birefringence scanning. The hidden layer contains four neurons. The output suggests presence or absence of central fixation. Backpropagation was used to train the network, using the gradient descent algorithm and the cross-entropy error as the performance function. The network was trained, validated and tested on a set of controlled calibration data obtained from 600 measurements from ten eyes in a previous study, and was additionally tested on a clinical set of 78 eyes, independently diagnosed by an ophthalmologist. In the first part of this study, a neural network was designed around the calibration set. With a proper architecture and training, the network provided performance that was comparable to classical statistical methods, allowing perfect separation between the central and paracentral fixation data, with both the sensitivity and the specificity of the instrument being 100%. In the second part of the study, the neural network was applied to the clinical data. It allowed reliable separation between normal subjects and affected subjects, its accuracy again matching that of the statistical methods. With a proper choice of a neural network architecture and a good, uncontaminated training data set, the artificial neural network can be an efficient classification tool for detecting central fixation based on retinal birefringence scanning.
Yang, Yi; Tokita, Midori; Ishiguchi, Akira
2018-01-01
A number of studies revealed that our visual system can extract different types of summary statistics, such as the mean and variance, from sets of items. Although the extraction of such summary statistics has been studied well in isolation, the relationship between these statistics remains unclear. In this study, we explored this issue using an individual differences approach. Observers viewed illustrations of strawberries and lollypops varying in size or orientation and performed four tasks in a within-subject design, namely mean and variance discrimination tasks with size and orientation domains. We found that the performances in the mean and variance discrimination tasks were not correlated with each other and demonstrated that extractions of the mean and variance are mediated by different representation mechanisms. In addition, we tested the relationship between performances in size and orientation domains for each summary statistic (i.e. mean and variance) and examined whether each summary statistic has distinct processes across perceptual domains. The results illustrated that statistical summary representations of size and orientation may share a common mechanism for representing the mean and possibly for representing variance. Introspections for each observer performing the tasks were also examined and discussed.
Assessing the Power of Exome Chips.
Page, Christian Magnus; Baranzini, Sergio E; Mevik, Bjørn-Helge; Bos, Steffan Daniel; Harbo, Hanne F; Andreassen, Bettina Kulle
2015-01-01
Genotyping chips for rare and low-frequent variants have recently gained popularity with the introduction of exome chips, but the utility of these chips remains unclear. These chips were designed using exome sequencing data from mainly American-European individuals, enriched for a narrow set of common diseases. In addition, it is well-known that the statistical power of detecting associations with rare and low-frequent variants is much lower compared to studies exclusively involving common variants. We developed a simulation program adaptable to any exome chip design to empirically evaluate the power of the exome chips. We implemented the main properties of the Illumina HumanExome BeadChip array. The simulated data sets were used to assess the power of exome chip based studies for varying effect sizes and causal variant scenarios. We applied two widely-used statistical approaches for rare and low-frequency variants, which collapse the variants into genetic regions or genes. Under optimal conditions, we found that a sample size between 20,000 to 30,000 individuals were needed in order to detect modest effect sizes (0.5% < PAR > 1%) with 80% power. For small effect sizes (PAR <0.5%), 60,000-100,000 individuals were needed in the presence of non-causal variants. In conclusion, we found that at least tens of thousands of individuals are necessary to detect modest effects under optimal conditions. In addition, when using rare variant chips on cohorts or diseases they were not originally designed for, the identification of associated variants or genes will be even more challenging.
NASA Technical Reports Server (NTRS)
Michaud, N. H.
1979-01-01
A system of independent computer programs for the processing of digitized pulse code modulated (PCM) and frequency modulated (FM) data is described. Information is stored in a set of random files and accessed to produce both statistical and graphical output. The software system is designed primarily to present these reports within a twenty-four hour period for quick analysis of the helicopter's performance.
ERIC Educational Resources Information Center
Nam, Eun-Woo; Song, Yea-Li-A
2007-01-01
Objective: This study attempts to provide fundamental information to help with the development of health policy and health services by looking at the trends of the gender-specific mortality rates in Korea and Japan. Design: The death statistics of Korea and Japan over the 21-year period from 1983 to 2003 are analyzed. Setting: We used the death…
Criteria for Choosing the Best Neural Network: Part 1
1991-07-24
Touretzky, pp. 177-185. San Mateo: Morgan Kaufmann. Harp, S.A., Samad , T., and Guha, A . (1990). Designing application-specific neural networks using genetic...determining a parsimonious neural network for use in prediction/generalization based on a given fixed learning sample. Both the classification and...statistical settings, algorithms for selecting the number of hidden layer nodes in a three layer, feedforward neural network are presented. The selection
ERIC Educational Resources Information Center
Yang, Yu N.; Li, Yuan H.; Tompkins, Leroy J.; Modarresi, Shahpar
2005-01-01
This summative evaluation of magnet programs employed a quasi-experimental design to investigate whether or not students enrolled in magnet programs gained any achievement advantage over students who were not enrolled in a magnet program. Researchers used Zero-One Linear Programming to draw multiple sets of matched samples from the non-magnet…
Designing high speed diagnostics
NASA Astrophysics Data System (ADS)
Veliz Carrillo, Gerardo; Martinez, Adam; Mula, Swathi; Prestridge, Kathy; Extreme Fluids Team Team
2017-11-01
Timing and firing for shock-driven flows is complex because of jitter in the shock tube mechanical drivers. Consequently, experiments require dynamic triggering of diagnostics from pressure transducers. We explain the design process and criteria for setting up re-shock experiments at the Los Alamos Vertical Shock Tube facility, and the requirements for particle image velocimetry and planar laser induced fluorescence measurements necessary for calculating Richtmeyer-Meshkov variable density turbulent statistics. Dynamic triggering of diagnostics allows for further investigation of the development of the Richtemeyer-Meshkov instability at both initial shock and re-shock. Thanks to the Los Alamos National Laboratory for funding our project.
Region 9 2010 Census Web Service
This web service displays data collected during the 2010 U.S. Census. The data are organized into layers representing Tract, Block, and Block Group visualizations. Geography The TIGER Line Files are feature classes and related database files that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status
Mirmohammadi, Seyyed Jalil; Hafezi, Rahmatollah; Mehrparvar, Amir Houshang; Gerdfaramarzi, Raziyeh Soltani; Mostaghaci, Mehrdad; Nodoushan, Reza Jafari; Rezaeian, Bibiseyedeh
2013-01-01
Anthropometric data can be used to identify the physical dimensions of equipment, furniture, clothing and workstations. The use of poorly designed furniture that fails to fulfil the users' anthropometric dimensions, has a negative impact on human health. In this study, we measured some anthropometric dimensions of Iranian children from different ethnicities. A total of 12,731 Iranian primary school children aged 7-11 years were included in the study and their static anthropometric dimensions were measured. Descriptive statistics such as mean, standard deviation and key percentiles were calculated. All dimensions were compared among different ethnicities and different genders. This study showed significant differences in a set of 22 anthropometric dimensions with regard to gender, age and ethnicity. Turk boys and Arab girls were larger than their contemporaries in different ages. According to the results of this study, difference between genders and among different ethnicities should be taken into account by designers and manufacturers of school furniture. In this study, we measured 22 static anthropometric dimensions of 12,731 Iranian primary school children aged 7-11 years from different ethnicities. Descriptive statistics such as mean, standard deviation and key percentiles were measured for each dimension. This study showed significant differences in a set of 22 anthropometric dimensions in different genders, ages and ethnicities.
On sufficient statistics of least-squares superposition of vector sets.
Konagurthu, Arun S; Kasarapu, Parthan; Allison, Lloyd; Collier, James H; Lesk, Arthur M
2015-06-01
The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from its constituent vector sets under addition or deletion operation, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
Strategies Used by Students to Compare Two Data Sets
ERIC Educational Resources Information Center
Reaburn, Robyn
2012-01-01
One of the common tasks of inferential statistics is to compare two data sets. Long before formal statistical procedures, however, students can be encouraged to make comparisons between data sets and therefore build up intuitive statistical reasoning. Such tasks also give meaning to the data collection students may do. This study describes the…
Usage and Design Evaluation by Family Caregivers of aStroke Intervention Website
Pierce, Linda L.; Steiner, Victoria
2013-01-01
Background Four out of 5 families are affected by stroke. Many caregivers access the Internet and gather healthcare information from web-based sources. Design The purpose of this descriptive evaluation was to assess the usage and design of the Caring~Web© site, which provides education/support for family caregivers of persons with stroke residing in home settings. Sample and Setting Thirty-six caregivers from two Midwest states accessed this intervention in a 1-year study. The average participant was fifty-four years of age, white, female, and the spouse of the care recipient. Methods In a telephone interview, four website questions were asked twice-/bi-monthly and a 33-item Survey at the conclusion of the study evaluated the website usage and design of its components. Descriptive analysis methods were used and statistics were collected on the number of visits to the website. Results On average, participants logged on to the website one to two hours per week, although usage declined after several months for some participants. Participants positively rated the website’s appearance and usability that included finding the training to be adequate. Conclusion Website designers can replicate this intervention for other health conditions. PMID:24025464
Reproducible detection of disease-associated markers from gene expression data.
Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto
2016-08-18
Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. We derive the sign-sum statistic for getting a robust gene ranking. The sign-sum statistic gives more reproducible ranking than the t-statistic. Using simulated data sets we show that the sign-sum statistic excludes hetero-type genes well. Also for the real data sets, the sign-sum statistic performs well in a viewpoint of ranking reproducibility.
Nagashima, Hiroaki; Watari, Akiko; Shinoda, Yasuharu; Okamoto, Hiroshi; Takuma, Shinya
2013-12-01
This case study describes the application of Quality by Design elements to the process of culturing Chinese hamster ovary cells in the production of a monoclonal antibody. All steps in the cell culture process and all process parameters in each step were identified by using a cause-and-effect diagram. Prospective risk assessment using failure mode and effects analysis identified the following four potential critical process parameters in the production culture step: initial viable cell density, culture duration, pH, and temperature. These parameters and lot-to-lot variability in raw material were then evaluated by process characterization utilizing a design of experiments approach consisting of a face-centered central composite design integrated with a full factorial design. Process characterization was conducted using a scaled down model that had been qualified by comparison with large-scale production data. Multivariate regression analysis was used to establish statistical prediction models for performance indicators and quality attributes; with these, we constructed contour plots and conducted Monte Carlo simulation to clarify the design space. The statistical analyses, especially for raw materials, identified set point values, which were most robust with respect to the lot-to-lot variability of raw materials while keeping the product quality within the acceptance criteria. © 2013 Wiley Periodicals, Inc. and the American Pharmacists Association.
Design of experiments (DoE) in pharmaceutical development.
N Politis, Stavros; Colombo, Paolo; Colombo, Gaia; M Rekkas, Dimitrios
2017-06-01
At the beginning of the twentieth century, Sir Ronald Fisher introduced the concept of applying statistical analysis during the planning stages of research rather than at the end of experimentation. When statistical thinking is applied from the design phase, it enables to build quality into the product, by adopting Deming's profound knowledge approach, comprising system thinking, variation understanding, theory of knowledge, and psychology. The pharmaceutical industry was late in adopting these paradigms, compared to other sectors. It heavily focused on blockbuster drugs, while formulation development was mainly performed by One Factor At a Time (OFAT) studies, rather than implementing Quality by Design (QbD) and modern engineering-based manufacturing methodologies. Among various mathematical modeling approaches, Design of Experiments (DoE) is extensively used for the implementation of QbD in both research and industrial settings. In QbD, product and process understanding is the key enabler of assuring quality in the final product. Knowledge is achieved by establishing models correlating the inputs with the outputs of the process. The mathematical relationships of the Critical Process Parameters (CPPs) and Material Attributes (CMAs) with the Critical Quality Attributes (CQAs) define the design space. Consequently, process understanding is well assured and rationally leads to a final product meeting the Quality Target Product Profile (QTPP). This review illustrates the principles of quality theory through the work of major contributors, the evolution of the QbD approach and the statistical toolset for its implementation. As such, DoE is presented in detail since it represents the first choice for rational pharmaceutical development.
NASA Astrophysics Data System (ADS)
Dixon, K. W.; Balaji, V.; Lanzante, J.; Radhakrishnan, A.; Hayhoe, K.; Stoner, A. K.; Gaitan, C. F.
2013-12-01
Statistical downscaling (SD) methods may be viewed as generating a value-added product - a refinement of global climate model (GCM) output designed to add finer scale detail and to address GCM shortcomings via a process that gleans information from a combination of observations and GCM-simulated climate change responses. Making use of observational data sets and GCM simulations representing the same historical period, cross-validation techniques allow one to assess how well an SD method meets this goal. However, lacking observations of future, the extent to which a particular SD method's skill might degrade when applied to future climate projections cannot be assessed in the same manner. Here we illustrate and describe extensions to a 'perfect model' experimental design that seeks to quantify aspects of SD method performance both for a historical period (1979-2008) and for late 21st century climate projections. Examples highlighting cases in which downscaling performance deteriorates in future climate projections will be discussed. Also, results will be presented showing how synthetic datasets having known statistical properties may be used to further isolate factors responsible for degradations in SD method skill under changing climatic conditions. We will describe a set of input files used to conduct these analyses that are being made available to researchers who wish to utilize this experimental framework to evaluate SD methods they have developed. The gridded data sets cover a region centered on the contiguous 48 United States with a grid spacing of approximately 25km, have daily time resolution (e.g., maximum and minimum near-surface temperature and precipitation), and represent a total of 120 years of model simulations. This effort is consistent with the 2013 National Climate Predictions and Projections Platform Quantitative Evaluation of Downscaling Workshop goal of supporting a community approach to promote the informed use of downscaled climate projections.
Density by moduli and Wijsman lacunary statistical convergence of sequences of sets.
Bhardwaj, Vinod K; Dhawan, Shweta
2017-01-01
The main object of this paper is to introduce and study a new concept of f -Wijsman lacunary statistical convergence of sequences of sets, where f is an unbounded modulus. The definition of Wijsman lacunary strong convergence of sequences of sets is extended to a definition of Wijsman lacunary strong convergence with respect to a modulus for sequences of sets and it is shown that, under certain conditions on a modulus f , the concepts of Wijsman lacunary strong convergence with respect to a modulus f and f -Wijsman lacunary statistical convergence are equivalent on bounded sequences. We further characterize those θ for which [Formula: see text], where [Formula: see text] and [Formula: see text] denote the sets of all f -Wijsman lacunary statistically convergent sequences and f -Wijsman statistically convergent sequences, respectively.
Bonn, Bernadine A.
2008-01-01
A long-term method detection level (LT-MDL) and laboratory reporting level (LRL) are used by the U.S. Geological Survey?s National Water Quality Laboratory (NWQL) when reporting results from most chemical analyses of water samples. Changing to this method provided data users with additional information about their data and often resulted in more reported values in the low concentration range. Before this method was implemented, many of these values would have been censored. The use of the LT-MDL and LRL presents some challenges for the data user. Interpreting data in the low concentration range increases the need for adequate quality assurance because even small contamination or recovery problems can be relatively large compared to concentrations near the LT-MDL and LRL. In addition, the definition of the LT-MDL, as well as the inclusion of low values, can result in complex data sets with multiple censoring levels and reported values that are less than a censoring level. Improper interpretation or statistical manipulation of low-range results in these data sets can result in bias and incorrect conclusions. This document is designed to help data users use and interpret data reported with the LTMDL/ LRL method. The calculation and application of the LT-MDL and LRL are described. This document shows how to extract statistical information from the LT-MDL and LRL and how to use that information in USGS investigations, such as assessing the quality of field data, interpreting field data, and planning data collection for new projects. A set of 19 detailed examples are included in this document to help data users think about their data and properly interpret lowrange data without introducing bias. Although this document is not meant to be a comprehensive resource of statistical methods, several useful methods of analyzing censored data are demonstrated, including Regression on Order Statistics and Kaplan-Meier Estimation. These two statistical methods handle complex censored data sets without resorting to substitution, thereby avoiding a common source of bias and inaccuracy.
Intervening to promote early initiation of breastfeeding in the LDR.
Komara, Carol; Simpson, Diana; Teasdale, Carla; Whalen, Gaye; Bell, Shay; Giovanetto, Laurie
2007-01-01
To evaluate the effectiveness of an interventional protocol for the early initiation of breastfeeding that would remove barriers in the labor, delivery, recovery (LDR) unit. Descriptive design using 100 postpartum mothers who were interviewed before discharge at a large university hospital in the south-central United States. Descriptive statistics were used for analysis. The protocol was effective for initiating breastfeeding, and breastfeeding increased from 53% to 66%. When barriers to breastfeeding are reduced in the LDR setting, women will breastfeed. It is possible that reducing hospital barriers to breastfeeding in the LDR can also set the stage for sustained breastfeeding during hospitalization and for less supplementation with formula.
Notes on numerical reliability of several statistical analysis programs
Landwehr, J.M.; Tasker, Gary D.
1999-01-01
This report presents a benchmark analysis of several statistical analysis programs currently in use in the USGS. The benchmark consists of a comparison between the values provided by a statistical analysis program for variables in the reference data set ANASTY and their known or calculated theoretical values. The ANASTY data set is an amendment of the Wilkinson NASTY data set that has been used in the statistical literature to assess the reliability (computational correctness) of calculated analytical results.
Kim, Youngwoo; Hong, Byung Woo; Kim, Seung Ja; Kim, Jong Hyo
2014-07-01
A major challenge when distinguishing glandular tissues on mammograms, especially for area-based estimations, lies in determining a boundary on a hazy transition zone from adipose to glandular tissues. This stems from the nature of mammography, which is a projection of superimposed tissues consisting of different structures. In this paper, the authors present a novel segmentation scheme which incorporates the learned prior knowledge of experts into a level set framework for fully automated mammographic density estimations. The authors modeled the learned knowledge as a population-based tissue probability map (PTPM) that was designed to capture the classification of experts' visual systems. The PTPM was constructed using an image database of a selected population consisting of 297 cases. Three mammogram experts extracted regions for dense and fatty tissues on digital mammograms, which was an independent subset used to create a tissue probability map for each ROI based on its local statistics. This tissue class probability was taken as a prior in the Bayesian formulation and was incorporated into a level set framework as an additional term to control the evolution and followed the energy surface designed to reflect experts' knowledge as well as the regional statistics inside and outside of the evolving contour. A subset of 100 digital mammograms, which was not used in constructing the PTPM, was used to validate the performance. The energy was minimized when the initial contour reached the boundary of the dense and fatty tissues, as defined by experts. The correlation coefficient between mammographic density measurements made by experts and measurements by the proposed method was 0.93, while that with the conventional level set was 0.47. The proposed method showed a marked improvement over the conventional level set method in terms of accuracy and reliability. This result suggests that the proposed method successfully incorporated the learned knowledge of the experts' visual systems and has potential to be used as an automated and quantitative tool for estimations of mammographic breast density levels.
Evaluation of a Low-Cost Bubble CPAP System Designed for Resource-Limited Settings.
Bennett, Desmond J; Carroll, Ryan W; Kacmarek, Robert M
2018-04-01
Respiratory compromise is a leading contributor to global neonatal death. CPAP is a method of treatment that helps maintain lung volume during expiration, promotes comfortable breathing, and improves oxygenation. Bubble CPAP is an effective alternative to standard CPAP. We sought to determine the reliability and functionality of a low-cost bubble CPAP device designed for low-resource settings. The low-cost bubble CPAP device was compared to a commercially available bubble CPAP system. The devices were connected to a lung simulator that simulated neonates of 4 different weights with compromised respiratory mechanics (∼1, ∼3, ∼5, and ∼10 kg). The devices' abilities to establish and maintain pressure and flow under normal conditions as well as under conditions of leak were compared. Multiple combinations of pressure levels (5, 8, and 10 cm H 2 O) and flow levels (3, 6, and 10 L/min) were tested. The endurance of both devices was also tested by running the systems continuously for 8 h and measuring the changes in pressure and flow. Both devices performed equivalently during the no-leak and leak trials. While our testing revealed individual differences that were statistically significant and clinically important (>10% difference) within specific CPAP and flow-level settings, no overall comparisons of CPAP or flow were both statistically significant and clinically important. Each device delivered pressures similar to the desired pressures, although the flows delivered by both machines were lower than the set flows in most trials. During the endurance trials, the low-cost device was marginally better at maintaining pressure, while the commercially available device was better at maintaining flow. The low-cost bubble CPAP device evaluated in this study is comparable to a bubble CPAP system used in developed settings. Extensive clinical trials, however, are necessary to confirm its effectiveness. Copyright © 2018 by Daedalus Enterprises.
NASA Astrophysics Data System (ADS)
Feyen, Luc; Caers, Jef
2006-06-01
In this work, we address the problem of characterizing the heterogeneity and uncertainty of hydraulic properties for complex geological settings. Hereby, we distinguish between two scales of heterogeneity, namely the hydrofacies structure and the intrafacies variability of the hydraulic properties. We employ multiple-point geostatistics to characterize the hydrofacies architecture. The multiple-point statistics are borrowed from a training image that is designed to reflect the prior geological conceptualization. The intrafacies variability of the hydraulic properties is represented using conventional two-point correlation methods, more precisely, spatial covariance models under a multi-Gaussian spatial law. We address the different levels and sources of uncertainty in characterizing the subsurface heterogeneity, and explore their effect on groundwater flow and transport predictions. Typically, uncertainty is assessed by way of many images, termed realizations, of a fixed statistical model. However, in many cases, sampling from a fixed stochastic model does not adequately represent the space of uncertainty. It neglects the uncertainty related to the selection of the stochastic model and the estimation of its input parameters. We acknowledge the uncertainty inherent in the definition of the prior conceptual model of aquifer architecture and in the estimation of global statistics, anisotropy, and correlation scales. Spatial bootstrap is used to assess the uncertainty of the unknown statistical parameters. As an illustrative example, we employ a synthetic field that represents a fluvial setting consisting of an interconnected network of channel sands embedded within finer-grained floodplain material. For this highly non-stationary setting we quantify the groundwater flow and transport model prediction uncertainty for various levels of hydrogeological uncertainty. Results indicate the importance of accurately describing the facies geometry, especially for transport predictions.
Use of the Monte Carlo Method for OECD Principles-Guided QSAR Modeling of SIRT1 Inhibitors.
Kumar, Ashwani; Chauhan, Shilpi
2017-01-01
SIRT1 inhibitors offer therapeutic potential for the treatment of a number of diseases including cancer and human immunodeficiency virus infection. A diverse series of 45 compounds with reported SIRT1 inhibitory activity has been employed for the development of quantitative structure-activity relationship (QSAR) models using the Monte Carlo optimization method. This method makes use of simplified molecular input line entry system notation of the molecular structure. The QSAR models were built up according to OECD principles. Three subsets of three splits were examined and validated by respective external sets. All the three described models have good statistical quality. The best model has the following statistical characteristics: R 2 = 0.8350, Q 2 test = 0.7491 for the test set and R 2 = 0.9655, Q 2 ext = 0.9261 for the validation set. In the mechanistic interpretation, structural attributes responsible for the endpoint increase and decrease are defined. Further, the design of some prospective SIRT1 inhibitors is also presented on the basis of these structural attributes. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The Box Task: A tool to design experiments for assessing visuospatial working memory.
Kessels, Roy P C; Postma, Albert
2017-09-15
The present paper describes the Box Task, a paradigm for the computerized assessment of visuospatial working memory. In this task, hidden objects have to be searched by opening closed boxes that are shown at different locations on the computer screen. The set size (i.e., number of boxes that must be searched) can be varied and different error scores can be computed that measure specific working memory processes (i.e., the number of within-search and between-search errors). The Box Task also has a developer's mode in which new stimulus displays can be designed for use in tailored experiments. The Box Task comes with a standard set of stimulus displays (including practice trials, as well as stimulus displays with 4, 6, and 8 boxes). The raw data can be analyzed easily and the results of individual participants can be aggregated into one spreadsheet for further statistical analyses.
Replicates in high dimensions, with applications to latent variable graphical models.
Tan, Kean Ming; Ning, Yang; Witten, Daniela M; Liu, Han
2016-12-01
In classical statistics, much thought has been put into experimental design and data collection. In the high-dimensional setting, however, experimental design has been less of a focus. In this paper, we stress the importance of collecting multiple replicates for each subject in this setting. We consider learning the structure of a graphical model with latent variables, under the assumption that these variables take a constant value across replicates within each subject. By collecting multiple replicates for each subject, we are able to estimate the conditional dependence relationships among the observed variables given the latent variables. To test the null hypothesis of conditional independence between two observed variables, we propose a pairwise decorrelated score test. Theoretical guarantees are established for parameter estimation and for this test. We show that our proposal is able to estimate latent variable graphical models more accurately than some existing proposals, and apply the proposed method to a brain imaging dataset.
Zhang, Ying; Sun, Jin; Zhang, Yun-Jiao; Chai, Qian-Yun; Zhang, Kang; Ma, Hong-Li; Wu, Xiao-Ke; Liu, Jian-Ping
2016-10-21
Although Traditional Chinese Medicine (TCM) has been widely used in clinical settings, a major challenge that remains in TCM is to evaluate its efficacy scientifically. This randomized controlled trial aims to evaluate the efficacy and safety of berberine in the treatment of patients with polycystic ovary syndrome. In order to improve the transparency and research quality of this clinical trial, we prepared this statistical analysis plan (SAP). The trial design, primary and secondary outcomes, and safety outcomes were declared to reduce selection biases in data analysis and result reporting. We specified detailed methods for data management and statistical analyses. Statistics in corresponding tables, listings, and graphs were outlined. The SAP provided more detailed information than trial protocol on data management and statistical analysis methods. Any post hoc analyses could be identified via referring to this SAP, and the possible selection bias and performance bias will be reduced in the trial. This study is registered at ClinicalTrials.gov, NCT01138930 , registered on 7 June 2010.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lewis, John R
R code that performs the analysis of a data set presented in the paper ‘Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications’ by Lewis, J., Zhang, A., Anderson-Cook, C. It provides functions for doing inverse predictions in this setting using several different statistical methods. The data set is a publicly available data set from a historical Plutonium production experiment.
Optimal Micro-Vane Flow Control for Compact Air Vehicle Inlets
NASA Technical Reports Server (NTRS)
Anderson, Bernhard H.; Miller, Daniel N.; Addington, Gregory A.; Agrell, Johan
2004-01-01
The purpose of this study on micro-vane secondary flow control is to demonstrate the viability and economy of Response Surface Methodology (RSM) to optimally design micro-vane secondary flow control arrays, and to establish that the aeromechanical effects of engine face distortion can also be included in the design and optimization process. These statistical design concepts were used to investigate the design characteristics of "low unit strength" micro-effector arrays. "Low unit strength" micro-effectors are micro-vanes set at very low angles-of-incidence with very long chord lengths. They were designed to influence the near wall inlet flow over an extended streamwise distance, and their advantage lies in low total pressure loss and high effectiveness in managing engine face distortion. Therefore, this report examines optimal micro-vane secondary flow control array designs for compact inlets through a Response Surface Methodology.
Rupert, Michael G.; Plummer, Niel
2009-01-01
This raster data set delineates the predicted probability of elevated volatile organic compound (VOC) concentrations in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps was developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.
Rupert, Michael G.; Plummer, Niel
2009-01-01
This raster data set delineates the predicted probability of elevated nitrate concentrations in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps was developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.
Al-Badriyeh, Daoud; Alameri, Marwah; Al-Okka, Randa
2017-01-01
Objective To perform a first-time analysis of the cost-effectiveness (CE) literature on chemotherapies, of all types, in cancer, in terms of trends and change over time, including the influence of industry funding. Design Systematic review. Setting A wide range of cancer-related research settings within healthcare, including health systems, hospitals and medical centres. Participants All literature comparative CE research of drug-based cancer therapies in the period 1986 to 2015. Primary and secondary outcome measures Primary outcomes are the literature trends in relation to journal subject category, authorship, research design, data sources, funds and consultation involvement. An additional outcome measure is the association between industry funding and study outcomes. Analysis Descriptive statistics and the χ2, Fisher exact or Somer's D tests were used to perform non-parametric statistics, with a p value of <0.05 as the statistical significance measure. Results Total 574 publications were analysed. The drug-related CE literature expands over time, with increased publishing in the healthcare sciences and services journal subject category (p<0.001). The retrospective data collection in studies increased over time (p<0.001). The usage of prospective data, however, has been decreasing (p<0.001) in relation to randomised clinical trials (RCTs), but is unchanging for non-RCT studies. The industry-sponsored CE studies have especially been increasing (p<0.001), in contrast to those sponsored by other sources. While paid consultation involvement grew throughout the years, the declaration of funding for this is relatively limited. Importantly, there is evidence that industry funding is associated with favourable result to the sponsor (p<0.001). Conclusions This analysis demonstrates clear trends in how the CE cancer research is presented to the practicing community, including in relation to journals, study designs, authorship and consultation, together with increased financial sponsorship by pharmaceutical industries, which may be more influencing study outcomes than other funding sources. PMID:28131999
Lunar soils grain size catalog
NASA Technical Reports Server (NTRS)
Graf, John C.
1993-01-01
This catalog compiles every available grain size distribution for Apollo surface soils, trench samples, cores, and Luna 24 soils. Original laboratory data are tabled, and cumulative weight distribution curves and histograms are plotted. Standard statistical parameters are calculated using the method of moments. Photos and location comments describe the sample environment and geological setting. This catalog can help researchers describe the geotechnical conditions and site variability of the lunar surface essential to the design of a lunar base.
Tsai, Chu-Lin; Camargo, Carlos A
2009-09-01
Acute exacerbations of chronic disease are ubiquitous in clinical medicine, and thus far, there has been a paucity of integrated methodological discussion on this phenomenon. We use acute exacerbations of chronic obstructive pulmonary disease as an example to emphasize key epidemiological and statistical issues for this understudied field in clinical epidemiology. Directed acyclic graphs are a useful epidemiological tool to explain the differential effects of risk factor on health outcomes in studies of acute and chronic phases of disease. To study the pathogenesis of acute exacerbations of chronic disease, case-crossover design and time-series analysis are well-suited study designs to differentiate acute and chronic effect. Modeling changes over time and setting appropriate thresholds are important steps to separate acute from chronic phases of disease in serial measurements. In statistical analysis, acute exacerbations are recurrent events, and some individuals are more prone to recurrences than others. Therefore, appropriate statistical modeling should take into account intraindividual dependence. Finally, we recommend the use of "event-based" number needed to treat (NNT) to prevent a single exacerbation instead of traditional patient-based NNT. Addressing these methodological challenges will advance research quality in acute on chronic disease epidemiology.
Plackett-Burman experimental design to facilitate syntactic foam development
Smith, Zachary D.; Keller, Jennie R.; Bello, Mollie; ...
2015-09-14
The use of an eight-experiment Plackett–Burman method can assess six experimental variables and eight responses in a polysiloxane-glass microsphere syntactic foam. The approach aims to decrease the time required to develop a tunable polymer composite by identifying a reduced set of variables and responses suitable for future predictive modeling. The statistical design assesses the main effects of mixing process parameters, polymer matrix composition, microsphere density and volume loading, and the blending of two grades of microspheres, using a dummy factor in statistical calculations. Responses cover rheological, physical, thermal, and mechanical properties. The cure accelerator content of the polymer matrix andmore » the volume loading of the microspheres have the largest effects on foam properties. These factors are the most suitable for controlling the gel point of the curing foam, and the density of the cured foam. The mixing parameters introduce widespread variability and therefore should be fixed at effective levels during follow-up testing. Some responses may require greater contrast in microsphere-related factors. As a result, compared to other possible statistical approaches, the run economy of the Plackett–Burman method makes it a valuable tool for rapidly characterizing new foams.« less
Parker, Loran Carleton; Gleichsner, Alyssa M; Adedokun, Omolola A; Forney, James
2016-11-12
Transformation of research in all biological fields necessitates the design, analysis and, interpretation of large data sets. Preparing students with the requisite skills in experimental design, statistical analysis, and interpretation, and mathematical reasoning will require both curricular reform and faculty who are willing and able to integrate mathematical and statistical concepts into their life science courses. A new Faculty Learning Community (FLC) was constituted each year for four years to assist in the transformation of the life sciences curriculum and faculty at a large, Midwestern research university. Participants were interviewed after participation and surveyed before and after participation to assess the impact of the FLC on their attitudes toward teaching, perceived pedagogical skills, and planned teaching practice. Overall, the FLC had a meaningful positive impact on participants' attitudes toward teaching, knowledge about teaching, and perceived pedagogical skills. Interestingly, confidence for viewing the classroom as a site for research about teaching declined. Implications for the creation and development of FLCs for science faculty are discussed. © 2016 by The International Union of Biochemistry and Molecular Biology, 44(6):517-525, 2016. © 2016 The International Union of Biochemistry and Molecular Biology.
The Effect of Distributed Practice in Undergraduate Statistics Homework Sets: A Randomized Trial
ERIC Educational Resources Information Center
Crissinger, Bryan R.
2015-01-01
Most homework sets in statistics courses are constructed so that students concentrate or "mass" their practice on a certain topic in one problem set. Distributed practice homework sets include review problems in each set so that practice on a topic is distributed across problem sets. There is a body of research that points to the…
Did the 1997 Balanced Budget Act Reduce Use of Physical and Occupational Therapy Services?
Latham, Nancy K.; Jette, Alan M.; Ngo, Long H.; Soukup, Jane; Iezzoni, Lisa I.
2012-01-01
Objective To investigate whether use of physical therapy (PT) and occupational therapy (OT) services decreased after the passage of the 1997 Balanced Budget Act (BBA). Design Data from the nationally representative Medicare Current Beneficiary Survey (MCBS) were merged with Medicare claims data. We conducted cross-sectional analyses of data from 1995 (n=7978), 1999 (n=7863), and 2001 (n=7973). All analyses used MCBS sampling weights to provide estimates that can be generalized to the Medicare population with 5 common conditions. Settings Skilled nursing facilities (SNFs), home health agencies, inpatient rehabilitation facilities (IRFs), and outpatient rehabilitation settings. Participants Medicare beneficiaries who participated in the MCBS survey in each of the study years and had 1 or more of the following conditions: acute stroke, acute myocardial infarction, chronic obstructive pulmonary disease, arthritis or degenerative joint disease, or mobility problems. Interventions Not applicable. Main Outcome Measures Percentage of persons meeting our inclusion criteria who received PT or OT in each setting, and total units of PT and OT received in each setting. Results Multivariable logistic regression revealed no statistically significantly differences in the proportion of people who met our inclusion criteria who used PT or OT from home health agencies across the 3 time points. For SNFs, an increase in the odds of receiving PT was statistically significant from 1995 to 1999 (odds ratio [OR]=1.42; 95% confidence interval [CI], 1.19–1.69) and 1995 to 2001 (OR=1.69; 95% CI, 1.39–2.05). For IRF and outpatient settings, a significant increase was observed between 1995 and 2001 (OR=1.71, OR=1.27, respectively). For OT, a statistically significant increase was observed for IRF and outpatient rehabilitation settings from 1995 to 2001. For SNF, the increase was statistically significant from 1995 to 1999 and 1995 to 2001. Mean total PT and OT units received also increased across all settings from 1995 to 2001 except for IRFs. Conclusions Despite BBA mandates restricting postacute care expenditures, this nationally representative study showed no decreases in the percentage of Medicare beneficiaries with 5 common diagnoses receiving PT and/or OT across all settings and no decreases in units of PT and/or OT services received between 1995 and 2001 except for those in IRFs. This study suggests that the delivery of PT and OT services did not decline among persons with conditions where rehabilitation services are often clinically indicated. PMID:18452725
Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David
2015-01-01
New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc. 2015.
Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E
2014-06-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.
2017-01-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608
Deep epistasis in human metabolism
NASA Astrophysics Data System (ADS)
Imielinski, Marcin; Belta, Calin
2010-06-01
We extend and apply a method that we have developed for deriving high-order epistatic relationships in large biochemical networks to a published genome-scale model of human metabolism. In our analysis we compute 33 328 reaction sets whose knockout synergistically disables one or more of 43 important metabolic functions. We also design minimal knockouts that remove flux through fumarase, an enzyme that has previously been shown to play an important role in human cancer. Most of these knockout sets employ more than eight mutually buffering reactions, spanning multiple cellular compartments and metabolic subsystems. These reaction sets suggest that human metabolic pathways possess a striking degree of parallelism, inducing "deep" epistasis between diversely annotated genes. Our results prompt specific chemical and genetic perturbation follow-up experiments that could be used to query in vivo pathway redundancy. They also suggest directions for future statistical studies of epistasis in genetic variation data sets.
NASA Astrophysics Data System (ADS)
Khan, M. M. A.; Romoli, L.; Fiaschi, M.; Dini, G.; Sarri, F.
2011-02-01
This paper presents an experimental design approach to process parameter optimization for the laser welding of martensitic AISI 416 and AISI 440FSe stainless steels in a constrained overlap configuration in which outer shell was 0.55 mm thick. To determine the optimal laser-welding parameters, a set of mathematical models were developed relating welding parameters to each of the weld characteristics. These were validated both statistically and experimentally. The quality criteria set for the weld to determine optimal parameters were the minimization of weld width and the maximization of weld penetration depth, resistance length and shearing force. Laser power and welding speed in the range 855-930 W and 4.50-4.65 m/min, respectively, with a fiber diameter of 300 μm were identified as the optimal set of process parameters. However, the laser power and welding speed can be reduced to 800-840 W and increased to 4.75-5.37 m/min, respectively, to obtain stronger and better welds.
A model for indexing medical documents combining statistical and symbolic knowledge.
Avillach, Paul; Joubert, Michel; Fieschi, Marius
2007-10-11
To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement achieved in the models implementation performances as a result of using semantic relationships is encouraging.
A Model for Indexing Medical Documents Combining Statistical and Symbolic Knowledge.
Avillach, Paul; Joubert, Michel; Fieschi, Marius
2007-01-01
OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model’s implementation performances as a result of using semantic relationships is encouraging. PMID:18693792
Parametric Studies of Flow Separation using Air Injection
NASA Technical Reports Server (NTRS)
Zhang, Wei
2004-01-01
Boundary Layer separation causes the airfoil to stall and therefore imposes dramatic performance degradation on the airfoil. In recent years, flow separation control has been one of the active research areas in the field of aerodynamics due to its promising performance improvements on the lifting device. These active flow separation control techniques include steady and unsteady air injection as well as suction on the airfoil surface etc. This paper will be focusing on the steady and unsteady air injection on the airfoil. Although wind tunnel experiments revealed that the performance improvements on the airfoil using injection techniques, the details of how the key variables such as air injection slot geometry and air injection angle etc impact the effectiveness of flow separation control via air injection has not been studied. A parametric study of both steady and unsteady air injection active flow control will be the main objective for this summer. For steady injection, the key variables include the slot geometry, orientation, spacing, air injection velocity as well as the injection angle. For unsteady injection, the injection frequency will also be investigated. Key metrics such as lift coefficient, drag coefficient, total pressure loss and total injection mass will be used to measure the effectiveness of the control technique. A design of experiments using the Box-Behnken Design is set up in order to determine how each of the variables affects each of the key metrics. Design of experiment is used so that the number of experimental runs will be at minimum and still be able to predict which variables are the key contributors to the responses. The experiments will then be conducted in the 1ft by 1ft wind tunnel according to the design of experiment settings. The data obtained from the experiments will be imported into JMP, statistical software, to generate sets of response surface equations which represent the statistical empirical model for each of the metrics as a function of the key variables. Next, the variables such as the slot geometry can be optimized using the build-in optimizer within JMP. Finally, a wind tunnel testing will be conducted using the optimized slot geometry and other key variables to verify the empirical statistical model. The long term goal for this effort is to assess the impacts of active flow control using air injection at system level as one of the task plan included in the NASAs URETI program with Georgia Institute of Technology.
The statistical power to detect cross-scale interactions at macroscales
Wagner, Tyler; Fergus, C. Emi; Stow, Craig A.; Cheruvelil, Kendra S.; Soranno, Patricia A.
2016-01-01
Macroscale studies of ecological phenomena are increasingly common because stressors such as climate and land-use change operate at large spatial and temporal scales. Cross-scale interactions (CSIs), where ecological processes operating at one spatial or temporal scale interact with processes operating at another scale, have been documented in a variety of ecosystems and contribute to complex system dynamics. However, studies investigating CSIs are often dependent on compiling multiple data sets from different sources to create multithematic, multiscaled data sets, which results in structurally complex, and sometimes incomplete data sets. The statistical power to detect CSIs needs to be evaluated because of their importance and the challenge of quantifying CSIs using data sets with complex structures and missing observations. We studied this problem using a spatially hierarchical model that measures CSIs between regional agriculture and its effects on the relationship between lake nutrients and lake productivity. We used an existing large multithematic, multiscaled database, LAke multiscaled GeOSpatial, and temporal database (LAGOS), to parameterize the power analysis simulations. We found that the power to detect CSIs was more strongly related to the number of regions in the study rather than the number of lakes nested within each region. CSI power analyses will not only help ecologists design large-scale studies aimed at detecting CSIs, but will also focus attention on CSI effect sizes and the degree to which they are ecologically relevant and detectable with large data sets.
Adaptive Signal Recovery on Graphs via Harmonic Analysis for Experimental Design in Neuroimaging.
Kim, Won Hwa; Hwang, Seong Jae; Adluru, Nagesh; Johnson, Sterling C; Singh, Vikas
2016-10-01
Consider an experimental design of a neuroimaging study, where we need to obtain p measurements for each participant in a setting where p ' (< p ) are cheaper and easier to acquire while the remaining ( p - p ') are expensive. For example, the p ' measurements may include demographics, cognitive scores or routinely offered imaging scans while the ( p - p ') measurements may correspond to more expensive types of brain image scans with a higher participant burden. In this scenario, it seems reasonable to seek an "adaptive" design for data acquisition so as to minimize the cost of the study without compromising statistical power. We show how this problem can be solved via harmonic analysis of a band-limited graph whose vertices correspond to participants and our goal is to fully recover a multi-variate signal on the nodes, given the full set of cheaper features and a partial set of more expensive measurements. This is accomplished using an adaptive query strategy derived from probing the properties of the graph in the frequency space. To demonstrate the benefits that this framework can provide, we present experimental evaluations on two independent neuroimaging studies and show that our proposed method can reliably recover the true signal with only partial observations directly yielding substantial financial savings.
Sinharay, Sandip
2017-09-01
Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.
Yang, Yi; Tokita, Midori; Ishiguchi, Akira
2018-01-01
A number of studies revealed that our visual system can extract different types of summary statistics, such as the mean and variance, from sets of items. Although the extraction of such summary statistics has been studied well in isolation, the relationship between these statistics remains unclear. In this study, we explored this issue using an individual differences approach. Observers viewed illustrations of strawberries and lollypops varying in size or orientation and performed four tasks in a within-subject design, namely mean and variance discrimination tasks with size and orientation domains. We found that the performances in the mean and variance discrimination tasks were not correlated with each other and demonstrated that extractions of the mean and variance are mediated by different representation mechanisms. In addition, we tested the relationship between performances in size and orientation domains for each summary statistic (i.e. mean and variance) and examined whether each summary statistic has distinct processes across perceptual domains. The results illustrated that statistical summary representations of size and orientation may share a common mechanism for representing the mean and possibly for representing variance. Introspections for each observer performing the tasks were also examined and discussed. PMID:29399318
QSAR models for thiophene and imidazopyridine derivatives inhibitors of the Polo-Like Kinase 1.
Comelli, Nieves C; Duchowicz, Pablo R; Castro, Eduardo A
2014-10-01
The inhibitory activity of 103 thiophene and 33 imidazopyridine derivatives against Polo-Like Kinase 1 (PLK1) expressed as pIC50 (-logIC50) was predicted by QSAR modeling. Multivariate linear regression (MLR) was employed to model the relationship between 0D and 3D molecular descriptors and biological activities of molecules using the replacement method (MR) as variable selection tool. The 136 compounds were separated into several training and test sets. Two splitting approaches, distribution of biological data and structural diversity, and the statistical experimental design procedure D-optimal distance were applied to the dataset. The significance of the training set models was confirmed by statistically higher values of the internal leave one out cross-validated coefficient of determination (Q2) and external predictive coefficient of determination for the test set (Rtest2). The model developed from a training set, obtained with the D-optimal distance protocol and using 3D descriptor space along with activity values, separated chemical features that allowed to distinguish high and low pIC50 values reasonably well. Then, we verified that such model was sufficient to reliably and accurately predict the activity of external diverse structures. The model robustness was properly characterized by means of standard procedures and their applicability domain (AD) was analyzed by leverage method. Copyright © 2014 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goltz, G.; Kaiser, L.M.; Weiner, H.
A major mission of the U.S. Coast Guard is the task of providing and maintaining Maritime Aids to Navigation. These aids are located on and near the coastline and inland waters of the United States and its possessions. A computer program, Design Synthesis and Performance Analysis (DSPA), has been developed by the Jet Propulsion Laboratory to demonstrate the feasibility of low-cost solar array/battery power systems for use on flashing lamp buoys. To provide detailed, realistic temperature, wind, and solar insolation data for analysis of the flashing lamp buoy power systems, the two DSPA support computer program sets: MERGE and STATmore » were developed. A general description of these two packages is presented in this program summary report. The MERGE program set will enable the Coast Guard to combine temperature and wind velocity data (NOAA TDF-14 tapes) with solar insolation data (NOAA DECK-280 tapes) onto a single sequential MERGE file containing up to 12 years of hourly observations. This MERGE file can then be used as direct input to the DSPA program. The STAT program set will enable a statistical analysis to be performed of the MERGE data and produce high or low or mean profiles of the data and/or do a worst case analysis. The STAT output file consists of a one-year set of hourly statistical weather data which can be used as input to the DSPA program.« less
Statistical and optimal learning with applications in business analytics
NASA Astrophysics Data System (ADS)
Han, Bin
Statistical learning is widely used in business analytics to discover structure or exploit patterns from historical data, and build models that capture relationships between an outcome of interest and a set of variables. Optimal learning on the other hand, solves the operational side of the problem, by iterating between decision making and data acquisition/learning. All too often the two problems go hand-in-hand, which exhibit a feedback loop between statistics and optimization. We apply this statistical/optimal learning concept on a context of fundraising marketing campaign problem arising in many non-profit organizations. Many such organizations use direct-mail marketing to cultivate one-time donors and convert them into recurring contributors. Cultivated donors generate much more revenue than new donors, but also lapse with time, making it important to steadily draw in new cultivations. The direct-mail budget is limited, but better-designed mailings can improve success rates without increasing costs. We first apply statistical learning to analyze the effectiveness of several design approaches used in practice, based on a massive dataset covering 8.6 million direct-mail communications with donors to the American Red Cross during 2009-2011. We find evidence that mailed appeals are more effective when they emphasize disaster preparedness and training efforts over post-disaster cleanup. Including small cards that affirm donors' identity as Red Cross supporters is an effective strategy, while including gift items such as address labels is not. Finally, very recent acquisitions are more likely to respond to appeals that ask them to contribute an amount similar to their most recent donation, but this approach has an adverse effect on donors with a longer history. We show via simulation that a simple design strategy based on these insights has potential to improve success rates from 5.4% to 8.1%. Given these findings, when new scenario arises, however, new data need to be acquired to update our model and decisions, which is studied under optimal learning framework. The goal becomes discovering a sequential information collection strategy that learns the best campaign design alternative as quickly as possible. Regression structure is used to learn about a set of unknown parameters, which alternates with optimization to design new data points. Such problems have been extensively studied in the ranking and selection (R&S) community, but traditional R&S procedures experience high computational costs when the decision space grows combinatorially. We present a value of information procedure for simultaneously learning unknown regression parameters and unknown sampling noise. We then develop an approximate version of the procedure, based on semi-definite programming relaxation, that retains good performance and scales better to large problems. We also prove the asymptotic consistency of the algorithm in the parametric model, a result that has not previously been available for even the known-variance case.
Optimisation study of a vehicle bumper subsystem with fuzzy parameters
NASA Astrophysics Data System (ADS)
Farkas, L.; Moens, D.; Donders, S.; Vandepitte, D.
2012-10-01
This paper deals with the design and optimisation for crashworthiness of a vehicle bumper subsystem, which is a key scenario for vehicle component design. The automotive manufacturers and suppliers have to find optimal design solutions for such subsystems that comply with the conflicting requirements of the regulatory bodies regarding functional performance (safety and repairability) and regarding the environmental impact (mass). For the bumper design challenge, an integrated methodology for multi-attribute design engineering of mechanical structures is set up. The integrated process captures the various tasks that are usually performed manually, this way facilitating the automated design iterations for optimisation. Subsequently, an optimisation process is applied that takes the effect of parametric uncertainties into account, such that the system level of failure possibility is acceptable. This optimisation process is referred to as possibility-based design optimisation and integrates the fuzzy FE analysis applied for the uncertainty treatment in crash simulations. This process is the counterpart of the reliability-based design optimisation used in a probabilistic context with statistically defined parameters (variabilities).
Proportioning and performance evaluation of self-consolidating concrete
NASA Astrophysics Data System (ADS)
Wang, Xuhao
A well-proportioned self-consolidating concrete (SCC) mixture can be achieved by controlling the aggregate system, paste quality, and paste quantity. The work presented in this dissertation involves an effort to study and improve particle packing of the concrete system and reduce the paste quantity while maintaining concrete quality and performance. This dissertation is composed of four papers resulting from the study: (1) Assessing Particle Packing Based Self-Consolidating Concrete Mix Design; (2) Using Paste-To-Voids Volume Ratio to Evaluate the Performance of Self-Consolidating Concrete Mixtures; (3) Image Analysis Applications on Assessing Static Stability and Flowability of Self-Consolidating Concrete, and (4) Using Ultrasonic Wave Propagation to Monitor Stiffening Process of Self-Consolidating Concrete. Tests were conducted on a large matrix of SCC mixtures that were designed for cast-in-place bridge construction. The mixtures were made with different aggregate types, sizes, and different cementitious materials. In Paper 1, a modified particle-packing based mix design method, originally proposed by Brouwers (2005), was applied to the design of self-consolidating concrete (SCC) mixs. Using this method, a large matrix of SCC mixes was designed to have a particle distribution modulus (q) ranging from 0.23 to 0.29. Fresh properties (such as flowability, passing ability, segregation resistance, yield stress, viscosity, set time and formwork pressure) and hardened properties (such as compressive strength, surface resistance, shrinkage, and air structure) of these concrete mixes were experimentally evaluated. In Paper 2, a concept that is based on paste-to-voids volume ratio (Vpaste/Vvoids) was employed to assess the performance of SCC mixtures. The relationship between excess paste theory and Vpaste/Vvoids was investigated. The workability, flow properties, compressive strength, shrinkage, and surface resistivity of SCC mixtures were determined at various ages. Statistical analyses, response surface models and Tukey Honestly Significant Difference (HSD) tests, were conducted to relate the mix design parameters to the concrete performance. The work discussed in Paper 3 was to apply a digital image processing (DIP) method associated with a MATLAB algorithm to evaluate cross sectional images of self-consolidating concrete (SCC). Parameters, such as inter-particle spacing between coarse aggregate particles and average mortar to aggregate ratio defined as average mortar thickness index (MTI), were derived from DIP method and applied to evaluate the static stability and develop statistical models to predict flowability of SCC mixtures. The last paper investigated technologies available to monitor changing properties of a fresh mixture, particularly for use with self-consolidating concrete (SCC). A number of techniques were used to monitor setting time, stiffening and formwork pressure of SCC mixtures. These included longitudinal (P-wave) ultrasonic wave propagation, penetrometer based setting time, semi-adiabatic calorimetry, and formwork pressure. The first study demonstrated that the concrete mixes designed using the modified Brouwers mix design algorithm and particle packing concept had a potential to reduce up to 20% SCMs content compared to existing SCC mix proportioning methods and still maintain good performance. The second paper concluded that slump flow of the SCC mixtures increased with Vpaste/Vvoids at a given viscosity of mortar. Compressive trength increases with increasing Vpaste/Vvoids up to a point (~150%), after which the strength becomes independent of Vpaste/Vvoids, even slightly decreases. Vpaste/Vvoids has little effect on the shrinkage mixtures, while SCC mixtures tend to have a higher shrinkage than CC for a given Vpaste/Vvoids. Vpaste/Vvoids has little effects on surface resistivity of SCC mixtures. The paste quality tends to have a dominant effect. Statistical analysis is an efficient tool to identify the significance of influence factors on concrete performance. In third paper, proposed DIP method and MATLAB algorithm can be successfully used to derive inter-particle spacing and MTI, and quantitatively evaluate the static stability in hardened SCC samples. These parameters can be applied to overcome the limitations and challenges of existing theoretical frames and construct statistical models associated with rheological parameters to predict flowability of SCC mixtures. The outcome of this study can be of practical value for providing an efficient and useful tool in designing mixture proportions of SCC. Last paper compared several concrete performance measurement techniques, the P-wave test and calorimetric measurements can be efficiently used to monitor the stiffening and setting of SCC mixtures.
Packham, B; Barnes, G; Dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D
2016-06-01
Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity.
Packham, B; Barnes, G; dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D
2016-01-01
Abstract Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity. PMID:27203477
InChIKey collision resistance: an experimental testing
2012-01-01
InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 1010 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations. PMID:23256896
InChIKey collision resistance: an experimental testing.
Pletnev, Igor; Erin, Andrey; McNaught, Alan; Blinov, Kirill; Tchekhovskoi, Dmitrii; Heller, Steve
2012-12-20
InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications.We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 1010 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body.From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
Shen, Li; Saykin, Andrew J.; Williams, Scott M.; Moore, Jason H.
2016-01-01
ABSTRACT Although gene‐environment (G× E) interactions play an important role in many biological systems, detecting these interactions within genome‐wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening‐testing approach for G× E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G× E interactions. In our original work on this technique, however, we did not assess type I error control or power and evaluated the method using just a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G× E interactions. Second, to support the analysis of truly genome‐wide data sets, we incorporate a score statistic‐based prescreening step to reduce the number of single nucleotide polymorphisms prior to fitting the first stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with competing techniques using both simple simulation designs as well as designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP‐education interactions relative to Alzheimer's disease status using genome‐wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). PMID:27578615
Some challenges with statistical inference in adaptive designs.
Hung, H M James; Wang, Sue-Jane; Yang, Peiling
2014-01-01
Adaptive designs have generated a great deal of attention to clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
NASA Astrophysics Data System (ADS)
Ghosh, Arpita; Das, Papita; Sinha, Keka
2015-06-01
In the present work, spent tea leaves were modified with Ca(OH)2 and used as a new, non-conventional and low-cost biosorbent for the removal of Cu(II) from aqueous solution. Response surface methodology (RSM) and artificial neural network (ANN) were used to develop predictive models for simulation and optimization of the biosorption process. The influence of process parameters (pH, biosorbent dose and reaction time) on the biosorption efficiency was investigated through a two-level three-factor (23) full factorial central composite design with the help of Design Expert. The same design was also used to obtain a training set for ANN. Finally, both modeling methodologies were statistically compared by the root mean square error and absolute average deviation based on the validation data set. Results suggest that RSM has better prediction performance as compared to ANN. The biosorption followed Langmuir adsorption isotherm and it followed pseudo-second-order kinetic. The optimum removal efficiency of the adsorbent was found as 96.12 %.
NASA Astrophysics Data System (ADS)
Kwon, O.; Kim, W.; Kim, J.
2017-12-01
Recently construction of subsea tunnel has been increased globally. For safe construction of subsea tunnel, identifying the geological structure including fault at design and construction stage is more than important. Then unlike the tunnel in land, it's very difficult to obtain the data on geological structure because of the limit in geological survey. This study is intended to challenge such difficulties in a way of developing the technology to identify the geological structure of seabed automatically by using echo sounding data. When investigation a potential site for a deep subsea tunnel, there is the technical and economical limit with borehole of geophysical investigation. On the contrary, echo sounding data is easily obtainable while information reliability is higher comparing to above approaches. This study is aimed at developing the algorithm that identifies the large scale of geological structure of seabed using geostatic approach. This study is based on theory of structural geology that topographic features indicate geological structure. Basic concept of algorithm is outlined as follows; (1) convert the seabed topography to the grid data using echo sounding data, (2) apply the moving window in optimal size to the grid data, (3) estimate the spatial statistics of the grid data in the window area, (4) set the percentile standard of spatial statistics, (5) display the values satisfying the standard on the map, (6) visualize the geological structure on the map. The important elements in this study include optimal size of moving window, kinds of optimal spatial statistics and determination of optimal percentile standard. To determine such optimal elements, a numerous simulations were implemented. Eventually, user program based on R was developed using optimal analysis algorithm. The user program was designed to identify the variations of various spatial statistics. It leads to easy analysis of geological structure depending on variation of spatial statistics by arranging to easily designate the type of spatial statistics and percentile standard. This research was supported by the Korea Agency for Infrastructure Technology Advancement under the Ministry of Land, Infrastructure and Transport of the Korean government. (Project Number: 13 Construction Research T01)
Heavy Ion Transient Characterization of a Photobit Hardened-by-Design Active Pixel Sensor Array
NASA Technical Reports Server (NTRS)
Marshall, Paul W.; Byers, Wheaton B.; Conger, Christopher; Eid, El-Sayed; Gee, George; Jones, Michael R.; Marshall, Cheryl J.; Reed, Robert; Pickel, Jim; Kniffin, Scott
2002-01-01
This paper presents heavy ion data on the single event transient (SET) response of a Photobit active pixel sensor (APS) four quadrant test chip with different radiation tolerant designs in a standard 0.35 micron CMOS process. The physical design techniques of enclosed geometry and P-channel guard rings are used to design the four N-type active photodiode pixels as described in a previous paper. Argon transient measurements on the 256 x 256 chip array as a function of incident angle show a significant variation in the amount of charge collected as well as the charge spreading dependent on the pixel type. The results are correlated with processing and design information provided by Photobit. In addition, there is a large degree of statistical variability between individual ion strikes. No latch-up is observed up to an LET of 106 MeV/mg/sq cm.
Satellite Systems Design/Simulation Environment: A Systems Approach to Pre-Phase A Design
NASA Technical Reports Server (NTRS)
Ferebee, Melvin J., Jr.; Troutman, Patrick A.; Monell, Donald W.
1997-01-01
A toolset for the rapid development of small satellite systems has been created. The objective of this tool is to support the definition of spacecraft mission concepts to satisfy a given set of mission and instrument requirements. The objective of this report is to provide an introduction to understanding and using the SMALLSAT Model. SMALLSAT is a computer-aided Phase A design and technology evaluation tool for small satellites. SMALLSAT enables satellite designers, mission planners, and technology program managers to observe the likely consequences of their decisions in terms of satellite configuration, non-recurring and recurring cost, and mission life cycle costs and availability statistics. It was developed by Princeton Synergetic, Inc. and User Systems, Inc. as a revision of the previous TECHSAT Phase A design tool, which modeled medium-sized Earth observation satellites. Both TECHSAT and SMALLSAT were developed for NASA.
Sun, Xing; Li, Xiaoyun; Chen, Cong; Song, Yang
2013-01-01
Frequent rise of interval-censored time-to-event data in randomized clinical trials (e.g., progression-free survival [PFS] in oncology) challenges statistical researchers in the pharmaceutical industry in various ways. These challenges exist in both trial design and data analysis. Conventional statistical methods treating intervals as fixed points, which are generally practiced by pharmaceutical industry, sometimes yield inferior or even flawed analysis results in extreme cases for interval-censored data. In this article, we examine the limitation of these standard methods under typical clinical trial settings and further review and compare several existing nonparametric likelihood-based methods for interval-censored data, methods that are more sophisticated but robust. Trial design issues involved with interval-censored data comprise another topic to be explored in this article. Unlike right-censored survival data, expected sample size or power for a trial with interval-censored data relies heavily on the parametric distribution of the baseline survival function as well as the frequency of assessments. There can be substantial power loss in trials with interval-censored data if the assessments are very infrequent. Such an additional dependency controverts many fundamental assumptions and principles in conventional survival trial designs, especially the group sequential design (e.g., the concept of information fraction). In this article, we discuss these fundamental changes and available tools to work around their impacts. Although progression-free survival is often used as a discussion point in the article, the general conclusions are equally applicable to other interval-censored time-to-event endpoints.
Progress with modeling activity landscapes in drug discovery.
Vogt, Martin
2018-04-19
Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.
R2 & NE Tract - 2010 Census; Housing and Population Summary
The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new
NASA Astrophysics Data System (ADS)
Zecha, Stefanie; Regelous, Anette
2017-04-01
National Geoparks are restricted areas incorporating educational resources of great importance in promoting education for sustainable development, mobilizing knowledge inherent to the EarthSciences. Different methods can be used to implement the education of sustainability. Here we present possibilities for National Geoparks to support sustainability focusing on new media and EarthCaches based on the data set of the "EarthCachers International EarthCaching" conference in Goslar in October 2015. Using an empirical study designed by ourselves we collected actual information about the environmental consciousness of Earthcachers. The data set was analyzed using SPSS and statistical methods. Here we present the results and their consequences for National Geoparks.
[Review of research design and statistical methods in Chinese Journal of Cardiology].
Zhang, Li-jun; Yu, Jin-ming
2009-07-01
To evaluate the research design and the use of statistical methods in Chinese Journal of Cardiology. Peer through the research design and statistical methods in all of the original papers in Chinese Journal of Cardiology from December 2007 to November 2008. The most frequently used research designs are cross-sectional design (34%), prospective design (21%) and experimental design (25%). In all of the articles, 49 (25%) use wrong statistical methods, 29 (15%) lack some sort of statistic analysis, 23 (12%) have inconsistencies in description of methods. There are significant differences between different statistical methods (P < 0.001). The correction rates of multifactor analysis were low and repeated measurement datas were not used repeated measurement analysis. Many problems exist in Chinese Journal of Cardiology. Better research design and correct use of statistical methods are still needed. More strict review by statistician and epidemiologist is also required to improve the literature qualities.
Design, analysis, and interpretation of field quality-control data for water-sampling projects
Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.
2015-01-01
The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
Alania, M; De Backer, A; Lobato, I; Krause, F F; Van Dyck, D; Rosenauer, A; Van Aert, S
2017-10-01
In this paper, we investigate how precise atoms of a small nanocluster can ultimately be located in three dimensions (3D) from a tilt series of images acquired using annular dark field (ADF) scanning transmission electron microscopy (STEM). Therefore, we derive an expression for the statistical precision with which the 3D atomic position coordinates can be estimated in a quantitative analysis. Evaluating this statistical precision as a function of the microscope settings also allows us to derive the optimal experimental design. In this manner, the optimal angular tilt range, required electron dose, optimal detector angles, and number of projection images can be determined. Copyright © 2016 Elsevier B.V. All rights reserved.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests
Kosinski, Andrzej S.
2013-01-01
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
Automatic tissue characterization from ultrasound imagery
NASA Astrophysics Data System (ADS)
Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.
1993-08-01
In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.
Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data.
Tintle, Nathan L; Sitarik, Alexandra; Boerema, Benjamin; Young, Kylie; Best, Aaron A; Dejongh, Matthew
2012-08-08
Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.
Estimating order statistics of network degrees
NASA Astrophysics Data System (ADS)
Chu, J.; Nadarajah, S.
2018-01-01
We model the order statistics of network degrees of big data sets by a range of generalised beta distributions. A three parameter beta distribution due to Libby and Novick (1982) is shown to give the best overall fit for at least four big data sets. The fit of this distribution is significantly better than the fit suggested by Olhede and Wolfe (2012) across the whole range of order statistics for all four data sets.
Testing the statistical compatibility of independent data sets
NASA Astrophysics Data System (ADS)
Maltoni, M.; Schwetz, T.
2003-08-01
We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ2 minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function for the proposed test statistics is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit is discussed.
The Power Prior: Theory and Applications
Ibrahim, Joseph G.; Chen, Ming-Hui; Gwon, Yeongjin; Chen, Fang
2015-01-01
The power prior has been widely used in many applications covering a large number of disciplines. The power prior is intended to be an informative prior constructed from historical data. It has been used in clinical trials, genetics, health care, psychology, environmental health, engineering, economics, and business. It has also been applied for a wide variety of models and settings, both in the experimental design and analysis contexts. In this review article, we give an A to Z exposition of the power prior and its applications to date. We review its theoretical properties, variations in its formulation, statistical contexts for which it has been used, applications, and its advantages over other informative priors. We review models for which it has been used, including generalized linear models, survival models, and random effects models. Statistical areas where the power prior has been used include model selection, experimental design, hierarchical modeling, and conjugate priors. Prequentist properties of power priors in posterior inference are established and a simulation study is conducted to further examine the empirical performance of the posterior estimates with power priors. Real data analyses are given illustrating the power prior as well as the use of the power prior in the Bayesian design of clinical trials. PMID:26346180
Generation of synthetic flood hydrographs by hydrological donors (SHYDONHY method)
NASA Astrophysics Data System (ADS)
Paquet, Emmanuel
2017-04-01
For the design of hydraulic infrastructures like dams, a design hydrograph is required in most of the cases. Some of its features (e.g. peak value, duration, volume) corresponding to a given return period are computed thanks to a wide range of methods: historical records, mono or multivariate statistical analysis, stochastic simulation, etc. Then various methods have been proposed to construct design hydrographs having such characteristics, ranging from traditional unit-hydrograph to statistical methods (Yue et al., 2002). A new method to build design hydrographs (or more generally synthetic hydrographs) is introduced here, named SHYDONHY, French acronym for "Synthèse d'HYdrogrammes par DONneurs HYdrologiques". It is based on an extensive database of 100 000 flood hydrographs recorded at hourly time-step on 1300 gauging stations in France and Switzerland, covering a wide range of catchment size and climatology. For each station, an average of two hydrographs per year of record has been selected by a peak-over-threshold (POT) method with independence criteria (Lang et al., 1999). This sampling ensures that only hydrographs of intense floods are gathered in the dataset. For a given catchment, where few or no hydrograph is available at the outlet, a sub-set of 10 "donor stations" is selected within the complete dataset, considering several criteria: proximity, size, mean annual values and regimes for both total runoff and POT-selected floods. This sub-set of stations (and their corresponding flood hydrographs) will allow to: • Estimate a characteristic duration of flood hydrographs (e.g. duration for which the discharge is above 50% of the peak value). • For a given duration (e.g. one day), estimate the average peak-to- volume ratio of floods. • For a given duration and peak-to-volume ratio, generation of a synthetic reference hydrograph by combining appropriate hydrographs of the sub-set. • For a given daily discharge sequence, being observed or generated for extreme flood estimation, generate a suitable synthetic hydrograph, also by combining selected hydrographs of the sub-set. The reliability of the method is assessed by performing a jackknife validation on the whole dataset of stations, in particular by reconstructing the hydrograph of the biggest flood of each station and comparing it to the actual one. Some applications are presented, e.g. the coupling of SHYDONHY with the SCHADEX method (Paquet et al., 2003) for the stochastic simulation of extreme reservoir level in dams. References: Lang, M., Ouarda, T. B. M. J., & Bobée, B. (1999). Towards operational guidelines for over-threshold modeling. Journal of hydrology, 225(3), 103-117. Paquet, E., Garavaglia, F., Garçon, R., & Gailhard, J. (2013). The SCHADEX method: A semi-continuous rainfall-runoff simulation for extreme flood estimation. Journal of Hydrology, 495, 23-37. Yue, S., Ouarda, T. B., Bobée, B., Legendre, P., & Bruneau, P. (2002). Approach for describing statistical properties of flood hydrograph. Journal of hydrologic engineering, 7(2), 147-153.
Comparing Data Sets: Implicit Summaries of the Statistical Properties of Number Sets
ERIC Educational Resources Information Center
Morris, Bradley J.; Masnick, Amy M.
2015-01-01
Comparing datasets, that is, sets of numbers in context, is a critical skill in higher order cognition. Although much is known about how people compare single numbers, little is known about how number sets are represented and compared. We investigated how subjects compared datasets that varied in their statistical properties, including ratio of…
Pingault, Jean Baptiste; Côté, Sylvana M; Petitclerc, Amélie; Vitaro, Frank; Tremblay, Richard E
2015-01-01
Parental educational expectations have been associated with children's educational attainment in a number of long-term longitudinal studies, but whether this relationship is causal has long been debated. The aims of this prospective study were twofold: 1) test whether low maternal educational expectations contributed to failure to graduate from high school; and 2) compare the results obtained using different strategies for accounting for confounding variables (i.e. multivariate regression and propensity score matching). The study sample included 1,279 participants from the Quebec Longitudinal Study of Kindergarten Children. Maternal educational expectations were assessed when the participants were aged 12 years. High school graduation—measuring educational attainment—was determined through the Quebec Ministry of Education when the participants were aged 22-23 years. Findings show that when using the most common statistical approach (i.e. multivariate regressions to adjust for a restricted set of potential confounders) the contribution of low maternal educational expectations to failure to graduate from high school was statistically significant. However, when using propensity score matching, the contribution of maternal expectations was reduced and remained statistically significant only for males. The results of this study are consistent with the possibility that the contribution of parental expectations to educational attainment is overestimated in the available literature. This may be explained by the use of a restricted range of potential confounding variables as well as the dearth of studies using appropriate statistical techniques and study designs in order to minimize confounding. Each of these techniques and designs, including propensity score matching, has its strengths and limitations: A more comprehensive understanding of the causal role of parental expectations will stem from a convergence of findings from studies using different techniques and designs.
Sampling and Data Analysis for Environmental Microbiology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murray, Christopher J.
2001-06-01
A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques are available for analysis ofmore » the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.« less
Application of short-data methods on extreme surge levels
NASA Astrophysics Data System (ADS)
Feng, X.
2014-12-01
Tropical cyclone-induced storm surges are among the most destructive natural hazards that impact the United States. Unfortunately for academic research, the available time series for extreme surge analysis are very short. The limited data introduces uncertainty and affects the accuracy of statistical analyses of extreme surge levels. This study deals with techniques applicable to data sets less than 20 years, including simulation modelling and methods based on the parameters of the parent distribution. The verified water levels from water gauges spread along the Southwest and Southeast Florida Coast, as well as the Florida Keys, are used in this study. Methods to calculate extreme storm surges are described and reviewed, including 'classical' methods based on the generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD), and approaches designed specifically to deal with short data sets. Incorporating global-warming influence, the statistical analysis reveals enhanced extreme surge magnitudes and frequencies during warm years, while reduced levels of extreme surge activity are observed in the same study domain during cold years. Furthermore, a non-stationary GEV distribution is applied to predict the extreme surge levels with warming sea surface temperatures. The non-stationary GEV distribution indicates that with 1 Celsius degree warming in sea surface temperature from the baseline climate, the 100-year return surge level in Southwest and Southeast Florida will increase by up to 40 centimeters. The considered statistical approaches for extreme surge estimation based on short data sets will be valuable to coastal stakeholders, including urban planners, emergency managers, and the hurricane and storm surge forecasting and warning system.
Rupert, Michael G.; Plummer, Niel
2009-01-01
This raster data set delineates the predicted probability of unmixed young groundwater (defined using chlorofluorocarbon-11 concentrations and tritium activities) in groundwater in the Eagle River watershed valley-fill aquifer, Eagle County, North-Central Colorado, 2006-2007. This data set was developed by a cooperative project between the U.S. Geological Survey, Eagle County, the Eagle River Water and Sanitation District, the Town of Eagle, the Town of Gypsum, and the Upper Eagle Regional Water Authority. This project was designed to evaluate potential land-development effects on groundwater and surface-water resources so that informed land-use and water management decisions can be made. This groundwater probability map and its associated probability maps were developed as follows: (1) A point data set of wells with groundwater quality and groundwater age data was overlaid with thematic layers of anthropogenic (related to human activities) and hydrogeologic data by using a geographic information system to assign each well values for depth to groundwater, distance to major streams and canals, distance to gypsum beds, precipitation, soils, and well depth. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Statistical models predicting the probability of elevated nitrate concentrations, the probability of unmixed young water (using chlorofluorocarbon-11 concentrations and tritium activities), and the probability of elevated volatile organic compound concentrations were developed using logistic regression techniques. (3) The statistical models were entered into a GIS and the probability map was constructed.
Chan, R V Paul; Patel, Samir N; Ryan, Michael C; Jonas, Karyn E; Ostmo, Susan; Port, Alexander D; Sun, Grace I; Lauer, Andreas K; Chiang, Michael F
2015-01-01
To describe the design, implementation, and evaluation of a tele-education system developed to improve diagnostic competency in retinopathy of prematurity (ROP) by ophthalmology residents. A secure Web-based tele-education system was developed utilizing a repository of over 2,500 unique image sets of ROP. For each image set used in the system, a reference standard ROP diagnosis was established. Performance by ophthalmology residents (postgraduate years 2 to 4) from the United States and Canada in taking the ROP tele-education program was prospectively evaluated. Residents were presented with image-based clinical cases of ROP during a pretest, posttest, and training chapters. Accuracy and reliability of ROP diagnosis (eg, plus disease, zone, stage, category) were determined using sensitivity, specificity, and the kappa statistic calculations of the results from the pretest and posttest. Fifty-five ophthalmology residents were provided access to the ROP tele-education program. Thirty-one ophthalmology residents completed the program. When all training levels were analyzed together, a statistically significant increase was observed in sensitivity for the diagnosis of plus disease, zone, stage, category, and aggressive posterior ROP (P<.05). Statistically significant changes in specificity for identification of stage 2 or worse (P=.027) and pre-plus (P=.028) were observed. A tele-education system for ROP education is effective in improving diagnostic accuracy of ROP by ophthalmology residents. This system may have utility in the setting of both healthcare and medical education reform by creating a validated method to certify telemedicine providers and educate the next generation of ophthalmologists.
Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.
2009-02-18
The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statisticalmore » sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.« less
Local sensitivity analysis for inverse problems solved by singular value decomposition
Hill, M.C.; Nolan, B.T.
2010-01-01
Local sensitivity analysis provides computationally frugal ways to evaluate models commonly used for resource management, risk assessment, and so on. This includes diagnosing inverse model convergence problems caused by parameter insensitivity and(or) parameter interdependence (correlation), understanding what aspects of the model and data contribute to measures of uncertainty, and identifying new data likely to reduce model uncertainty. Here, we consider sensitivity statistics relevant to models in which the process model parameters are transformed using singular value decomposition (SVD) to create SVD parameters for model calibration. The statistics considered include the PEST identifiability statistic, and combined use of the process-model parameter statistics composite scaled sensitivities and parameter correlation coefficients (CSS and PCC). The statistics are complimentary in that the identifiability statistic integrates the effects of parameter sensitivity and interdependence, while CSS and PCC provide individual measures of sensitivity and interdependence. PCC quantifies correlations between pairs or larger sets of parameters; when a set of parameters is intercorrelated, the absolute value of PCC is close to 1.00 for all pairs in the set. The number of singular vectors to include in the calculation of the identifiability statistic is somewhat subjective and influences the statistic. To demonstrate the statistics, we use the USDA’s Root Zone Water Quality Model to simulate nitrogen fate and transport in the unsaturated zone of the Merced River Basin, CA. There are 16 log-transformed process-model parameters, including water content at field capacity (WFC) and bulk density (BD) for each of five soil layers. Calibration data consisted of 1,670 observations comprising soil moisture, soil water tension, aqueous nitrate and bromide concentrations, soil nitrate concentration, and organic matter content. All 16 of the SVD parameters could be estimated by regression based on the range of singular values. Identifiability statistic results varied based on the number of SVD parameters included. Identifiability statistics calculated for four SVD parameters indicate the same three most important process-model parameters as CSS/PCC (WFC1, WFC2, and BD2), but the order differed. Additionally, the identifiability statistic showed that BD1 was almost as dominant as WFC1. The CSS/PCC analysis showed that this results from its high correlation with WCF1 (-0.94), and not its individual sensitivity. Such distinctions, combined with analysis of how high correlations and(or) sensitivities result from the constructed model, can produce important insights into, for example, the use of sensitivity analysis to design monitoring networks. In conclusion, the statistics considered identified similar important parameters. They differ because (1) with CSS/PCC can be more awkward because sensitivity and interdependence are considered separately and (2) identifiability requires consideration of how many SVD parameters to include. A continuing challenge is to understand how these computationally efficient methods compare with computationally demanding global methods like Markov-Chain Monte Carlo given common nonlinear processes and the often even more nonlinear models.
Vega-Ramírez, Francisco Antonio; Rocamora-Pérez, Patricia; Aguilar-Parra, José Manuel; Padilla-Góngora, David
2016-01-01
Objective To compare home-based rehabilitation (RITH) and standard outpatient rehabilitation in a hospital setting, in terms of improving the functional recovery and quality of life of stroke patients. Study Design and Setting This was a prospective cohort study in Andalusia (Spain). Participants One hundred and forty-five patients completed the outcome data. Measures Daily activities were measured by the Barthel index, Canadian Neurological Scale (to assess mental state), Tinetti scale (balance and gait), and Short Form Health Survey-36 (SF-36 to compare the quality of life). Results No statistically significant differences were found between the two groups regarding the clinical characteristics of patients in the initial measurement, except for age and mental state (younger and with greater neurological impairment in the hospital group). After physical therapy, both groups showed statistically significant improvements from baseline in each of the measures. These improvements were better in RITH patients than in the hospital patients on all functionality scales with a smaller number of sessions. Conclusions Home rehabilitation is at least as effective as the outpatient rehabilitation programs in a hospital setting, in terms of recovery of functionality in post-stroke patients. Overall quality of life is severely impaired in both groups, as stroke is a very disabling disease that radically affects patients’ lives. PMID:27835673
Coccaro, Emil F
2011-01-01
This study was designed to develop a revised diagnostic criteria set for intermittent explosive disorder (IED) for consideration for inclusion in Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V). This revised criteria set was developed by integrating previous research criteria with elements from the current DSM-IV set of diagnostic criteria. Evidence supporting the reliability and validity of IED-IR ("IED Integrated Criteria") in a new and well-characterized group of subjects with personality disorder is presented. Clinical, phenomenologic, and diagnostic data from 201 individuals with personality disorder were reviewed. All IED diagnoses were assigned using a best-estimate process (eg, kappa for IED-IR >0.85). In addition, subjects meeting IED-IR criteria had higher scores on dimensional measures of aggression and had lower global functioning scores than non-IED-IR subjects, even when related variables were controlled. The IED-IR criteria were more sensitive than the DSM-IV criteria only in identifying subjects with significant impulsive-aggressive behavior by a factor of 16. We conclude that the IED-IR criteria can be reliably applied and have sufficient validity to warrant consideration as DSM-V criteria for IED. Copyright © 2011 Elsevier Inc. All rights reserved.
Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection
NASA Astrophysics Data System (ADS)
Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd
2015-02-01
Microarray involves of placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has made a novelty discovery since its development and obtained an increasing attention among researchers. The widespread of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. The microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and become singular, thus it cannot be inverted. In this research, the Hotelling's T2 statistic is combined with a shrinkage approach as an alternative estimation to estimate the covariance matrix to detect significant gene sets. The use of shrinkage covariance matrix overcomes the singularity problem by converting an unbiased to an improved biased estimator of covariance matrix. Robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increases its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
Influenza forecasting with Google Flu Trends.
Dugas, Andrea Freyer; Jalalpour, Mehdi; Gel, Yulia; Levin, Scott; Torcaso, Fred; Igusa, Takeru; Rothman, Richard E
2013-01-01
We developed a practical influenza forecast model based on real-time, geographically focused, and easy to access data, designed to provide individual medical centers with advanced warning of the expected number of influenza cases, thus allowing for sufficient time to implement interventions. Secondly, we evaluated the effects of incorporating a real-time influenza surveillance system, Google Flu Trends, and meteorological and temporal information on forecast accuracy. Forecast models designed to predict one week in advance were developed from weekly counts of confirmed influenza cases over seven seasons (2004-2011) divided into seven training and out-of-sample verification sets. Forecasting procedures using classical Box-Jenkins, generalized linear models (GLM), and generalized linear autoregressive moving average (GARMA) methods were employed to develop the final model and assess the relative contribution of external variables such as, Google Flu Trends, meteorological data, and temporal information. A GARMA(3,0) forecast model with Negative Binomial distribution integrating Google Flu Trends information provided the most accurate influenza case predictions. The model, on the average, predicts weekly influenza cases during 7 out-of-sample outbreaks within 7 cases for 83% of estimates. Google Flu Trend data was the only source of external information to provide statistically significant forecast improvements over the base model in four of the seven out-of-sample verification sets. Overall, the p-value of adding this external information to the model is 0.0005. The other exogenous variables did not yield a statistically significant improvement in any of the verification sets. Integer-valued autoregression of influenza cases provides a strong base forecast model, which is enhanced by the addition of Google Flu Trends confirming the predictive capabilities of search query based syndromic surveillance. This accessible and flexible forecast model can be used by individual medical centers to provide advanced warning of future influenza cases.
Nonlinear Modeling of Causal Interrelationships in Neuronal Ensembles
Zanos, Theodoros P.; Courellis, Spiros H.; Berger, Theodore W.; Hampson, Robert E.; Deadwyler, Sam A.; Marmarelis, Vasilis Z.
2009-01-01
The increasing availability of multiunit recordings gives new urgency to the need for effective analysis of “multidimensional” time-series data that are derived from the recorded activity of neuronal ensembles in the form of multiple sequences of action potentials—treated mathematically as point-processes and computationally as spike-trains. Whether in conditions of spontaneous activity or under conditions of external stimulation, the objective is the identification and quantification of possible causal links among the neurons generating the observed binary signals. A multiple-input/multiple-output (MIMO) modeling methodology is presented that can be used to quantify the neuronal dynamics of causal interrelationships in neuronal ensembles using spike-train data recorded from individual neurons. These causal interrelationships are modeled as transformations of spike-trains recorded from a set of neurons designated as the “inputs” into spike-trains recorded from another set of neurons designated as the “outputs.” The MIMO model is composed of a set of multiinput/single-output (MISO) modules, one for each output. Each module is the cascade of a MISO Volterra model and a threshold operator generating the output spikes. The Laguerre expansion approach is used to estimate the Volterra kernels of each MISO module from the respective input–output data using the least-squares method. The predictive performance of the model is evaluated with the use of the receiver operating characteristic (ROC) curve, from which the optimum threshold is also selected. The Mann–Whitney statistic is used to select the significant inputs for each output by examining the statistical significance of improvements in the predictive accuracy of the model when the respective inputs is included. Illustrative examples are presented for a simulated system and for an actual application using multiunit data recordings from the hippocampus of a behaving rat. PMID:18701382
Effect of wheelchair design on wheeled mobility and propulsion efficiency in less-resourced settings
2017-01-01
Background Wheelchair research includes both qualitative and quantitative approaches, primarily focuses on functionality and skill performance and is often limited to short testing periods. This is the first study to use the combination of a performance test (i.e. wheelchair propulsion test) and a multiple-day mobility assessment to evaluate wheelchair designs in rural areas of a developing country. Objectives Test the feasibility of using wheel-mounted accelerometers to document bouts of wheeled mobility data in rural settings and use these data to compare how patients respond to different wheelchair designs. Methods A quasi-experimental, pre- and post-test design was used to test the differences between locally manufactured wheelchairs (push rim and tricycle) and an imported intervention product (dual-lever propulsion wheelchair). A one-way repeated measures analysis of variance was used to interpret propulsion and wheeled mobility data. Results There were no statistical differences in bouts of mobility between the locally manufactured and intervention product, which was explained by high amounts of variability within the data. With regard to the propulsion test, push rim users were significantly more efficient when using the intervention product compared with tricycle users. Conclusion Use of wheel-mounted accelerometers as a means to test user mobility proved to be a feasible methodology in rural settings. Variability in wheeled mobility data could be decreased with longer acclimatisation periods. The data suggest that push rim users experience an easier transition to a dual-lever propulsion system. PMID:28936416
Stanfill, Christopher J; Jensen, Jody L
2017-01-01
Wheelchair research includes both qualitative and quantitative approaches, primarily focuses on functionality and skill performance and is often limited to short testing periods. This is the first study to use the combination of a performance test (i.e. wheelchair propulsion test) and a multiple-day mobility assessment to evaluate wheelchair designs in rural areas of a developing country. Test the feasibility of using wheel-mounted accelerometers to document bouts of wheeled mobility data in rural settings and use these data to compare how patients respond to different wheelchair designs. A quasi-experimental, pre- and post-test design was used to test the differences between locally manufactured wheelchairs (push rim and tricycle) and an imported intervention product (dual-lever propulsion wheelchair). A one-way repeated measures analysis of variance was used to interpret propulsion and wheeled mobility data. There were no statistical differences in bouts of mobility between the locally manufactured and intervention product, which was explained by high amounts of variability within the data. With regard to the propulsion test, push rim users were significantly more efficient when using the intervention product compared with tricycle users. Use of wheel-mounted accelerometers as a means to test user mobility proved to be a feasible methodology in rural settings. Variability in wheeled mobility data could be decreased with longer acclimatisation periods. The data suggest that push rim users experience an easier transition to a dual-lever propulsion system.
Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS).
Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M; Khan, Ajmal
2016-01-01
This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher's exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher's exact test, logistic regression, epidemiological statistics, and non-parametric tests. This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design.
Mars Microprobe Entry Analysis
NASA Technical Reports Server (NTRS)
Braun, Robert D.; Mitcheltree, Robert A.; Cheatwood, F. McNeil
1998-01-01
The Mars Microprobe mission will provide the first opportunity for subsurface measurements, including water detection, near the south pole of Mars. In this paper, performance of the Microprobe aeroshell design is evaluated through development of a six-degree-of-freedom (6-DOF) aerodynamic database and flight dynamics simulation. Numerous mission uncertainties are quantified and a Monte-Carlo analysis is performed to statistically assess mission performance. Results from this 6-DOF Monte-Carlo simulation demonstrate that, in a majority of the cases (approximately 2-sigma), the penetrator impact conditions are within current design tolerances. Several trajectories are identified in which the current set of impact requirements are not satisfied. From these cases, critical design parameters are highlighted and additional system requirements are suggested. In particular, a relatively large angle-of-attack range near peak heating is identified.
Bright high z SnIa: A challenge for ΛCDM
NASA Astrophysics Data System (ADS)
Perivolaropoulos, L.; Shafieloo, A.
2009-06-01
It has recently been pointed out by Kowalski et. al. [Astrophys. J. 686, 749 (2008).ASJOAB0004-637X10.1086/589937] that there is “an unexpected brightness of the SnIa data at z>1.” We quantify this statement by constructing a new statistic which is applicable directly on the type Ia supernova (SnIa) distance moduli. This statistic is designed to pick up systematic brightness trends of SnIa data points with respect to a best fit cosmological model at high redshifts. It is based on binning the normalized differences between the SnIa distance moduli and the corresponding best fit values in the context of a specific cosmological model (e.g. ΛCDM). These differences are normalized by the standard errors of the observed distance moduli. We then focus on the highest redshift bin and extend its size toward lower redshifts until the binned normalized difference (BND) changes sign (crosses 0) at a redshift zc (bin size Nc). The bin size Nc of this crossing (the statistical variable) is then compared with the corresponding crossing bin size Nmc for Monte Carlo data realizations based on the best fit model. We find that the crossing bin size Nc obtained from the Union08 and Gold06 data with respect to the best fit ΛCDM model is anomalously large compared to Nmc of the corresponding Monte Carlo data sets obtained from the best fit ΛCDM in each case. In particular, only 2.2% of the Monte Carlo ΛCDM data sets are consistent with the Gold06 value of Nc while the corresponding probability for the Union08 value of Nc is 5.3%. Thus, according to this statistic, the probability that the high redshift brightness bias of the Union08 and Gold06 data sets is realized in the context of a (w0,w1)=(-1,0) model (ΛCDM cosmology) is less than 6%. The corresponding realization probability in the context of a (w0,w1)=(-1.4,2) model is more than 30% for both the Union08 and the Gold06 data sets indicating a much better consistency for this model with respect to the BND statistic.
Detection of Anomalies in Hydrometric Data Using Artificial Intelligence Techniques
NASA Astrophysics Data System (ADS)
Lauzon, N.; Lence, B. J.
2002-12-01
This work focuses on the detection of anomalies in hydrometric data sequences, such as 1) outliers, which are individual data having statistical properties that differ from those of the overall population; 2) shifts, which are sudden changes over time in the statistical properties of the historical records of data; and 3) trends, which are systematic changes over time in the statistical properties. For the purpose of the design and management of water resources systems, it is important to be aware of these anomalies in hydrometric data, for they can induce a bias in the estimation of water quantity and quality parameters. These anomalies may be viewed as specific patterns affecting the data, and therefore pattern recognition techniques can be used for identifying them. However, the number of possible patterns is very large for each type of anomaly and consequently large computing capacities are required to account for all possibilities using the standard statistical techniques, such as cluster analysis. Artificial intelligence techniques, such as the Kohonen neural network and fuzzy c-means, are clustering techniques commonly used for pattern recognition in several areas of engineering and have recently begun to be used for the analysis of natural systems. They require much less computing capacity than the standard statistical techniques, and therefore are well suited for the identification of outliers, shifts and trends in hydrometric data. This work constitutes a preliminary study, using synthetic data representing hydrometric data that can be found in Canada. The analysis of the results obtained shows that the Kohonen neural network and fuzzy c-means are reasonably successful in identifying anomalies. This work also addresses the problem of uncertainties inherent to the calibration procedures that fit the clusters to the possible patterns for both the Kohonen neural network and fuzzy c-means. Indeed, for the same database, different sets of clusters can be established with these calibration procedures. A simple method for analyzing uncertainties associated with the Kohonen neural network and fuzzy c-means is developed here. The method combines the results from several sets of clusters, either from the Kohonen neural network or fuzzy c-means, so as to provide an overall diagnosis as to the identification of outliers, shifts and trends. The results indicate an improvement in the performance for identifying anomalies when the method of combining cluster sets is used, compared with when only one cluster set is used.
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
Kovač, Marko; Bauer, Arthur; Ståhl, Göran
2014-01-01
Backgrounds, Material and Methods To meet the demands of sustainable forest management and international commitments, European nations have designed a variety of forest-monitoring systems for specific needs. While the majority of countries are committed to independent, single-purpose inventorying, a minority of countries have merged their single-purpose forest inventory systems into integrated forest resource inventories. The statistical efficiencies of the Bavarian, Slovene and Swedish integrated forest resource inventory designs are investigated with the various statistical parameters of the variables of growing stock volume, shares of damaged trees, and deadwood volume. The parameters are derived by using the estimators for the given inventory designs. The required sample sizes are derived via the general formula for non-stratified independent samples and via statistical power analyses. The cost effectiveness of the designs is compared via two simple cost effectiveness ratios. Results In terms of precision, the most illustrative parameters of the variables are relative standard errors; their values range between 1% and 3% if the variables’ variations are low (s%<80%) and are higher in the case of higher variations. A comparison of the actual and required sample sizes shows that the actual sample sizes were deliberately set high to provide precise estimates for the majority of variables and strata. In turn, the successive inventories are statistically efficient, because they allow detecting the mean changes of variables with powers higher than 90%; the highest precision is attained for the changes of growing stock volume and the lowest for the changes of the shares of damaged trees. Two indicators of cost effectiveness also show that the time input spent for measuring one variable decreases with the complexity of inventories. Conclusion There is an increasing need for credible information on forest resources to be used for decision making and national and international policy making. Such information can be cost-efficiently provided through integrated forest resource inventories. PMID:24941120
Shirazi, Mohammadali; Dhavala, Soma Sekhar; Lord, Dominique; Geedipally, Srinivas Reddy
2017-10-01
Safety analysts usually use post-modeling methods, such as the Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competitive distributions or models. Such metrics require all competitive distributions to be fitted to the data before any comparisons can be accomplished. Given the continuous growth in introducing new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to all theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuitions into why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper ponders into these issues by proposing a methodology to design heuristics for Model Selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools: (1) Monte-Carlo Simulations and (2) Machine Learning Classifiers, to design easy heuristics to predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of data. The proposed heuristics were successfully compared against classical tests for several real or observed datasets. Not only they are easy to use and do not need any post-modeling inputs, but also, using these heuristics, the analyst can attain useful information about why the NB-L is preferred over the NB - or vice versa- when modeling data. Copyright © 2017 Elsevier Ltd. All rights reserved.
Blueprint XAS: a Matlab-based toolbox for the fitting and analysis of XAS spectra.
Delgado-Jaime, Mario Ulises; Mewis, Craig Philip; Kennepohl, Pierre
2010-01-01
Blueprint XAS is a new Matlab-based program developed to fit and analyse X-ray absorption spectroscopy (XAS) data, most specifically in the near-edge region of the spectrum. The program is based on a methodology that introduces a novel background model into the complete fit model and that is capable of generating any number of independent fits with minimal introduction of user bias [Delgado-Jaime & Kennepohl (2010), J. Synchrotron Rad. 17, 119-128]. The functions and settings on the five panels of its graphical user interface are designed to suit the needs of near-edge XAS data analyzers. A batch function allows for the setting of multiple jobs to be run with Matlab in the background. A unique statistics panel allows the user to analyse a family of independent fits, to evaluate fit models and to draw statistically supported conclusions. The version introduced here (v0.2) is currently a toolbox for Matlab. Future stand-alone versions of the program will also incorporate several other new features to create a full package of tools for XAS data processing.
Mashburn, Andrew J; Downer, Jason T; Rivers, Susan E; Brackett, Marc A; Martinez, Andres
2014-04-01
Social and emotional learning programs are designed to improve the quality of social interactions in schools and classrooms in order to positively affect students' social, emotional, and academic development. The statistical power of group randomized trials to detect effects of social and emotional learning programs and other preventive interventions on setting-level outcomes is influenced by the reliability of the outcome measure. In this paper, we apply generalizability theory to an observational measure of the quality of classroom interactions that is an outcome in a study of the efficacy of a social and emotional learning program called The Recognizing, Understanding, Labeling, Expressing, and Regulating emotions Approach. We estimate multiple sources of error variance in the setting-level outcome and identify observation procedures to use in the efficacy study that most efficiently reduce these sources of error. We then discuss the implications of using different observation procedures on both the statistical power and the monetary costs of conducting the efficacy study.
Kazdal, Hizir; Kanat, Ayhan; Aydin, Mehmet Dumlu; Yazar, Ugur; Guvercin, Ali Riza; Calik, Muhammet; Gundogdu, Betul
2017-01-01
Context: Sudden death from subarachnoid hemorrhage (SAH) is not uncommon. Aims: The goal of this study is to elucidate the effect of the cervical spinal roots and the related dorsal root ganglions (DRGs) on cardiorespiratory arrest following SAH. Settings and Design: This was an experimental study conducted on rabbits. Materials and Methods: This study was conducted on 22 rabbits which were randomly divided into three groups: control (n = 5), physiologic serum saline (SS; n = 6), SAH groups (n = 11). Experimental SAH was performed. Seven of 11 rabbits with SAH died within the first 2 weeks. After 20 days, other animals were sacrificed. The anterior spinal arteries, arteriae nervorum of cervical nerve roots (C6–C8), DRGs, and lungs were histopathologically examined and estimated stereologically. Statistical Analysis Used: Statistical analysis was performed using the PASW Statistics 18.0 for Windows (SPSS Inc., Chicago, Illinois, USA). Intergroup differences were assessed using a one-way ANOVA. The statistical significance was set at P < 0.05. Results: In the SAH group, histopathologically, severe anterior spinal artery (ASA) and arteriae nervorum vasospasm, axonal and neuronal degeneration, and neuronal apoptosis were observed. Vasospasm of ASA did not occur in the SS and control groups. There was a statistically significant increase in the degenerated neuron density in the SAH group as compared to the control and SS groups (P < 0.05). Cardiorespiratory disturbances, arrest, and lung edema more commonly developed in animals in the SAH group. Conclusion: We noticed interestingly that C6–C8 DRG degenerations were secondary to the vasospasm of ASA, following SAH. Cardiorespiratory disturbances or arrest can be explained with these mechanisms. PMID:28250634
NASA Technical Reports Server (NTRS)
Tripp, John S.; Tcheng, Ping
1999-01-01
Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.
Why does Japan use the probability method to set design flood?
NASA Astrophysics Data System (ADS)
Nakamura, S.; Oki, T.
2015-12-01
Design flood is hypothetical flood to make flood prevention plan. In Japan, a probability method based on precipitation data is used to define the scale of design flood: Tone River, the biggest river in Japan, is 1 in 200 years, Shinano River is 1 in 150 years, and so on. It is one of important socio-hydrological issue how to set reasonable and acceptable design flood in a changing world. The method to set design flood vary among countries. Although the probability method is also used in Netherland, but the base data is water level or discharge data and the probability is 1 in 1250 years (in fresh water section). On the other side, USA and China apply the maximum flood method which set the design flood based on the historical or probable maximum flood. This cases can leads a question: "what is the reason why the method vary among countries?" or "why does Japan use the probability method?" The purpose of this study is to clarify the historical process which the probability method was developed in Japan based on the literature. In the late 19the century, the concept of "discharge" and modern river engineering were imported by Dutch engineers, and modern flood prevention plans were developed in Japan. In these plans, the design floods were set based on the historical maximum method. Although the historical maximum method had been used until World War 2, however, the method was changed to the probability method after the war because of limitations of historical maximum method under the specific socio-economic situations: (1) the budget limitation due to the war and the GHQ occupation, (2) the historical floods: Makurazaki typhoon in 1945, Kathleen typhoon in 1947, Ione typhoon in 1948, and so on, attacked Japan and broke the record of historical maximum discharge in main rivers and the flood disasters made the flood prevention projects difficult to complete. Then, Japanese hydrologists imported the hydrological probability statistics from the West to take account of socio-economic situation in design flood, and they applied to Japanese rivers in 1958. The probability method was applied Japan to adapt the specific socio-economic and natural situation during the confusion after the war.
2011-01-01
Background Although many biological databases are applying semantic web technologies, meaningful biological hypothesis testing cannot be easily achieved. Database-driven high throughput genomic hypothesis testing requires both of the capabilities of obtaining semantically relevant experimental data and of performing relevant statistical testing for the retrieved data. Tissue Microarray (TMA) data are semantically rich and contains many biologically important hypotheses waiting for high throughput conclusions. Methods An application-specific ontology was developed for managing TMA and DNA microarray databases by semantic web technologies. Data were represented as Resource Description Framework (RDF) according to the framework of the ontology. Applications for hypothesis testing (Xperanto-RDF) for TMA data were designed and implemented by (1) formulating the syntactic and semantic structures of the hypotheses derived from TMA experiments, (2) formulating SPARQLs to reflect the semantic structures of the hypotheses, and (3) performing statistical test with the result sets returned by the SPARQLs. Results When a user designs a hypothesis in Xperanto-RDF and submits it, the hypothesis can be tested against TMA experimental data stored in Xperanto-RDF. When we evaluated four previously validated hypotheses as an illustration, all the hypotheses were supported by Xperanto-RDF. Conclusions We demonstrated the utility of high throughput biological hypothesis testing. We believe that preliminary investigation before performing highly controlled experiment can be benefited. PMID:21342584
Gamalo-Siebers, Margaret; Savic, Jasmina; Basu, Cynthia; Zhao, Xin; Gopalakrishnan, Mathangi; Gao, Aijun; Song, Guochen; Baygani, Simin; Thompson, Laura; Xia, H Amy; Price, Karen; Tiwari, Ram; Carlin, Bradley P
2017-07-01
Children represent a large underserved population of "therapeutic orphans," as an estimated 80% of children are treated off-label. However, pediatric drug development often faces substantial challenges, including economic, logistical, technical, and ethical barriers, among others. Among many efforts trying to remove these barriers, increased recent attention has been paid to extrapolation; that is, the leveraging of available data from adults or older age groups to draw conclusions for the pediatric population. The Bayesian statistical paradigm is natural in this setting, as it permits the combining (or "borrowing") of information across disparate sources, such as the adult and pediatric data. In this paper, authored by the pediatric subteam of the Drug Information Association Bayesian Scientific Working Group and Adaptive Design Working Group, we develop, illustrate, and provide suggestions on Bayesian statistical methods that could be used to design improved pediatric development programs that use all available information in the most efficient manner. A variety of relevant Bayesian approaches are described, several of which are illustrated through 2 case studies: extrapolating adult efficacy data to expand the labeling for Remicade to include pediatric ulcerative colitis and extrapolating adult exposure-response information for antiepileptic drugs to pediatrics. Copyright © 2017 John Wiley & Sons, Ltd.
Parametric study of the swimming performance of a fish robot propelled by a flexible caudal fin.
Low, K H; Chong, C W
2010-12-01
In this paper, we aim to study the swimming performance of fish robots by using a statistical approach. A fish robot employing a carangiform swimming mode had been used as an experimental platform for the performance study. The experiments conducted aim to investigate the effect of various design parameters on the thrust capability of the fish robot with a flexible caudal fin. The controllable parameters associated with the fin include frequency, amplitude of oscillation, aspect ratio and the rigidity of the caudal fin. The significance of these parameters was determined in the first set of experiments by using a statistical approach. A more detailed parametric experimental study was then conducted with only those significant parameters. As a result, the parametric study could be completed with a reduced number of experiments and time spent. With the obtained experimental result, we were able to understand the relationship between various parameters and a possible adjustment of parameters to obtain a higher thrust. The proposed statistical method for experimentation provides an objective and thorough analysis of the effects of individual or combinations of parameters on the swimming performance. Such an efficient experimental design helps to optimize the process and determine factors that influence variability.
Statistical power as a function of Cronbach alpha of instrument questionnaire items.
Heo, Moonseong; Kim, Namhee; Faith, Myles S
2015-10-14
In countless number of clinical trials, measurements of outcomes rely on instrument questionnaire items which however often suffer measurement error problems which in turn affect statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. Scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume fixed true score variance assumption as opposed to usual fixed total variance assumption. That assumption is critical and practically relevant to show that smaller measurement errors are inversely associated with higher inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing significance of C(α). Closed-form power functions and samples size determination formulas are derived in terms of C(α), for all of the aforementioned comparisons. Power functions are shown to be an increasing function of C(α), regardless of comparison of interest. The derived power functions are well validated by simulation studies that show that the magnitudes of theoretical power are virtually identical to those of the empirical power. Regardless of research designs or settings, in order to increase statistical power, development and use of instruments with greater C(α), or equivalently with greater inter-item correlations, is crucial for trials that intend to use questionnaire items for measuring research outcomes. Further development of the power functions for binary or ordinal item scores and under more general item correlation strutures reflecting more real world situations would be a valuable future study.
Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
Rahmatallah, Yasir; Emmert-Streib, Frank
2016-01-01
Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128
Shaikh, Masood Ali
2017-09-01
Assessment of research articles in terms of study designs used, statistical tests applied and the use of statistical analysis programmes help determine research activity profile and trends in the country. In this descriptive study, all original articles published by Journal of Pakistan Medical Association (JPMA) and Journal of the College of Physicians and Surgeons Pakistan (JCPSP), in the year 2015 were reviewed in terms of study designs used, application of statistical tests, and the use of statistical analysis programmes. JPMA and JCPSP published 192 and 128 original articles, respectively, in the year 2015. Results of this study indicate that cross-sectional study design, bivariate inferential statistical analysis entailing comparison between two variables/groups, and use of statistical software programme SPSS to be the most common study design, inferential statistical analysis, and statistical analysis software programmes, respectively. These results echo previously published assessment of these two journals for the year 2014.
Vetter, Thomas R
2017-11-01
Descriptive statistics are specific methods basically used to calculate, describe, and summarize collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the manuscript text and/or in its tables, or graphically in its figures. This basic statistical tutorial discusses a series of fundamental concepts about descriptive statistics and their reporting. The mean, median, and mode are 3 measures of the center or central tendency of a set of data. In addition to a measure of its central tendency (mean, median, or mode), another important characteristic of a research data set is its variability or dispersion (ie, spread). In simplest terms, variability is how much the individual recorded scores or observed values differ from one another. The range, standard deviation, and interquartile range are 3 measures of variability or dispersion. The standard deviation is typically reported for a mean, and the interquartile range for a median. Testing for statistical significance, along with calculating the observed treatment effect (or the strength of the association between an exposure and an outcome), and generating a corresponding confidence interval are 3 tools commonly used by researchers (and their collaborating biostatistician or epidemiologist) to validly make inferences and more generalized conclusions from their collected data and descriptive statistics. A number of journals, including Anesthesia & Analgesia, strongly encourage or require the reporting of pertinent confidence intervals. A confidence interval can be calculated for virtually any variable or outcome measure in an experimental, quasi-experimental, or observational research study design. Generally speaking, in a clinical trial, the confidence interval is the range of values within which the true treatment effect in the population likely resides. In an observational study, the confidence interval is the range of values within which the true strength of the association between the exposure and the outcome (eg, the risk ratio or odds ratio) in the population likely resides. There are many possible ways to graphically display or illustrate different types of data. While there is often latitude as to the choice of format, ultimately, the simplest and most comprehensible format is preferred. Common examples include a histogram, bar chart, line chart or line graph, pie chart, scatterplot, and box-and-whisker plot. Valid and reliable descriptive statistics can answer basic yet important questions about a research data set, namely: "Who, What, Why, When, Where, How, How Much?"
Comparison of fMRI analysis methods for heterogeneous BOLD responses in block design studies
Bernal-Casas, David; Fang, Zhongnan; Lee, Jin Hyung
2017-01-01
A large number of fMRI studies have shown that the temporal dynamics of evoked BOLD responses can be highly heterogeneous. Failing to model heterogeneous responses in statistical analysis can lead to significant errors in signal detection and characterization and alter the neurobiological interpretation. However, to date it is not clear that, out of a large number of options, which methods are robust against variability in the temporal dynamics of BOLD responses in block-design studies. Here, we used rodent optogenetic fMRI data with heterogeneous BOLD responses and simulations guided by experimental data as a means to investigate different analysis methods’ performance against heterogeneous BOLD responses. Evaluations are carried out within the general linear model (GLM) framework and consist of standard basis sets as well as independent component analysis (ICA). Analyses show that, in the presence of heterogeneous BOLD responses, conventionally used GLM with a canonical basis set leads to considerable errors in the detection and characterization of BOLD responses. Our results suggest that the 3rd and 4th order gamma basis sets, the 7th to 9th order finite impulse response (FIR) basis sets, the 5th to 9th order B-spline basis sets, and the 2nd to 5th order Fourier basis sets are optimal for good balance between detection and characterization, while the 1st order Fourier basis set (coherence analysis) used in our earlier studies show good detection capability. ICA has mostly good detection and characterization capabilities, but detects a large volume of spurious activation with the control fMRI data. PMID:27993672
NASA Technical Reports Server (NTRS)
Gerberich, Matthew W.; Oleson, Steven R.
2013-01-01
The Collaborative Modeling for Parametric Assessment of Space Systems (COMPASS) team at Glenn Research Center has performed integrated system analysis of conceptual spacecraft mission designs since 2006 using a multidisciplinary concurrent engineering process. The set of completed designs was archived in a database, to allow for the study of relationships between design parameters. Although COMPASS uses a parametric spacecraft costing model, this research investigated the possibility of using a top-down approach to rapidly estimate the overall vehicle costs. This paper presents the relationships between significant design variables, including breakdowns of dry mass, wet mass, and cost. It also develops a model for a broad estimate of these parameters through basic mission characteristics, including the target location distance, the payload mass, the duration, the delta-v requirement, and the type of mission, propulsion, and electrical power. Finally, this paper examines the accuracy of this model in regards to past COMPASS designs, with an assessment of outlying spacecraft, and compares the results to historical data of completed NASA missions.
Failure Mode Identification Through Clustering Analysis
NASA Technical Reports Server (NTRS)
Arunajadai, Srikesh G.; Stone, Robert B.; Tumer, Irem Y.; Clancy, Daniel (Technical Monitor)
2002-01-01
Research has shown that nearly 80% of the costs and problems are created in product development and that cost and quality are essentially designed into products in the conceptual stage. Currently, failure identification procedures (such as FMEA (Failure Modes and Effects Analysis), FMECA (Failure Modes, Effects and Criticality Analysis) and FTA (Fault Tree Analysis)) and design of experiments are being used for quality control and for the detection of potential failure modes during the detail design stage or post-product launch. Though all of these methods have their own advantages, they do not give information as to what are the predominant failures that a designer should focus on while designing a product. This work uses a functional approach to identify failure modes, which hypothesizes that similarities exist between different failure modes based on the functionality of the product/component. In this paper, a statistical clustering procedure is proposed to retrieve information on the set of predominant failures that a function experiences. The various stages of the methodology are illustrated using a hypothetical design example.
Magic Mirror, On the Wall-Which Is the Right Study Design of Them All?-Part II.
Vetter, Thomas R
2017-07-01
The assessment of a new or existing treatment or other intervention typically answers 1 of 3 central research-related questions: (1) "Can it work?" (efficacy); (2) "Does it work?" (effectiveness); or (3) "Is it worth it?" (efficiency or cost-effectiveness). There are a number of study designs that, on a situational basis, are appropriate to apply in conducting research. These study designs are generally classified as experimental, quasiexperimental, or observational, with observational studies being further divided into descriptive and analytic categories. This second of a 2-part statistical tutorial reviews these 3 salient research questions and describes a subset of the most common types of observational study designs. Attention is focused on the strengths and weaknesses of each study design to assist in choosing which is appropriate for a given study objective and hypothesis as well as the particular study setting and available resources and data. Specific studies and papers are highlighted as examples of a well-chosen, clearly stated, and properly executed study design type.
Magic Mirror, on the Wall-Which Is the Right Study Design of Them All?-Part I.
Vetter, Thomas R
2017-06-01
The assessment of a new or existing treatment or intervention typically answers 1 of 3 research-related questions: (1) "Can it work?" (efficacy); (2) "Does it work?" (effectiveness); and (3) "Is it worth it?" (efficiency or cost-effectiveness). There are a number of study designs that on a situational basis are appropriate to apply in conducting research. These study designs are classified as experimental, quasi-experimental, or observational, with observational studies being further divided into descriptive and analytic categories. This first of a 2-part statistical tutorial reviews these 3 salient research questions and describes a subset of the most common types of experimental and quasi-experimental study design. Attention is focused on the strengths and weaknesses of each study design to assist in choosing which is appropriate for a given study objective and hypothesis as well as the particular study setting and available resources and data. Specific studies and papers are highlighted as examples of a well-chosen, clearly stated, and properly executed study design type.
Qu, Conghui; Schuetz, Johanna M.; Min, Jeong Eun; Leach, Stephen; Daley, Denise; Spinelli, John J.; Brooks-Wilson, Angela; Graham, Jinko
2011-01-01
We describe a statistical approach to predict gender-labeling errors in candidate-gene association studies, when Y-chromosome markers have not been included in the genotyping set. The approach adds value to methods that consider only the heterozygosity of X-chromosome SNPs, by incorporating available information about the intensity of X-chromosome SNPs in candidate genes relative to autosomal SNPs from the same individual. To our knowledge, no published methods formalize a framework in which heterozygosity and relative intensity are simultaneously taken into account. Our method offers the advantage that, in the genotyping set, no additional space is required beyond that already assigned to X-chromosome SNPs in the candidate genes. We also show how the predictions can be used in a two-phase sampling design to estimate the gender-labeling error rates for an entire study, at a fraction of the cost of a conventional design. PMID:22303327
System engineering toolbox for design-oriented engineers
NASA Technical Reports Server (NTRS)
Goldberg, B. E.; Everhart, K.; Stevens, R.; Babbitt, N., III; Clemens, P.; Stout, L.
1994-01-01
This system engineering toolbox is designed to provide tools and methodologies to the design-oriented systems engineer. A tool is defined as a set of procedures to accomplish a specific function. A methodology is defined as a collection of tools, rules, and postulates to accomplish a purpose. For each concept addressed in the toolbox, the following information is provided: (1) description, (2) application, (3) procedures, (4) examples, if practical, (5) advantages, (6) limitations, and (7) bibliography and/or references. The scope of the document includes concept development tools, system safety and reliability tools, design-related analytical tools, graphical data interpretation tools, a brief description of common statistical tools and methodologies, so-called total quality management tools, and trend analysis tools. Both relationship to project phase and primary functional usage of the tools are also delineated. The toolbox also includes a case study for illustrative purposes. Fifty-five tools are delineated in the text.
2015-06-30
7. Building Statistical Metamodels using Simulation Experimental Designs ............................................... 34 7.1. Statistical Design...system design drivers across several different domain models, our methodology uses statistical metamodeling to approximate the simulations’ behavior. A...output. We build metamodels using a number of statistical methods that include stepwise regression, boosted trees, neural nets, and bootstrap forest
2015-06-01
7. Building Statistical Metamodels using Simulation Experimental Designs ............................................... 34 7.1. Statistical Design...system design drivers across several different domain models, our methodology uses statistical metamodeling to approximate the simulations’ behavior. A...output. We build metamodels using a number of statistical methods that include stepwise regression, boosted trees, neural nets, and bootstrap forest
Research design and statistical methods in Pakistan Journal of Medical Sciences (PJMS)
Akhtar, Sohail; Shah, Syed Wadood Ali; Rafiq, M.; Khan, Ajmal
2016-01-01
Objective: This article compares the study design and statistical methods used in 2005, 2010 and 2015 of Pakistan Journal of Medical Sciences (PJMS). Methods: Only original articles of PJMS were considered for the analysis. The articles were carefully reviewed for statistical methods and designs, and then recorded accordingly. The frequency of each statistical method and research design was estimated and compared with previous years. Results: A total of 429 articles were evaluated (n=74 in 2005, n=179 in 2010, n=176 in 2015) in which 171 (40%) were cross-sectional and 116 (27%) were prospective study designs. A verity of statistical methods were found in the analysis. The most frequent methods include: descriptive statistics (n=315, 73.4%), chi-square/Fisher’s exact tests (n=205, 47.8%) and student t-test (n=186, 43.4%). There was a significant increase in the use of statistical methods over time period: t-test, chi-square/Fisher’s exact test, logistic regression, epidemiological statistics, and non-parametric tests. Conclusion: This study shows that a diverse variety of statistical methods have been used in the research articles of PJMS and frequency improved from 2005 to 2015. However, descriptive statistics was the most frequent method of statistical analysis in the published articles while cross-sectional study design was common study design. PMID:27022365
Tra, Yolande V; Evans, Irene M
2010-01-01
BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of microarray high-dimensional data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. A class project was to conduct a similar analysis where students analyzed their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goals to equip students with skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course.
Evans, Irene M.
2010-01-01
BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of microarray high-dimensional data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. A class project was to conduct a similar analysis where students analyzed their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goals to equip students with skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course. PMID:20810954
A randomized clinical trial of buprenorphine for prisoners: Findings at 12-months post-release
Gordon, Michael S.; Kinlock, Timothy W.; Schwartz, Robert P.; O’Grady, Kevin E.; Fitzgerald, Terrence T.; Vocci, Frank J.
2017-01-01
Background This study examined whether starting buprenorphine treatment prior to prison and after release from prison would be associated with better drug treatment outcomes and whether males and females responded differently to the combination of in-prison treatment and post-release service setting. Methods Study design was a 2 (In-Prison Treatment Condition: Buprenorphine Treatment Vs. Counseling Only) X 2 [Post-Release Service Setting Condition: Opioid Treatment Program (OTP) Vs. Community Health Center (CHC)] X 2 (Gender) factorial design. The trial was conducted between September 2008 and July 2012. Follow-up assessments were completed in 2014. Participants were recruited from two Baltimore prerelease prisons (one for men and one for women). Adult pre-release prisoners who were heroin-dependent during the year prior to incarceration were eligible. Post-release assessments were conducted at 1, 3, 6, and 12-month following prison release. Results Participants (N=211) in the in-prison treatment condition effect had a higher mean number of days of community buprenorphine treatment compared to the condition in which participants initiated medication after release (P=.005). However, there were no statistically significant hypothesized effects for the in-prison treatment condition in terms of: days of heroin use and crime, and opioid and cocaine positive urine screening test results (all Ps>.14) and no statistically significant hypothesized gender effects (all Ps>.18). Conclusions Although initiating buprenorphine treatment in prison compared to after-release was associated with more days receiving buprenorphine treatment in the designated community treatment program during the 12-months post-release assessment, it was not associated with superior outcomes in terms of heroin and cocaine use and criminal behavior. PMID:28107680
Templeton, A. R.; Sing, C. F.
1993-01-01
We previously developed an analytical strategy based on cladistic theory to identify subsets of haplotypes that are associated with significant phenotypic deviations. Our initial approach was limited to segments of DNA in which little recombination occurs. In such cases, a cladogram can be constructed from the restriction site data to estimate the evolutionary steps that interrelate the observed haplotypes to one another. The cladogram is then used to define a nested statistical design for identifying mutational steps associated with significant phenotypic deviations. The central assumption behind this strategy is that a mutation responsible for a particular phenotypic effect is embedded within the evolutionary history that is represented by the cladogram. The power of this approach depends on the accuracy of the cladogram in portraying the evolutionary history of the DNA region. This accuracy can be diminished both by recombination and by uncertainty in the estimated cladogram topology. In a previous paper, we presented an algorithm for estimating the set of likely cladograms and recombination events. In this paper we present an algorithm for defining a nested statistical design under cladogram uncertainty and recombination. Given the nested design, phenotypic associations can be examined using either a nested analysis of variance (for haploids or homozygous strains) or permutation testing (for outcrossed, diploid gene regions). In this paper we also extend this analytical strategy to include categorical phenotypes in addition to quantitative phenotypes. Some worked examples are presented using Drosophila data sets. These examples illustrate that having some recombination may actually enhance the biological inferences that may derived from a cladistic analysis. In particular, recombination can be used to assign a physical localization to a given subregion for mutations responsible for significant phenotypic effects. PMID:8100789
A randomized clinical trial of buprenorphine for prisoners: Findings at 12-months post-release.
Gordon, Michael S; Kinlock, Timothy W; Schwartz, Robert P; O'Grady, Kevin E; Fitzgerald, Terrence T; Vocci, Frank J
2017-03-01
This study examined whether starting buprenorphine treatment prior to prison and after release from prison would be associated with better drug treatment outcomes and whether males and females responded differently to the combination of in-prison treatment and post-release service setting. Study design was a 2 (In-Prison Treatment: Condition: Buprenorphine Treatment: vs. Counseling Only)×2 [Post-Release Service Setting Condition: Opioid Treatment: Program (OTP) vs. Community Health Center (CHC)]×2 (Gender) factorial design. The trial was conducted between September 2008 and July 2012. Follow-up assessments were completed in 2014. Participants were recruited from two Baltimore pre-release prisons (one for men and one for women). Adult pre-release prisoners who were heroin-dependent during the year prior to incarceration were eligible. Post-release assessments were conducted at 1, 3, 6, and 12-month following prison release. Participants (N=211) in the in-prison treatment condition effect had a higher mean number of days of community buprenorphine treatment compared to the condition in which participants initiated medication after release (P=0.005). However, there were no statistically significant hypothesized effects for the in-prison treatment condition in terms of: days of heroin use and crime, and opioid and cocaine positive urine screening test results (all Ps>0.14) and no statistically significant hypothesized gender effects (all Ps>0.18). Although initiating buprenorphine treatment in prison compared to after-release was associated with more days receiving buprenorphine treatment in the designated community treatment program during the 12-months post-release assessment, it was not associated with superior outcomes in terms of heroin and cocaine use and criminal behavior. Copyright © 2017 Elsevier B.V. All rights reserved.
Excoffier, Laurent; Lischer, Heidi E L
2010-05-01
We present here a new version of the Arlequin program available under three different forms: a Windows graphical version (Winarl35), a console version of Arlequin (arlecore), and a specific console version to compute summary statistics (arlsumstat). The command-line versions run under both Linux and Windows. The main innovations of the new version include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans. Command-line versions are designed to handle large series of files, and arlsumstat can be used to generate summary statistics from simulated data sets within an Approximate Bayesian Computation framework. © 2010 Blackwell Publishing Ltd.
Measurement of positive direct current corona pulse in coaxial wire-cylinder gap
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yin, Han, E-mail: hanyin1986@gmail.com; Zhang, Bo, E-mail: shizbcn@mail.tsinghua.edu.cn; He, Jinliang, E-mail: hejl@tsinghua.edu.cn
In this paper, a system is designed and developed to measure the positive corona current in coaxial wire-cylinder gaps. The characteristic parameters of corona current pulses, such as the amplitude, rise time, half-wave time, and repetition frequency, are statistically analyzed and a new set of empirical formulas are derived by numerical fitting. The influence of space charges on corona currents is tested by using three corona cages with different radii. A numerical method is used to solve a simplified ion-flow model to explain the influence of space charges. Based on the statistical results, a stochastic model is developed to simulatemore » the corona pulse trains. And this model is verified by comparing the simulated frequency-domain responses with the measured ones.« less
Olokundun, Maxwell; Iyiola, Oluwole; Ibidunni, Stephen; Ogbari, Mercy; Falola, Hezekiah; Salau, Odunayo; Peter, Fred; Borishade, Taiye
2018-06-01
The article presented data on the effectiveness of entrepreneurship curriculum contents on university students' entrepreneurial interest and knowledge. The study focused on the perceptions of Nigerian university students. Emphasis was laid on the first four universities in Nigeria to offer a degree programme in entrepreneurship. The study adopted quantitative approach with a descriptive research design to establish trends related to the objective of the study. Survey was be used as quantitative research method. The population of this study included all students in the selected universities. Data was analyzed with the use of Statistical Package for Social Sciences (SPSS). Mean score was used as statistical tool of analysis. The field data set is made widely accessible to enable critical or a more comprehensive investigation.
NASA Astrophysics Data System (ADS)
Ahn, Chul Kyun; Heo, Changyong; Jin, Heongmin; Kim, Jong Hyo
2017-03-01
Mammographic breast density is a well-established marker for breast cancer risk. However, accurate measurement of dense tissue is a difficult task due to faint contrast and significant variations in background fatty tissue. This study presents a novel method for automated mammographic density estimation based on Convolutional Neural Network (CNN). A total of 397 full-field digital mammograms were selected from Seoul National University Hospital. Among them, 297 mammograms were randomly selected as a training set and the rest 100 mammograms were used for a test set. We designed a CNN architecture suitable to learn the imaging characteristic from a multitudes of sub-images and classify them into dense and fatty tissues. To train the CNN, not only local statistics but also global statistics extracted from an image set were used. The image set was composed of original mammogram and eigen-image which was able to capture the X-ray characteristics in despite of the fact that CNN is well known to effectively extract features on original image. The 100 test images which was not used in training the CNN was used to validate the performance. The correlation coefficient between the breast estimates by the CNN and those by the expert's manual measurement was 0.96. Our study demonstrated the feasibility of incorporating the deep learning technology into radiology practice, especially for breast density estimation. The proposed method has a potential to be used as an automated and quantitative assessment tool for mammographic breast density in routine practice.
Maguire, Michelle; Bennett, Marialice S.
2015-01-01
Objective. To determine the impact of an elective course on students’ perception of opportunities and of their preparedness for patient care in community and ambulatory pharmacy settings. Design. Each course meeting included a lecture and discussion to introduce concepts and active-learning activities to apply concepts to patient care or practice development in a community or ambulatory pharmacy setting. Assessment. A survey was administered to students before and after the course. Descriptive statistics were used to assess student responses to survey questions, and Wilcoxon signed rank tests were used to analyze the improvement in student responses with an alpha level set at 0.05. Students felt more prepared to provide patient care, develop or improve a clinical service, and effectively communicate recommendations to other health care providers after course completion. Conclusion. This elective course equipped students with the skills necessary to increase their confidence in providing patient care services in community and ambulatory settings. PMID:27168617
Set Shifting Training with Categorization Tasks
Soveri, Anna; Waris, Otto; Laine, Matti
2013-01-01
The very few cognitive training studies targeting an important executive function, set shifting, have reported performance improvements that also generalized to untrained tasks. The present randomized controlled trial extends set shifting training research by comparing previously used cued training with uncued training. A computerized adaptation of the Wisconsin Card Sorting Test was utilized as the training task in a pretest-posttest experimental design involving three groups of university students. One group received uncued training (n = 14), another received cued training (n = 14) and the control group (n = 14) only participated in pre- and posttests. The uncued training group showed posttraining performance increases on their training task, but neither training group showed statistically significant transfer effects. Nevertheless, comparison of effect sizes for transfer effects indicated that our results did not differ significantly from the previous studies. Our results suggest that the cognitive effects of computerized set shifting training are mostly task-specific, and would preclude any robust generalization effects with this training. PMID:24324717
Lung segmentation from HRCT using united geometric active contours
NASA Astrophysics Data System (ADS)
Liu, Junwei; Li, Chuanfu; Xiong, Jin; Feng, Huanqing
2007-12-01
Accurate lung segmentation from high resolution CT images is a challenging task due to various detail tracheal structures, missing boundary segments and complex lung anatomy. One popular method is based on gray-level threshold, however its results are usually rough. A united geometric active contours model based on level set is proposed for lung segmentation in this paper. Particularly, this method combines local boundary information and region statistical-based model synchronously: 1) Boundary term ensures the integrality of lung tissue.2) Region term makes the level set function evolve with global characteristic and independent on initial settings. A penalizing energy term is introduced into the model, which forces the level set function evolving without re-initialization. The method is found to be much more efficient in lung segmentation than other methods that are only based on boundary or region. Results are shown by 3D lung surface reconstruction, which indicates that the method will play an important role in the design of computer-aided diagnostic (CAD) system.
The Effect of Mental Rotation on Surgical Pathological Diagnosis.
Park, Heejung; Kim, Hyun Soo; Cha, Yoon Jin; Choi, Junjeong; Minn, Yangki; Kim, Kyung Sik; Kim, Se Hoon
2018-05-01
Pathological diagnosis involves very delicate and complex consequent processing that is conducted by a pathologist. The recognition of false patterns might be an important cause of misdiagnosis in the field of surgical pathology. In this study, we evaluated the influence of visual and cognitive bias in surgical pathologic diagnosis, focusing on the influence of "mental rotation." We designed three sets of the same images of uterine cervix biopsied specimens (original, left to right mirror images, and 180-degree rotated images), and recruited 32 pathologists to diagnose the 3 set items individually. First, the items found to be adequate for analysis by classical test theory, Generalizability theory, and item response theory. The results showed statistically no differences in difficulty, discrimination indices, and response duration time between the image sets. Mental rotation did not influence the pathologists' diagnosis in practice. Interestingly, outliers were more frequent in rotated image sets, suggesting that the mental rotation process may influence the pathological diagnoses of a few individual pathologists. © Copyright: Yonsei University College of Medicine 2018.
NASA Astrophysics Data System (ADS)
Kehres, Jan; Lyksborg, Mark; Olsen, Ulrik L.
2017-09-01
Energy dispersive X-ray diffraction (EDXRD) can be applied for identification of liquid threats in luggage scanning in security applications. To define the instrumental design, the framework for data reduction and analysis and test the performance of the threat detection in various scenarios, a flexible laboratory EDXRD test setup was build. A data set of overall 570 EDXRD spectra has been acquired for training and testing of threat identification algorithms. The EDXRD data was acquired with limited count statistics and at multiple detector angles and merged after correction and normalization. Initial testing of the threat detection algorithms with this data set indicate the feasibility of detection levels of > 95 % true positive with < 6 % false positive alarms.
Providing peak river flow statistics and forecasting in the Niger River basin
NASA Astrophysics Data System (ADS)
Andersson, Jafet C. M.; Ali, Abdou; Arheimer, Berit; Gustafsson, David; Minoungou, Bernard
2017-08-01
Flooding is a growing concern in West Africa. Improved quantification of discharge extremes and associated uncertainties is needed to improve infrastructure design, and operational forecasting is needed to provide timely warnings. In this study, we use discharge observations, a hydrological model (Niger-HYPE) and extreme value analysis to estimate peak river flow statistics (e.g. the discharge magnitude with a 100-year return period) across the Niger River basin. To test the model's capacity of predicting peak flows, we compared 30-year maximum discharge and peak flow statistics derived from the model vs. derived from nine observation stations. The results indicate that the model simulates peak discharge reasonably well (on average + 20%). However, the peak flow statistics have a large uncertainty range, which ought to be considered in infrastructure design. We then applied the methodology to derive basin-wide maps of peak flow statistics and their associated uncertainty. The results indicate that the method is applicable across the hydrologically active part of the river basin, and that the uncertainty varies substantially depending on location. Subsequently, we used the most recent bias-corrected climate projections to analyze potential changes in peak flow statistics in a changed climate. The results are generally ambiguous, with consistent changes only in very few areas. To test the forecasting capacity, we ran Niger-HYPE with a combination of meteorological data sets for the 2008 high-flow season and compared with observations. The results indicate reasonable forecasting capacity (on average 17% deviation), but additional years should also be evaluated. We finish by presenting a strategy and pilot project which will develop an operational flood monitoring and forecasting system based in-situ data, earth observations, modelling, and extreme statistics. In this way we aim to build capacity to ultimately improve resilience toward floods, protecting lives and infrastructure in the region.
Shen, Shihao; Park, Juw Won; Lu, Zhi-xiang; Lin, Lan; Henry, Michael D; Wu, Ying Nian; Zhou, Qing; Xing, Yi
2014-12-23
Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.
The power and promise of RNA-seq in ecology and evolution.
Todd, Erica V; Black, Michael A; Gemmell, Neil J
2016-03-01
Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in nonmodel species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilize biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritize sequencing depth over replication fail to capitalize on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesize progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike. © 2016 John Wiley & Sons Ltd.
A novel method to estimate the affinity of HLA-A∗0201 restricted CTL epitope
NASA Astrophysics Data System (ADS)
Xu, Yun-sheng; Lin, Yong; Zhu, Bo; Lin, Zhi-hua
2009-02-01
A set of 70 peptides with affinity for the class I MHC HLA-A∗0201 molecule was subjected to quantitative structure-affinity relationship studies based on the SCORE function with good results ( r2 = 0.6982, RMS = 0.280). Then the 'leave-one-out' cross-validation (LOO-CV) and an outer test set including 18 outer samples were used to validate the QSAR model. The results of the LOO-CV were q2 = 0.6188, RMS = 0.315, and the results of outer test set were r2 = 0.5633, RMS = 0.2292. All these show that the QSAR model has good predictability. Statistical analysis showed that the hydrophobic and hydrogen bond interaction played a significant role in peptide-MHC molecule binding. The study also provided useful information for structure modification of CTL epitope, and laid theoretical base for molecular design of therapeutic vaccine.
Addressing group dynamics in a brief motivational intervention for college student drinkers.
Faris, Alexander S; Brown, Janice M
2003-01-01
Previous research indicates that brief motivational interventions for college student drinkers may be less effective in group settings than individual settings. Social psychological theories about counterproductive group dynamics may partially explain this finding. The present study examined potential problems with group motivational interventions by comparing outcomes from a standard group motivational intervention (SGMI; n = 25), an enhanced group motivational intervention (EGMI; n = 27) designed to suppress counterproductive processes, and a no intervention control (n = 23). SGMI and EGMI participants reported disruptive group dynamics as evidenced by low elaboration likelihood, production blocking, and social loafing, though the level of disturbance was significantly lower for EGMI individuals (p = .001). Despite counteracting group dynamics in the EGMI condition, participants in the two interventions were statistically similar in post-intervention problem recognition and future drinking intentions. The results raise concerns over implementing individually-based interventions in group settings without making necessary adjustments.
ERIC Educational Resources Information Center
Williams, Amanda S.
2015-01-01
Statistics anxiety is a common problem for graduate students. This study explores the multivariate relationship between a set of worry-related variables and six types of statistics anxiety. Canonical correlation analysis indicates a significant relationship between the two sets of variables. Findings suggest that students who are more intolerant…
NASA Technical Reports Server (NTRS)
Hughes, William O.; McNelis, Anne M.
2010-01-01
The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.
Statistical process control using optimized neural networks: a case study.
Addeh, Jalil; Ebrahimzadeh, Ata; Azarbad, Milad; Ranaee, Vahid
2014-09-01
The most common statistical process control (SPC) tools employed for monitoring process changes are control charts. A control chart demonstrates that the process has altered by generating an out-of-control signal. This study investigates the design of an accurate system for the control chart patterns (CCPs) recognition in two aspects. First, an efficient system is introduced that includes two main modules: feature extraction module and classifier module. In the feature extraction module, a proper set of shape features and statistical feature are proposed as the efficient characteristics of the patterns. In the classifier module, several neural networks, such as multilayer perceptron, probabilistic neural network and radial basis function are investigated. Based on an experimental study, the best classifier is chosen in order to recognize the CCPs. Second, a hybrid heuristic recognition system is introduced based on cuckoo optimization algorithm (COA) algorithm to improve the generalization performance of the classifier. The simulation results show that the proposed algorithm has high recognition accuracy. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Arena, Dylan A.; Schwartz, Daniel L.
2014-08-01
Well-designed digital games can deliver powerful experiences that are difficult to provide through traditional instruction, while traditional instruction can deliver formal explanations that are not a natural fit for gameplay. Combined, they can accomplish more than either can alone. An experiment tested this claim using the topic of statistics, where people's everyday experiences often conflict with normative statistical theories and a videogame might provide an alternate set of experiences for students to draw upon. The research used a game called Stats Invaders!, a variant of the classic videogame Space Invaders. In Stats Invaders!, the locations of descending alien invaders follow probability distributions, and players need to infer the shape of the distributions to play well. The experiment tested whether the game developed participants' intuitions about the structure of random events and thereby prepared them for future learning from a subsequent written passage on probability distributions. Community-college students who played the game and then read the passage learned more than participants who only read the passage.
The (Ir)Responsibility of (Under)Estimating Missing Data.
Fernández-García, María P; Vallejo-Seco, Guillermo; Livácic-Rojas, Pablo; Tuero-Herrero, Ellian
2018-01-01
It is practically impossible to avoid losing data in the course of an investigation, and it has been proven that the consequences can reach such magnitude that they could even invalidate the results of the study. This paper describes some of the most likely causes of missing data in research in the field of clinical psychology and the consequences they may have on statistical and substantive inferences. When it is necessary to recover the missing information, analyzing the data can become extremely complex. We summarize the experts' recommendations regarding the most powerful procedures for performing this task, the advantages each one has over the others, the elements that can or should influence our choice, and the procedures that are not a recommended option except in very exceptional cases. We conclude by offering four pieces of advice, on which all the experts agree and to which we must attend at all times in order to proceed with the greatest possible success. Finally, we show the pernicious effects produced by missing data on the statistical result and on the substantive or clinical conclusions. For this purpose we have planned to lose data in different percentage rates under two mechanisms of loss of data, MCAR and MAR in the complete data set of two very different real researchs, and we proceed to analyze the set of the available data, listwise deletion. One study is carried out using a quasi-experimental non-equivalent control group design, and another study using a experimental design completely randomized.
The application of latent curve analysis to testing developmental theories in intervention research.
Curran, P J; Muthén, B O
1999-08-01
The effectiveness of a prevention or intervention program has traditionally been assessed using time-specific comparisons of mean levels between the treatment and the control groups. However, many times the behavior targeted by the intervention is naturally developing over time, and the goal of the treatment is to alter this natural or normative developmental trajectory. Examining time-specific mean levels can be both limiting and potentially misleading when the behavior of interest is developing systematically over time. It is argued here that there are both theoretical and statistical advantages associated with recasting intervention treatment effects in terms of normative and altered developmental trajectories. The recently developed technique of latent curve (LC) analysis is reviewed and extended to a true experimental design setting in which subjects are randomly assigned to a treatment intervention or a control condition. LC models are applied to both artificially generated and real intervention data sets to evaluate the efficacy of an intervention program. Not only do the LC models provide a more comprehensive understanding of the treatment and control group developmental processes compared to more traditional fixed-effects models, but LC models have greater statistical power to detect a given treatment effect. Finally, the LC models are modified to allow for the computation of specific power estimates under a variety of conditions and assumptions that can provide much needed information for the planning and design of more powerful but cost-efficient intervention programs for the future.
On the Analysis of Case-Control Studies in Cluster-correlated Data Settings.
Haneuse, Sebastien; Rivera-Rodriguez, Claudia
2018-01-01
In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference requires acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.
A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.
Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa
2011-05-26
Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
A two-parameter design storm for Mediterranean convective rainfall
NASA Astrophysics Data System (ADS)
García-Bartual, Rafael; Andrés-Doménech, Ignacio
2017-05-01
The following research explores the feasibility of building effective design storms for extreme hydrological regimes, such as the one which characterizes the rainfall regime of the east and south-east of the Iberian Peninsula, without employing intensity-duration-frequency (IDF) curves as a starting point. Nowadays, after decades of functioning hydrological automatic networks, there is an abundance of high-resolution rainfall data with a reasonable statistic representation, which enable the direct research of temporal patterns and inner structures of rainfall events at a given geographic location, with the aim of establishing a statistical synthesis directly based on those observed patterns. The authors propose a temporal design storm defined in analytical terms, through a two-parameter gamma-type function. The two parameters are directly estimated from 73 independent storms identified from rainfall records of high temporal resolution in Valencia (Spain). All the relevant analytical properties derived from that function are developed in order to use this storm in real applications. In particular, in order to assign a probability to the design storm (return period), an auxiliary variable combining maximum intensity and total cumulated rainfall is introduced. As a result, for a given return period, a set of three storms with different duration, depth and peak intensity are defined. The consistency of the results is verified by means of comparison with the classic method of alternating blocks based on an IDF curve, for the above mentioned study case.
An Intercompany Perspective on Biopharmaceutical Drug Product Robustness Studies.
Morar-Mitrica, Sorina; Adams, Monica L; Crotts, George; Wurth, Christine; Ihnat, Peter M; Tabish, Tanvir; Antochshuk, Valentyn; DiLuzio, Willow; Dix, Daniel B; Fernandez, Jason E; Gupta, Kapil; Fleming, Michael S; He, Bing; Kranz, James K; Liu, Dingjiang; Narasimhan, Chakravarthy; Routhier, Eric; Taylor, Katherine D; Truong, Nobel; Stokes, Elaine S E
2018-02-01
The Biophorum Development Group (BPDG) is an industry-wide consortium enabling networking and sharing of best practices for the development of biopharmaceuticals. To gain a better understanding of current industry approaches for establishing biopharmaceutical drug product (DP) robustness, the BPDG-Formulation Point Share group conducted an intercompany collaboration exercise, which included a bench-marking survey and extensive group discussions around the scope, design, and execution of robustness studies. The results of this industry collaboration revealed several key common themes: (1) overall DP robustness is defined by both the formulation and the manufacturing process robustness; (2) robustness integrates the principles of quality by design (QbD); (3) DP robustness is an important factor in setting critical quality attribute control strategies and commercial specifications; (4) most companies employ robustness studies, along with prior knowledge, risk assessments, and statistics, to develop the DP design space; (5) studies are tailored to commercial development needs and the practices of each company. Three case studies further illustrate how a robustness study design for a biopharmaceutical DP balances experimental complexity, statistical power, scientific understanding, and risk assessment to provide the desired product and process knowledge. The BPDG-Formulation Point Share discusses identified industry challenges with regard to biopharmaceutical DP robustness and presents some recommendations for best practices. Copyright © 2018 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
Optimizing DNA assembly based on statistical language modelling.
Fang, Gang; Zhang, Shemin; Dong, Yafei
2017-12-15
By successively assembling genetic parts such as BioBrick according to grammatical models, complex genetic constructs composed of dozens of functional blocks can be built. However, usually every category of genetic parts includes a few or many parts. With increasing quantity of genetic parts, the process of assembling more than a few sets of these parts can be expensive, time consuming and error prone. At the last step of assembling it is somewhat difficult to decide which part should be selected. Based on statistical language model, which is a probability distribution P(s) over strings S that attempts to reflect how frequently a string S occurs as a sentence, the most commonly used parts will be selected. Then, a dynamic programming algorithm was designed to figure out the solution of maximum probability. The algorithm optimizes the results of a genetic design based on a grammatical model and finds an optimal solution. In this way, redundant operations can be reduced and the time and cost required for conducting biological experiments can be minimized. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Novel Data Reduction Based on Statistical Similarity
Lee, Dongeun; Sim, Alex; Choi, Jaesik; ...
2016-07-18
Applications such as scientific simulations and power grid monitoring are generating so much data quickly that compression is essential to reduce storage requirement or transmission capacity. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data. But this measure of distance severely limits either reconstruction quality or compression performance. In this paper, we propose a new class of compression method by redefining the distance measure with a statistical concept known as exchangeability. This approach reduces the storagemore » requirement and captures essential features, while reducing the storage requirement. In this paper, we report our design and implementation of such a compression method named IDEALEM. To demonstrate its effectiveness, we apply it on a set of power grid monitoring data, and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. Finally, in these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.« less
Wong, P K S; Wong, D F K
2008-03-01
The ecological perspective recognizes the critical role that is played by rehabilitation personnel in helping people with intellectual disability (ID) to exercise self-determination, particularly in residential settings. In Hong Kong, the authors developed the first staff training programme of its kind to strengthen the competence of personnel in this area. The purpose of this study was to examine the effectiveness of staff training in enhancing residential staff's attitudes, knowledge and facilitation skills in assisting residents with ID to exercise self-determination. A pretest-posttest comparison group design was adopted. Thirty-two participants in an experimental group attended a six-session staff training programme. A 34-item self-constructed scale was designed and used for measuring the effectiveness of the staff training. The results showed that the experimental group achieved statistically significant positive changes in all domains, whereas no significant changes were found in the comparison group. The findings provided initial evidence of the effectiveness of staff training that uses an interactional attitude-knowledge-skills model for Chinese rehabilitation personnel. The factors that contributed to its effectiveness were discussed and recommendations for future research were made.
Defensive efficacy interim design: Dynamic benefit/risk ratio view using probability of success.
Tang, Zhongwen
2017-01-01
Traditional efficacy interim design is based on alpha spending which does not have intuitive interpretation and hence is difficult to communicate with non-statistician colleagues. The alpha-spending approach is based on efficacy alone and hence does not have the flexibility to incorporate newly emerged safety signal. Newly emerged safety signal may nullify the originally set efficacy boundary. In contrast, the probability of success (POS) concept has intuitive interpretation and hence can facilitate our communication with non-statistician colleagues and help to obtain health authority (HA) buying. The success criteria of POS are not restricted to statistical significance. Hence, POS has the capability to incorporate both efficacy and safety information. We propose to use POS and its credible interval to design efficacy interim. In the proposed method, the efficacy boundary is adjustable to offset newly emerged safety signal.
Lin, B Y; Wan, T T
1999-12-01
Few empirical analyses have been done in the organizational researches of integrated healthcare networks (IHNs) or integrated healthcare delivery systems. Using a contingency derived contact-process-performance model, this study attempts to explore the relationships among an IHN's strategic direction, structural design, and performance. A cross-sectional analysis of 100 IHNs suggests that certain contextual factors such as market competition and network age and tax status have statistically significant effects on the implementation of an IHN's service differentiation strategy, which addresses coordination and control in the market. An IHN's service differentiation strategy is positively related to its integrated structural design, which is characterized as integration of administration, patient care, and information system across different settings. However, no evidence supports that the development of integrated structural design may benefit an IHN's performance in terms of clinical efficiency and financial viability.
Design prediction for long term stress rupture service of composite pressure vessels
NASA Technical Reports Server (NTRS)
Robinson, Ernest Y.
1992-01-01
Extensive stress rupture studies on glass composites and Kevlar composites were conducted by the Lawrence Radiation Laboratory beginning in the late 1960's and extending to about 8 years in some cases. Some of the data from these studies published over the years were incomplete or were tainted by spurious failures, such as grip slippage. Updated data sets were defined for both fiberglass and Kevlar composite stand test specimens. These updated data are analyzed in this report by a convenient form of the bivariate Weibull distribution, to establish a consistent set of design prediction charts that may be used as a conservative basis for predicting the stress rupture life of composite pressure vessels. The updated glass composite data exhibit an invariant Weibull modulus with lifetime. The data are analyzed in terms of homologous service load (referenced to the observed median strength). The equations relating life, homologous load, and probability are given, and corresponding design prediction charts are presented. A similar approach is taken for Kevlar composites, where the updated stand data do show a turndown tendency at long life accompanied by a corresponding change (increase) of the Weibull modulus. The turndown characteristic is not present in stress rupture test data of Kevlar pressure vessels. A modification of the stress rupture equations is presented to incorporate a latent, but limited, strength drop, and design prediction charts are presented that incorporate such behavior. The methods presented utilize Cartesian plots of the probability distributions (which are a more natural display for the design engineer), based on median normalized data that are independent of statistical parameters and are readily defined for any set of test data.
Statistical assessment of the learning curves of health technologies.
Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T
2001-01-01
(1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis, generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)
The coastDat data set and its potential for coastal and offshore applications
NASA Astrophysics Data System (ADS)
Meyer, E.; Weisse, R.
2012-12-01
The coastDat data set is a compilation of coastal analyses and scenarios for the future from various sources. It contains no direct measurements but results from numerical models that have been driven either by observed data in order to achieve the best possible representation of observed past conditions or by climate change scenarios for the near future. One of the key objectives for developing coastDat was to derive a consistent and mostly homogeneous database for assessing marine weather statistics and long-term changes. Here, homogeneity refers to a data set which is free from effects caused by changes in instrumentation or measurement techniques. Contrary to direct measurements which are often rare and incomplete, coastDat offers a unique combination of consistent atmospheric, oceanic, sea state and other parameters at high spatial and temporal detail, even for places and variables for which no measurements have been made. The backbones of coastDat are regional wind, wave and storm surge hindcast and scenarios mainly for the North Sea and the Baltic Sea. Furthermore hindcast simulations are available for temperature, salinity, water level, u- and v-components for the North Sea for last 60 years. We will discuss the methodology to derive these data, their quality and limitations in comparison with observations. Long-term changes in the temperature, wind, wave and storm surge climate will be discussed and potential future changes will be assessed. We will conclude with a number of coastal and offshore applications (e.g. ship design, coastal protection, oil risk modelling and marine energy use) of coastDat demonstrating some of the potentials of the data set in hazard assessment. For example data from coastDat have been used extensively for designing, planning and installation of offshore wind farms. Return periods of extreme wind speed, surge and wave heights are used by a variety of users involved in the design and construction of offshore wind parks. Moreover, planning of installation and maintenance requires the estimation of probabilities of weather windows; that is, for example the probability of an extended period with wave heights below a given threshold to enable installation and/or maintenance. Data from coastDat were frequently used in such cases as observational data are too often too short to derive reliable statistics.
Lamont, Scott; Brunero, Scott
2018-05-19
Workplace violence prevalence has attracted significant attention within the international nursing literature. Little attention to non-mental health settings and a lack of evaluation rigor have been identified within review literature. To examine the effects of a workplace violence training program in relation to risk assessment and management practices, de-escalation skills, breakaway techniques, and confidence levels, within an acute hospital setting. A quasi-experimental study of nurses using pretest-posttest measurements of educational objectives and confidence levels, with two week follow-up. A 440 bed metropolitan tertiary referral hospital in Sydney, Australia. Nurses working in specialties identified as a 'high risk' for violence. A pre-post-test design was used with participants attending a one day workshop. The workshop evaluation comprised the use of two validated questionnaires: the Continuing Professional Development Reaction questionnaire, and the Confidence in Coping with Patient Aggression Instrument. Descriptive and inferential statistics were calculated. The paired t-test was used to assess the statistical significance of changes in the clinical behaviour intention and confidence scores from pre- to post-intervention. Cohen's d effect sizes were calculated to determine the extent of the significant results. Seventy-eight participants completed both pre- and post-workshop evaluation questionnaires. Statistically significant increases in behaviour intention scores were found in fourteen of the fifteen constructs relating to the three broad workshop objectives, and confidence ratings, with medium to large effect sizes observed in some constructs. A significant increase in overall confidence in coping with patient aggression was also found post-test with large effect size. Positive results were observed from the workplace violence training. Training needs to be complimented by a multi-faceted organisational approach which includes governance, quality and review processes. Copyright © 2018 Elsevier Ltd. All rights reserved.
Chan, R.V. Paul; Patel, Samir N.; Ryan, Michael C.; Jonas, Karyn E.; Ostmo, Susan; Port, Alexander D.; Sun, Grace I.; Lauer, Andreas K.; Chiang, Michael F.
2015-01-01
Purpose: To describe the design, implementation, and evaluation of a tele-education system developed to improve diagnostic competency in retinopathy of prematurity (ROP) by ophthalmology residents. Methods: A secure Web-based tele-education system was developed utilizing a repository of over 2,500 unique image sets of ROP. For each image set used in the system, a reference standard ROP diagnosis was established. Performance by ophthalmology residents (postgraduate years 2 to 4) from the United States and Canada in taking the ROP tele-education program was prospectively evaluated. Residents were presented with image-based clinical cases of ROP during a pretest, posttest, and training chapters. Accuracy and reliability of ROP diagnosis (eg, plus disease, zone, stage, category) were determined using sensitivity, specificity, and the kappa statistic calculations of the results from the pretest and posttest. Results: Fifty-five ophthalmology residents were provided access to the ROP tele-education program. Thirty-one ophthalmology residents completed the program. When all training levels were analyzed together, a statistically significant increase was observed in sensitivity for the diagnosis of plus disease, zone, stage, category, and aggressive posterior ROP (P<.05). Statistically significant changes in specificity for identification of stage 2 or worse (P=.027) and pre-plus (P=.028) were observed. Conclusions: A tele-education system for ROP education is effective in improving diagnostic accuracy of ROP by ophthalmology residents. This system may have utility in the setting of both healthcare and medical education reform by creating a validated method to certify telemedicine providers and educate the next generation of ophthalmologists. PMID:26538772
The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison
Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth
2006-01-01
Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields as greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity. PMID:16626497
Surface laser marking optimization using an experimental design approach
NASA Astrophysics Data System (ADS)
Brihmat-Hamadi, F.; Amara, E. H.; Lavisse, L.; Jouvard, J. M.; Cicala, E.; Kellou, H.
2017-04-01
Laser surface marking is performed on a titanium substrate using a pulsed frequency doubled Nd:YAG laser ( λ= 532 nm, τ pulse=5 ns) to process the substrate surface under normal atmospheric conditions. The aim of the work is to investigate, following experimental and statistical approaches, the correlation between the process parameters and the response variables (output), using a Design of Experiment method (DOE): Taguchi methodology and a response surface methodology (RSM). A design is first created using MINTAB program, and then the laser marking process is performed according to the planned design. The response variables; surface roughness and surface reflectance were measured for each sample, and incorporated into the design matrix. The results are then analyzed and the RSM model is developed and verified for predicting the process output for the given set of process parameters values. The analysis shows that the laser beam scanning speed is the most influential operating factor followed by the laser pumping intensity during marking, while the other factors show complex influences on the objective functions.
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
ERIC Educational Resources Information Center
Acee, Taylor Wayne
2009-01-01
The purpose of this dissertation was to investigate the differential effects of goal setting and value reappraisal on female students' self-efficacy beliefs, value perceptions, exam performance and continued interest in statistics. It was hypothesized that the Enhanced Goal Setting Intervention (GS-E) would positively impact students'…
Set statistics in conductive bridge random access memory device with Cu/HfO{sub 2}/Pt structure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Meiyun; Long, Shibing, E-mail: longshibing@ime.ac.cn; Wang, Guoming
2014-11-10
The switching parameter variation of resistive switching memory is one of the most important challenges in its application. In this letter, we have studied the set statistics of conductive bridge random access memory with a Cu/HfO{sub 2}/Pt structure. The experimental distributions of the set parameters in several off resistance ranges are shown to nicely fit a Weibull model. The Weibull slopes of the set voltage and current increase and decrease logarithmically with off resistance, respectively. This experimental behavior is perfectly captured by a Monte Carlo simulator based on the cell-based set voltage statistics model and the Quantum Point Contact electronmore » transport model. Our work provides indications for the improvement of the switching uniformity.« less
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.
Harrington, Peter de Boves
2018-01-02
Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier but primarily for the optimization of chemometric models this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT
Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...
Collier, Lesley; Jakob, Anke
2017-10-01
Multisensory environments (MSEs) for people with dementia have been available over 20 years but are used in an ad hoc manner using an eclectic range of equipment. Care homes have endeavored to utilize this approach but have struggled to find a design and approach that works for this setting. Study aims were to appraise the evolving concept of MSEs from a user perspective, to study the aesthetic and functional qualities, to identify barriers to staff engagement with a sensory environment approach, and to identify design criteria to improve the potential of MSE for people with dementia. Data were collected from 16 care homes with experience of MSE using ethnographic methods, incorporating semi-structured interviews, and observations of MSE design. Analysis was undertaken using descriptive statistics and thematic analysis. Observations revealed equipment that predominantly stimulated vision and touch. Thematic analysis of the semi-structured interviews revealed six themes: not knowing what to do in the room, good for people in the later stages of the disease, reduces anxiety, it's a good activity, design and setting up of the space, and including relatives and care staff. Few MSEs in care homes are designed to meet needs of people with dementia, and staff receive little training in how to facilitate sessions. As such, MSEs are often underused despite perceived benefits. Results of this study have been used to identify the design principles that have been reviewed by relevant stakeholders.
Tunneling Statistics for Analysis of Spin-Readout Fidelity
NASA Astrophysics Data System (ADS)
Gorman, S. K.; He, Y.; House, M. G.; Keizer, J. G.; Keith, D.; Fricke, L.; Hile, S. J.; Broome, M. A.; Simmons, M. Y.
2017-09-01
We investigate spin and charge dynamics of a quantum dot of phosphorus atoms coupled to a radio-frequency single-electron transistor (SET) using full counting statistics. We show how the magnetic field plays a role in determining the bunching or antibunching tunneling statistics of the donor dot and SET system. Using the counting statistics, we show how to determine the lowest magnetic field where spin readout is possible. We then show how such a measurement can be used to investigate and optimize single-electron spin-readout fidelity.
NASA Technical Reports Server (NTRS)
Hoffer, R. M. (Principal Investigator); Knowlton, D. J.; Dean, M. E.
1981-01-01
A set of training statistics for the 30 meter resolution simulated thematic mapper MSS data was generated based on land use/land cover classes. In addition to this supervised data set, a nonsupervised multicluster block of training statistics is being defined in order to compare the classification results and evaluate the effect of the different training selection methods on classification performance. Two test data sets, defined using a stratified sampling procedure incorporating a grid system with dimensions of 50 lines by 50 columns, and another set based on an analyst supervised set of test fields were used to evaluate the classifications of the TMS data. The supervised training data set generated training statistics, and a per point Gaussian maximum likelihood classification of the 1979 TMS data was obtained. The August 1980 MSS data was radiometrically adjusted. The SAR data was redigitized and the SAR imagery was qualitatively analyzed.
The Content of Statistical Requirements for Authors in Biomedical Research Journals
Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang
2016-01-01
Background: Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues’ serious considerations not only at the stage of data analysis but also at the stage of methodological design. Methods: Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Results: Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including “address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation,” and “statistical methods and the reasons.” Conclusions: Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible. PMID:27748343
The Content of Statistical Requirements for Authors in Biomedical Research Journals.
Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang
2016-10-20
Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues' serious considerations not only at the stage of data analysis but also at the stage of methodological design. Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including "address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation," and "statistical methods and the reasons." Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible.
NASA Astrophysics Data System (ADS)
Baart, F.; van Gils, A.; Hagenaars, G.; Donchyts, G.; Eisemann, E.; van Velzen, J. W.
2016-12-01
A compelling visualization is captivating, beautiful and narrative. Here we show how melding the skills of computer graphics, art, statistics, and environmental modeling can be used to generate innovative, attractive and very informative visualizations. We focus on the topic of visualizing forecasts and measurements of water (water level, waves, currents, density, and salinity). For the field of computer graphics and arts, water is an important topic because it occurs in many natural scenes. For environmental modeling and statistics, water is an important topic because the water is essential for transport, a healthy environment, fruitful agriculture, and a safe environment.The different disciplines take different approaches to visualizing water. In computer graphics, one focusses on creating water as realistic looking as possible. The focus on realistic perception (versus the focus on the physical balance pursued by environmental scientists) resulted in fascinating renderings, as seen in recent games and movies. Visualization techniques for statistical results have benefited from the advancement in design and journalism, resulting in enthralling infographics. The field of environmental modeling has absorbed advances in contemporary cartography as seen in the latest interactive data-driven maps. We systematically review the design emerging types of water visualizations. The examples that we analyze range from dynamically animated forecasts, interactive paintings, infographics, modern cartography to web-based photorealistic rendering. By characterizing the intended audience, the design choices, the scales (e.g. time, space), and the explorability we provide a set of guidelines and genres. The unique contributions of the different fields show how the innovations in the current state of the art of water visualization have benefited from inter-disciplinary collaborations.
Medium composition influence on Biotin and Riboflavin production by newly isolated Candida sp
Suzuki, Gaby Tiemi; Macedo, Juliana Alves; Macedo, Gabriela Alves
2011-01-01
Complex B vitamins as Biotin and Riboflavin are required by living organisms, not only for growth but also for metabolite production, and the feed market classifies them as growth promoters. Since Brazil will soon be one of the world’s biggest animal protein producers, feed production is a large consumer of vitamins and micronutrients. The industry requires 10 mg riboflavin/0.2 mg biotin per kilogram of feed; a ratio of 40 ~ 50:1. Although few studies have been conducted specifically on riboflavin production using factorial design and surface response method as an optimization strategy, it is a common practice in biotechnology with many research reports available. However, there are no reports on the use of statistical design for biotin production. This study set out to evaluate medium composition influence on biotin and riboflavin production using a statistical design. There are no studies relating biotin and riboflavin production by Candida sp LEB 130. In this preliminary study to improve the simultaneous production of biotin and riboflavin, the maximum riboflavin/biotin ratio of 8.3 μg/mL was achieved with medium component concentrations of: sucrose 30 g/L, KH2PO4 2 g/L, MgSO4 1 g/L and ZnSO4 0.5mL/L. PMID:24031727
The power prior: theory and applications.
Ibrahim, Joseph G; Chen, Ming-Hui; Gwon, Yeongjin; Chen, Fang
2015-12-10
The power prior has been widely used in many applications covering a large number of disciplines. The power prior is intended to be an informative prior constructed from historical data. It has been used in clinical trials, genetics, health care, psychology, environmental health, engineering, economics, and business. It has also been applied for a wide variety of models and settings, both in the experimental design and analysis contexts. In this review article, we give an A-to-Z exposition of the power prior and its applications to date. We review its theoretical properties, variations in its formulation, statistical contexts for which it has been used, applications, and its advantages over other informative priors. We review models for which it has been used, including generalized linear models, survival models, and random effects models. Statistical areas where the power prior has been used include model selection, experimental design, hierarchical modeling, and conjugate priors. Frequentist properties of power priors in posterior inference are established, and a simulation study is conducted to further examine the empirical performance of the posterior estimates with power priors. Real data analyses are given illustrating the power prior as well as the use of the power prior in the Bayesian design of clinical trials. Copyright © 2015 John Wiley & Sons, Ltd.
Tichy, Diana; Pickl, Julia Maria Anna; Benner, Axel; Sültmann, Holger
2017-03-31
The identification of microRNA (miRNA) target genes is crucial for understanding miRNA function. Many methods for the genome-wide miRNA target identification have been developed in recent years; however, they have several limitations including the dependence on low-confident prediction programs and artificial miRNA manipulations. Ago-RNA immunoprecipitation combined with high-throughput sequencing (Ago-RIP-Seq) is a promising alternative. However, appropriate statistical data analysis algorithms taking into account the experimental design and the inherent noise of such experiments are largely lacking.Here, we investigate the experimental design for Ago-RIP-Seq and examine biostatistical methods to identify de novo miRNA target genes. Statistical approaches considered are either based on a negative binomial model fit to the read count data or applied to transformed data using a normal distribution-based generalized linear model. We compare them by a real data simulation study using plasmode data sets and evaluate the suitability of the approaches to detect true miRNA targets by sensitivity and false discovery rates. Our results suggest that simple approaches like linear regression models on (appropriately) transformed read count data are preferable. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Ever Enrolled Medicare Population Estimates from the MCBS Access to Care Files
Petroski, Jason; Ferraro, David; Chu, Adam
2014-01-01
Objective The Medicare Current Beneficiary Survey’s (MCBS) Access to Care (ATC) file is designed to provide timely access to information on the Medicare population, yet because of the survey’s complex sampling design and expedited processing it is difficult to use the file to make both “always-enrolled” and “ever-enrolled” estimates on the Medicare population. In this study, we describe the ATC file and sample design, and we evaluate and review various alternatives for producing “ever-enrolled” estimates. Methods We created “ever enrolled” estimates for key variables in the MCBS using three separate approaches. We tested differences between the alternative approaches for statistical significance and show the relative magnitude of difference between approaches. Results Even when estimates derived from the different approaches were statistically different, the magnitude of the difference was often sufficiently small so as to result in little practical difference among the alternate approaches. However, when considering more than just the estimation method, there are advantages to using certain approaches over others. Conclusion There are several plausible approaches to achieving “ever-enrolled” estimates in the MCBS ATC file; however, the most straightforward approach appears to be implementation and usage of a new set of “ever-enrolled” weights for this file. PMID:24991484
Medium composition influence on Biotin and Riboflavin production by newly isolated Candida sp.
Suzuki, Gaby Tiemi; Macedo, Juliana Alves; Macedo, Gabriela Alves
2011-07-01
Complex B vitamins as Biotin and Riboflavin are required by living organisms, not only for growth but also for metabolite production, and the feed market classifies them as growth promoters. Since Brazil will soon be one of the world's biggest animal protein producers, feed production is a large consumer of vitamins and micronutrients. The industry requires 10 mg riboflavin/0.2 mg biotin per kilogram of feed; a ratio of 40 ~ 50:1. Although few studies have been conducted specifically on riboflavin production using factorial design and surface response method as an optimization strategy, it is a common practice in biotechnology with many research reports available. However, there are no reports on the use of statistical design for biotin production. This study set out to evaluate medium composition influence on biotin and riboflavin production using a statistical design. There are no studies relating biotin and riboflavin production by Candida sp LEB 130. In this preliminary study to improve the simultaneous production of biotin and riboflavin, the maximum riboflavin/biotin ratio of 8.3 μg/mL was achieved with medium component concentrations of: sucrose 30 g/L, KH2PO4 2 g/L, MgSO4 1 g/L and ZnSO4 0.5mL/L.
FTree query construction for virtual screening: a statistical analysis.
Gerlach, Christof; Broughton, Howard; Zaliani, Andrea
2008-02-01
FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.
FTree query construction for virtual screening: a statistical analysis
NASA Astrophysics Data System (ADS)
Gerlach, Christof; Broughton, Howard; Zaliani, Andrea
2008-02-01
FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.
A practical approach for the scale-up of roller compaction process.
Shi, Weixian; Sprockel, Omar L
2016-09-01
An alternative approach for the scale-up of ribbon formation during roller compaction was investigated, which required only one batch at the commercial scale to set the operational conditions. The scale-up of ribbon formation was based on a probability method. It was sufficient in describing the mechanism of ribbon formation at both scales. In this method, a statistical relationship between roller compaction parameters and ribbon attributes (thickness and density) was first defined with DoE using a pilot Alexanderwerk WP120 roller compactor. While the milling speed was included in the design, it has no practical effect on granule properties within the study range despite its statistical significance. The statistical relationship was then adapted to a commercial Alexanderwerk WP200 roller compactor with one experimental run. The experimental run served as a calibration of the statistical model parameters. The proposed transfer method was then confirmed by conducting a mapping study on the Alexanderwerk WP200 using a factorial DoE, which showed a match between the predictions and the verification experiments. The study demonstrates the applicability of the roller compaction transfer method using the statistical model from the development scale calibrated with one experiment point at the commercial scale. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Dubovik, O; Herman, M.; Holdak, A.; Lapyonok, T.; Taure, D.; Deuze, J. L.; Ducos, F.; Sinyuk, A.
2011-01-01
The proposed development is an attempt to enhance aerosol retrieval by emphasizing statistical optimization in inversion of advanced satellite observations. This optimization concept improves retrieval accuracy relying on the knowledge of measurement error distribution. Efficient application of such optimization requires pronounced data redundancy (excess of the measurements number over number of unknowns) that is not common in satellite observations. The POLDER imager on board the PARASOL microsatellite registers spectral polarimetric characteristics of the reflected atmospheric radiation at up to 16 viewing directions over each observed pixel. The completeness of such observations is notably higher than for most currently operating passive satellite aerosol sensors. This provides an opportunity for profound utilization of statistical optimization principles in satellite data inversion. The proposed retrieval scheme is designed as statistically optimized multi-variable fitting of all available angular observations obtained by the POLDER sensor in the window spectral channels where absorption by gas is minimal. The total number of such observations by PARASOL always exceeds a hundred over each pixel and the statistical optimization concept promises to be efficient even if the algorithm retrieves several tens of aerosol parameters. Based on this idea, the proposed algorithm uses a large number of unknowns and is aimed at retrieval of extended set of parameters affecting measured radiation.
2012-01-01
Background Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time the fold change by itself provide valuable information and it is important to find unambiguous ways of using this information in expression data treatment. Results A new method of finding differentially expressed genes, called distributional fold change (DFC) test is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating curves and tested on 11 data sets from Gene Omnibus Database with independently verified differentially expressed genes and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best – on average it had higher sensitivity and partial AUC and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets. Conclusions The distributional fold change test is an effective method for finding and ranking differentially expressed probesets on microarrays. The application of this test is advantageous to data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation adjusted methods to the whole feature set. PMID:23122055
Comparison of fMRI analysis methods for heterogeneous BOLD responses in block design studies.
Liu, Jia; Duffy, Ben A; Bernal-Casas, David; Fang, Zhongnan; Lee, Jin Hyung
2017-02-15
A large number of fMRI studies have shown that the temporal dynamics of evoked BOLD responses can be highly heterogeneous. Failing to model heterogeneous responses in statistical analysis can lead to significant errors in signal detection and characterization and alter the neurobiological interpretation. However, to date it is not clear that, out of a large number of options, which methods are robust against variability in the temporal dynamics of BOLD responses in block-design studies. Here, we used rodent optogenetic fMRI data with heterogeneous BOLD responses and simulations guided by experimental data as a means to investigate different analysis methods' performance against heterogeneous BOLD responses. Evaluations are carried out within the general linear model (GLM) framework and consist of standard basis sets as well as independent component analysis (ICA). Analyses show that, in the presence of heterogeneous BOLD responses, conventionally used GLM with a canonical basis set leads to considerable errors in the detection and characterization of BOLD responses. Our results suggest that the 3rd and 4th order gamma basis sets, the 7th to 9th order finite impulse response (FIR) basis sets, the 5th to 9th order B-spline basis sets, and the 2nd to 5th order Fourier basis sets are optimal for good balance between detection and characterization, while the 1st order Fourier basis set (coherence analysis) used in our earlier studies show good detection capability. ICA has mostly good detection and characterization capabilities, but detects a large volume of spurious activation with the control fMRI data. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Peters, C.; Kampe, F. (Principal Investigator)
1980-01-01
The mathematical description and implementation of the statistical estimation procedure known as the Houston integrated spatial/spectral estimator (HISSE) is discussed. HISSE is based on a normal mixture model and is designed to take advantage of spectral and spatial information of LANDSAT data pixels, utilizing the initial classification and clustering information provided by the AMOEBA algorithm. The HISSE calculates parametric estimates of class proportions which reduce the error inherent in estimates derived from typical classify and count procedures common to nonparametric clustering algorithms. It also singles out spatial groupings of pixels which are most suitable for labeling classes. These calculations are designed to aid the analyst/interpreter in labeling patches with a crop class label. Finally, HISSE's initial performance on an actual LANDSAT agricultural ground truth data set is reported.
NASA Technical Reports Server (NTRS)
Fichtl, G. H.
1971-01-01
Statistical estimates of wind shear in the planetary boundary layer are important in the design of V/STOL aircraft, and for the design of the Space Shuttle. The data analyzed in this study consist of eleven sets of longitudinal turbulent velocity fluctuation time histories digitized at 0.2 sec intervals with approximately 18,000 data points per time history. The longitudinal velocity fluctuations were calculated with horizontal wind and direction data collected at the 18-, 30-, 60-, 90-, 120-, and 150-m levels. The data obtained confirm the result that Eulerian time spectra transformed to wave-number spectra with Taylor's frozen eddy hypothesis possess inertial-like behavior at wave-numbers well out of the inertial subrange.
Aero-Optics Measurement System for the AEDC Aero-Optics Test Facility
1991-02-01
Pulse Energy Statistics , 150 Pulses ........................................ 41 AEDC-TR-90-20 APPENDIXES A. Optical Performance of Heated Windows...hypersonic wind tunnel, where the requisite extensive statistical database can be developed in a cost- and time-effective manner. Ground testing...At the present time at AEDC, measured AO parameter statistics are derived from sets of image-spot recordings with a set containing as many as 150
Communicating Patient Status: Comparison of Teaching Strategies in Prelicensure Nursing Education.
Lanz, Amelia S; Wood, Felecia G
Research indicates that nurses lack adequate preparation for reporting patient status. This study compared 2 instructional methods focused on patient status reporting in the clinical setting using a randomized posttest-only comparison group design. Reporting performance using a standardized communication framework and student perceptions of satisfaction and confidence with learning were measured in a simulated event that followed the instruction. Between the instructional methods, there was no statistical difference in student reporting performance or perceptions of learning. Performance evaluations provided helpful insights for the nurse educator.
The Value of Physical Examination: A New Conceptual Framework.
Zaman, Junaid; Verghese, Abraham; Elder, Andrew
2016-12-01
The physical examination defines medical practice, yet its role is being questioned increasingly, with statistical comparisons of diagnostic accuracy often the sole metric used against newer technologies. We set out to highlight seven ways in which the physical examination has value beyond diagnostic accuracy to reaffirm its place in the core skills of a physician and guide future research, teaching, and curriculum design. We show that this more comprehensive approach to the physical examination of its "utility" beyond that of reaching a diagnosis can be beneficial to both doctor and patient.
Handbook of satellite pointing errors and their statistical treatment
NASA Astrophysics Data System (ADS)
Weinberger, M. C.
1980-03-01
This handbook aims to provide both satellite payload and attitude control system designers with a consistent, unambiguous approach to the formulation, definition and interpretation of attitude pointing and measurement specifications. It reviews and assesses the current terminology and practices, and from them establishes a set of unified terminology, giving the user a sound basis to understand the meaning and implications of various specifications and requirements. Guidelines are presented for defining the characteristics of the error sources influencing satellite pointing and attitude measurement, and their combination in performance verification.
A computerized data base of nitrate concentrations in Indiana ground water
Risch, M.R.; Cohen, D.A.
1995-01-01
The nitrate data base was compiled from numerous data sets that were readily accessible in electronic format. The uses of these data may be limited because they were neither comprehensive nor of a single statistical design. Nonetheless, the nitrate data can be used in several ways: (1) to identify geographic areas with and without nitrate data; (2) to evaluate assumptions, models, and maps of ground-water-contamination potential; and (3) to investigate the relation between environmental factors, land-use types, and the occurrence of nitrate.
Reproducing a Prospective Clinical Study as a Computational Retrospective Study in MIMIC-II.
Kury, Fabrício S P; Huser, Vojtech; Cimino, James J
2015-01-01
In this paper we sought to reproduce, as a computational retrospective study in an EHR database (MIMIC-II), a recent large prospective clinical study: the 2013 publication, by the Japanese Association for Acute Medicine (JAAM), about disseminated intravascular coagulation, in the journal Critical Care (PMID: 23787004). We designed in SQL and Java a set of electronic phenotypes that reproduced the study's data sampling, and used R to perform the same statistical inference procedures. All produced source code is available online at https://github.com/fabkury/paamia2015. Our program identified 2,257 eligible patients in MIMIC-II, and the results remarkably agreed with the prospective study. A minority of the needed data elements was not found in MIMIC-II, and statistically significant inferences were possible in the majority of the cases.
Experimental comparisons of hypothesis test and moving average based combustion phase controllers.
Gao, Jinwu; Wu, Yuhu; Shen, Tielong
2016-11-01
For engine control, combustion phase is the most effective and direct parameter to improve fuel efficiency. In this paper, the statistical control strategy based on hypothesis test criterion is discussed. Taking location of peak pressure (LPP) as combustion phase indicator, the statistical model of LPP is first proposed, and then the controller design method is discussed on the basis of both Z and T tests. For comparison, moving average based control strategy is also presented and implemented in this study. The experiments on a spark ignition gasoline engine at various operating conditions show that the hypothesis test based controller is able to regulate LPP close to set point while maintaining the rapid transient response, and the variance of LPP is also well constrained. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Analyzing longitudinal data with the linear mixed models procedure in SPSS.
West, Brady T
2009-09-01
Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.
Statistical Design Model (SDM) of satellite thermal control subsystem
NASA Astrophysics Data System (ADS)
Mirshams, Mehran; Zabihian, Ehsan; Aarabi Chamalishahi, Mahdi
2016-07-01
Satellites thermal control, is a satellite subsystem that its main task is keeping the satellite components at its own survival and activity temperatures. Ability of satellite thermal control plays a key role in satisfying satellite's operational requirements and designing this subsystem is a part of satellite design. In the other hand due to the lack of information provided by companies and designers still doesn't have a specific design process while it is one of the fundamental subsystems. The aim of this paper, is to identify and extract statistical design models of spacecraft thermal control subsystem by using SDM design method. This method analyses statistical data with a particular procedure. To implement SDM method, a complete database is required. Therefore, we first collect spacecraft data and create a database, and then we extract statistical graphs using Microsoft Excel, from which we further extract mathematical models. Inputs parameters of the method are mass, mission, and life time of the satellite. For this purpose at first thermal control subsystem has been introduced and hardware using in the this subsystem and its variants has been investigated. In the next part different statistical models has been mentioned and a brief compare will be between them. Finally, this paper particular statistical model is extracted from collected statistical data. Process of testing the accuracy and verifying the method use a case study. Which by the comparisons between the specifications of thermal control subsystem of a fabricated satellite and the analyses results, the methodology in this paper was proved to be effective. Key Words: Thermal control subsystem design, Statistical design model (SDM), Satellite conceptual design, Thermal hardware
An Asynchronous Many-Task Implementation of In-Situ Statistical Analysis using Legion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille
2015-11-01
In this report, we propose a framework for the design and implementation of in-situ analy- ses using an asynchronous many-task (AMT) model, using the Legion programming model together with the MiniAero mini-application as a surrogate for full-scale parallel scientific computing applications. The bulk of this work consists of converting the Learn/Derive/Assess model which we had initially developed for parallel statistical analysis using MPI [PTBM11], from a SPMD to an AMT model. In this goal, we propose an original use of the concept of Legion logical regions as a replacement for the parallel communication schemes used for the only operation ofmore » the statistics engines that require explicit communication. We then evaluate this proposed scheme in a shared memory environment, using the Legion port of MiniAero as a proxy for a full-scale scientific application, as a means to provide input data sets of variable size for the in-situ statistical analyses in an AMT context. We demonstrate in particular that the approach has merit, and warrants further investigation, in collaboration with ongoing efforts to improve the overall parallel performance of the Legion system.« less
Low altitude wind shear statistics derived from measured and FAA proposed standard wind profiles
NASA Technical Reports Server (NTRS)
Dunham, R. E., Jr.; Usry, J. W.
1984-01-01
Wind shear statistics were calculated for a simulated data set using wind profiles proposed as a standard and compared to statistics derived from measured wind profile data. Wind shear values were grouped in altitude bands of 100 ft between 100 and 1400 ft, and in wind shear increments of 0.025 kt/ft between + or - 0.600 kt/ft for the simulated data set and between + or - 0.200 kt/ft for the measured set. No values existed outside the + or - 0.200 kt/ft boundaries for the measured data. Frequency distributions, means, and standard deviations were derived for each altitude band for both data sets, and compared. Also, frequency distributions were derived for the total sample for both data sets and compared. Frequency of occurrence of a given wind shear was about the same for both data sets for wind shears, but less than + or 0.10 kt/ft, but the simulated data set had larger values outside these boundaries. Neglecting the vertical wind component did not significantly affect the statistics for these data sets. The frequency of occurrence of wind shears for the flight measured data was essentially the same for each altitude band and the total sample, but the simulated data distributions were different for each altitude band. The larger wind shears for the flight measured data were found to have short durations.
Burroughs, N J; Pillay, D; Mutimer, D
1999-01-01
Bayesian analysis using a virus dynamics model is demonstrated to facilitate hypothesis testing of patterns in clinical time-series. Our Markov chain Monte Carlo implementation demonstrates that the viraemia time-series observed in two sets of hepatitis B patients on antiviral (lamivudine) therapy, chronic carriers and liver transplant patients, are significantly different, overcoming clinical trial design differences that question the validity of non-parametric tests. We show that lamivudine-resistant mutants grow faster in transplant patients than in chronic carriers, which probably explains the differences in emergence times and failure rates between these two sets of patients. Incorporation of dynamic models into Bayesian parameter analysis is of general applicability in medical statistics. PMID:10643081
Student nurses' perceived use of NANDA-I nursing diagnoses in the community setting.
Ogunfowokan, Adesola A; Oluwatosin, Abimbola O; Olajubu, Aanuoluwapo O; Alao, Olujide A; Faremi, Adenike F
2013-02-01
Study explored knowledge and perception of student nurses on the use of NANDA-I nursing diagnoses in the community setting. Study adopted cross-sectional design. Convenient sampling method was used to select 290 nursing students. Data analysis was by descriptive and inferential statistics. A majority (81.3%) of the participants considered NANDA-I nursing diagnoses to be useful in the community. Significant association existed in the perception and level of education of the students (χ(2) = 8.257, d.f. = 1, p= .04). Knowledge and perception of the participants about the use of NANDA-I nursing diagnoses in the community is satisfactory. Use of NANDA-I nursing diagnoses should be encouraged among community health nurses. © 2012, The Authors. International Journal of Nursing Knowledge © 2012, NANDA International.
On designing a new cumulative sum Wilcoxon signed rank chart for monitoring process location
Nazir, Hafiz Zafar; Tahir, Muhammad; Riaz, Muhammad
2018-01-01
In this paper, ranked set sampling is used for developing a non-parametric location chart which is developed on the basis of Wilcoxon signed rank statistic. The average run length and some other characteristics of run length are used as the measures to assess the performance of the proposed scheme. Some selective distributions including Laplace (or double exponential), logistic, normal, contaminated normal and student’s t-distributions are considered to examine the performance of the proposed Wilcoxon signed rank control chart. It has been observed that the proposed scheme shows superior shift detection ability than some of the competing counterpart schemes covered in this study. Moreover, the proposed control chart is also implemented and illustrated with a real data set. PMID:29664919
A risk-based approach to management of leachables utilizing statistical analysis of extractables.
Stults, Cheryl L M; Mikl, Jaromir; Whelehan, Oliver; Morrical, Bradley; Duffield, William; Nagao, Lee M
2015-04-01
To incorporate quality by design concepts into the management of leachables, an emphasis is often put on understanding the extractable profile for the materials of construction for manufacturing disposables, container-closure, or delivery systems. Component manufacturing processes may also impact the extractable profile. An approach was developed to (1) identify critical components that may be sources of leachables, (2) enable an understanding of manufacturing process factors that affect extractable profiles, (3) determine if quantitative models can be developed that predict the effect of those key factors, and (4) evaluate the practical impact of the key factors on the product. A risk evaluation for an inhalation product identified injection molding as a key process. Designed experiments were performed to evaluate the impact of molding process parameters on the extractable profile from an ABS inhaler component. Statistical analysis of the resulting GC chromatographic profiles identified processing factors that were correlated with peak levels in the extractable profiles. The combination of statistically significant molding process parameters was different for different types of extractable compounds. ANOVA models were used to obtain optimal process settings and predict extractable levels for a selected number of compounds. The proposed paradigm may be applied to evaluate the impact of material composition and processing parameters on extractable profiles and utilized to manage product leachables early in the development process and throughout the product lifecycle.
Tasker, Gary D.; Granato, Gregory E.
2000-01-01
Decision makers need viable methods for the interpretation of local, regional, and national-highway runoff and urban-stormwater data including flows, concentrations and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoffplanning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect-their predictive ability is poor without suitable site-specific data for calibration. To properly apply the correct model, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. Classifying variables being used to analyze data may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques, and to use tools and techniques that account for the unique nature of water-resources data sets. Populations of data on stormwater-runoff quantity and quality are often best modeled as logarithmic transformations. Therefore, these factors need to be considered to form valid, current, and technically defensible stormwater-runoff models. Regression analysis is an accepted method for interpretation of water-resources data and for prediction of current or future conditions at sites that fit the input data model. Regression analysis is designed to provide an estimate of the average response of a system as it relates to variation in one or more known variables. To produce valid models, however, regression analysis should include visual analysis of scatterplots, an examination of the regression equation, evaluation of the method design assumptions, and regression diagnostics. A number of statistical techniques are described in the text and in the appendixes to provide information necessary to interpret data by use of appropriate methods. Uncertainty is an important part of any decisionmaking process. In order to deal with uncertainty problems, the analyst needs to know the severity of the statistical uncertainty of the methods used to predict water quality. Statistical models need to be based on information that is meaningful, representative, complete, precise, accurate, and comparable to be deemed valid, up to date, and technically supportable. To assess uncertainty in the analytical tools, the modeling methods, and the underlying data set, all of these components need be documented and communicated in an accessible format within project publications.
StreamStats: A water resources web application
Ries, Kernell G.; Guthrie, John G.; Rea, Alan H.; Steeves, Peter A.; Stewart, David W.
2008-01-01
Streamflow statistics, such as the 1-percent flood, the mean flow, and the 7-day 10-year low flow, are used by engineers, land managers, biologists, and many others to help guide decisions in their everyday work. For example, estimates of the 1-percent flood (the flow that is exceeded, on average, once in 100 years and has a 1-percent chance of being exceeded in any year, sometimes referred to as the 100-year flood) are used to create flood-plain maps that form the basis for setting insurance rates and land-use zoning. This and other streamflow statistics also are used for dam, bridge, and culvert design; water-supply planning and management; water-use appropriations and permitting; wastewater and industrial discharge permitting; hydropower facility design and regulation; and the setting of minimum required streamflows to protect freshwater ecosystems. In addition, researchers, planners, regulators, and others often need to know the physical and climatic characteristics of the drainage basins (basin characteristics) and the influence of human activities, such as dams and water withdrawals, on streamflow upstream from locations of interest to understand the mechanisms that control water availability and quality at those locations. Knowledge of the streamflow network and downstream human activities also is necessary to adequately determine whether an upstream activity, such as a water withdrawal, can be allowed without adversely affecting downstream activities.Streamflow statistics could be needed at any location along a stream. Most often, streamflow statistics are needed at ungaged sites, where no streamflow data are available to compute the statistics. At U.S. Geological Survey (USGS) streamflow data-collection stations, which include streamgaging stations, partial-record stations, and miscellaneous-measurement stations, streamflow statistics can be computed from available data for the stations. Streamflow data are collected continuously at streamgaging stations. Streamflow measurements are collected systematically over a period of years at partial-record stations to estimate peak-flow or low-flow statistics. Streamflow measurements usually are collected at miscellaneous-measurement stations for specific hydrologic studies with various objectives.StreamStats is a Web-based Geographic Information System (GIS) application that was created by the USGS, in cooperation with Environmental Systems Research Institute, Inc. (ESRI)1, to provide users with access to an assortment of analytical tools that are useful for water-resources planning and management. StreamStats functionality is based on ESRI’s ArcHydro Data Model and Tools, described on the Web at http://resources.arcgis.com/en/communities/hydro/01vn0000000s000000.htm. StreamStats allows users to easily obtain streamflow statistics, basin characteristics, and descriptive information for USGS data-collection stations and user-selected ungaged sites. It also allows users to identify stream reaches that are upstream and downstream from user-selected sites, and to identify and obtain information for locations along the streams where activities that may affect streamflow conditions are occurring. This functionality can be accessed through a map-based user interface that appears in the user’s Web browser, or individual functions can be requested remotely as Web services by other Web or desktop computer applications. StreamStats can perform these analyses much faster than historically used manual techniques.StreamStats was designed so that each state would be implemented as a separate application, with a reliance on local partnerships to fund the individual applications, and a goal of eventual full national implementation. Idaho became the first state to implement StreamStats in 2003. By mid-2008, 14 states had applications available to the public, and 18 other states were in various stages of implementation.
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poyer, D.A.
In this report, tests of statistical significance of five sets of variables with household energy consumption (at the point of end-use) are described. Five models, in sequence, were empirically estimated and tested for statistical significance by using the Residential Energy Consumption Survey of the US Department of Energy, Energy Information Administration. Each model incorporated additional information, embodied in a set of variables not previously specified in the energy demand system. The variable sets were generally labeled as economic variables, weather variables, household-structure variables, end-use variables, and housing-type variables. The tests of statistical significance showed each of the variable sets tomore » be highly significant in explaining the overall variance in energy consumption. The findings imply that the contemporaneous interaction of different types of variables, and not just one exclusive set of variables, determines the level of household energy consumption.« less
Molecular system identification for enzyme directed evolution and design
NASA Astrophysics Data System (ADS)
Guan, Xiangying; Chakrabarti, Raj
2017-09-01
The rational design of chemical catalysts requires methods for the measurement of free energy differences in the catalytic mechanism for any given catalyst Hamiltonian. The scope of experimental learning algorithms that can be applied to catalyst design would also be expanded by the availability of such methods. Methods for catalyst characterization typically either estimate apparent kinetic parameters that do not necessarily correspond to free energy differences in the catalytic mechanism or measure individual free energy differences that are not sufficient for establishing the relationship between the potential energy surface and catalytic activity. Moreover, in order to enhance the duty cycle of catalyst design, statistically efficient methods for the estimation of the complete set of free energy differences relevant to the catalytic activity based on high-throughput measurements are preferred. In this paper, we present a theoretical and algorithmic system identification framework for the optimal estimation of free energy differences in solution phase catalysts, with a focus on one- and two-substrate enzymes. This framework, which can be automated using programmable logic, prescribes a choice of feasible experimental measurements and manipulated input variables that identify the complete set of free energy differences relevant to the catalytic activity and minimize the uncertainty in these free energy estimates for each successive Hamiltonian design. The framework also employs decision-theoretic logic to determine when model reduction can be applied to improve the duty cycle of high-throughput catalyst design. Automation of the algorithm using fluidic control systems is proposed, and applications of the framework to the problem of enzyme design are discussed.
A joint source-channel distortion model for JPEG compressed images.
Sabir, Muhammad F; Sheikh, Hamid Rahim; Heath, Robert W; Bovik, Alan C
2006-06-01
The need for efficient joint source-channel coding (JSCC) is growing as new multimedia services are introduced in commercial wireless communication systems. An important component of practical JSCC schemes is a distortion model that can predict the quality of compressed digital multimedia such as images and videos. The usual approach in the JSCC literature for quantifying the distortion due to quantization and channel errors is to estimate it for each image using the statistics of the image for a given signal-to-noise ratio (SNR). This is not an efficient approach in the design of real-time systems because of the computational complexity. A more useful and practical approach would be to design JSCC techniques that minimize average distortion for a large set of images based on some distortion model rather than carrying out per-image optimizations. However, models for estimating average distortion due to quantization and channel bit errors in a combined fashion for a large set of images are not available for practical image or video coding standards employing entropy coding and differential coding. This paper presents a statistical model for estimating the distortion introduced in progressive JPEG compressed images due to quantization and channel bit errors in a joint manner. Statistical modeling of important compression techniques such as Huffman coding, differential pulse-coding modulation, and run-length coding are included in the model. Examples show that the distortion in terms of peak signal-to-noise ratio (PSNR) can be predicted within a 2-dB maximum error over a variety of compression ratios and bit-error rates. To illustrate the utility of the proposed model, we present an unequal power allocation scheme as a simple application of our model. Results show that it gives a PSNR gain of around 6.5 dB at low SNRs, as compared to equal power allocation.
Yothers, Greg; O’Connell, Michael J.; Beart, Robert W.; Wozniak, Timothy F.; Pitot, Henry C.; Shields, Anthony F.; Landry, Jerome C.; Ryan, David P.; Arora, Amit; Evans, Lisa S.; Bahary, Nathan; Soori, Gamini; Eakle, Janice F.; Robertson, John M.; Moore, Dennis F.; Mullane, Michael R.; Marchello, Benjamin T.; Ward, Patrick J.; Sharif, Saima; Roh, Mark S.; Wolmark, Norman
2015-01-01
Background: National Surgical Adjuvant Breast and Bowel Project R-04 was designed to determine whether the oral fluoropyrimidine capecitabine could be substituted for continuous infusion 5-FU in the curative setting of stage II/III rectal cancer during neoadjuvant radiation therapy and whether the addition of oxaliplatin could further enhance the activity of fluoropyrimidine-sensitized radiation. Methods: Patients with clinical stage II or III rectal cancer undergoing preoperative radiation were randomly assigned to one of four chemotherapy regimens in a 2x2 design: CVI 5-FU or oral capecitabine with or without oxaliplatin. The primary endpoint was local-regional tumor control. Time-to-event endpoint distributions were estimated using the Kaplan-Meier method. Hazard ratios were estimated from Cox proportional hazard models. All statistical tests were two-sided. Results: Among 1608 randomized patients there were no statistically significant differences between regimens using 5-FU vs capecitabine in three-year local-regional tumor event rates (11.2% vs 11.8%), 5-year DFS (66.4% vs 67.7%), or 5-year OS (79.9% vs 80.8%); or for oxaliplatin vs no oxaliplatin for the three endpoints of local-regional events, DFS, and OS (11.2% vs 12.1%, 69.2% vs 64.2%, and 81.3% vs 79.0%). The addition of oxaliplatin was associated with statistically significantly more overall and grade 3–4 diarrhea (P < .0001). Three-year rates of local-regional recurrence among patients who underwent R0 resection ranged from 3.1 to 5.1% depending on the study arm. Conclusions: Continuous infusion 5-FU produced outcomes for local-regional control, DFS, and OS similar to those obtained with oral capecitabine combined with radiation. This study establishes capecitabine as a standard of care in the pre-operative rectal setting. Oxaliplatin did not improve the local-regional failure rate, DFS, or OS for any patient risk group but did add considerable toxicity. PMID:26374429
Cross Contamination: Are Hospital Gloves Reservoirs for Nosocomial Infections?
Moran, Vicki; Heuertz, Rita
2017-01-01
Use of disposable nonsterile gloves in the hospital setting is second only to proper hand washing in reducing contamination during patient contact. Because proper handwashing is not consistently practiced, added emphasis on glove use is warranted. There is a growing body of evidence that glove boxes and dispensers available to healthcare workers are contaminated by daily exposure to environmental organisms. This finding, in conjunction with new and emerging antibiotic-resistant bacteria, poses a threat to patients and healthcare workers alike. A newly designed glove dispenser may reduce contamination of disposable gloves. The authors investigated contamination of nonsterile examination gloves in an Emergency Department setting according to the type of dispenser used to access gloves. A statistically significant difference existed between the number of bacterial colonies and the type of dispenser: the downward-facing glove dispenser had a lower number of bacteria on the gloves. There was no statistically significant difference in the number of gloves contaminated between the two types of glove dispensers. The study demonstrated that contamination of disposable gloves existed. Additional research using a larger sample size would validate a difference in the contamination of disposable gloves using outward or downward glove dispensers.
Burdo, Joseph; O'Dwyer, Laura
2015-12-01
Concept mapping and retrieval practice are both educational methods that have separately been reported to provide significant benefits for learning in diverse settings. Concept mapping involves diagramming a hierarchical representation of relationships between distinct pieces of information, whereas retrieval practice involves retrieving information that was previously coded into memory. The relative benefits of these two methods have never been tested against each other in a classroom setting. Our study was designed to investigate whether or not concept mapping or retrieval practice produced a significant learning benefit in an undergraduate physiology course as measured by exam performance and, if so, was the benefit of one method significantly greater than the other. We found that there was a trend toward increased exam scores for the retrieval practice group compared with both the control group and concept mapping group, and that trend achieved statistical significance for one of the four module exams in the course. We also found that women performed statistically better than men on the module exam that contained a substantial amount of material relating to female reproductive physiology. Copyright © 2015 The American Physiological Society.
Hiemstra-van Mastrigt, Suzanne; Groenesteijn, Liesbeth; Vink, Peter; Kuijt-Evers, Lottie F M
2017-07-01
This literature review focused on passenger seat comfort and discomfort in a human-product-context interaction. The relationships between anthropometric variables (human level), activities (context level), seat characteristics (product level) and the perception of comfort and discomfort were studied through mediating variables, such as body posture, movement and interface pressure. It is concluded that there are correlations between anthropometric variables and interface pressure variables, and that this relationship is affected by body posture. The results of studies on the correlation between pressure variables and passenger comfort and discomfort are not in line with each other. Only associations were found between the other variables (e.g. activities and seat characteristics). A conceptual model illustrates the results of the review, but relationships could not be quantified due to a lack of statistical evidence and large differences in research set-ups between the reviewed papers. Practitioner Summary: This literature review set out to quantify the relationships between human, context and seat characteristics, and comfort and discomfort experience of passenger seats, in order to build a predictive model that can support seat designers and purchasers to make informed decisions. However, statistical evidence is lacking from existing literature.
A Novel Signal Modeling Approach for Classification of Seizure and Seizure-Free EEG Signals.
Gupta, Anubha; Singh, Pushpendra; Karlekar, Mandar
2018-05-01
This paper presents a signal modeling-based new methodology of automatic seizure detection in EEG signals. The proposed method consists of three stages. First, a multirate filterbank structure is proposed that is constructed using the basis vectors of discrete cosine transform. The proposed filterbank decomposes EEG signals into its respective brain rhythms: delta, theta, alpha, beta, and gamma. Second, these brain rhythms are statistically modeled with the class of self-similar Gaussian random processes, namely, fractional Brownian motion and fractional Gaussian noises. The statistics of these processes are modeled using a single parameter called the Hurst exponent. In the last stage, the value of Hurst exponent and autoregressive moving average parameters are used as features to design a binary support vector machine classifier to classify pre-ictal, inter-ictal (epileptic with seizure free interval), and ictal (seizure) EEG segments. The performance of the classifier is assessed via extensive analysis on two widely used data set and is observed to provide good accuracy on both the data set. Thus, this paper proposes a novel signal model for EEG data that best captures the attributes of these signals and hence, allows to boost the classification accuracy of seizure and seizure-free epochs.
Garud, Nandita R; Rosenberg, Noah A
2015-06-01
Soft selective sweeps represent an important form of adaptation in which multiple haplotypes bearing adaptive alleles rise to high frequency. Most statistical methods for detecting selective sweeps from genetic polymorphism data, however, have focused on identifying hard selective sweeps in which a favored allele appears on a single haplotypic background; these methods might be underpowered to detect soft sweeps. Among exceptions is the set of haplotype homozygosity statistics introduced for the detection of soft sweeps by Garud et al. (2015). These statistics, examining frequencies of multiple haplotypes in relation to each other, include H12, a statistic designed to identify both hard and soft selective sweeps, and H2/H1, a statistic that conditional on high H12 values seeks to distinguish between hard and soft sweeps. A challenge in the use of H2/H1 is that its range depends on the associated value of H12, so that equal H2/H1 values might provide different levels of support for a soft sweep model at different values of H12. Here, we enhance the H12 and H2/H1 haplotype homozygosity statistics for selective sweep detection by deriving the upper bound on H2/H1 as a function of H12, thereby generating a statistic that normalizes H2/H1 to lie between 0 and 1. Through a reanalysis of resequencing data from inbred lines of Drosophila, we show that the enhanced statistic both strengthens interpretations obtained with the unnormalized statistic and leads to empirical insights that are less readily apparent without the normalization. Copyright © 2015 Elsevier Inc. All rights reserved.
Identifying positive deviants in healthcare quality and safety: a mixed methods study.
O'Hara, Jane K; Grasic, Katja; Gutacker, Nils; Street, Andrew; Foy, Robbie; Thompson, Carl; Wright, John; Lawton, Rebecca
2018-01-01
Objective Solutions to quality and safety problems exist within healthcare organisations, but to maximise the learning from these positive deviants, we first need to identify them. This study explores using routinely collected, publicly available data in England to identify positively deviant services in one region of the country. Design A mixed methods study undertaken July 2014 to February 2015, employing expert discussion, consensus and statistical modelling to identify indicators of quality and safety, establish a set of criteria to inform decisions about which indicators were robust and useful measures, and whether these could be used to identify positive deviants. Setting Yorkshire and Humber, England. Participants None - analysis based on routinely collected, administrative English hospital data. Main outcome measures We identified 49 indicators of quality and safety from acute care settings across eight data sources. Twenty-six indicators did not allow comparison of quality at the sub-hospital level. Of the 23 remaining indicators, 12 met all criteria and were possible candidates for identifying positive deviants. Results Four indicators (readmission and patient reported outcomes for hip and knee surgery) offered indicators of the same service. These were selected by an expert group as the basis for statistical modelling, which supported identification of one service in Yorkshire and Humber showing a 50% positive deviation from the national average. Conclusion Relatively few indicators of quality and safety relate to a service level, making meaningful comparisons and local improvement based on the measures difficult. It was possible, however, to identify a set of indicators that provided robust measurement of the quality and safety of services providing hip and knee surgery.
Implementing guidelines in nursing homes: a systematic review.
Diehl, Heinz; Graverholt, Birgitte; Espehaug, Birgitte; Lund, Hans
2016-07-25
Research on guideline implementation strategies has mostly been conducted in settings which differ significantly from a nursing home setting and its transferability to the nursing home setting is therefore limited. The objective of this study was to systematically review the effects of interventions to improve the implementation of guidelines in nursing homes. A systematic literature search was conducted in the Cochrane Library, CINAHL, Embase, MEDLINE, DARE, HTA, CENTRAL, SveMed + and ISI Web of Science from their inception until August 2015. Reference screening and a citation search were performed. Studies were eligible if they evaluated any type of guideline implementation strategy in a nursing home setting. Eligible study designs were systematic reviews, randomised controlled trials, non-randomised controlled trials, controlled before-after studies and interrupted-time-series studies. The EPOC risk of bias tool was used to evaluate the risk of bias in the included studies. The overall quality of the evidence was rated using GRADE. Five cluster-randomised controlled trials met the inclusion criteria, evaluating a total of six different multifaceted implementation strategies. One study reported a small statistically significant effect on professional practice, and two studies demonstrated small to moderate statistically significant effects on patient outcome. The overall quality of the evidence for all comparisons was low or very low using GRADE. Little is known about how to improve the implementation of guidelines in nursing homes, and the evidence to support or discourage particular interventions is inconclusive. More implementation research is needed to ensure high quality of care in nursing homes. PROSPERO 2014: CRD42014007664.
A persuasive concept of research-oriented teaching in Soil Biochemistry
NASA Astrophysics Data System (ADS)
Blagodatskaya, Evgenia; Kuzyakova, Irina
2013-04-01
One of the main problems of existing bachelor programs is disconnection of basic and experimental education: even during practical training the methods learned are not related to characterization of soil field experiments and observed soil processes. We introduce a multi-level research-oriented teaching system involving Bachelor students in four-semesters active study by integration the basic knowledge, experimental techniques, statistical approaches, project design and it's realization.The novelty of research-oriented teaching system is based 1) on linkage of ongoing experiment to the study of statistical methods and 2) on self-responsibility of students for interpretation of soil chemical and biochemical characteristics obtained in the very beginning of their study by analysing the set of soil samples allowing full-factorial data treatment. This experimental data set is related to specific soil stand and is used as a backbone of the teaching system accelerating the student's interest to soil studies and motivating them for application of basic knowledge from lecture courses. The multi-level system includes: 1) basic lecture course on soil biochemistry with analysis of research questions, 2) practical training course on laboratory analytics where small groups of students are responsible for analysis of soil samples related to the specific land-use/forest type/forest age; 3) training course on biotic (e.g. respiration) - abiotic (e.g. temperature, moisture, fire etc.) interactions in the same soil samples; 4) theoretical seminars where students present and make a first attempt to explain soil characteristics of various soil stands as affected by abiotic factors (first semester); 5) lecture and seminar course on soil statistics where students apply newly learned statistical methods to prove their conclusions and to find relationships between soil characteristics obtained during first semester; 6) seminar course on project design where students develop their scientific projects to study the uncertainties revealed in soil responses to abiotic factors (second and third semesters); 7) Lecture, seminar and training courses on estimation of active microbial biomass in soil where students realize their projects applying a new knowledge to the soils from the stands they are responsible for (fourth semester). Thus, during four semesters the students continuously combine the theoretical knowledge from the lectures with their own experimental experience, compare and discuss results of various groups during seminars and obtain the skills in project design. The successful application of research-oriented teaching system in University of Göttingen allowed each student the early-stage revealing knowledge gaps, accelerated their involvement in ongoing research projects, and motivated them to begin own scientific career.
Heo, Moonseong; Meissner, Paul; Litwin, Alain H; Arnsten, Julia H; McKee, M Diane; Karasz, Alison; McKinley, Paula; Rehm, Colin D; Chambers, Earle C; Yeh, Ming-Chin; Wylie-Rosett, Judith
2017-01-01
Comparative effectiveness research trials in real-world settings may require participants to choose between preferred intervention options. A randomized clinical trial with parallel experimental and control arms is straightforward and regarded as a gold standard design, but by design it forces and anticipates the participants to comply with a randomly assigned intervention regardless of their preference. Therefore, the randomized clinical trial may impose impractical limitations when planning comparative effectiveness research trials. To accommodate participants' preference if they are expressed, and to maintain randomization, we propose an alternative design that allows participants' preference after randomization, which we call a "preference option randomized design (PORD)". In contrast to other preference designs, which ask whether or not participants consent to the assigned intervention after randomization, the crucial feature of preference option randomized design is its unique informed consent process before randomization. Specifically, the preference option randomized design consent process informs participants that they can opt out and switch to the other intervention only if after randomization they actively express the desire to do so. Participants who do not independently express explicit alternate preference or assent to the randomly assigned intervention are considered to not have an alternate preference. In sum, preference option randomized design intends to maximize retention, minimize possibility of forced assignment for any participants, and to maintain randomization by allowing participants with no or equal preference to represent random assignments. This design scheme enables to define five effects that are interconnected with each other through common design parameters-comparative, preference, selection, intent-to-treat, and overall/as-treated-to collectively guide decision making between interventions. Statistical power functions for testing all these effects are derived, and simulations verified the validity of the power functions under normal and binomial distributions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos
The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers called features. These descriptors are expensive in terms of computing, require substantial research effort to develop, and do not guarantee a good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned up for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variablemore » objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as and in some cases better than the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline.« less
Clustering-based Feature Learning on Variable Stars
NASA Astrophysics Data System (ADS)
Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos
2016-04-01
The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers called features. These descriptors are expensive in terms of computing, require substantial research effort to develop, and do not guarantee a good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned up for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as and in some cases better than the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline.
Ninety-three pictures and 108 questions for the elicitation of homophones
FERREIRA, VICTOR S.; CUTTING, J. COOPER
2007-01-01
Homographs and homophones have interesting linguistic properties that make them useful in many experiments involving language. To assist researchers in the elicitation of homophones, this paper presents a set of 93 line-drawn pictures of objects with homophonic names and a set of 108 questions with homophonic answers. Statistics are also included for each picture and question: Picture statistics include name-agreement percentages, dominance, and frequency statistics of depicted referents, and picture-naming latencies both with and without study of the picture names. For questions, statistics include answer-agreement percentages, difficulty ratings, dominance, frequency statistics, and naming latencies for 60 of the most consistently answered questions. PMID:18185842
Crans, Gerald G; Shuster, Jonathan J
2008-08-15
The debate as to which statistical methodology is most appropriate for the analysis of the two-sample comparative binomial trial has persisted for decades. Practitioners who favor the conditional methods of Fisher, Fisher's exact test (FET), claim that only experimental outcomes containing the same amount of information should be considered when performing analyses. Hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived using conditional sample spaces rather than the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, this test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79-85) proposed assessing the conservativeness of conditional methods using p-value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET for sample sizes, n, up to 125 per group at the two-sided significance level, alpha = 0.05. Additionally, this numerical method is used to define new significance levels alpha(*) = alpha+epsilon, where epsilon is a small positive number, for each n, such that the size of the test is as close as possible to the pre-specified alpha (0.05 for the current work) without exceeding it. Lastly, a sample size and power calculation example are presented, which demonstrates the statistical advantages of implementing the adjustment to FET (using alpha(*) instead of alpha) in the two-sample comparative binomial trial. 2008 John Wiley & Sons, Ltd
Bottle, Alex; Darzi, Ara W; Athanasiou, Thanos; Vale, Justin A
2010-01-01
Objectives To investigate the relation between volume and mortality after adjustment for case mix for radical cystectomy in the English healthcare setting using improved statistical methodology, taking into account the institutional and surgeon volume effects and institutional structural and process of care factors. Design Retrospective analysis of hospital episode statistics using multilevel modelling. Setting English hospitals carrying out radical cystectomy in the seven financial years 2000/1 to 2006/7. Participants Patients with a primary diagnosis of cancer undergoing an inpatient elective cystectomy. Main outcome measure Mortality within 30 days of cystectomy. Results Compared with low volume institutions, medium volume ones had a significantly higher odds of in-hospital and total mortality: odds ratio 1.72 (95% confidence interval 1.00 to 2.98, P=0.05) and 1.82 (1.08 to 3.06, P=0.02). This was only seen in the final model, which included adjustment for structural and processes of care factors. The surgeon volume-mortality relation showed weak evidence of reduced odds of in-hospital mortality (by 35%) for the high volume surgeons, although this did not reach statistical significance at the 5% level. Conclusions The relation between case volume and mortality after radical cystectomy for bladder cancer became evident only after adjustment for structural and process of care factors, including staffing levels of nurses and junior doctors, in addition to case mix. At least for this relatively uncommon procedure, adjusting for these confounders when examining the volume-outcome relation is critical before considering centralisation of care to a few specialist institutions. Outcomes other than mortality, such as functional morbidity and disease recurrence may ultimately influence towards centralising care. PMID:20305302
Statistical Inference at Work: Statistical Process Control as an Example
ERIC Educational Resources Information Center
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
User's Guide for Monthly Vector Wind Profile Model
NASA Technical Reports Server (NTRS)
Adelfang, S. I.
1999-01-01
The background, theoretical concepts, and methodology for construction of vector wind profiles based on a statistical model are presented. The derived monthly vector wind profiles are to be applied by the launch vehicle design community for establishing realistic estimates of critical vehicle design parameter dispersions related to wind profile dispersions. During initial studies a number of months are used to establish the model profiles that produce the largest monthly dispersions of ascent vehicle aerodynamic load indicators. The largest monthly dispersions for wind, which occur during the winter high-wind months, are used for establishing the design reference dispersions for the aerodynamic load indicators. This document includes a description of the computational process for the vector wind model including specification of input data, parameter settings, and output data formats. Sample output data listings are provided to aid the user in the verification of test output.
Rothmann, Mark
2005-01-01
When testing the equality of means from two different populations, a t-test or large sample normal test tend to be performed. For these tests, when the sample size or design for the second sample is dependent on the results of the first sample, the type I error probability is altered for each specific possibility in the null hypothesis. We will examine the impact on the type I error probabilities for two confidence interval procedures and procedures using test statistics when the design for the second sample or experiment is dependent on the results from the first sample or experiment (or series of experiments). Ways for controlling a desired maximum type I error probability or a desired type I error rate will be discussed. Results are applied to the setting of noninferiority comparisons in active controlled trials where the use of a placebo is unethical.
NASA Astrophysics Data System (ADS)
Berthoud, L.; Gliddon, J.
2018-03-01
In today's global Aerospace industry, virtual workspaces are commonly used for collaboration between geographically distributed multidisciplinary teams. This study investigated the use of wikis to look at communication, collaboration and engagement in 'Capstone' team design projects at the end of an engineering degree. Wikis were set up for teams of engineering students from different disciplinary backgrounds and years. The students' perception of the usefulness of the tool were surveyed and the user contribution statistics and content categorisation were analysed for a case study wiki. Recommendations and lessons learned for the deployment of wikis are provided for interested academic staff from other institutions. Wikis were found to be of limited use to investigate levels of communication and collaboration in this study, but may be of interest in other contexts. Wikis were considered a potentially useful tool to track engagement for Capstone design projects in engineering subjects.
Statistical modeling for visualization evaluation through data fusion.
Chen, Xiaoyu; Jin, Ran
2017-11-01
There is a high demand of data visualization providing insights to users in various applications. However, a consistent, online visualization evaluation method to quantify mental workload or user preference is lacking, which leads to an inefficient visualization and user interface design process. Recently, the advancement of interactive and sensing technologies makes the electroencephalogram (EEG) signals, eye movements as well as visualization logs available in user-centered evaluation. This paper proposes a data fusion model and the application procedure for quantitative and online visualization evaluation. 15 participants joined the study based on three different visualization designs. The results provide a regularized regression model which can accurately predict the user's evaluation of task complexity, and indicate the significance of all three types of sensing data sets for visualization evaluation. This model can be widely applied to data visualization evaluation, and other user-centered designs evaluation and data analysis in human factors and ergonomics. Copyright © 2016 Elsevier Ltd. All rights reserved.
Scientific visualization of volumetric radar cross section data
NASA Astrophysics Data System (ADS)
Wojszynski, Thomas G.
1992-12-01
For aircraft design and mission planning, designers, threat analysts, mission planners, and pilots require a Radar Cross Section (RCS) central tendency with its associated distribution about a specified aspect and its relation to a known threat, Historically, RCS data sets have been statically analyzed to evaluate a d profile. However, Scientific Visualization, the application of computer graphics techniques to produce pictures of complex physical phenomena appears to be a more promising tool to interpret this data. This work describes data reduction techniques and a surface rendering algorithm to construct and display a complex polyhedron from adjacent contours of RCS data. Data reduction is accomplished by sectorizing the data and characterizing the statistical properties of the data. Color, lighting, and orientation cues are added to complete the visualization system. The tool may be useful for synthesis, design, and analysis of complex, low observable air vehicles.
Heo, Moonseong; Litwin, Alain H; Blackstock, Oni; Kim, Namhee; Arnsten, Julia H
2017-02-01
We derived sample size formulae for detecting main effects in group-based randomized clinical trials with different levels of data hierarchy between experimental and control arms. Such designs are necessary when experimental interventions need to be administered to groups of subjects whereas control conditions need to be administered to individual subjects. This type of trial, often referred to as a partially nested or partially clustered design, has been implemented for management of chronic diseases such as diabetes and is beginning to emerge more commonly in wider clinical settings. Depending on the research setting, the level of hierarchy of data structure for the experimental arm can be three or two, whereas that for the control arm is two or one. Such different levels of data hierarchy assume correlation structures of outcomes that are different between arms, regardless of whether research settings require two or three level data structure for the experimental arm. Therefore, the different correlations should be taken into account for statistical modeling and for sample size determinations. To this end, we considered mixed-effects linear models with different correlation structures between experimental and control arms to theoretically derive and empirically validate the sample size formulae with simulation studies.
APPETITE CONTROL: METHODOLOGICAL ASPECTS OF THE EVALUATION OF FOODS
Blundell, John; de Graaf, Cees; Hulshof, Toine; Jebb, Susan; Livingstone, Barbara; Lluch, Anne; Mela, David; Salah, Samir; Schuring, Ewoud; van der Knaap, Henk; Westerterp, Margriet
2013-01-01
This report describes a set of scientific procedures used to assess the impact of foods and food ingredients on the expression of appetite (psychological and behavioural). An overarching priority has been to enable potential evaluators of health claims about foods to identify justified claims, and to exclude claims that are not supported by scientific evidence for the effect cited. This priority follows precisely from the principles set down in the PASSCLAIM report. (4) The report allows the evaluation of the strength of health claims, about the effects of foods on appetite, which can be sustained on the basis of the commonly used scientific designs and experimental procedures. The report includes different designs for assessing effects on satiation as opposed to satiety,detailed coverage of the extent to which a change in hunger can stand-alone as a measure of appetite control, and an extensive discussion of the statistical procedures appropriate for handling data in this field of research. Since research in this area is continually evolving, new improved methodologies may emerge over time and will need to be incorporated into the framework. One main objective of the report has been to produce guidance on good practice in carrying out appetite research, and not to set down a series of commandments that must be followed. PMID:20122136
Pingault, Jean Baptiste; Côté, Sylvana M.; Petitclerc, Amélie; Vitaro, Frank; Tremblay, Richard E.
2015-01-01
Background Parental educational expectations have been associated with children’s educational attainment in a number of long-term longitudinal studies, but whether this relationship is causal has long been debated. The aims of this prospective study were twofold: 1) test whether low maternal educational expectations contributed to failure to graduate from high school; and 2) compare the results obtained using different strategies for accounting for confounding variables (i.e. multivariate regression and propensity score matching). Methodology/Principal Findings The study sample included 1,279 participants from the Quebec Longitudinal Study of Kindergarten Children. Maternal educational expectations were assessed when the participants were aged 12 years. High school graduation – measuring educational attainment – was determined through the Quebec Ministry of Education when the participants were aged 22–23 years. Findings show that when using the most common statistical approach (i.e. multivariate regressions to adjust for a restricted set of potential confounders) the contribution of low maternal educational expectations to failure to graduate from high school was statistically significant. However, when using propensity score matching, the contribution of maternal expectations was reduced and remained statistically significant only for males. Conclusions/Significance The results of this study are consistent with the possibility that the contribution of parental expectations to educational attainment is overestimated in the available literature. This may be explained by the use of a restricted range of potential confounding variables as well as the dearth of studies using appropriate statistical techniques and study designs in order to minimize confounding. Each of these techniques and designs, including propensity score matching, has its strengths and limitations: A more comprehensive understanding of the causal role of parental expectations will stem from a convergence of findings from studies using different techniques and designs. PMID:25803867
Schürmann, Tim; Beckerle, Philipp; Preller, Julia; Vogt, Joachim; Christ, Oliver
2016-12-19
In product development for lower limb prosthetic devices, a set of special criteria needs to be met. Prosthetic devices have a direct impact on the rehabilitation process after an amputation with both perceived technological and psychological aspects playing an important role. However, available psychometric questionnaires fail to consider the important links between these two dimensions. In this article a probabilistic latent trait model is proposed with seven technical and psychological factors which measure satisfaction with the prosthesis. The results of a first study are used to determine the basic parameters of the statistical model. These distributions represent hypotheses about factor loadings between manifest items and latent factors of the proposed psychometric questionnaire. A study was conducted and analyzed to form hypotheses for the prior distributions of the questionnaire's measurement model. An expert agreement study conducted on 22 experts was used to determine the prior distribution of item-factor loadings in the model. Model parameters that had to be specified as part of the measurement model were informed prior distributions on the item-factor loadings. For the current 70 items in the questionnaire, each factor loading was set to represent the certainty with which experts had assigned the items to their respective factors. Considering only the measurement model and not the structural model of the questionnaire, 70 out of 217 informed prior distributions on parameters were set. The use of preliminary studies to set prior distributions in latent trait models, while being a relatively new approach in psychological research, provides helpful information towards the design of a seven factor questionnaire that means to identify relations between technical and psychological factors in prosthetic product design and rehabilitation medicine.
NASA Astrophysics Data System (ADS)
Bhowmik, Mrinal Kanti; Gogoi, Usha Rani; Das, Kakali; Ghosh, Anjan Kumar; Bhattacharjee, Debotosh; Majumdar, Gautam
2016-05-01
The non-invasive, painless, radiation-free and cost-effective infrared breast thermography (IBT) makes a significant contribution to improving the survival rate of breast cancer patients by early detecting the disease. This paper presents a set of standard breast thermogram acquisition protocols to improve the potentiality and accuracy of infrared breast thermograms in early breast cancer detection. By maintaining all these protocols, an infrared breast thermogram acquisition setup has been established at the Regional Cancer Centre (RCC) of Government Medical College (AGMC), Tripura, India. The acquisition of breast thermogram is followed by the breast thermogram interpretation, for identifying the presence of any abnormality. However, due to the presence of complex vascular patterns, accurate interpretation of breast thermogram is a very challenging task. The bilateral symmetry of the thermal patterns in each breast thermogram is quantitatively computed by statistical feature analysis. A series of statistical features are extracted from a set of 20 thermograms of both healthy and unhealthy subjects. Finally, the extracted features are analyzed for breast abnormality detection. The key contributions made by this paper can be highlighted as -- a) the designing of a standard protocol suite for accurate acquisition of breast thermograms, b) creation of a new breast thermogram dataset by maintaining the protocol suite, and c) statistical analysis of the thermograms for abnormality detection. By doing so, this proposed work can minimize the rate of false findings in breast thermograms and thus, it will increase the utilization potentiality of breast thermograms in early breast cancer detection.
Martin, Lisa; Watanabe, Sharon; Fainsinger, Robin; Lau, Francis; Ghosh, Sunita; Quan, Hue; Atkins, Marlis; Fassbender, Konrad; Downing, G Michael; Baracos, Vickie
2010-10-01
To determine whether elements of a standard nutritional screening assessment are independently prognostic of survival in patients with advanced cancer. A prospective nested cohort of patients with metastatic cancer were accrued from different units of a Regional Palliative Care Program. Patients completed a nutritional screen on admission. Data included age, sex, cancer site, height, weight history, dietary intake, 13 nutrition impact symptoms, and patient- and physician-reported performance status (PS). Univariate and multivariate survival analyses were conducted. Concordance statistics (c-statistics) were used to test the predictive accuracy of models based on training and validation sets; a c-statistic of 0.5 indicates the model predicts the outcome as well as chance; perfect prediction has a c-statistic of 1.0. A training set of patients in palliative home care (n = 1,164) was used to identify prognostic variables. Primary disease site, PS, short-term weight change (either gain or loss), dietary intake, and dysphagia predicted survival in multivariate analysis (P < .05). A model including only patients separated by disease site and PS with high c-statistics between predicted and observed responses for survival in the training set (0.90) and validation set (0.88; n = 603). The addition of weight change, dietary intake, and dysphagia did not further improve the c-statistic of the model. The c-statistic was also not altered by substituting physician-rated palliative PS for patient-reported PS. We demonstrate a high probability of concordance between predicted and observed survival for patients in distinct palliative care settings (home care, tertiary inpatient, ambulatory outpatient) based on patient-reported information.
The effects of context on multidimensional spatial cognitive models. Ph.D. Thesis - Arizona Univ.
NASA Technical Reports Server (NTRS)
Dupnick, E. G.
1979-01-01
Spatial cognitive models obtained by multidimensional scaling represent cognitive structure by defining alternatives as points in a coordinate space based on relevant dimensions such that interstimulus dissimilarities perceived by the individual correspond to distances between the respective alternatives. The dependence of spatial models on the context of the judgments required of the individual was investigated. Context, which is defined as a perceptual interpretation and cognitive understanding of a judgment situation, was analyzed and classified with respect to five characteristics: physical environment, social environment, task definition, individual perspective, and temporal setting. Four experiments designed to produce changes in the characteristics of context and to test the effects of these changes upon individual cognitive spaces are described with focus on experiment design, objectives, statistical analysis, results, and conclusions. The hypothesis is advanced that an individual can be characterized as having a master cognitive space for a set of alternatives. When the context changes, the individual appears to change the dimension weights to give a new spatial configuration. Factor analysis was used in the interpretation and labeling of cognitive space dimensions.
Semantics-Based Intelligent Indexing and Retrieval of Digital Images - A Case Study
NASA Astrophysics Data System (ADS)
Osman, Taha; Thakker, Dhavalkumar; Schaefer, Gerald
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they typically rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this chapter we present a semantically enabled image annotation and retrieval engine that is designed to satisfy the requirements of commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as presenting our initial thoughts on exploiting lexical databases for explicit semantic-based query expansion.
Levin, Gregory P; Emerson, Sarah C; Emerson, Scott S
2014-09-01
Many papers have introduced adaptive clinical trial methods that allow modifications to the sample size based on interim estimates of treatment effect. There has been extensive commentary on type I error control and efficiency considerations, but little research on estimation after an adaptive hypothesis test. We evaluate the reliability and precision of different inferential procedures in the presence of an adaptive design with pre-specified rules for modifying the sampling plan. We extend group sequential orderings of the outcome space based on the stage at stopping, likelihood ratio statistic, and sample mean to the adaptive setting in order to compute median-unbiased point estimates, exact confidence intervals, and P-values uniformly distributed under the null hypothesis. The likelihood ratio ordering is found to average shorter confidence intervals and produce higher probabilities of P-values below important thresholds than alternative approaches. The bias adjusted mean demonstrates the lowest mean squared error among candidate point estimates. A conditional error-based approach in the literature has the benefit of being the only method that accommodates unplanned adaptations. We compare the performance of this and other methods in order to quantify the cost of failing to plan ahead in settings where adaptations could realistically be pre-specified at the design stage. We find the cost to be meaningful for all designs and treatment effects considered, and to be substantial for designs frequently proposed in the literature. © 2014, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Bucy, Brandon R.
While much of physics education research (PER) has traditionally been conducted in introductory undergraduate courses, researchers have begun to study student understanding of physics concepts at the upper-level. In this dissertation, we describe investigations conducted in advanced undergraduate thermodynamics courses. We present and discuss results pertaining to student understanding of two topics: entropy and the role of mixed second-order partial derivatives in thermodynamics. Our investigations into student understanding of entropy consisted of an analysis of written student responses to researcher-designed diagnostic questions. Data gathered in clinical interviews is employed to illustrate and extend results gathered from written responses. The question sets provided students with several ideal gas processes, and asked students to determine and compare the entropy changes of these processes. We administered the question sets to students from six distinct populations, including students enrolled in classical thermodynamics, statistical mechanics, thermal physics, physical chemistry, and chemical engineering courses, as well as a sample of physics graduate students. Data was gathered both before and after instruction in several samples. Several noteworthy features of student reasoning are identified and discussed. These features include student ideas about entropy prior to instruction, as well as specific difficulties and other aspects of student reasoning evident after instruction. As an example, students from various populations tended to emphasize either the thermodynamic or the statistical definition of entropy. Both approaches present students with a unique set of benefits as well as challenges. We additionally studied student understanding of partial derivatives in a thermodynamics context. We identified specific difficulties related to the mixed second partial derivatives of a thermodynamic state function, based on an analysis of student responses to homework and exam problems. Students tended to set these partial derivatives identically equal to zero. Students also displayed difficulties in relating the physical description of a material property to a corresponding mathematical statement involving partial derivatives. We describe the development of a guided-inquiry tutorial activity designed to address these specific difficulties. This tutorial focused on the graphical interpretation of partial derivatives. Preliminary results suggest that the tutorial was effective in addressing several student difficulties related to partial derivatives.
Power calculation for overall hypothesis testing with high-dimensional commensurate outcomes.
Chi, Yueh-Yun; Gribbin, Matthew J; Johnson, Jacqueline L; Muller, Keith E
2014-02-28
The complexity of system biology means that any metabolic, genetic, or proteomic pathway typically includes so many components (e.g., molecules) that statistical methods specialized for overall testing of high-dimensional and commensurate outcomes are required. While many overall tests have been proposed, very few have power and sample size methods. We develop accurate power and sample size methods and software to facilitate study planning for high-dimensional pathway analysis. With an account of any complex correlation structure between high-dimensional outcomes, the new methods allow power calculation even when the sample size is less than the number of variables. We derive the exact (finite-sample) and approximate non-null distributions of the 'univariate' approach to repeated measures test statistic, as well as power-equivalent scenarios useful to generalize our numerical evaluations. Extensive simulations of group comparisons support the accuracy of the approximations even when the ratio of number of variables to sample size is large. We derive a minimum set of constants and parameters sufficient and practical for power calculation. Using the new methods and specifying the minimum set to determine power for a study of metabolic consequences of vitamin B6 deficiency helps illustrate the practical value of the new results. Free software implementing the power and sample size methods applies to a wide range of designs, including one group pre-intervention and post-intervention comparisons, multiple parallel group comparisons with one-way or factorial designs, and the adjustment and evaluation of covariate effects. Copyright © 2013 John Wiley & Sons, Ltd.
You Cannot Step Into the Same River Twice: When Power Analyses Are Optimistic.
McShane, Blakeley B; Böckenholt, Ulf
2014-11-01
Statistical power depends on the size of the effect of interest. However, effect sizes are rarely fixed in psychological research: Study design choices, such as the operationalization of the dependent variable or the treatment manipulation, the social context, the subject pool, or the time of day, typically cause systematic variation in the effect size. Ignoring this between-study variation, as standard power formulae do, results in assessments of power that are too optimistic. Consequently, when researchers attempting replication set sample sizes using these formulae, their studies will be underpowered and will thus fail at a greater than expected rate. We illustrate this with both hypothetical examples and data on several well-studied phenomena in psychology. We provide formulae that account for between-study variation and suggest that researchers set sample sizes with respect to our generally more conservative formulae. Our formulae generalize to settings in which there are multiple effects of interest. We also introduce an easy-to-use website that implements our approach to setting sample sizes. Finally, we conclude with recommendations for quantifying between-study variation. © The Author(s) 2014.
van Oostrom, Conny T.; Jonker, Martijs J.; de Jong, Mark; Dekker, Rob J.; Rauwerda, Han; Ensink, Wim A.; de Vries, Annemieke; Breit, Timo M.
2014-01-01
In transcriptomics research, design for experimentation by carefully considering biological, technological, practical and statistical aspects is very important, because the experimental design space is essentially limitless. Usually, the ranges of variable biological parameters of the design space are based on common practices and in turn on phenotypic endpoints. However, specific sub-cellular processes might only be partially reflected by phenotypic endpoints or outside the associated parameter range. Here, we provide a generic protocol for range finding in design for transcriptomics experimentation based on small-scale gene-expression experiments to help in the search for the right location in the design space by analyzing the activity of already known genes of relevant molecular mechanisms. Two examples illustrate the applicability: in-vitro UV-C exposure of mouse embryonic fibroblasts and in-vivo UV-B exposure of mouse skin. Our pragmatic approach is based on: framing a specific biological question and associated gene-set, performing a wide-ranged experiment without replication, eliminating potentially non-relevant genes, and determining the experimental ‘sweet spot’ by gene-set enrichment plus dose-response correlation analysis. Examination of many cellular processes that are related to UV response, such as DNA repair and cell-cycle arrest, revealed that basically each cellular (sub-) process is active at its own specific spot(s) in the experimental design space. Hence, the use of range finding, based on an affordable protocol like this, enables researchers to conveniently identify the ‘sweet spot’ for their cellular process of interest in an experimental design space and might have far-reaching implications for experimental standardization. PMID:24823911
Park, Jungkap; Saitou, Kazuhiro
2014-09-18
Multibody potentials accounting for cooperative effects of molecular interactions have shown better accuracy than typical pairwise potentials. The main challenge in the development of such potentials is to find relevant structural features that characterize the tightly folded proteins. Also, the side-chains of residues adopt several specific, staggered conformations, known as rotamers within protein structures. Different molecular conformations result in different dipole moments and induce charge reorientations. However, until now modeling of the rotameric state of residues had not been incorporated into the development of multibody potentials for modeling non-bonded interactions in protein structures. In this study, we develop a new multibody statistical potential which can account for the influence of rotameric states on the specificity of atomic interactions. In this potential, named "rotamer-dependent atomic statistical potential" (ROTAS), the interaction between two atoms is specified by not only the distance and relative orientation but also by two state parameters concerning the rotameric state of the residues to which the interacting atoms belong. It was clearly found that the rotameric state is correlated to the specificity of atomic interactions. Such rotamer-dependencies are not limited to specific type or certain range of interactions. The performance of ROTAS was tested using 13 sets of decoys and was compared to those of existing atomic-level statistical potentials which incorporate orientation-dependent energy terms. The results show that ROTAS performs better than other competing potentials not only in native structure recognition, but also in best model selection and correlation coefficients between energy and model quality. A new multibody statistical potential, ROTAS accounting for the influence of rotameric states on the specificity of atomic interactions was developed and tested on decoy sets. The results show that ROTAS has improved ability to recognize native structure from decoy models compared to other potentials. The effectiveness of ROTAS may provide insightful information for the development of many applications which require accurate side-chain modeling such as protein design, mutation analysis, and docking simulation.
A Bayesian sequential design with adaptive randomization for 2-sided hypothesis test.
Yu, Qingzhao; Zhu, Lin; Zhu, Han
2017-11-01
Bayesian sequential and adaptive randomization designs are gaining popularity in clinical trials thanks to their potentials to reduce the number of required participants and save resources. We propose a Bayesian sequential design with adaptive randomization rates so as to more efficiently attribute newly recruited patients to different treatment arms. In this paper, we consider 2-arm clinical trials. Patients are allocated to the 2 arms with a randomization rate to achieve minimum variance for the test statistic. Algorithms are presented to calculate the optimal randomization rate, critical values, and power for the proposed design. Sensitivity analysis is implemented to check the influence on design by changing the prior distributions. Simulation studies are applied to compare the proposed method and traditional methods in terms of power and actual sample sizes. Simulations show that, when total sample size is fixed, the proposed design can obtain greater power and/or cost smaller actual sample size than the traditional Bayesian sequential design. Finally, we apply the proposed method to a real data set and compare the results with the Bayesian sequential design without adaptive randomization in terms of sample sizes. The proposed method can further reduce required sample size. Copyright © 2017 John Wiley & Sons, Ltd.
Siegelman, Noam; Bogaerts, Louisa; Kronenfeld, Ofer; Frost, Ram
2017-10-07
From a theoretical perspective, most discussions of statistical learning (SL) have focused on the possible "statistical" properties that are the object of learning. Much less attention has been given to defining what "learning" is in the context of "statistical learning." One major difficulty is that SL research has been monitoring participants' performance in laboratory settings with a strikingly narrow set of tasks, where learning is typically assessed offline, through a set of two-alternative-forced-choice questions, which follow a brief visual or auditory familiarization stream. Is that all there is to characterizing SL abilities? Here we adopt a novel perspective for investigating the processing of regularities in the visual modality. By tracking online performance in a self-paced SL paradigm, we focus on the trajectory of learning. In a set of three experiments we show that this paradigm provides a reliable and valid signature of SL performance, and it offers important insights for understanding how statistical regularities are perceived and assimilated in the visual modality. This demonstrates the promise of integrating different operational measures to our theory of SL. © 2017 Cognitive Science Society, Inc.
Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong
2016-01-01
Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.
Assessing Discriminative Performance at External Validation of Clinical Prediction Models
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.
2016-01-01
Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753
Assessing Discriminative Performance at External Validation of Clinical Prediction Models.
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W
2016-01-01
External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
Jenkins, Hazel J.; Hancock, Mark J.; French, Simon D.; Maher, Chris G.; Engel, Roger M.; Magnussen, John S.
2015-01-01
Background: Rates of imaging for low-back pain are high and are associated with increased health care costs and radiation exposure as well as potentially poorer patient outcomes. We conducted a systematic review to investigate the effectiveness of interventions aimed at reducing the use of imaging for low-back pain. Methods: We searched MEDLINE, Embase, CINAHL and the Cochrane Central Register of Controlled Trials from the earliest records to June 23, 2014. We included randomized controlled trials, controlled clinical trials and interrupted time series studies that assessed interventions designed to reduce the use of imaging in any clinical setting, including primary, emergency and specialist care. Two independent reviewers extracted data and assessed risk of bias. We used raw data on imaging rates to calculate summary statistics. Study heterogeneity prevented meta-analysis. Results: A total of 8500 records were identified through the literature search. Of the 54 potentially eligible studies reviewed in full, 7 were included in our review. Clinical decision support involving a modified referral form in a hospital setting reduced imaging by 36.8% (95% confidence interval [CI] 33.2% to 40.5%). Targeted reminders to primary care physicians of appropriate indications for imaging reduced referrals for imaging by 22.5% (95% CI 8.4% to 36.8%). Interventions that used practitioner audits and feedback, practitioner education or guideline dissemination did not significantly reduce imaging rates. Lack of power within some of the included studies resulted in lack of statistical significance despite potentially clinically important effects. Interpretation: Clinical decision support in a hospital setting and targeted reminders to primary care doctors were effective interventions in reducing the use of imaging for low-back pain. These are potentially low-cost interventions that would substantially decrease medical expenditures associated with the management of low-back pain. PMID:25733741
Tarling, Maggie; Jones, Anne; Murrells, Trevor; McCutcheon, Helen
2017-01-01
Objectives The main aim of the study was to explore the potential sources of variation and understand the meaning of safety climate for nursing practice in acute hospital settings in the UK. Design A sequential mixed methods design included a cross-sectional survey using the Safety Climate Questionnaire (SCQ) and thematic analysis of focus group discussions. Confirmatory factor analysis (CFA) was used to validate the factor structure of the SCQ. Factor scores were compared between nurses working in operating theatres, critical care and ward areas. Results from the survey and the thematic analysis were then compared and synthesised. Setting A London University. Participants 319 registered nurses working in acute hospital settings completed the SCQ and a further 23 nurses participated in focus groups. Results CFA indicated that there was a good model fit on some criteria (χ2=1683.699, df=824, p<0.001; χ2/df=2.04; root mean square error of approximation=0.058) but a less acceptable fit on comparative fit index which is 0.804. There was a statistically significant difference between clinical specialisms in management commitment (F (4,266)=4.66, p=0.001). Nurses working in operating theatres had lower scores compared with ward areas and they also reported negative perceptions about management in their focus group. There was significant variation in scores for communication across clinical specialism (F (4,266)=2.62, p=0.035) but none of the pairwise comparisons achieved statistical significance. Thematic analysis identified themes of human factors, clinical management and protecting patients. The system and the human side of caring was identified as a meta-theme. Conclusions The results suggest that the SCQ has some utility but requires further exploration. The findings indicate that safety in nursing practice is a complex interaction between safety systems and the social and interpersonal aspects of clinical practice. PMID:29084793
Walters, Stephen John; Stern, Cindy; Robertson-Malt, Suzanne
2016-04-01
There is a growing call by consumers and governments for healthcare to adopt systems and approaches to care to improve patient safety. Collaboration within healthcare settings is an important factor for improving systems of care. By using validated measurement instruments a standardized approach to assessing collaboration is possible, otherwise it is only an assumption that collaboration is occurring in any healthcare setting. The objective of this review was to evaluate and compare measurement properties of instruments that measure collaboration within healthcare settings, specifically those which have been psychometrically tested and validated. Participants could be healthcare professionals, the patient or any non-professional who contributes to a patient's care, for example, family members, chaplains or orderlies. The term participant type means the designation of any one participant; for example 'nurse', 'social worker' or 'administrator'. More than two participant types was mandatory. The focus of this review was the validity of tools used to measure collaboration within healthcare settings. The types of studies considered for inclusion were validation studies, but quantitative study designs such as randomized controlled trials, controlled trials and case studies were also eligible for inclusion. Studies that focused on Interprofessional Education, were published as an abstract only, contained patient self-reporting only or were not about care delivery were excluded. The outcome of interest was validation and interpretability of the instrument being assessed and included content validity, construct validity and reliability. Interpretability is characterized by statistics such as mean and standard deviation which can be translated to a qualitative meaning. The search strategy aimed to find both published and unpublished studies. A three-step search strategy was utilized in this review. The databases searched included PubMed, CINAHL, Embase, Cochrane Central Register of Controlled Trials, Emerald Fulltext, MD Consult Australia, PsycARTICLES, Psychology and Behavioural Sciences Collection, PsycINFO, Informit Health Databases, Scopus, UpToDate and Web of Science. The search for unpublished studies included EThOS (Electronic Thesis Online Service), Index to Theses and ProQuest- Dissertations and Theses. The assessment of methodological quality of the included studies was undertaken using the COSMIN checklist which is a validated tool that assesses the process of design and validation of healthcare measurement instruments. An Excel spreadsheet version of COSMIN was developed for data collection which included a worksheet for extracting participant characteristics and interpretability data. Statistical pooling of data was not possible for this review. Therefore, the findings are presented in a narrative form including tables and figures to aid in data presentation. To make a synthesis of the assessments of methodological quality of the different studies, each instrument was rated by accounting for the number of studies performed with an instrument, the appraisal of methodological quality and the consistency of results between studies. Twenty-one studies of 12 instruments were included in the review. The studies were diverse in their theoretical underpinnings, target population/setting and measurement objectives. Measurement objectives included: investigating beliefs, behaviors, attitudes, perceptions and relationships associated with collaboration; measuring collaboration between different levels of care or within a multi-rater/target group; assessing collaboration across teams; or assessing internal participation of both teams and patients.Studies produced validity or interpretability data but none of the studies assessed all validity and reliability properties. However, most of the included studies produced a factor structure or referred to prior factor analysis. A narrative synthesis of the individual study factor structures was generated consisting of nine headings: organizational settings, support structures, purpose and goals; communication; reflection on process; cooperation; coordination; role interdependence and partnership; relationships; newly created professional activities; and professional flexibility. Among the many instruments that measure collaboration within healthcare settings, the quality of each instrument varies; instruments are designed for specific populations and purposes, and are validated in various settings. Selecting an instrument requires careful consideration of the qualities of each. Therefore, referring to systematic reviews of measurement properties of instruments may be helpful to clinicians or researchers in instrument selection. Systematic reviews of measurement properties of instruments are valuable in aiding in instrument selection. This systematic review may be useful in instrument selection for the measurement of collaboration within healthcare settings with a complex mix of participant types. Evaluating collaboration provides important information on the strengths and limitations of different healthcare settings and the opportunities for continuous improvement via any remedial actions initiated. Development of a tool that can be used to measure collaboration within teams of healthcare professionals and non-professionals is important for practice. The use of different statistical modelling techniques, such as Item Response Theory modelling and the translation of models into Computer Adaptive Tests, may prove useful. Measurement equivalence is an important consideration for future instrument development and validation. Further development of the COSMIN tool should include appraisal for measurement equivalence. Researchers developing and validating measurement tools should consider multi-method research designs.
Tang, Guoping; Shafer, Sarah L.; Barlein, Patrick J.; Holman, Justin O.
2009-01-01
Prognostic vegetation models have been widely used to study the interactions between environmental change and biological systems. This study examines the sensitivity of vegetation model simulations to: (i) the selection of input climatologies representing different time periods and their associated atmospheric CO2 concentrations, (ii) the choice of observed vegetation data for evaluating the model results, and (iii) the methods used to compare simulated and observed vegetation. We use vegetation simulated for Asia by the equilibrium vegetation model BIOME4 as a typical example of vegetation model output. BIOME4 was run using 19 different climatologies and their associated atmospheric CO2 concentrations. The Kappa statistic, Fuzzy Kappa statistic and a newly developed map-comparison method, the Nomad index, were used to quantify the agreement between the biomes simulated under each scenario and the observed vegetation from three different global land- and tree-cover data sets: the global Potential Natural Vegetation data set (PNV), the Global Land Cover Characteristics data set (GLCC), and the Global Land Cover Facility data set (GLCF). The results indicate that the 30-year mean climatology (and its associated atmospheric CO2 concentration) for the time period immediately preceding the collection date of the observed vegetation data produce the most accurate vegetation simulations when compared with all three observed vegetation data sets. The study also indicates that the BIOME4-simulated vegetation for Asia more closely matches the PNV data than the other two observed vegetation data sets. Given the same observed data, the accuracy assessments of the BIOME4 simulations made using the Kappa, Fuzzy Kappa and Nomad index map-comparison methods agree well when the compared vegetation types consist of a large number of spatially continuous grid cells. The results of this analysis can assist model users in designing experimental protocols for simulating vegetation.
Sy, Michael Palapal
2017-11-01
For the past more than 50 years, the World Health Organisation has acknowledged through empirical findings that health workers that learn together work together effectively to provide the best care for their patients. This study aimed to: (1) describe the perceived extent of interprofessional education (IPE) experience among Filipino occupational therapists (OTs), physical therapists (PTs), and speech-language-pathologists (SLPs); (2) identify their attitudes towards interprofessional collaboration (IPC); and (3) compare their attitudes towards IPC according to: prior IPE experience, classification of IPE experience, profession, years of practice, and practice setting. Using a cross-sectional survey design, a two-part questionnaire was sent to Filipino OTs, PTs, and SLPs working in the Philippines via an online survey application. The first part of the survey contained eight items of demographic information and the second part contained the 14-item Attitudes Towards Health Care Teams Scale (ATHCTS). Findings revealed that among the Filipino OT, PT and SLP respondents (n = 189), 70.9% had prior experience on IPE. Moreover, the three most commonly used IPE teaching-learning strategies were case discussion (clinical setting), small group discussion, didactics, and case discussion (community setting), while the use of didactics and case discussion (community setting) yielded more agreeable attitudes towards IPC. Among the 14 items in the ATHCTS, 11 were rated with agreeability and three items with neutrality. For professional variables, only the practice setting variable yielded a statistically significant finding confirming those working in the academia to be more agreeable towards IPC compared to other settings. However, years of practice and professional background variables both yielded no statistically significant difference implying no association between years of practice and attitude towards IPC and a homogenous composition among respondents, respectively. The results of this research are to springboard IPE initiatives within Philippine higher education institutions to enable evidence-based IPC approaches in clinical practice.
Medical intelligence in Sweden. Vitamin B12: oral compared with parenteral?
Nilsson, M; Norberg, B; Hultdin, J; Sandström, H; Westman, G; Lökk, J
2005-03-01
Sweden is the only country in which oral high dose vitamin B12 has gained widespread use in the treatment of deficiency states. The aim of the study was to describe prescribing patterns and sales statistics of vitamin B12 tablets and injections in Sweden 1990-2000.Design, setting, and sources: Official statistics of cobalamin prescriptions and sales were used. The use of vitamin B12 increased in Sweden 1990-2000, mainly because of an increase in the use of oral high dose vitamin B12 therapy. The experience, in statistical terms a "total investigation", comprised 1,000,000 patient years for tablets and 750,000 patient years for injections. During 2000, 13% of residents aged 70 and over were treated with vitamin B12, two of three with the tablet preparation. Most patients in Sweden requiring vitamin B12 therapy have transferred from parenteral to oral high dose vitamin B12 since 1964, when the oral preparation was introduced. The findings suggest that many patients in other post-industrial societies may also be suitable for oral vitamin B12 treatment.
The Specific Features of design and process engineering in branch of industrial enterprise
NASA Astrophysics Data System (ADS)
Sosedko, V. V.; Yanishevskaya, A. G.
2017-06-01
Production output of industrial enterprise is organized in debugged working mechanisms at each stage of product’s life cycle from initial design documentation to product and finishing it with utilization. The topic of article is mathematical model of the system design and process engineering in branch of the industrial enterprise, statistical processing of estimated implementation results of developed mathematical model in branch, and demonstration of advantages at application at this enterprise. During the creation of model a data flow about driving of information, orders, details and modules in branch of enterprise groups of divisions were classified. Proceeding from the analysis of divisions activity, a data flow, details and documents the state graph of design and process engineering was constructed, transitions were described and coefficients are appropriated. To each condition of system of the constructed state graph the corresponding limiting state probabilities were defined, and also Kolmogorov’s equations are worked out. When integration of sets of equations of Kolmogorov the state probability of system activity the specified divisions and production as function of time in each instant is defined. On the basis of developed mathematical model of uniform system of designing and process engineering and manufacture, and a state graph by authors statistical processing the application of mathematical model results was carried out, and also advantage at application at this enterprise is shown. Researches on studying of loading services probability of branch and third-party contractors (the orders received from branch within a month) were conducted. The developed mathematical model of system design and process engineering and manufacture can be applied to definition of activity state probability of divisions and manufacture as function of time in each instant that will allow to keep account of loading of performance of work in branches of the enterprise.
Army Air Forces Statistical Digest, 1946. First Annual Number
1947-06-01
accordance with AAF Letter 5-5 dated 28 April 1947, the Army Air Forces Statistical Digest has been designated as the official AAF statisti- cal yearbook...more detailed exposition, Since the Digest is designed primarily as a reference. manual, the re- action of users to its contents is important in the...distributed by this Headquarters (Statistical Control Division, Office of the Air Comptroller) is hereby designated as the official AAF statistical yearbook
Caplan, Susan
2016-08-01
In order to understand the effects of interventions designed to reduce stigma about mental illness, we need valid measures. However, the validity of commonly used measures is compromised by social desirability bias. The purpose of this pilot study was to test an anonymous method of measuring stigma in the community setting. The method of data collection, Preguntas con Cartas (Questions with Cards) used numbered playing cards to conduct anonymous group polling about stigmatizing beliefs during a mental health literacy intervention. An analysis of the difference between Preguntas con Cartas stigma votes and corresponding face-to-face individual survey results for the same seven stigma questions indicated that there was a statistically significant differences in the distributions between the two methods of data collection (χ(2) = 8.27, p = 0.016). This exploratory study has shown the potential effectiveness of Preguntas con Cartas as a novel method of measuring stigma in the community-based setting.
NASA Astrophysics Data System (ADS)
Dakshinamurthy, Devika; Gupta, Srinivasa
2018-04-01
Fused Deposition Modelling (FDM) is a fast growing Rapid Prototyping (RP) technology due to its ability to build parts having complex geometrical shape in reasonable time period. The quality of built parts depends on many process variables. In this study, the influence of three FDM process parameters namely, slice height, raster angle and raster width on viscoelastic properties of Acrylonitrile Butadiene Styrene (ABS) RP-specimen is studied. Statistically designed experiments have been conducted for finding the optimum process parameter setting for enhancing the storage modulus. Dynamic Mechanical Analysis has been used to understand the viscoelastic properties at various parameter settings. At the optimal parameter setting the storage modulus and loss modulus of the ABS-RP specimen was 1008 and 259.9 MPa respectively. The relative percentage contribution of slice height and raster width on the viscoelastic properties of the FDM-RP components was found to be 55 and 31 % respectively.
Li, Yingxue; Hu, Yiying; Yang, Jingang; Li, Xiang; Liu, Haifeng; Xie, Guotong; Xu, Meilin; Hu, Jingyi; Yang, Yuejin
2017-01-01
Treatment effectiveness plays a fundamental role in patient therapies. In most observational studies, researchers often design an analysis pipeline for a specific treatment based on the study cohort. To evaluate other treatments in the data set, much repeated and multifarious work including cohort construction, statistical analysis need to be done. In addition, as treatments are often with an intrinsic hierarchical relationship, many rational comparable treatment pairs can be derived as new treatment variables besides the original single treatment one from the original cohort data set. In this paper, we propose an automatic treatment effectiveness analysis approach to solve this problem. With our approach, clinicians can assess the effect of treatments not only more conveniently but also more thoroughly and comprehensively. We applied this method to a real world case of estimating the drug effectiveness on Chinese Acute Myocardial Infarction (CAMI) data set and some meaningful results are obtained for potential improvement of patient treatments.
High-level intuitive features (HLIFs) for intuitive skin lesion description.
Amelard, Robert; Glaister, Jeffrey; Wong, Alexander; Clausi, David A
2015-03-01
A set of high-level intuitive features (HLIFs) is proposed to quantitatively describe melanoma in standard camera images. Melanoma is the deadliest form of skin cancer. With rising incidence rates and subjectivity in current clinical detection methods, there is a need for melanoma decision support systems. Feature extraction is a critical step in melanoma decision support systems. Existing feature sets for analyzing standard camera images are comprised of low-level features, which exist in high-dimensional feature spaces and limit the system's ability to convey intuitive diagnostic rationale. The proposed HLIFs were designed to model the ABCD criteria commonly used by dermatologists such that each HLIF represents a human-observable characteristic. As such, intuitive diagnostic rationale can be conveyed to the user. Experimental results show that concatenating the proposed HLIFs with a full low-level feature set increased classification accuracy, and that HLIFs were able to separate the data better than low-level features with statistical significance. An example of a graphical interface for providing intuitive rationale is given.
Efficient summary statistical representation when change localization fails.
Haberman, Jason; Whitney, David
2011-10-01
People are sensitive to the summary statistics of the visual world (e.g., average orientation/speed/facial expression). We readily derive this information from complex scenes, often without explicit awareness. Given the fundamental and ubiquitous nature of summary statistical representation, we tested whether this kind of information is subject to the attentional constraints imposed by change blindness. We show that information regarding the summary statistics of a scene is available despite limited conscious access. In a novel experiment, we found that while observers can suffer from change blindness (i.e., not localize where change occurred between two views of the same scene), observers could nevertheless accurately report changes in the summary statistics (or "gist") about the very same scene. In the experiment, observers saw two successively presented sets of 16 faces that varied in expression. Four of the faces in the first set changed from one emotional extreme (e.g., happy) to another (e.g., sad) in the second set. Observers performed poorly when asked to locate any of the faces that changed (change blindness). However, when asked about the ensemble (which set was happier, on average), observer performance remained high. Observers were sensitive to the average expression even when they failed to localize any specific object change. That is, even when observers could not locate the very faces driving the change in average expression between the two sets, they nonetheless derived a precise ensemble representation. Thus, the visual system may be optimized to process summary statistics in an efficient manner, allowing it to operate despite minimal conscious access to the information presented.
On the theory of Carriers's Electrostatic Interaction near an Interface
NASA Astrophysics Data System (ADS)
Waters, Michael; Hashemi, Hossein; Kieffer, John
2015-03-01
Heterojunction interfaces are common in non-traditional photovoltaic device designs, such as those based small molecules, polymers, and perovskites. We have examined a number of the effects of the heterojunction interface region on carrier/exciton energetics using a mixture of both semi-classical and quantum electrostatic methods, ab initio methods, and statistical mechanics. Our theoretical analysis has yielded several useful relationships and numerical recipes that should be considered in device design regardless of the particular materials system. As a demonstration, we highlight these formalisms as applied to carriers and polaron pairs near a C60/subphthalocyanine interface. On the regularly ordered areas of the heterojunction, the effect of the interface is a significant set of corrections to the carrier energies, which in turn directly affects device performance.
Sparse approximation of currents for statistics on curves and surfaces.
Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas
2008-01-01
Computing, processing, visualizing statistics on shapes like curves or surfaces is a real challenge with many applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids feature-based approach as well as point-correspondence method. This framework has been proved to be powerful to register brain surfaces or to measure geometrical invariants. However, if the state-of-the-art methods perform efficiently pairwise registrations, new numerical schemes are required to process groupwise statistics due to an increasing complexity when the size of the database is growing. Statistics such as mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We propose therefore to find an adapted basis on which mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.
Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.
2010-01-01
Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data that incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimension setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175
Alligood, Christina A; Dorey, Nicole R; Mehrkam, Lindsay R; Leighty, Katherine A
2017-05-01
Environmental enrichment in zoos and aquariums is often evaluated at two overlapping levels: published research and day-to-day institutional record keeping. Several authors have discussed ongoing challenges with small sample sizes in between-groups zoological research and have cautioned against the inappropriate use of inferential statistics (Shepherdson, , International Zoo Yearbook, 38, 118-124; Shepherdson, Lewis, Carlstead, Bauman, & Perrin, Applied Animal Behaviour Science, 147, 298-277; Swaisgood, , Applied Animal Behaviour Science, 102, 139-162; Swaisgood & Shepherdson, , Zoo Biology, 24, 499-518). Multi-institutional studies are the typically-prescribed solution, but these are expensive and difficult to carry out. Kuhar ( Zoo Biology, 25, 339-352) provided a reminder that inferential statistics are only necessary when one wishes to draw general conclusions at the population level. Because welfare is assessed at the level of the individual animal, we argue that evaluations of enrichment efficacy are often instances in which inferential statistics may be neither necessary nor appropriate. In recent years, there have been calls for the application of behavior-analytic techniques to zoo animal behavior management, including environmental enrichment (e.g., Bloomsmith, Marr, & Maple, , Applied Animal Behaviour Science, 102, 205-222; Tarou & Bashaw, , Applied Animal Behaviour Science, 102, 189-204). Single-subject (also called single-case, or small-n) designs provide a means of designing evaluations of enrichment efficacy based on an individual's behavior. We discuss how these designs might apply to research and practice goals at zoos and aquariums, contrast them with standard practices in the field, and give examples of how each could be successfully applied in a zoo or aquarium setting. © 2017 Wiley Periodicals, Inc.
The importance of early investigation and publishing in an emergent health and environment crisis.
Murase, Kaori
2016-10-01
To minimize the damage resulting from a long-term environmental disaster such as the 2011 Fukushima nuclear accident in Japan, early disclosure of research data by scientists and prompt decision making by government authorities are required in place of careful, time-consuming research and deliberation about the consequences and cause of the accident. A Bayesian approach with flexible statistical modeling helps scientists and encourages government authorities to make decisions based on environmental data available in the early stages of a disaster. It is evident from Fukushima and similar accidents that classical research methods involving statistical methodologies that require rigorous experimental design and complex data sets are too cumbersome and delay important actions that may be critical in the early stages of an environmental disaster. Integr Environ Assess Manag 2016;12:680-682. © 2016 SETAC. © 2016 SETAC.
Development of chemistry attitudes and experiences questionnaire (CAEQ)
NASA Astrophysics Data System (ADS)
Dalgety, Jacinta; Coll, Richard K.; Jones, Alister
2003-09-01
In this article we describe the development of the Chemistry Attitudes and Experiences Questionnaire (CAEQ) that measures first-year university chemistry students' attitude toward chemistry, chemistry self-efficacy, and learning experiences. The instrument was developed as part of a larger study and sought to fulfill a need for an instrument to investigate factors that influence student enrollment choice. We set out to design the instrument in a manner that would maximize construct validity. The CAEQ was piloted with a cohort of science and technology students (n = 129) at the end of their first year. Based on statistical analysis the instrument was modified and subsequently administered on two occasions at two tertiary institutions (n = 669). Statistical data along with additional data gathered from interviews suggest that the CAEQ possesses good construct validity and will prove a useful tool for tertiary level educators who wish to gain an understanding of factors that influence student choice of chemistry enrolment.
A Statistics-Based Cracking Criterion of Resin-Bonded Silica Sand for Casting Process Simulation
NASA Astrophysics Data System (ADS)
Wang, Huimin; Lu, Yan; Ripplinger, Keith; Detwiler, Duane; Luo, Alan A.
2017-02-01
Cracking of sand molds/cores can result in many casting defects such as veining. A robust cracking criterion is needed in casting process simulation for predicting/controlling such defects. A cracking probability map, relating to fracture stress and effective volume, was proposed for resin-bonded silica sand based on Weibull statistics. Three-point bending test results of sand samples were used to generate the cracking map and set up a safety line for cracking criterion. Tensile test results confirmed the accuracy of the safety line for cracking prediction. A laboratory casting experiment was designed and carried out to predict cracking of a cup mold during aluminum casting. The stress-strain behavior and the effective volume of the cup molds were calculated using a finite element analysis code ProCAST®. Furthermore, an energy dispersive spectroscopy fractographic examination of the sand samples confirmed the binder cracking in resin-bonded silica sand.
Gel, Bernat; Díez-Villanueva, Anna; Serra, Eduard; Buschbeck, Marcus; Peinado, Miguel A; Malinverni, Roberto
2016-01-15
Statistically assessing the relation between a set of genomic regions and other genomic features is a common challenging task in genomic and epigenomic analyses. Randomization based approaches implicitly take into account the complexity of the genome without the need of assuming an underlying statistical model. regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. regioneR is an R package released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/regioneR). rmalinverni@carrerasresearch.org. © The Author 2015. Published by Oxford University Press.
Learning from Friends: Measuring Influence in a Dyadic Computer Instructional Setting
ERIC Educational Resources Information Center
DeLay, Dawn; Hartl, Amy C.; Laursen, Brett; Denner, Jill; Werner, Linda; Campe, Shannon; Ortiz, Eloy
2014-01-01
Data collected from partners in a dyadic instructional setting are, by definition, not statistically independent. As a consequence, conventional parametric statistical analyses of change and influence carry considerable risk of bias. In this article, we illustrate a strategy to overcome this obstacle: the longitudinal actor-partner interdependence…
SECIMTools: a suite of metabolomics data analysis tools.
Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M
2018-04-20
Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open access, easy to use, analytic tools that are broadly accessible to the biological community need to be developed. While technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares - discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator LASSO and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
NASA Technical Reports Server (NTRS)
Crutcher, H. L.; Falls, L. W.
1976-01-01
Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.
Accounting for interim safety monitoring of an adverse event upon termination of a clinical trial.
Dallas, Michael J
2008-01-01
Upon termination of a clinical trial that uses interim evaluations to determine whether the trial can be stopped, a proper statistical analysis must account for the interim evaluations. For example, in a group-sequential design where the efficacy of a treatment regimen is evaluated at interim stages, and the opportunity to stop the trial based on positive efficacy findings exists, the terminal p-value, point estimate, and confidence limits of the outcome of interest must be adjusted to eliminate bias. While it is standard practice to adjust terminal statistical analyses due to opportunities to stop for "positive" findings, adjusting due to opportunities to stop for "negative" findings is also important. Stopping rules for negative findings are particularly useful when monitoring a specific rare serious adverse event in trials designed to show safety with respect to the event. In these settings, establishing conservative stopping rules are appropriate, and therefore accounting for the interim monitoring can have a substantial effect on the final results. Here I present a method to account for interim safety monitoring and illustrate its usefulness. The method is demonstrated to have advantages over methodology that does not account for interim monitoring.