Sample records for statistical method called

  1. A comparative analysis of the statistical properties of large mobile phone calling networks.

    PubMed

    Li, Ming-Xia; Jiang, Zhi-Qiang; Xie, Wen-Jie; Miccichè, Salvatore; Tumminello, Michele; Zhou, Wei-Xing; Mantegna, Rosario N

    2014-05-30

    Mobile phone calling is one of the most widely used communication methods in modern society. The records of calls among mobile phone users provide us a valuable proxy for the understanding of human communication patterns embedded in social networks. Mobile phone users call each other forming a directed calling network. If only reciprocal calls are considered, we obtain an undirected mutual calling network. The preferential communication behavior between two connected users can be statistically tested and it results in two Bonferroni networks with statistically validated edges. We perform a comparative analysis of the statistical properties of these four networks, which are constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that these networks share many common structural properties and also exhibit idiosyncratic features when compared with previously studied large mobile calling networks. The empirical findings provide us an intriguing picture of a representative large social network that might shed new lights on the modelling of large social networks.

  2. The mediating effect of calling on the relationship between medical school students’ academic burnout and empathy

    PubMed Central

    2017-01-01

    Purpose This study is aimed at identifying the relationships between medical school students’ academic burnout, empathy, and calling, and determining whether their calling has a mediating effect on the relationship between academic burnout and empathy. Methods A mixed method study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students’ empathy, and calling were utilized. For statistical analysis, correlation analysis, descriptive statistics analysis, and hierarchical multiple regression analyses were conducted. For qualitative approach, eight medical students participated in a focus group interview. Results The study found that empathy has a statistically significant, negative correlation with academic burnout, while having a significant, positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. Conclusion This result demonstrates that calling is a key variable that mediates the relationship between medical students’ academic burnout and empathy. As such, this study provides baseline data for an education that could improve medical students’ empathy skills. PMID:28870019

  3. The mediating effect of calling on the relationship between medical school students' academic burnout and empathy.

    PubMed

    Chae, Su Jin; Jeong, So Mi; Chung, Yoon-Sok

    2017-09-01

    This study is aimed at identifying the relationships between medical school students' academic burnout, empathy, and calling, and determining whether their calling has a mediating effect on the relationship between academic burnout and empathy. A mixed method study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students' empathy, and calling were utilized. For statistical analysis, correlation analysis, descriptive statistics analysis, and hierarchical multiple regression analyses were conducted. For qualitative approach, eight medical students participated in a focus group interview. The study found that empathy has a statistically significant, negative correlation with academic burnout, while having a significant, positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. This result demonstrates that calling is a key variable that mediates the relationship between medical students' academic burnout and empathy. As such, this study provides baseline data for an education that could improve medical students' empathy skills.

  4. Measuring the effectiveness of patient-chosen reminder methods in a private orthodontic practice.

    PubMed

    Wegrzyniak, Lauren M; Hedderly, Deborah; Chaudry, Kishore; Bollu, Prashanti

    2018-05-01

    To evaluate the effectiveness of patient-chosen appointment reminder methods (phone call, e-mail, or SMS text) in reducing no-show rates. This was a retrospective case study that determined the correlation between patient-chosen appointment reminder methods and no-show rates in a private orthodontic practice. This study was conducted in a single office location of a multioffice private orthodontic practice using data gathered in 2015. The subjects were patients who self-selected the appointment reminder method (phone call, e-mail, or SMS text). Patient appointment data were collected over a 6-month period. Patient attendance was analyzed with descriptive statistics to determine any significant differences among patient-chosen reminder methods. There was a total of 1193 appointments with an average no-show rate of 2.43% across the three reminder methods. No statistically significant differences ( P = .569) were observed in the no-show rates between the three methods: phone call (3.49%), e-mail (2.68%), and SMS text (1.90%). The electronic appointment reminder methods (SMS text and e-mail) had lower no-show rates compared with the phone call method, with SMS text having the lowest no-show rate of 1.90%. However, since no significant differences were observed between the three patient-chosen reminder methods, providers may want to allow patients to choose their reminder method to decrease no-shows.

  5. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

    PubMed Central

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situation, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201

  6. Resampling: A Marriage of Computers and Statistics. ERIC/TM Digest.

    ERIC Educational Resources Information Center

    Rudner, Lawrence M.; Shafer, Mary Morello

    Advances in computer technology are making it possible for educational researchers to use simpler statistical methods to address a wide range of questions with smaller data sets and fewer, and less restrictive, assumptions. This digest introduces computationally intensive statistics, collectively called resampling techniques. Resampling is a…

  7. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises

    PubMed Central

    Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of a image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise. PMID:28692667

  8. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises.

    PubMed

    Jin, Qiyu; Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of a image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise.

  9. Reservoir property grids improve with geostatistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vogt, J.

    1993-09-01

    Visualization software, reservoir simulators and many other E and P software applications need reservoir property grids as input. Using geostatistics, as compared to other gridding methods, to produce these grids leads to the best output from the software programs. For the purpose stated herein, geostatistics is simply two types of gridding methods. Mathematically, these methods are based on minimizing or duplicating certain statistical properties of the input data. One geostatical method, called kriging, is used when the highest possible point-by-point accuracy is desired. The other method, called conditional simulation, is used when one wants statistics and texture of the resultingmore » grid to be the same as for the input data. In the following discussion, each method is explained, compared to other gridding methods, and illustrated through example applications. Proper use of geostatistical data in flow simulations, use of geostatistical data for history matching, and situations where geostatistics has no significant advantage over other methods, also will be covered.« less

  10. Survival analysis, or what to do with upper limits in astronomical surveys

    NASA Technical Reports Server (NTRS)

    Isobe, Takashi; Feigelson, Eric D.

    1986-01-01

    A field of applied statistics called survival analysis has been developed over several decades to deal with censored data, which occur in astronomical surveys when objects are too faint to be detected. How these methods can assist in the statistical interpretation of astronomical data are reviewed.

  11. Static Methods in the Design of Nonlinear Automatic Control Systems,

    DTIC Science & Technology

    1984-06-27

    227 Chapter VI. Ways of Decrease of the Number of Statistical Nodes During the Research of Nonlinear Systems...at present occupies the central place. This region of research was called the statistical dynamics of nonlinear H automatic control systems...receives further development in the numerous research of Soviet and C foreign scientists. Special role in the development of the statistical dynamics of

  12. A Computer-Assisted Instruction in Teaching Abstract Statistics to Public Affairs Undergraduates

    ERIC Educational Resources Information Center

    Ozturk, Ali Osman

    2012-01-01

    This article attempts to demonstrate the applicability of a computer-assisted instruction supported with simulated data in teaching abstract statistical concepts to political science and public affairs students in an introductory research methods course. The software is called the Elaboration Model Computer Exercise (EMCE) in that it takes a great…

  13. Guide to Using Onionskin Analysis Code (U)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fugate, Michael Lynn; Morzinski, Jerome Arthur

    2016-09-15

    This document is a guide to using R-code written for the purpose of analyzing onionskin experiments. We expect the user to be very familiar with statistical methods and the R programming language. For more details about onionskin experiments and the statistical methods mentioned in this document see Storlie, Fugate, et al. (2013). Engineers at LANL experiment with detonators and high explosives to assess performance. The experimental unit, called an onionskin, is a hemisphere consisting of a detonator and a booster pellet surrounded by explosive material. When the detonator explodes, a streak camera mounted above the pole of the hemisphere recordsmore » when the shock wave arrives at the surface. The output from the camera is a two-dimensional image that is transformed into a curve that shows the arrival time as a function of polar angle. The statistical challenge is to characterize a baseline population of arrival time curves and to compare the baseline curves to curves from a new, so-called, test series. The hope is that the new test series of curves is statistically similar to the baseline population.« less

  14. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.

  15. Solutions to Some Nonlinear Equations from Nonmetric Data.

    ERIC Educational Resources Information Center

    Rule, Stanley J.

    1979-01-01

    A method to provide estimates of parameters of specified nonlinear equations from ordinal data generated from a crossed design is presented. The statistical basis for the method, called NOPE (nonmetric parameter estimation), as well as examples using artifical data, are presented. (Author/JKS)

  16. Non-homogeneous updates for the iterative coordinate descent algorithm

    NASA Astrophysics Data System (ADS)

    Yu, Zhou; Thibault, Jean-Baptiste; Bouman, Charles A.; Sauer, Ken D.; Hsieh, Jiang

    2007-02-01

    Statistical reconstruction methods show great promise for improving resolution, and reducing noise and artifacts in helical X-ray CT. In fact, statistical reconstruction seems to be particularly valuable in maintaining reconstructed image quality when the dosage is low and the noise is therefore high. However, high computational cost and long reconstruction times remain as a barrier to the use of statistical reconstruction in practical applications. Among the various iterative methods that have been studied for statistical reconstruction, iterative coordinate descent (ICD) has been found to have relatively low overall computational requirements due to its fast convergence. This paper presents a novel method for further speeding the convergence of the ICD algorithm, and therefore reducing the overall reconstruction time for statistical reconstruction. The method, which we call nonhomogeneous iterative coordinate descent (NH-ICD) uses spatially non-homogeneous updates to speed convergence by focusing computation where it is most needed. Experimental results with real data indicate that the method speeds reconstruction by roughly a factor of two for typical 3D multi-slice geometries.

  17. Pick a Sample.

    ERIC Educational Resources Information Center

    Peterson, Ivars

    1991-01-01

    A method that enables people to obtain the benefits of statistics and probability theory without the shortcomings of conventional methods because it is free of mathematical formulas and is easy to understand and use is described. A resampling technique called the "bootstrap" is discussed in terms of application and development. (KR)

  18. Climate Verification Using Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A robust method previously used to detect observed intra- to multi-decadal (IMD) climate regimes was adapted to test whether climate models could reproduce IMD variations in U.S. surface temperatures during 1919-2008. This procedure, called the running Mann Whitney Z (MWZ) method, samples data ranki...

  19. A non-parametric peak calling algorithm for DamID-Seq.

    PubMed

    Li, Renhua; Hempel, Leonie U; Jiang, Tingbo

    2015-01-01

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX)-an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peaking calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peaking calling procedure compromises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) Calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peaks width.

  20. Best practices for evaluating single nucleotide variant calling methods for microbial genomics

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Colman, Rebecca E.; Foster, Jeffrey T.; Sahl, Jason W.; Schupp, James M.; Keim, Paul; Morrow, Jayne B.; Salit, Marc L.; Zook, Justin M.

    2015-01-01

    Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific Communit’s focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards. PMID:26217378

  1. The difference between a dynamic and mechanical approach to stroke treatment.

    PubMed

    Helgason, Cathy M

    2007-06-01

    The current classification of stroke is based on causation, also called pathogenesis, and relies on binary logic faithful to the Aristotelian tradition. Accordingly, a pathology is or is not the cause of the stroke, is considered independent of others, and is the target for treatment. It is the subject for large double-blind randomized clinical therapeutic trials. The scientific view behind clinical trials is the fundamental concept that information is statistical, and causation is determined by probabilities. Therefore, the cause and effect relation will be determined by probability-theory-based statistics. This is the basis of evidence-based medicine, which calls for the results of such trials to be the basis for physician decisions regarding diagnosis and treatment. However, there are problems with the methodology behind evidence-based medicine. Calculations using probability-theory-based statistics regarding cause and effect are performed within an automatic system where there are known inputs and outputs. This method of research provides a framework of certainty with no surprise elements or outcomes. However, it is not a system or method that will come up with previously unknown variables, concepts, or universal principles; it is not a method that will give a new outcome; and it is not a method that allows for creativity, expertise, or new insight for problem solving.

  2. Masking as an effective quality control method for next-generation sequencing data analysis.

    PubMed

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases that results in a shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, both of the preprocessing methods did not affect the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate although trimming is more commonly used currently in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).

  3. Examining the effectiveness of discriminant function analysis and cluster analysis in species identification of male field crickets based on their calling songs.

    PubMed

    Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

    2013-01-01

    Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.

  4. Re-Conceptualization of Modified Angoff Standard Setting: Unified Statistical, Measurement, Cognitive, and Social Psychological Theories

    ERIC Educational Resources Information Center

    Iyioke, Ifeoma Chika

    2013-01-01

    This dissertation describes a design for training, in accordance with probability judgment heuristics principles, for the Angoff standard setting method. The new training with instruction, practice, and feedback tailored to the probability judgment heuristics principles was called the Heuristic training and the prevailing Angoff method training…

  5. Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method

    ERIC Educational Resources Information Center

    Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Su¨lzle, Detlev

    2018-01-01

    The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…

  6. Diagramming the Never Ending Story: Student-generated diagrammatic stories integrate and retain science concepts improving science literacy

    NASA Astrophysics Data System (ADS)

    Pillsbury, Ralph T.

    This research examined an instructional strategy called Diagramming the Never Ending Story: A method called diagramming was taught to sixth grade students via an outdoor science inquiry ecology unit. Students generated diagrams of the new ecology concepts they encountered, creating explanatory 'captions' for their newly drawn diagrams while connecting them in a memorable story. The diagramming process culminates in 20-30 meter-long murals called the Never Ending Story: Months of science instruction are constructed as pictorial scrolls, making sense of all new science concepts they encounter. This method was taught at a North Carolina "Public" Charter School, Children's Community School, to measure its efficacy in helping students comprehend scientific concepts and retain them thereby increasing science literacy. There were four demographically similar classes of 20 students each. Two 'treatment' classes, randomly chosen from the four classes, generated their own Never Ending Stories after being taught the diagramming method. A Solomon Four-Group Design was employed: Two Classes (one control, one treatment) were administered pre- and post; two classes received post tests only. The tests were comprised of multiple choice, fill-in and extended response (open-ended) sections. Multiple choice and fill-in test data were not statistically significant whereas extended response test data confirm that treatment classes made statistically significant gains.

  7. Recall of past use of mobile phone handsets.

    PubMed

    Parslow, R C; Hepworth, S J; McKinney, P A

    2003-01-01

    Previous studies investigating health effects of mobile phones have based their estimation of exposure on self-reported levels of phone use. This UK validation study assesses the accuracy of reported voice calls made from mobile handsets. Data collected by postal questionnaire from 93 volunteers was compared to records obtained prospectively over 6 months from four network operators. Agreement was measured for outgoing calls using the kappa statistic, log-linear modelling, Spearman correlation coefficient and graphical methods. Agreement for number of calls gained moderate classification (kappa = 0.39) with better agreement for duration (kappa = 0.50). Log-linear modelling produced similar results. The Spearman correlation coefficient was 0.48 for number of calls and 0.60 for duration. Graphical agreement methods demonstrated patterns of over-reporting call numbers (by a factor of 1.7) and duration (by a factor of 2.8). These results suggest that self-reported mobile phone use may not fully represent patterns of actual use. This has implications for calculating exposures from questionnaire data.

  8. Spectral statistics of the uni-modular ensemble

    NASA Astrophysics Data System (ADS)

    Joyner, Christopher H.; Smilansky, Uzy; Weidenmüller, Hans A.

    2017-09-01

    We investigate the spectral statistics of Hermitian matrices in which the elements are chosen uniformly from U(1) , called the uni-modular ensemble (UME), in the limit of large matrix size. Using three complimentary methods; a supersymmetric integration method, a combinatorial graph-theoretical analysis and a Brownian motion approach, we are able to derive expressions for 1 / N corrections to the mean spectral moments and also analyse the fluctuations about this mean. By addressing the same ensemble from three different point of view, we can critically compare their relative advantages and derive some new results.

  9. Experiments in ESP.

    ERIC Educational Resources Information Center

    Guerin, Stephen M.; Guerin, Clark L.

    1979-01-01

    Discusses a phenomenon called Extrasensory Perception (ESP) whereby information is gained directly by the mind without the use of the ordinary senses. Experiments in ESP and the basic equipment and methods are presented. Statistical evaluation of ESP experimental results are also included. (HM)

  10. Inappropriate Fiddling with Statistical Analyses to Obtain a Desirable P-value: Tests to Detect its Presence in Published Literature

    PubMed Central

    Gadbury, Gary L.; Allison, David B.

    2012-01-01

    Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a “near significant p-value” to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called “fiddling”) in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000. PMID:23056287

  11. Inappropriate fiddling with statistical analyses to obtain a desirable p-value: tests to detect its presence in published literature.

    PubMed

    Gadbury, Gary L; Allison, David B

    2012-01-01

    Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a "near significant p-value" to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called "fiddling") in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000.

  12. Site specific passive acoustic detection and densities of humpback whale calls off the coast of California

    NASA Astrophysics Data System (ADS)

    Helble, Tyler Adam

    Passive acoustic monitoring of marine mammal calls is an increasingly important method for assessing population numbers, distribution, and behavior. Automated methods are needed to aid in the analyses of the recorded data. When a mammal vocalizes in the marine environment, the received signal is a filtered version of the original waveform emitted by the marine mammal. The waveform is reduced in amplitude and distorted due to propagation effects that are influenced by the bathymetry and environment. It is important to account for these effects to determine a site-specific probability of detection for marine mammal calls in a given study area. A knowledge of that probability function over a range of environmental and ocean noise conditions allows vocalization statistics from recordings of single, fixed, omnidirectional sensors to be compared across sensors and at the same sensor over time with less bias and uncertainty in the results than direct comparison of the raw statistics. This dissertation focuses on both the development of new tools needed to automatically detect humpback whale vocalizations from single-fixed omnidirectional sensors as well as the determination of the site-specific probability of detection for monitoring sites off the coast of California. Using these tools, detected humpback calls are "calibrated" for environmental properties using the site-specific probability of detection values, and presented as call densities (calls per square kilometer per time). A two-year monitoring effort using these calibrated call densities reveals important biological and ecological information on migrating humpback whales off the coast of California. Call density trends are compared between the monitoring sites and at the same monitoring site over time. Call densities also are compared to several natural and human-influenced variables including season, time of day, lunar illumination, and ocean noise. The results reveal substantial differences in call densities between the two sites which were not noticeable using uncorrected (raw) call counts. Additionally, a Lombard effect was observed for humpback whale vocalizations in response to increasing ocean noise. The results presented in this thesis develop techniques to accurately measure marine mammal abundances from passive acoustic sensors.

  13. Estimation of Return Values of Wave Height: Consequences of Missing Observations

    ERIC Educational Resources Information Center

    Ryden, Jesper

    2008-01-01

    Extreme-value statistics is often used to estimate so-called return values (actually related to quantiles) for environmental quantities like wind speed or wave height. A basic method for estimation is the method of block maxima which consists in partitioning observations in blocks, where maxima from each block could be considered independent.…

  14. Cross-correlation detection and analysis for California's electricity market based on analogous multifractal analysis

    NASA Astrophysics Data System (ADS)

    Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen

    2013-03-01

    A novel method, which we called the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in California electricity market. In addition, a statistic ρAMF -XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in California electricity market express multifractal nature. While, as indicated by the ρAMF -XA statistical test, there is a huge difference in the cross-correlation behavior between the years 1999 and 2000 in California electricity markets.

  15. Cross-correlation detection and analysis for California's electricity market based on analogous multifractal analysis.

    PubMed

    Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen

    2013-03-01

    A novel method, which we called the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in California electricity market. In addition, a statistic ρAMF-XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in California electricity market express multifractal nature. While, as indicated by the ρAMF-XA statistical test, there is a huge difference in the cross-correlation behavior between the years 1999 and 2000 in California electricity markets.

  16. A voxel-based investigation for MRI-only radiotherapy of the brain using ultra short echo times

    NASA Astrophysics Data System (ADS)

    Edmund, Jens M.; Kjer, Hans M.; Van Leemput, Koen; Hansen, Rasmus H.; Andersen, Jon AL; Andreasen, Daniel

    2014-12-01

    Radiotherapy (RT) based on magnetic resonance imaging (MRI) as the only modality, so-called MRI-only RT, would remove the systematic registration error between MR and computed tomography (CT), and provide co-registered MRI for assessment of treatment response and adaptive RT. Electron densities, however, need to be assigned to the MRI images for dose calculation and patient setup based on digitally reconstructed radiographs (DRRs). Here, we investigate the geometric and dosimetric performance for a number of popular voxel-based methods to generate a so-called pseudo CT (pCT). Five patients receiving cranial irradiation, each containing a co-registered MRI and CT scan, were included. An ultra short echo time MRI sequence for bone visualization was used. Six methods were investigated for three popular types of voxel-based approaches; (1) threshold-based segmentation, (2) Bayesian segmentation and (3) statistical regression. Each approach contained two methods. Approach 1 used bulk density assignment of MRI voxels into air, soft tissue and bone based on logical masks and the transverse relaxation time T2 of the bone. Approach 2 used similar bulk density assignments with Bayesian statistics including or excluding additional spatial information. Approach 3 used a statistical regression correlating MRI voxels with their corresponding CT voxels. A similar photon and proton treatment plan was generated for a target positioned between the nasal cavity and the brainstem for all patients. The CT agreement with the pCT of each method was quantified and compared with the other methods geometrically and dosimetrically using both a number of reported metrics and introducing some novel metrics. The best geometrical agreement with CT was obtained with the statistical regression methods which performed significantly better than the threshold and Bayesian segmentation methods (excluding spatial information). All methods agreed significantly better with CT than a reference water MRI comparison. The mean dosimetric deviation for photons and protons compared to the CT was about 2% and highest in the gradient dose region of the brainstem. Both the threshold based method and the statistical regression methods showed the highest dosimetrical agreement. Generation of pCTs using statistical regression seems to be the most promising candidate for MRI-only RT of the brain. Further, the total amount of different tissues needs to be taken into account for dosimetric considerations regardless of their correct geometrical position.

  17. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.

  18. Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.

    2008-06-01

    An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near continuum range. A post-processing procedure called DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near equilibrium flows (DREAM-I) or output instantaneous particle data obtained by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock tube flow and the development of simple Couette flow. Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single processor code and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks over wedges. The results of these simulations are compared to experimental data and simulations from the literature where there these are available. In general, it was found that 10 ensembled runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.

  19. A call to improve methods for estimating tree biomass for regional and national assessments

    Treesearch

    Aaron R. Weiskittel; David W. MacFarlane; Philip J. Radtke; David L.R. Affleck; Hailemariam Temesgen; Christopher W. Woodall; James A. Westfall; John W. Coulston

    2015-01-01

    Tree biomass is typically estimated using statistical models. This review highlights five limitations of most tree biomass models, which include the following: (1) biomass data are costly to collect and alternative sampling methods are used; (2) belowground data and models are generally lacking; (3) models are often developed from small and geographically limited data...

  20. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    NASA Astrophysics Data System (ADS)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting the cutting tool wear based on the measured cutting force signals. A statistical-based method called Integrated Kurtosis-based Algorithm for Z-Filter technique, called I-kaz was used for developing a regression model and 3D graphic presentation of I-kaz 3D coefficient during machining process. The machining tests were carried out using a CNC turning machine Colchester Master Tornado T4 in dry cutting condition. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from machining operation were analyzed, and each has its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear lands (VB) was determined. A regression model was developed due to this relationship, and results of the regression model shows that the I-kaz 3D coefficient value decreases as tool wear increases. The result then is used for real time tool wear monitoring.

  1. Calculating phase equilibrium properties of plasma pseudopotential model using hybrid Gibbs statistical ensemble Monte-Carlo technique

    NASA Astrophysics Data System (ADS)

    Butlitsky, M. A.; Zelener, B. B.; Zelener, B. V.

    2015-11-01

    Earlier a two-component pseudopotential plasma model, which we called a “shelf Coulomb” model has been developed. A Monte-Carlo study of canonical NVT ensemble with periodic boundary conditions has been undertaken to calculate equations of state, pair distribution functions, internal energies and other thermodynamics properties of the model. In present work, an attempt is made to apply so-called hybrid Gibbs statistical ensemble Monte-Carlo technique to this model. First simulation results data show qualitatively similar results for critical point region for both methods. Gibbs ensemble technique let us to estimate the melting curve position and a triple point of the model (in reduced temperature and specific volume coordinates): T* ≈ 0.0476, v* ≈ 6 × 10-4.

  2. A Generalized Approach for Measuring Relationships Among Genes.

    PubMed

    Wang, Lijun; Ahsan, Md Asif; Chen, Ming

    2017-07-21

    Several methods for identifying relationships among pairs of genes have been developed. In this article, we present a generalized approach for measuring relationships between any pairs of genes, which is based on statistical prediction. We derive two particular versions of the generalized approach, least squares estimation (LSE) and nearest neighbors prediction (NNP). According to mathematical proof, LSE is equivalent to the methods based on correlation; and NNP is approximate to one popular method called the maximal information coefficient (MIC) according to the performances in simulations and real dataset. Moreover, the approach based on statistical prediction can be extended from two-genes relationships to multi-genes relationships. This application would help to identify relationships among multi-genes.

  3. High call volume at poison control centers: identification and implications for communication

    PubMed Central

    CARAVATI, E. M.; LATIMER, S.; REBLIN, M.; BENNETT, H. K. W.; CUMMINS, M. R.; CROUCH, B. I.; ELLINGTON, L.

    2016-01-01

    Context High volume surges in health care are uncommon and unpredictable events. Their impact on health system performance and capacity is difficult to study. Objectives To identify time periods that exhibited very busy conditions at a poison control center and to determine whether cases and communication during high volume call periods are different from cases during low volume periods. Methods Call data from a US poison control center over twelve consecutive months was collected via a call logger and an electronic case database (Toxicall®). Variables evaluated for high call volume conditions were: (1) call duration; (2) number of cases; and (3) number of calls per staff member per 30 minute period. Statistical analyses identified peak periods as busier than 99% of all other 30 minute time periods and low volume periods as slower than 70% of all other 30 minute periods. Case and communication characteristics of high volume and low volume calls were compared using logistic regression. Results A total of 65,364 incoming calls occurred over 12 months. One hundred high call volume and 4885 low call volume 30 minute periods were identified. High volume periods were more common between 1500 and 2300 hours and during the winter months. Coded verbal communication data were evaluated for 42 high volume and 296 low volume calls. The mean (standard deviation) call length of these calls during high volume and low volume periods was 3 minutes 27 seconds (1 minute 46 seconds) and 3 minutes 57 seconds (2 minutes 11 seconds), respectively. Regression analyses revealed a trend for fewer overall verbal statements and fewer staff questions during peak periods, but no other significant differences for staff-caller communication behaviors were found. Conclusion Peak activity for poison center call volume can be identified by statistical modeling. Calls during high volume periods were similar to low volume calls. Communication was more concise yet staff was able to maintain a good rapport with callers during busy call periods. This approach allows evaluation of poison exposure call characteristics and communication during high volume periods. PMID:22889059

  4. Citation of previous meta-analyses on the same topic: a clue to perpetuation of incorrect methods?

    PubMed

    Li, Tianjing; Dickersin, Kay

    2013-06-01

    Systematic reviews and meta-analyses serve as a basis for decision-making and clinical practice guidelines and should be carried out using appropriate methodology to avoid incorrect inferences. We describe the characteristics, statistical methods used for meta-analyses, and citation patterns of all 21 glaucoma systematic reviews we identified pertaining to the effectiveness of prostaglandin analog eye drops in treating primary open-angle glaucoma, published between December 2000 and February 2012. We abstracted data, assessed whether appropriate statistical methods were applied in meta-analyses, and examined citation patterns of included reviews. We identified two forms of problematic statistical analyses in 9 of the 21 systematic reviews examined. Except in 1 case, none of the 9 reviews that used incorrect statistical methods cited a previously published review that used appropriate methods. Reviews that used incorrect methods were cited 2.6 times more often than reviews that used appropriate statistical methods. We speculate that by emulating the statistical methodology of previous systematic reviews, systematic review authors may have perpetuated incorrect approaches to meta-analysis. The use of incorrect statistical methods, perhaps through emulating methods described in previous research, calls conclusions of systematic reviews into question and may lead to inappropriate patient care. We urge systematic review authors and journal editors to seek the advice of experienced statisticians before undertaking or accepting for publication a systematic review and meta-analysis. The author(s) have no proprietary or commercial interest in any materials discussed in this article. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  5. Implementation and Use of a Crisis Hotline During the Treatment as Usual and Universal Screening Phases of a Suicide Intervention Study

    PubMed Central

    Arias, Sarah A.; Sullivan, Ashley F.; Miller, Ivan; Camargo, Carlos A.; Boudreaux, Edwin D.

    2015-01-01

    Background Although research suggests that crisis hotlines are an effective means of mitigating suicide risk, lack of empirical evidence may limit the use of this method as a research safety protocol. Purpose This study describes the use of a crisis hotline to provide clinical backup for research assessments. Methods Data were analyzed from participants in the Emergency Department Safety and Follow-up Evaluation (ED-SAFE) study (n=874). Socio-demographics, call completion data, and data available on suicide attempts occurring in relation to the crisis counseling call were analyzed. Pearson chi-squared statistic for differences in proportions were conducted to compare characteristics of patients receiving versus not receiving crisis counseling. P<0.05 was considered statistically significant. Results Overall, there were 163 counseling calls (6% of total assessment calls) from 135 (16%) of the enrolled subjects who were transferred to the crisis line because of suicide risk identified during the research assessment. For those transferred to the crisis line, the median age was 40 years (interquartile range 27–48) with 67% female, 80% white, and 11% Hispanic. Conclusions Increasing demand for suicide interventions in diverse healthcare settings warrants consideration of crisis hotlines as a safety protocol mechanism. Our findings provide background on how a crisis hotline was implemented as a safety measure, as well as the type of patients who may utilize this safety protocol. PMID:26341724

  6. Examining the Effectiveness of Discriminant Function Analysis and Cluster Analysis in Species Identification of Male Field Crickets Based on Their Calling Songs

    PubMed Central

    Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

    2013-01-01

    Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6–7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification. PMID:24086666

  7. The emergence of modern statistics in agricultural science: analysis of variance, experimental design and the reshaping of research at Rothamsted Experimental Station, 1919-1933.

    PubMed

    Parolini, Giuditta

    2015-01-01

    During the twentieth century statistical methods have transformed research in the experimental and social sciences. Qualitative evidence has largely been replaced by quantitative results and the tools of statistical inference have helped foster a new ideal of objectivity in scientific knowledge. The paper will investigate this transformation by considering the genesis of analysis of variance and experimental design, statistical methods nowadays taught in every elementary course of statistics for the experimental and social sciences. These methods were developed by the mathematician and geneticist R. A. Fisher during the 1920s, while he was working at Rothamsted Experimental Station, where agricultural research was in turn reshaped by Fisher's methods. Analysis of variance and experimental design required new practices and instruments in field and laboratory research, and imposed a redistribution of expertise among statisticians, experimental scientists and the farm staff. On the other hand the use of statistical methods in agricultural science called for a systematization of information management and made computing an activity integral to the experimental research done at Rothamsted, permanently integrating the statisticians' tools and expertise into the station research programme. Fisher's statistical methods did not remain confined within agricultural research and by the end of the 1950s they had come to stay in psychology, sociology, education, chemistry, medicine, engineering, economics, quality control, just to mention a few of the disciplines which adopted them.

  8. A Comparison of Didactic and Inquiry Teaching Methods in a Rural Community College Earth Science Course

    NASA Astrophysics Data System (ADS)

    Beam, Margery Elizabeth

    The combination of increasing enrollment and the importance of providing transfer students a solid foundation in science calls for science faculty to evaluate teaching methods in rural community colleges. The purpose of this study was to examine and compare the effectiveness of two teaching methods, inquiry teaching methods and didactic teaching methods, applied in a rural community college earth science course. Two groups of students were taught the same content via inquiry and didactic teaching methods. Analysis of quantitative data included a non-parametric ranking statistical testing method in which the difference between the rankings and the median of the post-test scores was analyzed for significance. Results indicated there was not a significant statistical difference between the teaching methods for the group of students participating in the research. The practical and educational significance of this study provides valuable perspectives on teaching methods and student learning styles in rural community colleges.

  9. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics

    PubMed Central

    Chen, Wenan; Larrabee, Beth R.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Haralambieva, Iana H.; Poland, Gregory A.; Schaid, Daniel J.

    2015-01-01

    Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. PMID:25948564

  10. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  11. [Statistical analysis of German radiologic periodicals: developmental trends in the last 10 years].

    PubMed

    Golder, W

    1999-09-01

    To identify which statistical tests are applied in German radiological publications, to what extent their use has changed during the last decade, and which factors might be responsible for this development. The major articles published in "ROFO" and "DER RADIOLOGE" during 1988, 1993 and 1998 were reviewed for statistical content. The contributions were classified by principal focus and radiological subspecialty. The methods used were assigned to descriptive, basal and advanced statistics. Sample size, significance level and power were established. The use of experts' assistance was monitored. Finally, we calculated the so-called cumulative accessibility of the publications. 525 contributions were found to be eligible. In 1988, 87% used descriptive statistics only, 12.5% basal, and 0.5% advanced statistics. The corresponding figures in 1993 and 1998 are 62 and 49%, 32 and 41%, and 6 and 10%, respectively. Statistical techniques were most likely to be used in research on musculoskeletal imaging and articles dedicated to MRI. Six basic categories of statistical methods account for the complete statistical analysis appearing in 90% of the articles. ROC analysis is the single most common advanced technique. Authors make increasingly use of statistical experts' opinion and programs. During the last decade, the use of statistical methods in German radiological journals has fundamentally improved, both quantitatively and qualitatively. Presently, advanced techniques account for 20% of the pertinent statistical tests. This development seems to be promoted by the increasing availability of statistical analysis software.

  12. Empirical comparison study of approximate methods for structure selection in binary graphical models.

    PubMed

    Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel

    2014-03-01

    Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, that is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. A Deterministic Annealing Approach to Clustering AIRS Data

    NASA Technical Reports Server (NTRS)

    Guillaume, Alexandre; Braverman, Amy; Ruzmaikin, Alexander

    2012-01-01

    We will examine the validity of means and standard deviations as a basis for climate data products. We will explore the conditions under which these two simple statistics are inadequate summaries of the underlying empirical probability distributions by contrasting them with a nonparametric, method called Deterministic Annealing technique

  14. A new exact and more powerful unconditional test of no treatment effect from binary matched pairs.

    PubMed

    Lloyd, Chris J

    2008-09-01

    We consider the problem of testing for a difference in the probability of success from matched binary pairs. Starting with three standard inexact tests, the nuisance parameter is first estimated and then the residual dependence is eliminated by maximization, producing what I call an E+M P-value. The E+M P-value based on McNemar's statistic is shown numerically to dominate previous suggestions, including partially maximized P-values as described in Berger and Sidik (2003, Statistical Methods in Medical Research 12, 91-108). The latter method, however, may have computational advantages for large samples.

  15. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses

    PubMed Central

    Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah

    2015-01-01

    Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481

  16. THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES

    PubMed Central

    Song, Chi; Min, Xiaoyi; Zhang, Heping

    2016-01-01

    The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239

  17. FORSPAN Model Users Guide

    USGS Publications Warehouse

    Klett, T.R.; Charpentier, Ronald R.

    2003-01-01

    The USGS FORSPAN model is designed for the assessment of continuous accumulations of crude oil, natural gas, and natural gas liquids (collectively called petroleum). Continuous (also called ?unconventional?) accumulations have large spatial dimensions and lack well defined down-dip petroleum/water contacts. Oil and natural gas therefore are not localized by buoyancy in water in these accumulations. Continuous accumulations include ?tight gas reservoirs,? coalbed gas, oil and gas in shale, oil and gas in chalk, and shallow biogenic gas. The FORSPAN model treats a continuous accumulation as a collection of petroleumcontaining cells for assessment purposes. Each cell is capable of producing oil or gas, but the cells may vary significantly from one another in their production (and thus economic) characteristics. The potential additions to reserves from continuous petroleum resources are calculated by statistically combining probability distributions of the estimated number of untested cells having the potential for additions to reserves with the estimated volume of oil and natural gas that each of the untested cells may potentially produce (total recovery). One such statistical method for combination of number of cells with total recovery, used by the USGS, is called ACCESS.

  18. Implementing Peer-Assisted Writing Support in German Secondary Schools

    ERIC Educational Resources Information Center

    Rensing, Julia; Vierbuchen, Marie-Christine; Hillenbrand, Clemens; Grünke, Matthias

    2016-01-01

    The alarming results of large studies such as the National Assessment of Educational Progress (NAEP; National Center for Education Statistics, 2012) point to an urgent need for writing support and call for specific and effective methods to foster writing competencies. The main purpose of this paper is to describe an innovative peer-assisted…

  19. Standard and Robust Methods in Regression Imputation

    ERIC Educational Resources Information Center

    Moraveji, Behjat; Jafarian, Koorosh

    2014-01-01

    The aim of this paper is to provide an introduction of new imputation algorithms for estimating missing values from official statistics in larger data sets of data pre-processing, or outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…

  20. A critique of the usefulness of inferential statistics in applied behavior analysis

    PubMed Central

    Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.

    1998-01-01

    Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304

  1. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango’s statistic – to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call “Kernel Distance” statistic, took approximately half the time to compute than the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950

  2. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.

    PubMed

    Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G; Kennedy, Richard B; Haralambieva, Iana H; Poland, Gregory A; Schaid, Daniel J

    2015-07-01

    Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. Copyright © 2015 by the Genetics Society of America.

  3. Advanced statistical methods for improved data analysis of NASA astrophysics missions

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.

    1992-01-01

    The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.

  4. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579

  5. 3D automatic anatomy recognition based on iterative graph-cut-ASM

    NASA Astrophysics Data System (ADS)

    Chen, Xinjian; Udupa, Jayaram K.; Bagci, Ulas; Alavi, Abass; Torigian, Drew A.

    2010-02-01

    We call the computerized assistive process of recognizing, delineating, and quantifying organs and tissue regions in medical imaging, occurring automatically during clinical image interpretation, automatic anatomy recognition (AAR). The AAR system we are developing includes five main parts: model building, object recognition, object delineation, pathology detection, and organ system quantification. In this paper, we focus on the delineation part. For the modeling part, we employ the active shape model (ASM) strategy. For recognition and delineation, we integrate several hybrid strategies of combining purely image based methods with ASM. In this paper, an iterative Graph-Cut ASM (IGCASM) method is proposed for object delineation. An algorithm called GC-ASM was presented at this symposium last year for object delineation in 2D images which attempted to combine synergistically ASM and GC. Here, we extend this method to 3D medical image delineation. The IGCASM method effectively combines the rich statistical shape information embodied in ASM with the globally optimal delineation capability of the GC method. We propose a new GC cost function, which effectively integrates the specific image information with the ASM shape model information. The proposed methods are tested on a clinical abdominal CT data set. The preliminary results show that: (a) it is feasible to explicitly bring prior 3D statistical shape information into the GC framework; (b) the 3D IGCASM delineation method improves on ASM and GC and can provide practical operational time on clinical images.

  6. Statistical Process Control Techniques for the Telecommunications Systems Manager

    DTIC Science & Technology

    1992-03-01

    products that are out of 59 tolerance and bad designs. The third type of defect, mistakes, are remedied by Poka - Yoke methods that are 1 introduced later...based on total production costs plus quality costs. Once production is underway, interventions are determined by their impact on the QLF. F. POKA - YOKE ...Mistakes require process improvements called Poka Yoke or mistake proofing. Shiego Shingo developed Poka Yoke methods to incorporate 100% inspection at

  7. A statistical, task-based evaluation method for three-dimensional x-ray breast imaging systems using variable-background phantoms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Park, Subok; Jennings, Robert; Liu Haimo

    Purpose: For the last few years, development and optimization of three-dimensional (3D) x-ray breast imaging systems, such as digital breast tomosynthesis (DBT) and computed tomography, have drawn much attention from the medical imaging community, either academia or industry. However, there is still much room for understanding how to best optimize and evaluate the devices over a large space of many different system parameters and geometries. Current evaluation methods, which work well for 2D systems, do not incorporate the depth information from the 3D imaging systems. Therefore, it is critical to develop a statistically sound evaluation method to investigate the usefulnessmore » of inclusion of depth and background-variability information into the assessment and optimization of the 3D systems. Methods: In this paper, we present a mathematical framework for a statistical assessment of planar and 3D x-ray breast imaging systems. Our method is based on statistical decision theory, in particular, making use of the ideal linear observer called the Hotelling observer. We also present a physical phantom that consists of spheres of different sizes and materials for producing an ensemble of randomly varying backgrounds to be imaged for a given patient class. Lastly, we demonstrate our evaluation method in comparing laboratory mammography and three-angle DBT systems for signal detection tasks using the phantom's projection data. We compare the variable phantom case to that of a phantom of the same dimensions filled with water, which we call the uniform phantom, based on the performance of the Hotelling observer as a function of signal size and intensity. Results: Detectability trends calculated using the variable and uniform phantom methods are different from each other for both mammography and DBT systems. Conclusions: Our results indicate that measuring the system's detection performance with consideration of background variability may lead to differences in system performance estimates and comparisons. For the assessment of 3D systems, to accurately determine trade offs between image quality and radiation dose, it is critical to incorporate randomness arising from the imaging chain including background variability into system performance calculations.« less

  8. Integer lattice gas with Monte Carlo collision operator recovers the lattice Boltzmann method with Poisson-distributed fluctuations

    NASA Astrophysics Data System (ADS)

    Blommel, Thomas; Wagner, Alexander J.

    2018-02-01

    We examine a new kind of lattice gas that closely resembles modern lattice Boltzmann methods. This new kind of lattice gas, which we call a Monte Carlo lattice gas, has interesting properties that shed light on the origin of the multirelaxation time collision operator, and it derives the equilibrium distribution for an entropic lattice Boltzmann. Furthermore these lattice gas methods have Galilean invariant fluctuations given by a Poisson statistics, giving further insight into the properties that we should expect for fluctuating lattice Boltzmann methods.

  9. Social Communication and Vocal Recognition in Free-Ranging Rhesus Monkeys

    NASA Astrophysics Data System (ADS)

    Rendall, Christopher Andrew

    Kinship and individual identity are key determinants of primate sociality, and the capacity for vocal recognition of individuals and kin is hypothesized to be an important adaptation facilitating intra-group social communication. Research was conducted on adult female rhesus monkeys on Cayo Santiago, Puerto Rico to test this hypothesis for three acoustically distinct calls characterized by varying selective pressures on communicating identity: coos (contact calls), grunts (close range social calls), and noisy screams (agonistic recruitment calls). Vocalization playback experiments confirmed a capacity for both individual and kin recognition of coos, but not screams (grunts were not tested). Acoustic analyses, using traditional spectrographic methods as well as linear predictive coding techniques, indicated that coos (but not grunts or screams) were highly distinctive, and that the effects of vocal tract filtering--formants --contributed more to statistical discriminations of both individuals and kin groups than did temporal or laryngeal source features. Formants were identified from very short (23 ms.) segments of coos and were stable within calls, indicating that formant cues to individual and kin identity were available throughout a call. This aspect of formant cues is predicted to be an especially important design feature for signaling identity efficiently in complex acoustic environments. Results of playback experiments involving manipulated coo stimuli provided preliminary perceptual support for the statistical inference that formant cues take precedence in facilitating vocal recognition. The similarity of formants among female kin suggested a mechanism for the development of matrilineal vocal signatures from the genetic and environmental determinants of vocal tract morphology shared among relatives. The fact that screams --calls strongly expected to communicate identity--were not individually distinctive nor recognized suggested the possibility that their acoustic structure and role in signaling identity might be constrained by functional or morphological design requirements associated with their role in signaling submission.

  10. Pesticide poisoning in Palestine: a retrospective analysis of calls received by Poison Control and Drug Information Center from 2006-2010.

    PubMed

    Sawalha, Ansam F; O'Malley, Gerald F; Sweileh, Waleed M

    2012-01-01

    The agricultural industry is the largest economic sector in Palestine and is characterized by extensive and unregulated use of pesticides. The objective of this study was to analyze phone calls received by the Poison Control and Drug Information Center (PCDIC) in Palestine regarding pesticide poisoning. All phone calls regarding pesticide poisoning received by the PCDIC from 2006 to 2010 were descriptively analyzed. Statistical Package for Social Sciences (SPSS version 16) was used in statistical analysis and to create figures. A total of 290 calls regarding pesticide poisoning were received during the study period. Most calls (83.8%) were made by physicians. The average age of reported cases was 19.6 ± 15 years. Pesticide poisoning occurred mostly in males (56.9%). Pesticide poisoning was most common (75, 25.9%) in the age category of 20-29.9 years. The majority (51.7%) of the cases were deliberate self-harm while the remaining was accidental exposure. The majority of phone calls (250, 86.2%) described oral exposure to pesticides. Approximately one third (32.9%) of the cases had symptoms consistent with organophosphate poisoning. Gastric lavage (31.7%) was the major decontamination method used, while charcoal was only utilized in 1.4% of the cases. Follow up was performed in 45.5% of the cases, two patients died after hospital admission while the remaining had positive outcome. Pesticide poisoning is a major health problem in Palestine, and the PCDIC has a clear mission to help in recommending therapy and gathering information.

  11. Clustangles: An Open Library for Clustering Angular Data.

    PubMed

    Sargsyan, Karen; Hua, Yun Hao; Lim, Carmay

    2015-08-24

    Dihedral angles are good descriptors of the numerous conformations visited by large, flexible systems, but their analysis requires directional statistics. A single package including the various multivariate statistical methods for angular data that accounts for the distinct topology of such data does not exist. Here, we present a lightweight standalone, operating-system independent package called Clustangles to fill this gap. Clustangles will be useful in analyzing the ever-increasing number of structures in the Protein Data Bank and clustering the copious conformations from increasingly long molecular dynamics simulations.

  12. Bayesian statistics and Monte Carlo methods

    NASA Astrophysics Data System (ADS)

    Koch, K. R.

    2018-03-01

    The Bayesian approach allows an intuitive way to derive the methods of statistics. Probability is defined as a measure of the plausibility of statements or propositions. Three rules are sufficient to obtain the laws of probability. If the statements refer to the numerical values of variables, the so-called random variables, univariate and multivariate distributions follow. They lead to the point estimation by which unknown quantities, i.e. unknown parameters, are computed from measurements. The unknown parameters are random variables, they are fixed quantities in traditional statistics which is not founded on Bayes' theorem. Bayesian statistics therefore recommends itself for Monte Carlo methods, which generate random variates from given distributions. Monte Carlo methods, of course, can also be applied in traditional statistics. The unknown parameters, are introduced as functions of the measurements, and the Monte Carlo methods give the covariance matrix and the expectation of these functions. A confidence region is derived where the unknown parameters are situated with a given probability. Following a method of traditional statistics, hypotheses are tested by determining whether a value for an unknown parameter lies inside or outside the confidence region. The error propagation of a random vector by the Monte Carlo methods is presented as an application. If the random vector results from a nonlinearly transformed vector, its covariance matrix and its expectation follow from the Monte Carlo estimate. This saves a considerable amount of derivatives to be computed, and errors of the linearization are avoided. The Monte Carlo method is therefore efficient. If the functions of the measurements are given by a sum of two or more random vectors with different multivariate distributions, the resulting distribution is generally not known. TheMonte Carlo methods are then needed to obtain the covariance matrix and the expectation of the sum.

  13. Methods for detrending success metrics to account for inflationary and deflationary factors*

    NASA Astrophysics Data System (ADS)

    Petersen, A. M.; Penner, O.; Stanley, H. E.

    2011-01-01

    Time-dependent economic, technological, and social factors can artificially inflate or deflate quantitative measures for career success. Here we develop and test a statistical method for normalizing career success metrics across time dependent factors. In particular, this method addresses the long standing question: how do we compare the career achievements of professional athletes from different historical eras? Developing an objective approach will be of particular importance over the next decade as major league baseball (MLB) players from the "steroids era" become eligible for Hall of Fame induction. Some experts are calling for asterisks (*) to be placed next to the career statistics of athletes found guilty of using performance enhancing drugs (PED). Here we address this issue, as well as the general problem of comparing statistics from distinct eras, by detrending the seasonal statistics of professional baseball players. We detrend player statistics by normalizing achievements to seasonal averages, which accounts for changes in relative player ability resulting from a range of factors. Our methods are general, and can be extended to various arenas of competition where time-dependent factors play a key role. For five statistical categories, we compare the probability density function (pdf) of detrended career statistics to the pdf of raw career statistics calculated for all player careers in the 90-year period 1920-2009. We find that the functional form of these pdfs is stationary under detrending. This stationarity implies that the statistical regularity observed in the right-skewed distributions for longevity and success in professional sports arises from both the wide range of intrinsic talent among athletes and the underlying nature of competition. We fit the pdfs for career success by the Gamma distribution in order to calculate objective benchmarks based on extreme statistics which can be used for the identification of extraordinary careers.

  14. Deja Vu: Another Call for Replications of Research, Again.

    ERIC Educational Resources Information Center

    Newman, Isadore; McNeil, Keith; Fraas, John W.

    Over the last few years, there has been evolution, although not a linear one, that has progressed from an emphasis on statistical significance to an emphasis on effect size to an emphasis on both of these concepts to what is believed to be a pragmatic emphasis on replicability. This paper presented two methods of estimating a study's replicability…

  15. Shift-Invariant Image Reconstruction of Speckle-Degraded Images Using Bispectrum Estimation

    DTIC Science & Technology

    1990-05-01

    process with the requisite negative exponential pelf. I call this model the Negative Exponential Model ( NENI ). The NENI flowchart is seen in Figure 6...Figure ]3d-g. Statistical Histograms and Phase for the RPj NG EXP FDF MULT METHOD FILuteC 14a. Truth Object Speckled Via the NENI HISTOGRAM OF SPECKLE

  16. A Model for Post Hoc Evaluation.

    ERIC Educational Resources Information Center

    Theimer, William C., Jr.

    Often a research department in a school system is called on to make an after the fact evaluation of a program or project. Although the department is operating under a handicap, it can still provide some data useful for evaluative purposes. It is suggested that all the classical methods of descriptive statistics be brought into play. The use of…

  17. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    PubMed

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  18. Use of personalized Dynamic Treatment Regimes (DTRs) and Sequential Multiple Assignment Randomized Trials (SMARTs) in mental health studies

    PubMed Central

    Liu, Ying; ZENG, Donglin; WANG, Yuanjia

    2014-01-01

    Summary Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each point where a clinical decision is made based on each patient’s time-varying characteristics and intermediate outcomes observed at earlier points in time. The complexity, patient heterogeneity, and chronicity of mental disorders call for learning optimal DTRs to dynamically adapt treatment to an individual’s response over time. The Sequential Multiple Assignment Randomized Trial (SMARTs) design allows for estimating causal effects of DTRs. Modern statistical tools have been developed to optimize DTRs based on personalized variables and intermediate outcomes using rich data collected from SMARTs; these statistical methods can also be used to recommend tailoring variables for designing future SMART studies. This paper introduces DTRs and SMARTs using two examples in mental health studies, discusses two machine learning methods for estimating optimal DTR from SMARTs data, and demonstrates the performance of the statistical methods using simulated data. PMID:25642116

  19. DECONV-TOOL: An IDL based deconvolution software package

    NASA Technical Reports Server (NTRS)

    Varosi, F.; Landsman, W. B.

    1992-01-01

    There are a variety of algorithms for deconvolution of blurred images, each having its own criteria or statistic to be optimized in order to estimate the original image data. Using the Interactive Data Language (IDL), we have implemented the Maximum Likelihood, Maximum Entropy, Maximum Residual Likelihood, and sigma-CLEAN algorithms in a unified environment called DeConv_Tool. Most of the algorithms have as their goal the optimization of statistics such as standard deviation and mean of residuals. Shannon entropy, log-likelihood, and chi-square of the residual auto-correlation are computed by DeConv_Tool for the purpose of determining the performance and convergence of any particular method and comparisons between methods. DeConv_Tool allows interactive monitoring of the statistics and the deconvolved image during computation. The final results, and optionally, the intermediate results, are stored in a structure convenient for comparison between methods and review of the deconvolution computation. The routines comprising DeConv_Tool are available via anonymous FTP through the IDL Astronomy User's Library.

  20. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

    PubMed Central

    Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.

    2015-01-01

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151

  1. Nonparametric entropy estimation using kernel densities.

    PubMed

    Lake, Douglas E

    2009-01-01

    The entropy of experimental data from the biological and medical sciences provides additional information over summary statistics. Calculating entropy involves estimates of probability density functions, which can be effectively accomplished using kernel density methods. Kernel density estimation has been widely studied and a univariate implementation is readily available in MATLAB. The traditional definition of Shannon entropy is part of a larger family of statistics, called Renyi entropy, which are useful in applications that require a measure of the Gaussianity of data. Of particular note is the quadratic entropy which is related to the Friedman-Tukey (FT) index, a widely used measure in the statistical community. One application where quadratic entropy is very useful is the detection of abnormal cardiac rhythms, such as atrial fibrillation (AF). Asymptotic and exact small-sample results for optimal bandwidth and kernel selection to estimate the FT index are presented and lead to improved methods for entropy estimation.

  2. Valid statistical inference methods for a case-control study with missing data.

    PubMed

    Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun

    2018-04-01

    The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.

  3. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data

    PubMed Central

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2017-01-01

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1–10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. PMID:29196497

  4. Potential energy surface fitting by a statistically localized, permutationally invariant, local interpolating moving least squares method for the many-body potential: Method and application to N{sub 4}

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bender, Jason D.; Doraiswamy, Sriram; Candler, Graham V., E-mail: truhlar@umn.edu, E-mail: candler@aem.umn.edu

    2014-02-07

    Fitting potential energy surfaces to analytic forms is an important first step for efficient molecular dynamics simulations. Here, we present an improved version of the local interpolating moving least squares method (L-IMLS) for such fitting. Our method has three key improvements. First, pairwise interactions are modeled separately from many-body interactions. Second, permutational invariance is incorporated in the basis functions, using permutationally invariant polynomials in Morse variables, and in the weight functions. Third, computational cost is reduced by statistical localization, in which we statistically correlate the cutoff radius with data point density. We motivate our discussion in this paper with amore » review of global and local least-squares-based fitting methods in one dimension. Then, we develop our method in six dimensions, and we note that it allows the analytic evaluation of gradients, a feature that is important for molecular dynamics. The approach, which we call statistically localized, permutationally invariant, local interpolating moving least squares fitting of the many-body potential (SL-PI-L-IMLS-MP, or, more simply, L-IMLS-G2), is used to fit a potential energy surface to an electronic structure dataset for N{sub 4}. We discuss its performance on the dataset and give directions for further research, including applications to trajectory calculations.« less

  5. LQAS: User Beware.

    PubMed

    Rhoda, Dale A; Fernandez, Soledad A; Fitch, David J; Lemeshow, Stanley

    2010-02-01

    Researchers around the world are using Lot Quality Assurance Sampling (LQAS) techniques to assess public health parameters and evaluate program outcomes. In this paper, we report that there are actually two methods being called LQAS in the world today, and that one of them is badly flawed. This paper reviews fundamental LQAS design principles, and compares and contrasts the two LQAS methods. We raise four concerns with the simply-written, freely-downloadable training materials associated with the second method. The first method is founded on sound statistical principles and is carefully designed to protect the vulnerable populations that it studies. The language used in the training materials for the second method is simple, but not at all clear, so the second method sounds very much like the first. On close inspection, however, the second method is found to promote study designs that are biased in favor of finding programmatic or intervention success, and therefore biased against the interests of the population being studied. We outline several recommendations, and issue a call for a new high standard of clarity and face validity for those who design, conduct, and report LQAS studies.

  6. An Exercise in Exploring Big Data for Producing Reliable Statistical Information.

    PubMed

    Rey-Del-Castillo, Pilar; Cardeñosa, Jesús

    2016-06-01

    The availability of copious data about many human, social, and economic phenomena is considered an opportunity for the production of official statistics. National statistical organizations and other institutions are more and more involved in new projects for developing what is sometimes seen as a possible change of paradigm in the way statistical figures are produced. Nevertheless, there are hardly any systems in production using Big Data sources. Arguments of confidentiality, data ownership, representativeness, and others make it a difficult task to get results in the short term. Using Call Detail Records from Ivory Coast as an illustration, this article shows some of the issues that must be dealt with when producing statistical indicators from Big Data sources. A proposal of a graphical method to evaluate one specific aspect of the quality of the computed figures is also presented, demonstrating that the visual insight provided improves the results obtained using other traditional procedures.

  7. Statistical Tools for Determining Fitness to Fly

    DTIC Science & Technology

    1981-09-01

    program. (a) Number of Cards in file: 13 (b) Layout of Card 1: iIi Field Length a•e. Variable 1 8 Real EFAIL : Average # of failures for size of control...Method Compute Survival Probability and Frequency Tables 4-4 END 25P FLUW CHARTS 27 QSTART Input EFAIL ,CYEAR,NVAR,NAV,XINC BB~i i=1,3 name (1) i=1,4 Call

  8. Statistical, Graphical, and Learning Methods for Sensing, Surveillance, and Navigation Systems

    DTIC Science & Technology

    2016-06-28

    harsh propagation environments. Conventional filtering techniques fail to provide satisfactory performance in many important nonlinear or non...Gaussian scenarios. In addition, there is a lack of a unified methodology for the design and analysis of different filtering techniques. To address...these problems, we have proposed a new filtering methodology called belief condensation (BC) DISTRIBUTION A: Distribution approved for public release

  9. A common base method for analysis of qPCR data and the application of simple blocking in qPCR experiments.

    PubMed

    Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J

    2017-12-01

    qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (C q ) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the logscale as long as possible by working with log 10 (E) ∙ C q , which we call the efficiency-weighted C q value; subsequent statistical analyses are then applied in the logscale. We show how efficiency-weighted C q values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.

  10. Asymptotic formulae for likelihood-based tests of new physics

    NASA Astrophysics Data System (ADS)

    Cowan, Glen; Cranmer, Kyle; Gross, Eilam; Vitells, Ofer

    2011-02-01

    We describe likelihood-based statistical tests for use in high energy physics for the discovery of new phenomena and for construction of confidence intervals on model parameters. We focus on the properties of the test procedures that allow one to account for systematic uncertainties. Explicit formulae for the asymptotic distributions of test statistics are derived using results of Wilks and Wald. We motivate and justify the use of a representative data set, called the "Asimov data set", which provides a simple method to obtain the median experimental sensitivity of a search or measurement as well as fluctuations about this expectation.

  11. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods.

    PubMed

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil

    2014-08-01

    We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called "Patient Recursive Survival Peeling" is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.

  12. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods

    PubMed Central

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil

    2015-01-01

    We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called “Patient Recursive Survival Peeling” is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called “combined” cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication. PMID:26997922

  13. A fast and objective multidimensional kernel density estimation method: fastKDE

    DOE PAGES

    O'Brien, Travis A.; Kashinath, Karthik; Cavanaugh, Nicholas R.; ...

    2016-03-07

    Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchiamore » and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10 5 samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.« less

  14. Statistical auditing and randomness test of lotto k/N-type games

    NASA Astrophysics Data System (ADS)

    Coronel-Brizio, H. F.; Hernández-Montoya, A. R.; Rapallo, F.; Scalas, E.

    2008-11-01

    One of the most popular lottery games worldwide is the so-called “lotto k/N”. It considers N numbers 1,2,…,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn from this kind of game are presented, with the aim of using them to audit statistically the consistency of a given sample of historical results with theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.

  15. Spectrophotometric methods for simultaneous determination of betamethasone valerate and fusidic acid in their binary mixture.

    PubMed

    Lotfy, Hayam Mahmoud; Salem, Hesham; Abdelkawy, Mohammad; Samir, Ahmed

    2015-04-05

    Five spectrophotometric methods were successfully developed and validated for the determination of betamethasone valerate and fusidic acid in their binary mixture. Those methods are isoabsorptive point method combined with the first derivative (ISO Point--D1) and the recently developed and well established methods namely ratio difference (RD) and constant center coupled with spectrum subtraction (CC) methods, in addition to derivative ratio (1DD) and mean centering of ratio spectra (MCR). New enrichment technique called spectrum addition technique was used instead of traditional spiking technique. The proposed spectrophotometric procedures do not require any separation steps. Accuracy, precision and linearity ranges of the proposed methods were determined and the specificity was assessed by analyzing synthetic mixtures of both drugs. They were applied to their pharmaceutical formulation and the results obtained were statistically compared to that of official methods. The statistical comparison showed that there is no significant difference between the proposed methods and the official ones regarding both accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.

    PubMed

    Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J; Szatkiewicz, Jin P

    2015-08-18

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, Travis A.; Kashinath, Karthik; Cavanaugh, Nicholas R.

    Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchiamore » and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10 5 samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.« less

  18. [Methods of the multivariate statistical analysis of so-called polyetiological diseases using the example of coronary heart disease].

    PubMed

    Lifshits, A M

    1979-01-01

    General characteristics of the multivariate statistical analysis (MSA) is given. Methodical premises and criteria for the selection of an adequate MSA method applicable to pathoanatomic investigations of the epidemiology of multicausal diseases are presented. The experience of using MSA with computors and standard computing programs in studies of coronary arteries aterosclerosis on the materials of 2060 autopsies is described. The combined use of 4 MSA methods: sequential, correlational, regressional, and discriminant permitted to quantitate the contribution of each of the 8 examined risk factors in the development of aterosclerosis. The most important factors were found to be the age, arterial hypertension, and heredity. Occupational hypodynamia and increased fatness were more important in men, whereas diabetes melitus--in women. The registration of this combination of risk factors by MSA methods provides for more reliable prognosis of the likelihood of coronary heart disease with a fatal outcome than prognosis of the degree of coronary aterosclerosis.

  19. A Public Health Initiative for Preventing Family Violence.

    ERIC Educational Resources Information Center

    Hargens, Yvonne Marie

    A community task force studying violence issues closely examined police statistics for domestic calls. Few records of referrals were made in response to these calls. Other statistics on child abuse and family violence reinforced the fact that family violence was a significant problem, making program response to family violence issues a top…

  20. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data.

    PubMed

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2018-02-02

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. Copyright © 2018 Soraggi et al.

  1. Visualization of time series statistical data by shape analysis (GDP ratio changes among Asia countries)

    NASA Astrophysics Data System (ADS)

    Shirota, Yukari; Hashimoto, Takako; Fitri Sari, Riri

    2018-03-01

    It has been very significant to visualize time series big data. In the paper we shall discuss a new analysis method called “statistical shape analysis” or “geometry driven statistics” on time series statistical data in economics. In the paper, we analyse the agriculture, value added and industry, value added (percentage of GDP) changes from 2000 to 2010 in Asia. We handle the data as a set of landmarks on a two-dimensional image to see the deformation using the principal components. The point of the analysis method is the principal components of the given formation which are eigenvectors of its bending energy matrix. The local deformation can be expressed as the set of non-Affine transformations. The transformations give us information about the local differences between in 2000 and in 2010. Because the non-Affine transformation can be decomposed into a set of partial warps, we present the partial warps visually. The statistical shape analysis is widely used in biology but, in economics, no application can be found. In the paper, we investigate its potential to analyse the economic data.

  2. Publication Bias ( The "File-Drawer Problem") in Scientific Inference

    NASA Technical Reports Server (NTRS)

    Scargle, Jeffrey D.; DeVincenzi, Donald (Technical Monitor)

    1999-01-01

    Publication bias arises whenever the probability that a study is published depends on the statistical significance of its results. This bias, often called the file-drawer effect since the unpublished results are imagined to be tucked away in researchers' file cabinets, is potentially a severe impediment to combining the statistical results of studies collected from the literature. With almost any reasonable quantitative model for publication bias, only a small number of studies lost in the file-drawer will produce a significant bias. This result contradicts the well known Fail Safe File Drawer (FSFD) method for setting limits on the potential harm of publication bias, widely used in social, medical and psychic research. This method incorrectly treats the file drawer as unbiased, and almost always miss-estimates the seriousness of publication bias. A large body of not only psychic research, but medical and social science studies, has mistakenly relied on this method to validate claimed discoveries. Statistical combination can be trusted only if it is known with certainty that all studies that have been carried out are included. Such certainty is virtually impossible to achieve in literature surveys.

  3. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study.

    PubMed

    Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-02-01

    The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. A statistical method to estimate low-energy hadronic cross sections

    NASA Astrophysics Data System (ADS)

    Balassa, Gábor; Kovács, Péter; Wolf, György

    2018-02-01

    In this article we propose a model based on the Statistical Bootstrap approach to estimate the cross sections of different hadronic reactions up to a few GeV in c.m.s. energy. The method is based on the idea, when two particles collide a so-called fireball is formed, which after a short time period decays statistically into a specific final state. To calculate the probabilities we use a phase space description extended with quark combinatorial factors and the possibility of more than one fireball formation. In a few simple cases the probability of a specific final state can be calculated analytically, where we show that the model is able to reproduce the ratios of the considered cross sections. We also show that the model is able to describe proton-antiproton annihilation at rest. In the latter case we used a numerical method to calculate the more complicated final state probabilities. Additionally, we examined the formation of strange and charmed mesons as well, where we used existing data to fit the relevant model parameters.

  5. Evaluating fMRI methods for assessing hemispheric language dominance in healthy subjects.

    PubMed

    Baciu, Monica; Juphard, Alexandra; Cousin, Emilie; Bas, Jean François Le

    2005-08-01

    We evaluated two methods for quantifying the hemispheric language dominance in healthy subjects, by using a rhyme detection (deciding whether couple of words rhyme) and a word fluency (generating words starting with a given letter) task. One of methods called "flip method" (FM) was based on the direct statistical comparison between hemispheres' activity. The second one, the classical lateralization indices method (LIM), was based on calculating lateralization indices by taking into account the number of activated pixels within hemispheres. The main difference between methods is the statistical assessment of the inter-hemispheric difference: while FM shows if the difference between hemispheres' activity is statistically significant, LIM shows only that if there is a difference between hemispheres. The robustness of LIM and FM was assessed by calculating correlation coefficients between LIs obtained with each of these methods and manual lateralization indices MLI obtained with Edinburgh inventory. Our results showed significant correlation between LIs provided by each method and the MIL, suggesting that both methods are robust for quantifying hemispheric dominance for language in healthy subjects. In the present study we also evaluated the effect of spatial normalization, smoothing and "clustering" (NSC) on the intra-hemispheric location of activated regions and inter-hemispheric asymmetry of the activation. Our results have shown that NSC did not affect the hemispheric specialization but increased the value of the inter-hemispheric difference.

  6. Operator Approach to the Master Equation for the One-Step Process

    NASA Astrophysics Data System (ADS)

    Hnatič, M.; Eferina, E. G.; Korolkova, A. V.; Kulyabov, D. S.; Sevastyanov, L. A.

    2016-02-01

    Background. Presentation of the probability as an intrinsic property of the nature leads researchers to switch from deterministic to stochastic description of the phenomena. The kinetics of the interaction has recently attracted attention because it often occurs in the physical, chemical, technical, biological, environmental, economic, and sociological systems. However, there are no general methods for the direct study of this equation. The expansion of the equation in a formal Taylor series (the so called Kramers-Moyal's expansion) is used in the procedure of stochastization of one-step processes. Purpose. However, this does not eliminate the need for the study of the master equation. Method. It is proposed to use quantum field perturbation theory for the statistical systems (the so-called Doi method). Results: This work is a methodological material that describes the principles of master equation solution based on quantum field perturbation theory methods. The characteristic property of the work is that it is intelligible for non-specialists in quantum field theory. Conclusions: We show the full equivalence of the operator and combinatorial methods of obtaining and study of the one-step process master equation.

  7. Implementation and use of a crisis hotline during the treatment as usual and universal screening phases of a suicide intervention study.

    PubMed

    Arias, Sarah A; Sullivan, Ashley F; Miller, Ivan; Camargo, Carlos A; Boudreaux, Edwin D

    2015-11-01

    Although research suggests that crisis hotlines are an effective means of mitigating suicide risk, lack of empirical evidence may limit the use of this method as a research safety protocol. This study describes the use of a crisis hotline to provide clinical backup for research assessments. Data were analyzed from participants in the Emergency Department Safety and Follow-up Evaluation (ED-SAFE) study (n=874). Socio-demographics, call completion data, and data available on suicide attempts occurring in relation to the crisis counseling call were analyzed. Pearson chi-squared statistic for differences in proportions were conducted to compare characteristics of patients receiving versus not receiving crisis counseling. P<0.05 was considered statistically significant. Overall, there were 163 counseling calls (6% of total assessment calls) from 135 (16%) of the enrolled subjects who were transferred to the crisis line because of suicide risk identified during the research assessment. For those transferred to the crisis line, the median age was 40 years (interquartile range 27-48) with 67% female, 80% white, and 11% Hispanic. Increasing demand for suicide interventions in diverse healthcare settings warrants consideration of crisis hotlines as a safety protocol mechanism. Our findings provide background on how a crisis hotline was implemented as a safety measure, as well as the type of patients who may utilize this safety protocol. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Using telephony data to facilitate discovery of clinical workflows.

    PubMed

    Rucker, Donald W

    2017-04-19

    Discovery of clinical workflows to target for redesign using methods such as Lean and Six Sigma is difficult. VoIP telephone call pattern analysis may complement direct observation and EMR-based tools in understanding clinical workflows at the enterprise level by allowing visualization of institutional telecommunications activity. To build an analytic framework mapping repetitive and high-volume telephone call patterns in a large medical center to their associated clinical units using an enterprise unified communications server log file and to support visualization of specific call patterns using graphical networks. Consecutive call detail records from the medical center's unified communications server were parsed to cross-correlate telephone call patterns and map associated phone numbers to a cost center dictionary. Hashed data structures were built to allow construction of edge and node files representing high volume call patterns for display with an open source graph network tool. Summary statistics for an analysis of exactly one week's call detail records at a large academic medical center showed that 912,386 calls were placed with a total duration of 23,186 hours. Approximately half of all calling called number pairs had an average call duration under 60 seconds and of these the average call duration was 27 seconds. Cross-correlation of phone calls identified by clinical cost center can be used to generate graphical displays of clinical enterprise communications. Many calls are short. The compact data transfers within short calls may serve as automation or re-design targets. The large absolute amount of time medical center employees were engaged in VoIP telecommunications suggests that analysis of telephone call patterns may offer additional insights into core clinical workflows.

  9. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computationally simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909

  10. ParseCNV integrative copy number variation association software with quality tracking

    PubMed Central

    Glessner, Joseph T.; Li, Jin; Hakonarson, Hakon

    2013-01-01

    A number of copy number variation (CNV) calling algorithms exist; however, comprehensive software tools for CNV association studies are lacking. We describe ParseCNV, unique software that takes CNV calls and creates probe-based statistics for CNV occurrence in both case–control design and in family based studies addressing both de novo and inheritance events, which are then summarized based on CNV regions (CNVRs). CNVRs are defined in a dynamic manner to allow for a complex CNV overlap while maintaining precise association region. Using this approach, we avoid failure to converge and non-monotonic curve fitting weaknesses of programs, such as CNVtools and CNVassoc, and although Plink is easy to use, it only provides combined CNV state probe-based statistics, not state-specific CNVRs. Existing CNV association methods do not provide any quality tracking information to filter confident associations, a key issue which is fully addressed by ParseCNV. In addition, uncertainty in CNV calls underlying CNV associations is evaluated to verify significant results, including CNV overlap profiles, genomic context, number of probes supporting the CNV and single-probe intensities. When optimal quality control parameters are followed using ParseCNV, 90% of CNVs validate by polymerase chain reaction, an often problematic stage because of inadequate significant association review. ParseCNV is freely available at http://parsecnv.sourceforge.net. PMID:23293001

  11. ParseCNV integrative copy number variation association software with quality tracking.

    PubMed

    Glessner, Joseph T; Li, Jin; Hakonarson, Hakon

    2013-03-01

    A number of copy number variation (CNV) calling algorithms exist; however, comprehensive software tools for CNV association studies are lacking. We describe ParseCNV, unique software that takes CNV calls and creates probe-based statistics for CNV occurrence in both case-control design and in family based studies addressing both de novo and inheritance events, which are then summarized based on CNV regions (CNVRs). CNVRs are defined in a dynamic manner to allow for a complex CNV overlap while maintaining precise association region. Using this approach, we avoid failure to converge and non-monotonic curve fitting weaknesses of programs, such as CNVtools and CNVassoc, and although Plink is easy to use, it only provides combined CNV state probe-based statistics, not state-specific CNVRs. Existing CNV association methods do not provide any quality tracking information to filter confident associations, a key issue which is fully addressed by ParseCNV. In addition, uncertainty in CNV calls underlying CNV associations is evaluated to verify significant results, including CNV overlap profiles, genomic context, number of probes supporting the CNV and single-probe intensities. When optimal quality control parameters are followed using ParseCNV, 90% of CNVs validate by polymerase chain reaction, an often problematic stage because of inadequate significant association review. ParseCNV is freely available at http://parsecnv.sourceforge.net.

  12. Entropy in sound and vibration: towards a new paradigm.

    PubMed

    Le Bot, A

    2017-01-01

    This paper describes a discussion on the method and the status of a statistical theory of sound and vibration, called statistical energy analysis (SEA). SEA is a simple theory of sound and vibration in elastic structures that applies when the vibrational energy is diffusely distributed. We show that SEA is a thermodynamical theory of sound and vibration, based on a law of exchange of energy analogous to the Clausius principle. We further investigate the notion of entropy in this context and discuss its meaning. We show that entropy is a measure of information lost in the passage from the classical theory of sound and vibration and SEA, its thermodynamical counterpart.

  13. UNITY: Confronting Supernova Cosmology's Statistical and Systematic Uncertainties in a Unified Bayesian Framework

    NASA Astrophysics Data System (ADS)

    Rubin, D.; Aldering, G.; Barbary, K.; Boone, K.; Chappell, G.; Currie, M.; Deustua, S.; Fagrelius, P.; Fruchter, A.; Hayden, B.; Lidman, C.; Nordin, J.; Perlmutter, S.; Saunders, C.; Sofiatti, C.; Supernova Cosmology Project, The

    2015-11-01

    While recent supernova (SN) cosmology research has benefited from improved measurements, current analysis approaches are not statistically optimal and will prove insufficient for future surveys. This paper discusses the limitations of current SN cosmological analyses in treating outliers, selection effects, shape- and color-standardization relations, unexplained dispersion, and heterogeneous observations. We present a new Bayesian framework, called UNITY (Unified Nonlinear Inference for Type-Ia cosmologY), that incorporates significant improvements in our ability to confront these effects. We apply the framework to real SN observations and demonstrate smaller statistical and systematic uncertainties. We verify earlier results that SNe Ia require nonlinear shape and color standardizations, but we now include these nonlinear relations in a statistically well-justified way. This analysis was primarily performed blinded, in that the basic framework was first validated on simulated data before transitioning to real data. We also discuss possible extensions of the method.

  14. NIRS-SPM: statistical parametric mapping for near infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul

    2008-02-01

    Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using GLM and makes inference as the excursion probability which comes from the random field that are interpolated from the sparse measurement. In order to obtain correct inference, NIRS-SPM offers the pre-coloring and pre-whitening method for temporal correlation estimation. For simultaneous recording NIRS signal with fMRI, the spatial mapping between fMRI image and real coordinate in 3-D digitizer is estimated using Horn's algorithm. These powerful tools allows us the super-resolution localization of the brain activation which is not possible using the conventional NIRS analysis tools.

  15. Logistic regression applied to natural hazards: rare event logistic regression with replications

    NASA Astrophysics Data System (ADS)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.

  16. Phaser crystallographic software.

    PubMed

    McCoy, Airlie J; Grosse-Kunstleve, Ralf W; Adams, Paul D; Winn, Martyn D; Storoni, Laurent C; Read, Randy J

    2007-08-01

    Phaser is a program for phasing macromolecular crystal structures by both molecular replacement and experimental phasing methods. The novel phasing algorithms implemented in Phaser have been developed using maximum likelihood and multivariate statistics. For molecular replacement, the new algorithms have proved to be significantly better than traditional methods in discriminating correct solutions from noise, and for single-wavelength anomalous dispersion experimental phasing, the new algorithms, which account for correlations between F(+) and F(-), give better phases (lower mean phase error with respect to the phases given by the refined structure) than those that use mean F and anomalous differences DeltaF. One of the design concepts of Phaser was that it be capable of a high degree of automation. To this end, Phaser (written in C++) can be called directly from Python, although it can also be called using traditional CCP4 keyword-style input. Phaser is a platform for future development of improved phasing methods and their release, including source code, to the crystallographic community.

  17. An improved numerical method to compute neutron/gamma deexcitation cascades starting from a high spin state

    DOE PAGES

    Regnier, D.; Litaize, O.; Serot, O.

    2015-12-23

    Numerous nuclear processes involve the deexcitation of a compound nucleus through the emission of several neutrons, gamma-rays and/or conversion electrons. The characteristics of such a deexcitation are commonly derived from a total statistical framework often called “Hauser–Feshbach” method. In this work, we highlight a numerical limitation of this kind of method in the case of the deexcitation of a high spin initial state. To circumvent this issue, an improved technique called the Fluctuating Structure Properties (FSP) method is presented. Two FSP algorithms are derived and benchmarked on the calculation of the total radiative width for a thermal neutron capture onmore » 238U. We compare the standard method with these FSP algorithms for the prediction of particle multiplicities in the deexcitation of a high spin level of 143Ba. The gamma multiplicity turns out to be very sensitive to the numerical method. The bias between the two techniques can reach 1.5 γγ/cascade. Lastly, the uncertainty of these calculations coming from the lack of knowledge on nuclear structure is estimated via the FSP method.« less

  18. Increasing Tobacco Quitline Calls from Pregnant African American Women: The “One Tiny Reason to Quit” Social Marketing Campaign

    PubMed Central

    Genderson, Maureen Wilson; Sepulveda, Allison L.; Garland, Sheryl L.; Wilson, Diane Baer; Stith-Singleton, Rose; Dubuque, Susan

    2013-01-01

    Abstract Introduction Pregnant African American women are at disproportionately high risk of premature birth and infant mortality, outcomes associated with cigarette smoking. Telephone-based, individual smoking cessation counseling has been shown to result in successful quit attempts in the general population and among pregnant women, but “quitlines” are underutilized. A social marketing campaign called One Tiny Reason to Quit (OTRTQ) promoted calling a quitline (1-800-QUIT-NOW) to pregnant, African American women in Richmond, Virginia, in 2009 and was replicated there 2 years later. Methods The campaign disseminated messages via radio, interior bus ads, posters, newspaper ads, and billboards. Trained volunteers also delivered messages face-to-face and distributed branded give-away reminder items. The number of calls made from pregnant women in the Richmond area during summer 2009 was contrasted with (a) the number of calls during the seasons immediately before and after the campaign, and (b) the number of calls the previous summer. The replication used the same evaluation design. Results There were statistically significant spikes in calls from pregnant women during both campaign waves for both types of contrasts. A higher proportion of the calls from pregnant women were from African Americans during the campaign. Conclusion A multimodal quitline promotion like OTRTQ should be considered for geographic areas with sizable African American populations and high rates of infant mortality. PMID:23621745

  19. Repertoire and classification of non-song calls in Southeast Alaskan humpback whales (Megaptera novaeangliae).

    PubMed

    Fournet, Michelle E; Szabo, Andy; Mellinger, David K

    2015-01-01

    On low-latitude breeding grounds, humpback whales produce complex and highly stereotyped songs as well as a range of non-song sounds associated with breeding behaviors. While on their Southeast Alaskan foraging grounds, humpback whales produce a range of previously unclassified non-song vocalizations. This study investigates the vocal repertoire of Southeast Alaskan humpback whales from a sample of 299 non-song vocalizations collected over a 3-month period on foraging grounds in Frederick Sound, Southeast Alaska. Three classification systems were used, including aural spectrogram analysis, statistical cluster analysis, and discriminant function analysis, to describe and classify vocalizations. A hierarchical acoustic structure was identified; vocalizations were classified into 16 individual call types nested within four vocal classes. The combined classification method shows promise for identifying variability in call stereotypy between vocal groupings and is recommended for future classification of broad vocal repertoires.

  20. Polygenic scores via penalized regression on summary statistics.

    PubMed

    Mak, Timothy Shin Heng; Porsch, Robert Milan; Choi, Shing Wan; Zhou, Xueya; Sham, Pak Chung

    2017-09-01

    Polygenic scores (PGS) summarize the genetic contribution of a person's genotype to a disease or phenotype. They can be used to group participants into different risk categories for diseases, and are also used as covariates in epidemiological analyses. A number of possible ways of calculating PGS have been proposed, and recently there is much interest in methods that incorporate information available in published summary statistics. As there is no inherent information on linkage disequilibrium (LD) in summary statistics, a pertinent question is how we can use LD information available elsewhere to supplement such analyses. To answer this question, we propose a method for constructing PGS using summary statistics and a reference panel in a penalized regression framework, which we call lassosum. We also propose a general method for choosing the value of the tuning parameter in the absence of validation data. In our simulations, we showed that pseudovalidation often resulted in prediction accuracy that is comparable to using a dataset with validation phenotype and was clearly superior to the conservative option of setting the tuning parameter of lassosum to its lowest value. We also showed that lassosum achieved better prediction accuracy than simple clumping and P-value thresholding in almost all scenarios. It was also substantially faster and more accurate than the recently proposed LDpred. © 2017 WILEY PERIODICALS, INC.

  1. Helping Students Develop Statistical Reasoning: Implementing a Statistical Reasoning Learning Environment

    ERIC Educational Resources Information Center

    Garfield, Joan; Ben-Zvi, Dani

    2009-01-01

    This article describes a model for an interactive, introductory secondary- or tertiary-level statistics course that is designed to develop students' statistical reasoning. This model is called a "Statistical Reasoning Learning Environment" and is built on the constructivist theory of learning.

  2. Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Pin-Yu; Choudhury, Sutanay; Hero, Alfred

    Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles ofmore » graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity pattern and MC-GDL can provide discriminative basis for attack classification.« less

  3. Examining the Process of Responding to Circumplex Scales of Interpersonal Values Items: Should Ideal Point Scoring Methods Be Considered?

    PubMed

    Ling, Ying; Zhang, Minqiang; Locke, Kenneth D; Li, Guangming; Li, Zonglong

    2016-01-01

    The Circumplex Scales of Interpersonal Values (CSIV) is a 64-item self-report measure of goals from each octant of the interpersonal circumplex. We used item response theory methods to compare whether dominance models or ideal point models best described how people respond to CSIV items. Specifically, we fit a polytomous dominance model called the generalized partial credit model and an ideal point model of similar complexity called the generalized graded unfolding model to the responses of 1,893 college students. The results of both graphical comparisons of item characteristic curves and statistical comparisons of model fit suggested that an ideal point model best describes the process of responding to CSIV items. The different models produced different rank orderings of high-scoring respondents, but overall the models did not differ in their prediction of criterion variables (agentic and communal interpersonal traits and implicit motives).

  4. A method of classification for multisource data in remote sensing based on interval-valued probabilities

    NASA Technical Reports Server (NTRS)

    Kim, Hakil; Swain, Philip H.

    1990-01-01

    An axiomatic approach to intervalued (IV) probabilities is presented, where the IV probability is defined by a pair of set-theoretic functions which satisfy some pre-specified axioms. On the basis of this approach representation of statistical evidence and combination of multiple bodies of evidence are emphasized. Although IV probabilities provide an innovative means for the representation and combination of evidential information, they make the decision process rather complicated. It entails more intelligent strategies for making decisions. The development of decision rules over IV probabilities is discussed from the viewpoint of statistical pattern recognition. The proposed method, so called evidential reasoning method, is applied to the ground-cover classification of a multisource data set consisting of Multispectral Scanner (MSS) data, Synthetic Aperture Radar (SAR) data, and digital terrain data such as elevation, slope, and aspect. By treating the data sources separately, the method is able to capture both parametric and nonparametric information and to combine them. Then the method is applied to two separate cases of classifying multiband data obtained by a single sensor. In each case a set of multiple sources is obtained by dividing the dimensionally huge data into smaller and more manageable pieces based on the global statistical correlation information. By a divide-and-combine process, the method is able to utilize more features than the conventional maximum likelihood method.

  5. Statistical physics in foreign exchange currency and stock markets

    NASA Astrophysics Data System (ADS)

    Ausloos, M.

    2000-09-01

    Problems in economy and finance have attracted the interest of statistical physicists all over the world. Fundamental problems pertain to the existence or not of long-, medium- or/and short-range power-law correlations in various economic systems, to the presence of financial cycles and on economic considerations, including economic policy. A method like the detrended fluctuation analysis is recalled emphasizing its value in sorting out correlation ranges, thereby leading to predictability at short horizon. The ( m, k)-Zipf method is presented for sorting out short-range correlations in the sign and amplitude of the fluctuations. A well-known financial analysis technique, the so-called moving average, is shown to raise questions to physicists about fractional Brownian motion properties. Among spectacular results, the possibility of crash predictions has been demonstrated through the log-periodicity of financial index oscillations.

  6. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.

    PubMed

    Li, Bing; Chun, Hyonho; Zhao, Hongyu

    2014-09-01

    We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.

  7. Japan Report

    DTIC Science & Technology

    1985-03-18

    increased profitability, productivity and more effective management ." Members of the British side also called for an increase in Japan’s defense...copyright transfer, or rental permission is called technology trade. According to the Statistic Bureau of the Management and Coordina- tion Agency...1- 8.8 -Swizerland 6.3 5.6 Others 13.1 Source: Statistics Bureau of the Management and Coordination Agency China. 66.2% of Japan’s imported

  8. Impact of the mass media on calls to the CDC National AIDS Hotline.

    PubMed

    Fan, D P

    1996-06-01

    This paper considers new computer methodologies for assessing the impact of different types of public health information. The example used public service announcements (PSAs) and mass media news to predict the volume of attempts to call the CDC National AIDS Hotline from December 1992 through to the end of 1993. The analysis relied solely on data from electronic databases. Newspaper stories and television news transcripts were obtained from the NEXIS electronic database and were scored by machine for AIDS coverage. The PSA database was generated by computer monitoring of advertising distributed by the Centers for Disease Control and Prevention (CDC) and by others. The volume of call attempts was collected automatically by the public branch exchange (PBX) of the Hotline telephone system. The call attempts, the PSAs and the news story data were related to each other using both a standard time series method and the statistical model of ideodynamics. The analysis indicated that the only significant explanatory variable for the call attempts was PSAs produced by the CDC. One possible explanation was that these commercials all included the Hotline telephone number while the other information sources did not.

  9. Bringing Clouds into Our Lab! - The Influence of Turbulence on the Early Stage Rain Droplets

    NASA Astrophysics Data System (ADS)

    Yavuz, Mehmet Altug; Kunnen, Rudie; Heijst, Gertjan; Clercx, Herman

    2015-11-01

    We are investigating a droplet-laden flow in an air-filled turbulence chamber, forced by speaker-driven air jets. The speakers are running in a random manner; yet they allow us to control and define the statistics of the turbulence. We study the motion of droplets with tunable size (Stokes numbers ~ 0.13 - 9) in a turbulent flow, mimicking the early stages of raindrop formation. 3D Particle Tracking Velocimetry (PTV) together with Laser Induced Fluorescence (LIF) methods are chosen as the experimental method to track the droplets and collect data for statistical analysis. Thereby it is possible to study the spatial distribution of the droplets in turbulence using the so-called Radial Distribution Function (RDF), a statistical measure to quantify the clustering of particles. Additionally, 3D-PTV technique allows us to measure velocity statistics of the droplets and the influence of the turbulence on droplet trajectories, both individually and collectively. In this contribution, we will present the clustering probability quantified by the RDF for different Stokes numbers. We will explain the physics underlying the influence of turbulence on droplet cluster behavior. This study supported by FOM/NWO Netherlands.

  10. Statistics of Optical Coherence Tomography Data From Human Retina

    PubMed Central

    de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo

    2010-01-01

    Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small, but significant correlation between neighbor pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but varies across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733

  11. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies.

    PubMed

    Rollins, Derrick K; Teh, Ailing

    2010-12-17

    Microarray data sets provide relative expression levels for thousands of genes for a small number, in comparison, of different experimental conditions called assays. Data mining techniques are used to extract specific information of genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to the development of ranking genes of microarray data sets that express most differently between two biologically different grouping of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP) which is the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assays groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.

  12. Statistical Analysis of Hit/Miss Data (Preprint)

    DTIC Science & Technology

    2012-07-01

    HDBK-1823A, 2009). Other agencies and industries have also made use of this guidance (Gandossi et al., 2010) and ( Drury et al., 2006). It should...better accounting of false call rates such that the POD curve doesn’t converge to 0 for small flaw sizes. The difficulty with conventional methods...2002. Drury , Ghylin, and Holness, Error Analysis and Threat Magnitude for Carry-on Bag Inspection, Proceedings of the Human Factors and Ergonomic

  13. Shilling Attacks Detection in Recommender Systems Based on Target Item Analysis

    PubMed Central

    Zhou, Wei; Wen, Junhao; Koh, Yun Sing; Xiong, Qingyu; Gao, Min; Dobbie, Gillian; Alam, Shafiq

    2015-01-01

    Recommender systems are highly vulnerable to shilling attacks, both by individuals and groups. Attackers who introduce biased ratings in order to affect recommendations, have been shown to negatively affect collaborative filtering (CF) algorithms. Previous research focuses only on the differences between genuine profiles and attack profiles, ignoring the group characteristics in attack profiles. In this paper, we study the use of statistical metrics to detect rating patterns of attackers and group characteristics in attack profiles. Another question is that most existing detecting methods are model specific. Two metrics, Rating Deviation from Mean Agreement (RDMA) and Degree of Similarity with Top Neighbors (DegSim), are used for analyzing rating patterns between malicious profiles and genuine profiles in attack models. Building upon this, we also propose and evaluate a detection structure called RD-TIA for detecting shilling attacks in recommender systems using a statistical approach. In order to detect more complicated attack models, we propose a novel metric called DegSim’ based on DegSim. The experimental results show that our detection model based on target item analysis is an effective approach for detecting shilling attacks. PMID:26222882

  14. Image Filtering with Boolean and Statistical Operators.

    DTIC Science & Technology

    1983-12-01

    S3(2) COMPLEX AMAT(256, 4). BMAT (256. 4). CMAT(256. 4) CALL IOF(3. MAIN. AFLNM. DFLNI, CFLNM. MS., 82, S3) CALL OPEN(1.AFLNM* 1.IER) CALL CHECKC!ER...RDBLK(2. 6164. MAT. 16, IER) CALL CHECK(IER) DO I K-1. 4 DO I J-1.256 CMAT(J. K)-AMAT(J. K)’. BMAT (J. K) I CONTINUE S CALL WRBLK(3. 164!. CMAT. 16. IER

  15. Applications submitted and grants awarded to men and women in nationwide biomedical competitive research, in 2006, in Spain

    PubMed Central

    Peiró‐Pérez, Rosana; Colomer‐Revuelta, Concha; Blázquez‐Herranz, Margarita; Gómez‐López, Fernando

    2007-01-01

    Background According to European reports, women participate in research less than men, especially in positions of responsibility. This kind of analysis has not been carried out in Spain in the field of biomedical research. This study describes participation of men and women as grant applicants in two different calls for research funding, held in Spain in 2006. Methods Data collected from grant applicants and from grantees, for two different competitive grant researches areas: human resources and CIBER (Spanish acronym for Biomedical Research Network Centres) have been described by sex. Results The human resources call shows that the number of applications submitted by women is higher (67.8% vs 32.2%), but the percentage of awards are similar (20.3% vs 22.7%), OR = 1.15 (95% CI: 0.82 to 1.62), with no statistical differences, although there are more men in the upper categories (superior technical experts (OR = 1.19 (0.58 to 2.45)) post‐doctoral (OR = 1.36 (0.65 to 2.86)) and research personnel (OR = 1.48 (0.67 to 3.25)). With the CIBER call (senior researchers) there is a clear difference in the number of applicants (women 19.6%, men 80.4%) but the number of awardees is similar (40.3% vs 43.1%) OR = 0.89 (0.65 to 1.34). Conclusions Although there are no statistical differences between women and men, with respect the awards obtained, there is a different pattern to the type of grant application, with fewer women in the more senior call. PMID:18000109

  16. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  17. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  18. From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling

    NASA Astrophysics Data System (ADS)

    Nazé, Pierre

    2018-03-01

    Copula modeling consists in finding a probabilistic distribution, called copula, whereby its coupling with the marginal distributions of a set of random variables produces their joint distribution. The present work aims to use this technique to connect the statistical distributions of weakly chaotic dynamics and deterministic subdiffusion. More precisely, we decompose the jumps distribution of Geisel-Thomae map into a bivariate one and determine the marginal and copula distributions respectively by infinite ergodic theory and statistical inference techniques. We verify therefore that the characteristic tail distribution of subdiffusion is an extreme value copula coupling Mittag-Leffler distributions. We also present a method to calculate the exact copula and joint distributions in the case where weakly chaotic dynamics and deterministic subdiffusion statistical distributions are already known. Numerical simulations and consistency with the dynamical aspects of the map support our results.

  19. Entropy of dynamical social networks

    NASA Astrophysics Data System (ADS)

    Zhao, Kun; Karsai, Marton; Bianconi, Ginestra

    2012-02-01

    Dynamical social networks are evolving rapidly and are highly adaptive. Characterizing the information encoded in social networks is essential to gain insight into the structure, evolution, adaptability and dynamics. Recently entropy measures have been used to quantify the information in email correspondence, static networks and mobility patterns. Nevertheless, we still lack methods to quantify the information encoded in time-varying dynamical social networks. In this talk we present a model to quantify the entropy of dynamical social networks and use this model to analyze the data of phone-call communication. We show evidence that the entropy of the phone-call interaction network changes according to circadian rhythms. Moreover we show that social networks are extremely adaptive and are modified by the use of technologies such as mobile phone communication. Indeed the statistics of duration of phone-call is described by a Weibull distribution and is significantly different from the distribution of duration of face-to-face interactions in a conference. Finally we investigate how much the entropy of dynamical social networks changes in realistic models of phone-call or face-to face interactions characterizing in this way different type human social behavior.

  20. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950

  1. GAPIT: genome association and prediction integrated tool.

    PubMed

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.

  2. Dynamic principle for ensemble control tools.

    PubMed

    Samoletov, A; Vasiev, B

    2017-11-28

    Dynamical equations describing physical systems in contact with a thermal bath are commonly extended by mathematical tools called "thermostats." These tools are designed for sampling ensembles in statistical mechanics. Here we propose a dynamic principle underlying a range of thermostats which is derived using fundamental laws of statistical physics and ensures invariance of the canonical measure. The principle covers both stochastic and deterministic thermostat schemes. Our method has a clear advantage over a range of proposed and widely used thermostat schemes that are based on formal mathematical reasoning. Following the derivation of the proposed principle, we show its generality and illustrate its applications including design of temperature control tools that differ from the Nosé-Hoover-Langevin scheme.

  3. Entropy in sound and vibration: towards a new paradigm

    PubMed Central

    2017-01-01

    This paper describes a discussion on the method and the status of a statistical theory of sound and vibration, called statistical energy analysis (SEA). SEA is a simple theory of sound and vibration in elastic structures that applies when the vibrational energy is diffusely distributed. We show that SEA is a thermodynamical theory of sound and vibration, based on a law of exchange of energy analogous to the Clausius principle. We further investigate the notion of entropy in this context and discuss its meaning. We show that entropy is a measure of information lost in the passage from the classical theory of sound and vibration and SEA, its thermodynamical counterpart. PMID:28265190

  4. Three novel approaches to structural identifiability analysis in mixed-effects models.

    PubMed

    Janzén, David L I; Jirstrand, Mats; Chappell, Michael J; Evans, Neil D

    2016-05-06

    Structural identifiability is a concept that considers whether the structure of a model together with a set of input-output relations uniquely determines the model parameters. In the mathematical modelling of biological systems, structural identifiability is an important concept since biological interpretations are typically made from the parameter estimates. For a system defined by ordinary differential equations, several methods have been developed to analyse whether the model is structurally identifiable or otherwise. Another well-used modelling framework, which is particularly useful when the experimental data are sparsely sampled and the population variance is of interest, is mixed-effects modelling. However, established identifiability analysis techniques for ordinary differential equations are not directly applicable to such models. In this paper, we present and apply three different methods that can be used to study structural identifiability in mixed-effects models. The first method, called the repeated measurement approach, is based on applying a set of previously established statistical theorems. The second method, called the augmented system approach, is based on augmenting the mixed-effects model to an extended state-space form. The third method, called the Laplace transform mixed-effects extension, is based on considering the moment invariants of the systems transfer function as functions of random variables. To illustrate, compare and contrast the application of the three methods, they are applied to a set of mixed-effects models. Three structural identifiability analysis methods applicable to mixed-effects models have been presented in this paper. As method development of structural identifiability techniques for mixed-effects models has been given very little attention, despite mixed-effects models being widely used, the methods presented in this paper provides a way of handling structural identifiability in mixed-effects models previously not possible. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  5. Non-Gaussian Distributions Affect Identification of Expression Patterns, Functional Annotation, and Prospective Classification in Human Cancer Genomes

    PubMed Central

    Marko, Nicholas F.; Weil, Robert J.

    2012-01-01

    Introduction Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863

  6. Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

    PubMed Central

    Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

    2017-01-01

    A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059

  7. Developing Statistical Literacy with Year 9 Students: A Collaborative Research Project

    ERIC Educational Resources Information Center

    Sharma, Sashi

    2013-01-01

    Advances in technology and communication have increased the amount of statistical information delivered through everyday media. The importance of statistics in everyday life has led to calls for increased attention to statistical literacy in the mathematics curriculum (Watson 2006). Gal (2004) sees statistical literacy as the need for students to…

  8. Genotyping and inflated type I error rate in genome-wide association case/control studies

    PubMed Central

    Sampson, Joshua N; Zhao, Hongyu

    2009-01-01

    Background One common goal of a case/control genome wide association study (GWAS) is to find SNPs associated with a disease. Traditionally, the first step in such studies is to assign a genotype to each SNP in each subject, based on a statistic summarizing fluorescence measurements. When the distributions of the summary statistics are not well separated by genotype, the act of genotype assignment can lead to more potential problems than acknowledged by the literature. Results Specifically, we show that the proportions of each called genotype need not equal the true proportions in the population, even as the number of subjects grows infinitely large. The called genotypes for two subjects need not be independent, even when their true genotypes are independent. Consequently, p-values from tests of association can be anti-conservative, even when the distributions of the summary statistic for the cases and controls are identical. To address these problems, we propose two new tests designed to reduce the inflation in the type I error rate caused by these problems. The first algorithm, logiCALL, measures call quality by fully exploring the likelihood profile of intensity measurements, and the second algorithm avoids genotyping by using a likelihood ratio statistic. Conclusion Genotyping can introduce avoidable false positives in GWAS. PMID:19236714

  9. Metrically adjusted questionnaires can provide more information for scientists- an example from the tourism.

    PubMed

    Sindik, Joško; Miljanović, Maja

    2017-03-01

    The article deals with the issue of research methodology, illustrating the use of known research methods for new purposes. Questionnaires that originally do not have metric characteristics can be called »handy questionnaires«. In this article, the author is trying to consider the possibilities of their improved scientific usability, which can be primarily ensured by improving their metric characteristics, consequently using multivariate instead of univariate statistical methods. In order to establish the base for the application of multivariate statistical procedures, the main idea is to develop strategies to design measurement instruments from parts of the handy questionnaires. This can be accomplished in two ways: before deciding upon the methods for data collection (redesigning the handy questionnaires) and before the collection of the data (a priori) or after the data has been collected, without modifying the questionnaire (a posteriori). The basic principles of applying these two strategies of the metrical adaptation of handy questionnaires are described.

  10. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    PubMed

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Comparing to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  11. Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

    PubMed Central

    Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul

    2016-01-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257

  12. Disjunctive Normal Shape and Appearance Priors with Applications to Image Segmentation.

    PubMed

    Mesadi, Fitsum; Cetin, Mujdat; Tasdizen, Tolga

    2015-10-01

    The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. Active shape and appearance models require landmark points and assume unimodal shape and appearance distributions. Level set based shape priors are limited to global shape similarity. In this paper, we present a novel shape and appearance priors for image segmentation based on an implicit parametric shape representation called disjunctive normal shape model (DNSM). DNSM is formed by disjunction of conjunctions of half-spaces defined by discriminants. We learn shape and appearance statistics at varying spatial scales using nonparametric density estimation. Our method can generate a rich set of shape variations by locally combining training shapes. Additionally, by studying the intensity and texture statistics around each discriminant of our shape model, we construct a local appearance probability map. Experiments carried out on both medical and natural image datasets show the potential of the proposed method.

  13. Unbiased estimation of oceanic mean rainfall from satellite borne radiometer measurements

    NASA Technical Reports Server (NTRS)

    Mittal, M. C.

    1981-01-01

    The statistical properties of the radar derived rainfall obtained during the GARP Atlantic Tropical Experiment (GATE) are used to derive quantitative estimates of the spatial and temporal sampling errors associated with estimating rainfall from brightness temperature measurements such as would be obtained from a satelliteborne microwave radiometer employing a practical size antenna aperture. A basis for a method of correcting the so called beam filling problem, i.e., for the effect of nonuniformity of rainfall over the radiometer beamwidth is provided. The method presented employs the statistical properties of the observations themselves without need for physical assumptions beyond those associated with the radiative transfer model. The simulation results presented offer a validation of the estimated accuracy that can be achieved and the graphs included permit evaluation of the effect of the antenna resolution on both the temporal and spatial sampling errors.

  14. Automated surveillance of 911 call data for detection of possible water contamination incidents

    PubMed Central

    2011-01-01

    Background Drinking water contamination, with the capability to affect large populations, poses a significant risk to public health. In recent water contamination events, the impact of contamination on public health appeared in data streams monitoring health-seeking behavior. While public health surveillance has traditionally focused on the detection of pathogens, developing methods for detection of illness from fast-acting chemicals has not been an emphasis. Methods An automated surveillance system was implemented for Cincinnati's drinking water contamination warning system to monitor health-related 911 calls in the city of Cincinnati. Incident codes indicative of possible water contamination were filtered from all 911 calls for analysis. The 911 surveillance system uses a space-time scan statistic to detect potential water contamination incidents. The frequency and characteristics of the 911 alarms over a 2.5 year period were studied. Results During the evaluation, 85 alarms occurred, although most occurred prior to the implementation of an additional alerting constraint in May 2009. Data were available for analysis approximately 48 minutes after calls indicating alarms may be generated 1-2 hours after a rapid increase in call volume. Most alerts occurred in areas of high population density. The average alarm area was 9.22 square kilometers. The average number of cases in an alarm was nine calls. Conclusions The 911 surveillance system provides timely notification of possible public health events, but did have limitations. While the alarms contained incident codes and location of the caller, additional information such as medical status was not available to assist validating the cause of the alarm. Furthermore, users indicated that a better understanding of 911 system functionality is necessary to understand how it would behave in an actual water contamination event. PMID:21450105

  15. An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data.

    PubMed

    Vu, Trung N; Valkenborg, Dirk; Smets, Koen; Verwaest, Kim A; Dommisse, Roger; Lemière, Filip; Verschoren, Alain; Goethals, Bart; Laukens, Kris

    2011-10-20

    Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.

  16. A Low-Cost Method for Multiple Disease Prediction.

    PubMed

    Bayati, Mohsen; Bhaskar, Sonia; Montanari, Andrea

    Recently, in response to the rising costs of healthcare services, employers that are financially responsible for the healthcare costs of their workforce have been investing in health improvement programs for their employees. A main objective of these so called "wellness programs" is to reduce the incidence of chronic illnesses such as cardiovascular disease, cancer, diabetes, and obesity, with the goal of reducing future medical costs. The majority of these wellness programs include an annual screening to detect individuals with the highest risk of developing chronic disease. Once these individuals are identified, the company can invest in interventions to reduce the risk of those individuals. However, capturing many biomarkers per employee creates a costly screening procedure. We propose a statistical data-driven method to address this challenge by minimizing the number of biomarkers in the screening procedure while maximizing the predictive power over a broad spectrum of diseases. Our solution uses multi-task learning and group dimensionality reduction from machine learning and statistics. We provide empirical validation of the proposed solution using data from two different electronic medical records systems, with comparisons to a statistical benchmark.

  17. Modelling a real-world buried valley system with vertical non-stationarity using multiple-point statistics

    NASA Astrophysics Data System (ADS)

    He, Xiulan; Sonnenborg, Torben O.; Jørgensen, Flemming; Jensen, Karsten H.

    2017-03-01

    Stationarity has traditionally been a requirement of geostatistical simulations. A common way to deal with non-stationarity is to divide the system into stationary sub-regions and subsequently merge the realizations for each region. Recently, the so-called partition approach that has the flexibility to model non-stationary systems directly was developed for multiple-point statistics simulation (MPS). The objective of this study is to apply the MPS partition method with conventional borehole logs and high-resolution airborne electromagnetic (AEM) data, for simulation of a real-world non-stationary geological system characterized by a network of connected buried valleys that incise deeply into layered Miocene sediments (case study in Denmark). The results show that, based on fragmented information of the formation boundaries, the MPS partition method is able to simulate a non-stationary system including valley structures embedded in a layered Miocene sequence in a single run. Besides, statistical information retrieved from the AEM data improved the simulation of the geology significantly, especially for the deep-seated buried valley sediments where borehole information is sparse.

  18. Regularized learning of linear ordered-statistic constant false alarm rate filters (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Havens, Timothy C.; Cummings, Ian; Botts, Jonathan; Summers, Jason E.

    2017-05-01

    The linear ordered statistic (LOS) is a parameterized ordered statistic (OS) that is a weighted average of a rank-ordered sample. LOS operators are useful generalizations of aggregation as they can represent any linear aggregation, from minimum to maximum, including conventional aggregations, such as mean and median. In the fuzzy logic field, these aggregations are called ordered weighted averages (OWAs). Here, we present a method for learning LOS operators from training data, viz., data for which you know the output of the desired LOS. We then extend the learning process with regularization, such that a lower complexity or sparse LOS can be learned. Hence, we discuss what 'lower complexity' means in this context and how to represent that in the optimization procedure. Finally, we apply our learning methods to the well-known constant-false-alarm-rate (CFAR) detection problem, specifically for the case of background levels modeled by long-tailed distributions, such as the K-distribution. These backgrounds arise in several pertinent imaging problems, including the modeling of clutter in synthetic aperture radar and sonar (SAR and SAS) and in wireless communications.

  19. Bayesian Sensitivity Analysis of Statistical Models with Missing Data

    PubMed Central

    ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG

    2013-01-01

    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718

  20. Normalization, bias correction, and peak calling for ChIP-seq

    PubMed Central

    Diaz, Aaron; Park, Kiyoub; Lim, Daniel A.; Song, Jun S.

    2012-01-01

    Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from the immunoprecipitation of protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches contain numerous biases which may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking. Also, multi-sample normalization still remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. Statistical methods for separating ChIP-seq signal from background noise, as well as correcting enrichment test statistics for sequence-dependent and sonication biases, are presented. Our method effectively separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most peak callers currently use a generic null model which suffers from low specificity at the sensitivity level requisite for detecting subtle, but true, ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to be capable of achieving a lower false discovery rate at a given significance threshold than current methods. PMID:22499706

  1. Hold My Calls: An Activity for Introducing the Statistical Process

    ERIC Educational Resources Information Center

    Abel, Todd; Poling, Lisa

    2015-01-01

    Working with practicing teachers, this article demonstrates, through the facilitation of a statistical activity, how to introduce and investigate the unique qualities of the statistical process including: formulate a question, collect data, analyze data, and interpret data.

  2. Health Statistics

    MedlinePlus

    ... births, deaths, marriages, and divorces are sometimes called "vital statistics." Researchers use statistics to see patterns of diseases in groups of people. This can help in figuring out who is at risk for certain diseases, finding ways to control diseases and deciding which diseases ...

  3. The Reasoning behind Informal Statistical Inference

    ERIC Educational Resources Information Center

    Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani

    2011-01-01

    Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…

  4. Properties of permutation-based gene tests and controlling type 1 error using a summary statistic based gene test

    PubMed Central

    2013-01-01

    Background The advent of genome-wide association studies has led to many novel disease-SNP associations, opening the door to focused study on their biological underpinnings. Because of the importance of analyzing these associations, numerous statistical methods have been devoted to them. However, fewer methods have attempted to associate entire genes or genomic regions with outcomes, which is potentially more useful knowledge from a biological perspective and those methods currently implemented are often permutation-based. Results One property of some permutation-based tests is that their power varies as a function of whether significant markers are in regions of linkage disequilibrium (LD) or not, which we show from a theoretical perspective. We therefore develop two methods for quantifying the degree of association between a genomic region and outcome, both of whose power does not vary as a function of LD structure. One method uses dimension reduction to “filter” redundant information when significant LD exists in the region, while the other, called the summary-statistic test, controls for LD by scaling marker Z-statistics using knowledge of the correlation matrix of markers. An advantage of this latter test is that it does not require the original data, but only their Z-statistics from univariate regressions and an estimate of the correlation structure of markers, and we show how to modify the test to protect the type 1 error rate when the correlation structure of markers is misspecified. We apply these methods to sequence data of oral cleft and compare our results to previously proposed gene tests, in particular permutation-based ones. We evaluate the versatility of the modification of the summary-statistic test since the specification of correlation structure between markers can be inaccurate. Conclusion We find a significant association in the sequence data between the 8q24 region and oral cleft using our dimension reduction approach and a borderline significant association using the summary-statistic based approach. We also implement the summary-statistic test using Z-statistics from an already-published GWAS of Chronic Obstructive Pulmonary Disorder (COPD) and correlation structure obtained from HapMap. We experiment with the modification of this test because the correlation structure is assumed imperfectly known. PMID:24199751

  5. Properties of permutation-based gene tests and controlling type 1 error using a summary statistic based gene test.

    PubMed

    Swanson, David M; Blacker, Deborah; Alchawa, Taofik; Ludwig, Kerstin U; Mangold, Elisabeth; Lange, Christoph

    2013-11-07

    The advent of genome-wide association studies has led to many novel disease-SNP associations, opening the door to focused study on their biological underpinnings. Because of the importance of analyzing these associations, numerous statistical methods have been devoted to them. However, fewer methods have attempted to associate entire genes or genomic regions with outcomes, which is potentially more useful knowledge from a biological perspective and those methods currently implemented are often permutation-based. One property of some permutation-based tests is that their power varies as a function of whether significant markers are in regions of linkage disequilibrium (LD) or not, which we show from a theoretical perspective. We therefore develop two methods for quantifying the degree of association between a genomic region and outcome, both of whose power does not vary as a function of LD structure. One method uses dimension reduction to "filter" redundant information when significant LD exists in the region, while the other, called the summary-statistic test, controls for LD by scaling marker Z-statistics using knowledge of the correlation matrix of markers. An advantage of this latter test is that it does not require the original data, but only their Z-statistics from univariate regressions and an estimate of the correlation structure of markers, and we show how to modify the test to protect the type 1 error rate when the correlation structure of markers is misspecified. We apply these methods to sequence data of oral cleft and compare our results to previously proposed gene tests, in particular permutation-based ones. We evaluate the versatility of the modification of the summary-statistic test since the specification of correlation structure between markers can be inaccurate. We find a significant association in the sequence data between the 8q24 region and oral cleft using our dimension reduction approach and a borderline significant association using the summary-statistic based approach. We also implement the summary-statistic test using Z-statistics from an already-published GWAS of Chronic Obstructive Pulmonary Disorder (COPD) and correlation structure obtained from HapMap. We experiment with the modification of this test because the correlation structure is assumed imperfectly known.

  6. Statistical characterization of portal images and noise from portal imaging systems.

    PubMed

    González-López, Antonio; Morales-Sánchez, Juan; Verdú-Monedero, Rafael; Larrey-Ruiz, Jorge

    2013-06-01

    In this paper, we consider the statistical characteristics of the so-called portal images, which are acquired prior to the radiotherapy treatment, as well as the noise that present the portal imaging systems, in order to analyze whether the well-known noise and image features in other image modalities, such as natural image, can be found in the portal imaging modality. The study is carried out in the spatial image domain, in the Fourier domain, and finally in the wavelet domain. The probability density of the noise in the spatial image domain, the power spectral densities of the image and noise, and the marginal, joint, and conditional statistical distributions of the wavelet coefficients are estimated. Moreover, the statistical dependencies between noise and signal are investigated. The obtained results are compared with practical and useful references, like the characteristics of the natural image and the white noise. Finally, we discuss the implication of the results obtained in several noise reduction methods that operate in the wavelet domain.

  7. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.

  8. Region-specific network plasticity in simulated and living cortical networks: comparison of the center of activity trajectory (CAT) with other statistics

    NASA Astrophysics Data System (ADS)

    Chao, Zenas C.; Bakkum, Douglas J.; Potter, Steve M.

    2007-09-01

    Electrically interfaced cortical networks cultured in vitro can be used as a model for studying the network mechanisms of learning and memory. Lasting changes in functional connectivity have been difficult to detect with extracellular multi-electrode arrays using standard firing rate statistics. We used both simulated and living networks to compare the ability of various statistics to quantify functional plasticity at the network level. Using a simulated integrate-and-fire neural network, we compared five established statistical methods to one of our own design, called center of activity trajectory (CAT). CAT, which depicts dynamics of the location-weighted average of spatiotemporal patterns of action potentials across the physical space of the neuronal circuitry, was the most sensitive statistic for detecting tetanus-induced plasticity in both simulated and living networks. By reducing the dimensionality of multi-unit data while still including spatial information, CAT allows efficient real-time computation of spatiotemporal activity patterns. Thus, CAT will be useful for studies in vivo or in vitro in which the locations of recording sites on multi-electrode probes are important.

  9. Are EMS call volume predictions based on demand pattern analysis accurate?

    PubMed

    Brown, Lawrence H; Lerner, E Brooke; Larmon, Baxter; LeGassick, Todd; Taigman, Michael

    2007-01-01

    Most EMS systems determine the number of crews they will deploy in their communities and when those crews will be scheduled based on anticipated call volumes. Many systems use historical data to calculate their anticipated call volumes, a method of prediction known as demand pattern analysis. To evaluate the accuracy of call volume predictions calculated using demand pattern analysis. Seven EMS systems provided 73 consecutive weeks of hourly call volume data. The first 20 weeks of data were used to calculate three common demand pattern analysis constructs for call volume prediction: average peak demand (AP), smoothed average peak demand (SAP), and 90th percentile rank (90%R). The 21st week served as a buffer. Actual call volumes in the last 52 weeks were then compared to the predicted call volumes by using descriptive statistics. There were 61,152 hourly observations in the test period. All three constructs accurately predicted peaks and troughs in call volume but not exact call volume. Predictions were accurate (+/-1 call) 13% of the time using AP, 10% using SAP, and 19% using 90%R. Call volumes were overestimated 83% of the time using AP, 86% using SAP, and 74% using 90%R. When call volumes were overestimated, predictions exceeded actual call volume by a median (Interquartile range) of 4 (2-6) calls for AP, 4 (2-6) for SAP, and 3 (2-5) for 90%R. Call volumes were underestimated 4% of time using AP, 4% using SAP, and 7% using 90%R predictions. When call volumes were underestimated, call volumes exceeded predictions by a median (Interquartile range; maximum under estimation) of 1 (1-2; 18) call for AP, 1 (1-2; 18) for SAP, and 2 (1-3; 20) for 90%R. Results did not vary between systems. Generally, demand pattern analysis estimated or overestimated call volume, making it a reasonable predictor for ambulance staffing patterns. However, it did underestimate call volume between 4% and 7% of the time. Communities need to determine if these rates of over-and underestimation are acceptable given their resources and local priorities.

  10. Calls Forecast for the Moscow Ambulance Service. The Impact of Weather Forecast

    NASA Astrophysics Data System (ADS)

    Gordin, Vladimir; Bykov, Philipp

    2015-04-01

    We use the known statistics of the calls for the current and previous days to predict them for tomorrow and for the following days. We assume that this algorithm will work operatively, will cyclically update the available information and will move the horizon of the forecast. Sure, the accuracy of such forecasts depends on their lead time, and from a choice of some group of diagnoses. For comparison we used the error of the inertial forecast (tomorrow there will be the same number of calls as today). Our technology has demonstrated accuracy that is approximately two times better compared to the inertial forecast. We obtained the following result: the number of calls depends on the actual weather in the city as well as on its rate of change. We were interested in the accuracy of the forecast for 12-hour sum of the calls in real situations. We evaluate the impact of the meteorological errors [1] on the forecast errors of the number of Ambulance calls. The weather and the Ambulance calls number both have seasonal tendencies. Therefore, if we have medical information from one city only, we should separate the impacts of such predictors as "annual variations in the number of calls" and "weather". We need to consider the seasonal tendencies (associated, e. g. with the seasonal migration of the population) and the impact of the air temperature simultaneously, rather than sequentially. We forecasted separately the number of calls with diagnoses of cardiovascular group, where it was demonstrated the advantage of the forecasting method, when we use the maximum daily air temperature as a predictor. We have a chance to evaluate statistically the influence of meteorological factors on the dynamics of medical problems. In some cases it may be useful for understanding of the physiology of disease and possible treatment options. We can assimilate some personal archives of medical parameters for the individuals with concrete diseases and the relative meteorological archive. As a result we hope to evaluate how weather can influence the intensity of the disease. Thus, the knowledge of the weather forecast for several days will help us to predict a state of health. The person will be able to take some proactive actions to avoid the anticipated worsening of his health. Literature 1. A. N. Bagrov, F. L. Bykov, V. A. Gordin. Complex Forecast of Surface Meteorological Parameters. Meteorology and Hydrology, 2014, N 5, 5-16 (Russian), 283-291 (English). 2. Bykov, Ph.L., Gordin, V.A., Objective Analysis of the Structure of Three-Dimensional Atmospheric Fronts. Izvestia of Russian Academy of Sciences. Ser. The Physics of Atmosphere and Ocean, 48 (2) (2012), 172-188 (Russian), 152-168 (English), http://dx.doi.org/10.1134/S0001433812020053 3. V.A.Gordin. Mathematical Problems and Methods in Hydrodynamical Weather Forecasting. Amsterdam etc.: Gordon & Breach Publ. House, 2000. 4. V.A.Gordin. Mathematics, Computer, Weather Forecasting, and Other Mathematical Physics' Scenarios. Moscow, Fizmatlit, 2010, 2012 (Russian).

  11. Cost-Effectiveness Analysis: a proposal of new reporting standards in statistical analysis

    PubMed Central

    Bang, Heejung; Zhao, Hongwei

    2014-01-01

    Cost-effectiveness analysis (CEA) is a method for evaluating the outcomes and costs of competing strategies designed to improve health, and has been applied to a variety of different scientific fields. Yet, there are inherent complexities in cost estimation and CEA from statistical perspectives (e.g., skewness, bi-dimensionality, and censoring). The incremental cost-effectiveness ratio that represents the additional cost per one unit of outcome gained by a new strategy has served as the most widely accepted methodology in the CEA. In this article, we call for expanded perspectives and reporting standards reflecting a more comprehensive analysis that can elucidate different aspects of available data. Specifically, we propose that mean and median-based incremental cost-effectiveness ratios and average cost-effectiveness ratios be reported together, along with relevant summary and inferential statistics as complementary measures for informed decision making. PMID:24605979

  12. Toward the detection of gravitational waves under non-Gaussian noises I. Locally optimal statistic.

    PubMed

    Yokoyama, Jun'ichi

    2014-01-01

    After reviewing the standard hypothesis test and the matched filter technique to identify gravitational waves under Gaussian noises, we introduce two methods to deal with non-Gaussian stationary noises. We formulate the likelihood ratio function under weakly non-Gaussian noises through the Edgeworth expansion and strongly non-Gaussian noises in terms of a new method we call Gaussian mapping where the observed marginal distribution and the two-body correlation function are fully taken into account. We then apply these two approaches to Student's t-distribution which has a larger tails than Gaussian. It is shown that while both methods work well in the case the non-Gaussianity is small, only the latter method works well for highly non-Gaussian case.

  13. What do consumers want to know about antibiotics? Analysis of a medicines call centre database.

    PubMed

    Hawke, Kate L; McGuire, Treasure M; Ranmuthugala, Geetha; van Driel, Mieke L

    2016-02-01

    Australia is one of the highest users of antibiotics in the developed world. This study aimed to identify consumer antibiotic information needs to improve targeting of medicines information. We conducted a retrospective, mixed-method study of consumers' antibiotic-related calls to Australia's National Prescribing Service (NPS) Medicines Line from September 2002 to June 2010. Demographic and question data were analysed, and the most common enquiry type in each age group was explored for key narrative themes. Relative antibiotic call frequencies were determined by comparing number of calls to antibiotic utilization in Australian Statistics on Medicines (ASM) data. Between 2002 and 2010, consumers made 8696 antibiotic calls to Medicines Line. The most common reason was questions about the role of their medicine (22.4%). Patient age groups differed in enquiry pattern, with more questions about lactation in the 0- to 4-year age group (33.6%), administration (5-14 years: 32.4%), interactions (15-24 years: 33.4% and 25-54 years: 23.3%) and role of the medicine (55 years and over: 26.6%). Key themes were identified for each age group. Relative to use in the community, antibiotics most likely to attract consumer calls were ciprofloxacin (18.0 calls/100,000 ASM prescriptions) and metronidazole (12.9 calls/100,000 ASM prescriptions), with higher call rates than the most commonly prescribed antibiotic amoxicillin (3.9 calls/100,000 ASM prescriptions). Consumers' knowledge gaps and concerns about antibiotics vary with age, and certain antibiotics generate greater concern relative to their usage. Clinicians should target medicines information to proactively address consumer concerns. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  14. Usage of Video Analysis of Traffic Conflicts for the Evaluation of Inappropriately Designed Building Elements on Intersections

    NASA Astrophysics Data System (ADS)

    Krivda, Vladislav; Petru, Jan

    2018-04-01

    Human society is constantly evolving in all areas. The transport is no exception – especially development of road transport is considerable and visible on all levels. However, it is necessary to look for tools that can harmonize economic and social progress with the full preservation of the environment for future generations. The so-called sustainable development of road transport is also closely related to the issue of creating safe transport infrastructure. For example, roads must be in compliance with technical rules on one hand and on the other one they have to be clearly understandable for all road participants (drivers, pedestrians, etc.). Unfortunately, even if the designer of transport construction meets the technical requirements, the result is a dangerous place where very frequently the accidents or traffic conflicts take place. There exist statistics of accidents, but on the contrary, there are not any statistics of traffic conflicts. And simply monitoring the traffic conflicts can lead to early detection of problematic places in road infrastructure and thus increasing the safety in the road traffic. There are many methods used for monitoring traffic conflicts. The method used in the Czech Republic is described in the submitted paper. The original method, which monitored mainly and only the behaviour of drivers and their mistakes, was innovated, so that it could be used also for evaluation of inappropriately designed building elements of road constructions. Therefore, the paper deals with the description of so-called Innovated Video Analysis of Traffic Conflicts (IVATC) and with the usage of this method for one particular example of intersection, where larger vehicles (buses, trucks, etc.) had problems with passing through it. It points out to the fact that even a relatively small change can lead to the increase of the road safety.

  15. Associations Between the Department of Veterans Affairs' Suicide Prevention Campaign and Calls to Related Crisis Lines

    PubMed Central

    Bossarte, Robert M.; Lu, Naiji; Tu, Xin; Stephens, Brady; Draper, John; Kemp, Janet E.

    2014-01-01

    Objective The Transit Authority Suicide Prevention (TASP) campaign was launched by the Department of Veterans Affairs (VA) in a limited number of U.S. cities to promote the use of crisis lines among veterans of military service. Methods We obtained the daily number of calls to the VCL and National Suicide Prevention Lifeline (NSPL) for six implementation cities (where the campaign was active) and four control cities (where there was no TASP campaign messaging) for a 14-month period. To identify changes in call volume associated with campaign implementation, VCL and NSPL daily call counts for three time periods of equal length (pre-campaign, during campaign, and post-campaign) were modeled using a Poisson log-linear regression with inference based on the generalized estimating equations. Results Statistically significant increases in calls to both the VCL and the NSPL were reported during the TASP campaign in implementation cities, but were not reported in control cities during or following the campaign. Secondary outcome measures were also reported for the VCL and included the percentage of callers who are veterans, and calls resulting in a rescue during the study period. Conclusions Results from this study reveal some promise for suicide prevention messaging to promote the use of telephone crisis services and contribute to an emerging area of research examining the effects of campaigns on help seeking. PMID:25364053

  16. Acoustic Mechanisms of a Species-Based Discrimination of the chick-a-dee Call in Sympatric Black-Capped (Poecile atricapillus) and Mountain Chickadees (P. gambeli).

    PubMed

    Guillette, Lauren M; Farrell, Tara M; Hoeschele, Marisa; Sturdy, Christopher B

    2010-01-01

    Previous perceptual research with black-capped and mountain chickadees has demonstrated that these species treat each other's namesake chick-a-dee calls as belonging to separate, open-ended categories. Further, the terminal dee portion of the call has been implicated as the most prominent species marker. However, statistical classification using acoustic summary features suggests that all note-types contained within the chick-a-dee call should be sufficient for species classification. The current study seeks to better understand the note-type based mechanisms underlying species-based classification of the chick-a-dee call by black-capped and mountain chickadees. In two, complementary, operant discrimination experiments, both species were trained to discriminate the species of the signaler using either entire chick-a-dee calls, or individual note-types from chick-a-dee calls. In agreement with previous perceptual work we find that the D note had significant stimulus control over species-based discrimination. However, in line with statistical classifications, we find that all note-types carry species information. We discuss reasons why the most easily discriminated note-types are likely candidates to carry species-based cues.

  17. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

    PubMed Central

    Palmer, Cameron; Pe’er, Itsik

    2016-01-01

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. PMID:27310603

  18. Chemical databases evaluated by order theoretical tools.

    PubMed

    Voigt, Kristina; Brüggemann, Rainer; Pudenz, Stefan

    2004-10-01

    Data on environmental chemicals are urgently needed to comply with the future chemicals policy in the European Union. The availability of data on parameters and chemicals can be evaluated by chemometrical and environmetrical methods. Different mathematical and statistical methods are taken into account in this paper. The emphasis is set on a new, discrete mathematical method called METEOR (method of evaluation by order theory). Application of the Hasse diagram technique (HDT) of the complete data-matrix comprising 12 objects (databases) x 27 attributes (parameters + chemicals) reveals that ECOTOX (ECO), environmental fate database (EFD) and extoxnet (EXT)--also called multi-database databases--are best. Most single databases which are specialised are found in a minimal position in the Hasse diagram; these are biocatalysis/biodegradation database (BID), pesticide database (PES) and UmweltInfo (UMW). The aggregation of environmental parameters and chemicals (equal weight) leads to a slimmer data-matrix on the attribute side. However, no significant differences are found in the "best" and "worst" objects. The whole approach indicates a rather bad situation in terms of the availability of data on existing chemicals and hence an alarming signal concerning the new and existing chemicals policies of the EEC.

  19. Dictionary-learning-based reconstruction method for electron tomography.

    PubMed

    Liu, Baodong; Yu, Hengyong; Verbridge, Scott S; Sun, Lizhi; Wang, Ge

    2014-01-01

    Electron tomography usually suffers from so-called “missing wedge” artifacts caused by limited tilt angle range. An equally sloped tomography (EST) acquisition scheme (which should be called the linogram sampling scheme) was recently applied to achieve 2.4-angstrom resolution. On the other hand, a compressive sensing inspired reconstruction algorithm, known as adaptive dictionary based statistical iterative reconstruction (ADSIR), has been reported for X-ray computed tomography. In this paper, we evaluate the EST, ADSIR, and an ordered-subset simultaneous algebraic reconstruction technique (OS-SART), and compare the ES and equally angled (EA) data acquisition modes. Our results show that OS-SART is comparable to EST, and the ADSIR outperforms EST and OS-SART. Furthermore, the equally sloped projection data acquisition mode has no advantage over the conventional equally angled mode in this context.

  20. Developing Statistical Knowledge for Teaching during Design-Based Research

    ERIC Educational Resources Information Center

    Groth, Randall E.

    2017-01-01

    Statistical knowledge for teaching is not precisely equivalent to statistics subject matter knowledge. Teachers must know how to make statistics understandable to others as well as understand the subject matter themselves. This dual demand on teachers calls for the development of viable teacher education models. This paper offers one such model,…

  1. Using GAISE and NCTM Standards as Frameworks for Teaching Probability and Statistics to Pre-Service Elementary and Middle School Mathematics Teachers

    ERIC Educational Resources Information Center

    Metz, Mary Louise

    2010-01-01

    Statistics education has become an increasingly important component of the mathematics education of today's citizens. In part to address the call for a more statistically literate citizenship, The "Guidelines for Assessment and Instruction in Statistics Education (GAISE)" were developed in 2005 by the American Statistical Association. These…

  2. Limited plasticity in the phenotypic variance-covariance matrix for male advertisement calls in the black field cricket, Teleogryllus commodus

    PubMed Central

    Pitchers, W. R.; Brooks, R.; Jennions, M. D.; Tregenza, T.; Dworkin, I.; Hunt, J.

    2013-01-01

    Phenotypic integration and plasticity are central to our understanding of how complex phenotypic traits evolve. Evolutionary change in complex quantitative traits can be predicted using the multivariate breeders’ equation, but such predictions are only accurate if the matrices involved are stable over evolutionary time. Recent work, however, suggests that these matrices are temporally plastic, spatially variable and themselves evolvable. The data available on phenotypic variance-covariance matrix (P) stability is sparse, and largely focused on morphological traits. Here we compared P for the structure of the complex sexual advertisement call of six divergent allopatric populations of the Australian black field cricket, Teleogryllus commodus. We measured a subset of calls from wild-caught crickets from each of the populations and then a second subset after rearing crickets under common-garden conditions for three generations. In a second experiment, crickets from each population were reared in the laboratory on high- and low-nutrient diets and their calls recorded. In both experiments, we estimated P for call traits and used multiple methods to compare them statistically (Flury hierarchy, geometric subspace comparisons and random skewers). Despite considerable variation in means and variances of individual call traits, the structure of P was largely conserved among populations, across generations and between our rearing diets. Our finding that P remains largely stable, among populations and between environmental conditions, suggests that selection has preserved the structure of call traits in order that they can function as an integrated unit. PMID:23530814

  3. Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.

    PubMed

    Sayyari, Erfan; Mirarab, Siavash

    2016-11-11

    Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging between quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.

  4. Appplication of statistical mechanical methods to the modeling of social networks

    NASA Astrophysics Data System (ADS)

    Strathman, Anthony Robert

    With the recent availability of large-scale social data sets, social networks have become open to quantitative analysis via the methods of statistical physics. We examine the statistical properties of a real large-scale social network, generated from cellular phone call-trace logs. We find this network, like many other social networks to be assortative (r = 0.31) and clustered (i.e., strongly transitive, C = 0.21). We measure fluctuation scaling to identify the presence of internal structure in the network and find that structural inhomogeneity effectively disappears at the scale of a few hundred nodes, though there is no sharp cutoff. We introduce an agent-based model of social behavior, designed to model the formation and dissolution of social ties. The model is a modified Metropolis algorithm containing agents operating under the basic sociological constraints of reciprocity, communication need and transitivity. The model introduces the concept of a social temperature. We go on to show that this simple model reproduces the global statistical network features (incl. assortativity, connected fraction, mean degree, clustering, and mean shortest path length) of the real network data and undergoes two phase transitions, one being from a "gas" to a "liquid" state and the second from a liquid to a glassy state as function of this social temperature.

  5. Integration of modern statistical tools for the analysis of climate extremes into the web-GIS “CLIMATE”

    NASA Astrophysics Data System (ADS)

    Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.

    2017-11-01

    The frequency of occurrence and magnitude of precipitation and temperature extreme events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrences, and mitigate their effects. For this purpose, we augmented web-GIS called “CLIMATE” to include a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage processing and visualization of distributed archives of spatial datasets. It is based on a combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing the spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes new powerful methods of time-dependent statistics of extremes, quantile regression and copula approach for the detailed analysis of various climate extreme events. Specifically, the very promising copula approach allows obtaining the structural connections between the extremes and the various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.

  6. DREAM: An Efficient Methodology for DSMC Simulation of Unsteady Processes

    NASA Astrophysics Data System (ADS)

    Cave, H. M.; Jermy, M. C.; Tseng, K. C.; Wu, J. S.

    2008-12-01

    A technique called the DSMC Rapid Ensemble Averaging Method (DREAM) for reducing the statistical scatter in the output from unsteady DSMC simulations is introduced. During post-processing by DREAM, the DSMC algorithm is re-run multiple times over a short period before the temporal point of interest thus building up a combination of time- and ensemble-averaged sampling data. The particle data is regenerated several mean collision times before the output time using the particle data generated during the original DSMC run. This methodology conserves the original phase space data from the DSMC run and so is suitable for reducing the statistical scatter in highly non-equilibrium flows. In this paper, the DREAM-II method is investigated and verified in detail. Propagating shock waves at high Mach numbers (Mach 8 and 12) are simulated using a parallel DSMC code (PDSC) and then post-processed using DREAM. The ability of DREAM to obtain the correct particle velocity distribution in the shock structure is demonstrated and the reduction of statistical scatter in the output macroscopic properties is measured. DREAM is also used to reduce the statistical scatter in the results from the interaction of a Mach 4 shock with a square cavity and for the interaction of a Mach 12 shock on a wedge in a channel.

  7. A bootstrapping method for development of Treebank

    NASA Astrophysics Data System (ADS)

    Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.

    2017-01-01

    Using statistical approaches beside the traditional methods of natural language processing could significantly improve both the quality and performance of several natural language processing (NLP) tasks. The effective usage of these approaches is subject to the availability of the informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called supertag. To this end, a hybrid method for supertagging is proposed that combines both of the generative and discriminative methods of supertagging. The method was applied on a subset of Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar that is using a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way for annotating the English sentences with the mentioned structures. The experiments show that the method could automatically annotate about 20% of WSJ with the accuracy of F-measure about 80% of which is particularly 12% higher than the F-measure of the XTAG Treebank automatically generated from the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].

  8. Level set method for image segmentation based on moment competition

    NASA Astrophysics Data System (ADS)

    Min, Hai; Wang, Xiao-Feng; Huang, De-Shuang; Jin, Jing; Wang, Hong-Zhi; Li, Hai

    2015-05-01

    We propose a level set method for image segmentation which introduces the moment competition and weakly supervised information into the energy functional construction. Different from the region-based level set methods which use force competition, the moment competition is adopted to drive the contour evolution. Here, a so-called three-point labeling scheme is proposed to manually label three independent points (weakly supervised information) on the image. Then the intensity differences between the three points and the unlabeled pixels are used to construct the force arms for each image pixel. The corresponding force is generated from the global statistical information of a region-based method and weighted by the force arm. As a result, the moment can be constructed and incorporated into the energy functional to drive the evolving contour to approach the object boundary. In our method, the force arm can take full advantage of the three-point labeling scheme to constrain the moment competition. Additionally, the global statistical information and weakly supervised information are successfully integrated, which makes the proposed method more robust than traditional methods for initial contour placement and parameter setting. Experimental results with performance analysis also show the superiority of the proposed method on segmenting different types of complicated images, such as noisy images, three-phase images, images with intensity inhomogeneity, and texture images.

  9. A Localized Ensemble Kalman Smoother

    NASA Technical Reports Server (NTRS)

    Butala, Mark D.

    2012-01-01

    Numerous geophysical inverse problems prove difficult because the available measurements are indirectly related to the underlying unknown dynamic state and the physics governing the system may involve imperfect models or unobserved parameters. Data assimilation addresses these difficulties by combining the measurements and physical knowledge. The main challenge in such problems usually involves their high dimensionality and the standard statistical methods prove computationally intractable. This paper develops and addresses the theoretical convergence of a new high-dimensional Monte-Carlo approach called the localized ensemble Kalman smoother.

  10. Characterization of Cloud Water-Content Distribution

    NASA Technical Reports Server (NTRS)

    Lee, Seungwon

    2010-01-01

    The development of realistic cloud parameterizations for climate models requires accurate characterizations of subgrid distributions of thermodynamic variables. To this end, a software tool was developed to characterize cloud water-content distributions in climate-model sub-grid scales. This software characterizes distributions of cloud water content with respect to cloud phase, cloud type, precipitation occurrence, and geo-location using CloudSat radar measurements. It uses a statistical method called maximum likelihood estimation to estimate the probability density function of the cloud water content.

  11. Can fatigue affect acquisition of new surgical skills? A prospective trial of pre- and post-call general surgery residents using the da Vinci surgical skills simulator.

    PubMed

    Robison, Weston; Patel, Sonya K; Mehta, Akshat; Senkowski, Tristan; Allen, John; Shaw, Eric; Senkowski, Christopher K

    2018-03-01

    To study the effects of fatigue on general surgery residents' performance on the da Vinci Skills Simulator (dVSS). 15 General Surgery residents from various postgraduate training years (PGY2, PGY3, PGY4, and PGY5) performed 5 simulation tasks on the dVSS as recommended by the Robotic Training Network (RTN). The General Surgery residents had no prior experience with the dVSS. Participants were assigned to either the Pre-call group or Post-call group based on call schedule. As a measure of subjective fatigue, residents were given the Epworth Sleepiness Scale (ESS) prior to their dVSS testing. The dVSS MScore™ software recorded various metrics (Objective Structured Assessment of Technical Skills, OSATS) that were used to evaluate the performance of each resident to compare the robotic simulation proficiency between the Pre-call and Post-call groups. Six general surgery residents were stratified into the Pre-call group and nine into the Post-call group. These residents were also stratified into Fatigued (10) or Nonfatigued (5) groups, as determined by their reported ESS scores. A statistically significant difference was found between the Pre-call and Post-call reported sleep hours (p = 0.036). There was no statistically significant difference between the Pre-call and Post-call groups or between the Fatigued and Nonfatigued groups in time to complete exercise, number of attempts, and high MScore™ score. Despite variation in fatigue levels, there was no effect on the acquisition of robotic simulator skills.

  12. Multiband tissue classification for ultrasonic transmission tomography using spectral profile detection

    NASA Astrophysics Data System (ADS)

    Jeong, Jeong-Won; Kim, Tae-Seong; Shin, Dae-Chul; Do, Synho; Marmarelis, Vasilis Z.

    2004-04-01

    Recently it was shown that soft tissue can be differentiated with spectral unmixing and detection methods that utilize multi-band information obtained from a High-Resolution Ultrasonic Transmission Tomography (HUTT) system. In this study, we focus on tissue differentiation using the spectral target detection method based on Constrained Energy Minimization (CEM). We have developed a new tissue differentiation method called "CEM filter bank". Statistical inference on the output of each CEM filter of a filter bank is used to make a decision based on the maximum statistical significance rather than the magnitude of each CEM filter output. We validate this method through 3-D inter/intra-phantom soft tissue classification where target profiles obtained from an arbitrary single slice are used for differentiation in multiple tomographic slices. Also spectral coherence between target and object profiles of an identical tissue at different slices and phantoms is evaluated by conventional cross-correlation analysis. The performance of the proposed classifier is assessed using Receiver Operating Characteristic (ROC) analysis. Finally we apply our method to classify tiny structures inside a beef kidney such as Styrofoam balls (~1mm), chicken tissue (~5mm), and vessel-duct structures.

  13. A data recipient centered de-identification method to retain statistical attributes.

    PubMed

    Gal, Tamas S; Tucker, Thomas C; Gangopadhyay, Aryya; Chen, Zhiyuan

    2014-08-01

    Privacy has always been a great concern of patients and medical service providers. As a result of the recent advances in information technology and the government's push for the use of Electronic Health Record (EHR) systems, a large amount of medical data is collected and stored electronically. This data needs to be made available for analysis but at the same time patient privacy has to be protected through de-identification. Although biomedical researchers often describe their research plans when they request anonymized data, most existing anonymization methods do not use this information when de-identifying the data. As a result, the anonymized data may not be useful for the planned research project. This paper proposes a data recipient centered approach to tailor the de-identification method based on input from the recipient of the data. We demonstrate our approach through an anonymization project for biomedical researchers with specific goals to improve the utility of the anonymized data for statistical models used for their research project. The selected algorithm improves a privacy protection method called Condensation by Aggarwal et al. Our methods were tested and validated on real cancer surveillance data provided by the Kentucky Cancer Registry. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.

    PubMed

    Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger

    2017-09-01

    The coefficient of determination R 2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R 2 for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R 2 that we called [Formula: see text] for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments. © 2017 The Author(s).

  15. Site specific probability of passive acoustic detection of humpback whale calls from single fixed hydrophones.

    PubMed

    Helble, Tyler A; D'Spain, Gerald L; Hildebrand, John A; Campbell, Gregory S; Campbell, Richard L; Heaney, Kevin D

    2013-09-01

    Passive acoustic monitoring of marine mammal calls is an increasingly important method for assessing population numbers, distribution, and behavior. A common mistake in the analysis of marine mammal acoustic data is formulating conclusions about these animals without first understanding how environmental properties such as bathymetry, sediment properties, water column sound speed, and ocean acoustic noise influence the detection and character of vocalizations in the acoustic data. The approach in this paper is to use Monte Carlo simulations with a full wave field acoustic propagation model to characterize the site specific probability of detection of six types of humpback whale calls at three passive acoustic monitoring locations off the California coast. Results show that the probability of detection can vary by factors greater than ten when comparing detections across locations, or comparing detections at the same location over time, due to environmental effects. Effects of uncertainties in the inputs to the propagation model are also quantified, and the model accuracy is assessed by comparing calling statistics amassed from 24,690 humpback units recorded in the month of October 2008. Under certain conditions, the probability of detection can be estimated with uncertainties sufficiently small to allow for accurate density estimates.

  16. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

    Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complimentary methods that strengthen the overall assessment of spill risks.

  17. Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.

    PubMed

    Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R

    2016-12-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  18. The New Maia Detector System: Methods For High Definition Trace Element Imaging Of Natural Material

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryan, C. G.; School of Physics, University of Melbourne, Parkville VIC; CODES Centre of Excellence, University of Tasmania, Hobart TAS

    2010-04-06

    Motivated by the need for megapixel high definition trace element imaging to capture intricate detail in natural material, together with faster acquisition and improved counting statistics in elemental imaging, a large energy-dispersive detector array called Maia has been developed by CSIRO and BNL for SXRF imaging on the XFM beamline at the Australian Synchrotron. A 96 detector prototype demonstrated the capacity of the system for real-time deconvolution of complex spectral data using an embedded implementation of the Dynamic Analysis method and acquiring highly detailed images up to 77 M pixels spanning large areas of complex mineral sample sections.

  19. The New Maia Detector System: Methods For High Definition Trace Element Imaging Of Natural Material

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryan, C.G.; Siddons, D.P.; Kirkham, R.

    2010-05-25

    Motivated by the need for megapixel high definition trace element imaging to capture intricate detail in natural material, together with faster acquisition and improved counting statistics in elemental imaging, a large energy-dispersive detector array called Maia has been developed by CSIRO and BNL for SXRF imaging on the XFM beamline at the Australian Synchrotron. A 96 detector prototype demonstrated the capacity of the system for real-time deconvolution of complex spectral data using an embedded implementation of the Dynamic Analysis method and acquiring highly detailed images up to 77 M pixels spanning large areas of complex mineral sample sections.

  20. ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations

    PubMed Central

    Wright, Mark H.; Tung, Chih-Wei; Zhao, Keyan; Reynolds, Andy; McCouch, Susan R.; Bustamante, Carlos D.

    2010-01-01

    Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is current methods for automated calling of genotypes are based on clustering approaches which require a large number of samples to be analyzed simultaneously, or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly due to the inability to clearly identify a heterozygote cluster. Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’ based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per sample basis allowing accurate genotyping of both inbred and heterozygous samples even when analyzed simultaneously. Since clustering is not used explicitly, ALCHEMY performs well on small sample sizes with accuracy exceeding 99% with as few as 18 samples. Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/ Contact: mhw6@cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20926420

  1. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.

  2. Flux control coefficients determined by inhibitor titration: the design and analysis of experiments to minimize errors.

    PubMed Central

    Small, J R

    1993-01-01

    This paper is a study into the effects of experimental error on the estimated values of flux control coefficients obtained using specific inhibitors. Two possible techniques for analysing the experimental data are compared: a simple extrapolation method (the so-called graph method) and a non-linear function fitting method. For these techniques, the sources of systematic errors are identified and the effects of systematic and random errors are quantified, using both statistical analysis and numerical computation. It is shown that the graph method is very sensitive to random errors and, under all conditions studied, that the fitting method, even under conditions where the assumptions underlying the fitted function do not hold, outperformed the graph method. Possible ways of designing experiments to minimize the effects of experimental errors are analysed and discussed. PMID:8257434

  3. ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.

    PubMed

    Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka

    2015-01-01

    Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.

  4. Testing alternative ground water models using cross-validation and other methods

    USGS Publications Warehouse

    Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.

    2007-01-01

    Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS; the poor performance was expected because DSS does not include the effects of parameter correlation and PCC reveals large parameter correlations. ?? 2007 National Ground Water Association.

  5. Spatio-Chromatic Adaptation via Higher-Order Canonical Correlation Analysis of Natural Images

    PubMed Central

    Gutmann, Michael U.; Laparra, Valero; Hyvärinen, Aapo; Malo, Jesús

    2014-01-01

    Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Canonical correlation analysis explains similarly chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method which combines the desirable properties of independent component and canonical correlation analysis: It finds independent components in each data set which, across the two data sets, are related to each other via linear or higher-order correlations. The new method is as widely applicable as canonical correlation analysis, and also to more than two data sets. We call it higher-order canonical correlation analysis. When applied to chromatic natural images, we found that it provides a single (unified) statistical framework which accounts for both spatio-chromatic processing and adaptation. Filters with spatio-chromatic tuning properties as in the primary visual cortex emerged and corresponding-colors psychophysics was reproduced reasonably well. We used the new method to make a theory-driven testable prediction on how the neural response to colored patterns should change when the illumination changes. We predict shifts in the responses which are comparable to the shifts reported for chromatic contrast habituation. PMID:24533049

  6. Spatio-chromatic adaptation via higher-order canonical correlation analysis of natural images.

    PubMed

    Gutmann, Michael U; Laparra, Valero; Hyvärinen, Aapo; Malo, Jesús

    2014-01-01

    Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Canonical correlation analysis explains similarly chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method which combines the desirable properties of independent component and canonical correlation analysis: It finds independent components in each data set which, across the two data sets, are related to each other via linear or higher-order correlations. The new method is as widely applicable as canonical correlation analysis, and also to more than two data sets. We call it higher-order canonical correlation analysis. When applied to chromatic natural images, we found that it provides a single (unified) statistical framework which accounts for both spatio-chromatic processing and adaptation. Filters with spatio-chromatic tuning properties as in the primary visual cortex emerged and corresponding-colors psychophysics was reproduced reasonably well. We used the new method to make a theory-driven testable prediction on how the neural response to colored patterns should change when the illumination changes. We predict shifts in the responses which are comparable to the shifts reported for chromatic contrast habituation.

  7. Solar granulation and statistical crystallography: A modeling approach using size-shape relations

    NASA Technical Reports Server (NTRS)

    Noever, D. A.

    1994-01-01

    The irregular polygonal pattern of solar granulation is analyzed for size-shape relations using statistical crystallography. In contrast to previous work which has assumed perfectly hexagonal patterns for granulation, more realistic accounting of cell (granule) shapes reveals a broader basis for quantitative analysis. Several features emerge as noteworthy: (1) a linear correlation between number of cell-sides and neighboring shapes (called Aboav-Weaire's law); (2) a linear correlation between both average cell area and perimeter and the number of cell-sides (called Lewis's law and a perimeter law, respectively) and (3) a linear correlation between cell area and squared perimeter (called convolution index). This statistical picture of granulation is consistent with a finding of no correlation in cell shapes beyond nearest neighbors. A comparative calculation between existing model predictions taken from luminosity data and the present analysis shows substantial agreements for cell-size distributions. A model for understanding grain lifetimes is proposed which links convective times to cell shape using crystallographic results.

  8. Exploratory analysis of real personal emergency response call conversations: considerations for personal emergency response spoken dialogue systems.

    PubMed

    Young, Victoria; Rochon, Elizabeth; Mihailidis, Alex

    2016-11-14

    The purpose of this study was to derive data from real, recorded, personal emergency response call conversations to help improve the artificial intelligence and decision making capability of a spoken dialogue system in a smart personal emergency response system. The main study objectives were to: develop a model of personal emergency response; determine categories for the model's features; identify and calculate measures from call conversations (verbal ability, conversational structure, timing); and examine conversational patterns and relationships between measures and model features applicable for improving the system's ability to automatically identify call model categories and predict a target response. This study was exploratory and used mixed methods. Personal emergency response calls were pre-classified according to call model categories identified qualitatively from response call transcripts. The relationships between six verbal ability measures, three conversational structure measures, two timing measures and three independent factors: caller type, risk level, and speaker type, were examined statistically. Emergency medical response services were the preferred response for the majority of medium and high risk calls for both caller types. Older adult callers mainly requested non-emergency medical service responders during medium risk situations. By measuring the number of spoken words-per-minute and turn-length-in-words for the first spoken utterance of a call, older adult and care provider callers could be identified with moderate accuracy. Average call taker response time was calculated using the number-of-speaker-turns and time-in-seconds measures. Care providers and older adults used different conversational strategies when responding to call takers. The words 'ambulance' and 'paramedic' may hold different latent connotations for different callers. The data derived from the real personal emergency response recordings may help a spoken dialogue system classify incoming calls by caller type with moderate probability shortly after the initial caller utterance. Knowing the caller type, the target response for the call may be predicted with some degree of probability and the output dialogue could be tailored to this caller type. The average call taker response time measured from real calls may be used to limit the conversation length in a spoken dialogue system before defaulting to a live call taker.

  9. Saturation analysis of ChIP-seq data for reproducible identification of binding peaks

    PubMed Central

    Hansen, Peter; Hecht, Jochen; Ibrahim, Daniel M.; Krannich, Alexander; Truss, Matthias; Robinson, Peter N.

    2015-01-01

    Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA binding proteins. Computational ChIP-seq peak calling infers the location of protein–DNA interactions based on various measures of enrichment of sequence reads. In this work, we introduce an algorithm, Q, that uses an assessment of the quadratic enrichment of reads to center candidate peaks followed by statistical analysis of saturation of candidate peaks by 5′ ends of reads. We show that our method not only is substantially faster than several competing methods but also demonstrates statistically significant advantages with respect to reproducibility of results and in its ability to identify peaks with reproducible binding site motifs. We show that Q has superior performance in the delineation of double RNAPII and H3K4me3 peaks surrounding transcription start sites related to a better ability to resolve individual peaks. The method is implemented in C+l+ and is freely available under an open source license. PMID:26163319

  10. Toward the detection of gravitational waves under non-Gaussian noises I. Locally optimal statistic

    PubMed Central

    YOKOYAMA, Jun’ichi

    2014-01-01

    After reviewing the standard hypothesis test and the matched filter technique to identify gravitational waves under Gaussian noises, we introduce two methods to deal with non-Gaussian stationary noises. We formulate the likelihood ratio function under weakly non-Gaussian noises through the Edgeworth expansion and strongly non-Gaussian noises in terms of a new method we call Gaussian mapping where the observed marginal distribution and the two-body correlation function are fully taken into account. We then apply these two approaches to Student’s t-distribution which has a larger tails than Gaussian. It is shown that while both methods work well in the case the non-Gaussianity is small, only the latter method works well for highly non-Gaussian case. PMID:25504231

  11. Cause-specific mortality time series analysis: a general method to detect and correct for abrupt data production changes

    PubMed Central

    2011-01-01

    Background Monitoring the time course of mortality by cause is a key public health issue. However, several mortality data production changes may affect cause-specific time trends, thus altering the interpretation. This paper proposes a statistical method that detects abrupt changes ("jumps") and estimates correction factors that may be used for further analysis. Methods The method was applied to a subset of the AMIEHS (Avoidable Mortality in the European Union, toward better Indicators for the Effectiveness of Health Systems) project mortality database and considered for six European countries and 13 selected causes of deaths. For each country and cause of death, an automated jump detection method called Polydect was applied to the log mortality rate time series. The plausibility of a data production change associated with each detected jump was evaluated through literature search or feedback obtained from the national data producers. For each plausible jump position, the statistical significance of the between-age and between-gender jump amplitude heterogeneity was evaluated by means of a generalized additive regression model, and correction factors were deduced from the results. Results Forty-nine jumps were detected by the Polydect method from 1970 to 2005. Most of the detected jumps were found to be plausible. The age- and gender-specific amplitudes of the jumps were estimated when they were statistically heterogeneous, and they showed greater by-age heterogeneity than by-gender heterogeneity. Conclusion The method presented in this paper was successfully applied to a large set of causes of death and countries. The method appears to be an alternative to bridge coding methods when the latter are not systematically implemented because they are time- and resource-consuming. PMID:21929756

  12. Bayesian learning

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1989-01-01

    In 1983 and 1984, the Infrared Astronomical Satellite (IRAS) detected 5,425 stellar objects and measured their infrared spectra. In 1987 a program called AUTOCLASS used Bayesian inference methods to discover the classes present in these data and determine the most probable class of each object, revealing unknown phenomena in astronomy. AUTOCLASS has rekindled the old debate on the suitability of Bayesian methods, which are computationally intensive, interpret probabilities as plausibility measures rather than frequencies, and appear to depend on a subjective assessment of the probability of a hypothesis before the data were collected. Modern statistical methods have, however, recently been shown to also depend on subjective elements. These debates bring into question the whole tradition of scientific objectivity and offer scientists a new way to take responsibility for their findings and conclusions.

  13. Quantitative trait nucleotide analysis using Bayesian model selection.

    PubMed

    Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D

    2005-10-01

    Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.

  14. A STATISTICAL SURVEY OF DIOXIN-LIKE COMPOUNDS IN ...

    EPA Pesticide Factsheets

    The USEPA and the USDA completed the first statistically designed survey of the occurrence and concentration of dibenzo-p-dioxins (CDDs), dibenzofurans (CDFs), and coplanar polychlorinated biphenyls (PCBs) in the fat of beef animals raised for human consumption in the United States. Back fat was sampled from 63 carcasses at federally inspected slaughter establishments nationwide. The sample design called for sampling beef animal classes in proportion to national annual slaughter statistics. All samples were analyzed using a modification of EPA method 1613, using isotope dilution, High Resolution GC/MS to determine the rate of occurrence of 2,3,7,8-substituted CDDs/CDFs/PCBs. The method detection limits ranged from 0.05 ng/kg for TCDD to 3 ng/kg for OCDD. The results of this survey showed a mean concentration (reported as I-TEQ, lipid adjusted) in U.S. beef animals of 0.35 ng/kg and 0.89 ng/kg for CDD/CDF TEQs when either non-detects are treated as 0 value or assigned a value of 1/2 the detection limit, respectively, and 0.51 ng/kg for coplanar PCB TEQs at both non-detect equal 0 and 1/2 detection limit. journal article

  15. A Varying Coefficient Model to Measure the Effectiveness of Mass Media Anti-Smoking Campaigns in Generating Calls to a Quitline

    PubMed Central

    Bui, Quang M.; Huggins, Richard M.; Hwang, Wen-Han; White, Victoria; Erbas, Bircan

    2010-01-01

    Background Anti-smoking advertisements are an effective population-based smoking reduction strategy. The Quitline telephone service provides a first point of contact for adults considering quitting. Because of data complexity, the relationship between anti-smoking advertising placement, intensity, and time trends in total call volume is poorly understood. In this study we use a recently developed semi-varying coefficient model to elucidate this relationship. Methods Semi-varying coefficient models comprise parametric and nonparametric components. The model is fitted to the daily number of calls to Quitline in Victoria, Australia to estimate a nonparametric long-term trend and parametric terms for day-of-the-week effects and to clarify the relationship with target audience rating points (TARPs) for the Quit and nicotine replacement advertising campaigns. Results The number of calls to Quitline increased with the TARP value of both the Quit and other smoking cessation advertisement; the TARP values associated with the Quit program were almost twice as effective. The varying coefficient term was statistically significant for peak periods with little or no advertising. Conclusions Semi-varying coefficient models are useful for modeling public health data when there is little or no information on other factors related to the at-risk population. These models are well suited to modeling call volume to Quitline, because the varying coefficient allowed the underlying time trend to depend on fixed covariates that also vary with time, thereby explaining more of the variation in the call model. PMID:20827036

  16. Stretching and Joint Mobilization Exercises Reduce Call-Center Operators’ Musculoskeletal Discomfort and Fatigue

    PubMed Central

    de Castro Lacaze, Denise Helena; Sacco, Isabel de C. N.; Rocha, Lys Esther; de Bragança Pereira, Carlos Alberto; Casarotto, Raquel Aparecida

    2010-01-01

    AIM: We sought to evaluate musculoskeletal discomfort and mental and physical fatigue in the call-center workers of an airline company before and after a supervised exercise program compared with rest breaks during the work shift. INTRODUCTION: This was a longitudinal pilot study conducted in a flight-booking call-center for an airline in São Paulo, Brazil. Occupational health activities are recommended to decrease the negative effects of the call-center working conditions. In practice, exercise programs are commonly recommended for computer workers, but their effects have not been studied in call-center operators. METHODS: Sixty-four call-center operators participated in this study. Thirty-two subjects were placed into the experimental group and attended a 10-min daily exercise session for 2 months. Conversely, 32 participants were placed into the control group and took a 10-min daily rest break during the same period. Each subject was evaluated once a week by means of the Corlett-Bishop body map with a visual analog discomfort scale and the Chalder fatigue questionnaire. RESULTS: Musculoskeletal discomfort decreased in both groups, but the reduction was only statistically significant for the spine and buttocks (p=0.04) and the sum of the segments (p=0.01) in the experimental group. In addition, the experimental group showed significant differences in the level of mental fatigue, especially in questions related to memory Rienzo, #181ff and tiredness (p=0.001). CONCLUSIONS: Our preliminary results demonstrate that appropriately designed and supervised exercise programs may be more efficient than rest breaks in decreasing discomfort and fatigue levels in call-center operators. PMID:20668622

  17. Transmit Designs for the MIMO Broadcast Channel With Statistical CSI

    NASA Astrophysics Data System (ADS)

    Wu, Yongpeng; Jin, Shi; Gao, Xiqi; McKay, Matthew R.; Xiao, Chengshan

    2014-09-01

    We investigate the multiple-input multiple-output broadcast channel with statistical channel state information available at the transmitter. The so-called linear assignment operation is employed, and necessary conditions are derived for the optimal transmit design under general fading conditions. Based on this, we introduce an iterative algorithm to maximize the linear assignment weighted sum-rate by applying a gradient descent method. To reduce complexity, we derive an upper bound of the linear assignment achievable rate of each receiver, from which a simplified closed-form expression for a near-optimal linear assignment matrix is derived. This reveals an interesting construction analogous to that of dirty-paper coding. In light of this, a low complexity transmission scheme is provided. Numerical examples illustrate the significant performance of the proposed low complexity scheme.

  18. Assimilating NOAA SST data into BSH operational circulation model for North and Baltic Seas

    NASA Astrophysics Data System (ADS)

    Losa, Svetlana; Schroeter, Jens; Nerger, Lars; Janjic, Tijana; Danilov, Sergey; Janssen, Frank

    A data assimilation (DA) system is developed for BSH operational circulation model in order to improve forecast of current velocities, sea surface height, temperature and salinity in the North and Baltic Seas. Assimilated data are NOAA sea surface temperature (SST) data for the following period: 01.10.07 -30.09.08. All data assimilation experiments are based on im-plementation of one of the so-called statistical DA methods -Singular Evolutive Interpolated Kalman (SEIK) filter, -with different ways of prescribing assumed model and data errors statis-tics. Results of the experiments will be shown and compared against each other. Hydrographic data from MARNET stations and sea level at series of tide gauges are used as independent information to validate the data assimilation system. Keywords: Operational Oceanography and forecasting

  19. The Extended Erlang-Truncated Exponential distribution: Properties and application to rainfall data.

    PubMed

    Okorie, I E; Akpanta, A C; Ohakwe, J; Chikezie, D C

    2017-06-01

    The Erlang-Truncated Exponential ETE distribution is modified and the new lifetime distribution is called the Extended Erlang-Truncated Exponential EETE distribution. Some statistical and reliability properties of the new distribution are given and the method of maximum likelihood estimate was proposed for estimating the model parameters. The usefulness and flexibility of the EETE distribution was illustrated with an uncensored data set and its fit was compared with that of the ETE and three other three-parameter distributions. Results based on the minimized log-likelihood ([Formula: see text]), Akaike information criterion (AIC), Bayesian information criterion (BIC) and the generalized Cramér-von Mises [Formula: see text] statistics shows that the EETE distribution provides a more reasonable fit than the one based on the other competing distributions.

  20. A method for developing design diagrams for ceramic and glass materials using fatigue data

    NASA Technical Reports Server (NTRS)

    Heslin, T. M.; Magida, M. B.; Forrest, K. A.

    1986-01-01

    The service lifetime of glass and ceramic materials can be expressed as a plot of time-to-failure versus applied stress whose plot is parametric in percent probability of failure. This type of plot is called a design diagram. Confidence interval estimates for such plots depend on the type of test that is used to generate the data, on assumptions made concerning the statistical distribution of the test results, and on the type of analysis used. This report outlines the development of design diagrams for glass and ceramic materials in engineering terms using static or dynamic fatigue tests, assuming either no particular statistical distribution of test results or a Weibull distribution and using either median value or homologous ratio analysis of the test results.

  1. Statistical Analysis of Complexity Generators for Cost Estimation

    NASA Technical Reports Server (NTRS)

    Rowell, Ginger Holmes

    1999-01-01

    Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.

  2. Bayesian statistics in radionuclide metrology: measurement of a decaying source

    NASA Astrophysics Data System (ADS)

    Bochud, François O.; Bailat, Claude J.; Laedermann, Jean-Pascal

    2007-08-01

    The most intuitive way of defining a probability is perhaps through the frequency at which it appears when a large number of trials are realized in identical conditions. The probability derived from the obtained histogram characterizes the so-called frequentist or conventional statistical approach. In this sense, probability is defined as a physical property of the observed system. By contrast, in Bayesian statistics, a probability is not a physical property or a directly observable quantity, but a degree of belief or an element of inference. The goal of this paper is to show how Bayesian statistics can be used in radionuclide metrology and what its advantages and disadvantages are compared with conventional statistics. This is performed through the example of an yttrium-90 source typically encountered in environmental surveillance measurement. Because of the very low activity of this kind of source and the small half-life of the radionuclide, this measurement takes several days, during which the source decays significantly. Several methods are proposed to compute simultaneously the number of unstable nuclei at a given reference time, the decay constant and the background. Asymptotically, all approaches give the same result. However, Bayesian statistics produces coherent estimates and confidence intervals in a much smaller number of measurements. Apart from the conceptual understanding of statistics, the main difficulty that could deter radionuclide metrologists from using Bayesian statistics is the complexity of the computation.

  3. Development and evaluation of statistical shape modeling for principal inner organs on torso CT images.

    PubMed

    Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi

    2014-07-01

    The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs in different patients. In this study, we developed a universal scheme that can be used for building the statistical shape models for different inner organs efficiently. This scheme combines the traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys by use of 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfied the "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.

  4. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoter in human genome play an important role in DNA sequence analysis. Entropy, in Shannon sense, of information theory is a multiple utility in bioinformatic details analysis. The relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features to distinguish different regions of DNA sequences. In this paper, we choose context feature and use a set of methods of SD to select the most effective n-mers distinguishing promoter regions from other DNA regions in human genome. Extracted from the total possible combinations of n-mers, we can get four sparse distributions based on promoter and non-promoters training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specially, we combine the advantage of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep feature for promoter recognition. And then we apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. Framework is flexible that it can integrate new feature extraction or new classification models freely. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. OPTIMIZING THROUGH CO-EVOLUTIONARY AVALANCHES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    S. BOETTCHER; A. PERCUS

    2000-08-01

    We explore a new general-purpose heuristic for finding high-quality solutions to hard optimization problems. The method, called extremal optimization, is inspired by ''self-organized critically,'' a concept introduced to describe emergent complexity in many physical systems. In contrast to Genetic Algorithms which operate on an entire ''gene-pool'' of possible solutions, extremal optimization successively replaces extremely undesirable elements of a sub-optimal solution with new, random ones. Large fluctuations, called ''avalanches,'' ensue that efficiently explore many local optima. Drawing upon models used to simulate far-from-equilibrium dynamics, extremal optimization complements approximation methods inspired by equilibrium statistical physics, such as simulated annealing. With only onemore » adjustable parameter, its performance has proved competitive with more elaborate methods, especially near phase transitions. Those phase transitions are found in the parameter space of most optimization problems, and have recently been conjectured to be the origin of some of the hardest instances in computational complexity. We will demonstrate how extremal optimization can be implemented for a variety of combinatorial optimization problems. We believe that extremal optimization will be a useful tool in the investigation of phase transitions in combinatorial optimization problems, hence valuable in elucidating the origin of computational complexity.« less

  6. Prioritising prevention: implementation of IGT Care Call, a telephone based service for people at risk of developing type 2 diabetes.

    PubMed

    Savas, Linda Ann; Grady, Katherine; Cotterill, Sarah; Summers, Lucinda; Boaden, Ruth; Gibson, J Martin

    2015-02-01

    To design, deliver and evaluate IGT Care Call, a telephone service providing a 6 month lifestyle education programme for people with impaired glucose tolerance (IGT). An observational study of IGT Care Call, a programme providing motivational support and education using electronic scripts. The service was delivered to 55 participants, all of whom completed the course (an information pack and at least five telephone calls over 6 months). Clinical measurements were undertaken in General Practice at baseline, on completion of the programme and one year later. Among the 40 participants for whom we have complete data available, one year after discharge, participants showed improvements in fasting plasma glucose (0.29 mmol/l, 95% CI 0.07 to 0.51), weight (2.81 kg, 95% CI 1.20 to 4.42) and BMI (1.06 kg/m(2), 95% CI 0.49 to 1.63). All differences were statistically significant (p < 0.01). Whilst an uncontrolled observational study with a small sample size, this pilot suggests IGT Care Call may be effective in promoting positive and sustained lifestyle changes to prevent type 2 diabetes, which warrants further investigation. A telephone method of service delivery was acceptable, convenient and may have improved self confidence in how to reduce risk of type 2 diabetes. Copyright © 2014 Primary Care Diabetes Europe. Published by Elsevier Ltd. All rights reserved.

  7. An Example of How to Supplement Goal Setting to Promote Behavior Change for Families Using Motivational Interviewing.

    PubMed

    Draxten, Michelle; Flattum, Colleen; Fulkerson, Jayne

    2016-10-01

    The purpose of this study was to describe the components and use of motivational interviewing (MI) within a behavior change intervention to promote healthful eating and family meals and prevent childhood obesity. The Healthy Home Offerings via the Mealtime Environment (HOME) Plus intervention was part of a two-arm randomized-controlled trial and included 81 families (children 8-12 years old and their parents) in the intervention condition. The intervention included 10 monthly, 2-hour group sessions and 5 bimonthly motivational/goal-setting phone calls. Data were collected for intervention families only at each of the goal-setting calls and a behavior change assessment was administered at the 10th/final group session. Descriptive statistics were used to analyze the MI call data and behavior assessment. Overall group attendance was high (68% attending ≥7 sessions). Motivational/goal-setting phone calls were well accepted by parents, with an 87% average completion rate. More than 85% of the time, families reported meeting their chosen goal between calls. Families completing the behavioral assessment reported the most change in having family meals more often and improving home food healthfulness. Researchers should use a combination of delivery methods using MI when implementing behavior change programs for families to promote goal setting and healthful eating within pediatric obesity interventions.

  8. InSAR Tropospheric Correction Methods: A Statistical Comparison over Different Regions

    NASA Astrophysics Data System (ADS)

    Bekaert, D. P.; Walters, R. J.; Wright, T. J.; Hooper, A. J.; Parker, D. J.

    2015-12-01

    Observing small magnitude surface displacements through InSAR is highly challenging, and requires advanced correction techniques to reduce noise. In fact, one of the largest obstacles facing the InSAR community is related to tropospheric noise correction. Spatial and temporal variations in temperature, pressure, and relative humidity result in a spatially-variable InSAR tropospheric signal, which masks smaller surface displacements due to tectonic or volcanic deformation. Correction methods applied today include those relying on weather model data, GNSS and/or spectrometer data. Unfortunately, these methods are often limited by the spatial and temporal resolution of the auxiliary data. Alternatively a correction can be estimated from the high-resolution interferometric phase by assuming a linear or a power-law relationship between the phase and topography. For these methods, the challenge lies in separating deformation from tropospheric signals. We will present results of a statistical comparison of the state-of-the-art tropospheric corrections estimated from spectrometer products (MERIS and MODIS), a low and high spatial-resolution weather model (ERA-I and WRF), and both the conventional linear and power-law empirical methods. We evaluate the correction capability over Southern Mexico, Italy, and El Hierro, and investigate the impact of increasing cloud cover on the accuracy of the tropospheric delay estimation. We find that each method has its strengths and weaknesses, and suggest that further developments should aim to combine different correction methods. All the presented methods are included into our new open source software package called TRAIN - Toolbox for Reducing Atmospheric InSAR Noise (Bekaert et al., in review), which is available to the community Bekaert, D., R. Walters, T. Wright, A. Hooper, and D. Parker (in review), Statistical comparison of InSAR tropospheric correction techniques, Remote Sensing of Environment

  9. Improving the power of clinical trials of rheumatoid arthritis by using data on continuous scales when analysing response rates: an application of the augmented binary method

    PubMed Central

    Jenkins, Martin

    2016-01-01

    Objective. In clinical trials of RA, it is common to assess effectiveness using end points based upon dichotomized continuous measures of disease activity, which classify patients as responders or non-responders. Although dichotomization generally loses statistical power, there are good clinical reasons to use these end points; for example, to allow for patients receiving rescue therapy to be assigned as non-responders. We adopt a statistical technique called the augmented binary method to make better use of the information provided by these continuous measures and account for how close patients were to being responders. Methods. We adapted the augmented binary method for use in RA clinical trials. We used a previously published randomized controlled trial (Oral SyK Inhibition in Rheumatoid Arthritis-1) to assess its performance in comparison to a standard method treating patients purely as responders or non-responders. The power and error rate were investigated by sampling from this study. Results. The augmented binary method reached similar conclusions to standard analysis methods but was able to estimate the difference in response rates to a higher degree of precision. Results suggested that CI widths for ACR responder end points could be reduced by at least 15%, which could equate to reducing the sample size of a study by 29% to achieve the same statistical power. For other end points, the gain was even higher. Type I error rates were not inflated. Conclusion. The augmented binary method shows considerable promise for RA trials, making more efficient use of patient data whilst still reporting outcomes in terms of recognized response end points. PMID:27338084

  10. Statistical analysis of CCSN/SS7 traffic data from working CCS subnetworks

    NASA Astrophysics Data System (ADS)

    Duffy, Diane E.; McIntosh, Allen A.; Rosenstein, Mark; Willinger, Walter

    1994-04-01

    In this paper, we report on an ongoing statistical analysis of actual CCSN traffic data. The data consist of approximately 170 million signaling messages collected from a variety of different working CCS subnetworks. The key findings from our analysis concern: (1) the characteristics of both the telephone call arrival process and the signaling message arrival process; (2) the tail behavior of the call holding time distribution; and (3) the observed performance of the CCSN with respect to a variety of performance and reliability measurements.

  11. CALL versus Paper: In Which Context Are L1 Glosses More Effective?

    ERIC Educational Resources Information Center

    Taylor, Alan M.

    2013-01-01

    CALL glossing in first language (L1) or second language (L2) texts has been shown by previous studies to be more effective than traditional, paper-and-pen L1 glossing. Using a pool of studies with much more statistical power and more accurate results, this meta-analysis demonstrates more precisely the degree to which CALL L1 glossing can be more…

  12. Analogical Instruction in Statistics: Implications for Social Work Educators

    ERIC Educational Resources Information Center

    Thomas, Leela

    2008-01-01

    This paper examines the use of analogies in statistics instruction. Much has been written about the difficulty social work students have with statistics. To address this concern, Glisson and Fischer (1987) called for the use of analogies. Understanding of analogical problem solving has surged in the last few decades with the integration of…

  13. Beneath the Skin: Statistics, Trust, and Status

    ERIC Educational Resources Information Center

    Smith, Richard

    2011-01-01

    Overreliance on statistics, and even faith in them--which Richard Smith in this essay calls a branch of "metricophilia"--is a common feature of research in education and in the social sciences more generally. Of course accurate statistics are important, but they often constitute essentially a powerful form of rhetoric. For purposes of analysis and…

  14. High call volume at poison control centers: identification and implications for communication.

    PubMed

    Caravati, E M; Latimer, S; Reblin, M; Bennett, H K W; Cummins, M R; Crouch, B I; Ellington, L

    2012-09-01

    High volume surges in health care are uncommon and unpredictable events. Their impact on health system performance and capacity is difficult to study. To identify time periods that exhibited very busy conditions at a poison control center and to determine whether cases and communication during high volume call periods are different from cases during low volume periods. Call data from a US poison control center over twelve consecutive months was collected via a call logger and an electronic case database (Toxicall®).Variables evaluated for high call volume conditions were: (1) call duration; (2) number of cases; and (3) number of calls per staff member per 30 minute period. Statistical analyses identified peak periods as busier than 99% of all other 30 minute time periods and low volume periods as slower than 70% of all other 30 minute periods. Case and communication characteristics of high volume and low volume calls were compared using logistic regression. A total of 65,364 incoming calls occurred over 12 months. One hundred high call volume and 4885 low call volume 30 minute periods were identified. High volume periods were more common between 1500 and 2300 hours and during the winter months. Coded verbal communication data were evaluated for 42 high volume and 296 low volume calls. The mean (standard deviation) call length of these calls during high volume and low volume periods was 3 minutes 27 seconds (1 minute 46 seconds) and 3 minutes 57 seconds (2 minutes 11 seconds), respectively. Regression analyses revealed a trend for fewer overall verbal statements and fewer staff questions during peak periods, but no other significant differences for staff-caller communication behaviors were found. Peak activity for poison center call volume can be identified by statistical modeling. Calls during high volume periods were similar to low volume calls. Communication was more concise yet staff was able to maintain a good rapport with callers during busy call periods. This approach allows evaluation of poison exposure call characteristics and communication during high volume periods.

  15. Stochastic Partial Differential Equation Solver for Hydroacoustic Modeling: Improvements to Paracousti Sound Propagation Solver

    NASA Astrophysics Data System (ADS)

    Preston, L. A.

    2017-12-01

    Marine hydrokinetic (MHK) devices offer a clean, renewable alternative energy source for the future. Responsible utilization of MHK devices, however, requires that the effects of acoustic noise produced by these devices on marine life and marine-related human activities be well understood. Paracousti is a 3-D full waveform acoustic modeling suite that can accurately propagate MHK noise signals in the complex bathymetry found in the near-shore to open ocean environment and considers real properties of the seabed, water column, and air-surface interface. However, this is a deterministic simulation that assumes the environment and source are exactly known. In reality, environmental and source characteristics are often only known in a statistical sense. Thus, to fully characterize the expected noise levels within the marine environment, this uncertainty in environmental and source factors should be incorporated into the acoustic simulations. One method is to use Monte Carlo (MC) techniques where simulation results from a large number of deterministic solutions are aggregated to provide statistical properties of the output signal. However, MC methods can be computationally prohibitive since they can require tens of thousands or more simulations to build up an accurate representation of those statistical properties. An alternative method, using the technique of stochastic partial differential equations (SPDE), allows computation of the statistical properties of output signals at a small fraction of the computational cost of MC. We are developing a SPDE solver for the 3-D acoustic wave propagation problem called Paracousti-UQ to help regulators and operators assess the statistical properties of environmental noise produced by MHK devices. In this presentation, we present the SPDE method and compare statistical distributions of simulated acoustic signals in simple models to MC simulations to show the accuracy and efficiency of the SPDE method. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.

  16. Connectivity-based fixel enhancement: Whole-brain statistical analysis of diffusion MRI measures in the presence of crossing fibres

    PubMed Central

    Raffelt, David A.; Smith, Robert E.; Ridgway, Gerard R.; Tournier, J-Donald; Vaughan, David N.; Rose, Stephen; Henderson, Robert; Connelly, Alan

    2015-01-01

    In brain regions containing crossing fibre bundles, voxel-average diffusion MRI measures such as fractional anisotropy (FA) are difficult to interpret, and lack within-voxel single fibre population specificity. Recent work has focused on the development of more interpretable quantitative measures that can be associated with a specific fibre population within a voxel containing crossing fibres (herein we use fixel to refer to a specific fibre population within a single voxel). Unfortunately, traditional 3D methods for smoothing and cluster-based statistical inference cannot be used for voxel-based analysis of these measures, since the local neighbourhood for smoothing and cluster formation can be ambiguous when adjacent voxels may have different numbers of fixels, or ill-defined when they belong to different tracts. Here we introduce a novel statistical method to perform whole-brain fixel-based analysis called connectivity-based fixel enhancement (CFE). CFE uses probabilistic tractography to identify structurally connected fixels that are likely to share underlying anatomy and pathology. Probabilistic connectivity information is then used for tract-specific smoothing (prior to the statistical analysis) and enhancement of the statistical map (using a threshold-free cluster enhancement-like approach). To investigate the characteristics of the CFE method, we assessed sensitivity and specificity using a large number of combinations of CFE enhancement parameters and smoothing extents, using simulated pathology generated with a range of test-statistic signal-to-noise ratios in five different white matter regions (chosen to cover a broad range of fibre bundle features). The results suggest that CFE input parameters are relatively insensitive to the characteristics of the simulated pathology. We therefore recommend a single set of CFE parameters that should give near optimal results in future studies where the group effect is unknown. We then demonstrate the proposed method by comparing apparent fibre density between motor neurone disease (MND) patients with control subjects. The MND results illustrate the benefit of fixel-specific statistical inference in white matter regions that contain crossing fibres. PMID:26004503

  17. Brain Cancer—Patient Version

    Cancer.gov

    Brain cancer refers to growths of malignant cells in tissues of the brain. Tumors that start in the brain are called primary brain tumors. Tumors that spread to the brain are called metastatic brain tumors. Start here to find information on brain cancer treatment, research, and statistics.

  18. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    PubMed Central

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885

  19. Design of a testing strategy using non-animal based test methods: lessons learnt from the ACuteTox project.

    PubMed

    Kopp-Schneider, Annette; Prieto, Pilar; Kinsner-Ovaskainen, Agnieszka; Stanzel, Sven

    2013-06-01

    In the framework of toxicology, a testing strategy can be viewed as a series of steps which are taken to come to a final prediction about a characteristic of a compound under study. The testing strategy is performed as a single-step procedure, usually called a test battery, using simultaneously all information collected on different endpoints, or as tiered approach in which a decision tree is followed. Design of a testing strategy involves statistical considerations, such as the development of a statistical prediction model. During the EU FP6 ACuteTox project, several prediction models were proposed on the basis of statistical classification algorithms which we illustrate here. The final choice of testing strategies was not based on statistical considerations alone. However, without thorough statistical evaluations a testing strategy cannot be identified. We present here a number of observations made from the statistical viewpoint which relate to the development of testing strategies. The points we make were derived from problems we had to deal with during the evaluation of this large research project. A central issue during the development of a prediction model is the danger of overfitting. Procedures are presented to deal with this challenge. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. Evolutionary neural networks for anomaly detection based on the behavior of a program.

    PubMed

    Han, Sang-Jun; Cho, Sung-Bae

    2006-06-01

    The process of learning the behavior of a given program by using machine-learning techniques (based on system-call audit data) is effective to detect intrusions. Rule learning, neural networks, statistics, and hidden Markov models (HMMs) are some of the kinds of representative methods for intrusion detection. Among them, neural networks are known for good performance in learning system-call sequences. In order to apply this knowledge to real-world problems successfully, it is important to determine the structures and weights of these call sequences. However, finding the appropriate structures requires very long time periods because there are no suitable analytical solutions. In this paper, a novel intrusion-detection technique based on evolutionary neural networks (ENNs) is proposed. One advantage of using ENNs is that it takes less time to obtain superior neural networks than when using conventional approaches. This is because they discover the structures and weights of the neural networks simultaneously. Experimental results with the 1999 Defense Advanced Research Projects Agency (DARPA) Intrusion Detection Evaluation (IDEVAL) data confirm that ENNs are promising tools for intrusion detection.

  1. Photolysis rates in correlated overlapping cloud fields: Cloud-J 7.3

    DOE PAGES

    Prather, M. J.

    2015-05-27

    A new approach for modeling photolysis rates ( J values) in atmospheres with fractional cloud cover has been developed and implemented as Cloud-J – a multi-scattering eight-stream radiative transfer model for solar radiation based on Fast-J. Using observed statistics for the vertical correlation of cloud layers, Cloud-J 7.3 provides a practical and accurate method for modeling atmospheric chemistry. The combination of the new maximum-correlated cloud groups with the integration over all cloud combinations represented by four quadrature atmospheres produces mean J values in an atmospheric column with root-mean-square errors of 4% or less compared with 10–20% errors using simpler approximations.more » Cloud-J is practical for chemistry-climate models, requiring only an average of 2.8 Fast-J calls per atmosphere, vs. hundreds of calls with the correlated cloud groups, or 1 call with the simplest cloud approximations. Another improvement in modeling J values, the treatment of volatile organic compounds with pressure-dependent cross sections is also incorporated into Cloud-J.« less

  2. Guided SAR image despeckling with probabilistic non local weights

    NASA Astrophysics Data System (ADS)

    Gokul, Jithin; Nair, Madhu S.; Rajan, Jeny

    2017-12-01

    SAR images are generally corrupted by granular disturbances called speckle, which makes visual analysis and detail extraction a difficult task. Non Local despeckling techniques with probabilistic similarity has been a recent trend in SAR despeckling. To achieve effective speckle suppression without compromising detail preservation, we propose an improvement for the existing Generalized Guided Filter with Bayesian Non-Local Means (GGF-BNLM) method. The proposed method (Guided SAR Image Despeckling with Probabilistic Non Local Weights) replaces parametric constants based on heuristics in GGF-BNLM method with dynamically derived values based on the image statistics for weight computation. Proposed changes make GGF-BNLM method adaptive and as a result, significant improvement is achieved in terms of performance. Experimental analysis on SAR images shows excellent speckle reduction without compromising feature preservation when compared to GGF-BNLM method. Results are also compared with other state-of-the-art and classic SAR depseckling techniques to demonstrate the effectiveness of the proposed method.

  3. Basics of Bayesian methods.

    PubMed

    Ghosh, Sujit K

    2010-01-01

    Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.

  4. Antigenic Structure of Rabbit γ Globulin

    PubMed Central

    Dubiski, S.; Dubiska, Anna; Skalba, Danuta; Kelus, A.

    1961-01-01

    By iso-immunization, antisera to five rabbit γ globulin antigens were obtained. They are called A (former Da), B, C, D and E. Individual sera of 670 rabbits belonging to six separate populations were tested by precipitation methods. The distribution of the iso-antigens and their combinations into serum groups were studied. Each particular γ globulin iso-antigen was found to be of hereditary character; they seem to form three genetic systems: A, C and BDE, statistically independent. Various antisera from England, Poland and U.S.A were compared. PMID:13724581

  5. Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model.

    PubMed

    Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A

    2011-10-01

    Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.

  6. PRANAS: A New Platform for Retinal Analysis and Simulation.

    PubMed

    Cessac, Bruno; Kornprobst, Pierre; Kraria, Selim; Nasser, Hassan; Pamplona, Daniela; Portelli, Geoffrey; Viéville, Thierry

    2017-01-01

    The retina encodes visual scenes by trains of action potentials that are sent to the brain via the optic nerve. In this paper, we describe a new free access user-end software allowing to better understand this coding. It is called PRANAS (https://pranas.inria.fr), standing for Platform for Retinal ANalysis And Simulation. PRANAS targets neuroscientists and modelers by providing a unique set of retina-related tools. PRANAS integrates a retina simulator allowing large scale simulations while keeping a strong biological plausibility and a toolbox for the analysis of spike train population statistics. The statistical method (entropy maximization under constraints) takes into account both spatial and temporal correlations as constraints, allowing to analyze the effects of memory on statistics. PRANAS also integrates a tool computing and representing in 3D (time-space) receptive fields. All these tools are accessible through a friendly graphical user interface. The most CPU-costly of them have been implemented to run in parallel.

  7. On Fluctuations of Eigenvalues of Random Band Matrices

    NASA Astrophysics Data System (ADS)

    Shcherbina, M.

    2015-10-01

    We consider the fluctuations of linear eigenvalue statistics of random band matrices whose entries have the form with i.i.d. possessing the th moment, where the function u has a finite support , so that M has only nonzero diagonals. The parameter b (called the bandwidth) is assumed to grow with n in a way such that . Without any additional assumptions on the growth of b we prove CLT for linear eigenvalue statistics for a rather wide class of test functions. Thus we improve and generalize the results of the previous papers (Jana et al., arXiv:1412.2445; Li et al. Random Matrices 2:04, 2013), where CLT was proven under the assumption . Moreover, we develop a method which allows to prove automatically the CLT for linear eigenvalue statistics of the smooth test functions for almost all classical models of random matrix theory: deformed Wigner and sample covariance matrices, sparse matrices, diluted random matrices, matrices with heavy tales etc.

  8. Automatic generation of efficient orderings of events for scheduling applications

    NASA Technical Reports Server (NTRS)

    Morris, Robert A.

    1994-01-01

    In scheduling a set of tasks, it is often not known with certainty how long a given event will take. We call this duration uncertainty. Duration uncertainty is a primary obstacle to the successful completion of a schedule. If a duration of one task is longer than expected, the remaining tasks are delayed. The delay may result in the abandonment of the schedule itself, a phenomenon known as schedule breakage. One response to schedule breakage is on-line, dynamic rescheduling. A more recent alternative is called proactive rescheduling. This method uses statistical data about the durations of events in order to anticipate the locations in the schedule where breakage is likely prior to the execution of the schedule. It generates alternative schedules at such sensitive points, which can be then applied by the scheduler at execution time, without the delay incurred by dynamic rescheduling. This paper proposes a technique for making proactive error management more effective. The technique is based on applying a similarity-based method of clustering to the problem of identifying similar events in a set of events.

  9. Weather and emotional state: a search for associations between weather and calls to telephone counseling services

    NASA Astrophysics Data System (ADS)

    Driscoll, Dennis; Stillman, Daniel

    2002-08-01

    Previous research has revealed that an emotional response to weather might be indicated by calls to telephone counseling services. We analyzed call frequency from such "hotlines", each serving communities in a major metropolitan area of the United States (Detroit, Washington DC, Dallas and Seattle). The periods examined were all, or parts of, the years 1997 and 1998. Associations with subjectively derived synoptic weather types for all cities except Seattle, as well as with individual weather elements [cloudiness (sky cover), precipitation, windspeed, and interdiurnal temperature change] for all four cities, were investigated. Analysis of variance and t-tests (significance of means) were applied to test the statistical significance of differences. Although statistically significant results were obtained in scattered instances, the total number was within that expected by chance, and there was little in the way of consistency to these associations. One clear exception was the increased call frequency during destructive (severe) weather, when there is obvious concern about the damage done by it.

  10. Citizen surveillance for environmental monitoring: combining the efforts of citizen science and crowdsourcing in a quantitative data framework.

    PubMed

    Welvaert, Marijke; Caley, Peter

    2016-01-01

    Citizen science and crowdsourcing have been emerging as methods to collect data for surveillance and/or monitoring activities. They could be gathered under the overarching term citizen surveillance . The discipline, however, still struggles to be widely accepted in the scientific community, mainly because these activities are not embedded in a quantitative framework. This results in an ongoing discussion on how to analyze and make useful inference from these data. When considering the data collection process, we illustrate how citizen surveillance can be classified according to the nature of the underlying observation process measured in two dimensions-the degree of observer reporting intention and the control in observer detection effort. By classifying the observation process in these dimensions we distinguish between crowdsourcing, unstructured citizen science and structured citizen science. This classification helps the determine data processing and statistical treatment of these data for making inference. Using our framework, it is apparent that published studies are overwhelmingly associated with structured citizen science, and there are well developed statistical methods for the resulting data. In contrast, methods for making useful inference from purely crowd-sourced data remain under development, with the challenges of accounting for the unknown observation process considerable. Our quantitative framework for citizen surveillance calls for an integration of citizen science and crowdsourcing and provides a way forward to solve the statistical challenges inherent to citizen-sourced data.

  11. Simultaneous Genotype Calling and Haplotype Phasing Improves Genotype Accuracy and Reduces False-Positive Associations for Genome-wide Association Studies

    PubMed Central

    Browning, Brian L.; Yu, Zhaoxia

    2009-01-01

    We present a novel method for simultaneous genotype calling and haplotype-phase inference. Our method employs the computationally efficient BEAGLE haplotype-frequency model, which can be applied to large-scale studies with millions of markers and thousands of samples. We compare genotype calls made with our method to genotype calls made with the BIRDSEED, CHIAMO, GenCall, and ILLUMINUS genotype-calling methods, using genotype data from the Illumina 550K and Affymetrix 500K arrays. We show that our method has higher genotype-call accuracy and yields fewer uncalled genotypes than competing methods. We perform single-marker analysis of data from the Wellcome Trust Case Control Consortium bipolar disorder and type 2 diabetes studies. For bipolar disorder, the genotype calls in the original study yield 25 markers with apparent false-positive association with bipolar disorder at a p < 10−7 significance level, whereas genotype calls made with our method yield no associated markers at this significance threshold. Conversely, for markers with replicated association with type 2 diabetes, there is good concordance between genotype calls used in the original study and calls made by our method. Results from single-marker and haplotypic analysis of our method's genotype calls for the bipolar disorder study indicate that our method is highly effective at eliminating genotyping artifacts that cause false-positive associations in genome-wide association studies. Our new genotype-calling methods are implemented in the BEAGLE and BEAGLECALL software packages. PMID:19931040

  12. Integrating Formal Methods and Testing 2002

    NASA Technical Reports Server (NTRS)

    Cukic, Bojan

    2002-01-01

    Traditionally, qualitative program verification methodologies and program testing are studied in separate research communities. None of them alone is powerful and practical enough to provide sufficient confidence in ultra-high reliability assessment when used exclusively. Significant advances can be made by accounting not only tho formal verification and program testing. but also the impact of many other standard V&V techniques, in a unified software reliability assessment framework. The first year of this research resulted in the statistical framework that, given the assumptions on the success of the qualitative V&V and QA procedures, significantly reduces the amount of testing needed to confidently assess reliability at so-called high and ultra-high levels (10-4 or higher). The coming years shall address the methodologies to realistically estimate the impacts of various V&V techniques to system reliability and include the impact of operational risk to reliability assessment. Combine formal correctness verification, process and product metrics, and other standard qualitative software assurance methods with statistical testing with the aim of gaining higher confidence in software reliability assessment for high-assurance applications. B) Quantify the impact of these methods on software reliability. C) Demonstrate that accounting for the effectiveness of these methods reduces the number of tests needed to attain certain confidence level. D) Quantify and justify the reliability estimate for systems developed using various methods.

  13. More efficient parameter estimates for factor analysis of ordinal variables by ridge generalized least squares.

    PubMed

    Yuan, Ke-Hai; Jiang, Ge; Cheng, Ying

    2017-11-01

    Data in psychology are often collected using Likert-type scales, and it has been shown that factor analysis of Likert-type data is better performed on the polychoric correlation matrix than on the product-moment covariance matrix, especially when the distributions of the observed variables are skewed. In theory, factor analysis of the polychoric correlation matrix is best conducted using generalized least squares with an asymptotically correct weight matrix (AGLS). However, simulation studies showed that both least squares (LS) and diagonally weighted least squares (DWLS) perform better than AGLS, and thus LS or DWLS is routinely used in practice. In either LS or DWLS, the associations among the polychoric correlation coefficients are completely ignored. To mend such a gap between statistical theory and empirical work, this paper proposes new methods, called ridge GLS, for factor analysis of ordinal data. Monte Carlo results show that, for a wide range of sample sizes, ridge GLS methods yield uniformly more accurate parameter estimates than existing methods (LS, DWLS, AGLS). A real-data example indicates that estimates by ridge GLS are 9-20% more efficient than those by existing methods. Rescaled and adjusted test statistics as well as sandwich-type standard errors following the ridge GLS methods also perform reasonably well. © 2017 The British Psychological Society.

  14. The Environmental Data Book: A Guide to Statistics on the Environment and Development.

    ERIC Educational Resources Information Center

    Sheram, Katherine

    This book presents statistics on countries with populations of more than 1 million related to the quality of the environment, economic development, and how each is affected by the other. Sometimes called indicators, the statistics are measures of environmental, economic, and social conditions in developing and industrial countries. The book is…

  15. Tri-Center Analysis: Determining Measures of Trichotomous Central Tendency for the Parametric Analysis of Tri-Squared Test Results

    ERIC Educational Resources Information Center

    Osler, James Edward

    2014-01-01

    This monograph provides an epistemological rational for the design of a novel post hoc statistical measure called "Tri-Center Analysis". This new statistic is designed to analyze the post hoc outcomes of the Tri-Squared Test. In Tri-Center Analysis trichotomous parametric inferential parametric statistical measures are calculated from…

  16. Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.

    PubMed

    Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe

    2012-01-01

    Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an on-going failing regimen. Our aim was also to investigate the impact on the use of different cross-validation strategies and different loss functions. Four different repartitions between training set and validations set were tested through two loss functions. Six statistical methods were compared. We assess performance by evaluating R(2) values and accuracy by calculating the rates of patients being correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. Slight discrepancy arises between the two loss functions investigated, and slight difference arises also between results based on cross-validated risks and results from full dataset. The Super Learner methodology and linear model provided around 80% of patients correctly classified. The difference between the lower and higher rates is around 10 percent. The number of mutations retained in different learners also varys from one to 41. Conclusions. The more recent Super Learner methodology combining the prediction of many learners provided good performance on our small dataset.

  17. MMASS: an optimized array-based method for assessing CpG island methylation.

    PubMed

    Ibrahim, Ashraf E K; Thorne, Natalie P; Baird, Katie; Barbosa-Morais, Nuno L; Tavaré, Simon; Collins, V Peter; Wyllie, Andrew H; Arends, Mark J; Brenton, James D

    2006-01-01

    We describe an optimized microarray method for identifying genome-wide CpG island methylation called microarray-based methylation assessment of single samples (MMASS) which directly compares methylated to unmethylated sequences within a single sample. To improve previous methods we used bioinformatic analysis to predict an optimized combination of methylation-sensitive enzymes that had the highest utility for CpG-island probes and different methods to produce unmethylated representations of test DNA for more sensitive detection of differential methylation by hybridization. Subtraction or methylation-dependent digestion with McrBC was used with optimized (MMASS-v2) or previously described (MMASS-v1, MMASS-sub) methylation-sensitive enzyme combinations and compared with a published McrBC method. Comparison was performed using DNA from the cell line HCT116. We show that the distribution of methylation microarray data is inherently skewed and requires exogenous spiked controls for normalization and that analysis of digestion of methylated and unmethylated control sequences together with linear fit models of replicate data showed superior statistical power for the MMASS-v2 method. Comparison with previous methylation data for HCT116 and validation of CpG islands from PXMP4, SFRP2, DCC, RARB and TSEN2 confirmed the accuracy of MMASS-v2 results. The MMASS-v2 method offers improved sensitivity and statistical power for high-throughput microarray identification of differential methylation.

  18. Psychometrics Matter in Health Behavior: A Long-term Reliability Generalization Study.

    PubMed

    Pickett, Andrew C; Valdez, Danny; Barry, Adam E

    2017-09-01

    Despite numerous calls for increased understanding and reporting of reliability estimates, social science research, including the field of health behavior, has been slow to respond and adopt such practices. Therefore, we offer a brief overview of reliability and common reporting errors; we then perform analyses to examine and demonstrate the variability of reliability estimates by sample and over time. Using meta-analytic reliability generalization, we examined the variability of coefficient alpha scores for a well-designed, consistent, nationwide health study, covering a span of nearly 40 years. For each year and sample, reliability varied. Furthermore, reliability was predicted by a sample characteristic that differed among age groups within each administration. We demonstrated that reliability is influenced by the methods and individuals from which a given sample is drawn. Our work echoes previous calls that psychometric properties, particularly reliability of scores, are important and must be considered and reported before drawing statistical conclusions.

  19. PISA: Federated Search in P2P Networks with Uncooperative Peers

    NASA Astrophysics Data System (ADS)

    Ren, Zujie; Shou, Lidan; Chen, Gang; Chen, Chun; Bei, Yijun

    Recently, federated search in P2P networks has received much attention. Most of the previous work assumed a cooperative environment where each peer can actively participate in information publishing and distributed document indexing. However, little work has addressed the problem of incorporating uncooperative peers, which do not publish their own corpus statistics, into a network. This paper presents a P2P-based federated search framework called PISA which incorporates uncooperative peers as well as the normal ones. In order to address the indexing needs for uncooperative peers, we propose a novel heuristic query-based sampling approach which can obtain high-quality resource descriptions from uncooperative peers at relatively low communication cost. We also propose an effective method called RISE to merge the results returned by uncooperative peers. Our experimental results indicate that PISA can provide quality search results, while utilizing the uncooperative peers at a low cost.

  20. Gamma-ray Full Spectrum Analysis for Environmental Radioactivity by HPGe Detector

    NASA Astrophysics Data System (ADS)

    Jeong, Meeyoung; Lee, Kyeong Beom; Kim, Kyeong Ja; Lee, Min-Kie; Han, Ju-Bong

    2014-12-01

    Odyssey, one of the NASA¡¯s Mars exploration program and SELENE (Kaguya), a Japanese lunar orbiting spacecraft have a payload of Gamma-Ray Spectrometer (GRS) for analyzing radioactive chemical elements of the atmosphere and the surface. In these days, gamma-ray spectroscopy with a High-Purity Germanium (HPGe) detector has been widely used for the activity measurements of natural radionuclides contained in the soil of the Earth. The energy spectra obtained by the HPGe detectors have been generally analyzed by means of the Window Analysis (WA) method. In this method, activity concentrations are determined by using the net counts of energy window around individual peaks. Meanwhile, an alternative method, the so-called Full Spectrum Analysis (FSA) method uses count numbers not only from full-absorption peaks but from the contributions of Compton scattering due to gamma-rays. Consequently, while it takes a substantial time to obtain a statistically significant result in the WA method, the FSA method requires a much shorter time to reach the same level of the statistical significance. This study shows the validation results of FSA method. We have compared the concentration of radioactivity of 40K, 232Th and 238U in the soil measured by the WA method and the FSA method, respectively. The gamma-ray spectrum of reference materials (RGU and RGTh, KCl) and soil samples were measured by the 120% HPGe detector with cosmic muon veto detector. According to the comparison result of activity concentrations between the FSA and the WA, we could conclude that FSA method is validated against the WA method. This study implies that the FSA method can be used in a harsh measurement environment, such as the gamma-ray measurement in the Moon, in which the level of statistical significance is usually required in a much shorter data acquisition time than the WA method.

  1. Plasma Cell Neoplasms (Including Multiple Myeloma)—Patient Version

    Cancer.gov

    Plasma cell neoplasms occur when abnormal plasma cells form cancerous tumors. When there is only one tumor, the disease is called a plasmacytoma. When there are multiple tumors, it is called multiple myeloma. Start here to find information on plasma cell neoplasms treatment, research, and statistics.

  2. 78 FR 17469 - Government Securities: Call for Large Position Reports

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-21

    ... DEPARTMENT OF THE TREASURY Government Securities: Call for Large Position Reports AGENCY: Office... Reserve Bank of New York, Government Securities Dealer Statistics Unit, 4th Floor, 33 Liberty Street, New... Eidemiller, or Kevin Hawkins; Government Securities Regulations Staff, Department of the Treasury, at 202-504...

  3. [Personality and work of Florence Nightingale--creator of modern nursing and public health pioneer].

    PubMed

    Milutinović, Dragana; Sumonja, Sanja; Maksimović, Jovan

    2012-01-01

    Through her "calling to service", Florence Nightingale worked as a nurse, manager, researcher, reformer, writer and teacher. The aim of this study is to present Florence Nightingale in all these roles, pointing out all complexity and multidimensionality of nursing profession. Having come from an aristocratic English family, Florence Nightingale was very educated She considered knowledge as a way, and statistical method as an instrument for discovering the rules of the world. Her work during the Crimean War was one of her most important deeds and made her a national hero. After the war, she devoted herself to reforming nursing and public health in Britain and in the world. Since she was bedbound after the Crimean War due to her illness, writing became the most powerful tool she had in achieving her goals. Florence Nightingale wrote many letters to politicians and statesmen, many newspaper and scientific articles. One of her greatest works "Notes on Nursing" was not written only for nurses, but for all women. By founding Nursing school at St. Thomas Hospital in 1860 she aspired to train and educate nurses. Her complete and lifelong devotion to the ,,calling" directed all her activities, contributions and achievements, not only towards nursing but also towards statistics, epidemiology, public health and social sciences.

  4. Ice nucleation in nature: supercooling point (SCP) measurements and the role of heterogeneous nucleation.

    PubMed

    Wilson, P W; Heneghan, A F; Haymet, A D J

    2003-02-01

    In biological systems, nucleation of ice from a supercooled aqueous solution is a stochastic process and always heterogeneous. The average time any solution may remain supercooled is determined only by the degree of supercooling and heterogeneous nucleation sites it encounters. Here we summarize the many and varied definitions of the so-called "supercooling point," also called the "temperature of crystallization" and the "nucleation temperature," and exhibit the natural, inherent width associated with this quantity. We describe a new method for accurate determination of the supercooling point, which takes into account the inherent statistical fluctuations of the value. We show further that many measurements on a single unchanging sample are required to make a statistically valid measure of the supercooling point. This raises an interesting difference in circumstances where such repeat measurements are inconvenient, or impossible, for example for live organism experiments. We also discuss the effect of solutes on this temperature of nucleation. Existing data appear to show that various solute species decrease the nucleation temperature somewhat more than the equivalent melting point depression. For non-ionic solutes the species appears not to be a significant factor whereas for ions the species does affect the level of decrease of the nucleation temperature.

  5. Energy landscape analysis of neuroimaging data

    NASA Astrophysics Data System (ADS)

    Ezaki, Takahiro; Watanabe, Takamitsu; Ohzeki, Masayuki; Masuda, Naoki

    2017-05-01

    Computational neuroscience models have been used for understanding neural dynamics in the brain and how they may be altered when physiological or other conditions change. We review and develop a data-driven approach to neuroimaging data called the energy landscape analysis. The methods are rooted in statistical physics theory, in particular the Ising model, also known as the (pairwise) maximum entropy model and Boltzmann machine. The methods have been applied to fitting electrophysiological data in neuroscience for a decade, but their use in neuroimaging data is still in its infancy. We first review the methods and discuss some algorithms and technical aspects. Then, we apply the methods to functional magnetic resonance imaging data recorded from healthy individuals to inspect the relationship between the accuracy of fitting, the size of the brain system to be analysed and the data length. This article is part of the themed issue `Mathematical methods in medicine: neuroscience, cardiology and pathology'.

  6. Probability and Statistics in Sensor Performance Modeling

    DTIC Science & Technology

    2010-12-01

    language software program is called Environmental Awareness for Sensor and Emitter Employment. Some important numerical issues in the implementation...3 Statistical analysis for measuring sensor performance...complementary cumulative distribution function cdf cumulative distribution function DST decision-support tool EASEE Environmental Awareness of

  7. An Artificial Intelligence Approach to Analyzing Student Errors in Statistics.

    ERIC Educational Resources Information Center

    Sebrechts, Marc M.; Schooler, Lael J.

    1987-01-01

    Describes the development of an artificial intelligence system called GIDE that analyzes student errors in statistics problems by inferring the students' intentions. Learning strategies involved in problem solving are discussed and the inclusion of goal structures is explained. (LRW)

  8. Infinitely divisible cascades to model the statistics of natural images.

    PubMed

    Chainais, Pierre

    2007-12-01

    We propose to model the statistics of natural images thanks to the large class of stochastic processes called Infinitely Divisible Cascades (IDC). IDC were first introduced in one dimension to provide multifractal time series to model the so-called intermittency phenomenon in hydrodynamical turbulence. We have extended the definition of scalar infinitely divisible cascades from 1 to N dimensions and commented on the relevance of such a model in fully developed turbulence in [1]. In this article, we focus on the particular 2 dimensional case. IDC appear as good candidates to model the statistics of natural images. They share most of their usual properties and appear to be consistent with several independent theoretical and experimental approaches of the literature. We point out the interest of IDC for applications to procedural texture synthesis.

  9. Using Poison Center Exposure Calls to Predict Methadone Poisoning Deaths

    PubMed Central

    Dasgupta, Nabarun; Davis, Jonathan; Jonsson Funk, Michele; Dart, Richard

    2012-01-01

    Purpose There are more drug overdose deaths in the Untied States than motor vehicle fatalities. Yet the US vital statistics reporting system is of limited value because the data are delayed by four years. Poison centers report data within an hour of the event, but previous studies suggested a small proportion of poisoning deaths are reported to poison centers (PC). In an era of improved electronic surveillance capabilities, exposure calls to PCs may be an alternate indicator of trends in overdose mortality. Methods We used PC call counts for methadone that were reported to the Researched Abuse, Diversion and Addiction-Related Surveillance (RADARS®) System in 2006 and 2007. US death certificate data were used to identify deaths due to methadone. Linear regression was used to quantify the relationship of deaths and poison center calls. Results Compared to decedents, poison center callers tended to be younger, more often female, at home and less likely to require medical attention. A strong association was found with PC calls and methadone mortality (b = 0.88, se = 0.42, t = 9.5, df = 1, p<0.0001, R2 = 0.77). These findings were robust to large changes in a sensitivity analysis assessing the impact of underreporting of methadone overdose deaths. Conclusions Our results suggest that calls to poison centers for methadone are correlated with poisoning mortality as identified on death certificates. Calls received by poison centers may be used for timely surveillance of mortality due to methadone. In the midst of the prescription opioid overdose epidemic, electronic surveillance tools that report in real-time are powerful public health tools. PMID:22829925

  10. Turnover intentions in a call center: The role of emotional dissonance, job resources, and job satisfaction

    PubMed Central

    Zito, Margherita; Molino, Monica; Cortese, Claudio Giovanni; Ghislieri, Chiara; Colombo, Lara

    2018-01-01

    Background Turnover intentions refer to employees’ intent to leave the organization and, within call centers, it can be influenced by factors such as relational variables or the perception of the quality of working life, which can be affected by emotional dissonance. This specific job demand to express emotions not felt is peculiar in call centers, and can influence job satisfaction and turnover intentions, a crucial problem among these working contexts. This study aims to detect, within the theoretical framework of the Job Demands-Resources Model, the role of emotional dissonance (job demand), and two resources, job autonomy and supervisors’ support, in the perception of job satisfaction and turnover intentions among an Italian call center. Method The study involved 318 call center agents of an Italian Telecommunication Company. Data analysis first performed descriptive statistics through SPSS 22. A path analysis was then performed through LISREL 8.72 and tested both direct and indirect effects. Results Results suggest the role of resources in fostering job satisfaction and in decreasing turnover intentions. Emotional dissonance reveals a negative relation with job satisfaction and a positive relation with turnover. Moreover, job satisfaction is negatively related with turnover and mediates the relationship between job resources and turnover. Conclusion This study contributes to extend the knowledge about the variables influencing turnover intentions, a crucial problem among call centers. Moreover, the study identifies theoretical considerations and practical implications to promote well-being among call center employees. To foster job satisfaction and reduce turnover intentions, in fact, it is important to make resources available, but also to offer specific training programs to make employees and supervisors aware about the consequences of emotional dissonance. PMID:29401507

  11. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic.

    PubMed

    Bowden, Jack; Del Greco M, Fabiola; Minelli, Cosetta; Davey Smith, George; Sheehan, Nuala A; Thompson, John R

    2016-12-01

    : MR-Egger regression has recently been proposed as a method for Mendelian randomization (MR) analyses incorporating summary data estimates of causal effect from multiple individual variants, which is robust to invalid instruments. It can be used to test for directional pleiotropy and provides an estimate of the causal effect adjusted for its presence. MR-Egger regression provides a useful additional sensitivity analysis to the standard inverse variance weighted (IVW) approach that assumes all variants are valid instruments. Both methods use weights that consider the single nucleotide polymorphism (SNP)-exposure associations to be known, rather than estimated. We call this the `NO Measurement Error' (NOME) assumption. Causal effect estimates from the IVW approach exhibit weak instrument bias whenever the genetic variants utilized violate the NOME assumption, which can be reliably measured using the F-statistic. The effect of NOME violation on MR-Egger regression has yet to be studied. An adaptation of the I2 statistic from the field of meta-analysis is proposed to quantify the strength of NOME violation for MR-Egger. It lies between 0 and 1, and indicates the expected relative bias (or dilution) of the MR-Egger causal estimate in the two-sample MR context. We call it IGX2 . The method of simulation extrapolation is also explored to counteract the dilution. Their joint utility is evaluated using simulated data and applied to a real MR example. In simulated two-sample MR analyses we show that, when a causal effect exists, the MR-Egger estimate of causal effect is biased towards the null when NOME is violated, and the stronger the violation (as indicated by lower values of IGX2 ), the stronger the dilution. When additionally all genetic variants are valid instruments, the type I error rate of the MR-Egger test for pleiotropy is inflated and the causal effect underestimated. Simulation extrapolation is shown to substantially mitigate these adverse effects. We demonstrate our proposed approach for a two-sample summary data MR analysis to estimate the causal effect of low-density lipoprotein on heart disease risk. A high value of IGX2 close to 1 indicates that dilution does not materially affect the standard MR-Egger analyses for these data. : Care must be taken to assess the NOME assumption via the IGX2 statistic before implementing standard MR-Egger regression in the two-sample summary data context. If IGX2 is sufficiently low (less than 90%), inferences from the method should be interpreted with caution and adjustment methods considered. © The Author 2016. Published by Oxford University Press on behalf of the International Epidemiological Association.

  12. Effect Sizes in Gifted Education Research

    ERIC Educational Resources Information Center

    Gentry, Marcia; Peters, Scott J.

    2009-01-01

    Recent calls for reporting and interpreting effect sizes have been numerous, with the 5th edition of the "Publication Manual of the American Psychological Association" (2001) calling for the inclusion of effect sizes to interpret quantitative findings. Many top journals have required that effect sizes accompany claims of statistical significance.…

  13. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

    PubMed

    Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

    2013-11-15

    Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (Available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu

  14. Reflexion on linear regression trip production modelling method for ensuring good model quality

    NASA Astrophysics Data System (ADS)

    Suprayitno, Hitapriya; Ratnasari, Vita

    2017-11-01

    Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.

  15. QuASAR: quantitative allele-specific analysis of reads

    PubMed Central

    Harvey, Chris T.; Moyerbrailean, Gregory A.; Davis, Gordon O.; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

    2015-01-01

    Motivation: Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no existing methods that jointly infer genotypes and conduct ASE inference, while considering uncertainty in the genotype calls. Results: We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. Availability and implementation: http://github.com/piquelab/QuASAR. Contact: fluca@wayne.edu or rpique@wayne.edu Supplementary information: Supplementary Material is available at Bioinformatics online. PMID:25480375

  16. Calibrating random forests for probability estimation.

    PubMed

    Dankowski, Theresa; Ziegler, Andreas

    2016-09-30

    Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  17. Statistical technique for analysing functional connectivity of multiple spike trains.

    PubMed

    Masud, Mohammad Shahed; Borisyuk, Roman

    2011-03-15

    A new statistical technique, the Cox method, used for analysing functional connectivity of simultaneously recorded multiple spike trains is presented. This method is based on the theory of modulated renewal processes and it estimates a vector of influence strengths from multiple spike trains (called reference trains) to the selected (target) spike train. Selecting another target spike train and repeating the calculation of the influence strengths from the reference spike trains enables researchers to find all functional connections among multiple spike trains. In order to study functional connectivity an "influence function" is identified. This function recognises the specificity of neuronal interactions and reflects the dynamics of postsynaptic potential. In comparison to existing techniques, the Cox method has the following advantages: it does not use bins (binless method); it is applicable to cases where the sample size is small; it is sufficiently sensitive such that it estimates weak influences; it supports the simultaneous analysis of multiple influences; it is able to identify a correct connectivity scheme in difficult cases of "common source" or "indirect" connectivity. The Cox method has been thoroughly tested using multiple sets of data generated by the neural network model of the leaky integrate and fire neurons with a prescribed architecture of connections. The results suggest that this method is highly successful for analysing functional connectivity of simultaneously recorded multiple spike trains. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. A Space–Time Permutation Scan Statistic for Disease Outbreak Detection

    PubMed Central

    Kulldorff, Martin; Heffernan, Richard; Hartman, Jessica; Assunção, Renato; Mostashari, Farzad

    2005-01-01

    Background The ability to detect disease outbreaks early is important in order to minimize morbidity and mortality through timely implementation of disease prevention and control measures. Many national, state, and local health departments are launching disease surveillance systems with daily analyses of hospital emergency department visits, ambulance dispatch calls, or pharmacy sales for which population-at-risk information is unavailable or irrelevant. Methods and Findings We propose a prospective space–time permutation scan statistic for the early detection of disease outbreaks that uses only case numbers, with no need for population-at-risk data. It makes minimal assumptions about the time, geographical location, or size of the outbreak, and it adjusts for natural purely spatial and purely temporal variation. The new method was evaluated using daily analyses of hospital emergency department visits in New York City. Four of the five strongest signals were likely local precursors to citywide outbreaks due to rotavirus, norovirus, and influenza. The number of false signals was at most modest. Conclusion If such results hold up over longer study times and in other locations, the space–time permutation scan statistic will be an important tool for local and national health departments that are setting up early disease detection surveillance systems. PMID:15719066

  19. Causes of Death Data in the Global Burden of Disease Estimates for Ischemic and Hemorrhagic Stroke

    PubMed Central

    Truelsen, Thomas; Krarup, Lars-Henrik; Iversen, Helle; Mensah, George A.; Feigin, Valery; Sposato, Luciano; Naghavi, Mohsen

    2015-01-01

    Background Stroke mortality estimates in the Global Burden of Disease (GBD) study are based on routine mortality statistics and redistribution of ill-defined codes that cannot be a cause of death, the so-called “garbage codes”. This study describes the contribution of these codes to stroke mortality estimates. Methods All available mortality data were compiled and non-specific cause codes were redistributed based on literature review and statistical methods. Ill-defined codes were redistributed to their specific cause of disease by age, sex, country, and year. The reassignment was done based on the international classification of diseases and the pathology behind each code by checking multiple causes of death and literature review. Results Unspecified stroke, and primary and secondary hypertension are leading contributing “garbage codes” to stroke mortality estimates for intracranial hemorrhagic stroke and ischemic stroke. There were marked differences in the fraction of death assigned to ischemic stroke and hemorrhagic stroke for unspecified stroke and hypertension between GBD regions and between age groups. Conclusions A large proportion of stroke fatalities is derived from the redistribution of “unspecified stroke” and “hypertension” with marked regional differences. Future advancements in stroke certification, data collections, and statistical analyses may improve the estimation of the global stroke burden. PMID:26505189

  20. Segment and fit thresholding: a new method for image analysis applied to microarray and immunofluorescence data.

    PubMed

    Ensink, Elliot; Sinha, Jessica; Sinha, Arkadeep; Tang, Huiyuan; Calderone, Heather M; Hostetter, Galen; Winter, Jordan; Cherba, David; Brand, Randall E; Allen, Peter J; Sempere, Lorenzo F; Haab, Brian B

    2015-10-06

    Experiments involving the high-throughput quantification of image data require algorithms for automation. A challenge in the development of such algorithms is to properly interpret signals over a broad range of image characteristics, without the need for manual adjustment of parameters. Here we present a new approach for locating signals in image data, called Segment and Fit Thresholding (SFT). The method assesses statistical characteristics of small segments of the image and determines the best-fit trends between the statistics. Based on the relationships, SFT identifies segments belonging to background regions; analyzes the background to determine optimal thresholds; and analyzes all segments to identify signal pixels. We optimized the initial settings for locating background and signal in antibody microarray and immunofluorescence data and found that SFT performed well over multiple, diverse image characteristics without readjustment of settings. When used for the automated analysis of multicolor, tissue-microarray images, SFT correctly found the overlap of markers with known subcellular localization, and it performed better than a fixed threshold and Otsu's method for selected images. SFT promises to advance the goal of full automation in image analysis.

  1. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments

    PubMed Central

    2010-01-01

    Background The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Results Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Conclusions Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/. PMID:20482791

  2. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments.

    PubMed

    Ma, Jingming; Dykes, Carrie; Wu, Tao; Huang, Yangxin; Demeter, Lisa; Wu, Hulin

    2010-05-18

    The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/.

  3. Segment and Fit Thresholding: A New Method for Image Analysis Applied to Microarray and Immunofluorescence Data

    PubMed Central

    Ensink, Elliot; Sinha, Jessica; Sinha, Arkadeep; Tang, Huiyuan; Calderone, Heather M.; Hostetter, Galen; Winter, Jordan; Cherba, David; Brand, Randall E.; Allen, Peter J.; Sempere, Lorenzo F.; Haab, Brian B.

    2016-01-01

    Certain experiments involve the high-throughput quantification of image data, thus requiring algorithms for automation. A challenge in the development of such algorithms is to properly interpret signals over a broad range of image characteristics, without the need for manual adjustment of parameters. Here we present a new approach for locating signals in image data, called Segment and Fit Thresholding (SFT). The method assesses statistical characteristics of small segments of the image and determines the best-fit trends between the statistics. Based on the relationships, SFT identifies segments belonging to background regions; analyzes the background to determine optimal thresholds; and analyzes all segments to identify signal pixels. We optimized the initial settings for locating background and signal in antibody microarray and immunofluorescence data and found that SFT performed well over multiple, diverse image characteristics without readjustment of settings. When used for the automated analysis of multi-color, tissue-microarray images, SFT correctly found the overlap of markers with known subcellular localization, and it performed better than a fixed threshold and Otsu’s method for selected images. SFT promises to advance the goal of full automation in image analysis. PMID:26339978

  4. A Method of Relating General Circulation Model Simulated Climate to the Observed Local Climate. Part I: Seasonal Statistics.

    NASA Astrophysics Data System (ADS)

    Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David

    1990-10-01

    Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and the physical parameterizations in state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography. regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point flee-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principle component analysis (FICA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables.We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics than can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well. Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.

  5. Effectiveness and cost effectiveness of television, radio and print advertisements in promoting the New York smokers' quitline

    PubMed Central

    Farrelly, Matthew C; Hussin, Altijani; Bauer, Ursula E

    2007-01-01

    Objectives This study assessed the relative effectiveness and cost effectiveness of television, radio and print advertisements to generate calls to the New York smokers' quitline. Methods Regression analysis was used to link total county level monthly quitline calls to television, radio and print advertising expenditures. Based on regression results, standardised measures of the relative effectiveness and cost effectiveness of expenditures were computed. Results There was a positive and statistically significant relation between call volume and expenditures for television (p<0.01) and radio (p<0.001) advertisements and a marginally significant effect for expenditures on newspaper advertisements (p<0.065). The largest effect was for television advertising. However, because of differences in advertising costs, for every $1000 increase in television, radio and newspaper expenditures, call volume increased by 0.1%, 5.7% and 2.8%, respectively. Conclusions Television, radio and print media all effectively increased calls to the New York smokers' quitline. Although increases in expenditures for television were the most effective, their relatively high costs suggest they are not currently the most cost effective means to promote a quitline. This implies that a more efficient mix of media would place greater emphasis on radio than television. However, because the current study does not adequately assess the extent to which radio expenditures would sustain their effectiveness with substantial expenditure increases, it is not feasible to determine a more optimal mix of expenditures. PMID:18048625

  6. Reinventing Biostatistics Education for Basic Scientists

    PubMed Central

    Weissgerber, Tracey L.; Garovic, Vesna D.; Milin-Lazovic, Jelena S.; Winham, Stacey J.; Obradovic, Zoran; Trzeciakowski, Jerome P.; Milic, Natasa M.

    2016-01-01

    Numerous studies demonstrating that statistical errors are common in basic science publications have led to calls to improve statistical training for basic scientists. In this article, we sought to evaluate statistical requirements for PhD training and to identify opportunities for improving biostatistics education in the basic sciences. We provide recommendations for improving statistics training for basic biomedical scientists, including: 1. Encouraging departments to require statistics training, 2. Tailoring coursework to the students’ fields of research, and 3. Developing tools and strategies to promote education and dissemination of statistical knowledge. We also provide a list of statistical considerations that should be addressed in statistics education for basic scientists. PMID:27058055

  7. aPPRove: An HMM-Based Method for Accurate Prediction of RNA-Pentatricopeptide Repeat Protein Binding Events

    PubMed Central

    Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B.; Ben-Hur, Asa; Boucher, Christina

    2016-01-01

    Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a PLS-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the PLS class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for PLS-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805

  8. Conceptual and statistical issues in couples observational research: Rationale and methods for design decisions.

    PubMed

    Baucom, Brian R W; Leo, Karena; Adamo, Colin; Georgiou, Panayiotis; Baucom, Katherine J W

    2017-12-01

    Observational behavioral coding methods are widely used for the study of relational phenomena. There are numerous guidelines for the development and implementation of these methods that include principles for creating new and adapting existing coding systems as well as principles for creating coding teams. While these principles have been successfully implemented in research on relational phenomena, the ever expanding array of phenomena being investigated with observational methods calls for a similar expansion of these principles. Specifically, guidelines are needed for decisions that arise in current areas of emphasis in couple research including observational investigation of related outcomes (e.g., relationship distress and psychological symptoms), the study of change in behavior over time, and the study of group similarities and differences in the enactment and perception of behavior. This article describes conceptual and statistical considerations involved in these 3 areas of research and presents principle- and empirically based rationale for design decisions related to these issues. A unifying principle underlying these guidelines is the need for careful consideration of fit between theory, research questions, selection of coding systems, and creation of coding teams. Implications of (mis)fit for the advancement of theory are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  9. Challenges in Species Tree Estimation Under the Multispecies Coalescent Model

    PubMed Central

    Xu, Bo; Yang, Ziheng

    2016-01-01

    The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902

  10. Environmental statistics and optimal regulation

    NASA Astrophysics Data System (ADS)

    Sivak, David; Thomson, Matt

    2015-03-01

    The precision with which an organism can detect its environment, and the timescale for and statistics of environmental change, will affect the suitability of different strategies for regulating protein levels in response to environmental inputs. We propose a general framework--here applied to the enzymatic regulation of metabolism in response to changing nutrient concentrations--to predict the optimal regulatory strategy given the statistics of fluctuations in the environment and measurement apparatus, and the costs associated with enzyme production. We find: (i) relative convexity of enzyme expression cost and benefit influences the fitness of thresholding or graded responses; (ii) intermediate levels of measurement uncertainty call for a sophisticated Bayesian decision rule; and (iii) in dynamic contexts, intermediate levels of uncertainty call for retaining memory of the past. Statistical properties of the environment, such as variability and correlation times, set optimal biochemical parameters, such as thresholds and decay rates in signaling pathways. Our framework provides a theoretical basis for interpreting molecular signal processing algorithms and a classification scheme that organizes known regulatory strategies and may help conceptualize heretofore unknown ones.

  11. Dynamic Optimization

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    We distinguish static and dynamic optimization of programs: whereas static optimization modifies a program before runtime and is based only on its syntactical structure, dynamic optimization is based on the statistical properties of the input source and examples of program execution. Explanation-based generalization is a commonly used dynamic optimization method, but its effectiveness as a speedup-learning method is limited, in part because it fails to separate the learning process from the program transformation process. This paper describes a dynamic optimization technique called a learn-optimize cycle that first uses a learning element to uncover predictable patterns in the program execution and then uses an optimization algorithm to map these patterns into beneficial transformations. The technique has been used successfully for dynamic optimization of pure Prolog.

  12. Permutational distribution of the log-rank statistic under random censorship with applications to carcinogenicity assays.

    PubMed

    Heimann, G; Neuhaus, G

    1998-03-01

    In the random censorship model, the log-rank test is often used for comparing a control group with different dose groups. If the number of tumors is small, so-called exact methods are often applied for computing critical values from a permutational distribution. Two of these exact methods are discussed and shown to be incorrect. The correct permutational distribution is derived and studied with respect to its behavior under unequal censoring in the light of recent results proving that the permutational version and the unconditional version of the log-rank test are asymptotically equivalent even under unequal censoring. The log-rank test is studied by simulations of a realistic scenario from a bioassay with small numbers of tumors.

  13. Reconstructing metastatic seeding patterns of human cancers

    PubMed Central

    Reiter, Johannes G.; Makohon-Moore, Alvin P.; Gerold, Jeffrey M.; Bozic, Ivana; Chatterjee, Krishnendu; Iacobuzio-Donahue, Christine A.; Vogelstein, Bert; Nowak, Martin A.

    2017-01-01

    Reconstructing the evolutionary history of metastases is critical for understanding their basic biological principles and has profound clinical implications. Genome-wide sequencing data has enabled modern phylogenomic methods to accurately dissect subclones and their phylogenies from noisy and impure bulk tumour samples at unprecedented depth. However, existing methods are not designed to infer metastatic seeding patterns. Here we develop a tool, called Treeomics, to reconstruct the phylogeny of metastases and map subclones to their anatomic locations. Treeomics infers comprehensive seeding patterns for pancreatic, ovarian, and prostate cancers. Moreover, Treeomics correctly disambiguates true seeding patterns from sequencing artifacts; 7% of variants were misclassified by conventional statistical methods. These artifacts can skew phylogenies by creating illusory tumour heterogeneity among distinct samples. In silico benchmarking on simulated tumour phylogenies across a wide range of sample purities (15–95%) and sequencing depths (25-800 × ) demonstrates the accuracy of Treeomics compared with existing methods. PMID:28139641

  14. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  15. Optical turbulence forecast: ready for an operational application

    NASA Astrophysics Data System (ADS)

    Masciadri, E.; Lascaux, F.; Turchi, A.; Fini, L.

    2017-04-01

    One of the main goals of the feasibility study MOSE (MOdelling ESO Sites) is to evaluate the performances of a method conceived to forecast the optical turbulence (OT) above the European Southern Observatory (ESO) sites of the Very Large Telescope (VLT) and the European Extremely Large Telescope (E-ELT) in Chile. The method implied the use of a dedicated code conceived for the OT called ASTRO-MESO-NH. In this paper, we present results we obtained at conclusion of this project concerning the performances of this method in forecasting the most relevant parameters related to the OT (CN^2, seeing ɛ, isoplanatic angle θ0 and wavefront coherence time τ0). Numerical predictions related to a very rich statistical sample of nights uniformly distributed along a solar year and belonging to different years have been compared to observations, and different statistical operators have been analysed such as the classical bias, root-mean-squared error, σ and more sophisticated statistical operators derived by the contingency tables that are able to quantify the score of success of a predictive method such as the percentage of correct detection (PC) and the probability to detect a parameter within a specific range of values (POD). The main conclusions of the study tell us that the ASTRO-MESO-NH model provides performances that are already very good to definitely guarantee a not negligible positive impact on the service mode of top-class telescopes and ELTs. A demonstrator for an automatic and operational version of the ASTRO-MESO-NH model will be soon implemented on the sites of VLT and E-ELT.

  16. Temperature rise, sea level rise and increased radiative forcing - an application of cointegration methods

    NASA Astrophysics Data System (ADS)

    Schmith, Torben; Thejll, Peter; Johansen, Søren

    2016-04-01

    We analyse the statistical relationship between changes in global temperature, global steric sea level and radiative forcing in order to reveal causal relationships. There are in this, however, potential pitfalls due to the trending nature of the time series. We therefore apply a statistical method called cointegration analysis, originating from the field of econometrics, which is able to correctly handle the analysis of series with trends and other long-range dependencies. Further, we find a relationship between steric sea level and temperature and find that temperature causally depends on the steric sea level, which can be understood as a consequence of the large heat capacity of the ocean. This result is obtained both when analyzing observed data and data from a CMIP5 historical model run. Finally, we find that in the data from the historical run, the steric sea level, in turn, is driven by the external forcing. Finally, we demonstrate that combining these two results can lead to a novel estimate of radiative forcing back in time based on observations.

  17. Bayesian Estimation of Thermonuclear Reaction Rates for Deuterium+Deuterium Reactions

    NASA Astrophysics Data System (ADS)

    Gómez Iñesta, Á.; Iliadis, C.; Coc, A.

    2017-11-01

    The study of d+d reactions is of major interest since their reaction rates affect the predicted abundances of D, 3He, and 7Li. In particular, recent measurements of primordial D/H ratios call for reduced uncertainties in the theoretical abundances predicted by Big Bang nucleosynthesis (BBN). Different authors have studied reactions involved in BBN by incorporating new experimental data and a careful treatment of systematic and probabilistic uncertainties. To analyze the experimental data, Coc et al. used results of ab initio models for the theoretical calculation of the energy dependence of S-factors in conjunction with traditional statistical methods based on χ 2 minimization. Bayesian methods have now spread to many scientific fields and provide numerous advantages in data analysis. Astrophysical S-factors and reaction rates using Bayesian statistics were calculated by Iliadis et al. Here we present a similar analysis for two d+d reactions, d(d, n)3He and d(d, p)3H, that has been translated into a total decrease of the predicted D/H value by 0.16%.

  18. Integrated driver modelling considering state transition feature for individual adaptation of driver assistance systems

    NASA Astrophysics Data System (ADS)

    Raksincharoensak, Pongsathorn; Khaisongkram, Wathanyoo; Nagai, Masao; Shimosaka, Masamichi; Mori, Taketoshi; Sato, Tomomasa

    2010-12-01

    This paper describes the modelling of naturalistic driving behaviour in real-world traffic scenarios, based on driving data collected via an experimental automobile equipped with a continuous sensing drive recorder. This paper focuses on the longitudinal driving situations which are classified into five categories - car following, braking, free following, decelerating and stopping - and are referred to as driving states. Here, the model is assumed to be represented by a state flow diagram. Statistical machine learning of driver-vehicle-environment system model based on driving database is conducted by a discriminative modelling approach called boosting sequential labelling method.

  19. 75 FR 7445 - Pacific Fishery Management Council; Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-02-19

    ... to Order 1. Opening Remarks and Introductions 2. Roll Call 3. Executive Director's Report 4. Approve... Permits for 2010 SCHEDULE OF ANCILLARY MEETINGS Friday, March 5, 2010 Scientific and Statistical Committee... California Ballroom Salon 2 Scientific and Statistical Committee 8 am California Ballroom Salon 4 Legislative...

  20. ASURV: Astronomical SURVival Statistics

    NASA Astrophysics Data System (ADS)

    Feigelson, E. D.; Nelson, P. I.; Isobe, T.; LaValley, M.

    2014-06-01

    ASURV (Astronomical SURVival Statistics) provides astronomy survival analysis for right- and left-censored data including the maximum-likelihood Kaplan-Meier estimator and several univariate two-sample tests, bivariate correlation measures, and linear regressions. ASURV is written in FORTRAN 77, and is stand-alone and does not call any specialized libraries.

  1. Combined slope ratio analysis and linear-subtraction: An extension of the Pearce ratio method

    NASA Astrophysics Data System (ADS)

    De Waal, Sybrand A.

    1996-07-01

    A new technique, called combined slope ratio analysis, has been developed by extending the Pearce element ratio or conserved-denominator method (Pearce, 1968) to its logical conclusions. If two stoichiometric substances are mixed and certain chemical components are uniquely contained in either one of the two mixing substances, then by treating these unique components as conserved, the composition of the substance not containing the relevant component can be accurately calculated within the limits allowed by analytical and geological error. The calculated composition can then be subjected to rigorous statistical testing using the linear-subtraction method recently advanced by Woronow (1994). Application of combined slope ratio analysis to the rocks of the Uwekahuna Laccolith, Hawaii, USA, and the lavas of the 1959-summit eruption of Kilauea Volcano, Hawaii, USA, yields results that are consistent with field observations.

  2. Statistical issues in the design, conduct and analysis of two large safety studies.

    PubMed

    Gaffney, Michael

    2016-10-01

    The emergence, post approval, of serious medical events, which may be associated with the use of a particular drug or class of drugs, is an important public health and regulatory issue. The best method to address this issue is through a large, rigorously designed safety study. Therefore, it is important to elucidate the statistical issues involved in these large safety studies. Two such studies are PRECISION and EAGLES. PRECISION is the primary focus of this article. PRECISION is a non-inferiority design with a clinically relevant non-inferiority margin. Statistical issues in the design, conduct and analysis of PRECISION are discussed. Quantitative and clinical aspects of the selection of the composite primary endpoint, the determination and role of the non-inferiority margin in a large safety study and the intent-to-treat and modified intent-to-treat analyses in a non-inferiority safety study are shown. Protocol changes that were necessary during the conduct of PRECISION are discussed from a statistical perspective. Issues regarding the complex analysis and interpretation of the results of PRECISION are outlined. EAGLES is presented as a large, rigorously designed safety study when a non-inferiority margin was not able to be determined by a strong clinical/scientific method. In general, when a non-inferiority margin is not able to be determined, the width of the 95% confidence interval is a way to size the study and to assess the cost-benefit of relative trial size. A non-inferiority margin, when able to be determined by a strong scientific method, should be included in a large safety study. Although these studies could not be called "pragmatic," they are examples of best real-world designs to address safety and regulatory concerns. © The Author(s) 2016.

  3. Bayesian evaluation of effect size after replicating an original study

    PubMed Central

    van Aert, Robbie C. M.; van Assen, Marcel A. L. M.

    2017-01-01

    The vast majority of published results in the literature is statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and replication were statistically significant in 36.1% in RPP and 68.8% in EE-RP suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the methods performance, and demonstrate the necessity to control for the original study’s significance to enable the accumulation of evidence for a true zero effect. Then we applied the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the included studies in EE-RP are generally larger than in RPP, but that the sample sizes of especially the included studies in RPP are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication akin to power analysis in null hypothesis significance testing and present an easy to use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method. PMID:28388646

  4. A Comparison of the Performance of Advanced Statistical Techniques for the Refinement of Day-ahead and Longer NWP-based Wind Power Forecasts

    NASA Astrophysics Data System (ADS)

    Zack, J. W.

    2015-12-01

    Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids. These can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors including the limited space-time resolution of the NWP models and shortcomings in the model's representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There are an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques often referred to as "machine learning methods" a MOS-method comparison experiment has been performed for wind power generation facilities in 6 wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day ahead forecasts of the hourly wind-based generation. The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon an decision-tree algorithm, (5) support vector regression and (6) analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.

  5. The average receiver operating characteristic curve in multireader multicase imaging studies

    PubMed Central

    Samuelson, F W

    2014-01-01

    Objective: In multireader, multicase (MRMC) receiver operating characteristic (ROC) studies for evaluating medical imaging systems, the area under the ROC curve (AUC) is often used as a summary metric. Owing to the limitations of AUC, plotting the average ROC curve to accompany the rigorous statistical inference on AUC is recommended. The objective of this article is to investigate methods for generating the average ROC curve from ROC curves of individual readers. Methods: We present both a non-parametric method and a parametric method for averaging ROC curves that produce a ROC curve, the area under which is equal to the average AUC of individual readers (a property we call area preserving). We use hypothetical examples, simulated data and a real-world imaging data set to illustrate these methods and their properties. Results: We show that our proposed methods are area preserving. We also show that the method of averaging the ROC parameters, either the conventional bi-normal parameters (a, b) or the proper bi-normal parameters (c, da), is generally not area preserving and may produce a ROC curve that is intuitively not an average of multiple curves. Conclusion: Our proposed methods are useful for making plots of average ROC curves in MRMC studies as a companion to the rigorous statistical inference on the AUC end point. The software implementing these methods is freely available from the authors. Advances in knowledge: Methods for generating the average ROC curve in MRMC ROC studies are formally investigated. The area-preserving criterion we defined is useful to evaluate such methods. PMID:24884728

  6. CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.

    PubMed

    Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T

    2017-12-28

    In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .

  7. Personalised news filtering and recommendation system using Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model

    NASA Astrophysics Data System (ADS)

    Adeniyi, D. A.; Wei, Z.; Yang, Y.

    2017-10-01

    Recommendation problem has been extensively studied by researchers in the field of data mining, database and information retrieval. This study presents the design and realisation of an automated, personalised news recommendations system based on Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, reduces runtime and speeds up execution process through the use of critical value of χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through the use of a developed in-house Java program on an experimental website called OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods which are traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.

  8. Statistical significance of combinatorial regulations

    PubMed Central

    Terada, Aika; Okada-Hatakeyama, Mariko; Tsuda, Koji; Sese, Jun

    2013-01-01

    More than three transcription factors often work together to enable cells to respond to various signals. The detection of combinatorial regulation by multiple transcription factors, however, is not only computationally nontrivial but also extremely unlikely because of multiple testing correction. The exponential growth in the number of tests forces us to set a strict limit on the maximum arity. Here, we propose an efficient branch-and-bound algorithm called the “limitless arity multiple-testing procedure” (LAMP) to count the exact number of testable combinations and calibrate the Bonferroni factor to the smallest possible value. LAMP lists significant combinations without any limit, whereas the family-wise error rate is rigorously controlled under the threshold. In the human breast cancer transcriptome, LAMP discovered statistically significant combinations of as many as eight binding motifs. This method may contribute to uncover pathways regulated in a coordinated fashion and find hidden associations in heterogeneous data. PMID:23882073

  9. A Method to Categorize 2-Dimensional Patterns Using Statistics of Spatial Organization.

    PubMed

    López-Sauceda, Juan; Rueda-Contreras, Mara D

    2017-01-01

    We developed a measurement framework of spatial organization to categorize 2-dimensional patterns from 2 multiscalar biological architectures. We propose that underlying shapes of biological entities can be approached using the statistical concept of degrees of freedom, defining it through expansion of area variability in a pattern. To help scope this suggestion, we developed a mathematical argument recognizing the deep foundations of area variability in a polygonal pattern (spatial heterogeneity). This measure uses a parameter called eutacticity . Our measuring platform of spatial heterogeneity can assign particular ranges of distribution of spatial areas for 2 biological architectures: ecological patterns of Namibia fairy circles and epithelial sheets. The spatial organizations of our 2 analyzed biological architectures are demarcated by being in a particular position among spatial order and disorder. We suggest that this theoretical platform can give us some insights about the nature of shapes in biological systems to understand organizational constraints.

  10. Estimation of road profile variability from measured vehicle responses

    NASA Astrophysics Data System (ADS)

    Fauriat, W.; Mattrand, C.; Gayton, N.; Beakou, A.; Cembrzynski, T.

    2016-05-01

    When assessing the statistical variability of fatigue loads acting throughout the life of a vehicle, the question of the variability of road roughness naturally arises, as both quantities are strongly related. For car manufacturers, gathering information on the environment in which vehicles evolve is a long and costly but necessary process to adapt their products to durability requirements. In the present paper, a data processing algorithm is proposed in order to estimate the road profiles covered by a given vehicle, from the dynamic responses measured on this vehicle. The algorithm based on Kalman filtering theory aims at solving a so-called inverse problem, in a stochastic framework. It is validated using experimental data obtained from simulations and real measurements. The proposed method is subsequently applied to extract valuable statistical information on road roughness from an existing load characterisation campaign carried out by Renault within one of its markets.

  11. Statistical Inference in Hidden Markov Models Using k-Segment Constraints

    PubMed Central

    Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher

    2016-01-01

    Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674

  12. Muver, a computational framework for accurately calling accumulated mutations.

    PubMed

    Burkholder, Adam B; Lujan, Scott A; Lavender, Christopher A; Grimm, Sara A; Kunkel, Thomas A; Fargo, David C

    2018-05-09

    Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), that can impart significant phenotypic consequences on cells but are harder to call than substitution mutations from whole genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false positive rates and high sensitivity required for accurate mutation rate determination and comparison. Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold standard father-son pair demonstrates muver's sensitivity and low false positive rates. In DNA mismatch repair (MMR) deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.

  13. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .

  14. Systems configured to distribute a telephone call, communication systems, communication methods and methods of routing a telephone call to a service representative

    DOEpatents

    Harris, Scott H.; Johnson, Joel A.; Neiswanger, Jeffery R.; Twitchell, Kevin E.

    2004-03-09

    The present invention includes systems configured to distribute a telephone call, communication systems, communication methods and methods of routing a telephone call to a customer service representative. In one embodiment of the invention, a system configured to distribute a telephone call within a network includes a distributor adapted to connect with a telephone system, the distributor being configured to connect a telephone call using the telephone system and output the telephone call and associated data of the telephone call; and a plurality of customer service representative terminals connected with the distributor and a selected customer service representative terminal being configured to receive the telephone call and the associated data, the distributor and the selected customer service representative terminal being configured to synchronize, application of the telephone call and associated data from the distributor to the selected customer service representative terminal.

  15. On correct evaluation techniques of brightness enhancement effect measurement data

    NASA Astrophysics Data System (ADS)

    Kukačka, Leoš; Dupuis, Pascal; Motomura, Hideki; Rozkovec, Jiří; Kolář, Milan; Zissis, Georges; Jinno, Masafumi

    2017-11-01

    This paper aims to establish confidence intervals of the quantification of brightness enhancement effects resulting from the use of pulsing bright light. It is found that the methods used so far may yield significant bias in the published results, overestimating or underestimating the enhancement effect. The authors propose to use a linear algebra method called the total least squares. Upon an example dataset, it is shown that this method does not yield biased results. The statistical significance of the results is also computed. It is concluded over an observation set that the currently used linear algebra methods present many patterns of noise sensitivity. Changing algorithm details leads to inconsistent results. It is thus recommended to use the method with the lowest noise sensitivity. Moreover, it is shown that this method also permits one to obtain an estimate of the confidence interval. This paper neither aims to publish results about a particular experiment nor to draw any particular conclusion about existence or nonexistence of the brightness enhancement effect.

  16. Structurally Sound Statistics Instruction

    ERIC Educational Resources Information Center

    Casey, Stephanie A.; Bostic, Jonathan D.

    2016-01-01

    The Common Core's Standards for Mathematical Practice (SMP) call for all K-grade 12 students to develop expertise in the processes and proficiencies of doing mathematics. However, the Common Core State Standards for Mathematics (CCSSM) (CCSSI 2010) as a whole addresses students' learning of not only mathematics but also statistics. This situation…

  17. ENVIRONMENTAL MONITORING AND ASSESSMENT PROGRAM (EMAP): WESTERN STREAMS AND RIVERS STATISTICAL SUMMARY

    EPA Science Inventory

    This statistical summary reports data from the Environmental Monitoring and Assessment Program (EMAP) Western Pilot (EMAP-W). EMAP-W was a sample survey (or probability survey, often simply called 'random') of streams and rivers in 12 states of the western U.S. (Arizona, Californ...

  18. Soldier Decision-Making for Allocation of Intelligence, Surveillance, and Reconnaissance Assets

    DTIC Science & Technology

    2014-06-01

    Judgments; also called Algoritmic or Statistical Judgements Computer Science , Psychology, and Statistics Actuarial or algorithmic...Jan. 2011. [17] R. M. Dawes, D. Faust, and P. E. Meehl, “Clinical versus Actuarial Judgment,” Science , vol. 243, no. 4899, pp. 1668–1674, 1989. [18...School of Computer Science

  19. 17 CFR Appendix A to Part 145 - Compilation of Commission Records Available to the Public

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... photographs. (10) Statistical data concerning the Commission's budget. (11) Statistical data concerning...) Complaint packages, which contain the Reparation Rules, Brochure “Questions and Answers About How You Can... grain reports. (2) Weekly cotton or call reports. (f) Division of Enforcement. Complaint package...

  20. Automated surveillance of 911 call data for detection of possible water contamination incidents.

    PubMed

    Haas, Adam J; Gibbons, Darcy; Dangel, Chrissy; Allgeier, Steve

    2011-03-30

    Drinking water contamination, with the capability to affect large populations, poses a significant risk to public health. In recent water contamination events, the impact of contamination on public health appeared in data streams monitoring health-seeking behavior. While public health surveillance has traditionally focused on the detection of pathogens, developing methods for detection of illness from fast-acting chemicals has not been an emphasis. An automated surveillance system was implemented for Cincinnati's drinking water contamination warning system to monitor health-related 911 calls in the city of Cincinnati. Incident codes indicative of possible water contamination were filtered from all 911 calls for analysis. The 911 surveillance system uses a space-time scan statistic to detect potential water contamination incidents. The frequency and characteristics of the 911 alarms over a 2.5 year period were studied. During the evaluation, 85 alarms occurred, although most occurred prior to the implementation of an additional alerting constraint in May 2009. Data were available for analysis approximately 48 minutes after calls indicating alarms may be generated 1-2 hours after a rapid increase in call volume. Most alerts occurred in areas of high population density. The average alarm area was 9.22 square kilometers. The average number of cases in an alarm was nine calls. The 911 surveillance system provides timely notification of possible public health events, but did have limitations. While the alarms contained incident codes and location of the caller, additional information such as medical status was not available to assist validating the cause of the alarm. Furthermore, users indicated that a better understanding of 911 system functionality is necessary to understand how it would behave in an actual water contamination event.

  1. Impact factor: Universalism and reliability of assessment.

    PubMed

    Grzybowski, Andrzej; Patryn, Rafał

    In 1955, Eugene Garfield (1925-1917) published a paper in Science where for the first time he advocated the necessity of introducing parameters to assess the quality of scientific journals. Underlying this necessity was an observation of a trend where the whole area of influence in academic publishing was dominated by a narrow group of large interdisciplinary research journals. For this reason, along with Irving H. Sher, they created the impact factor (IF), also called the Garfield impact factor, journal citation rate, journal influence, and journal impact factor. The concept of IF concerns a research discipline called bibliometrics, which uses mathematical and statistical methods to analyze scientific publications. Established by Garfield in 1963, the Science Citation Index, a record of scientific publications and citations therein, contributed directly to the increased importance of this method. Since the 1960s, the register of scientific publications has expanded and their evaluation by the IF has become a fundamental and universal measure of the journal's value. Contrary to the authors' intentions in the creation of the index (IF), it is often used to assess the quality of contributions, simultaneously assessing the authors' achievements or academic career and academic institutions' funding possibilities. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. A framework for the assessment of the spatial and temporal patterns of threatened coastal delphinids

    NASA Astrophysics Data System (ADS)

    Wang, Jingzhen; Yang, Yingting; Yang, Feng; Li, Yuelin; Li, Lianjie; Lin, Derun; He, Tangtian; Liang, Bo; Zhang, Tao; Lin, Yao; Li, Ping; Liu, Wenhua

    2016-01-01

    The massively accelerated biodiversity loss rate in the Anthropocene calls for an efficient and effective way to identify the spatial and temporal dynamics of endangered species. To this end, we developed a useful identification framework based on a case study of locally endangered Sousa chinensis by combining both LEK (local ecological knowledge) evaluation and regional boat-based survey methods. Our study investigated the basic ecological information of Sousa chinensis in the estuaries of eastern Guangdong that had previously been neglected, which could guide the future study and conservation. Based on the statistical testing of reported spatial and temporal dolphins sighting data from fishermen and the ecological monitoring analyses, including sighting rate, site fidelity and residence time estimations, some of the current Sousa chinensis units are likely to be geographically isolated and critically endangered, which calls for much greater conservation efforts. Given the accelerated population extinction rate and increasing budgetary constraints, our survey pattern can be applied in a timely and economically acceptable manner to the spatial and temporal assessment of other threatened coastal delphinids, particularly when population distributions are on a large scale and traditional sampling methods are difficult to implement.

  3. A framework for the assessment of the spatial and temporal patterns of threatened coastal delphinids.

    PubMed

    Wang, Jingzhen; Yang, Yingting; Yang, Feng; Li, Yuelin; Li, Lianjie; Lin, Derun; He, Tangtian; Liang, Bo; Zhang, Tao; Lin, Yao; Li, Ping; Liu, Wenhua

    2016-01-25

    The massively accelerated biodiversity loss rate in the Anthropocene calls for an efficient and effective way to identify the spatial and temporal dynamics of endangered species. To this end, we developed a useful identification framework based on a case study of locally endangered Sousa chinensis by combining both LEK (local ecological knowledge) evaluation and regional boat-based survey methods. Our study investigated the basic ecological information of Sousa chinensis in the estuaries of eastern Guangdong that had previously been neglected, which could guide the future study and conservation. Based on the statistical testing of reported spatial and temporal dolphins sighting data from fishermen and the ecological monitoring analyses, including sighting rate, site fidelity and residence time estimations, some of the current Sousa chinensis units are likely to be geographically isolated and critically endangered, which calls for much greater conservation efforts. Given the accelerated population extinction rate and increasing budgetary constraints, our survey pattern can be applied in a timely and economically acceptable manner to the spatial and temporal assessment of other threatened coastal delphinids, particularly when population distributions are on a large scale and traditional sampling methods are difficult to implement.

  4. Time, frequency, and time-varying Granger-causality measures in neuroscience.

    PubMed

    Cekic, Sezen; Grandjean, Didier; Renaud, Olivier

    2018-05-20

    This article proposes a systematic methodological review and an objective criticism of existing methods enabling the derivation of time, frequency, and time-varying Granger-causality statistics in neuroscience. The capacity to describe the causal links between signals recorded at different brain locations during a neuroscience experiment is indeed of primary interest for neuroscientists, who often have very precise prior hypotheses about the relationships between recorded brain signals. The increasing interest and the huge number of publications related to this topic calls for this systematic review, which describes the very complex methodological aspects underlying the derivation of these statistics. In this article, we first present a general framework that allows us to review and compare Granger-causality statistics in the time domain, and the link with transfer entropy. Then, the spectral and the time-varying extensions are exposed and discussed together with their estimation and distributional properties. Although not the focus of this article, partial and conditional Granger causality, dynamical causal modelling, directed transfer function, directed coherence, partial directed coherence, and their variant are also mentioned. Copyright © 2018 John Wiley & Sons, Ltd.

  5. PEPA test: fast and powerful differential analysis from relative quantitative proteomics data using shared peptides.

    PubMed

    Jacob, Laurent; Combes, Florence; Burger, Thomas

    2018-06-18

    We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry based relative quantification. An important feature of this type of high-throughput analyses is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous homology sequences, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide-protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in linear time with the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedures outperforms state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.

  6. Meta-analysis is not an exact science: Call for guidance on quantitative synthesis decisions.

    PubMed

    Haddaway, Neal R; Rytwinski, Trina

    2018-05-01

    Meta-analysis is becoming increasingly popular in the field of ecology and environmental management. It increases the effective power of analyses relative to single studies, and allows researchers to investigate effect modifiers and sources of heterogeneity that could not be easily examined within single studies. Many systematic reviewers will set out to conduct a meta-analysis as part of their synthesis, but meta-analysis requires a niche set of skills that are not widely held by the environmental research community. Each step in the process of carrying out a meta-analysis requires decisions that have both scientific and statistical implications. Reviewers are likely to be faced with a plethora of decisions over which effect size to choose, how to calculate variances, and how to build statistical models. Some of these decisions may be simple based on appropriateness of the options. At other times, reviewers must choose between equally valid approaches given the information available to them. This presents a significant problem when reviewers are attempting to conduct a reliable synthesis, such as a systematic review, where subjectivity is minimised and all decisions are documented and justified transparently. We propose three urgent, necessary developments within the evidence synthesis community. Firstly, we call on quantitative synthesis experts to improve guidance on how to prepare data for quantitative synthesis, providing explicit detail to support systematic reviewers. Secondly, we call on journal editors and evidence synthesis coordinating bodies (e.g. CEE) to ensure that quantitative synthesis methods are adequately reported in a transparent and repeatable manner in published systematic reviews. Finally, where faced with two or more broadly equally valid alternative methods or actions, reviewers should conduct multiple analyses, presenting all options, and discussing the implications of the different analytical approaches. We believe it is vital to tackle the possible subjectivity in quantitative synthesis described herein to ensure that the extensive efforts expended in producing systematic reviews and other evidence synthesis products is not wasted because of a lack of rigour or reliability in the final synthesis step. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. QuASAR: quantitative allele-specific analysis of reads.

    PubMed

    Harvey, Chris T; Moyerbrailean, Gregory A; Davis, Gordon O; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

    2015-04-15

    Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no existing methods that jointly infer genotypes and conduct ASE inference, while considering uncertainty in the genotype calls. We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. http://github.com/piquelab/QuASAR. fluca@wayne.edu or rpique@wayne.edu Supplementary Material is available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Effect of call organization on burnout and quality of life in psychiatry residents.

    PubMed

    Scarella, Timothy M; Nelligan, Julia; Roberts, Jacqueline; Boland, Robert J

    2017-02-01

    We aimed to measure the effects of a residency program's mid-year shift from 24-h call to night float on resident burnout and quality of life. At the end of the year, residents who started the year with 24-h call had worse burnout and quality of life, with statistical significance and large effect sizes. Exposure to a twenty-four hour call system, when compared to a full year of night float, may be associated with increased burnout and decreased quality of life, though measuring this effect is not straightforward. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Relationship of long-term highly active antiretroviral therapy on salivary flow rate and CD4 Count among HIV-infected patients.

    PubMed

    Kumar, J Vijay; Baghirath, P Venkat; Naishadham, P Parameswar; Suneetha, Sujai; Suneetha, Lavanya; Sreedevi, P

    2015-01-01

    To determine if long-term highly active antiretroviral therapy (HAART) therapy alters salivary flow rate and also to compare its relation of CD4 count with unstimulated and stimulated whole saliva. A cross-sectional study was performed on 150 individuals divided into three groups. Group I (50 human immunodeficiency virus (HIV) seropositive patients, but not on HAART therapy), Group II (50 HIV-infected subjects and on HAART for less than 3 years called short-term HAART), Group III (50 HIV-infected subjects and on HAART for more than or equal to 3 years called long-term HAART). Spitting method proposed by Navazesh and Kumar was used for the measurement of unstimulated and stimulated salivary flow rate. Chi-square test and analysis of variance (ANOVA) were used for statistical analysis. The mean CD4 count was 424.78 ± 187.03, 497.82 ± 206.11 and 537.6 ± 264.00 in the respective groups. Majority of the patients in all the groups had a CD4 count between 401 and 600. Both unstimulated and stimulated whole salivary (UWS and SWS) flow rates in Group I was found to be significantly higher than in Group II (P < 0.05). Unstimulated salivary flow rate between Group II and III subjects were also found to be statistically significant (P < 0.05). ANOVA performed between CD4 count and unstimulated and stimulated whole saliva in each group demonstrated a statistically significant relationship in Group II (P < 0.05). There were no significant results found between CD4 count and stimulated whole saliva in each groups. The reduction in CD4 cell counts were significantly associated with salivary flow rates of HIV-infected individuals who are on long-term HAART.

  10. multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions.

    PubMed

    Kang, Guangliang; Du, Li; Zhang, Hong

    2016-06-22

    The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of the degrees of freedom is reduced to one through the first order decomposition of the interaction, leading to a dramatically power improvement in testing DE genes when the number of conditions is greater than two. In our simulation situations, multiDE outperformed the benchmark methods (i.e. edgeR and DESeq2) even if the underlying model was severely misspecified, and the power gain was increasing in the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE . When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.

  11. PCI fuel failure analysis: a report on a cooperative program undertaken by Pacific Northwest Laboratory and Chalk River Nuclear Laboratories.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mohr, C.L.; Pankaskie, P.J.; Heasler, P.G.

    Reactor fuel failure data sets in the form of initial power (P/sub i/), final power (P/sub f/), transient increase in power (..delta..P), and burnup (Bu) were obtained for pressurized heavy water reactors (PHWRs), boiling water reactors (BWRs), and pressurized water reactors (PWRs). These data sets were evaluated and used as the basis for developing two predictive fuel failure models, a graphical concept called the PCI-OGRAM, and a nonlinear regression based model called PROFIT. The PCI-OGRAM is an extension of the FUELOGRAM developed by AECL. It is based on a critical threshold concept for stress dependent stress corrosion cracking. The PROFITmore » model, developed at Pacific Northwest Laboratory, is the result of applying standard statistical regression methods to the available PCI fuel failure data and an analysis of the environmental and strain rate dependent stress-strain properties of the Zircaloy cladding.« less

  12. Network inference using informative priors

    PubMed Central

    Mukherjee, Sach; Speed, Terence P.

    2008-01-01

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling. PMID:18799736

  13. Network inference using informative priors.

    PubMed

    Mukherjee, Sach; Speed, Terence P

    2008-09-23

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.

  14. Imputation of missing data in time series for air pollutants

    NASA Astrophysics Data System (ADS)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are major concerns in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess validity and performance of proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.

  15. Searching the Heavens: Astronomy, Computation, Statistics, Data Mining and Philosophy

    NASA Astrophysics Data System (ADS)

    Glymour, Clark

    2012-03-01

    Our first and purest science, the mother of scientific methods, sustained by sheer curiosity, searching the heavens we cannot manipulate. From the beginning, astronomy has combined mathematical idealization, technological ingenuity, and indefatigable data collection with procedures to search through assembled data for the processes that govern the cosmos. Astronomers are, and ever have been, data miners, and for that reason astronomical methods (but not astronomical discoveries) have often been despised by statisticians and philosophers. Epithets laced the statistical literature: Ransacking! Data dredging! Double Counting! Statistical disdain was usually directed at social scientists and biologists, rarely if ever at astronomers, but the methodological attitudes and goals that many twentieth-century philosophers and statisticians rejected were creations of the astronomical tradition. The philosophical criticisms were earlier and more direct. In the shadow (or in Alexander Pope’s phrasing, the light) cast on nature in the eighteenth century by the Newtonian triumph, David Hume revived arguments from the ancient Greeks to challenge the very possibility of coming to know what causes what. His conclusion was endorsed in the twentieth century by many philosophers who found talk of causation unnecessary or unacceptably metaphysical, and absorbed by many statisticians as a general suspicion of causal claims, except possibly when they are founded on experimental manipulation. And yet in the hands of a mathematician, Thomas Bayes, and another mathematician and philosopher, Richard Price, Hume’s essays prompted the development of a new kind of statistics, the kind we now call "Bayesian." The computer and new data acquisition methods have begun to dissolve the antipathy between astronomy, philosophy, and statistics. But the resolution is practical, without much reflection on the arguments or the course of events. So, I offer a largely unoriginal history, substituting rather dry commentary on method for the fuller, livelier history of astronomers’ ambitions, politics, and passions. My accounts of various episodes in the astronomical tradition are taken from standard sources, especially Neugebauer (1952), Baum & Sheehan (1997), Crelensten (2006), and Stigler (1990). Methodological commentary is mine, not that of these sources.

  16. Expert Systems on Multiprocessor Architectures. Volume 4. Technical Reports

    DTIC Science & Technology

    1991-06-01

    Floated-Current-Time0 -> The time that this function is called in user time uflts, expressed as a floating point number. Halt- Poligono Arrests the...default a statistics file will be printed out, if it can be. To prevent this make No-Statistics true. Unhalt- Poligono Unarrests the process in which the

  17. 78 FR 30922 - Agency Information Collection Activities: Submission for OMB Review; Joint Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-05-23

    ..., Division of Research and Statistics, Board of Governors of the Federal Reserve System, 20th and C Streets... individual institutions and the industry as a whole. Call Report data provide the most current statistical data available for evaluating institutions' corporate applications, identifying areas of focus for on...

  18. Factors Contributing to Academic Achievement: A Bayesian Structure Equation Modelling Study

    ERIC Educational Resources Information Center

    Payandeh Najafabadi, Amir T.; Najafabadi, Maryam Omidi; Farid-Rohani, Mohammad Reza

    2013-01-01

    In Iran, high school graduates enter university after taking a very difficult entrance exam called the Konkoor. Therefore, only the top-performing students are admitted by universities to continue their bachelor's education in statistics. Surprisingly, statistically, most of such students fall into the following categories: (1) do not succeed in…

  19. Hispanics in 1979--A Statistical Appraisal.

    ERIC Educational Resources Information Center

    Martinez, Douglas R.

    1979-01-01

    Public Law 94-311, passed by Congress in 1976, called for the expansion of statistics reflecting the socioeconomic status of Hispanics. The article discusses the state of the Hispanic community in early 1979, one agency's difficulties in implementing the mandates of P.L. 94-311, and the status of other agencies' work on the law's requirements. (NQ)

  20. 75 FR 29517 - Pacific Fishery Management Council; Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-26

    ... Remarks and Introductions 2. Roll Call 3. Executive Director's Report 4. Approve Agenda B. Groundfish... and Statistical Committee 8 a.m Habitat Committee 8:30 a.m Budget Committee 1:15 p.m Saturday, June 12... Statistical Committee 8 a.m Council Chair's Reception 6 p.m Sunday, June 13, 2010 California State Delegation...

  1. 75 FR 51240 - Pacific Fishery Management Council; Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-19

    ... Remarks and Introductions 2. Council Member Appointments 3. Roll Call 4. Executive Director's Report 5... Technical Team/Salmon Amendment Committee Joint 8 a.m. Session Scientific and Statistical Committee 8 a.m.... Salmon Technical Team 8 a.m. Scientific and Statistical Committee 8 a.m. Enforcement Consultants 4:30 p.m...

  2. Statistical Literacy for Active Citizenship: A Call for Data Science Education

    ERIC Educational Resources Information Center

    Engel, Joachim

    2017-01-01

    Data are abundant, quantitative information about the state of society and the wider world is around us more than ever. Paradoxically, recent trends in the public discourse point towards a post-factual world that seems content to ignore or misrepresent empirical evidence. As statistics educators we are challenged to promote understanding of…

  3. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.

  4. Statistical inference of dynamic resting-state functional connectivity using hierarchical observation modeling.

    PubMed

    Sojoudi, Alireza; Goodyear, Bradley G

    2016-12-01

    Spontaneous fluctuations of blood-oxygenation level-dependent functional magnetic resonance imaging (BOLD fMRI) signals are highly synchronous between brain regions that serve similar functions. This provides a means to investigate functional networks; however, most analysis techniques assume functional connections are constant over time. This may be problematic in the case of neurological disease, where functional connections may be highly variable. Recently, several methods have been proposed to determine moment-to-moment changes in the strength of functional connections over an imaging session (so called dynamic connectivity). Here a novel analysis framework based on a hierarchical observation modeling approach was proposed, to permit statistical inference of the presence of dynamic connectivity. A two-level linear model composed of overlapping sliding windows of fMRI signals, incorporating the fact that overlapping windows are not independent was described. To test this approach, datasets were synthesized whereby functional connectivity was either constant (significant or insignificant) or modulated by an external input. The method successfully determines the statistical significance of a functional connection in phase with the modulation, and it exhibits greater sensitivity and specificity in detecting regions with variable connectivity, when compared with sliding-window correlation analysis. For real data, this technique possesses greater reproducibility and provides a more discriminative estimate of dynamic connectivity than sliding-window correlation analysis. Hum Brain Mapp 37:4566-4580, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  5. A Statistical Method to Distinguish Functional Brain Networks

    PubMed Central

    Fujita, André; Vidal, Maciel C.; Takahashi, Daniel Y.

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method also showed to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism (p < 0.001). PMID:28261045

  6. A Statistical Method to Distinguish Functional Brain Networks.

    PubMed

    Fujita, André; Vidal, Maciel C; Takahashi, Daniel Y

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method also showed to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism ( p < 0.001).

  7. The use of statistical methods for censored data to evaluate the activity concentration of Pb-210 in beans (Phaseolus vulgaris L.).

    PubMed

    Mingote, Raquel M; Nogueira, Regina A

    2016-10-01

    A survey of 210 Pb activity concentration, one of the major internal natural radiation sources to man, has been carried in the most common species of beans (Phaseolus vulgaris L.) grown and consumed in Brazil. The representative bean types chosen, Carioca beans and black type sown in the Brazilian Midwestern and Southern regions, have been collected in this study and 210 Pb determined by liquid scintillation spectrometry after separation with chromatographic extraction using Sr-resin. Available values in data set of radioactivity in Brazil (GEORAD) on the 210 Pb activity concentration in black beans grown in Southeastern region have been added to the results of this study with the purpose of to amplify the population considered. Concerning the multiple detection limits and due to the high level of censored observations, a robust semi-parametric statistical method called regression on order statistics (ROS) has been employed to provide a reference value of the 210 Pb in Brazilian beans, which amounted to 41 mBq kg -1 fresh wt. The results suggest that the 210 Pb activity concentration in carioca beans is lower than in black beans. Also evaluated was the 210 Pb activity concentration in vegetable component of a typical diet, which displays lower values than those shown in the literature for food consumed in Europe. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Educator and participant perceptions and cost analysis of stage-tailored educational telephone calls.

    PubMed

    Esters, Onikia N; Boeckner, Linda S; Hubert, Melanie; Horacek, Tanya; Kritsch, Karen R; Oakland, Mary J; Lohse, Barbara; Greene, Geoffrey; Nitzke, Susan

    2008-01-01

    To identify strengths and weaknesses of nutrition education via telephone calls as part of a larger stage-of-change tailored intervention with mailed materials. Evaluative feedback was elicited from educators who placed the calls and respondents who received the calls. An internet and telephone survey of 10 states in the midwestern United States. 21 educators in 10 states reached via the internet and 50 young adults reached via telephone. VARIABLES MEASURED AND ANALYSIS: Rankings of intervention components, ratings of key aspects of educational calls, and cost data (as provided by a lead researcher in each state) were summarized via descriptive statistics. RESULTS, CONCLUSIONS, AND IMPLICATIONS: Educational calls used 6 to 17 minutes of preparation time, required 8 to 15 minutes of contact time, and had a mean estimated cost of $5.82 per call. Low-income young adults favored print materials over educational calls. However, the calls were reported to have positive effects on motivating participants to set goals. Educators who use educational telephone calls to reach young adults, a highly mobile target audience, may require a robust and flexible contact plan.

  9. Cold Calling and Web Postings: Do They Improve Students' Preparation and Learning in Statistics?

    ERIC Educational Resources Information Center

    Levy, Dan

    2014-01-01

    Getting students to prepare well for class is a common challenge faced by instructors all over the world. This study investigates the effects that two frequently used techniques to increase student preparation--web postings and cold calling--have on student outcomes. The study is based on two experiments and a qualitative study conducted in a…

  10. No Place to Call Home: Child & Youth Homelessness in the United States. Poverty Fact Sheet

    ERIC Educational Resources Information Center

    Damron, Neil

    2015-01-01

    "No Place to Call Home: Child and Youth Homelessness in the United States," prepared by intern Neil Damron and released in May 2015, presents the statistics on child and youth homelessness and recent trends in Wisconsin and the United States. It explores the major challenges faced by homeless minors, and, drawing from recent research by…

  11. Preventive Intervention in Families at Risk: The Limits of Liberalism

    ERIC Educational Resources Information Center

    Snik, Ger; De Jong, Johan; Van Haaften, Wouter

    2004-01-01

    There is an increasing call for preventive state interventions in so-called families at risk - that is, interventions before any overt harm has been done by parents to their children or by the children to a third party, in families that are statistically known to be liable to harm children. One of the basic principles of liberal morality, however,…

  12. STATCONT: A statistical continuum level determination method for line-rich sources

    NASA Astrophysics Data System (ADS)

    Sánchez-Monge, Á.; Schilke, P.; Ginsburg, A.; Cesaroni, R.; Schmiedeke, A.

    2018-01-01

    STATCONT is a python-based tool designed to determine the continuum emission level in spectral data, in particular for sources with a line-rich spectrum. The tool inspects the intensity distribution of a given spectrum and automatically determines the continuum level by using different statistical approaches. The different methods included in STATCONT are tested against synthetic data. We conclude that the sigma-clipping algorithm provides the most accurate continuum level determination, together with information on the uncertainty in its determination. This uncertainty can be used to correct the final continuum emission level, resulting in the here called `corrected sigma-clipping method' or c-SCM. The c-SCM has been tested against more than 750 different synthetic spectra reproducing typical conditions found towards astronomical sources. The continuum level is determined with a discrepancy of less than 1% in 50% of the cases, and less than 5% in 90% of the cases, provided at least 10% of the channels are line free. The main products of STATCONT are the continuum emission level, together with a conservative value of its uncertainty, and datacubes containing only spectral line emission, i.e., continuum-subtracted datacubes. STATCONT also includes the option to estimate the spectral index, when different files covering different frequency ranges are provided.

  13. On data processing required to derive mobility patterns from passively-generated mobile phone data

    PubMed Central

    Wang, Feilong; Chen, Cynthia

    2018-01-01

    Passively-generated mobile phone data is emerging as a potential data source for transportation research and applications. Despite the large amount of studies based on the mobile phone data, only a few have reported the properties of such data, and documented how they have processed the data. In this paper, we describe two types of common mobile phone data: Call Details Record (CDR) data and sightings data, and propose a data processing framework and the associated algorithms to address two key issues associated with the sightings data: locational uncertainty and oscillation. We show the effectiveness of our proposed methods in addressing these two issues compared to the state of art algorithms in the field. We also demonstrate that without proper processing applied to the data, the statistical regularity of human mobility patterns—a key, significant trait identified for human mobility—is over-estimated. We hope this study will stimulate more studies in examining the properties of such data and developing methods to address them. Though not as glamorous as those directly deriving insights on mobility patterns (such as statistical regularity), understanding properties of such data and developing methods to address them is a fundamental research topic on which important insights are derived on mobility patterns. PMID:29398790

  14. Interactive searching of facial image databases

    NASA Astrophysics Data System (ADS)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

    A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databased of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currenly being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that is requires a manual encoding of images. Research is being undertaken to automate the process, however, it will require an algorithm which can predict human descriptive values. Alternatives to human derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human derived descriptors, a search method which does not require the entry of human descriptors is required. A genetic search algorithm is being tested for such a purpose.

  15. Accelerating simulation for the multiple-point statistics algorithm using vector quantization

    NASA Astrophysics Data System (ADS)

    Zuo, Chen; Pan, Zhibin; Liang, Hao

    2018-03-01

    Multiple-point statistics (MPS) is a prominent algorithm to simulate categorical variables based on a sequential simulation procedure. Assuming training images (TIs) as prior conceptual models, MPS extracts patterns from TIs using a template and records their occurrences in a database. However, complex patterns increase the size of the database and require considerable time to retrieve the desired elements. In order to speed up simulation and improve simulation quality over state-of-the-art MPS methods, we propose an accelerating simulation for MPS using vector quantization (VQ), called VQ-MPS. First, a variable representation is presented to make categorical variables applicable for vector quantization. Second, we adopt a tree-structured VQ to compress the database so that stationary simulations are realized. Finally, a transformed template and classified VQ are used to address nonstationarity. A two-dimensional (2D) stationary channelized reservoir image is used to validate the proposed VQ-MPS. In comparison with several existing MPS programs, our method exhibits significantly better performance in terms of computational time, pattern reproductions, and spatial uncertainty. Further demonstrations consist of a 2D four facies simulation, two 2D nonstationary channel simulations, and a three-dimensional (3D) rock simulation. The results reveal that our proposed method is also capable of solving multifacies, nonstationarity, and 3D simulations based on 2D TIs.

  16. Factors associated with good compliance and long-term sustainability in a practitioner-based livestock disease surveillance system.

    PubMed

    Zurbrigg, Katherine J; Van den Borre, Nicole M

    2013-03-01

    The Ontario Farm call Surveillance Project (OFSP) was a practitioner-based, syndromic surveillance system for livestock disease. Three data-recording methods (paper, web-based, and handheld electronic) used by participating veterinarians were compared for timeliness (when the report arrived at the OFSP office), completeness of the report, and the usage and costs of incentives offered to veterinarians as compensation for their time to record data. There were no statistically significant differences in these parameters among the 3 data-recording methods. This indicates that different data-recording methods can be used within a single veterinary surveillance program while maintaining data integrity and timely reporting. Factors such as ease of data collection and providing incentives valued by veterinarians ensured high compliance and long-term participation in the project. It also increased the diversity of the participant group, reducing the likelihood of biased data submissions.

  17. Personalized Modeling for Prediction with Decision-Path Models

    PubMed Central

    Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.

    2015-01-01

    Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model called decision-path model that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART) that is a population decision tree to predict seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570

  18. Certification of ICI 1012 optical data storage tape

    NASA Technical Reports Server (NTRS)

    Howell, J. M.

    1993-01-01

    ICI has developed a unique and novel method of certifying a Terabyte optical tape. The tape quality is guaranteed as a statistical upper limit on the probability of uncorrectable errors. This is called the Corrected Byte Error Rate or CBER. We developed this probabilistic method because of two reasons why error rate cannot be measured directly. Firstly, written data is indelible, so one cannot employ write/read tests such as used for magnetic tape. Secondly, the anticipated error rates need impractically large samples to measure accurately. For example, a rate of 1E-12 implies only one byte in error per tape. The archivability of ICI 1012 Data Storage Tape in general is well characterized and understood. Nevertheless, customers expect performance guarantees to be supported by test results on individual tapes. In particular, they need assurance that data is retrievable after decades in archive. This paper describes the mathematical basis, measurement apparatus and applicability of the certification method.

  19. Factors associated with good compliance and long-term sustainability in a practitioner-based livestock disease surveillance system

    PubMed Central

    Zurbrigg, Katherine J.; Van den Borre, Nicole M.

    2013-01-01

    The Ontario Farm call Surveillance Project (OFSP) was a practitioner-based, syndromic surveillance system for livestock disease. Three data-recording methods (paper, web-based, and handheld electronic) used by participating veterinarians were compared for timeliness (when the report arrived at the OFSP office), completeness of the report, and the usage and costs of incentives offered to veterinarians as compensation for their time to record data. There were no statistically significant differences in these parameters among the 3 data-recording methods. This indicates that different data-recording methods can be used within a single veterinary surveillance program while maintaining data integrity and timely reporting. Factors such as ease of data collection and providing incentives valued by veterinarians ensured high compliance and long-term participation in the project. It also increased the diversity of the participant group, reducing the likelihood of biased data submissions. PMID:23997260

  20. Final Report: Quantification of Uncertainty in Extreme Scale Computations (QUEST)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marzouk, Youssef; Conrad, Patrick; Bigoni, Daniele

    QUEST (\\url{www.quest-scidac.org}) is a SciDAC Institute that is focused on uncertainty quantification (UQ) in large-scale scientific computations. Our goals are to (1) advance the state of the art in UQ mathematics, algorithms, and software; and (2) provide modeling, algorithmic, and general UQ expertise, together with software tools, to other SciDAC projects, thereby enabling and guiding a broad range of UQ activities in their respective contexts. QUEST is a collaboration among six institutions (Sandia National Laboratories, Los Alamos National Laboratory, the University of Southern California, Massachusetts Institute of Technology, the University of Texas at Austin, and Duke University) with a historymore » of joint UQ research. Our vision encompasses all aspects of UQ in leadership-class computing. This includes the well-founded setup of UQ problems; characterization of the input space given available data/information; local and global sensitivity analysis; adaptive dimensionality and order reduction; forward and inverse propagation of uncertainty; handling of application code failures, missing data, and hardware/software fault tolerance; and model inadequacy, comparison, validation, selection, and averaging. The nature of the UQ problem requires the seamless combination of data, models, and information across this landscape in a manner that provides a self-consistent quantification of requisite uncertainties in predictions from computational models. Accordingly, our UQ methods and tools span an interdisciplinary space across applied math, information theory, and statistics. The MIT QUEST effort centers on statistical inference and methods for surrogate or reduced-order modeling. MIT personnel have been responsible for the development of adaptive sampling methods, methods for approximating computationally intensive models, and software for both forward uncertainty propagation and statistical inverse problems. A key software product of the MIT QUEST effort is the MIT Uncertainty Quantification library, called MUQ (\\url{muq.mit.edu}).« less

  1. Spike Pattern Structure Influences Synaptic Efficacy Variability under STDP and Synaptic Homeostasis. II: Spike Shuffling Methods on LIF Networks

    PubMed Central

    Bi, Zedong; Zhou, Changsong

    2016-01-01

    Synapses may undergo variable changes during plasticity because of the variability of spike patterns such as temporal stochasticity and spatial randomness. Here, we call the variability of synaptic weight changes during plasticity to be efficacy variability. In this paper, we investigate how four aspects of spike pattern statistics (i.e., synchronous firing, burstiness/regularity, heterogeneity of rates and heterogeneity of cross-correlations) influence the efficacy variability under pair-wise additive spike-timing dependent plasticity (STDP) and synaptic homeostasis (the mean strength of plastic synapses into a neuron is bounded), by implementing spike shuffling methods onto spike patterns self-organized by a network of excitatory and inhibitory leaky integrate-and-fire (LIF) neurons. With the increase of the decay time scale of the inhibitory synaptic currents, the LIF network undergoes a transition from asynchronous state to weak synchronous state and then to synchronous bursting state. We first shuffle these spike patterns using a variety of methods, each designed to evidently change a specific pattern statistics; and then investigate the change of efficacy variability of the synapses under STDP and synaptic homeostasis, when the neurons in the network fire according to the spike patterns before and after being treated by a shuffling method. In this way, we can understand how the change of pattern statistics may cause the change of efficacy variability. Our results are consistent with those of our previous study which implements spike-generating models on converging motifs. We also find that burstiness/regularity is important to determine the efficacy variability under asynchronous states, while heterogeneity of cross-correlations is the main factor to cause efficacy variability when the network moves into synchronous bursting states (the states observed in epilepsy). PMID:27555816

  2. The statistical mechanics of complex signaling networks: nerve growth factor signaling

    NASA Astrophysics Data System (ADS)

    Brown, K. S.; Hill, C. C.; Calero, G. A.; Myers, C. R.; Lee, K. H.; Sethna, J. P.; Cerione, R. A.

    2004-10-01

    The inherent complexity of cellular signaling networks and their importance to a wide range of cellular functions necessitates the development of modeling methods that can be applied toward making predictions and highlighting the appropriate experiments to test our understanding of how these systems are designed and function. We use methods of statistical mechanics to extract useful predictions for complex cellular signaling networks. A key difficulty with signaling models is that, while significant effort is being made to experimentally measure the rate constants for individual steps in these networks, many of the parameters required to describe their behavior remain unknown or at best represent estimates. To establish the usefulness of our approach, we have applied our methods toward modeling the nerve growth factor (NGF)-induced differentiation of neuronal cells. In particular, we study the actions of NGF and mitogenic epidermal growth factor (EGF) in rat pheochromocytoma (PC12) cells. Through a network of intermediate signaling proteins, each of these growth factors stimulates extracellular regulated kinase (Erk) phosphorylation with distinct dynamical profiles. Using our modeling approach, we are able to predict the influence of specific signaling modules in determining the integrated cellular response to the two growth factors. Our methods also raise some interesting insights into the design and possible evolution of cellular systems, highlighting an inherent property of these systems that we call 'sloppiness.'

  3. Multivariate Tensor-based Morphometry on Surfaces: Application to Mapping Ventricular Abnormalities in HIV/AIDS

    PubMed Central

    Wang, Yalin; Zhang, Jie; Gutman, Boris; Chan, Tony F.; Becker, James T.; Aizenstein, Howard J.; Lopez, Oscar L.; Tamburo, Robert J.; Toga, Arthur W.; Thompson, Paul M.

    2010-01-01

    Here we developed a new method, called multivariate tensor-based surface morphometry (TBM), and applied it to study lateral ventricular surface differences associated with HIV/AIDS. Using concepts from differential geometry and the theory of differential forms, we created mathematical structures known as holomorphic one-forms, to obtain an efficient and accurate conformal parameterization of the lateral ventricular surfaces in the brain. The new meshing approach also provides a natural way to register anatomical surfaces across subjects, and improves on prior methods as it handles surfaces that branch and join at complex 3D junctions. To analyze anatomical differences, we computed new statistics from the Riemannian surface metrics - these retain multivariate information on local surface geometry. We applied this framework to analyze lateral ventricular surface morphometry in 3D MRI data from 11 subjects with HIV/AIDS and 8 healthy controls. Our method detected a 3D profile of surface abnormalities even in this small sample. Multivariate statistics on the local tensors gave better effect sizes for detecting group differences, relative to other TBM-based methods including analysis of the Jacobian determinant, the largest and smallest eigenvalues of the surface metric, and the pair of eigenvalues of the Jacobian matrix. The resulting analysis pipeline may improve the power of surface-based morphometry studies of the brain. PMID:19900560

  4. FADO: a statistical method to detect favored or avoided distances between occurrences of motifs using the Hawkes' model.

    PubMed

    Gusto, Gaelle; Schbath, Sophie

    2005-01-01

    We propose an original statistical method to estimate how the occurrences of a given process along a genome, genes or motifs for instance, may be influenced by the occurrences of a second process. More precisely, the aim is to detect avoided and/or favored distances between two motifs, for instance, suggesting possible interactions at a molecular level. For this, we consider occurrences along the genome as point processes and we use the so-called Hawkes' model. In such model, the intensity at position t depends linearly on the distances to past occurrences of both processes via two unknown profile functions to estimate. We perform a non parametric estimation of both profiles by using B-spline decompositions and a constrained maximum likelihood method. Finally, we use the AIC criterion for the model selection. Simulations show the excellent behavior of our estimation procedure. We then apply it to study (i) the dependence between gene occurrences along the E. coli genome and the occurrences of a motif known to be part of the major promoter for this bacterium, and (ii) the dependence between the yeast S. cerevisiae genes and the occurrences of putative polyadenylation signals. The results are coherent with known biological properties or previous predictions, meaning this method can be of great interest for functional motif detection, or to improve knowledge of some biological mechanisms.

  5. The Gaussian CL s method for searches of new physics

    DOE PAGES

    Qian, X.; Tan, A.; Ling, J. J.; ...

    2016-04-23

    Here we describe a method based on the CL s approach to present results in searches of new physics, under the condition that the relevant parameter space is continuous. Our method relies on a class of test statistics developed for non-nested hypotheses testing problems, denoted by ΔT, which has a Gaussian approximation to its parent distribution when the sample size is large. This leads to a simple procedure of forming exclusion sets for the parameters of interest, which we call the Gaussian CL s method. Our work provides a self-contained mathematical proof for the Gaussian CL s method, that explicitlymore » outlines the required conditions. These conditions are milder than that required by the Wilks' theorem to set confidence intervals (CIs). We illustrate the Gaussian CL s method in an example of searching for a sterile neutrino, where the CL s approach was rarely used before. We also compare data analysis results produced by the Gaussian CL s method and various CI methods to showcase their differences.« less

  6. The Biomarker-Surrogacy Evaluation Schema: a review of the biomarker-surrogate literature and a proposal for a criterion-based, quantitative, multidimensional hierarchical levels of evidence schema for evaluating the status of biomarkers as surrogate endpoints.

    PubMed

    Lassere, Marissa N

    2008-06-01

    There are clear advantages to using biomarkers and surrogate endpoints, but concerns about clinical and statistical validity and systematic methods to evaluate these aspects hinder their efficient application. Section 2 is a systematic, historical review of the biomarker-surrogate endpoint literature with special reference to the nomenclature, the systems of classification and statistical methods developed for their evaluation. In Section 3 an explicit, criterion-based, quantitative, multidimensional hierarchical levels of evidence schema - Biomarker-Surrogacy Evaluation Schema - is proposed to evaluate and co-ordinate the multiple dimensions (biological, epidemiological, statistical, clinical trial and risk-benefit evidence) of the biomarker clinical endpoint relationships. The schema systematically evaluates and ranks the surrogacy status of biomarkers and surrogate endpoints using defined levels of evidence. The schema incorporates the three independent domains: Study Design, Target Outcome and Statistical Evaluation. Each domain has items ranked from zero to five. An additional category called Penalties incorporates additional considerations of biological plausibility, risk-benefit and generalizability. The total score (0-15) determines the level of evidence, with Level 1 the strongest and Level 5 the weakest. The term ;surrogate' is restricted to markers attaining Levels 1 or 2 only. Surrogacy status of markers can then be directly compared within and across different areas of medicine to guide individual, trial-based or drug-development decisions. This schema would facilitate communication between clinical, researcher, regulatory, industry and consumer participants necessary for evaluation of the biomarker-surrogate-clinical endpoint relationship in their different settings.

  7. Beluga whale, Delphinapterus leucas, vocalizations from the Churchill River, Manitoba, Canada.

    PubMed

    Chmelnitsky, Elly G; Ferguson, Steven H

    2012-06-01

    Classification of animal vocalizations is often done by a human observer using aural and visual analysis but more efficient, automated methods have also been utilized to reduce bias and increase reproducibility. Beluga whale, Delphinapterus leucas, calls were described from recordings collected in the summers of 2006-2008, in the Churchill River, Manitoba. Calls (n=706) were classified based on aural and visual analysis, and call characteristics were measured; calls were separated into 453 whistles (64.2%; 22 types), 183 pulsed∕noisy calls (25.9%; 15 types), and 70 combined calls (9.9%; seven types). Measured parameters varied within each call type but less variation existed in pulsed and noisy call types and some combined call types than in whistles. A more efficient and repeatable hierarchical clustering method was applied to 200 randomly chosen whistles using six call characteristics as variables; twelve groups were identified. Call characteristics varied less in cluster analysis groups than in whistle types described by visual and aural analysis and results were similar to the whistle contours described. This study provided the first description of beluga calls in Hudson Bay and using two methods provides more robust interpretations and an assessment of appropriate methods for future studies.

  8. Statistical universals reveal the structures and functions of human music.

    PubMed

    Savage, Patrick E; Brown, Steven; Sakai, Emi; Currie, Thomas E

    2015-07-21

    Music has been called "the universal language of mankind." Although contemporary theories of music evolution often invoke various musical universals, the existence of such universals has been disputed for decades and has never been empirically demonstrated. Here we combine a music-classification scheme with statistical analyses, including phylogenetic comparative methods, to examine a well-sampled global set of 304 music recordings. Our analyses reveal no absolute universals but strong support for many statistical universals that are consistent across all nine geographic regions sampled. These universals include 18 musical features that are common individually as well as a network of 10 features that are commonly associated with one another. They span not only features related to pitch and rhythm that are often cited as putative universals but also rarely cited domains including performance style and social context. These cross-cultural structural regularities of human music may relate to roles in facilitating group coordination and cohesion, as exemplified by the universal tendency to sing, play percussion instruments, and dance to simple, repetitive music in groups. Our findings highlight the need for scientists studying music evolution to expand the range of musical cultures and musical features under consideration. The statistical universals we identified represent important candidates for future investigation.

  9. Statistical universals reveal the structures and functions of human music

    PubMed Central

    Savage, Patrick E.; Brown, Steven; Sakai, Emi; Currie, Thomas E.

    2015-01-01

    Music has been called “the universal language of mankind.” Although contemporary theories of music evolution often invoke various musical universals, the existence of such universals has been disputed for decades and has never been empirically demonstrated. Here we combine a music-classification scheme with statistical analyses, including phylogenetic comparative methods, to examine a well-sampled global set of 304 music recordings. Our analyses reveal no absolute universals but strong support for many statistical universals that are consistent across all nine geographic regions sampled. These universals include 18 musical features that are common individually as well as a network of 10 features that are commonly associated with one another. They span not only features related to pitch and rhythm that are often cited as putative universals but also rarely cited domains including performance style and social context. These cross-cultural structural regularities of human music may relate to roles in facilitating group coordination and cohesion, as exemplified by the universal tendency to sing, play percussion instruments, and dance to simple, repetitive music in groups. Our findings highlight the need for scientists studying music evolution to expand the range of musical cultures and musical features under consideration. The statistical universals we identified represent important candidates for future investigation. PMID:26124105

  10. A three-dimensional refractive index model for simulation of optical wave propagation in atmospheric turbulence

    NASA Astrophysics Data System (ADS)

    Paramonov, P. V.; Vorontsov, A. M.; Kunitsyn, V. E.

    2015-10-01

    Numerical modeling of optical wave propagation in atmospheric turbulence is traditionally performed with using the so-called "split"-operator method, when the influence of the propagation medium's refractive index inhomogeneities is accounted for only within a system of infinitely narrow layers (phase screens) where phase is distorted. Commonly, under certain assumptions, such phase screens are considered as mutually statistically uncorrelated. However, in several important applications including laser target tracking, remote sensing, and atmospheric imaging, accurate optical field propagation modeling assumes upper limitations on interscreen spacing. The latter situation can be observed, for instance, in the presence of large-scale turbulent inhomogeneities or in deep turbulence conditions, where interscreen distances become comparable with turbulence outer scale and, hence, corresponding phase screens cannot be statistically uncorrelated. In this paper, we discuss correlated phase screens. The statistical characteristics of screens are calculated based on a representation of turbulent fluctuations of three-dimensional (3D) refractive index random field as a set of sequentially correlated 3D layers displaced in the wave propagation direction. The statistical characteristics of refractive index fluctuations are described in terms of the von Karman power spectrum density. In the representation of these 3D layers by corresponding phase screens, the geometrical optics approximation is used.

  11. Interrater reliability: the kappa statistic.

    PubMed

    McHugh, Mary L

    2012-01-01

    The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While there have been a variety of methods to measure interrater reliability, traditionally it was measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued use of percent agreement due to its inability to account for chance agreement. He introduced the Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, the kappa can range from -1 to +1. While the kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa should be acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels for both kappa and percent agreement that should be demanded in healthcare studies are suggested.

  12. “Plateau”-related summary statistics are uninformative for comparing working memory models

    PubMed Central

    van den Berg, Ronald; Ma, Wei Ji

    2014-01-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon. Zhang and Luck (2008) and Anderson, Vogel, and Awh (2011) noticed that as more items need to be remembered, “memory noise” seems to first increase and then reach a “stable plateau.” They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided, at most, 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. At realistic numbers of trials, plateau-related summary statistics are completely unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was, at most, 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. These findings call into question claims about working memory that are based on summary statistics. PMID:24719235

  13. Comparison of bootstrap approaches for estimation of uncertainties of DTI parameters.

    PubMed

    Chung, SungWon; Lu, Ying; Henry, Roland G

    2006-11-01

    Bootstrap is an empirical non-parametric statistical technique based on data resampling that has been used to quantify uncertainties of diffusion tensor MRI (DTI) parameters, useful in tractography and in assessing DTI methods. The current bootstrap method (repetition bootstrap) used for DTI analysis performs resampling within the data sharing common diffusion gradients, requiring multiple acquisitions for each diffusion gradient. Recently, wild bootstrap was proposed that can be applied without multiple acquisitions. In this paper, two new approaches are introduced called residual bootstrap and repetition bootknife. We show that repetition bootknife corrects for the large bias present in the repetition bootstrap method and, therefore, better estimates the standard errors. Like wild bootstrap, residual bootstrap is applicable to single acquisition scheme, and both are based on regression residuals (called model-based resampling). Residual bootstrap is based on the assumption that non-constant variance of measured diffusion-attenuated signals can be modeled, which is actually the assumption behind the widely used weighted least squares solution of diffusion tensor. The performances of these bootstrap approaches were compared in terms of bias, variance, and overall error of bootstrap-estimated standard error by Monte Carlo simulation. We demonstrate that residual bootstrap has smaller biases and overall errors, which enables estimation of uncertainties with higher accuracy. Understanding the properties of these bootstrap procedures will help us to choose the optimal approach for estimating uncertainties that can benefit hypothesis testing based on DTI parameters, probabilistic fiber tracking, and optimizing DTI methods.

  14. Design optimization and probabilistic analysis of a hydrodynamic journal bearing

    NASA Technical Reports Server (NTRS)

    Liniecki, Alexander G.

    1990-01-01

    A nonlinear constrained optimization of a hydrodynamic bearing was performed yielding three main variables: radial clearance, bearing length to diameter ratio, and lubricating oil viscosity. As an objective function a combined model of temperature rise and oil supply has been adopted. The optimized model of the bearing has been simulated for population of 1000 cases using Monte Carlo statistical method. It appeared that the so called 'optimal solution' generated more than 50 percent of failed bearings, because their minimum oil film thickness violated stipulated minimum constraint value. As a remedy change of oil viscosity is suggested after several sensitivities of variables have been investigated.

  15. ABrox-A user-friendly Python module for approximate Bayesian computation with a focus on model comparison.

    PubMed

    Mertens, Ulf Kai; Voss, Andreas; Radev, Stefan

    2018-01-01

    We give an overview of the basic principles of approximate Bayesian computation (ABC), a class of stochastic methods that enable flexible and likelihood-free model comparison and parameter estimation. Our new open-source software called ABrox is used to illustrate ABC for model comparison on two prominent statistical tests, the two-sample t-test and the Levene-Test. We further highlight the flexibility of ABC compared to classical Bayesian hypothesis testing by computing an approximate Bayes factor for two multinomial processing tree models. Last but not least, throughout the paper, we introduce ABrox using the accompanied graphical user interface.

  16. Ordinal pattern statistics for the assessment of heart rate variability

    NASA Astrophysics Data System (ADS)

    Graff, G.; Graff, B.; Kaczkowska, A.; Makowiec, D.; Amigó, J. M.; Piskorski, J.; Narkiewicz, K.; Guzik, P.

    2013-06-01

    The recognition of all main features of a healthy heart rhythm (the so-called sinus rhythm) is still one of the biggest challenges in contemporary cardiology. Recently the interesting physiological phenomenon of heart rate asymmetry has been observed. This phenomenon is related to unbalanced contributions of heart rate decelerations and accelerations to heart rate variability. In this paper we apply methods based on the concept of ordinal pattern to the analysis of electrocardiograms (inter-peak intervals) of healthy subjects in the supine position. This way we observe new regularities of the heart rhythm related to the distribution of ordinal patterns of lengths 3 and 4.

  17. An Introduction to MAMA (Meta-Analysis of MicroArray data) System.

    PubMed

    Zhang, Zhe; Fenstermacher, David

    2005-01-01

    Analyzing microarray data across multiple experiments has been proven advantageous. To support this kind of analysis, we are developing a software system called MAMA (Meta-Analysis of MicroArray data). MAMA utilizes a client-server architecture with a relational database on the server-side for the storage of microarray datasets collected from various resources. The client-side is an application running on the end user's computer that allows the user to manipulate microarray data and analytical results locally. MAMA implementation will integrate several analytical methods, including meta-analysis within an open-source framework offering other developers the flexibility to plug in additional statistical algorithms.

  18. Breaking chaotic secure communication using a spectrogram

    NASA Astrophysics Data System (ADS)

    Yang, Tao; Yang, Lin-Bao; Yang, Chun-Mei

    1998-10-01

    We present the results of breaking a kind of chaotic secure communication system called chaotic switching scheme, also known as chaotic shift keying, in which a binary message signal is scrambled by two chaotic attractors. The spectrogram which can reveal the energy evolving process in the spectral-temporal space is used to distinguish the two different chaotic attractors, which are qualitatively and statistically similar in phase space. Then mathematical morphological filters are used to decode the binary message signal without the knowledge of the binary message signal and the transmitter. The computer experimental results are provided to show how our method works when both the chaotic and hyper-chaotic transmitter are used.

  19. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.

  20. Making Sense of 'Big Data' in Provenance Studies

    NASA Astrophysics Data System (ADS)

    Vermeesch, P.

    2014-12-01

    Huge online databases can be 'mined' to reveal previously hidden trends and relationships in society. One could argue that sedimentary geology has entered a similar era of 'Big Data', as modern provenance studies routinely apply multiple proxies to dozens of samples. Just like the Internet, sedimentary geology now requires specialised statistical tools to interpret such large datasets. These can be organised on three levels of progressively higher order:A single sample: The most effective way to reveal the provenance information contained in a representative sample of detrital zircon U-Pb ages are probability density estimators such as histograms and kernel density estimates. The widely popular 'probability density plots' implemented in IsoPlot and AgeDisplay compound analytical uncertainty with geological scatter and are therefore invalid.Several samples: Multi-panel diagrams comprising many detrital age distributions or compositional pie charts quickly become unwieldy and uninterpretable. For example, if there are N samples in a study, then the number of pairwise comparisons between samples increases quadratically as N(N-1)/2. This is simply too much information for the human eye to process. To solve this problem, it is necessary to (a) express the 'distance' between two samples as a simple scalar and (b) combine all N(N-1)/2 such values in a single two-dimensional 'map', grouping similar and pulling apart dissimilar samples. This can be easily achieved using simple statistics-based dissimilarity measures and a standard statistical method called Multidimensional Scaling (MDS).Several methods: Suppose that we use four provenance proxies: bulk petrography, chemistry, heavy minerals and detrital geochronology. This will result in four MDS maps, each of which likely show slightly different trends and patterns. To deal with such cases, it may be useful to use a related technique called 'three way multidimensional scaling'. This results in two graphical outputs: an MDS map, and a map with 'weights' showing to what extent the different provenance proxies influence the horizontal and vertical axis of the MDS map. Thus, detrital data can not only inform the user about the provenance of sediments, but also about the causal relationships between the mineralogy, geochronology and chemistry.

  1. Perspectives on statistics education: observations from statistical consulting in an academic nursing environment.

    PubMed

    Hayat, Matthew J; Schmiege, Sarah J; Cook, Paul F

    2014-04-01

    Statistics knowledge is essential for understanding the nursing and health care literature, as well as for applying rigorous science in nursing research. Statistical consultants providing services to faculty and students in an academic nursing program have the opportunity to identify gaps and challenges in statistics education for nursing students. This information may be useful to curriculum committees and statistics educators. This article aims to provide perspective on statistics education stemming from the experiences of three experienced statistics educators who regularly collaborate and consult with nurse investigators. The authors share their knowledge and express their views about data management, data screening and manipulation, statistical software, types of scientific investigation, and advanced statistical topics not covered in the usual coursework. The suggestions provided promote a call for data to study these topics. Relevant data about statistics education can assist educators in developing comprehensive statistics coursework for nursing students. Copyright 2014, SLACK Incorporated.

  2. A comparative study of traditional lecture methods and interactive lecture methods in introductory geology courses for non-science majors at the college level

    NASA Astrophysics Data System (ADS)

    Hundley, Stacey A.

    In recent years there has been a national call for reform in undergraduate science education. The goal of this reform movement in science education is to develop ways to improve undergraduate student learning with an emphasis on developing more effective teaching practices. Introductory science courses at the college level are generally taught using a traditional lecture format. Recent studies have shown incorporating active learning strategies within the traditional lecture classroom has positive effects on student outcomes. This study focuses on incorporating interactive teaching methods into the traditional lecture classroom to enhance student learning for non-science majors enrolled in introductory geology courses at a private university. Students' experience and instructional preferences regarding introductory geology courses were identified from survey data analysis. The information gained from responses to the questionnaire was utilized to develop an interactive lecture introductory geology course for non-science majors. Student outcomes were examined in introductory geology courses based on two teaching methods: interactive lecture and traditional lecture. There were no significant statistical differences between the groups based on the student outcomes and teaching methods. Incorporating interactive lecture methods did not statistically improve student outcomes when compared to traditional lecture teaching methods. However, the responses to the survey revealed students have a preference for introductory geology courses taught with lecture and instructor-led discussions and students prefer to work independently or in small groups. The results of this study are useful to individuals who teach introductory geology courses and individuals who teach introductory science courses for non-science majors at the college level.

  3. Spike sorting using locality preserving projection with gap statistics and landmark-based spectral clustering.

    PubMed

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2014-12-30

    Understanding neural functions requires knowledge from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in the biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative; thereby boost the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination between wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC obviously demonstrates a win-win spike sorting solution meeting both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce computational burden and thus their combination can be applied into real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Promoting Active Learning When Teaching Introductory Statistics and Probability Using a Portfolio Curriculum Approach

    ERIC Educational Resources Information Center

    Adair, Desmond; Jaeger, Martin; Price, Owen M.

    2018-01-01

    The use of a portfolio curriculum approach, when teaching a university introductory statistics and probability course to engineering students, is developed and evaluated. The portfolio curriculum approach, so called, as the students need to keep extensive records both as hard copies and digitally of reading materials, interactions with faculty,…

  5. Retrieving Essential Material at the End of Lectures Improves Performance on Statistics Exams

    ERIC Educational Resources Information Center

    Lyle, Keith B.; Crawford, Nicole A.

    2011-01-01

    At the end of each lecture in a statistics for psychology course, students answered a small set of questions that required them to retrieve information from the same day's lecture. These exercises constituted retrieval practice for lecture material subsequently tested on four exams throughout the course. This technique is called the PUREMEM…

  6. Doing Research That Matters: A Success Story from Statistics Education

    ERIC Educational Resources Information Center

    Hipkins, Rosemary

    2014-01-01

    This is the first report from a new initiative called TLRI Project Plus. It aims to add value to the Teaching and Learning Research Initiative (TLRI), which NZCER manages on behalf of the government, by synthesising findings across multiple projects. This report focuses on two projects in statistics education and explores the factors that…

  7. Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study

    ERIC Educational Resources Information Center

    Cer, Daniel

    2011-01-01

    The goal of this dissertation is to determine the best way to train a statistical machine translation system. I first develop a state-of-the-art machine translation system called Phrasal and then use it to examine a wide variety of potential learning algorithms and optimization criteria and arrive at two very surprising results. First, despite the…

  8. Ten Ways to Improve the Use of Statistical Mediation Analysis in the Practice of Child and Adolescent Treatment Research

    ERIC Educational Resources Information Center

    Maric, Marija; Wiers, Reinout W.; Prins, Pier J. M.

    2012-01-01

    Despite guidelines and repeated calls from the literature, statistical mediation analysis in youth treatment outcome research is rare. Even more concerning is that many studies that "have" reported mediation analyses do not fulfill basic requirements for mediation analysis, providing inconclusive data and clinical implications. As a result, after…

  9. 78 FR 63966 - Gulf of Mexico Fishery Management Council; Public Meeting

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-10-25

    ... Scientific and Statistical Committee (SSC). DATES: The meeting will be held from 9 a.m. Until 5 p.m. on.... Recommendations to the Council 4. Other Business For meeting materials, call (813) 348-1630. Although other non-emergency issues not on the agenda may come before the Scientific and Statistical Committees for discussion...

  10. The PIT-trap-A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.

    PubMed

    Warton, David I; Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.

  11. Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification.

    PubMed

    Jowkar, Gholam-Hossein; Mansoori, Eghbal G

    2016-10-01

    Identification of disease genes, using computational methods, is an important issue in biomedical and bioinformatics research. According to observations that diseases with the same or similar phenotype have the same biological characteristics, researchers have tried to identify genes by using machine learning tools. In recent attempts, some semi-supervised learning methods, called positive-unlabeled learning, is used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes are extracted using co-training schema. Then, the similarity graph of genes is built using metric learning by concentrating on multi-rank-walk method to perform inference from labeled genes. At last, a Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data through choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) noise robustness characteristic of PEGPUL via using multilevel schema. In order to assess PEGPUL, we have applied it on 12950 disease genes with 949 positive genes from six class of diseases and 12001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. The PIT-trap—A “model-free” bootstrap procedure for inference about regression models with discrete, multivariate responses

    PubMed Central

    Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071

  13. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate areas of the search space favoring to infer localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452

  14. Evaluation of a Partial Genome Screening of Two Asthma Susceptibility Regions Using Bayesian Network Based Bayesian Multilevel Analysis of Relevance

    PubMed Central

    Antal, Péter; Kiszel, Petra Sz.; Gézsi, András; Hadadi, Éva; Virág, Viktor; Hajós, Gergely; Millinghoffer, András; Nagy, Adrienne; Kiss, András; Semsei, Ágnes F.; Temesi, Gergely; Melegh, Béla; Kisfali, Péter; Széll, Márta; Bikov, András; Gálffy, Gabriella; Tamási, Lilla; Falus, András; Szalai, Csaba

    2012-01-01

    Genetic studies indicate high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses Bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to estimate the posteriors whether a variable is directly relevant or its association is only mediated. With frequentist methods one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43(1.2–1.8); p = 3×10−4). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics. In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance. PMID:22432035

  15. Multiobjective evolutionary algorithm with many tables for purely ab initio protein structure prediction.

    PubMed

    Brasil, Christiane Regina Soares; Delbem, Alexandre Claudio Botazzo; da Silva, Fernando Luís Barroso

    2013-07-30

    This article focuses on the development of an approach for ab initio protein structure prediction (PSP) without using any earlier knowledge from similar protein structures, as fragment-based statistics or inference of secondary structures. Such an approach is called purely ab initio prediction. The article shows that well-designed multiobjective evolutionary algorithms can predict relevant protein structures in a purely ab initio way. One challenge for purely ab initio PSP is the prediction of structures with β-sheets. To work with such proteins, this research has also developed procedures to efficiently estimate hydrogen bond and solvation contribution energies. Considering van der Waals, electrostatic, hydrogen bond, and solvation contribution energies, the PSP is a problem with four energetic terms to be minimized. Each interaction energy term can be considered an objective of an optimization method. Combinatorial problems with four objectives have been considered too complex for the available multiobjective optimization (MOO) methods. The proposed approach, called "Multiobjective evolutionary algorithms with many tables" (MEAMT), can efficiently deal with four objectives through the combination thereof, performing a more adequate sampling of the objective space. Therefore, this method can better map the promising regions in this space, predicting structures in a purely ab initio way. In other words, MEAMT is an efficient optimization method for MOO, which explores simultaneously the search space as well as the objective space. MEAMT can predict structures with one or two domains with RMSDs comparable to values obtained by recently developed ab initio methods (GAPFCG , I-PAES, and Quark) that use different levels of earlier knowledge. Copyright © 2013 Wiley Periodicals, Inc.

  16. The Intracranial Distribution of Gliomas in Relation to Exposure From Mobile Phones: Analyses From the INTERPHONE Study

    PubMed Central

    Grell, Kathrine; Frederiksen, Kirsten; Schüz, Joachim; Cardis, Elisabeth; Armstrong, Bruce; Siemiatycki, Jack; Krewski, Daniel R.; McBride, Mary L.; Johansen, Christoffer; Auvinen, Anssi; Hours, Martine; Blettner, Maria; Sadetzki, Siegal; Lagorio, Susanna; Yamaguchi, Naohito; Woodward, Alistair; Tynes, Tore; Feychting, Maria; Fleming, Sarah J.; Swerdlow, Anthony J.; Andersen, Per K.

    2016-01-01

    When investigating the association between brain tumors and use of mobile telephones, accurate data on tumor position are essential, due to the highly localized absorption of energy in the human brain from the radio-frequency fields emitted. We used a point process model to investigate this association using information that included tumor localization data from the INTERPHONE Study (Australia, Canada, Denmark, Finland, France, Germany, Israel, Italy, Japan, New Zealand, Norway, Sweden, and the United Kingdom). Our main analysis included 792 regular mobile phone users diagnosed with a glioma between 2000 and 2004. Similar to earlier results, we found a statistically significant association between the intracranial distribution of gliomas and the self-reported location of the phone. When we accounted for the preferred side of the head not being exclusively used for all mobile phone calls, the results were similar. The association was independent of the cumulative call time and cumulative number of calls. However, our model used reported side of mobile phone use, which is potentially influenced by recall bias. The point process method provides an alternative to previously used epidemiologic research designs when one is including localization in the investigation of brain tumors and mobile phone use. PMID:27810856

  17. On the assessment of the added value of new predictive biomarkers.

    PubMed

    Chen, Weijie; Samuelson, Frank W; Gallas, Brandon D; Kang, Le; Sahiner, Berkman; Petrick, Nicholas

    2013-07-29

    The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC "has vastly inferior statistical properties," i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests. We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper. We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.

  18. Bladder cancer mapping in Libya based on standardized morbidity ratio and log-normal model

    NASA Astrophysics Data System (ADS)

    Alhdiri, Maryam Ahmed; Samat, Nor Azah; Mohamed, Zulkifley

    2017-05-01

    Disease mapping contains a set of statistical techniques that detail maps of rates based on estimated mortality, morbidity, and prevalence. A traditional approach to measure the relative risk of the disease is called Standardized Morbidity Ratio (SMR). It is the ratio of an observed and expected number of accounts in an area, which has the greatest uncertainty if the disease is rare or if geographical area is small. Therefore, Bayesian models or statistical smoothing based on Log-normal model are introduced which might solve SMR problem. This study estimates the relative risk for bladder cancer incidence in Libya from 2006 to 2007 based on the SMR and log-normal model, which were fitted to data using WinBUGS software. This study starts with a brief review of these models, starting with the SMR method and followed by the log-normal model, which is then applied to bladder cancer incidence in Libya. All results are compared using maps and tables. The study concludes that the log-normal model gives better relative risk estimates compared to the classical method. The log-normal model has can overcome the SMR problem when there is no observed bladder cancer in an area.

  19. Effect of Robotics on Elementary Preservice Teachers' Self-Efficacy, Science Learning, and Computational Thinking

    NASA Astrophysics Data System (ADS)

    Jaipal-Jamani, Kamini; Angeli, Charoula

    2017-04-01

    The current impetus for increasing STEM in K-12 education calls for an examination of how preservice teachers are being prepared to teach STEM. This paper reports on a study that examined elementary preservice teachers' ( n = 21) self-efficacy, understanding of science concepts, and computational thinking as they engaged with robotics in a science methods course. Data collection methods included pretests and posttests on science content, prequestionnaires and postquestionnaires for interest and self-efficacy, and four programming assignments. Statistical results showed that preservice teachers' interest and self-efficacy with robotics increased. There was a statistically significant difference between preknowledge and postknowledge scores, and preservice teachers did show gains in learning how to write algorithms and debug programs over repeated programming tasks. The findings suggest that the robotics activity was an effective instructional strategy to enhance interest in robotics, increase self-efficacy to teach with robotics, develop understandings of science concepts, and promote the development of computational thinking skills. Study findings contribute quantitative evidence to the STEM literature on how robotics develops preservice teachers' self-efficacy, science knowledge, and computational thinking skills in higher education science classroom contexts.

  20. Sparse intervertebral fence composition for 3D cervical vertebra segmentation

    NASA Astrophysics Data System (ADS)

    Liu, Xinxin; Yang, Jian; Song, Shuang; Cong, Weijian; Jiao, Peifeng; Song, Hong; Ai, Danni; Jiang, Yurong; Wang, Yongtian

    2018-06-01

    Statistical shape models are capable of extracting shape prior information, and are usually utilized to assist the task of segmentation of medical images. However, such models require large training datasets in the case of multi-object structures, and it also is difficult to achieve satisfactory results for complex shapes. This study proposed a novel statistical model for cervical vertebra segmentation, called sparse intervertebral fence composition (SiFC), which can reconstruct the boundary between adjacent vertebrae by modeling intervertebral fences. The complex shape of the cervical spine is replaced by a simple intervertebral fence, which considerably reduces the difficulty of cervical segmentation. The final segmentation results are obtained by using a 3D active contour deformation model without shape constraint, which substantially enhances the recognition capability of the proposed method for objects with complex shapes. The proposed segmentation framework is tested on a dataset with CT images from 20 patients. A quantitative comparison against corresponding reference vertebral segmentation yields an overall mean absolute surface distance of 0.70 mm and a dice similarity index of 95.47% for cervical vertebral segmentation. The experimental results show that the SiFC method achieves competitive cervical vertebral segmentation performances, and completely eliminates inter-process overlap.

  1. Red-shouldered hawk occupancy surveys in central Minnesota, USA

    USGS Publications Warehouse

    Henneman, C.; McLeod, M.A.; Andersen, D.E.

    2007-01-01

    Forest-dwelling raptors are often difficult to detect because many species occur at low density or are secretive. Broadcasting conspecific vocalizations can increase the probability of detecting forest-dwelling raptors and has been shown to be an effective method for locating raptors and assessing their relative abundance. Recent advances in statistical techniques based on presence-absence data use probabilistic arguments to derive probability of detection when it is <1 and to provide a model and likelihood-based method for estimating proportion of sites occupied. We used these maximum-likelihood models with data from red-shouldered hawk (Buteo lineatus) call-broadcast surveys conducted in central Minnesota, USA, in 1994-1995 and 2004-2005. Our objectives were to obtain estimates of occupancy and detection probability 1) over multiple sampling seasons (yr), 2) incorporating within-season time-specific detection probabilities, 3) with call type and breeding stage included as covariates in models of probability of detection, and 4) with different sampling strategies. We visited individual survey locations 2-9 times per year, and estimates of both probability of detection (range = 0.28-0.54) and site occupancy (range = 0.81-0.97) varied among years. Detection probability was affected by inclusion of a within-season time-specific covariate, call type, and breeding stage. In 2004 and 2005 we used survey results to assess the effect that number of sample locations, double sampling, and discontinued sampling had on parameter estimates. We found that estimates of probability of detection and proportion of sites occupied were similar across different sampling strategies, and we suggest ways to reduce sampling effort in a monitoring program.

  2. Multilinear Graph Embedding: Representation and Regularization for Images.

    PubMed

    Chen, Yi-Lei; Hsu, Chiou-Ting

    2014-02-01

    Given a set of images, finding a compact and discriminative representation is still a big challenge especially when multiple latent factors are hidden in the way of data generation. To represent multifactor images, although multilinear models are widely used to parameterize the data, most methods are based on high-order singular value decomposition (HOSVD), which preserves global statistics but interprets local variations inadequately. To this end, we propose a novel method, called multilinear graph embedding (MGE), as well as its kernelization MKGE to leverage the manifold learning techniques into multilinear models. Our method theoretically links the linear, nonlinear, and multilinear dimensionality reduction. We also show that the supervised MGE encodes informative image priors for image regularization, provided that an image is represented as a high-order tensor. From our experiments on face and gait recognition, the superior performance demonstrates that MGE better represents multifactor images than classic methods, including HOSVD and its variants. In addition, the significant improvement in image (or tensor) completion validates the potential of MGE for image regularization.

  3. [Hormonal (levonorgestrel) emergency contraception--effectiveness and mechanism of action].

    PubMed

    Medard, Lech M; Ostrowska, Lucyna

    2010-07-01

    Periodic abstinence and coitus interruptus are the most popular methods of contraception in Poland. Recent studies have provided us with evidence that the so-called "menstrual calendar" may be much less effective than it was believed. In these circumstances, promotion and use of safe and truly effective contraceptives is very important for Polish women. Emergency contraception (EC) is a method which could be used even in cases when other contraception methods have failed. Mechanism of action of levonorgestrel used for EC and possible disturbances in the process of implantation of the blastocyst in the endometrium, remain the source of heated discussion among medical professionals. The latest publications provide us with evidence that the use of levonorgestrel in EC neither alters endometrial receptivity nor impedes implantation. Hormonal EC effectiveness is another hot topic of gynecological endocrinology and statistics. There is, however, no better, safer, and more ethically accepted method of preventing unwanted pregnancy for patients in need of postcoital contraception.

  4. Calculating p-values and their significances with the Energy Test for large datasets

    NASA Astrophysics Data System (ADS)

    Barter, W.; Burr, C.; Parkes, C.

    2018-04-01

    The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.

  5. Radiocarbon dating uncertainty and the reliability of the PEWMA method of time-series analysis for research on long-term human-environment interaction

    PubMed Central

    Carleton, W. Christopher; Campbell, David

    2018-01-01

    Statistical time-series analysis has the potential to improve our understanding of human-environment interaction in deep time. However, radiocarbon dating—the most common chronometric technique in archaeological and palaeoenvironmental research—creates challenges for established statistical methods. The methods assume that observations in a time-series are precisely dated, but this assumption is often violated when calibrated radiocarbon dates are used because they usually have highly irregular uncertainties. As a result, it is unclear whether the methods can be reliably used on radiocarbon-dated time-series. With this in mind, we conducted a large simulation study to investigate the impact of chronological uncertainty on a potentially useful time-series method. The method is a type of regression involving a prediction algorithm called the Poisson Exponentially Weighted Moving Average (PEMWA). It is designed for use with count time-series data, which makes it applicable to a wide range of questions about human-environment interaction in deep time. Our simulations suggest that the PEWMA method can often correctly identify relationships between time-series despite chronological uncertainty. When two time-series are correlated with a coefficient of 0.25, the method is able to identify that relationship correctly 20–30% of the time, providing the time-series contain low noise levels. With correlations of around 0.5, it is capable of correctly identifying correlations despite chronological uncertainty more than 90% of the time. While further testing is desirable, these findings indicate that the method can be used to test hypotheses about long-term human-environment interaction with a reasonable degree of confidence. PMID:29351329

  6. Radiocarbon dating uncertainty and the reliability of the PEWMA method of time-series analysis for research on long-term human-environment interaction.

    PubMed

    Carleton, W Christopher; Campbell, David; Collard, Mark

    2018-01-01

    Statistical time-series analysis has the potential to improve our understanding of human-environment interaction in deep time. However, radiocarbon dating-the most common chronometric technique in archaeological and palaeoenvironmental research-creates challenges for established statistical methods. The methods assume that observations in a time-series are precisely dated, but this assumption is often violated when calibrated radiocarbon dates are used because they usually have highly irregular uncertainties. As a result, it is unclear whether the methods can be reliably used on radiocarbon-dated time-series. With this in mind, we conducted a large simulation study to investigate the impact of chronological uncertainty on a potentially useful time-series method. The method is a type of regression involving a prediction algorithm called the Poisson Exponentially Weighted Moving Average (PEMWA). It is designed for use with count time-series data, which makes it applicable to a wide range of questions about human-environment interaction in deep time. Our simulations suggest that the PEWMA method can often correctly identify relationships between time-series despite chronological uncertainty. When two time-series are correlated with a coefficient of 0.25, the method is able to identify that relationship correctly 20-30% of the time, providing the time-series contain low noise levels. With correlations of around 0.5, it is capable of correctly identifying correlations despite chronological uncertainty more than 90% of the time. While further testing is desirable, these findings indicate that the method can be used to test hypotheses about long-term human-environment interaction with a reasonable degree of confidence.

  7. The evaluation of a standardized call/recall system for childhood immunizations in Wandsworth, England.

    PubMed

    Atchison, Christina; Zvoc, Miro; Balakrishnan, Ravikumar

    2013-06-01

    To improve uptake of childhood immunizations in Wandsworth we developed a standardized call/recall system based on parents being sent three reminders and defaulters being referred to a Health Visitor. Thirty-two out of 44 primary care practices in the area implemented the intervention in September 2011. The aim of this study was to evaluate the implementation, delivery and impact on immunization uptake of the new call/recall system. To assess implementation and delivery, a mixed method approach was used including qualitative (structured interviews) and quantitative (data collected at three months post-implementation) assessment. To assess the impact, we used Student's t test to compare the difference in immunization uptake rates between intervention and non-intervention practices before and after implementation. The call/recall system was viewed positively by both parents and staff. Most children due or overdue immunizations were successfully captured by the 1st invitation reminder. After three invitations, between 87.3 % (MMR1) and 92.2 % (pre-school booster) of children identified as due or overdue immunizations successfully responded. Prior to implementation there was no difference in uptake rates between intervention and non-intervention practices. Post-implementation uptake rates for DTaP/IPV/Hib, MMR1, MMR2 and the pre-school booster were significantly greater in the intervention practices. Similar findings were seen for PCV and Hib/MenC boosters, although the differences were not statistically significant at the 5 % level. Following the successful implementation of a standardized call/recall system in Wandsworth, other regions or primary care practices may wish to consider introducing a similar system to help improve their immunization coverage levels.

  8. Practical interpretation of CYP2D6 haplotypes: Comparison and integration of automated and expert calling.

    PubMed

    Ruaño, Gualberto; Kocherla, Mohan; Graydon, James S; Holford, Theodore R; Makowski, Gregory S; Goethe, John W

    2016-05-01

    We describe a population genetic approach to compare samples interpreted with expert calling (EC) versus automated calling (AC) for CYP2D6 haplotyping. The analysis represents 4812 haplotype calls based on signal data generated by the Luminex xMap analyzers from 2406 patients referred to a high-complexity molecular diagnostics laboratory for CYP450 testing. DNA was extracted from buccal swabs. We compared the results of expert calls (EC) and automated calls (AC) with regard to haplotype number and frequency. The ratio of EC to AC was 1:3. Haplotype frequencies from EC and AC samples were convergent across haplotypes, and their distribution was not statistically different between the groups. Most duplications required EC, as only expansions with homozygous or hemizygous haplotypes could be automatedly called. High-complexity laboratories can offer equivalent interpretation to automated calling for non-expanded CYP2D6 loci, and superior interpretation for duplications. We have validated scientific expert calling specified by scoring rules as standard operating procedure integrated with an automated calling algorithm. The integration of EC with AC is a practical strategy for CYP2D6 clinical haplotyping. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. A new approach to characterize very-low-level radioactive waste produced at hadron accelerators.

    PubMed

    Zaffora, Biagio; Magistris, Matteo; Chevalier, Jean-Pierre; Luccioni, Catherine; Saporta, Gilbert; Ulrici, Luisa

    2017-04-01

    Radioactive waste is produced as a consequence of preventive and corrective maintenance during the operation of high-energy particle accelerators or associated dismantling campaigns. Their radiological characterization must be performed to ensure an appropriate disposal in the disposal facilities. The radiological characterization of waste includes the establishment of the list of produced radionuclides, called "radionuclide inventory", and the estimation of their activity. The present paper describes the process adopted at CERN to characterize very-low-level radioactive waste with a focus on activated metals. The characterization method consists of measuring and estimating the activity of produced radionuclides either by experimental methods or statistical and numerical approaches. We adapted the so-called Scaling Factor (SF) and Correlation Factor (CF) techniques to the needs of hadron accelerators, and applied them to very-low-level metallic waste produced at CERN. For each type of metal we calculated the radionuclide inventory and identified the radionuclides that most contribute to hazard factors. The methodology proposed is of general validity, can be extended to other activated materials and can be used for the characterization of waste produced in particle accelerators and research centres, where the activation mechanisms are comparable to the ones occurring at CERN. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Toward a quantitative account of pitch distribution in spontaneous narrative: Method and validation

    PubMed Central

    Matteson, Samuel E.; Streit Olness, Gloria; Caplow, Nancy J.

    2013-01-01

    Pitch is well-known both to animate human discourse and to convey meaning in communication. The study of the statistical population distributions of pitch in discourse will undoubtedly benefit from methodological improvements. The current investigation examines a method that parameterizes pitch in discourse as musical pitch interval H measured in units of cents and that disaggregates the sequence of peak word-pitches using tools employed in time-series analysis and digital signal processing. The investigators test the proposed methodology by its application to distributions in pitch interval of the peak word-pitch (collectively called the discourse gamut) that occur in simulated and actual spontaneous emotive narratives obtained from 17 middle-aged African-American adults. The analysis, in rigorous tests, not only faithfully reproduced simulated distributions imbedded in realistic time series that drift and include pitch breaks, but the protocol also reveals that the empirical distributions exhibit a common hidden structure when normalized to a slowly varying mode (called the gamut root) of their respective probability density functions. Quantitative differences between narratives reveal the speakers' relative propensity for the use of pitch levels corresponding to elevated degrees of a discourse gamut (the “e-la”) superimposed upon a continuum that conforms systematically to an asymmetric Laplace distribution. PMID:23654400

  11. Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research.

    PubMed

    Tang, Qi-Yi; Zhang, Chuan-Xi

    2013-04-01

    A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology. © 2012 The Authors Insect Science © 2012 Institute of Zoology, Chinese Academy of Sciences.

  12. Adaptive correction of ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Pelosi, Anna; Battista Chirico, Giovanni; Van den Bergh, Joris; Vannitsem, Stephane

    2017-04-01

    Forecasts from numerical weather prediction (NWP) models often suffer from both systematic and non-systematic errors. These are present in both deterministic and ensemble forecasts, and originate from various sources such as model error and subgrid variability. Statistical post-processing techniques can partly remove such errors, which is particularly important when NWP outputs concerning surface weather variables are employed for site specific applications. Many different post-processing techniques have been developed. For deterministic forecasts, adaptive methods such as the Kalman filter are often used, which sequentially post-process the forecasts by continuously updating the correction parameters as new ground observations become available. These methods are especially valuable when long training data sets do not exist. For ensemble forecasts, well-known techniques are ensemble model output statistics (EMOS), and so-called "member-by-member" approaches (MBM). Here, we introduce a new adaptive post-processing technique for ensemble predictions. The proposed method is a sequential Kalman filtering technique that fully exploits the information content of the ensemble. One correction equation is retrieved and applied to all members, however the parameters of the regression equations are retrieved by exploiting the second order statistics of the forecast ensemble. We compare our new method with two other techniques: a simple method that makes use of a running bias correction of the ensemble mean, and an MBM post-processing approach that rescales the ensemble mean and spread, based on minimization of the Continuous Ranked Probability Score (CRPS). We perform a verification study for the region of Campania in southern Italy. We use two years (2014-2015) of daily meteorological observations of 2-meter temperature and 10-meter wind speed from 18 ground-based automatic weather stations distributed across the region, comparing them with the corresponding COSMO-LEPS ensemble forecasts. Deterministic verification scores (e.g., mean absolute error, bias) and probabilistic scores (e.g., CRPS) are used to evaluate the post-processing techniques. We conclude that the new adaptive method outperforms the simpler running bias-correction. The proposed adaptive method often outperforms the MBM method in removing bias. The MBM method has the advantage of correcting the ensemble spread, although it needs more training data.

  13. Tobacco Cessation May Improve Lung Cancer Patient Survival

    PubMed Central

    Dobson Amato, Katharine A.; Hyland, Andrew; Reed, Robert; Mahoney, Martin C.; Marshall, James; Giovino, Gary; Bansal-Travers, Maansi; Ochs-Balcom, Heather M.; Zevon, Michael A.; Cummings, K. Michael; Nwogu, Chukwumere; Singh, Anurag K.; Chen, Hongbin; Warren, Graham W.; Reid, Mary

    2015-01-01

    Introduction This study characterizes tobacco cessation patterns and the association of cessation with survival among lung cancer patients at Roswell Park Cancer Institute: an NCI Designated Comprehensive Cancer Center. Methods Lung cancer patients presenting at this institution were screened with a standardized tobacco assessment, and those who had used tobacco within the past 30 days were automatically referred to a telephone-based cessation service. Demographic, clinical information and self-reported tobacco use at last contact were obtained via electronic medical records and the RPCI tumor registry for all lung cancer patients referred to the service between October 2010 and October 2012. Descriptive statistics and Cox proportional hazards models were used to assess whether tobacco cessation and other factors were associated with lung cancer survival through May 2014. Results Calls were attempted to 313 of 388 lung cancer patients referred to the cessation service. Eighty percent of patients (250/313) were successfully contacted and participated in at least one telephone-based cessation call; 40.8% (102/250) of persons contacted reported having quit at the last contact. After controlling for age, pack year history, sex, ECOG performance status, time between diagnosis and last contact, tumor histology, and clinical stage, a statistically significant increase in survival was associated with quitting compared to continued tobacco use at last contact (HR=1.79; 95% CI: 1.14-2.82) with a median 9 month improvement in overall survival. Conclusions Tobacco cessation among lung cancer patients after diagnosis may increase overall survival. PMID:26102442

  14. R package MVR for Joint Adaptive Mean-Variance Regularization and Variance Stabilization

    PubMed Central

    Dazard, Jean-Eudes; Xu, Hua; Rao, J. Sunil

    2015-01-01

    We present an implementation in the R language for statistical computing of our recent non-parametric joint adaptive mean-variance regularization and variance stabilization procedure. The method is specifically suited for handling difficult problems posed by high-dimensional multivariate datasets (p ≫ n paradigm), such as in ‘omics’-type data, among which are that the variance is often a function of the mean, variable-specific estimators of variances are not reliable, and tests statistics have low powers due to a lack of degrees of freedom. The implementation offers a complete set of features including: (i) normalization and/or variance stabilization function, (ii) computation of mean-variance-regularized t and F statistics, (iii) generation of diverse diagnostic plots, (iv) synthetic and real ‘omics’ test datasets, (v) computationally efficient implementation, using C interfacing, and an option for parallel computing, (vi) manual and documentation on how to setup a cluster. To make each feature as user-friendly as possible, only one subroutine per functionality is to be handled by the end-user. It is available as an R package, called MVR (‘Mean-Variance Regularization’), downloadable from the CRAN. PMID:26819572

  15. Response surface method in geotechnical/structural analysis, phase 1

    NASA Astrophysics Data System (ADS)

    Wong, F. S.

    1981-02-01

    In the response surface approach, an approximating function is fit to a long running computer code based on a limited number of code calculations. The approximating function, called the response surface, is then used to replace the code in subsequent repetitive computations required in a statistical analysis. The procedure of the response surface development and feasibility of the method are shown using a sample problem in slop stability which is based on data from centrifuge experiments of model soil slopes and involves five random soil parameters. It is shown that a response surface can be constructed based on as few as four code calculations and that the response surface is computationally extremely efficient compared to the code calculation. Potential applications of this research include probabilistic analysis of dynamic, complex, nonlinear soil/structure systems such as slope stability, liquefaction, and nuclear reactor safety.

  16. The effects of guided inquiry instruction on student achievement in high school biology

    NASA Astrophysics Data System (ADS)

    Vass, Laszlo

    The purpose of this quantitative, quasi-experimental study was to measure the effect of a student-centered instructional method called guided inquiry on the achievement of students in a unit of study in high school biology. The study used a non-random sample of 109 students, the control group of 55 students enrolled in high school one, received teacher centered instruction while the experimental group of 54 students enrolled at high school two received student-centered, guided inquiry instruction. The pretest-posttest design of the study analyzed scores using an independent t-test, a dependent t-test (p = <.001), an ANCOVA (p = .007), mixed method ANOVA (p = .024) and hierarchical linear regression (p = <.001). The experimental group that received guided inquiry instruction had statistically significantly higher achievement than the control group.

  17. From seconds to months: an overview of multi-scale dynamics of mobile telephone calls

    NASA Astrophysics Data System (ADS)

    Saramäki, Jari; Moro, Esteban

    2015-06-01

    Big Data on electronic records of social interactions allow approaching human behaviour and sociality from a quantitative point of view with unforeseen statistical power. Mobile telephone Call Detail Records (CDRs), automatically collected by telecom operators for billing purposes, have proven especially fruitful for understanding one-to-one communication patterns as well as the dynamics of social networks that are reflected in such patterns. We present an overview of empirical results on the multi-scale dynamics of social dynamics and networks inferred from mobile telephone calls. We begin with the shortest timescales and fastest dynamics, such as burstiness of call sequences between individuals, and "zoom out" towards longer temporal and larger structural scales, from temporal motifs formed by correlated calls between multiple individuals to long-term dynamics of social groups. We conclude this overview with a future outlook.

  18. An Application of Extreme Value Theory to Learning Analytics: Predicting Collaboration Outcome from Eye-Tracking Data

    ERIC Educational Resources Information Center

    Sharma, Kshitij; Chavez-Demoulin, Valérie; Dillenbourg, Pierre

    2017-01-01

    The statistics used in education research are based on central trends such as the mean or standard deviation, discarding outliers. This paper adopts another viewpoint that has emerged in statistics, called extreme value theory (EVT). EVT claims that the bulk of normal distribution is comprised mainly of uninteresting variations while the most…

  19. Including the Tukey Mean-Difference (Bland-Altman) Plot in a Statistics Course

    ERIC Educational Resources Information Center

    Kozak, Marcin; Wnuk, Agnieszka

    2014-01-01

    The Tukey mean-difference plot, also called the Bland-Altman plot, is a recognized graphical tool in the exploration of biometrical data. We show that this technique deserves a place on an introductory statistics course by encouraging students to think about the kind of graph they wish to create, rather than just creating the default graph for the…

  20. Statistical physics of vehicular traffic and some related systems

    NASA Astrophysics Data System (ADS)

    Chowdhury, Debashish; Santen, Ludger; Schadschneider, Andreas

    2000-05-01

    In the so-called “microscopic” models of vehicular traffic, attention is paid explicitly to each individual vehicle each of which is represented by a “particle”; the nature of the “interactions” among these particles is determined by the way the vehicles influence each others’ movement. Therefore, vehicular traffic, modeled as a system of interacting “particles” driven far from equilibrium, offers the possibility to study various fundamental aspects of truly nonequilibrium systems which are of current interest in statistical physics. Analytical as well as numerical techniques of statistical physics are being used to study these models to understand rich variety of physical phenomena exhibited by vehicular traffic. Some of these phenomena, observed in vehicular traffic under different circumstances, include transitions from one dynamical phase to another, criticality and self-organized criticality, metastability and hysteresis, phase-segregation, etc. In this critical review, written from the perspective of statistical physics, we explain the guiding principles behind all the main theoretical approaches. But we present detailed discussions on the results obtained mainly from the so-called “particle-hopping” models, particularly emphasizing those which have been formulated in recent years using the language of cellular automata.

  1. Monetary policy and the effects of oil price shocks on the Japanese economy

    NASA Astrophysics Data System (ADS)

    Lee, Byung Rhae

    1998-12-01

    The evidence of output decreases and price level increases following oil price shocks in the Japanese economy is presented in this paper. These negative effects of oil shocks are better explained by Hamilton's (1996) net oil price increase measure (NOPI) than by other oil measures. The fact that an oil shock has a statistically significant effect on the call money rate and real output and that the call money rate also has a statistically significant effect on real output appears to explain that the effects of oil price shocks on economic activity are partially attributed to contractionary monetary policy responses. The asymmetric effects of positive and negative oil shocks are also found in the Japanese economy and this asymmetry can also be partially explained by monetary policy responses. To assess the relative contribution of oil shocks and endogenous monetary policy responses to the economic downturns, I shut off the responses of the call money rate to oil shocks utilizing the impulse response results from the VAR model. Then, I re-run the VAR with the adjusted call money rate series. The empirical results show that around 30--40% of the negative effects of oil price shocks on the Japanese economy can be accounted for by oil shock induced monetary tightening.

  2. Interactive (statistical) visualisation and exploration of a billion objects with vaex

    NASA Astrophysics Data System (ADS)

    Breddels, M. A.

    2017-06-01

    With new catalogues arriving such as the Gaia DR1, containing more than a billion objects, new methods of handling and visualizing these data volumes are needed. We show that by calculating statistics on a regular (N-dimensional) grid, visualizations of a billion objects can be done within a second on a modern desktop computer. This is achieved using memory mapping of hdf5 files together with a simple binning algorithm, which are part of a Python library called vaex. This enables efficient exploration or large datasets interactively, making science exploration of large catalogues feasible. Vaex is a Python library and an application, which allows for interactive exploration and visualization. The motivation for developing vaex is the catalogue of the Gaia satellite, however, vaex can also be used on SPH or N-body simulations, any other (future) catalogues such as SDSS, Pan-STARRS, LSST, etc. or other tabular data. The homepage for vaex is http://vaex.astro.rug.nl.

  3. Additive Genetic Variability and the Bayesian Alphabet

    PubMed Central

    Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan

    2009-01-01

    The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called “Bayes A”) with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly. PMID:19620397

  4. Optimized design and analysis of preclinical intervention studies in vivo

    PubMed Central

    Laajala, Teemu D.; Jumppanen, Mikael; Huhtaniemi, Riikka; Fey, Vidal; Kaur, Amanpreet; Knuuttila, Matias; Aho, Eija; Oksala, Riikka; Westermarck, Jukka; Mäkelä, Sari; Poutanen, Matti; Aittokallio, Tero

    2016-01-01

    Recent reports have called into question the reproducibility, validity and translatability of the preclinical animal studies due to limitations in their experimental design and statistical analysis. To this end, we implemented a matching-based modelling approach for optimal intervention group allocation, randomization and power calculations, which takes full account of the complex animal characteristics at baseline prior to interventions. In prostate cancer xenograft studies, the method effectively normalized the confounding baseline variability, and resulted in animal allocations which were supported by RNA-seq profiling of the individual tumours. The matching information increased the statistical power to detect true treatment effects at smaller sample sizes in two castration-resistant prostate cancer models, thereby leading to saving of both animal lives and research costs. The novel modelling approach and its open-source and web-based software implementations enable the researchers to conduct adequately-powered and fully-blinded preclinical intervention studies, with the aim to accelerate the discovery of new therapeutic interventions. PMID:27480578

  5. Optimized design and analysis of preclinical intervention studies in vivo.

    PubMed

    Laajala, Teemu D; Jumppanen, Mikael; Huhtaniemi, Riikka; Fey, Vidal; Kaur, Amanpreet; Knuuttila, Matias; Aho, Eija; Oksala, Riikka; Westermarck, Jukka; Mäkelä, Sari; Poutanen, Matti; Aittokallio, Tero

    2016-08-02

    Recent reports have called into question the reproducibility, validity and translatability of the preclinical animal studies due to limitations in their experimental design and statistical analysis. To this end, we implemented a matching-based modelling approach for optimal intervention group allocation, randomization and power calculations, which takes full account of the complex animal characteristics at baseline prior to interventions. In prostate cancer xenograft studies, the method effectively normalized the confounding baseline variability, and resulted in animal allocations which were supported by RNA-seq profiling of the individual tumours. The matching information increased the statistical power to detect true treatment effects at smaller sample sizes in two castration-resistant prostate cancer models, thereby leading to saving of both animal lives and research costs. The novel modelling approach and its open-source and web-based software implementations enable the researchers to conduct adequately-powered and fully-blinded preclinical intervention studies, with the aim to accelerate the discovery of new therapeutic interventions.

  6. Learning coefficient of generalization error in Bayesian estimation and vandermonde matrix-type singularity.

    PubMed

    Aoyagi, Miki; Nagata, Kenji

    2012-06-01

    The term algebraic statistics arises from the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry (Sturmfels, 2009 ). The purpose of our study is to consider the generalization error and stochastic complexity in learning theory by using the log-canonical threshold in algebraic geometry. Such thresholds correspond to the main term of the generalization error in Bayesian estimation, which is called a learning coefficient (Watanabe, 2001a , 2001b ). The learning coefficient serves to measure the learning efficiencies in hierarchical learning models. In this letter, we consider learning coefficients for Vandermonde matrix-type singularities, by using a new approach: focusing on the generators of the ideal, which defines singularities. We give tight new bound values of learning coefficients for the Vandermonde matrix-type singularities and the explicit values with certain conditions. By applying our results, we can show the learning coefficients of three-layered neural networks and normal mixture models.

  7. Combining forecast weights: Why and how?

    NASA Astrophysics Data System (ADS)

    Yin, Yip Chee; Kok-Haur, Ng; Hock-Eam, Lim

    2012-09-01

    This paper proposes a procedure called forecast weight averaging which is a specific combination of forecast weights obtained from different methods of constructing forecast weights for the purpose of improving the accuracy of pseudo out of sample forecasting. It is found that under certain specified conditions, forecast weight averaging can lower the mean squared forecast error obtained from model averaging. In addition, we show that in a linear and homoskedastic environment, this superior predictive ability of forecast weight averaging holds true irrespective whether the coefficients are tested by t statistic or z statistic provided the significant level is within the 10% range. By theoretical proofs and simulation study, we have shown that model averaging like, variance model averaging, simple model averaging and standard error model averaging, each produces mean squared forecast error larger than that of forecast weight averaging. Finally, this result also holds true marginally when applied to business and economic empirical data sets, Gross Domestic Product (GDP growth rate), Consumer Price Index (CPI) and Average Lending Rate (ALR) of Malaysia.

  8. Multiple signal classification algorithm for super-resolution fluorescence microscopy

    PubMed Central

    Agarwal, Krishna; Macháň, Radek

    2016-01-01

    Single-molecule localization techniques are restricted by long acquisition and computational times, or the need of special fluorophores or biologically toxic photochemical environments. Here we propose a statistical super-resolution technique of wide-field fluorescence microscopy we call the multiple signal classification algorithm which has several advantages. It provides resolution down to at least 50 nm, requires fewer frames and lower excitation power and works even at high fluorophore concentrations. Further, it works with any fluorophore that exhibits blinking on the timescale of the recording. The multiple signal classification algorithm shows comparable or better performance in comparison with single-molecule localization techniques and four contemporary statistical super-resolution methods for experiments of in vitro actin filaments and other independently acquired experimental data sets. We also demonstrate super-resolution at timescales of 245 ms (using 49 frames acquired at 200 frames per second) in samples of live-cell microtubules and live-cell actin filaments imaged without imaging buffers. PMID:27934858

  9. Physics First: Impact on SAT Math Scores

    NASA Astrophysics Data System (ADS)

    Bouma, Craig E.

    Improving science, technology, engineering, and mathematics (STEM) education has become a national priority and the call to modernize secondary science has been heard. A Physics First (PF) program with the curriculum sequence of physics, chemistry, and biology (PCB) driven by inquiry- and project-based learning offers a viable alternative to the traditional curricular sequence (BCP) and methods of teaching, but requires more empirical evidence. This study determined impact of a PF program (PF-PCB) on math achievement (SAT math scores) after the first two cohorts of students completed the PF-PCB program at Matteo Ricci High School (MRHS) and provided more quantitative data to inform the PF debate and advance secondary science education. Statistical analysis (ANCOVA) determined the influence of covariates and revealed that PF-PCB program had a significant (p < .05) impact on SAT math scores in the second cohort at MRHS. Statistically adjusted, the SAT math means for PF students were 21.4 points higher than their non-PF counterparts when controlling for prior math achievement (HSTP math), socioeconomic status (SES), and ethnicity/race.

  10. Anima: Modular Workflow System for Comprehensive Image Data Analysis

    PubMed Central

    Rantanen, Ville; Valori, Miko; Hautaniemi, Sampsa

    2014-01-01

    Modern microscopes produce vast amounts of image data, and computational methods are needed to analyze and interpret these data. Furthermore, a single image analysis project may require tens or hundreds of analysis steps starting from data import and pre-processing to segmentation and statistical analysis; and ending with visualization and reporting. To manage such large-scale image data analysis projects, we present here a modular workflow system called Anima. Anima is designed for comprehensive and efficient image data analysis development, and it contains several features that are crucial in high-throughput image data analysis: programing language independence, batch processing, easily customized data processing, interoperability with other software via application programing interfaces, and advanced multivariate statistical analysis. The utility of Anima is shown with two case studies focusing on testing different algorithms developed in different imaging platforms and an automated prediction of alive/dead C. elegans worms by integrating several analysis environments. Anima is a fully open source and available with documentation at www.anduril.org/anima. PMID:25126541

  11. Impact of additional counselling sessions through phone calls on smoking cessation outcomes among smokers in Penang State, Malaysia

    PubMed Central

    2014-01-01

    Background Studies all over the world reported that smoking relapses occur during the first two weeks after a quit date. The current study aimed to assess the impact of the additional phone calls counselling during the first month on the abstinence rate at 3 and 6 months after quit date among smokers in Penang, Malaysia. Methods The study was conducted at Quit Smoking Clinic of two major hospitals in Penang, Malaysia. All the eligible smokers who attended the clinics between February 1st and October 31st 2012 were invited. Participants were randomly assigned by using urn design method either to receive the usual care that followed in the clinics (control) or the usual care procedure plus extra counselling sessions through phone calls during the first month of quit attempt (intervention). Results Participants in our cohort smoked about 14 cigarettes per day on average (mean = 13.78 ± 7.0). At 3 months, control group was less likely to quit smoking compared to intervention group (36.9% vs. 46.7%, verified smoking status) but this did not reach statistical significance (OR = 0.669; 95% CI = 0.395-1.133, P = 0.86). However, at 6 months, 71.7% of the intervention group were successfully quit smoking (bio-chemically verified) compared to 48.6% of the control group (P < 0.001). The control group were significantly less likely to quit smoking (OR = 0.375; 95% CI = 0.217-0.645, P < 0.001). Conclusions Smoking cessation intervention consisting of phone calls counselling delivered during the first month of quit attempt revealed significantly higher abstinence rates compared with a standard care approach. Therefore, the additional counselling in the first few weeks after stop smoking is a promising treatment strategy that should be evaluated further. Trial registration TCTR20140504001 PMID:24886549

  12. A remark on copy number variation detection methods.

    PubMed

    Li, Shuo; Dou, Xialiang; Gao, Ruiqi; Ge, Xinzhou; Qian, Minping; Wan, Lin

    2018-01-01

    Copy number variations (CNVs) are gain and loss of DNA sequence of a genome. High throughput platforms such as microarrays and next generation sequencing technologies (NGS) have been applied for genome wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis on copy number losses on 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from Hapmap Project and 1000 Genomes Project respectively. We show that the copy number losses reported from Hapmap Project and 1000 Genome Project only have < 30% overlap, while these reports are required to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental supporting by their corresponding projects, even though state-of-art calling methods were employed. On the other hand, copy number losses are found directly from HapMap microarray data by an accurate algorithm, i.e. CNVhac, almost all of which have lower read mapping depth in NGS data; furthermore, 88% of which can be supported by the sequences with breakpoint in NGS data. Our results suggest the ability of microarray calling CNVs and the possible introduction of false negatives from the unessential requirement of the additional cross-platform supporting. The inconsistency of CNV reports from Hapmap Project and 1000 Genomes Project might result from the inadequate information containing in microarray data, the inconsistent detection criteria, or the filtration effect of cross-platform supporting. The statistical test on CNVs called from CNVhac show that the microarray data can offer reliable CNV reports, and majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform supporting, so additional experimental information should be applied in need instead of necessarily.

  13. Universal Recurrence Time Statistics of Characteristic Earthquakes

    NASA Astrophysics Data System (ADS)

    Goltz, C.; Turcotte, D. L.; Abaimov, S.; Nadeau, R. M.

    2006-12-01

    Characteristic earthquakes are defined to occur quasi-periodically on major faults. Do recurrence time statistics of such earthquakes follow a particular statistical distribution? If so, which one? The answer is fundamental and has important implications for hazard assessment. The problem cannot be solved by comparing the goodness of statistical fits as the available sequences are too short. The Parkfield sequence of M ≍ 6 earthquakes, one of the most extensive reliable data sets available, has grown to merely seven events with the last earthquake in 2004, for example. Recently, however, advances in seismological monitoring and improved processing methods have unveiled so-called micro-repeaters, micro-earthquakes which recur exactly in the same location on a fault. It seems plausible to regard these earthquakes as a miniature version of the classic characteristic earthquakes. Micro-repeaters are much more frequent than major earthquakes, leading to longer sequences for analysis. Due to their recent discovery, however, available sequences contain less than 20 events at present. In this paper we present results for the analysis of recurrence times for several micro-repeater sequences from Parkfield and adjacent regions. To improve the statistical significance of our findings, we combine several sequences into one by rescaling the individual sets by their respective mean recurrence intervals and Weibull exponents. This novel approach of rescaled combination yields the most extensive data set possible. We find that the resulting statistics can be fitted well by an exponential distribution, confirming the universal applicability of the Weibull distribution to characteristic earthquakes. A similar result is obtained from rescaled combination, however, with regard to the lognormal distribution.

  14. Who Are We Not Calling On? A Study of Classroom Participation and the Implementation of the Name Card Method.

    ERIC Educational Resources Information Center

    Carter, Angela

    This study involved observing a second-grade classroom to investigate how the teacher called on students, noting whether the teacher gave enough attention to students who raised their hands frequently by calling on them and examining students' responses when called on. Researchers implemented a new method of calling on students using name cards,…

  15. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs.

    PubMed

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-05-28

    Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and subsequently high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistic packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to conduct non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute-nodes, FBAT jobs performed about 14.4-15.9 times faster, while Unphased jobs performed 1.1-18.6 times faster compared to the accumulated computation duration. Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance.

  16. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs

    PubMed Central

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-01-01

    Background Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and subsequently high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistic packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to conduct non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Results Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute-nodes, FBAT jobs performed about 14.4–15.9 times faster, while Unphased jobs performed 1.1–18.6 times faster compared to the accumulated computation duration. Conclusion Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance. PMID:18541045

  17. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study

    PubMed Central

    Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-01-01

    Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of times series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160

  18. A modified weighted function method for parameter estimation of Pearson type three distribution

    NASA Astrophysics Data System (ADS)

    Liang, Zhongmin; Hu, Yiming; Li, Binquan; Yu, Zhongbo

    2014-04-01

    In this paper, an unconventional method called Modified Weighted Function (MWF) is presented for the conventional moment estimation of a probability distribution function. The aim of MWF is to estimate the coefficient of variation (CV) and coefficient of skewness (CS) from the original higher moment computations to the first-order moment calculations. The estimators for CV and CS of Pearson type three distribution function (PE3) were derived by weighting the moments of the distribution with two weight functions, which were constructed by combining two negative exponential-type functions. The selection of these weight functions was based on two considerations: (1) to relate weight functions to sample size in order to reflect the relationship between the quantity of sample information and the role of weight function and (2) to allocate more weights to data close to medium-tail positions in a sample series ranked in an ascending order. A Monte-Carlo experiment was conducted to simulate a large number of samples upon which statistical properties of MWF were investigated. For the PE3 parent distribution, results of MWF were compared to those of the original Weighted Function (WF) and Linear Moments (L-M). The results indicate that MWF was superior to WF and slightly better than L-M, in terms of statistical unbiasness and effectiveness. In addition, the robustness of MWF, WF, and L-M were compared by designing the Monte-Carlo experiment that samples are obtained from Log-Pearson type three distribution (LPE3), three parameter Log-Normal distribution (LN3), and Generalized Extreme Value distribution (GEV), respectively, but all used as samples from the PE3 distribution. The results show that in terms of statistical unbiasness, no one method possesses the absolutely overwhelming advantage among MWF, WF, and L-M, while in terms of statistical effectiveness, the MWF is superior to WF and L-M.

  19. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409

  20. Bubbles and denaturation in DNA

    NASA Astrophysics Data System (ADS)

    van Erp, T. S.; Cuesta-López, S.; Peyrard, M.

    2006-08-01

    The local opening of DNA is an intriguing phenomenon from a statistical-physics point of view, but is also essential for its biological function. For instance, the transcription and replication of our genetic code cannot take place without the unwinding of the DNA double helix. Although these biological processes are driven by proteins, there might well be a relation between these biological openings and the spontaneous bubble formation due to thermal fluctuations. Mesoscopic models, like the Peyrard-Bishop-Dauxois (PBD) model, have fairly accurately reproduced some experimental denaturation curves and the sharp phase transition in the thermodynamic limit. It is, hence, tempting to see whether these models could be used to predict the biological activity of DNA. In a previous study, we introduced a method that allows to obtain very accurate results on this subject, which showed that some previous claims in this direction, based on molecular-dynamics studies, were premature. This could either imply that the present PBD model should be improved or that biological activity can only be predicted in a more complex framework that involves interactions with proteins and super helical stresses. In this article, we give a detailed description of the statistical method introduced before. Moreover, for several DNA sequences, we give a thorough analysis of the bubble-statistics as a function of position and bubble size and the so-called l-denaturation curves that can be measured experimentally. These show that some important experimental observations are missing in the present model. We discuss how the present model could be improved.

  1. Arsenic contamination of drinking water in Ireland: A spatial analysis of occurrence and potential risk.

    PubMed

    McGrory, Ellen R; Brown, Colin; Bargary, Norma; Williams, Natalya Hunter; Mannix, Anthony; Zhang, Chaosheng; Henry, Tiernan; Daly, Eve; Nicholas, Sarah; Petrunic, Barbara M; Lee, Monica; Morrison, Liam

    2017-02-01

    The presence of arsenic in groundwater has become a global concern due to the health risks from drinking water with elevated concentrations. The Water Framework Directive (WFD) of the European Union calls for drinking water risk assessment for member states. The present study amalgamates readily available national and sub-national scale datasets on arsenic in groundwater in the Republic of Ireland. However, due to the presence of high levels of left censoring (i.e. arsenic values below an analytical detection limit) and changes in detection limits over time, the application of conventional statistical methods would inhibit the generation of meaningful results. In order to handle these issues several arsenic databases were integrated and the data modelled using statistical methods appropriate for non-detect data. In addition, geostatistical methods were used to assess principal risk components of elevated arsenic related to lithology, aquifer type and groundwater vulnerability. Geographic statistical methods were used to overcome some of the geographical limitations of the Irish Environmental Protection Agency (EPA) sample database. Nearest-neighbour inverse distance weighting (IDW) and local indicator of spatial association (LISA) methods were used to estimate risk in non-sampled areas. Significant differences were also noted between different aquifer lithologies, indicating that Rhyolite, Sandstone and Shale (Greywackes), and Impure Limestone potentially presented a greater risk of elevated arsenic in groundwaters. Significant differences also occurred among aquifer types with poorly productive aquifers, locally important fractured bedrock aquifers and regionally important fissured bedrock aquifers presenting the highest potential risk of elevated arsenic. No significant differences were detected among different groundwater vulnerability groups as defined by the Geological Survey of Ireland. This research will assist management and future policy directions of groundwater resources at EU level and guide future research focused on understanding arsenic mobilisation processes to facilitate in guiding future development, testing and treatment requirements of groundwater resources. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps.

    PubMed

    Jacobs, Guy S; Sluckin, Tim J; Kivisild, Toomas

    2016-08-01

    During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly's [Formula: see text] and [Formula: see text]) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly's [Formula: see text] offers high power, but is outperformed by a novel statistic that we test, which we call [Formula: see text] We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics-[Formula: see text] Kelly's [Formula: see text] and [Formula: see text]-are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While [Formula: see text] replicates most candidates when recombination map data are not available, the [Formula: see text] and [Formula: see text] statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available. Copyright © 2016 by the Genetics Society of America.

  3. "Plateau"-related summary statistics are uninformative for comparing working memory models.

    PubMed

    van den Berg, Ronald; Ma, Wei Ji

    2014-10-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon (Ma, Husain, Bays (Nature Neuroscience 17, 347-356, 2014). Zhang and Luck (Nature 453, (7192), 233-235, 2008) and Anderson, Vogel, and Awh (Attention, Perception, Psychophys 74, (5), 891-910, 2011) noticed that as more items need to be remembered, "memory noise" seems to first increase and then reach a "stable plateau." They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided at most 0.15 % of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99 % correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. Therefore, at realistic numbers of trials, plateau-related summary statistics are highly unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (Attention, Perception, Psychophys 74, (5), 891-910, 2011), we found that the evidence in the summary statistics was at most 0.12 % of the evidence in the raw data and far too weak to warrant any conclusions. The evidence in the raw data, in fact, strongly favored the slotless model. These findings call into question claims about working memory that are based on summary statistics.

  4. Baseline models of trace elements in major aquifers of the United States

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace-element concentrations in baseline samples from a survey of aquifers used as potable-water supplies in the United States are summarized using methods appropriate for data with multiple detection limits. The resulting statistical distribution models are used to develop summary statistics and estimate probabilities of exceeding water-quality standards. The models are based on data from the major aquifer studies of the USGS National Water Quality Assessment (NAWQA) Program. These data were produced with a nationally-consistent sampling and analytical framework specifically designed to determine the quality of the most important potable groundwater resources during the years 1991-2001. The analytical data for all elements surveyed contain values that were below several detection limits. Such datasets are referred to as multiply-censored data. To address this issue, a robust semi-parametric statistical method called regression on order statistics (ROS) is employed. Utilizing the 90th-95th percentile as an arbitrary range for the upper limits of expected baseline concentrations, the models show that baseline concentrations of dissolved Ba and Zn are below 500 ??g/L. For the same percentile range, dissolved As, Cu and Mo concentrations are below 10 ??g/L, and dissolved Ag, Be, Cd, Co, Cr, Ni, Pb, Sb and Se are below 1-5 ??g/L. These models are also used to determine the probabilities that potable ground waters exceed drinking water standards. For dissolved Ba, Cr, Cu, Pb, Ni, Mo and Se, the likelihood of exceeding the US Environmental Protection Agency standards at the well-head is less than 1-1.5%. A notable exception is As, which has approximately a 7% chance of exceeding the maximum contaminant level (10 ??g/L) at the well head.

  5. On statistical inference in time series analysis of the evolution of road safety.

    PubMed

    Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora

    2013-11-01

    Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. The living Drake equation of the Tau Zero Foundation

    NASA Astrophysics Data System (ADS)

    Maccone, Claudio

    2011-03-01

    The living Drake equation is our statistical generalization of the Drake equation such that it can take into account any number of factors. This new result opens up the possibility to enrich the equation by inserting more new factors as long as the scientific learning increases. The adjective "Living" refers just to this continuous enrichment of the Drake equation and is the goal of a new research project that the Tau Zero Foundation has entrusted to this author as the discoverer of the statistical Drake equation described hereafter. From a simple product of seven positive numbers, the Drake equation is now turned into the product of seven positive random variables. We call this "the Statistical Drake Equation". The mathematical consequences of this transformation are then derived. The proof of our results is based on the Central Limit Theorem (CLT) of Statistics. In loose terms, the CLT states that the sum of any number of independent random variables, each of which may be arbitrarily distributed, approaches a Gaussian (i.e. normal) random variable. This is called the Lyapunov form of the CLT, or the Lindeberg form of the CLT, depending on the mathematical constraints assumed on the third moments of the various probability distributions. In conclusion, we show that: The new random variable N, yielding the number of communicating civilizations in the Galaxy, follows the lognormal distribution. Then, the mean value, standard deviation, mode, median and all the moments of this lognormal N can be derived from the means and standard deviations of the seven input random variables. In fact, the seven factors in the ordinary Drake equation now become seven independent positive random variables. The probability distribution of each random variable may be arbitrary. The CLT in the so-called Lyapunov or Lindeberg forms (that both do not assume the factors to be identically distributed) allows for that. In other words, the CLT "translates" into our statistical Drake equation by allowing an arbitrary probability distribution for each factor. This is both physically realistic and practically very useful, of course. An application of our statistical Drake equation then follows. The (average) distance between any two neighbouring and communicating civilizations in the Galaxy may be shown to be inversely proportional to the cubic root of N. Then, this distance now becomes a new random variable. We derive the relevant probability density function, apparently previously unknown (dubbed "Maccone distribution" by Paul Davies). Data Enrichment Principle. It should be noticed that any positive number of random variables in the statistical Drake equation is compatible with the CLT. So, our generalization allows for many more factors to be added in the future as long as more refined scientific knowledge about each factor will be known to the scientists. This capability to make room for more future factors in the statistical Drake equation we call the "Data Enrichment Principle", and regard as the key to more profound, future results in Astrobiology and SETI.

  7. Monitoring Method of Cow Anthrax Based on Gis and Spatial Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Li, Lin; Yang, Yong; Wang, Hongbin; Dong, Jing; Zhao, Yujun; He, Jianbin; Fan, Honggang

    Geographic information system (GIS) is a computer application system, which possesses the ability of manipulating spatial information and has been used in many fields related with the spatial information management. Many methods and models have been established for analyzing animal diseases distribution models and temporal-spatial transmission models. Great benefits have been gained from the application of GIS in animal disease epidemiology. GIS is now a very important tool in animal disease epidemiological research. Spatial analysis function of GIS can be widened and strengthened by using spatial statistical analysis, allowing for the deeper exploration, analysis, manipulation and interpretation of spatial pattern and spatial correlation of the animal disease. In this paper, we analyzed the cow anthrax spatial distribution characteristics in the target district A (due to the secret of epidemic data we call it district A) based on the established GIS of the cow anthrax in this district in combination of spatial statistical analysis and GIS. The Cow anthrax is biogeochemical disease, and its geographical distribution is related closely to the environmental factors of habitats and has some spatial characteristics, and therefore the correct analysis of the spatial distribution of anthrax cow for monitoring and the prevention and control of anthrax has a very important role. However, the application of classic statistical methods in some areas is very difficult because of the pastoral nomadic context. The high mobility of livestock and the lack of enough suitable sampling for the some of the difficulties in monitoring currently make it nearly impossible to apply rigorous random sampling methods. It is thus necessary to develop an alternative sampling method, which could overcome the lack of sampling and meet the requirements for randomness. The GIS computer application software ArcGIS9.1 was used to overcome the lack of data of sampling sites.Using ArcGIS 9.1 and GEODA to analyze the cow anthrax spatial distribution of district A. we gained some conclusions about cow anthrax' density: (1) there is a spatial clustering model. (2) there is an intensely spatial autocorrelation. We established a prediction model to estimate the anthrax distribution based on the spatial characteristic of the density of cow anthrax. Comparing with the true distribution, the prediction model has a well coincidence and is feasible to the application. The method using a GIS tool facilitates can be implemented significantly in the cow anthrax monitoring and investigation, and the space statistics - related prediction model provides a fundamental use for other study on space-related animal diseases.

  8. Distributional fold change test – a statistical approach for detecting differential expression in microarray experiments

    PubMed Central

    2012-01-01

    Background Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time the fold change by itself provide valuable information and it is important to find unambiguous ways of using this information in expression data treatment. Results A new method of finding differentially expressed genes, called distributional fold change (DFC) test is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating curves and tested on 11 data sets from Gene Omnibus Database with independently verified differentially expressed genes and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best – on average it had higher sensitivity and partial AUC and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets. Conclusions The distributional fold change test is an effective method for finding and ranking differentially expressed probesets on microarrays. The application of this test is advantageous to data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation adjusted methods to the whole feature set. PMID:23122055

  9. Asymmetries in the individual distinctiveness and maternal recognition of infant contact calls and distress screams in baboons

    PubMed Central

    Rendall, Drew; Notman, Hugh; Owren, Michael J.

    2009-01-01

    A key component of nonhuman primate vocal communication is the production and recognition of clear cues to social identity that function in the management of these species’ individualistic social relationships. However, it remains unclear how ubiquitous such identity cues are across call types and age-sex classes and what the underlying vocal production mechanisms responsible might be. This study focused on two structurally distinct call types produced by infant baboons in contexts that place a similar functional premium on communicating clear cues to caller identity: (1) contact calls produced when physically separated from, and attempting to relocate, mothers and (2) distress screams produced when aggressively attacked by other group members. Acoustic analyses and field experiments were conducted to examine individual differentiation in single vocalizations of each type and to test mothers’ ability to recognize infant calls. Both call types showed statistically significant individual differentiation, but the magnitude of the differentiation was substantially higher in contact calls. Mothers readily discriminated own-offspring contact calls from those of familiar but unrelated infants, but did not do so when it came to distress screams. Several possible explanations for these asymmetries in call differentiation and recognition are considered. PMID:19275336

  10. Association between secure patient–clinician email and clinical services utilisation in a US integrated health system: a retrospective cohort study

    PubMed Central

    Meng, Di; Palen, Ted E; Tsai, Joanne; McLeod, Melanie; Garrido, Terhilda; Qian, Heather

    2015-01-01

    Objective To assess associations between secure patient–clinician email use and clinical services utilisation over time. Design Retrospective cohort study between July 2010 and December 2013. Controlling for a utilisation surge around first secure email use, we analysed difference of differences between propensity score-matched groups of secure patient–clinician email users and non-users for utilisation 1–12 months before and 7–18 months after first email (users) or a randomly assigned index date (non-users). Setting US integrated healthcare delivery system. Participants 9345 adults with first secure email use between July 2011 and July 2012 and continuous enrolment for ≥30 months and 9345 adults without secure email use between July 2010 and July 2012 matched to users on demographics, health status, and baseline utilisation. Primary Outcome Measures Rates of office visits, patient-initiated phone calls, scheduled telephone visits, after-hours clinic visits, emergency department visits, and hospitalisations. Results After controlling for multiple factors, no statistically significant differences in utilisation between secure email users and non-users occurred. Utilisation transiently increased by 88–237% around first email use. Annual rates of patient-initiated phone calls decreased among secure email users, 0.2 fewer calls per person (95% CI −0.3 to −0.1), from a mean of 4.1 calls per person 1–12 months before first use to a mean of 3.8 calls per person 7–18 months after first use. Rates of patient-initiated phone calls also decreased among non-users, 0.1 fewer calls per person (95% CI −0.2 to 0.0), from a mean of 4.2 calls per person 1–12 months before the index date to mean of 4.1 calls per person 7–18 months after the index date. Conclusions Compared with non-users, patient use of secure email with clinicians was not associated with statistically significant differences in clinical services utilisation 7–18 months after first use. PMID:26553841

  11. Association between secure patient-clinician email and clinical services utilisation in a US integrated health system: a retrospective cohort study.

    PubMed

    Meng, Di; Palen, Ted E; Tsai, Joanne; McLeod, Melanie; Garrido, Terhilda; Qian, Heather

    2015-11-09

    To assess associations between secure patient-clinician email use and clinical services utilisation over time. Retrospective cohort study between July 2010 and December 2013. Controlling for a utilisation surge around first secure email use, we analysed difference of differences between propensity score-matched groups of secure patient-clinician email users and non-users for utilisation 1-12 months before and 7-18 months after first email (users) or a randomly assigned index date (non-users). US integrated healthcare delivery system. 9345 adults with first secure email use between July 2011 and July 2012 and continuous enrolment for ≥30 months and 9345 adults without secure email use between July 2010 and July 2012 matched to users on demographics, health status, and baseline utilisation. Rates of office visits, patient-initiated phone calls, scheduled telephone visits, after-hours clinic visits, emergency department visits, and hospitalisations. After controlling for multiple factors, no statistically significant differences in utilisation between secure email users and non-users occurred. Utilisation transiently increased by 88-237% around first email use. Annual rates of patient-initiated phone calls decreased among secure email users, 0.2 fewer calls per person (95% CI -0.3 to -0.1), from a mean of 4.1 calls per person 1-12 months before first use to a mean of 3.8 calls per person 7-18 months after first use. Rates of patient-initiated phone calls also decreased among non-users, 0.1 fewer calls per person (95% CI -0.2 to 0.0), from a mean of 4.2 calls per person 1-12 months before the index date to mean of 4.1 calls per person 7-18 months after the index date. Compared with non-users, patient use of secure email with clinicians was not associated with statistically significant differences in clinical services utilisation 7-18 months after first use. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  12. A review of downscaling procedures - a contribution to the research on climate change impacts at city scale

    NASA Astrophysics Data System (ADS)

    Smid, Marek; Costa, Ana; Pebesma, Edzer; Granell, Carlos; Bhattacharya, Devanjan

    2016-04-01

    Human kind is currently predominantly urban based, and the majority of ever continuing population growth will take place in urban agglomerations. Urban systems are not only major drivers of climate change, but also the impact hot spots. Furthermore, climate change impacts are commonly managed at city scale. Therefore, assessing climate change impacts on urban systems is a very relevant subject of research. Climate and its impacts on all levels (local, meso and global scale) and also the inter-scale dependencies of those processes should be a subject to detail analysis. While global and regional projections of future climate are currently available, local-scale information is lacking. Hence, statistical downscaling methodologies represent a potentially efficient way to help to close this gap. In general, the methodological reviews of downscaling procedures cover the various methods according to their application (e.g. downscaling for the hydrological modelling). Some of the most recent and comprehensive studies, such as the ESSEM COST Action ES1102 (VALUE), use the concept of Perfect Prog and MOS. Other examples of classification schemes of downscaling techniques consider three main categories: linear methods, weather classifications and weather generators. Downscaling and climate modelling represent a multidisciplinary field, where researchers from various backgrounds intersect their efforts, resulting in specific terminology, which may be somewhat confusing. For instance, the Polynomial Regression (also called the Surface Trend Analysis) is a statistical technique. In the context of the spatial interpolation procedures, it is commonly classified as a deterministic technique, and kriging approaches are classified as stochastic. Furthermore, the terms "statistical" and "stochastic" (frequently used as names of sub-classes in downscaling methodological reviews) are not always considered as synonymous, even though both terms could be seen as identical since they are referring to methods handling input modelling factors as variables with certain probability distributions. In addition, the recent development is going towards multi-step methodologies containing deterministic and stochastic components. This evolution leads to the introduction of new terms like hybrid or semi-stochastic approaches, which makes the efforts to systematically classifying downscaling methods to the previously defined categories even more challenging. This work presents a review of statistical downscaling procedures, which classifies the methods in two steps. In the first step, we describe several techniques that produce a single climatic surface based on observations. The methods are classified into two categories using an approximation to the broadest consensual statistical terms: linear and non-linear methods. The second step covers techniques that use simulations to generate alternative surfaces, which correspond to different realizations of the same processes. Those simulations are essential because there is a limited number of real observational data, and such procedures are crucial for modelling extremes. This work emphasises the link between statistical downscaling methods and the research of climate change impacts at city scale.

  13. Sound imaging of nocturnal animal calls in their natural habitat.

    PubMed

    Mizumoto, Takeshi; Aihara, Ikkyu; Otsuka, Takuma; Takeda, Ryu; Aihara, Kazuyuki; Okuno, Hiroshi G

    2011-09-01

    We present a novel method for imaging acoustic communication between nocturnal animals. Investigating the spatio-temporal calling behavior of nocturnal animals, e.g., frogs and crickets, has been difficult because of the need to distinguish many animals' calls in noisy environments without being able to see them. Our method visualizes the spatial and temporal dynamics using dozens of sound-to-light conversion devices (called "Firefly") and an off-the-shelf video camera. The Firefly, which consists of a microphone and a light emitting diode, emits light when it captures nearby sound. Deploying dozens of Fireflies in a target area, we record calls of multiple individuals through the video camera. We conduct two experiments, one indoors and the other in the field, using Japanese tree frogs (Hyla japonica). The indoor experiment demonstrates that our method correctly visualizes Japanese tree frogs' calling behavior. It has confirmed the known behavior; two frogs call synchronously or in anti-phase synchronization. The field experiment (in a rice paddy where Japanese tree frogs live) also visualizes the same calling behavior to confirm anti-phase synchronization in the field. Experimental results confirm that our method can visualize the calling behavior of nocturnal animals in their natural habitat.

  14. [Psychopathology of anxiety-phobic disorders that led to hospitalization in a psychiatric hospital].

    PubMed

    Chugunov, D A; Schmilovitch, A A

    To study the psychopathology of anxiety-phobic disorders and motives of hospitalization of patients in a psychiatric hospital. One hundred and thirty-two patients were examined, 72 patients of the main group were admitted to general psychiatric departments, 60 patients of the control group in the sanatorium psychiatric departments. Clinical-psychopathological, follow-up, psychometric and statistical methods were used. Patients with hospital anxiety-phobic disorders had agoraphobia with panic disorder, social phobias, hypochondriacal phobias, specific phobias and multiple phobias. The main reasons for hospitalization were: the intensity of anxiety-phobic disorders, contrast content of phobias, multiplicity of anxiety-phobic disorders, ambulance calls, personality accentuations and rental aims.

  15. Gamma-widths, lifetimes and fluctuations in the nuclear quasi-continuum

    NASA Astrophysics Data System (ADS)

    Guttormsen, M.; Larsen, A. C.; Midtbø, J. E.; Crespo Campo, L.; Görgen, A.; Ingeberg, V. W.; Renstrøm, T.; Siem, S.; Tveten, G. M.; Zeiser, F.; Kirsch, L. E.

    2018-05-01

    Statistical γ-decay from highly excited states is determined by the nuclear level density (NLD) and the γ-ray strength function (γSF). These average quantities have been measured for several nuclei using the Oslo method. For the first time, we exploit the NLD and γSF to evaluate the γ-width in the energy region below the neutron binding energy, often called the quasi-continuum region. The lifetimes of states in the quasi-continuum are important benchmarks for a theoretical description of nuclear structure and dynamics at high temperature. The lifetimes may also have impact on reaction rates for the rapid neutron-capture process, now demonstrated to take place in neutron star mergers.

  16. Information and material flows in complex networks

    NASA Astrophysics Data System (ADS)

    Helbing, Dirk; Armbruster, Dieter; Mikhailov, Alexander S.; Lefeber, Erjen

    2006-04-01

    In this special issue, an overview of the Thematic Institute (TI) on Information and Material Flows in Complex Systems is given. The TI was carried out within EXYSTENCE, the first EU Network of Excellence in the area of complex systems. Its motivation, research approach and subjects are presented here. Among the various methods used are many-particle and statistical physics, nonlinear dynamics, as well as complex systems, network and control theory. The contributions are relevant for complex systems as diverse as vehicle and data traffic in networks, logistics, production, and material flows in biological systems. The key disciplines involved are socio-, econo-, traffic- and bio-physics, and a new research area that could be called “biologistics”.

  17. Accuracy assessment, using stratified plurality sampling, of portions of a LANDSAT classification of the Arctic National Wildlife Refuge Coastal Plain

    NASA Technical Reports Server (NTRS)

    Card, Don H.; Strong, Laurence L.

    1989-01-01

    An application of a classification accuracy assessment procedure is described for a vegetation and land cover map prepared by digital image processing of LANDSAT multispectral scanner data. A statistical sampling procedure called Stratified Plurality Sampling was used to assess the accuracy of portions of a map of the Arctic National Wildlife Refuge coastal plain. Results are tabulated as percent correct classification overall as well as per category with associated confidence intervals. Although values of percent correct were disappointingly low for most categories, the study was useful in highlighting sources of classification error and demonstrating shortcomings of the plurality sampling method.

  18. Unveiling causal activity of complex networks

    NASA Astrophysics Data System (ADS)

    Williams-García, Rashid V.; Beggs, John M.; Ortiz, Gerardo

    2017-07-01

    We introduce a novel tool for analyzing complex network dynamics, allowing for cascades of causally-related events, which we call causal webs (c-webs), to be separated from other non-causally-related events. This tool shows that traditionally-conceived avalanches may contain mixtures of spatially-distinct but temporally-overlapping cascades of events, and dynamical disorder or noise. In contrast, c-webs separate these components, unveiling previously hidden features of the network and dynamics. We apply our method to mouse cortical data with resulting statistics which demonstrate for the first time that neuronal avalanches are not merely composed of causally-related events. The original version of this article was uploaded to the arXiv on March 17th, 2016 [1].

  19. Learning to improve iterative repair scheduling

    NASA Technical Reports Server (NTRS)

    Zweben, Monte; Davis, Eugene

    1992-01-01

    This paper presents a general learning method for dynamically selecting between repair heuristics in an iterative repair scheduling system. The system employs a version of explanation-based learning called Plausible Explanation-Based Learning (PEBL) that uses multiple examples to confirm conjectured explanations. The basic approach is to conjecture contradictions between a heuristic and statistics that measure the quality of the heuristic. When these contradictions are confirmed, a different heuristic is selected. To motivate the utility of this approach we present an empirical evaluation of the performance of a scheduling system with respect to two different repair strategies. We show that the scheduler that learns to choose between the heuristics outperforms the same scheduler with any one of two heuristics alone.

  20. Multiscale hidden Markov models for photon-limited imaging

    NASA Astrophysics Data System (ADS)

    Nowak, Robert D.

    1999-06-01

    Photon-limited image analysis is often hindered by low signal-to-noise ratios. A novel Bayesian multiscale modeling and analysis method is developed in this paper to assist in these challenging situations. In addition to providing a very natural and useful framework for modeling an d processing images, Bayesian multiscale analysis is often much less computationally demanding compared to classical Markov random field models. This paper focuses on a probabilistic graph model called the multiscale hidden Markov model (MHMM), which captures the key inter-scale dependencies present in natural image intensities. The MHMM framework presented here is specifically designed for photon-limited imagin applications involving Poisson statistics, and applications to image intensity analysis are examined.

  1. ChIPnorm: a statistical method for normalizing and identifying differential regions in histone modification ChIP-seq libraries.

    PubMed

    Nair, Nishanth Ulhas; Sahu, Avinash Das; Bucher, Philipp; Moret, Bernard M E

    2012-01-01

    The advent of high-throughput technologies such as ChIP-seq has made possible the study of histone modifications. A problem of particular interest is the identification of regions of the genome where different cell types from the same organism exhibit different patterns of histone enrichment. This problem turns out to be surprisingly difficult, even in simple pairwise comparisons, because of the significant level of noise in ChIP-seq data. In this paper we propose a two-stage statistical method, called ChIPnorm, to normalize ChIP-seq data, and to find differential regions in the genome, given two libraries of histone modifications of different cell types. We show that the ChIPnorm method removes most of the noise and bias in the data and outperforms other normalization methods. We correlate the histone marks with gene expression data and confirm that histone modifications H3K27me3 and H3K4me3 act as respectively a repressor and an activator of genes. Compared to what was previously reported in the literature, we find that a substantially higher fraction of bivalent marks in ES cells for H3K27me3 and H3K4me3 move into a K27-only state. We find that most of the promoter regions in protein-coding genes have differential histone-modification sites. The software for this work can be downloaded from http://lcbb.epfl.ch/software.html.

  2. An optimal scheme for top quark mass measurement near the \\rm{t}\\bar{t} threshold at future \\rm{e}^{+}{e}^{-} colliders

    NASA Astrophysics Data System (ADS)

    Chen, Wei-Guo; Wan, Xia; Wang, You-Kai

    2018-05-01

    A top quark mass measurement scheme near the {{t}}\\bar{{{t}}} production threshold in future {{{e}}}+{{{e}}}- colliders, e.g. the Circular Electron Positron Collider (CEPC), is simulated. A {χ }2 fitting method is adopted to determine the number of energy points to be taken and their locations. Our results show that the optimal energy point is located near the largest slope of the cross section v. beam energy plot, and the most efficient scheme is to concentrate all luminosity on this single energy point in the case of one-parameter top mass fitting. This suggests that the so-called data-driven method could be the best choice for future real experimental measurements. Conveniently, the top mass statistical uncertainty can also be calculated directly by the error matrix even without any sampling and fitting. The agreement of the above two optimization methods has been checked. Our conclusion is that by taking 50 fb‑1 total effective integrated luminosity data, the statistical uncertainty of the top potential subtracted mass can be suppressed to about 7 MeV and the total uncertainty is about 30 MeV. This precision will help to identify the stability of the electroweak vacuum at the Planck scale. Supported by National Science Foundation of China (11405102) and the Fundamental Research Funds for the Central Universities of China (GK201603027, GK201803019)

  3. Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus.

    PubMed

    Hansen, Peter; Hecht, Jochen; Ibn-Salem, Jonas; Menkuec, Benjamin S; Roskosch, Sebastian; Truss, Matthias; Robinson, Peter N

    2016-11-04

    ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method as well as the two other methods MACE and MACS2 to the available ChIP-nexus data. The Q-nexus software is efficient and easy to use. Novel statistics about duplication rates in consideration of random barcodes are calculated. Our method for the estimation of the width of the protected region yields unbiased signatures that are highly reproducible for biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows a substantially better reproducibility. An implementation of Q-nexus is available at http://charite.github.io/Q/ .

  4. Statistics of single unit responses in the human medial temporal lobe: A sparse and overdispersed code

    NASA Astrophysics Data System (ADS)

    Magyar, Andrew

    The recent discovery of cells that respond to purely conceptual features of the environment (particular people, landmarks, objects, etc) in the human medial temporal lobe (MTL), has raised many questions about the nature of the neural code in humans. The goal of this dissertation is to develop a novel statistical method based upon maximum likelihood regression which will then be applied to these experiments in order to produce a quantitative description of the coding properties of the human MTL. In general, the method is applicable to any experiments in which a sequence of stimuli are presented to an organism while the binary responses of a large number of cells are recorded in parallel. The central concept underlying the approach is the total probability that a neuron responds to a random stimulus, called the neuronal sparsity. The model then estimates the distribution of response probabilities across the population of cells. Applying the method to single-unit recordings from the human medial temporal lobe, estimates of the sparsity distributions are acquired in four regions: the hippocampus, the entorhinal cortex, the amygdala, and the parahippocampal cortex. The resulting distributions are found to be sparse (large fraction of cells with a low response probability) and highly non-uniform, with a large proportion of ultra-sparse neurons that possess a very low response probability, and a smaller population of cells which respond much more frequently. Rammifications of the results are discussed in relation to the sparse coding hypothesis, and comparisons are made between the statistics of the human medial temporal lobe cells and place cells observed in the rodent hippocampus.

  5. An Optical Flow-Based Full Reference Video Quality Assessment Algorithm.

    PubMed

    K, Manasa; Channappayya, Sumohana S

    2016-06-01

    We present a simple yet effective optical flow-based full-reference video quality assessment (FR-VQA) algorithm for assessing the perceptual quality of natural videos. Our algorithm is based on the premise that local optical flow statistics are affected by distortions and the deviation from pristine flow statistics is proportional to the amount of distortion. We characterize the local flow statistics using the mean, the standard deviation, the coefficient of variation (CV), and the minimum eigenvalue ( λ min ) of the local flow patches. Temporal distortion is estimated as the change in the CV of the distorted flow with respect to the reference flow, and the correlation between λ min of the reference and of the distorted patches. We rely on the robust multi-scale structural similarity index for spatial quality estimation. The computed temporal and spatial distortions, thus, are then pooled using a perceptually motivated heuristic to generate a spatio-temporal quality score. The proposed method is shown to be competitive with the state-of-the-art when evaluated on the LIVE SD database, the EPFL Polimi SD database, and the LIVE Mobile HD database. The distortions considered in these databases include those due to compression, packet-loss, wireless channel errors, and rate-adaptation. Our algorithm is flexible enough to allow for any robust FR spatial distortion metric for spatial distortion estimation. In addition, the proposed method is not only parameter-free but also independent of the choice of the optical flow algorithm. Finally, we show that the replacement of the optical flow vectors in our proposed method with the much coarser block motion vectors also results in an acceptable FR-VQA algorithm. Our algorithm is called the flow similarity index.

  6. Investigation of Magnetotelluric Source Effect Based on Twenty Years of Telluric and Geomagnetic Observation

    NASA Astrophysics Data System (ADS)

    Kis, A.; Lemperger, I.; Wesztergom, V.; Menvielle, M.; Szalai, S.; Novák, A.; Hada, T.; Matsukiyo, S.; Lethy, A. M.

    2016-12-01

    Magnetotelluric method is widely applied for investigation of subsurface structures by imaging the spatial distribution of electric conductivity. The method is based on the experimental determination of surface electromagnetic impedance tensor (Z) by surface geomagnetic and telluric registrations in two perpendicular orientation. In practical explorations the accurate estimation of Z necessitates the application of robust statistical methods for two reasons:1) the geomagnetic and telluric time series' are contaminated by man-made noise components and2) the non-homogeneous behavior of ionospheric current systems in the period range of interest (ELF-ULF and longer periods) results in systematic deviation of the impedance of individual time windows.Robust statistics manage both load of Z for the purpose of subsurface investigations. However, accurate analysis of the long term temporal variation of the first and second statistical moments of Z may provide valuable information about the characteristics of the ionospheric source current systems. Temporal variation of extent, spatial variability and orientation of the ionospheric source currents has specific effects on the surface impedance tensor. Twenty year long geomagnetic and telluric recordings of the Nagycenk Geophysical Observatory provides unique opportunity to reconstruct the so called magnetotelluric source effect and obtain information about the spatial and temporal behavior of ionospheric source currents at mid-latitudes. Detailed investigation of time series of surface electromagnetic impedance tensor has been carried out in different frequency classes of the ULF range. The presentation aims to provide a brief review of our results related to long term periodic modulations, up to solar cycle scale and about eventual deviations of the electromagnetic impedance and so the reconstructed equivalent ionospheric source effects.

  7. On the selection of ordinary differential equation models with application to predator-prey dynamical models.

    PubMed

    Zhang, Xinyu; Cao, Jiguo; Carroll, Raymond J

    2015-03-01

    We consider model selection and estimation in a context where there are competing ordinary differential equation (ODE) models, and all the models are special cases of a "full" model. We propose a computationally inexpensive approach that employs statistical estimation of the full model, followed by a combination of a least squares approximation (LSA) and the adaptive Lasso. We show the resulting method, here called the LSA method, to be an (asymptotically) oracle model selection method. The finite sample performance of the proposed LSA method is investigated with Monte Carlo simulations, in which we examine the percentage of selecting true ODE models, the efficiency of the parameter estimation compared to simply using the full and true models, and coverage probabilities of the estimated confidence intervals for ODE parameters, all of which have satisfactory performances. Our method is also demonstrated by selecting the best predator-prey ODE to model a lynx and hare population dynamical system among some well-known and biologically interpretable ODE models. © 2014, The International Biometric Society.

  8. CPU Performance Counter-Based Problem Diagnosis for Software Systems

    DTIC Science & Technology

    2009-09-01

    application servers and implementation techniques), this thesis only used the Enterprise Java Bean (EJB) SessionBean version of RUBiS. The PHP and Servlet ...collection statistics at the Java Virtual Machine (JVM) level can be reused for any Java application. Other examples of gray-box instrumentation include path...used gray-box approaches. For example, PinPoint [11, 14] and [29] use request tracing to diagnose Java exceptions, endless calls, and null calls in

  9. Reanalysis of Tyrannosaurus rex Mass Spectra.

    PubMed

    Bern, Marshall; Phinney, Brett S; Goldberg, David

    2009-09-01

    Asara et al. reported the detection of collagen peptides in a 68-million-year-old Tyrannosaurus rex bone by shotgun proteomics. This finding has been called into question as a possible statistical artifact. We reanalyze Asara et al.'s tandem mass spectra using a different search engine and different statistical tools. Our reanalysis shows a sample containing common laboratory contaminants, soil bacteria, and bird-like hemoglobin and collagen.

  10. An Item Fit Statistic Based on Pseudocounts from the Generalized Graded Unfolding Model: A Preliminary Report.

    ERIC Educational Resources Information Center

    Roberts, James S.

    Stone and colleagues (C. Stone, R. Ankenman, S. Lane, and M. Liu, 1993; C. Stone, R. Mislevy and J. Mazzeo, 1994; C. Stone, 2000) have proposed a fit index that explicitly accounts for the measurement error inherent in an estimated theta value, here called chi squared superscript 2, subscript i*. The elements of this statistic are natural…

  11. Compositionality and Statistics in Adjective Acquisition: 4-Year-Olds Interpret "Tall" and "Short" Based on the Size Distributions of Novel Noun Referents

    ERIC Educational Resources Information Center

    Barner, David; Snedeker, Jesse

    2008-01-01

    Four experiments investigated 4-year-olds' understanding of adjective-noun compositionality and their sensitivity to statistics when interpreting scalar adjectives. In Experiments 1 and 2, children selected "tall" and "short" items from 9 novel objects called "pimwits" (1-9 in. in height) or from this array plus 4 taller or shorter distractor…

  12. Screening Health Risk Assessment Burn Pit Exposures, Balad Air Base, Iraq and Addendum Report

    DTIC Science & Technology

    2008-05-01

    risk uses principles drawn from many scientific disciplines including chemistry , toxicology, physics, mathematics, and statistics. Because the data...uses principles drawn from many scientific disciplines, including chemistry , toxicology, physics, mathematics, and statistics. Because the data...natural chemicals in plants (called flavonoids ) also act on the Ah-receptor and could potentially block the effects of dioxins. One more reason to

  13. Biclustering of gene expression data using reactive greedy randomized adaptive search procedure.

    PubMed

    Dharan, Smitha; Nair, Achuthsankar S

    2009-01-30

    Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix and can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church have introduced a measure called mean squared residue score to evaluate the quality of a bicluster and has become one of the most popular measures to search for biclusters. In this paper, we review basic concepts of the metaheuristics Greedy Randomized Adaptive Search Procedure (GRASP)-construction and local search phases and propose a new method which is a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP) to detect significant biclusters from large microarray datasets. The method has two major steps. First, high quality bicluster seeds are generated by means of k-means clustering. In the second step, these seeds are grown using the Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted, depending on the quality of the solutions found previously. We performed statistical and biological validations of the biclusters obtained and evaluated the method against the results of basic GRASP and as well as with the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms the basic GRASP algorithm and Cheng and Church approach. The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts.

  14. RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip.

    PubMed

    Xu, Zongli; Langie, Sabine A S; De Boever, Patrick; Taylor, Jack A; Niu, Liang

    2017-01-03

    The Illumina Infinium HumanMethylation450 BeadChip and its successor, Infinium MethylationEPIC BeadChip, have been extensively utilized in epigenome-wide association studies. Both arrays use two fluorescent dyes (Cy3-green/Cy5-red) to measure methylation level at CpG sites. However, performance difference between dyes can result in biased estimates of methylation levels. Here we describe a novel method, called REgression on Logarithm of Internal Control probes (RELIC) to correct for dye bias on whole array by utilizing the intensity values of paired internal control probes that monitor the two color channels. We evaluate the method in several datasets against other widely used dye-bias correction methods. Results on data quality improvement showed that RELIC correction statistically significantly outperforms alternative dye-bias correction methods. We incorporated the method into the R package ENmix, which is freely available from the Bioconductor website ( https://www.bioconductor.org/packages/release/bioc/html/ENmix.html ). RELIC is an efficient and robust method to correct for dye-bias in Illumina Methylation BeadChip data. It outperforms other alternative methods and conveniently implemented in R package ENmix to facilitate DNA methylation studies.

  15. Performance evaluation of the machine learning algorithms used in inference mechanism of a medical decision support system.

    PubMed

    Bal, Mert; Amasyali, M Fatih; Sever, Hayri; Kose, Guven; Demirhan, Ayse

    2014-01-01

    The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets.

  16. Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data

    PubMed Central

    Flickinger, Matthew; Jun, Goo; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

    2015-01-01

    DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%. PMID:26235984

  17. A label-free, fluorescence based assay for microarray

    NASA Astrophysics Data System (ADS)

    Niu, Sanjun

    DNA chip technology has drawn tremendous attention since it emerged in the mid 90's as a method that expedites gene sequencing by over 100-fold. DNA chip, also called DNA microarray, is a combinatorial technology in which different single-stranded DNA (ssDNA) molecules of known sequences are immobilized at specific spots. The immobilized ssDNA strands are called probes. In application, the chip is exposed to a solution containing ssDNA of unknown sequence, called targets, which are labeled with fluorescent dyes. Due to specific molecular recognition among the base pairs in the DNA, the binding or hybridization occurs only when the probe and target sequences are complementary. The nucleotide sequence of the target is determined by imaging the fluorescence from the spots. The uncertainty of background in signal detection and statistical error in data analysis, primarily due to the error in the DNA amplification process and statistical distribution of the tags in the target DNA, have become the fundamental barriers in bringing the technology into application for clinical diagnostics. Furthermore, the dye and tagging process are expensive, making the cost of DNA chips inhibitive for clinical testing. These limitations and challenges make it difficult to implement DNA chip methods as a diagnostic tool in a pathology laboratory. The objective of this dissertation research is to provide an alternative approach that will address the above challenges. In this research, a label-free assay is designed and studied. Polystyrene (PS), a commonly used polymeric material, serves as the fluorescence agent. Probe ssDNA is covalently immobilized on polystyrene thin film that is supported by a reflecting substrate. When this chip is exposed to excitation light, fluorescence light intensity from PS is detected as the signal. Since the optical constants and conformations of ssDNA and dsDNA (double stranded DNA) are different, the measured fluorescence from PS changes for the same intensity of excitation light. The fluorescence contrast is used to quantify the amount of probe-target hybridization. A mathematical model that considers multiple reflections and scattering is developed to explain the mechanism of the fluorescence contrast which depends on the thickness of the PS film. Scattering is the dominant factor that contributes to the contrast. The potential of this assay to detect single nucleotide polymorphism is also tested.

  18. Graphical Descriptives: A Way to Improve Data Transparency and Methodological Rigor in Psychology.

    PubMed

    Tay, Louis; Parrigon, Scott; Huang, Qiming; LeBreton, James M

    2016-09-01

    Several calls have recently been issued to the social sciences for enhanced transparency of research processes and enhanced rigor in the methodological treatment of data and data analytics. We propose the use of graphical descriptives (GDs) as one mechanism for responding to both of these calls. GDs provide a way to visually examine data. They serve as quick and efficient tools for checking data distributions, variable relations, and the potential appropriateness of different statistical analyses (e.g., do data meet the minimum assumptions for a particular analytic method). Consequently, we believe that GDs can promote increased transparency in the journal review process, encourage best practices for data analysis, and promote a more inductive approach to understanding psychological data. We illustrate the value of potentially including GDs as a step in the peer-review process and provide a user-friendly online resource (www.graphicaldescriptives.org) for researchers interested in including data visualizations in their research. We conclude with suggestions on how GDs can be expanded and developed to enhance transparency. © The Author(s) 2016.

  19. Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories.

    PubMed

    Jong, Victor L; Novianti, Putri W; Roes, Kit C B; Eijkemans, Marinus J C

    2014-12-01

    The literature shows that classifiers perform differently across datasets and that correlations within datasets affect the performance of classifiers. The question that arises is whether the correlation structure within datasets differ significantly across diseases. In this study, we evaluated the homogeneity of correlation structures within and between datasets of six etiological disease categories; inflammatory, immune, infectious, degenerative, hereditary and acute myeloid leukemia (AML). We also assessed the effect of filtering; detection call and variance filtering on correlation structures. We downloaded microarray datasets from ArrayExpress for experiments meeting predefined criteria and ended up with 12 datasets for non-cancerous diseases and six for AML. The datasets were preprocessed by a common procedure incorporating platform-specific recommendations and the two filtering methods mentioned above. Homogeneity of correlation matrices between and within datasets of etiological diseases was assessed using the Box's M statistic on permuted samples. We found that correlation structures significantly differ between datasets of the same and/or different etiological disease categories and that variance filtering eliminates more uncorrelated probesets than detection call filtering and thus renders the data highly correlated.

  20. Probabilistic atlas and geometric variability estimation to drive tissue segmentation.

    PubMed

    Xu, Hao; Thirion, Bertrand; Allassonnière, Stéphanie

    2014-09-10

    Computerized anatomical atlases play an important role in medical image analysis. While an atlas usually refers to a standard or mean image also called template, which presumably represents well a given population, it is not enough to characterize the observed population in detail. A template image should be learned jointly with the geometric variability of the shapes represented in the observations. These two quantities will in the sequel form the atlas of the corresponding population. The geometric variability is modeled as deformations of the template image so that it fits the observations. In this paper, we provide a detailed analysis of a new generative statistical model based on dense deformable templates that represents several tissue types observed in medical images. Our atlas contains both an estimation of probability maps of each tissue (called class) and the deformation metric. We use a stochastic algorithm for the estimation of the probabilistic atlas given a dataset. This atlas is then used for atlas-based segmentation method to segment the new images. Experiments are shown on brain T1 MRI datasets. Copyright © 2014 John Wiley & Sons, Ltd.

  1. Recall accuracy of mobile phone calls among Japanese young people.

    PubMed

    Kiyohara, Kosuke; Wake, Kanako; Watanabe, Soichi; Arima, Takuji; Sato, Yasuto; Kojimahara, Noriko; Taki, Masao; Yamaguchi, Naohito

    2016-11-01

    This study aimed to elucidate the recall accuracy of mobile phone calls among young people using new software-modified phone (SMP) technology. A total of 198 Japanese students aged between 10 and 24 years were instructed to use a SMP for 1 month to record their actual call statuses. Ten to 12 months after this period, face-to-face interviews were conducted to obtain the self-reported call statuses during the monitoring period. Using the SMP record as the gold standard of validation, the recall accuracy of phone calls was evaluated. A total of 19% of the participants (34/177) misclassified their laterality (i.e., the dominant side of ear used while making calls), with the level of agreement being moderate (κ-statistics, 0.449). The level of agreement between the self-reports and SMP records was relatively good for the duration of calls (Pearson's r, 0.620), as compared with the number of calls (Pearson's r, 0.561). The recall was prone to small systematic and large random errors for both the number and duration of calls. Such a large random recall error for the amount of calls and misclassification of laterality suggest that the results of epidemiological studies of mobile phone use based on self-assessment should be interpreted cautiously.

  2. Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data.

    PubMed

    Teng, Mingxiang; Irizarry, Rafael A

    2017-11-01

    The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics' public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories. © 2017 Teng and Irizarry; Published by Cold Spring Harbor Laboratory Press.

  3. On the application of the Principal Component Analysis for an efficient climate downscaling of surface wind fields

    NASA Astrophysics Data System (ADS)

    Chavez, Roberto; Lozano, Sergio; Correia, Pedro; Sanz-Rodrigo, Javier; Probst, Oliver

    2013-04-01

    With the purpose of efficiently and reliably generating long-term wind resource maps for the wind energy industry, the application and verification of a statistical methodology for the climate downscaling of wind fields at surface level is presented in this work. This procedure is based on the combination of the Monte Carlo and the Principal Component Analysis (PCA) statistical methods. Firstly the Monte Carlo method is used to create a huge number of daily-based annual time series, so called climate representative years, by the stratified sampling of a 33-year-long time series corresponding to the available period of the NCAR/NCEP global reanalysis data set (R-2). Secondly the representative years are evaluated such that the best set is chosen according to its capability to recreate the Sea Level Pressure (SLP) temporal and spatial fields from the R-2 data set. The measure of this correspondence is based on the Euclidean distance between the Empirical Orthogonal Functions (EOF) spaces generated by the PCA (Principal Component Analysis) decomposition of the SLP fields from both the long-term and the representative year data sets. The methodology was verified by comparing the selected 365-days period against a 9-year period of wind fields generated by dynamical downscaling the Global Forecast System data with the mesoscale model SKIRON for the Iberian Peninsula. These results showed that, compared to the traditional method of dynamical downscaling any random 365-days period, the error in the average wind velocity by the PCA's representative year was reduced by almost 30%. Moreover the Mean Absolute Errors (MAE) in the monthly and daily wind profiles were also reduced by almost 25% along all SKIRON grid points. These results showed also that the methodology presented maximum error values in the wind speed mean of 0.8 m/s and maximum MAE in the monthly curves of 0.7 m/s. Besides the bulk numbers, this work shows the spatial distribution of the errors across the Iberian domain and additional wind statistics such as the velocity and directional frequency. Additional repetitions were performed to prove the reliability and robustness of this kind-of statistical-dynamical downscaling method.

  4. Comparison of the Effect of Mindfulness-based Cognitive Therapy Accompanied by Pharmacotherapy With Pharmacotherapy Alone in Treating Dysthymic Patients

    PubMed Central

    Hamidian, Sajedeh; Omidi, Abdollah; Mousavinasab, Seyyed Masoud; Naziri, Ghasem

    2013-01-01

    Background Dysthimia in adults is a chronic depression disorder which is characterized by a mild depression for at least 2 years. Remarkable psycho-social involvements, greater disturbances in psycho-social functions compared to other forms of depression and lack of definite findings about preferred treatment for this disorder led us to evaluate the effectiveness of Mindfulness based cognitive therapy (MBCT) method adjunct to pharmacotherapy compared with pharmachothrapy alone in treating dysthymia in this thesis. Objectives This study aimed to evaluate the effectiveness of mindfulness-based cognitive therapy on a chronic type of depression disorder called dysthymia Patients and Methods This study is a clinical trial of an interventional method which was carried out on dysthymic and double depressed patients who had referred to psychiatric clinics of Shiraz University of Medical Sciences, Shiraz, Iran. In doing so, 50 patients above the age of 18 were selected through convenience sampling and assigned into intervention and control groups. The control group only received medications while the intervention group in addition to receiving medication, participated in 8 sessions of a mindfulness based cognitive therapy course which was held once a week and each session lasted for 2 to 2.5 hours. All the participants filled out Beck Depression Inventory II and five facet mindfulness questionnaire. The data were analyzed using the SPSS statistical software (version 16) and univariate covariance and independent t test statistical methods. Results In this study, no statistically significant differences were found between the two groups regarding the demographic characteristics. The mean difference between the two groups was statistically significant for the variables in post-test considering the pre-test. The experimental group participants showed significant improvement in terms of the defined variables; a trend which was not observed in the control group participants. Conclusion The results of this study show that adding MBCT to pharmacotherapy in treatment of dysthymic patients can cause significant improvement in depression symptoms and mindfulness skills in patients compared to pharmacotherapy alone. PMID:23984005

  5. Analytical and statistical consideration on the use of the ISAG-ICAR-SNP bovine panel for parentage control, using the Illumina BeadChip technology: example on the German Holstein population.

    PubMed

    Schütz, Ekkehard; Brenig, Bertram

    2015-02-05

    Parentage control is moving from short tandem repeats- to single nucleotide polymorphism (SNP) systems. For SNP-based parentage control in cattle, the ISAG-ICAR Committee proposes a set of 100/200 SNPs but quality criteria are lacking. Regarding German Holstein-Friesian cattle with only a limited number of evaluated individuals, the exclusion probability is not well-defined. We propose a statistical procedure for excluding single SNPs from parentage control, based on case-by-case evaluation of the GenCall score, to minimize parentage exclusion, based on miscalled genotypes. Exclusion power of the ISAG-ICAR SNPs used for the German Holstein-Friesian population was adjusted based on the results of more than 25,000 individuals. Experimental data were derived from routine genomic selection analyses of the German Holstein-Friesian population using the Illumina BovineSNP50 v2 BeadChip (20,000 individuals) or the EuroG10K variant (7000 individuals). Averages and standard deviations of GenCall scores for the 200 SNPs of the ISAG-ICAR recommended panel were calculated and used to calculate the downward Z-value. Based on minor allelic frequencies in the Holstein-Friesian population, one minus exclusion probability was equal to 1.4×10⁻¹⁰ and 7.2×10⁻²⁶, with one and two parents, respectively. Two monomorphic SNPs from the 100-SNP ISAG-ICAR core-panel did not contribute. Simulation of 10,000 parentage control combinations, using the GenCall score data from both BeadChips, showed that with a Z-value greater than 3.66 only about 2.5% parentages were excluded, based on the ISAG-ICAR recommendations (core-panel: ≥ 90 SNPs for one, ≥ 85 SNPs for two parents). When applied to real data from 1750 single parentage assessments, the optimal threshold was determined to be Z = 5.0, with only 34 censored cases and reduction to four (0.2%) doubtful parentages. About 70 parentage exclusions due to weak genotype calls were avoided, whereas true exclusions (n = 34) were unaffected. Using SNPs for parentage evaluation provides a high exclusion power also for parent identification. SNPs with a low GenCall score show a high tendency towards intra-molecular secondary structures and substantially contribute to false exclusion of parentages. We propose a method that controls this error without excluding too many parent combinations from the evaluation.

  6. The Statistical Drake Equation

    NASA Astrophysics Data System (ADS)

    Maccone, Claudio

    2010-12-01

    We provide the statistical generalization of the Drake equation. From a simple product of seven positive numbers, the Drake equation is now turned into the product of seven positive random variables. We call this "the Statistical Drake Equation". The mathematical consequences of this transformation are then derived. The proof of our results is based on the Central Limit Theorem (CLT) of Statistics. In loose terms, the CLT states that the sum of any number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable. This is called the Lyapunov Form of the CLT, or the Lindeberg Form of the CLT, depending on the mathematical constraints assumed on the third moments of the various probability distributions. In conclusion, we show that: The new random variable N, yielding the number of communicating civilizations in the Galaxy, follows the LOGNORMAL distribution. Then, as a consequence, the mean value of this lognormal distribution is the ordinary N in the Drake equation. The standard deviation, mode, and all the moments of this lognormal N are also found. The seven factors in the ordinary Drake equation now become seven positive random variables. The probability distribution of each random variable may be ARBITRARY. The CLT in the so-called Lyapunov or Lindeberg forms (that both do not assume the factors to be identically distributed) allows for that. In other words, the CLT "translates" into our statistical Drake equation by allowing an arbitrary probability distribution for each factor. This is both physically realistic and practically very useful, of course. An application of our statistical Drake equation then follows. The (average) DISTANCE between any two neighboring and communicating civilizations in the Galaxy may be shown to be inversely proportional to the cubic root of N. Then, in our approach, this distance becomes a new random variable. We derive the relevant probability density function, apparently previously unknown and dubbed "Maccone distribution" by Paul Davies. DATA ENRICHMENT PRINCIPLE. It should be noticed that ANY positive number of random variables in the Statistical Drake Equation is compatible with the CLT. So, our generalization allows for many more factors to be added in the future as long as more refined scientific knowledge about each factor will be known to the scientists. This capability to make room for more future factors in the statistical Drake equation, we call the "Data Enrichment Principle," and we regard it as the key to more profound future results in the fields of Astrobiology and SETI. Finally, a practical example is given of how our statistical Drake equation works numerically. We work out in detail the case, where each of the seven random variables is uniformly distributed around its own mean value and has a given standard deviation. For instance, the number of stars in the Galaxy is assumed to be uniformly distributed around (say) 350 billions with a standard deviation of (say) 1 billion. Then, the resulting lognormal distribution of N is computed numerically by virtue of a MathCad file that the author has written. This shows that the mean value of the lognormal random variable N is actually of the same order as the classical N given by the ordinary Drake equation, as one might expect from a good statistical generalization.

  7. Efficient Implementation of an Optimal Interpolator for Large Spatial Data Sets

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.

    2007-01-01

    Interpolating scattered data points is a problem of wide ranging interest. A number of approaches for interpolation have been proposed both from theoretical domains such as computational geometry and in applications' fields such as geostatistics. Our motivation arises from geological and mining applications. In many instances data can be costly to compute and are available only at nonuniformly scattered positions. Because of the high cost of collecting measurements, high accuracy is required in the interpolants. One of the most popular interpolation methods in this field is called ordinary kriging. It is popular because it is a best linear unbiased estimator. The price for its statistical optimality is that the estimator is computationally very expensive. This is because the value of each interpolant is given by the solution of a large dense linear system. In practice, kriging problems have been solved approximately by restricting the domain to a small local neighborhood of points that lie near the query point. Determining the proper size for this neighborhood is a solved by ad hoc methods, and it has been shown that this approach leads to undesirable discontinuities in the interpolant. Recently a more principled approach to approximating kriging has been proposed based on a technique called covariance tapering. This process achieves its efficiency by replacing the large dense kriging system with a much sparser linear system. This technique has been applied to a restriction of our problem, called simple kriging, which is not unbiased for general data sets. In this paper we generalize these results by showing how to apply covariance tapering to the more general problem of ordinary kriging. Through experimentation we demonstrate the space and time efficiency and accuracy of approximating ordinary kriging through the use of covariance tapering combined with iterative methods for solving large sparse systems. We demonstrate our approach on large data sizes arising both from synthetic sources and from real applications.

  8. Multiscale multifractal time irreversibility analysis of stock markets

    NASA Astrophysics Data System (ADS)

    Jiang, Chenguang; Shang, Pengjian; Shi, Wenbin

    2016-11-01

    Time irreversibility is one of the most important properties of nonstationary time series. Complex time series often demonstrate even multiscale time irreversibility, such that not only the original but also coarse-grained time series are asymmetric over a wide range of scales. We study the multiscale time irreversibility of time series. In this paper, we develop a method called multiscale multifractal time irreversibility analysis (MMRA), which allows us to extend the description of time irreversibility to include the dependence on the segment size and statistical moments. We test the effectiveness of MMRA in detecting multifractality and time irreversibility of time series generated from delayed Henon map and binomial multifractal model. Then we employ our method to the time irreversibility analysis of stock markets in different regions. We find that the emerging market has higher multifractality degree and time irreversibility compared with developed markets. In this sense, the MMRA method may provide new angles in assessing the evolution stage of stock markets.

  9. Study on probability distributions for evolution in modified extremal optimization

    NASA Astrophysics Data System (ADS)

    Zeng, Guo-Qiang; Lu, Yong-Zai; Mao, Wei-Jie; Chu, Jian

    2010-05-01

    It is widely believed that the power-law is a proper probability distribution being effectively applied for evolution in τ-EO (extremal optimization), a general-purpose stochastic local-search approach inspired by self-organized criticality, and its applications in some NP-hard problems, e.g., graph partitioning, graph coloring, spin glass, etc. In this study, we discover that the exponential distributions or hybrid ones (e.g., power-laws with exponential cutoff) being popularly used in the research of network sciences may replace the original power-laws in a modified τ-EO method called self-organized algorithm (SOA), and provide better performances than other statistical physics oriented methods, such as simulated annealing, τ-EO and SOA etc., from the experimental results on random Euclidean traveling salesman problems (TSP) and non-uniform instances. From the perspective of optimization, our results appear to demonstrate that the power-law is not the only proper probability distribution for evolution in EO-similar methods at least for TSP, the exponential and hybrid distributions may be other choices.

  10. Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data.

    PubMed

    Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon

    2015-01-01

    Owing to recent improvement of genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci and this successful finding has substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which have been a big hurdle to build disease prediction model. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions including least absolute selection and shrinkage operator (LASSO) and ridge regression limit the space of parameters, and this constraint enables the estimation of effects for very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods for at least diseases under consideration.

  11. Comparative study on the performance of textural image features for active contour segmentation.

    PubMed

    Moraru, Luminita; Moldovanu, Simona

    2012-07-01

    We present a computerized method for the semi-automatic detection of contours in ultrasound images. The novelty of our study is the introduction of a fast and efficient image function relating to parametric active contour models. This new function is a combination of the gray-level information and first-order statistical features, called standard deviation parameters. In a comprehensive study, the developed algorithm and the efficiency of segmentation were first tested for synthetic images. Tests were also performed on breast and liver ultrasound images. The proposed method was compared with the watershed approach to show its efficiency. The performance of the segmentation was estimated using the area error rate. Using the standard deviation textural feature and a 5×5 kernel, our curve evolution was able to produce results close to the minimal area error rate (namely 8.88% for breast images and 10.82% for liver images). The image resolution was evaluated using the contrast-to-gradient method. The experiments showed promising segmentation results.

  12. Projection methods for line radiative transfer in spherical media.

    NASA Astrophysics Data System (ADS)

    Anusha, L. S.; Nagendra, K. N.

    An efficient numerical method called the Preconditioned Bi-Conjugate Gradient (Pre-BiCG) method is presented for the solution of radiative transfer equation in spherical geometry. A variant of this method called Stabilized Preconditioned Bi-Conjugate Gradient (Pre-BiCG-STAB) is also presented. These methods are based on projections on the subspaces of the n dimensional Euclidean space mathbb {R}n called Krylov subspaces. The methods are shown to be faster in terms of convergence rate compared to the contemporary iterative methods such as Jacobi, Gauss-Seidel and Successive Over Relaxation (SOR).

  13. Individual responsibility and health-risk behaviour: a contingent valuation study from the ex ante societal perspective.

    PubMed

    van der Star, Sanne M; van den Berg, Bernard

    2011-08-01

    This study analyzes peoples' social preferences for individual responsibility to health-risk behaviour in health care using the contingent valuation method adopting a societal perspective. We measure peoples' willingness to pay for inclusion of a treatment in basic health insurance of a hypothetical lifestyle dependent (smoking) and lifestyle independent (chronic) health problem. Our hypothesis is that peoples' willingness to pay for the independent and the dependent health problems are similar. As a methodological challenge, this study also analyzes the extent to which people consider their personal situation when answering contingent valuation questions adopting a societal perspective. 513 Dutch inhabitants responded to the questionnaire. They were asked to state their maximum willingness to pay for inclusion of treatments in basic health insurance package for two health problems. We asked them to assume that one hypothetical health problem was totally independent of behaviour (for simplicity called chronic disease). Alternatively, we asked them to assume that the other hypothetical health problem was totally caused by health-risk behaviour (for simplicity called smoking disease). We applied the payment card method to guide respondents to answer the contingent valuation method questions. Mean willingness to pay was 42.39 Euros (CI=37.24-47.55) for inclusion of treatment for health problem that was unrelated to behaviour, with '5-10' and '10-20 Euros' as most frequently stated answers. In contrast, mean willingness to pay for inclusion treatment for health-risk related problem was 11.29 Euros (CI=8.83-14.55), with '0' and '0-5 Euros' as most frequently provided answers. Difference in mean willingness to pay was substantial (over 30 Euros) and statistically significant (p-value=0.000). Smokers were statistically significantly more (p-value<0.01) willing to pay for the health-risk related (smoking) problem compared with non-smokers, while people with chronic condition were not willing to pay more for the health-risk unrelated (chronic) problem than people without chronic condition. This suggests that sub groups of people might differ in terms of abstracting from their personal situation when answering valuation questions from a societal perspective. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  14. Aggregative Learning Method and Its Application for Communication Quality Evaluation

    NASA Astrophysics Data System (ADS)

    Akhmetov, Dauren F.; Kotaki, Minoru

    2007-12-01

    In this paper, so-called Aggregative Learning Method (ALM) is proposed to improve and simplify the learning and classification abilities of different data processing systems. It provides a universal basis for design and analysis of mathematical models of wide class. A procedure was elaborated for time series model reconstruction and analysis for linear and nonlinear cases. Data approximation accuracy (during learning phase) and data classification quality (during recall phase) are estimated from introduced statistic parameters. The validity and efficiency of the proposed approach have been demonstrated through its application for monitoring of wireless communication quality, namely, for Fixed Wireless Access (FWA) system. Low memory and computation resources were shown to be needed for the procedure realization, especially for data classification (recall) stage. Characterized with high computational efficiency and simple decision making procedure, the derived approaches can be useful for simple and reliable real-time surveillance and control system design.

  15. Image analysis for microelectronic retinal prosthesis.

    PubMed

    Hallum, L E; Cloherty, S L; Lovell, N H

    2008-01-01

    By way of extracellular, stimulating electrodes, a microelectronic retinal prosthesis aims to render discrete, luminous spots-so-called phosphenes-in the visual field, thereby providing a phosphene image (PI) as a rudimentary remediation of profound blindness. As part thereof, a digital camera, or some other photosensitive array, captures frames, frames are analyzed, and phosphenes are actuated accordingly by way of modulated charge injections. Here, we present a method that allows the assessment of image analysis schemes for integration with a prosthetic device, that is, the means of converting the captured image (high resolution) to modulated charge injections (low resolution). We use the mutual-information function to quantify the amount of information conveyed to the PI observer (device implantee), while accounting for the statistics of visual stimuli. We demonstrate an effective scheme involving overlapping, Gaussian kernels, and discuss extensions of the method to account for shortterm visual memory in observers, and their perceptual errors of omission and commission.

  16. Improving Electronic Sensor Reliability by Robust Outlier Screening

    PubMed Central

    Moreno-Lizaranzu, Manuel J.; Cuesta, Federico

    2013-01-01

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs. PMID:24113682

  17. Improving electronic sensor reliability by robust outlier screening.

    PubMed

    Moreno-Lizaranzu, Manuel J; Cuesta, Federico

    2013-10-09

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs.

  18. Correlations between emission timescale of fragments and isospin dynamics in 124Sn+64Ni and 112Sn+58Ni reactions at 35A MeV

    NASA Astrophysics Data System (ADS)

    De Filippo, E.; Pagano, A.; Russotto, P.; Amorini, F.; Anzalone, A.; Auditore, L.; Baran, V.; Berceanu, I.; Borderie, B.; Bougault, R.; Bruno, M.; Cap, T.; Cardella, G.; Cavallaro, S.; Chatterjee, M. B.; Chbihi, A.; Colonna, M.; D'Agostino, M.; Dayras, R.; Di Toro, M.; Frankland, J.; Galichet, E.; Gawlikowicz, W.; Geraci, E.; Grzeszczuk, A.; Guazzoni, P.; Kowalski, S.; La Guidara, E.; Lanzalone, G.; Lanzanò, G.; Le Neindre, N.; Lombardo, I.; Maiolino, C.; Papa, M.; Piasecki, E.; Pirrone, S.; Płaneta, R.; Politi, G.; Pop, A.; Porto, F.; Rivet, M. F.; Rizzo, F.; Rosato, E.; Schmidt, K.; Siwek-Wilczyńska, K.; Skwira-Chalot, I.; Trifirò, A.; Trimarchi, M.; Verde, G.; Vigilante, M.; Wieleczko, J. P.; Wilczyński, J.; Zetta, L.; Zipper, W.

    2012-07-01

    We present a new experimental method to correlate the isotopic composition of intermediate mass fragments (IMF) emitted at midrapidity in semiperipheral collisions with the emission timescale: IMFs emitted in the early stage of the reaction show larger values of isospin asymmetry, stronger angular anisotropies, and reduced odd-even staggering effects in neutron to proton ratio distributions than those produced in sequential statistical emission. All these effects support the concept of isospin “migration”, that is sensitive to the density gradient between participant and quasispectator nuclear matter, in the so called neck fragmentation mechanism. By comparing the data to a stochastic mean field (SMF) simulation we show that this method gives valuable constraints on the symmetry energy term of nuclear equation of state at subsaturation densities. An indication emerges for a linear density dependence of the symmetry energy.

  19. Chaos based video encryption using maps and Ikeda time delay system

    NASA Astrophysics Data System (ADS)

    Valli, D.; Ganesan, K.

    2017-12-01

    Chaos based cryptosystems are an efficient method to deal with improved speed and highly secured multimedia encryption because of its elegant features, such as randomness, mixing, ergodicity, sensitivity to initial conditions and control parameters. In this paper, two chaos based cryptosystems are proposed: one is the higher-dimensional 12D chaotic map and the other is based on the Ikeda delay differential equation (DDE) suitable for designing a real-time secure symmetric video encryption scheme. These encryption schemes employ a substitution box (S-box) to diffuse the relationship between pixels of plain video and cipher video along with the diffusion of current input pixel with the previous cipher pixel, called cipher block chaining (CBC). The proposed method enhances the robustness against statistical, differential and chosen/known plain text attacks. Detailed analysis is carried out in this paper to demonstrate the security and uniqueness of the proposed scheme.

  20. Optimization of radial-type superconducting magnetic bearing using the Taguchi method

    NASA Astrophysics Data System (ADS)

    Ai, Liwang; Zhang, Guomin; Li, Wanjie; Liu, Guole; Liu, Qi

    2018-07-01

    It is important and complicated to model and optimize the levitation behavior of superconducting magnetic bearing (SMB). That is due to the nonlinear constitutive relationships of superconductor and ferromagnetic materials, the relative movement between the superconducting stator and PM rotor, and the multi-parameter (e.g., air-gap, critical current density, and remanent flux density, etc.) affecting the levitation behavior. In this paper, we present a theoretical calculation and optimization method of the levitation behavior for radial-type SMB. A simplified model of levitation force calculation is established using 2D finite element method with H-formulation. In the model, the boundary condition of superconducting stator is imposed by harmonic series expressions to describe the traveling magnetic field generated by the moving PM rotor. Also, experimental measurements of the levitation force are performed and validate the model method. A statistical method called Taguchi method is adopted to carry out an optimization of load capacity for SMB. Then the factor effects of six optimization parameters on the target characteristics are discussed and the optimum parameters combination is determined finally. The results show that the levitation behavior of SMB is greatly improved and the Taguchi method is suitable for optimizing the SMB.

Top