Sample records for statistical model identification

  1. Probability of identification: a statistical model for the validation of qualitative botanical identification methods.

    PubMed

    LaBudde, Robert A; Harnly, James M

    2012-01-01

    A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
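
    The POI statistic described above is simply the fraction of replicates that return "Identified". The sketch below computes it together with an approximate confidence interval. The Wilson score interval, the function name, and the replicate counts are assumptions made for illustration; the abstract does not say which interval the authors use.

```python
import math


def poi_with_wilson_interval(identified, replicates, z=1.96):
    """Return (POI, lower, upper) for a binary identification study.

    POI is the proportion of replicates identified. The Wilson score
    interval is an illustrative choice, not necessarily the authors'.
    """
    if replicates <= 0 or not 0 <= identified <= replicates:
        raise ValueError("need replicates > 0 and 0 <= identified <= replicates")
    p = identified / replicates
    z2 = z * z
    denom = 1 + z2 / replicates
    centre = (p + z2 / (2 * replicates)) / denom
    half = z * math.sqrt(p * (1 - p) / replicates + z2 / (4 * replicates ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical study: 11 of 12 replicates returned "1 = Identified".
poi, lower, upper = poi_with_wilson_interval(identified=11, replicates=12)
print(f"POI = {poi:.3f}, approx. 95% interval = [{lower:.3f}, {upper:.3f}]")
```

    Repeating the calculation at several concentrations of target material would give the points of a POI response curve.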

  2. Model Identification in Time-Series Analysis: Some Empirical Results.

    ERIC Educational Resources Information Center

    Padia, William L.

    Model identification of time-series data is essential to valid statistical tests of intervention effects. Model identification is, at best, inexact in the social and behavioral sciences where one is often confronted with small numbers of observations. These problems are discussed, and the results of independent identifications of 130 social and…

  3. Mixed models, linear dependency, and identification in age-period-cohort models.

    PubMed

    O'Brien, Robert M

    2017-07-20

    This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd.
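
    The linear dependency behind the identification problem is that cohort = period - age. The sketch below illustrates this for linearly coded effects by checking the rank of a small fixed-effects design matrix. The ages, periods, and helper function are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical linearly coded data; columns: intercept, age, period, cohort.
ages = [20, 30, 40]
periods = [1990, 2000, 2010]
design = [[1, age, period, period - age] for age in ages for period in periods]

rank_with_cohort = matrix_rank(design)
rank_without_cohort = matrix_rank([row[:3] for row in design])
print(rank_with_cohort, rank_without_cohort)  # prints: 3 3
```

    Four columns with rank 3 mean that the coefficients cannot all be estimated without one extra constraint, which is the situation the abstract describes for the fixed-effect approach.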

  4. Theoretic aspects of the identification of the parameters in the optimal control model

    NASA Technical Reports Server (NTRS)

    Vanwijk, R. A.; Kok, J. J.

    1977-01-01

    The identification of the parameters of the optimal control model from input-output data of the human operator is considered. Accepting the basic structure of the model as a cascade of a full-order observer and a feedback law, and suppressing the inherent optimality of the human controller, the parameters to be identified are the feedback matrix, the observer gain matrix, and the intensity matrices of the observation noise and the motor noise. The identification of the parameters is a statistical problem, because the system and output are corrupted by noise, and therefore the solution must be based on the statistics (probability density function) of the input and output data of the human operator. However, based on the statistics of the input-output data of the human operator, no distinction can be made between the observation and the motor noise, which shows that the model suffers from overparameterization.

  5. The Use of Computer-Assisted Identification of ARIMA Time-Series.

    ERIC Educational Resources Information Center

    Brown, Roger L.

    This study was conducted to determine the effects of using various levels of tutorial statistical software for the tentative identification of nonseasonal ARIMA models, a statistical technique proposed by Box and Jenkins for the interpretation of time-series data. The Box-Jenkins approach is an iterative process encompassing several stages of…

  6. IDENTIFICATION OF REGIME SHIFTS IN TIME SERIES USING NEIGHBORHOOD STATISTICS

    EPA Science Inventory

    The identification of alternative dynamic regimes in ecological systems requires several lines of evidence. Previous work on time series analysis of dynamic regimes includes mainly model-fitting methods. We introduce two methods that do not use models. These approaches use state-...

  7. Visualization of the variability of 3D statistical shape models by animation.

    PubMed

    Lamecker, Hans; Seebass, Martin; Lange, Thomas; Hege, Hans-Christian; Deuflhard, Peter

    2004-01-01

    Models of the 3D shape of anatomical objects and knowledge of their statistical variability are of great benefit in many computer-assisted medical applications such as image analysis and therapy or surgery planning. Statistical shape models have been applied successfully to automate image segmentation. The generation of 3D statistical shape models requires the identification of corresponding points on two shapes. This remains a difficult problem, especially for shapes of complicated topology. In order to interpret and validate the variations encoded in a statistical shape model, visual inspection is of great importance. This work describes the generation and interpretation of statistical shape models of the liver and the pelvic bone.

  8. RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2007-01-01

    Background: The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. Results: Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. The t-tests may be used to measure the reliability of reported statistics. When combined with the reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request. PMID:17961253

  9. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertise to explain gear fault signatures, which ordinary users usually do not have. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. Previous case studies showed experimentally that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance the prediction accuracies of existing K-nearest neighbors based methods and to extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using the Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of the redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of the redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models, including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree, and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
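
    The final classification step named in this abstract is a K-nearest neighbors vote. The sketch below shows only that step, with Euclidean distance and a majority vote over a handful of made-up two-dimensional feature vectors. It does not reproduce the wavelet packet features or the dimensionality reduction of the paper, and all names and numbers are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training points")
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reduced features for two crack levels (0 and 2).
train_features = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
    (0.90, 1.00), (1.00, 0.90), (0.95, 0.95),
]
train_labels = [0, 0, 0, 2, 2, 2]

print(knn_predict(train_features, train_labels, query=(0.12, 0.18)))
print(knn_predict(train_features, train_labels, query=(0.97, 0.93)))
```

    The value of k is a tuning parameter; the abstract notes that the paper investigates its selection.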

  10. Fast Bayesian approach for modal identification using free vibration data, Part I - Most probable value

    NASA Astrophysics Data System (ADS)

    Zhang, Feng-Liang; Ni, Yan-Chun; Au, Siu-Kui; Lam, Heung-Fai

    2016-03-01

    The identification of modal properties from field testing of civil engineering structures is becoming economically viable, thanks to the advent of modern sensor and data acquisition technology. Its demand is driven by innovative structural designs and increased performance requirements of dynamic-prone structures that call for a close cross-checking or monitoring of their dynamic properties and responses. Existing instrumentation capabilities and modal identification techniques allow structures to be tested under free vibration, forced vibration (known input) or ambient vibration (unknown broadband loading). These tests can be considered complementary rather than competing as they are based on different modeling assumptions in the identification model and have different implications on costs and benefits. Uncertainty arises naturally in the dynamic testing of structures due to measurement noise, sensor alignment error, modeling error, etc. This is especially relevant in field vibration tests because the test condition in the field environment can hardly be controlled. In this work, a Bayesian statistical approach is developed for modal identification using the free vibration response of structures. A frequency domain formulation is proposed that makes statistical inference based on the Fast Fourier Transform (FFT) of the data in a selected frequency band. This significantly simplifies the identification model because only the modes dominating the frequency band need to be included. It also legitimately ignores the information in the excluded frequency bands that are either irrelevant or difficult to model, thereby significantly reducing modeling error risk. The posterior probability density function (PDF) of the modal parameters is derived rigorously from modeling assumptions and Bayesian probability logic. Computational difficulties associated with calculating the posterior statistics, including the most probable value (MPV) and the posterior covariance matrix, are addressed. Fast computational algorithms for determining the MPV are proposed so that the method can be practically implemented. In the companion paper (Part II), analytical formulae are derived for the posterior covariance matrix so that it can be evaluated without resorting to a finite difference method. The proposed method is verified using synthetic data. It is also applied to modal identification of full-scale field structures.

  11. Probability of identification (POI): a statistical model for the validation of qualitative botanical identification methods

    USDA-ARS's Scientific Manuscript database

    A qualitative botanical identification method (BIM) is an analytical procedure which returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) mate...

  12. A Statistical Decision Model for Periodical Selection for a Specialized Information Center

    ERIC Educational Resources Information Center

    Dym, Eleanor D.; Shirey, Donald L.

    1973-01-01

    An experiment is described which attempts to define a quantitative methodology for the identification and evaluation of all possibly relevant periodical titles containing toxicological-biological information. A statistical decision model was designed and employed, along with yes/no criteria questions, a training technique and a quality control…

  13. Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule

    PubMed Central

    Benitez, Kathleen; Masys, Daniel

    2010-01-01

    Objective: Healthcare organizations must de-identify patient records before sharing data. Many organizations rely on the Safe Harbor Standard of the HIPAA Privacy Rule, which enumerates 18 identifiers that must be suppressed (e.g., ages over 89). An alternative model in the Privacy Rule, known as the Statistical Standard, can facilitate the sharing of more detailed data, but is rarely applied because of a lack of published methodologies. The authors propose an intuitive approach to de-identifying patient demographics in accordance with the Statistical Standard. Design: The authors conduct an analysis of the demographics of patient cohorts in five medical centers developed for the NIH-sponsored Electronic Medical Records and Genomics network, with respect to the US census. They report the re-identification risk of patient demographics disclosed according to the Safe Harbor policy and the relative risk rate for sharing such information via alternative policies. Measurements: The re-identification risk of Safe Harbor demographics ranged from 0.01% to 0.19%. The findings show alternative de-identification models can be created with risks no greater than Safe Harbor. The authors illustrate that the disclosure of patient ages over the age of 89 is possible when other features are reduced in granularity. Limitations: The de-identification approach described in this paper was evaluated with demographic data only and should be evaluated with other potential identifiers. Conclusion: Alternative de-identification policies to the Safe Harbor model can be derived for patient demographics to enable the disclosure of values that were previously suppressed. The method is generalizable to any environment in which population statistics are available. PMID:21169618

  14. Functional recognition imaging using artificial neural networks: applications to rapid cellular identification via broadband electromechanical response

    NASA Astrophysics Data System (ADS)

    Nikiforov, M. P.; Reukov, V. V.; Thompson, G. L.; Vertegel, A. A.; Guo, S.; Kalinin, S. V.; Jesse, S.

    2009-10-01

    Functional recognition imaging in scanning probe microscopy (SPM) using artificial neural network identification is demonstrated. This approach utilizes statistical analysis of complex SPM responses at a single spatial location to identify the target behavior, which is reminiscent of associative thinking in the human brain, obviating the need for analytical models. We demonstrate, as an example of recognition imaging, rapid identification of cellular organisms using the difference in electromechanical activity over a broad frequency range. Single-pixel identification of model Micrococcus lysodeikticus and Pseudomonas fluorescens bacteria is achieved, demonstrating the viability of the method.

  15. On-Orbit System Identification

    NASA Technical Reports Server (NTRS)

    Mettler, E.; Milman, M. H.; Bayard, D.; Eldred, D. B.

    1987-01-01

    Information derived from accelerometer readings benefits important engineering and control functions. Report discusses methodology for detection, identification, and analysis of motions within space station. Techniques of vibration and rotation analyses, control theory, statistics, filter theory, and transform methods integrated to form system for generating models and model parameters that characterize total motion of complicated space station, with respect to both control-induced and random mechanical disturbances.

  16. Evaluation of trace analyte identification in complex matrices by low-resolution gas chromatography-mass spectrometry through signal simulation.

    PubMed

    Bettencourt da Silva, Ricardo J N

    2016-04-01

    The identification of trace levels of compounds in complex matrices by conventional low-resolution gas chromatography hyphenated with mass spectrometry is based on comparing the retention times and the abundance ratios of characteristic mass spectrum fragments of analyte peaks from calibrators with those of sample peaks. Statistically sound criteria for the comparison of these parameters were developed based on the normal distribution of retention times and the simulation of the possibly non-normal distribution of correlated abundance ratios. The confidence level used to set the statistical maximum and minimum limits of the parameters defines the true positive rate of identifications. The false positive rate of identification was estimated from worst-case signal noise models. The estimated true and false positive identification rates from one retention time and two correlated ratios of three fragment abundances were combined using simple Bayes' statistics to estimate the probability of the compound identification being correct, designated the examination uncertainty. Models of the variation of the examination uncertainty with analyte quantity allowed the estimation of the Limit of Examination as the lowest quantity that produced "Extremely strong" evidence of compound presence. User-friendly MS-Excel files are made available to allow easy application of the developed approach in routine and research laboratories. The developed approach was successfully applied to the identification of chlorpyrifos-methyl and malathion in QuEChERS method extracts of vegetables with high water content, for which the estimated Limit of Examination is 0.14 mg kg⁻¹ and 0.23 mg kg⁻¹, respectively. Copyright © 2015 Elsevier B.V. All rights reserved.
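
    Combining a true positive rate and a false positive rate into a probability that an identification is correct can be illustrated with Bayes' theorem. The sketch below is a generic version of that calculation. The equal prior odds and the example rates are assumptions for illustration and are not values from the paper.

```python
def probability_identification_correct(true_positive_rate, false_positive_rate, prior=0.5):
    """Posterior probability that the compound is present given a positive result.

    Generic Bayes' theorem; the prior of 0.5 is an illustrative assumption.
    """
    for value in (true_positive_rate, false_positive_rate, prior):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates and prior must lie in [0, 1]")
    numerator = true_positive_rate * prior
    denominator = numerator + false_positive_rate * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("a positive result has zero probability under these inputs")
    return numerator / denominator


# Hypothetical rates: 95% true positives, 0.1% false positives.
posterior = probability_identification_correct(0.95, 0.001)
likelihood_ratio = 0.95 / 0.001
print(f"likelihood ratio = {likelihood_ratio:.0f}, posterior = {posterior:.5f}")
```

    With equal prior odds the posterior depends only on the likelihood ratio, so a lower false positive rate at a given analyte quantity translates directly into stronger evidence of compound presence.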

  17. Time Series Model Identification by Estimating Information.

    DTIC Science & Technology

    1982-11-01

    principle, Applications of Statistics, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 27-41. Anderson, T. W. (1971). The Statistical Analysis of Time Series...E. (1969). Multiple Time Series Modeling, Multivariate Analysis II, edited by P. Krishnaiah, Academic Press: New York, 389-409. Parzen, E. (1981...Newton, H. J. (1980). Multiple Time Series Modeling, II Multivariate Analysis - V, edited by P. Krishnaiah, North Holland: Amsterdam, 181-197. Shibata, R

  18. Influence of Head Motion on the Accuracy of 3D Reconstruction with Cone-Beam CT: Landmark Identification Errors in Maxillofacial Surface Model.

    PubMed

    Lee, Kyung-Min; Song, Jin-Myoung; Cho, Jin-Hyoung; Hwang, Hyeon-Shik

    2016-01-01

    The purpose of this study was to investigate the influence of head motion on the accuracy of three-dimensional (3D) reconstruction with cone-beam computed tomography (CBCT) scan. Fifteen dry skulls were incorporated into a motion controller which simulated four types of head motion during CBCT scan: 2 horizontal rotations (to the right/to the left) and 2 vertical rotations (upward/downward). Each movement was triggered to occur at the start of the scan for 1 second by remote control. Four maxillofacial surface models with head motion and one control surface model without motion were obtained for each skull. Nine landmarks were identified on the five maxillofacial surface models for each skull, and landmark identification errors were compared between the control model and each of the models with head motion. Rendered surface models with head motion were similar to the control model in appearance; however, the landmark identification errors were larger in the models with head motion than in the control. In particular, the Porion in the horizontal rotation models presented statistically significant differences (P < .05). A statistically significant difference in the errors between the right and left side landmarks was present in the left side rotation, which was in the opposite direction to the scanner rotation (P < .05). Patient movement during CBCT scan might cause landmark identification errors on the 3D surface model in relation to the direction of the scanner rotation. Clinicians should take this into consideration to prevent patient movement during CBCT scan, particularly horizontal movement.

  19. Asymptotic inference in system identification for the atom maser.

    PubMed

    Catana, Catalin; van Horssen, Merlijn; Guta, Madalin

    2012-11-28

    System identification is closely related to control theory and plays an increasing role in quantum engineering. In the quantum set-up, system identification is usually equated to process tomography, i.e. estimating a channel by probing it repeatedly with different input states. However, for quantum dynamical systems such as quantum Markov processes, it is more natural to consider the estimation based on continuous measurements of the output, with a given input that may be stationary. We address this problem using asymptotic statistics tools, for the specific example of estimating the Rabi frequency of an atom maser. We compute the Fisher information of different measurement processes as well as the quantum Fisher information of the atom maser, and establish the local asymptotic normality of these statistical models. The statistical notions can be expressed in terms of spectral properties of certain deformed Markov generators, and the connection to large deviations is briefly discussed.

  20. Multi-innovation auto-constructed least squares identification for 4 DOF ship manoeuvring modelling with full-scale trial data.

    PubMed

    Zhang, Guoqing; Zhang, Xianku; Pang, Hongshuai

    2015-09-01

    This research is concerned with the problem of 4 degrees of freedom (DOF) ship manoeuvring identification modelling with full-scale trial data. To avoid the multi-innovation matrix inversion in the conventional multi-innovation least squares (MILS) algorithm, a new transformed multi-innovation least squares (TMILS) algorithm is first developed by virtue of the coupling identification concept, and much effort is made to guarantee uniformly ultimate convergence. Furthermore, the auto-constructed TMILS scheme is derived for ship manoeuvring motion identification by combination with a statistic index. Compared with existing results, the proposed scheme has a significant computational advantage and is able to estimate the model structure. Illustrative examples demonstrate the effectiveness of the proposed algorithm, including an identification application with full-scale trial data. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  1. Forecasting volatility with neural regression: a contribution to model adequacy.

    PubMed

    Refenes, A N; Holt, W T

    2001-01-01

    Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
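
    For orientation, the classical Durbin-Watson statistic that this paper generalizes is the sum of squared successive residual differences divided by the sum of squared residuals. The sketch below computes only that classical form; the neural-regression generalization and its distribution are not reproduced, and the residual series are made up.

```python
def durbin_watson(residuals):
    """Classical Durbin-Watson statistic for a sequence of residuals.

    Values near 2 suggest little first-order autocorrelation, values well
    below 2 suggest positive and values well above 2 negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical residual series.
alternating = [1, -1, 1, -1]        # negative autocorrelation
persistent = [1, 1, 1, -1, -1, -1]  # positive autocorrelation
print(durbin_watson(alternating), durbin_watson(persistent))
```

    A statistic far from 2 on the residuals of a fitted model is one sign of misspecification, which is the kind of residual check the abstract describes.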

  2. Automatic identification of bullet signatures based on consecutive matching striae (CMS) criteria.

    PubMed

    Chu, Wei; Thompson, Robert M; Song, John; Vorburger, Theodore V

    2013-09-10

    The consecutive matching striae (CMS) numeric criteria for firearm and toolmark identifications have been widely accepted by forensic examiners, although there have been questions concerning its observer subjectivity and limited statistical support. In this paper, based on signal processing and extraction, a model for the automatic and objective counting of CMS is proposed. The position and shape information of the striae on the bullet land is represented by a feature profile, which is used for determining the CMS number automatically. Rapid counting of CMS number provides a basis for ballistics correlations with large databases and further statistical and probability analysis. Experimental results in this report using bullets fired from ten consecutively manufactured barrels support this developed model. Published by Elsevier Ireland Ltd.

  3. Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

    PubMed Central

    Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

    2015-01-01

    Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175

  4. Simulation, identification and statistical variation in cardiovascular analysis (SISCA) - A software framework for multi-compartment lumped modeling.

    PubMed

    Huttary, Rudolf; Goubergrits, Leonid; Schütte, Christof; Bernhard, Stefan

    2017-08-01

    It has not yet been possible to obtain modeling approaches suitable for covering a wide range of real world scenarios in cardiovascular physiology because many of the system parameters are uncertain or even unknown. Natural variability and statistical variation of cardiovascular system parameters in healthy and diseased conditions are characteristic features for understanding cardiovascular diseases in more detail. This paper presents SISCA, a novel software framework for cardiovascular system modeling and its MATLAB implementation. The framework defines a multi-model statistical ensemble approach for dimension reduced, multi-compartment models and focuses on statistical variation, system identification and patient-specific simulation based on clinical data. We also discuss a data-driven modeling scenario as a use case example. The regarded dataset originated from routine clinical examinations and comprised typical pre and post surgery clinical data from a patient diagnosed with coarctation of aorta. We conducted patient and disease specific pre/post surgery modeling by adapting a validated nominal multi-compartment model with respect to structure and parametrization using metadata and MRI geometry. In both models, the simulation reproduced measured pressures and flows fairly well with respect to stenosis and stent treatment and by pre-treatment cross stenosis phase shift of the pulse wave. However, with post-treatment data showing unrealistic phase shifts and other more obvious inconsistencies within the dataset, the methods and results we present suggest that conditioning and uncertainty management of routine clinical data sets needs significantly more attention to obtain reasonable results in patient-specific cardiovascular modeling. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Identification and Forecasting in Mortality Models

    PubMed Central

    Nielsen, Jens P.

    2014-01-01

    Mortality models often have inbuilt identification issues challenging the statistician. The statistician can choose to work with well-defined freely varying parameters, derived as maximal invariants in this paper, or with ad hoc identified parameters which at first glance seem more intuitive, but which can introduce a number of unnecessary challenges. In this paper we describe the methodological advantages from using the maximal invariant parameterisation and we go through the extra methodological challenges a statistician has to deal with when insisting on working with ad hoc identifications. These challenges are broadly similar in frequentist and in Bayesian setups. We also go through a number of examples from the literature where ad hoc identifications have been preferred in the statistical analyses. PMID:24987729

  6. Estimating error rates for firearm evidence identifications in forensic science

    PubMed Central

    Song, John; Vorburger, Theodore V.; Chu, Wei; Yen, James; Soons, Johannes A.; Ott, Daniel B.; Zhang, Nien Fan

    2018-01-01

    Estimating error rates for firearm evidence identification is a fundamental challenge in forensic science. This paper describes the recently developed congruent matching cells (CMC) method for image comparisons, its application to firearm evidence identification, and its usage and initial tests for error rate estimation. The CMC method divides compared topography images into correlation cells. Four identification parameters are defined for quantifying both the topography similarity of the correlated cell pairs and the pattern congruency of the registered cell locations. A declared match requires a significant number of CMCs, i.e., cell pairs that meet all similarity and congruency requirements. Initial testing on breech face impressions of a set of 40 cartridge cases fired with consecutively manufactured pistol slides showed wide separation between the distributions of CMC numbers observed for known matching and known non-matching image pairs. Another test on 95 cartridge cases from a different set of slides manufactured by the same process also yielded widely separated distributions. The test results were used to develop two statistical models for the probability mass function of CMC correlation scores. The models were applied to develop a framework for estimating cumulative false positive and false negative error rates and individual error rates of declared matches and non-matches for this population of breech face impressions. The prospect for applying the models to large populations and realistic case work is also discussed. The CMC method can provide a statistical foundation for estimating error rates in firearm evidence identifications, thus emulating methods used for forensic identification of DNA evidence. PMID:29331680

  7. Estimating error rates for firearm evidence identifications in forensic science.

    PubMed

    Song, John; Vorburger, Theodore V; Chu, Wei; Yen, James; Soons, Johannes A; Ott, Daniel B; Zhang, Nien Fan

    2018-03-01

    Estimating error rates for firearm evidence identification is a fundamental challenge in forensic science. This paper describes the recently developed congruent matching cells (CMC) method for image comparisons, its application to firearm evidence identification, and its usage and initial tests for error rate estimation. The CMC method divides compared topography images into correlation cells. Four identification parameters are defined for quantifying both the topography similarity of the correlated cell pairs and the pattern congruency of the registered cell locations. A declared match requires a significant number of CMCs, i.e., cell pairs that meet all similarity and congruency requirements. Initial testing on breech face impressions of a set of 40 cartridge cases fired with consecutively manufactured pistol slides showed wide separation between the distributions of CMC numbers observed for known matching and known non-matching image pairs. Another test on 95 cartridge cases from a different set of slides manufactured by the same process also yielded widely separated distributions. The test results were used to develop two statistical models for the probability mass function of CMC correlation scores. The models were applied to develop a framework for estimating cumulative false positive and false negative error rates and individual error rates of declared matches and non-matches for this population of breech face impressions. The prospect for applying the models to large populations and realistic case work is also discussed. The CMC method can provide a statistical foundation for estimating error rates in firearm evidence identifications, thus emulating methods used for forensic identification of DNA evidence. Published by Elsevier B.V.

  8. Laboratory for Engineering Man/Machine Systems (LEMS): System identification, model reduction and deconvolution filtering using Fourier based modulating signals and high order statistics

    NASA Technical Reports Server (NTRS)

    Pan, Jianqiang

    1992-01-01

    Several important problems in the fields of signal processing and model identification are studied, including system structure identification, frequency response determination, high order model reduction, high resolution frequency analysis, and deconvolution filtering. Each of these topics involves a wide range of applications and has received considerable attention. Using the Fourier based sinusoidal modulating signals, it is shown that a discrete autoregressive model can be constructed for the least squares identification of continuous systems. Some identification algorithms are presented for both SISO and MIMO systems frequency response determination using only transient data. Also, several new schemes for model reduction were developed. Based upon the complex sinusoidal modulating signals, a parametric least squares algorithm for high resolution frequency estimation is proposed. Numerical examples show that the proposed algorithm gives better performance than the usual. The problem of deconvolution and parameter identification of a general noncausal nonminimum phase ARMA system driven by non-Gaussian stationary random processes was also studied. Algorithms are introduced for inverse cumulant estimation, both in the frequency domain via the FFT algorithms and in the domain via the least squares algorithm.

  9. Statistical appearance models based on probabilistic correspondences.

    PubMed

    Krüger, Julia; Ehrhardt, Jan; Handels, Heinz

    2017-04-01

    Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to remove the need for extensive preprocessing to find landmarks and correspondences, as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints, such as topological regularity, in the modeling process. In the evaluation, the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images. Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible use of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. An adaptive optimal control for smart structures based on the subspace tracking identification technique

    NASA Astrophysics Data System (ADS)

    Ripamonti, Francesco; Resta, Ferruccio; Borroni, Massimo; Cazzulani, Gabriele

    2014-04-01

    A new method for the real-time identification of mechanical system modal parameters is used to design different adaptive control logics aiming to reduce the vibrations in a carbon fiber plate smart structure. The structure is instrumented with three piezoelectric actuators, three accelerometers and three strain gauges. The real-time identification is based on a recursive subspace tracking algorithm whose outputs are processed by an ARMA model. A statistical approach is finally applied to choose the correct values of the modal parameters. These are given as input to model-based control logics such as gain scheduling and adaptive LQR control.

  11. Overhead longwave infrared hyperspectral material identification using radiometric models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zelinski, M. E.

    Material detection algorithms used in hyperspectral data processing are computationally efficient but can produce relatively high numbers of false positives. Material identification performed as a secondary processing step on detected pixels can help separate true and false positives. This paper presents a material identification processing chain for longwave infrared hyperspectral data of solid materials collected from airborne platforms. The algorithms utilize unwhitened radiance data and an iterative algorithm that determines the temperature, humidity, and ozone of the atmospheric profile. Pixel unmixing is done using constrained linear regression and Bayesian Information Criteria for model selection. The resulting product includes an optimal atmospheric profile and full radiance material model that includes material temperature, abundance values, and several fit statistics. A logistic regression method utilizing all model parameters to improve identification is also presented. This paper details the processing chain and provides justification for the algorithms used. Several examples are provided using modeled data at different noise levels.

  12. Detection of crossover time scales in multifractal detrended fluctuation analysis

    NASA Astrophysics Data System (ADS)

    Ge, Erjia; Leung, Yee

    2013-04-01

    Fractal analysis is employed in this paper as a scale-based method for the identification of the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors. A common method is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales so determined may be spurious and problematic, and may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through the regression analysis and statistical inference, we can (1) identify the crossover time scales that cannot be detected by eyeballing, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall of Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.
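
    The idea of locating a crossover by regression rather than by eye can be illustrated with a minimal sketch. It is not the scaling-identification regression model of the paper: it assumes a single crossover, fits two independent straight lines to hypothetical log-log fluctuation data (log scale versus log fluctuation function), and picks the split with the smallest total squared error. It gives no confidence intervals, and the function names and data are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, sse


def find_crossover(log_s, log_f, min_points=3):
    """Grid search over split index k: points [0, k) form the first scaling
    regime and points [k, n) the second. Returns (k, slope_1, slope_2)."""
    best = None
    for k in range(min_points, len(log_s) - min_points + 1):
        slope_1, sse_1 = fit_line(log_s[:k], log_f[:k])
        slope_2, sse_2 = fit_line(log_s[k:], log_f[k:])
        total = sse_1 + sse_2
        if best is None or total < best[0]:
            best = (total, k, slope_1, slope_2)
    return best[1], best[2], best[3]


# Hypothetical data: scaling exponent 1.0 up to log-scale 4, then 0.5.
log_s = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
log_f = [0.0, 1.0, 2.0, 3.0, 4.0, 4.75, 5.25, 5.75, 6.25]
k, h1, h2 = find_crossover(log_s, log_f)
```

    In this toy example the crossover is placed between the fifth and sixth scales, with the two fitted slopes playing the role of the scaling exponents of each regime.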

  13. Autoregressive statistical pattern recognition algorithms for damage detection in civil structures

    NASA Astrophysics Data System (ADS)

    Yao, Ruigen; Pakzad, Shamim N.

    2012-08-01

    Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi degree-of-freedom system is generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of proposed algorithms.
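
    As a rough illustration of the autoregressive control-chart idea, the sketch below fits a first-order AR model to a baseline (undamaged) record and flags samples of a later record whose one-step prediction residual exceeds a multiple of the baseline residual standard deviation. It is a simplified stand-in under stated assumptions, not one of the algorithms reviewed or proposed in the paper: the model order, the absence of an intercept, the threshold factor k = 3 and the function names are all assumptions.

```python
import math


def fit_ar1(series):
    """Least-squares AR(1) coefficient (no intercept) and residual std."""
    pairs = list(zip(series, series[1:]))
    phi = sum(x0 * x1 for x0, x1 in pairs) / sum(x0 * x0 for x0, _ in pairs)
    residuals = [x1 - phi * x0 for x0, x1 in pairs]
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return phi, sigma


def out_of_control(series, phi, sigma, k=3.0):
    """Indices of samples whose one-step prediction residual, computed with
    the baseline coefficient phi, exceeds k * sigma in absolute value."""
    flagged = []
    for i, (x0, x1) in enumerate(zip(series, series[1:])):
        if abs(x1 - phi * x0) > k * sigma:
            flagged.append(i + 1)
    return flagged
```

    A typical use would be to call fit_ar1 on acceleration data from the healthy structure, then pass new records to out_of_control; a persistent run of flagged samples suggests the baseline model no longer describes the response.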

  14. 40 CFR Appendix Xviii to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Statistical Outlier Identification... (CONTINUED) Pt. 86, App. XVIII Appendix XVIII to Part 86—Statistical Outlier Identification Procedure for..., but suffer theoretical deficiencies if statistical significance tests are required. Consequently, the...

  15. 40 CFR Appendix Xviii to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 19 2010-07-01 2010-07-01 false Statistical Outlier Identification... (CONTINUED) Pt. 86, App. XVIII Appendix XVIII to Part 86—Statistical Outlier Identification Procedure for..., but suffer theoretical deficiencies if statistical significance tests are required. Consequently, the...

  16. Correlation techniques to determine model form in robust nonlinear system realization/identification

    NASA Technical Reports Server (NTRS)

    Stry, Greselda I.; Mook, D. Joseph

    1991-01-01

    The fundamental challenge in identification of nonlinear dynamic systems is determining the appropriate form of the model. A robust technique is presented which essentially eliminates this problem for many applications. The technique is based on the Minimum Model Error (MME) optimal estimation approach. A detailed literature review is included in which fundamental differences between the current approach and previous work are described. The most significant feature is the ability to identify nonlinear dynamic systems without prior assumption regarding the form of the nonlinearities, in contrast to existing nonlinear identification approaches which usually require detailed assumptions of the nonlinearities. Model form is determined via statistical correlation of the MME optimal state estimates with the MME optimal model error estimates. The example illustrations indicate that the method is robust with respect to prior ignorance of the model, and with respect to measurement noise, measurement frequency, and measurement record length.

  17. Statistical Development and Application of Cultural Consensus Theory

    DTIC Science & Technology

    2012-03-31

    Bulletin & Review , 17, 275-286. Schmittmann, V.D., Dolan, C.V., Raijmakers, M.E.J., and Batchelder, W.H. (2010). Parameter identification in...Wu, H., Myung, J.I., and Batchelder, W.H. (2010). Minimum description length model selection of multinomial processing tree models. Psychonomic

  18. Continuous-time system identification of a smoking cessation intervention

    NASA Astrophysics Data System (ADS)

    Timms, Kevin P.; Rivera, Daniel E.; Collins, Linda M.; Piper, Megan E.

    2014-07-01

    Cigarette smoking is a major global public health issue and the leading cause of preventable death in the United States. Toward a goal of designing better smoking cessation treatments, system identification techniques are applied to intervention data to describe smoking cessation as a process of behaviour change. System identification problems that draw from two modelling paradigms in quantitative psychology (statistical mediation and self-regulation) are considered, consisting of a series of continuous-time estimation problems. A continuous-time dynamic modelling approach is employed to describe the response of craving and smoking rates during a quit attempt, as captured in data from a smoking cessation clinical trial. The use of continuous-time models provides benefits of parsimony, ease of interpretation, and the opportunity to work with uneven or missing data.

  19. Terminology, concepts, and models in genetic epidemiology.

    PubMed

    Teare, M Dawn; Koref, Mauro F Santibàñez

    2011-01-01

    Genetic epidemiology brings together approaches and techniques developed in mathematical genetics and statistics, medical genetics, quantitative genetics, and epidemiology. In the 1980s, the focus was on the mapping and identification of genes where defects had large effects at the individual level. More recently, statistical and experimental advances have made it possible to identify and characterise genes associated with small effects at the individual level. In this chapter, we provide a brief outline of the models, concepts, and terminology used in genetic epidemiology.

  20. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree.

    PubMed

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-07-12

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size.
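
    The kappa statistic mentioned above measures agreement beyond chance and can be computed directly from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the counts are hypothetical and are not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of matching row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical water / non-water counts for 100 validation pixels.
matrix = [[45, 5],
          [10, 40]]
kappa = cohen_kappa(matrix)
```

    For these counts the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is 0.70.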

  1. Data-Driven Learning of Q-Matrix

    ERIC Educational Resources Information Center

    Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2012-01-01

    The recent surge of interest in cognitive assessment has led to the development of novel statistical models for diagnostic classification. Central to many such models is the well-known "Q"-matrix, which specifies the item-attribute relationships. This article proposes a data-driven approach to identification of the "Q"-matrix and estimation of…

  2. 45 CFR 310.10 - What are the functional requirements for the Model Tribal IV-D System?

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Number; and (E) Participant Identification Number; (ii) Delinquency and enforcement activities; (iii... operations and to assess program performance through the audit of financial and statistical data maintained...

  3. Statistical dependency in visual scanning

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Stark, Lawrence

    1986-01-01

    A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.

  4. A model-based approach to wildland fire reconstruction using sediment charcoal records

    USGS Publications Warehouse

    Itter, Malcolm S.; Finley, Andrew O.; Hooten, Mevin B.; Higuera, Philip E.; Marlon, Jennifer R.; Kelly, Ryan; McLachlan, Jason S.

    2017-01-01

    Lake sediment charcoal records are used in paleoecological analyses to reconstruct fire history, including the identification of past wildland fires. One challenge of applying sediment charcoal records to infer fire history is the separation of charcoal associated with local fire occurrence and charcoal originating from regional fire activity. Despite a variety of methods to identify local fires from sediment charcoal records, an integrated statistical framework for fire reconstruction is lacking. We develop a Bayesian point process model to estimate the probability of fire associated with charcoal counts from individual-lake sediments and estimate mean fire return intervals. A multivariate extension of the model combines records from multiple lakes to reduce uncertainty in local fire identification and estimate a regional mean fire return interval. The univariate and multivariate models are applied to 13 lakes in the Yukon Flats region of Alaska. Both models resulted in similar mean fire return intervals (100–350 years) with reduced uncertainty under the multivariate model due to improved estimation of regional charcoal deposition. The point process model offers an integrated statistical framework for paleofire reconstruction and extends existing methods to infer regional fire history from multiple lake records with uncertainty following directly from posterior distributions.

  5. Application of 3D models of palatal rugae to personal identification: hints at identification from 3D-3D superimposition techniques.

    PubMed

    Gibelli, Daniele; De Angelis, Danilo; Pucciarelli, Valentina; Riboli, Francesco; Ferrario, Virgilio F; Dolci, Claudia; Sforza, Chiarella; Cattaneo, Cristina

    2017-11-20

    Palatal rugae are known in the literature as individualizing anatomical structures with a strong potential for personal identification. However, a 3D assessment of their uniqueness has not yet been performed. The present study aims at verifying the uniqueness of 3D models of the palate. Twenty-six subjects were recruited among the orthodontic patients of a private dental office; from every patient, at least two dental casts were taken in different time periods, for a total of 62 casts. Dental casts were digitized by a 3D laser scanner (iSeries, Dental Wings©, Montreal, Canada). The palatal area was identified, and a series of 250 superimpositions was then performed automatically through VAM© software in order to reach the minimum point-to-point distance between two models. In 36 matches the models belonged to the same individual, whereas in 214 mismatches they came from different subjects. The RMS (root mean square) of point-to-point distances was then calculated by 3D software. Possible statistically significant differences were assessed through Mann-Whitney test (p < 0.05). Results showed a statistically significant difference in RMS mean point-to-point distance between matches (mean 0.26 mm; SD 0.12) and mismatches (mean 1.30; SD 0.44) (p < 0.0001). All matches reached an RMS value below 0.50 mm. This study first provided an assessment of uniqueness of palatal rugae, based on their anatomical 3D conformations, with consequent applications to personal identification.
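
    The RMS point-to-point distance used as the match statistic can be sketched as follows. This assumes the two surfaces have already been superimposed and that corresponding points are supplied as pairs; the registration step performed by the study's software is not reproduced, and the coordinates below are hypothetical.

```python
import math


def rms_distance(points_a, points_b):
    """Root mean square of Euclidean distances between paired 3D points."""
    if not points_a or len(points_a) != len(points_b):
        raise ValueError("point lists must be non-empty and of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical paired points in millimetres after superimposition.
cast_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cast_2 = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
rms = rms_distance(cast_1, cast_2)
```

    Here the two point distances are 0.3 mm and 0.4 mm, giving an RMS of about 0.35 mm, which would fall under the 0.50 mm value that all matches in the study stayed below.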

  6. Reagent-free bacterial identification using multivariate analysis of transmission spectra

    NASA Astrophysics Data System (ADS)

    Smith, Jennifer M.; Huffman, Debra E.; Acosta, Dayanis; Serebrennikova, Yulia; García-Rubio, Luis; Leparc, German F.

    2012-10-01

    The identification of bacterial pathogens from culture is critical to the proper administration of antibiotics and patient treatment. Many of the tests currently used in the clinical microbiology laboratory for bacterial identification today can be highly sensitive and specific; however, they have the additional burdens of complexity, cost, and the need for specialized reagents. We present an innovative, reagent-free method for the identification of pathogens from culture. A clinical study has been initiated to evaluate the sensitivity and specificity of this approach. Multiwavelength transmission spectra were generated from a set of clinical isolates including Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, and Staphylococcus aureus. Spectra of an initial training set of these target organisms were used to create identification models representing the spectral variability of each species using multivariate statistical techniques. Next, the spectra of the blinded isolates of targeted species were identified using the model achieving >94% sensitivity and >98% specificity, with 100% accuracy for P. aeruginosa and S. aureus. The results from this on-going clinical study indicate this approach is a powerful and exciting technique for identification of pathogens. The menu of models is being expanded to include other bacterial genera and species of clinical significance.

  7. Iterative LQG Controller Design Through Closed-Loop Identification

    NASA Technical Reports Server (NTRS)

    Hsiao, Min-Hung; Huang, Jen-Kuang; Cox, David E.

    1996-01-01

    This paper presents an iterative Linear Quadratic Gaussian (LQG) controller design approach for a linear stochastic system with an uncertain open-loop model and unknown noise statistics. This approach consists of closed-loop identification and controller redesign cycles. In each cycle, the closed-loop identification method is used to identify an open-loop model and a steady-state Kalman filter gain from closed-loop input/output test data obtained by using a feedback LQG controller designed from the previous cycle. Then the identified open-loop model is used to redesign the state feedback. The state feedback and the identified Kalman filter gain are used to form an updated LQG controller for the next cycle. This iterative process continues until the updated controller converges. The proposed controller design is demonstrated by numerical simulations and experiments on a highly unstable large-gap magnetic suspension system.

  8. Development of statistical models to forecast crossing times of commercial vehicles.

    DOT National Transportation Integrated Search

    2011-07-01

    Border crossing time measurement systems for commercial vehicles are being implemented throughout the U.S.-Mexico border. These systems are based on radio frequency identification (RFID) technology. With funding from the Federal Highway Administr...

  9. About the Environmental Public Health Division (EPHD) of EPA's National Health and Environmental Effects Research Laboratory

    EPA Pesticide Factsheets

    The EPHD performs integrated epidemiological, clinical, animal and cellular biological research and statistical modeling to provide the scientific foundation in support of hazard identification, risk assessment, and standard setting.

  10. Statistical modeling of natural backgrounds in hyperspectral LWIR data

    NASA Astrophysics Data System (ADS)

    Truslow, Eric; Manolakis, Dimitris; Cooley, Thomas; Meola, Joseph

    2016-09-01

    Hyperspectral sensors operating in the long wave infrared (LWIR) have a wealth of applications including remote material identification and rare target detection. While statistical models for modeling surface reflectance in visible and near-infrared regimes have been well studied, models for the temperature and emissivity in the LWIR have not been rigorously investigated. In this paper, we investigate modeling hyperspectral LWIR data using a statistical mixture model for the emissivity and surface temperature. Statistical models for the surface parameters can be used to simulate surface radiances and at-sensor radiance which drives the variability of measured radiance and ultimately the performance of signal processing algorithms. Thus, having models that adequately capture data variation is extremely important for studying performance trades. The purpose of this paper is twofold. First, we study the validity of this model using real hyperspectral data, and compare the relative variability of hyperspectral data in the LWIR and visible and near-infrared (VNIR) regimes. Second, we illustrate how materials that are easily distinguished in the VNIR, may be difficult to separate when imaged in the LWIR.

  11. Evaluating the risk of patient re-identification from adverse drug event reports

    PubMed Central

    2013-01-01

    Background: Our objective was to develop a model for measuring re-identification risk that more closely mimics the behaviour of an adversary by accounting for repeated attempts at matching and verification of matches, and apply it to evaluate the risk of re-identification for Canada’s post-marketing adverse drug event database (ADE). Re-identification is only demonstrably plausible for deaths in ADE. A matching experiment between ADE records and virtual obituaries constructed from Statistics Canada vital statistics was simulated. A new re-identification risk is considered; it assumes that after gathering all the potential matches for a patient record (all records in the obituaries that are potential matches for an ADE record), an adversary tries to verify these potential matches. Two adversary scenarios were considered: (a) a mildly motivated adversary who will stop after one verification attempt, and (b) a highly motivated adversary who will attempt to verify all the potential matches and is only limited by practical or financial considerations. Methods: The mean percentage of records in ADE that had a high probability of being re-identified was computed. Results: Under scenario (a), the risk of re-identification from disclosing the province, age at death, gender, and exact date of the report is quite high, but the removal of province brings down the risk significantly. By only generalizing the date of reporting to month and year and including all other variables, the risk is always low. All ADE records have a high risk of re-identification under scenario (b), but the plausibility of that scenario is limited because of the financial and practical deterrent even for highly motivated adversaries. Conclusions: It is possible to disclose Canada’s adverse drug event database while ensuring that plausible re-identification risks are acceptably low. Our new re-identification risk model is suitable for such risk assessments. PMID:24094134

  12. Camera-Model Identification Using Markovian Transition Probability Matrix

    NASA Astrophysics Data System (ADS)

    Xu, Guanshuo; Gao, Shang; Shi, Yun Qing; Hu, Ruimin; Su, Wei

    Detecting the (brands and) models of digital cameras from given digital images has become a popular research topic in the field of digital forensics. As most images are JPEG compressed before they are output from cameras, we propose to use an effective image statistical model to characterize the difference JPEG 2-D arrays of Y and Cb components from the JPEG images taken by various camera models. Specifically, the transition probability matrices derived from four different directional Markov processes applied to the image difference JPEG 2-D arrays are used to identify statistical differences caused by image formation pipelines inside different camera models. All elements of the transition probability matrices, after a thresholding technique, are directly used as features for classification. Multi-class support vector machines (SVM) are used as the classification tool. The effectiveness of our proposed statistical model is demonstrated by large-scale experimental results.
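
    The feature construction described above can be sketched in Python for a single direction. This is an illustration only: the abstract gives neither the threshold nor the exact definition of the four directions, so the horizontal difference, the threshold value, and the function name are assumptions.

```python
def transition_matrix(array, threshold=4):
    """Horizontal Markov transition probabilities of a thresholded
    difference array. `array` is a list of equal-length rows of integers."""
    t = threshold

    def clip(v):
        return max(-t, min(t, v))

    # Horizontal difference array d[i][j] = x[i][j] - x[i][j+1], clipped to [-t, t].
    diff = [[clip(row[j] - row[j + 1]) for j in range(len(row) - 1)]
            for row in array]

    size = 2 * t + 1
    counts = [[0] * size for _ in range(size)]
    for row in diff:
        for current, following in zip(row, row[1:]):
            counts[current + t][following + t] += 1

    # Normalise each row so entry [u][v] estimates P(next = v | current = u).
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy 2 x 4 array; with threshold 1 the matrix is 3 x 3 (9 features).
m = transition_matrix([[1, 2, 3, 4], [1, 2, 3, 4]], threshold=1)
features = [p for row in m for p in row]
```

    Flattening the matrix gives (2T+1)^2 features per direction, which could then be passed to a classifier such as an SVM.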

  13. Experimental design and data analysis of Ago-RIP-Seq experiments for the identification of microRNA targets.

    PubMed

    Tichy, Diana; Pickl, Julia Maria Anna; Benner, Axel; Sültmann, Holger

    2017-03-31

    The identification of microRNA (miRNA) target genes is crucial for understanding miRNA function. Many methods for the genome-wide miRNA target identification have been developed in recent years; however, they have several limitations including the dependence on low-confident prediction programs and artificial miRNA manipulations. Ago-RNA immunoprecipitation combined with high-throughput sequencing (Ago-RIP-Seq) is a promising alternative. However, appropriate statistical data analysis algorithms taking into account the experimental design and the inherent noise of such experiments are largely lacking. Here, we investigate the experimental design for Ago-RIP-Seq and examine biostatistical methods to identify de novo miRNA target genes. Statistical approaches considered are either based on a negative binomial model fit to the read count data or applied to transformed data using a normal distribution-based generalized linear model. We compare them by a real data simulation study using plasmode data sets and evaluate the suitability of the approaches to detect true miRNA targets by sensitivity and false discovery rates. Our results suggest that simple approaches like linear regression models on (appropriately) transformed read count data are preferable. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. Evaluating Video Self-Modeling Treatment Outcomes: Differentiating between Statistically and Clinically Significant Change

    ERIC Educational Resources Information Center

    La Spata, Michelle G.; Carter, Christopher W.; Johnson, Wendi L.; McGill, Ryan J.

    2016-01-01

    The present study examined the utility of video self-modeling (VSM) for reducing externalizing behaviors (e.g., aggression, conduct problems, hyperactivity, and impulsivity) observed within the classroom environment. After identification of relevant target behaviors, VSM interventions were developed for first and second grade students (N = 4),…

  15. Confounding factors in determining causal soil moisture-precipitation feedback

    NASA Astrophysics Data System (ADS)

    Tuttle, Samuel E.; Salvucci, Guido D.

    2017-07-01

    Identification of causal links in the land-atmosphere system is important for construction and testing of land surface and general circulation models. However, the land and atmosphere are highly coupled and linked by a vast number of complex, interdependent processes. Statistical methods, such as Granger causality, can help to identify feedbacks from observational data, independent of the different parameterizations of physical processes and spatiotemporal resolution effects that influence feedbacks in models. However, statistical causal identification methods can easily be misapplied, leading to erroneous conclusions about feedback strength and sign. Here, we discuss three factors that must be accounted for in determination of causal soil moisture-precipitation feedback in observations and model output: seasonal and interannual variability, precipitation persistence, and endogeneity. The effect of neglecting these factors is demonstrated in simulated and observational data. The results show that long-timescale variability and precipitation persistence can have a substantial effect on detected soil moisture-precipitation feedback strength, while endogeneity has a smaller effect that is often masked by measurement error and thus is more likely to be an issue when analyzing model data or highly accurate observational data.

  16. MEASURE: An integrated data-analysis and model identification facility

    NASA Technical Reports Server (NTRS)

    Singh, Jaidip; Iyer, Ravi K.

    1990-01-01

    The first phase of the development of MEASURE, an integrated data analysis and model identification facility is described. The facility takes system activity data as input and produces as output representative behavioral models of the system in near real time. In addition a wide range of statistical characteristics of the measured system are also available. The usage of the system is illustrated on data collected via software instrumentation of a network of SUN workstations at the University of Illinois. Initially, statistical clustering is used to identify high density regions of resource-usage in a given environment. The identified regions form the states for building a state-transition model to evaluate system and program performance in real time. The model is then solved to obtain useful parameters such as the response-time distribution and the mean waiting time in each state. A graphical interface which displays the identified models and their characteristics (with real time updates) was also developed. The results provide an understanding of the resource-usage in the system under various workload conditions. This work is targeted for a testbed of UNIX workstations with the initial phase ported to SUN workstations on the NASA, Ames Research Center Advanced Automation Testbed.

  17. Statistical genetics concepts and approaches in schizophrenia and related neuropsychiatric research.

    PubMed

    Schork, Nicholas J; Greenwood, Tiffany A; Braff, David L

    2007-01-01

    Statistical genetics is a research field that focuses on mathematical models and statistical inference methodologies that relate genetic variations (ie, naturally occurring human DNA sequence variations or "polymorphisms") to particular traits or diseases (phenotypes) usually from data collected on large samples of families or individuals. The ultimate goal of such analysis is the identification of genes and genetic variations that influence disease susceptibility. Although of extreme interest and importance, the fact that many genes and environmental factors contribute to neuropsychiatric diseases of public health importance (eg, schizophrenia, bipolar disorder, and depression) complicates relevant studies and suggests that very sophisticated mathematical and statistical modeling may be required. In addition, large-scale contemporary human DNA sequencing and related projects, such as the Human Genome Project and the International HapMap Project, as well as the development of high-throughput DNA sequencing and genotyping technologies have provided statistical geneticists with a great deal of very relevant and appropriate information and resources. Unfortunately, the use of these resources and their interpretation are not straightforward when applied to complex, multifactorial diseases such as schizophrenia. In this brief and largely nonmathematical review of the field of statistical genetics, we describe many of the main concepts, definitions, and issues that motivate contemporary research. We also provide a discussion of the most pressing contemporary problems that demand further research if progress is to be made in the identification of genes and genetic variations that predispose to complex neuropsychiatric diseases.

  18. Identification of natural images and computer-generated graphics based on statistical and textural features.

    PubMed

    Peng, Fei; Li, Jiao-ting; Long, Min

    2015-03-01

    To discriminate the acquisition pipelines of digital images, a novel scheme for the identification of natural images and computer-generated graphics is proposed based on statistical and textural features. First, the differences between them are investigated from the view of statistics and texture, and 31 dimensions of feature are acquired for identification. Then, LIBSVM is used for the classification. Finally, the experimental results are presented. The results show that it can achieve an identification accuracy of 97.89% for computer-generated graphics, and an identification accuracy of 97.75% for natural images. The analyses also demonstrate the proposed method has excellent performance, compared with some existing methods based only on statistical features or other features. The method has a great potential to be implemented for the identification of natural images and computer-generated graphics. © 2014 American Academy of Forensic Sciences.

  19. Multifactor-Dimensionality Reduction Reveals High-Order Interactions among Estrogen-Metabolism Genes in Sporadic Breast Cancer

    PubMed Central

    Ritchie, Marylyn D.; Hahn, Lance W.; Roodi, Nady; Bailey, L. Renee; Dupont, William D.; Parl, Fritz F.; Moore, Jason H.

    2001-01-01

    One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly due to the limitations of parametric-statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes and with environmental exposures. We introduce multifactor-dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus information, to improve the identification of polymorphism combinations associated with disease risk. The MDR method is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, we demonstrate that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When it was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes. To our knowledge, this is the first report of a four-locus interaction associated with a common complex multifactorial disease. PMID:11404819
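
    The core reduction step of MDR can be sketched in Python. The abstract does not spell out the step; the sketch follows the usual description of the method, in which each multilocus genotype combination is labelled high risk when its case-to-control ratio exceeds a threshold, collapsing many cells into one two-level variable. The threshold of 1.0 and the function names are assumptions, and the cross-validation used in practice is omitted.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, labels, threshold=1.0):
    """Return the set of multilocus genotype cells labelled high risk.
    genotypes: list of tuples, one genotype per locus; labels: 1 = case, 0 = control."""
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    high = set()
    for cell in set(cases) | set(controls):
        # Equivalent to cases/controls > threshold, without dividing by zero.
        if cases[cell] > threshold * controls[cell]:
            high.add(cell)
    return high


def mdr_accuracy(genotypes, labels, high):
    """Fraction of subjects classified correctly by the high/low-risk rule."""
    correct = sum((g in high) == (y == 1) for g, y in zip(genotypes, labels))
    return correct / len(labels)


# Toy two-locus data set (made-up values).
genos = [("AA", "BB")] * 3 + [("Aa", "Bb")] * 3
status = [1, 1, 0, 0, 0, 1]
high_cells = mdr_high_risk_cells(genos, status)
accuracy = mdr_accuracy(genos, status, high_cells)
```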

  20. Baseline Estimation and Outlier Identification for Halocarbons

    NASA Astrophysics Data System (ADS)

    Wang, D.; Schuck, T.; Engel, A.; Gallman, F.

    2017-12-01

    The aim of this paper is to build a baseline model for halocarbons and to statistically identify outliers under specific conditions. Time series of regional CFC-11 and chloromethane measurements are discussed; these were taken over the last 4 years at two locations, a monitoring station northwest of Frankfurt am Main (Germany) and the Mace Head station (Ireland). In addition to analyzing the time series of CFC-11 and chloromethane, a statistical approach to outlier identification is introduced in order to obtain a better estimate of the baseline. A second-order polynomial plus harmonics is fitted to the CFC-11 and chloromethane mixing ratio data. Measurements that lie far from the fitted curve are regarded as outliers and flagged. Under a specific requirement, the routine is applied iteratively without the flagged measurements until no additional outliers are found. Both the model fitting and the proposed outlier identification method are implemented in Python. During the period, CFC-11 shows a gradual downward trend, and there is a slight upward trend in the mixing ratios of chloromethane. The concentration of chloromethane also has a strong seasonal variation, mostly due to the seasonal cycle of OH. The use of this statistical method has a considerable effect on the results. The method efficiently identifies a series of outliers according to the standard deviation requirements. After removing the outliers, the fitting curves and trend estimates are more reliable.
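
    The fit-and-flag loop described above can be sketched in plain Python. This is not the authors' code: the single annual harmonic, the 3-sigma cut-off, the time unit (years), and all function names are assumptions, and the least-squares fit is done through the normal equations to keep the example dependency-free.

```python
import math


def design_row(t):
    """Basis: second-order polynomial plus one annual harmonic (t in years)."""
    return [1.0, t, t * t, math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(times, values):
    """Ordinary least squares via the normal equations."""
    rows = [design_row(t) for t in times]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, t):
    return sum(c * x for c, x in zip(coef, design_row(t)))


def flag_outliers(times, values, n_sigma=3.0, max_iter=20):
    """Refit without flagged points until no additional outliers are found."""
    keep = [True] * len(times)
    for _ in range(max_iter):
        ts = [t for t, k in zip(times, keep) if k]
        ys = [y for y, k in zip(values, keep) if k]
        coef = fit(ts, ys)
        resid = [y - predict(coef, t) for t, y in zip(ts, ys)]
        sd = math.sqrt(sum(r * r for r in resid) / len(resid))
        new_keep = [k and abs(y - predict(coef, t)) <= n_sigma * sd
                    for t, y, k in zip(times, values, keep)]
        if new_keep == keep:
            break
        keep = new_keep
    return coef, [not k for k in keep]
```

    Calling flag_outliers(times, values) returns the baseline coefficients and a list of booleans marking the flagged measurements; a point that has been flagged once stays flagged in later iterations.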

  1. Universal Algorithm for Identification of Fractional Brownian Motion. A Case of Telomere Subdiffusion

    PubMed Central

    Burnecki, Krzysztof; Kepten, Eldad; Janczura, Joanna; Bronshtein, Irena; Garini, Yuval; Weron, Aleksander

    2012-01-01

    We present a systematic statistical analysis of the recently measured individual trajectories of fluorescently labeled telomeres in the nucleus of living human cells. The experiments were performed in the U2OS cancer cell line. We propose an algorithm for identification of the telomere motion. By expanding the previously published data set, we are able to explore the dynamics in six time orders, a task not possible earlier. As a result, we establish a rigorous mathematical characterization of the stochastic process and identify the basic mathematical mechanisms behind the telomere motion. We find that the increments of the motion are stationary, Gaussian, ergodic, and even more chaotic—mixing. Moreover, the obtained memory parameter estimates, as well as the ensemble average mean square displacement reveal subdiffusive behavior at all time spans. All these findings statistically prove a fractional Brownian motion for the telomere trajectories, which is confirmed by a generalized p-variation test. Taking into account the biophysical nature of telomeres as monomers in the chromatin chain, we suggest polymer dynamics as a sufficient framework for their motion with no influence of other models. In addition, these results shed light on other studies of telomere motion and the alternative telomere lengthening mechanism. We hope that identification of these mechanisms will allow the development of a proper physical and biological model for telomere subdynamics. This array of tests can be easily implemented to other data sets to enable quick and accurate analysis of their statistical characteristics. PMID:23199912
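
    One ingredient of such an analysis, the mean square displacement exponent, can be sketched in Python. The sketch is not the authors' algorithm: it computes the time-averaged mean square displacement of a single one-dimensional trajectory and estimates the exponent alpha in MSD(lag) ~ lag^alpha from a log-log least-squares slope; alpha < 1 indicates subdiffusion. The function names and the choice of lags are assumptions.

```python
import math


def tamsd(x, lag):
    """Time-averaged mean square displacement of trajectory x at a given lag."""
    n = len(x) - lag
    return sum((x[i + lag] - x[i]) ** 2 for i in range(n)) / n


def anomalous_exponent(x, lags):
    """Slope of log MSD versus log lag (ordinary least squares)."""
    lx = [math.log(lag) for lag in lags]
    ly = [math.log(tamsd(x, lag)) for lag in lags]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Sanity check on a ballistic trajectory x(t) = 0.5 t, for which alpha = 2.
ballistic = [0.5 * i for i in range(100)]
alpha = anomalous_exponent(ballistic, [1, 2, 4, 8])
```

    For measured trajectories the same two functions apply unchanged; the exponent alone does not establish fractional Brownian motion, which is why the abstract combines it with tests of stationarity, Gaussianity, ergodicity, and p-variation.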

  2. Area estimation using multiyear designs and partial crop identification

    NASA Technical Reports Server (NTRS)

    Sielken, R. L., Jr.

    1983-01-01

    Progress is reported for the following areas: (1) estimating the stratum's crop acreage proportion using the multiyear area estimation model; (2) assessment of multiyear sampling designs; and (3) development of statistical methodology for incorporating partially identified sample segments into crop area estimation.

  3. Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

    PubMed

    Campbell, J Q; Petrella, A J

    2016-09-06

    Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Quantum Biometrics with Retinal Photon Counting

    NASA Astrophysics Data System (ADS)

    Loulakis, M.; Blatsios, G.; Vrettou, C. S.; Kominis, I. K.

    2017-10-01

    It is known that the eye's scotopic photodetectors, rhodopsin molecules, and their associated phototransduction mechanism leading to light perception, are efficient single-photon counters. We here use the photon-counting principles of human rod vision to propose a secure quantum biometric identification based on the quantum-statistical properties of retinal photon detection. The photon path along the human eye until its detection by rod cells is modeled as a filter having a specific transmission coefficient. Precisely determining its value from the photodetection statistics registered by the conscious observer is a quantum parameter estimation problem that leads to a quantum secure identification method. The probabilities for false-positive and false-negative identification of this biometric technique can readily approach 10^-10 and 10^-4, respectively. The security of the biometric method can be further quantified by the physics of quantum measurements. An impostor must be able to perform quantum thermometry and quantum magnetometry with energy resolution better than 10^-9 ℏ in order to foil the device by noninvasively monitoring the biometric activity of a user.

  5. People counting and re-identification using fusion of video camera and laser scanner

    NASA Astrophysics Data System (ADS)

    Ling, Bo; Olivera, Santiago; Wagley, Raj

    2016-05-01

    We present a system for people counting and re-identification. It can be used by transit and homeland security agencies. Under the FTA SBIR program, we have developed a preliminary system for transit passenger counting and re-identification using a laser scanner and video camera. The laser scanner is used to identify the locations of a passenger's head and shoulders in an image, a challenging task in crowded environments. It can also estimate the passenger height without prior calibration. Various color models have been applied to form color signatures. Finally, using a statistical fusion and classification scheme, passengers are counted and re-identified.

  6. A system identification technique based on the random decrement signatures. Part 2: Experimental results

    NASA Technical Reports Server (NTRS)

    Bedewi, Nabih E.; Yang, Jackson C. S.

    1987-01-01

    Identification of the system parameters of a randomly excited structure may be treated using a variety of statistical techniques. Of all these techniques, the Random Decrement is unique in that it provides the homogeneous component of the system response. Using this quality, a system identification technique was developed based on a least-squares fit of the signatures to estimate the mass, damping, and stiffness matrices of a linear randomly excited system. The results of an experiment conducted on an offshore platform scale model to verify the validity of the technique and to demonstrate its application in damage detection are presented.

  7. Statistics of acoustic emissions and stress drops during granular shearing using a stick-slip fiber bundle model

    NASA Astrophysics Data System (ADS)

    Cohen, D.; Michlmayr, G.; Or, D.

    2012-04-01

    Shearing of dense granular materials appears in many engineering and Earth sciences applications. Under a constant strain rate, the shearing stress at steady state oscillates with slow rises followed by rapid drops that are linked to the build up and failure of force chains. Experiments indicate that these drops display exponential statistics. Measurements of acoustic emissions during shearing indicate that the energy liberated by failure of these force chains has power-law statistics. Representing force chains as fibers, we use a stick-slip fiber bundle model to obtain analytical solutions of the statistical distribution of stress drops and failure energy. In the model, fibers stretch, fail, and regain strength during deformation. Fibers have Weibull-distributed threshold strengths with either quenched or annealed disorder. The shapes of the distributions for drops and energy obtained from the model are similar to those measured during shearing experiments. This simple model may be useful to identify failure events linked to force chain failures. Future generalizations of the model that include different types of fiber failure may also allow identification of different types of granular failures that have distinct statistical acoustic emission signatures.

  8. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: I. Theory

    NASA Astrophysics Data System (ADS)

    Ren, W. X.; Lin, Y. Q.; Fang, S. E.

    2011-11-01

    One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.

  9. Animal models of addiction

    PubMed Central

    Spanagel, Rainer

    2017-01-01

    In recent years, animal models in psychiatric research have been criticized for their limited translational value to the clinical situation. Failures in clinical trials have thus often been attributed to the lack of predictive power of preclinical animal models. Here, I argue that animal models of voluntary drug intake—under nonoperant and operant conditions—and addiction models based on the Diagnostic and Statistical Manual of Mental Disorders are crucial and informative tools for the identification of pathological mechanisms, target identification, and drug development. These models provide excellent face validity, and it is assumed that the neurochemical and neuroanatomical substrates involved in drug-intake behavior are similar in laboratory rodents and humans. Consequently, animal models of drug consumption and addiction provide predictive validity. This predictive power is best illustrated in alcohol research, in which three approved medications—acamprosate, naltrexone, and nalmefene—were developed by means of animal models and then successfully translated into the clinical situation. PMID:29302222

  10. Sensitivity analysis, calibration, and testing of a distributed hydrological model using error‐based weighting and one objective function

    USGS Publications Warehouse

    Foglia, L.; Hill, Mary C.; Mehl, Steffen W.; Burlando, P.

    2009-01-01

    We evaluate the utility of three interrelated means of using data to calibrate the fully distributed rainfall‐runoff model TOPKAPI as applied to the Maggia Valley drainage area in Switzerland. The use of error‐based weighting of observation and prior information data, local sensitivity analysis, and single‐objective function nonlinear regression provides quantitative evaluation of sensitivity of the 35 model parameters to the data, identification of data types most important to the calibration, and identification of correlations among parameters that contribute to nonuniqueness. Sensitivity analysis required only 71 model runs, and regression required about 50 model runs. The approach presented appears to be ideal for evaluation of models with long run times or as a preliminary step to more computationally demanding methods. The statistics used include composite scaled sensitivities, parameter correlation coefficients, leverage, Cook's D, and DFBETAS. Tests suggest predictive ability of the calibrated model typical of hydrologic models.

  11. Forecasting runout of rock and debris avalanches

    USGS Publications Warehouse

    Iverson, Richard M.; Evans, S.G.; Mugnozza, G.S.; Strom, A.; Hermanns, R.L.

    2006-01-01

    Physically based mathematical models and statistically based empirical equations each may provide useful means of forecasting runout of rock and debris avalanches. This paper compares the foundations, strengths, and limitations of a physically based model and a statistically based forecasting method, both of which were developed to predict runout across three-dimensional topography. The chief advantage of the physically based model results from its ties to physical conservation laws and well-tested axioms of soil and rock mechanics, such as the Coulomb friction rule and effective-stress principle. The output of this model provides detailed information about the dynamics of avalanche runout, at the expense of high demands for accurate input data, numerical computation, and experimental testing. In comparison, the statistical method requires relatively modest computation and no input data except identification of prospective avalanche source areas and a range of postulated avalanche volumes. Like the physically based model, the statistical method yields maps of predicted runout, but it provides no information on runout dynamics. Although the two methods differ significantly in their structure and objectives, insights gained from one method can aid refinement of the other.

  12. The non-trusty clown attack on model-based speaker recognition systems

    NASA Astrophysics Data System (ADS)

    Farrokh Baroughi, Alireza; Craver, Scott

    2015-03-01

    Biometric detectors for speaker identification commonly employ a statistical model for a subject's voice, such as a Gaussian Mixture Model, that combines multiple means to improve detector performance. This allows a malicious insider to amend or append a component of a subject's statistical model so that a detector behaves normally except under a carefully engineered circumstance. An attacker can thus force a misclassification of his or her voice only when desired, by smuggling data into a database far in advance of an attack. Note that the attack is possible if the attacker has access to the database, even for a limited time, to modify the victim's model. We exhibit such an attack on a speaker identification system, in which an attacker can force a misclassification by speaking in an unusual voice and replacing the least weighted component of the victim's model with the most weighted component of the attacker's unusual-voice model. The attacker uses an unusual voice during the attack because his or her normal voice model may already be in the database; by attacking with an unusual voice, the attacker retains the option of being recognized as himself or herself when talking normally, or as the victim when talking in the unusual manner. By attaching an appropriately weighted vector to a victim's model, we can impersonate all users in our simulations, while avoiding unwanted false rejections.

  13. Simulation of target interpretation based on infrared image features and psychology principle

    NASA Astrophysics Data System (ADS)

    Lin, Wei; Chen, Yu-hua; Gao, Hong-sheng; Wang, Zhan-feng; Wang, Ji-jun; Su, Rong-hua; Huang, Yan-ping

    2009-07-01

    Target feature extraction and identification is an important and complicated process in target interpretation: it directly affects the interpreter's psychosensory response to the target infrared image and ultimately decides target viability. Using statistical decision theory and psychological principles, and designing four psychophysical experiments, an interpretation model of the infrared target is established. The model obtains the target detection probability by calculating the degree of similarity of four features between the target region and the background region, both of which are delineated on the infrared image. Verified against a large number of practical target interpretations, the model can simulate the target interpretation and detection process effectively and yield objective interpretation results, which can provide technical support for target extraction, identification, and decision-making.

  14. Real-Time Identification of Wheel Terrain Interaction Models for Enhanced Autonomous Vehicle Mobility

    DTIC Science & Technology

    2014-04-24

    [Figure residue: plots of position estimation error (cm); labels include "Color Statistics" and "Angelova"; panel title "Position Estimation Error: Global Pose"] ...Color_Statistics_Error) / Average_Slip_Error ... get some kind of clearance for releasing pose and odometry data) collected at the following sites – Taylor, Gascola, Somerset, Fort Bliss and

  15. Characterization of palmprints by wavelet signatures via directional context modeling.

    PubMed

    Zhang, Lei; Zhang, David

    2004-06-01

    The palmprint is one of the most reliable physiological characteristics that can be used to distinguish between individuals. Current palmprint-based systems are more user friendly, more cost effective, and require fewer data signatures than traditional fingerprint-based identification systems. The principal lines and wrinkles captured in a low-resolution palmprint image provide more than enough information to uniquely identify an individual. This paper presents a palmprint identification scheme that characterizes a palmprint using a set of statistical signatures. The palmprint is first transformed into the wavelet domain, and the directional context of each wavelet subband is defined and computed in order to collect the predominant coefficients of its principal lines and wrinkles. A set of statistical signatures, which includes gravity center, density, spatial dispersivity and energy, is then defined to characterize the palmprint with the selected directional context values. A classification and identification scheme based on these signatures is subsequently developed. This scheme exploits the features of principal lines and prominent wrinkles sufficiently and achieves satisfactory results. Compared with the line-segments-matching or interesting-points-matching based palmprint verification schemes, the proposed scheme uses a much smaller amount of data signatures. It also provides a convenient classification strategy and more accurate identification.

  16. Modeling, estimation and identification methods for static shape determination of flexible structures. [for large space structure design]

    NASA Technical Reports Server (NTRS)

    Rodriguez, G.; Scheid, R. E., Jr.

    1986-01-01

    This paper outlines methods for modeling, identification and estimation for static determination of flexible structures. The shape estimation schemes are based on structural models specified by (possibly interconnected) elliptic partial differential equations. The identification techniques provide approximate knowledge of parameters in elliptic systems. The techniques are based on the method of maximum-likelihood that finds parameter values such that the likelihood functional associated with the system model is maximized. The estimation methods are obtained by means of a function-space approach that seeks to obtain the conditional mean of the state given the data and a white noise characterization of model errors. The solutions are obtained in a batch-processing mode in which all the data is processed simultaneously. After methods for computing the optimal estimates are developed, an analysis of the second-order statistics of the estimates and of the related estimation error is conducted. In addition to outlining the above theoretical results, the paper presents typical flexible structure simulations illustrating performance of the shape determination methods.

  17. Detection of Erroneous Payments Utilizing Supervised And Unsupervised Data Mining Techniques

    DTIC Science & Technology

    2004-09-01

    will look at which statistical analysis technique will work best in developing and enhancing existing erroneous payment models. Chapter I and II... payment models that are used for selection of records to be audited. The models are set up such that if two or more records have the same payment...Identification Number, Invoice Number and Delivery Order Number are not compared. The DM0102 Duplicate Payment Model will be analyzed in this thesis

  18. Construction and identification of a D-Vine model applied to the probability distribution of modal parameters in structural dynamics

    NASA Astrophysics Data System (ADS)

    Dubreuil, S.; Salaün, M.; Rodriguez, E.; Petitjean, F.

    2018-01-01

    This study investigates the construction and identification of the probability distribution of random modal parameters (natural frequencies and effective parameters) in structural dynamics. As these parameters present various types of dependence structures, the retained approach is based on pair copula construction (PCC). A literature review leads us to choose a D-Vine model for the construction of modal parameters probability distributions. Identification of this model is based on likelihood maximization which makes it sensitive to the dimension of the distribution, namely the number of considered modes in our context. To this respect, a mode selection preprocessing step is proposed. It allows the selection of the relevant random modes for a given transfer function. The second point, addressed in this study, concerns the choice of the D-Vine model. Indeed, D-Vine model is not uniquely defined. Two strategies are proposed and compared. The first one is based on the context of the study whereas the second one is purely based on statistical considerations. Finally, the proposed approaches are numerically studied and compared with respect to their capabilities, first in the identification of the probability distribution of random modal parameters and second in the estimation of the 99 % quantiles of some transfer functions.

  19. Identified state-space prediction model for aero-optical wavefronts

    NASA Astrophysics Data System (ADS)

    Faghihi, Azin; Tesch, Jonathan; Gibson, Steve

    2013-07-01

    A state-space disturbance model and associated prediction filter for aero-optical wavefronts are described. The model is computed by system identification from a sequence of wavefronts measured in an airborne laboratory. Estimates of the statistics and flow velocity of the wavefront data are shown and can be computed from the matrices in the state-space model without returning to the original data. Numerical results compare velocity values and power spectra computed from the identified state-space model with those computed from the aero-optical data.

  20. Identification of Mobile Phones Using the Built-In Magnetometers Stimulated by Motion Patterns.

    PubMed

    Baldini, Gianmarco; Dimc, Franc; Kamnik, Roman; Steri, Gary; Giuliani, Raimondo; Gentile, Claudio

    2017-04-06

    We investigate the identification of mobile phones through their built-in magnetometers. These electronic components have started to be widely deployed in mass market phones in recent years, and they can be exploited to uniquely identify mobile phones due to their physical differences, which appear in the digital output generated by them. This is similar to approaches reported in the literature for other components of the mobile phone, including the digital camera, the microphones or their RF transmission components. In this paper, the identification is performed through an inexpensive device made up of a platform that rotates the mobile phone under test and a fixed magnet positioned on the edge of the rotating platform. When the mobile phone passes in front of the fixed magnet, the built-in magnetometer is stimulated, and its digital output is recorded and analyzed. For each mobile phone, the experiment is repeated over six different days to ensure consistency in the results. A total of 10 phones of different brands and models or of the same model were used in our experiment. The digital output from the magnetometers is synchronized and correlated, and statistical features are extracted to generate a fingerprint of the built-in magnetometer and, consequently, of the mobile phone. An SVM machine learning algorithm is used to classify the mobile phones on the basis of the extracted statistical features. Our results show that inter-model classification (i.e., different models and brands classification) is possible with great accuracy, but intra-model (i.e., phones with different serial numbers and same model) classification is more challenging, the resulting accuracy being just slightly above random choice.

  1. Identification of Mobile Phones Using the Built-In Magnetometers Stimulated by Motion Patterns

    PubMed Central

    Baldini, Gianmarco; Dimc, Franc; Kamnik, Roman; Steri, Gary; Giuliani, Raimondo; Gentile, Claudio

    2017-01-01

    We investigate the identification of mobile phones through their built-in magnetometers. These electronic components have started to be widely deployed in mass market phones in recent years, and they can be exploited to uniquely identify mobile phones due to their physical differences, which appear in the digital output generated by them. This is similar to approaches reported in the literature for other components of the mobile phone, including the digital camera, the microphones or their RF transmission components. In this paper, the identification is performed through an inexpensive device made up of a platform that rotates the mobile phone under test and a fixed magnet positioned on the edge of the rotating platform. When the mobile phone passes in front of the fixed magnet, the built-in magnetometer is stimulated, and its digital output is recorded and analyzed. For each mobile phone, the experiment is repeated over six different days to ensure consistency in the results. A total of 10 phones of different brands and models or of the same model were used in our experiment. The digital output from the magnetometers is synchronized and correlated, and statistical features are extracted to generate a fingerprint of the built-in magnetometer and, consequently, of the mobile phone. An SVM machine learning algorithm is used to classify the mobile phones on the basis of the extracted statistical features. Our results show that inter-model classification (i.e., different models and brands classification) is possible with great accuracy, but intra-model (i.e., phones with different serial numbers and same model) classification is more challenging, the resulting accuracy being just slightly above random choice. PMID:28383482

  2. Breast cancer lymphoscintigraphy: Factors associated with sentinel lymph node non visualization.

    PubMed

    Vaz, S C; Silva, Â; Sousa, R; Ferreira, T C; Esteves, S; Carvalho, I P; Ratão, P; Daniel, A; Salgado, L

    2015-01-01

    To evaluate factors associated with non-identification of the sentinel lymph node (SLN) in lymphoscintigraphy of breast cancer patients and to analyze the relationship with SLN metastases. A single-center, cross-sectional, retrospective study was performed. Forty patients with lymphoscintigraphy without sentinel lymph node identification (negative lymphoscintigraphy - NL) were enrolled. The control group included 184 patients with SLN identification (positive lymphoscintigraphy - PL). Evaluated factors were age, body mass index (BMI), tumor size, histology, localization, preoperative breast lesion hookwire (harpoon) marking and SLN metastases. The statistical analysis was performed with uni- and multivariate logistic regression models and matched-pairs analysis. Age (p=0.036) and BMI (p=0.047) were the only factors significantly associated with NL. Being ≥60 years or having a BMI ≥30 increased the odds of having an NL 2 and 3.8 times, respectively. Marking with hookwire seems to increase the likelihood of NL, but statistical significance was not demonstrated (p=0.087). The other tested variables did not affect the examination result. When controlling for age, BMI and marking with the harpoon, no significant association between lymph node metastasis and NL was found (p=0.565). The most important factors related to non-identification of the SLN were age, BMI and hookwire marking. However, only the first two were statistically significant. When these variables were controlled, no association was found between NL and axillary metastases. Copyright © 2015 Elsevier España, S.L.U. and SEMNIM. All rights reserved.

  3. Applying Rasch model analysis in the development of the Cantonese tone identification test (CANTIT).

    PubMed

    Lee, Kathy Y S; Lam, Joffee H S; Chan, Kit T Y; van Hasselt, Charles Andrew; Tong, Michael C F

    2017-01-01

    Rasch analysis was applied to evaluate the internal structure of a lexical tone perception test known as the Cantonese Tone Identification Test (CANTIT). A 75-item pool (CANTIT-75) with pictures and sound tracks was developed. Respondents were required to make a four-alternative forced choice on each item. A short version of 30 items (CANTIT-30) was developed based on fit statistics, difficulty estimates, and content evaluation. Internal structure was evaluated by fit statistics and Rasch Factor Analysis (RFA). 200 children with normal hearing and 141 children with hearing impairment were recruited. For CANTIT-75, all infit and 97% of outfit values were < 2.0. RFA revealed that 40.1% of total variance was explained by the Rasch measure. The first residual component explained 2.5% of total variance, with an eigenvalue of 3.1. For CANTIT-30, all infit and outfit values were < 2.0. The Rasch measure explained 38.8% of total variance, and the first residual component explained 3.9% of total variance, with an eigenvalue of 1.9. The Rasch model provides excellent guidance for the development of short forms. Both CANTIT-75 and CANTIT-30 possess satisfactory internal structure as evidence of construct validity in measuring the lexical tone identification ability of Cantonese speakers.

  4. Frequency Response Function Based Damage Identification for Aerospace Structures

    NASA Astrophysics Data System (ADS)

    Oliver, Joseph Acton

    Structural health monitoring technologies continue to be pursued for aerospace structures in the interests of increased safety and, when combined with health prognosis, efficiency in life-cycle management. The current dissertation develops and validates damage identification technology as a critical component for structural health monitoring of aerospace structures and, in particular, composite unmanned aerial vehicles. The primary innovation is a statistical least-squares damage identification algorithm based in concepts of parameter estimation and model update. The algorithm uses frequency response function based residual force vectors derived from distributed vibration measurements to update a structural finite element model through statistically weighted least-squares minimization producing location and quantification of the damage, estimation uncertainty, and an updated model. Advantages compared to other approaches include robust applicability to systems which are heavily damped, large, and noisy, with a relatively low number of distributed measurement points compared to the number of analytical degrees-of-freedom of an associated analytical structural model (e.g., modal finite element model). Motivation, research objectives, and a dissertation summary are discussed in Chapter 1 followed by a literature review in Chapter 2. Chapter 3 gives background theory and the damage identification algorithm derivation followed by a study of fundamental algorithm behavior on a two degree-of-freedom mass-spring system with generalized damping. Chapter 4 investigates the impact of noise then successfully proves the algorithm against competing methods using an analytical eight degree-of-freedom mass-spring system with non-proportional structural damping. 
Chapter 5 extends use of the algorithm to finite element models, including solutions for numerical issues, approaches for modeling damping approximately in reduced coordinates, and analytical validation using a composite sandwich plate model. Chapter 6 presents the final extension to experimental systems, including methods for initial baseline correlation and data reduction, and validates the algorithm on an experimental composite plate with impact damage. The final chapter deviates from development and validation of the primary algorithm to discuss development of an experimental scaled-wing test bed as part of a collaborative effort for developing structural health monitoring and prognosis technology. The dissertation concludes with an overview of technical conclusions and recommendations for future work.

  5. Universal algorithm for identification of fractional Brownian motion. A case of telomere subdiffusion.

    PubMed

    Burnecki, Krzysztof; Kepten, Eldad; Janczura, Joanna; Bronshtein, Irena; Garini, Yuval; Weron, Aleksander

    2012-11-07

    We present a systematic statistical analysis of the recently measured individual trajectories of fluorescently labeled telomeres in the nucleus of living human cells. The experiments were performed in the U2OS cancer cell line. We propose an algorithm for identification of the telomere motion. By expanding the previously published data set, we are able to explore the dynamics in six time orders, a task not possible earlier. As a result, we establish a rigorous mathematical characterization of the stochastic process and identify the basic mathematical mechanisms behind the telomere motion. We find that the increments of the motion are stationary, Gaussian, ergodic, and even more chaotic--mixing. Moreover, the obtained memory parameter estimates, as well as the ensemble average mean square displacement reveal subdiffusive behavior at all time spans. All these findings statistically prove a fractional Brownian motion for the telomere trajectories, which is confirmed by a generalized p-variation test. Taking into account the biophysical nature of telomeres as monomers in the chromatin chain, we suggest polymer dynamics as a sufficient framework for their motion with no influence of other models. In addition, these results shed light on other studies of telomere motion and the alternative telomere lengthening mechanism. We hope that identification of these mechanisms will allow the development of a proper physical and biological model for telomere subdynamics. This array of tests can be easily implemented to other data sets to enable quick and accurate analysis of their statistical characteristics. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
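
    The subdiffusion claim in the abstract above rests on how the mean square displacement (MSD) grows with time lag: for fractional Brownian motion the MSD scales as lag^(2H), so an exponent below 1 indicates subdiffusion. The sketch below is only an illustration of that idea and not the authors' algorithm. It assumes a single one-dimensional trajectory and uses a time-averaged MSD rather than the ensemble average mentioned in the abstract; the function names and the test trajectory are hypothetical.

```python
import math


def time_averaged_msd(x, lag):
    """Time-averaged mean square displacement of a 1-D trajectory at one lag."""
    n = len(x) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the trajectory")
    return sum((x[i + lag] - x[i]) ** 2 for i in range(n)) / n


def anomalous_exponent(x, lags):
    """Least-squares slope of log(MSD) against log(lag).

    A slope below 1 suggests subdiffusion, 1 normal diffusion, 2 ballistic motion.
    """
    pts = [(math.log(lag), math.log(time_averaged_msd(x, lag))) for lag in lags]
    mean_x = sum(p[0] for p in pts) / len(pts)
    mean_y = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    return sxy / sxx


# Hypothetical sanity check: straight-line (ballistic) motion, not telomere data.
ballistic = [float(t) for t in range(50)]
alpha = anomalous_exponent(ballistic, [1, 2, 4, 8])
print(round(alpha, 6))
```

    For the straight-line trajectory the fitted slope should come out close to 2; measured subdiffusive trajectories would be expected to give values below 1.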

  6. Fast Bayesian approach for modal identification using forced vibration data considering the ambient effect

    NASA Astrophysics Data System (ADS)

    Ni, Yan-Chun; Zhang, Feng-Liang

    2018-05-01

    Modal identification based on vibration response measured from real structures is becoming more popular, especially after benefiting from the great improvement of the measurement technology. The results are reliable to estimate the dynamic performance, which fits the increasing requirement of different design configurations of the new structures. However, the high-quality vibration data collection technology calls for a more accurate modal identification method to improve the accuracy of the results. Through the whole measurement process of dynamic testing, there are many aspects that will cause the rise of uncertainty, such as measurement noise, alignment error and modeling error, since the test conditions are not directly controlled. Depending on these demands, a Bayesian statistical approach is developed in this work to estimate the modal parameters using the forced vibration response of structures, simultaneously considering the effect of the ambient vibration. This method makes use of the Fast Fourier Transform (FFT) of the data in a selected frequency band to identify the modal parameters of the mode dominating this frequency band and estimate the remaining uncertainty of the parameters correspondingly. In the existing modal identification methods for forced vibration, it is generally assumed that the forced vibration response dominates the measurement data and the influence of the ambient vibration response is ignored. However, ambient vibration will cause modeling error and affect the accuracy of the identified results. The influence is shown in the spectra as some phenomena that are difficult to explain and irrelevant to the mode to be identified. These issues all mean that careful choice of assumptions in the identification model and fundamental formulation to account for uncertainty are necessary. During the calculation, computational difficulties associated with calculating the posterior statistics are addressed. 
Finally, a fast computational algorithm is proposed so that the method can be practically implemented. Numerical verification with synthetic data and applicable investigation with full-scale field structures data are all carried out for the proposed method.

  7. Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs.

    PubMed

    Hanauer, David; Aberdeen, John; Bayer, Samuel; Wellner, Benjamin; Clark, Cheryl; Zheng, Kai; Hirschman, Lynette

    2013-09-01

    We describe an experiment to build a de-identification system for clinical records using the open source MITRE Identification Scrubber Toolkit (MIST). We quantify the human annotation effort needed to produce a system that de-identifies at high accuracy. Using two types of clinical records (history and physical notes, and social work notes), we iteratively built statistical de-identification models by annotating 10 notes, training a model, applying the model to another 10 notes, correcting the model's output, and training from the resulting larger set of annotated notes. This was repeated for 20 rounds of 10 notes each, and then an additional 6 rounds of 20 notes each, and a final round of 40 notes. At each stage, we measured precision, recall, and F-score, and compared these to the amount of annotation time needed to complete the round. After the initial 10-note round (33min of annotation time) we achieved an F-score of 0.89. After just over 8h of annotation time (round 21) we achieved an F-score of 0.95. Number of annotation actions needed, as well as time needed, decreased in later rounds as model performance improved. Accuracy on history and physical notes exceeded that of social work notes, suggesting that the wider variety and contexts for protected health information (PHI) in social work notes is more difficult to model. It is possible, with modest effort, to build a functioning de-identification system de novo using the MIST framework. The resulting system achieved performance comparable to other high-performing de-identification systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
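
    The precision, recall, and F-score figures quoted above can be illustrated with a short sketch. The abstract does not state how the scores were computed, so this assumes the balanced F-score (F1) and exact matching of annotated spans; the function name and the example spans are hypothetical and are not taken from MIST.

```python
def precision_recall_f1(true_spans, predicted_spans):
    """Return (precision, recall, F1) for two collections of annotated spans.

    A predicted span counts as correct only if it matches a true span exactly.
    """
    true_set, pred_set = set(true_spans), set(predicted_spans)
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (start, end, label) spans for one note.
gold = {(0, 8, "NAME"), (15, 25, "DATE"), (40, 52, "PHONE"), (60, 70, "NAME")}
pred = {(0, 8, "NAME"), (15, 25, "DATE"), (41, 52, "PHONE")}
p, r, f = precision_recall_f1(gold, pred)
print(p, r, f)
```

    Here two of three predictions are correct and two of four true spans are found, so the F-score is the harmonic mean of 2/3 and 1/2.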

  8. Unobserved time effects confound the identification of climate change impacts.

    PubMed

    Auffhammer, Maximilian; Vincent, Jeffrey R

    2012-07-24

    A recent study by Feng et al. [Feng S, Krueger A, Oppenheimer M (2010) Proc Natl Acad Sci USA 107:14257-14262] in PNAS reported statistical evidence of a weather-driven causal effect of crop yields on human migration from Mexico to the United States. We show that this conclusion is based on a different statistical model than the one stated in the paper. When we correct for this mistake, there is no evidence of a causal link.

  9. The potential of statistical shape modelling for geometric morphometric analysis of human teeth in archaeological research

    PubMed Central

    Fernee, Christianne; Browne, Martin; Zakrzewski, Sonia

    2017-01-01

    This paper introduces statistical shape modelling (SSM) for use in osteoarchaeology research. SSM is a full field, multi-material analytical technique, and is presented as a supplementary geometric morphometric (GM) tool. Lower mandibular canines from two archaeological populations and one modern population were sampled, digitised using micro-CT, aligned, registered to a baseline and statistically modelled using principal component analysis (PCA). Sample material properties were incorporated as a binary enamel/dentin parameter. Results were assessed qualitatively and quantitatively using anatomical landmarks. Finally, the technique’s application was demonstrated for inter-sample comparison through analysis of the principal component (PC) weights. It was found that SSM could provide high detail qualitative and quantitative insight with respect to archaeological inter- and intra-sample variability. This technique has value for archaeological, biomechanical and forensic applications including identification, finite element analysis (FEA) and reconstruction from partial datasets. PMID:29216199

  10. An Adaptive Technique for a Redundant-Sensor Navigation System. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Chien, T. T.

    1972-01-01

    An on-line adaptive technique is developed to provide a self-contained redundant-sensor navigation system with a capability to utilize its full potential in reliability and performance. The gyro navigation system is modeled as a Gauss-Markov process, with degradation modes defined as changes in characteristics specified by parameters associated with the model. The adaptive system is formulated as a multistage stochastic process: (1) a detection system, (2) an identification system and (3) a compensation system. It is shown that the sufficient statistic for the partially observable process in the detection and identification system is the posterior measure of the state of degradation, conditioned on the measurement history.

  11. Time series modeling of human operator dynamics in manual control tasks

    NASA Technical Reports Server (NTRS)

    Biezad, D. J.; Schmidt, D. K.

    1984-01-01

    A time-series technique is presented for identifying the dynamic characteristics of the human operator in manual control tasks from relatively short records of experimental data. Control of system excitation signals used in the identification is not required. The approach is a multi-channel identification technique for modeling multi-input/multi-output situations. The method presented includes statistical tests for validity, is designed for digital computation, and yields estimates for the frequency responses of the human operator. A comprehensive relative power analysis may also be performed for validated models. This method is applied to several sets of experimental data; the results are discussed and shown to compare favorably with previous research findings. New results are also presented for a multi-input task that has not been previously modeled to demonstrate the strengths of the method.

  12. Time Series Modeling of Human Operator Dynamics in Manual Control Tasks

    NASA Technical Reports Server (NTRS)

    Biezad, D. J.; Schmidt, D. K.

    1984-01-01

    A time-series technique is presented for identifying the dynamic characteristics of the human operator in manual control tasks from relatively short records of experimental data. Control of system excitation signals used in the identification is not required. The approach is a multi-channel identification technique for modeling multi-input/multi-output situations. The method presented includes statistical tests for validity, is designed for digital computation, and yields estimates for the frequency response of the human operator. A comprehensive relative power analysis may also be performed for validated models. This method is applied to several sets of experimental data; the results are discussed and shown to compare favorably with previous research findings. New results are also presented for a multi-input task that has not been previously modeled to demonstrate the strengths of the method.

  13. The Variability of Crater Identification Among Expert and Community Crater Analysts

    NASA Astrophysics Data System (ADS)

    Robbins, S. J.; Antonenko, I.; Kirchoff, M. R.; Chapman, C. R.; Fassett, C. I.; Herrick, R. R.; Singer, K.; Zanetti, M.; Lehan, C.; Huang, D.; Gay, P.

    2014-04-01

    Statistical studies of impact crater populations have been used to model ages of planetary surfaces for several decades [1]. This assumes that crater counts are approximately invariant and a "correct" population will be identified if the analyst is skilled and diligent. However, the reality is that crater identification is somewhat subjective, so variability between analysts, or even a single analyst's variation from day-to-day, is expected [e.g., 2, 3]. This study was undertaken to quantify that variability within an expert analyst population and between experts and minimally trained volunteers.

  14. Large scale landslide susceptibility assessment using the statistical methods of logistic regression and BSA - study case: the sub-basin of the small Niraj (Transylvania Depression, Romania)

    NASA Astrophysics Data System (ADS)

    Roşca, S.; Bilaşco, Ş.; Petrea, D.; Fodorean, I.; Vescan, I.; Filip, S.; Măguţ, F.-L.

    2015-11-01

    The existence of a large number of GIS models for the identification of landslide occurrence probability makes it difficult to select a specific one. The present study focuses on the application of two quantitative models: the logistic and the BSA models. The comparative analysis of the results aims at identifying the most suitable model. The territory corresponding to the Niraj Mic Basin (87 km²) is an area characterised by a wide variety of landforms with their morphometric, morphographical and geological characteristics, as well as by a high complexity of land use types where active landslides exist. This is the reason why it represents the test area for applying the two models and for the comparison of the results. The large complexity of input variables is illustrated by 16 factors which were represented as 72 dummy variables, analysed on the basis of their importance within the model structures. Testing the statistical significance of each variable reduced the number of dummy variables to 12 which were considered significant for the test area within the logistic model, whereas for the BSA model all the variables were employed. The predictive ability of the models was tested using the area under the ROC curve, which indicated good accuracy (AUROC = 0.86 for the testing area) and predictability of the logistic model (AUROC = 0.63 for the validation area).
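
    The area under the ROC curve used above as the accuracy measure can be sketched in a few lines. This is a generic illustration using the rank (Mann-Whitney) formulation, not the authors' GIS workflow; the function name, the susceptibility scores, and the landslide labels below are all hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as one half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores per map cell and observed landslide presence (1) or absence (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
area = auroc(scores, labels)
print(area)
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.86 and 0.63 should be read.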

  15. Identification of PARMA Models and Their Application to the Modeling of River flows

    NASA Astrophysics Data System (ADS)

    Tesfaye, Y. G.; Meerschaert, M. M.; Anderson, P. L.

    2004-05-01

    The generation of synthetic river flow samples that can reproduce the essential statistical features of historical river flows is essential to the planning, design and operation of water resource systems. Most river flow series are periodically stationary; that is, their mean and covariance functions are periodic with respect to time. We employ a periodic ARMA (PARMA) model. The innovation algorithm can be used to obtain parameter estimates for PARMA models with finite fourth moment as well as infinite fourth moment but finite variance. Anderson and Meerschaert (2003) provide a method for model identification when the time series has finite fourth moment. This article, an extension of the previous work by Anderson and Meerschaert, demonstrates the effectiveness of the technique using simulated data. An application to monthly flow data for the Fraser River in British Columbia is also included to illustrate the use of these methods.
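
    Periodic stationarity, as described above, means that the mean and covariance repeat with the season. A minimal sketch of the first descriptive step, estimating the per-season mean and variance, is given below. It is not the innovations algorithm or the identification method of the article; the function name, the period of 3, and the flow values are hypothetical.

```python
def periodic_mean_and_variance(series, period):
    """Sample mean and (population) variance for each season of a periodic series."""
    means, variances = [], []
    for season in range(period):
        values = series[season::period]  # every observation falling in this season
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        variances.append(variance)
    return means, variances


# Hypothetical flows with three "seasons" per cycle, three cycles long.
flows = [10, 50, 20, 12, 54, 22, 8, 46, 18]
means, variances = periodic_mean_and_variance(flows, 3)
print(means, variances)
```

    For monthly river flows the period would be 12, and seasonal means and variances that differ markedly across months are what motivate a PARMA model over an ordinary ARMA model.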

  16. Development of advanced techniques for rotorcraft state estimation and parameter identification

    NASA Technical Reports Server (NTRS)

    Hall, W. E., Jr.; Bohn, J. G.; Vincent, J. H.

    1980-01-01

    An integrated methodology for rotorcraft system identification consists of rotorcraft mathematical modeling, three distinct data processing steps, and a technique for designing inputs to improve the identifiability of the data. These elements are as follows: (1) a Kalman filter smoother algorithm which estimates states and sensor errors from error-corrupted data (gust time histories and statistics may also be estimated); (2) a model structure estimation algorithm for isolating a model which adequately explains the data; (3) a maximum likelihood algorithm for estimating the parameters and the variances of these estimates; and (4) an input design algorithm, based on a maximum likelihood approach, which provides inputs to improve the accuracy of parameter estimates. Each step is discussed with examples for both flight and simulated data cases.

  17. Stochastic global identification of a bio-inspired self-sensing composite UAV wing via wind tunnel experiments

    NASA Astrophysics Data System (ADS)

    Kopsaftopoulos, Fotios; Nardari, Raphael; Li, Yu-Hung; Wang, Pengchuan; Chang, Fu-Kuo

    2016-04-01

    In this work, the system design, integration, and wind tunnel experimental evaluation are presented for a bioinspired self-sensing intelligent composite unmanned aerial vehicle (UAV) wing. A total of 148 micro-sensors, including piezoelectric, strain, and temperature sensors, in the form of stretchable sensor networks are embedded in the layup of a composite wing in order to enable its self-sensing capabilities. Novel stochastic system identification techniques based on time series models and statistical parameter estimation are employed in order to accurately interpret the sensing data and extract real-time information on the coupled air flow-structural dynamics. Special emphasis is given to the wind tunnel experimental assessment under various flight conditions defined by multiple airspeeds and angles of attack. A novel modeling approach based on the recently introduced Vector-dependent Functionally Pooled (VFP) model structure is employed for the stochastic identification of the "global" coupled airflow-structural dynamics of the wing and their correlation with dynamic flutter and stall. The obtained results demonstrate the successful system-level integration and effectiveness of the stochastic identification approach, thus opening new perspectives for the state sensing and awareness capabilities of the next generation of "fly-by-feel" UAVs.

  18. Using psychological constructs from the MUSIC Model of Motivation to predict students' science identification and career goals: results from the U.S. and Iceland

    NASA Astrophysics Data System (ADS)

    Jones, Brett D.; Sahbaz, Sumeyra; Schram, Asta B.; Chittum, Jessica R.

    2017-05-01

    We investigated students' perceptions related to psychological constructs in their science classes and the influence of these perceptions on their science identification and science career goals. Participants included 575 middle school students from two countries (334 students in the U.S. and 241 students in Iceland). Students completed a self-report questionnaire that included items from several measures. We conducted correlational analyses, confirmatory factor analyses, and structural equation modelling to test our hypotheses. Students' class perceptions (i.e. empowerment, usefulness, success, interest, and caring) were significantly correlated with their science identification, which was correlated positively with their science career goals. Combining students' science class perceptions, science identification, and career goals into one model, we documented that the U.S. and Icelandic samples fit the data reasonably well. However, not all of the hypothesised paths were statistically significant. For example, only students' perceptions of usefulness (for the U.S. and Icelandic students) and success (for the U.S. students only) significantly predicted students' career goals in the full model. Theoretically, our findings are consistent with results from samples of university engineering students, yet different in some ways. Our results provide evidence for the theoretical relationships between students' perceptions of science classes and their career goals.

  19. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
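
    The first two procedures in the abstract above (linear relationships via correlation coefficients, monotonic relationships via rank correlation coefficients) can be illustrated with a minimal sketch. The function names and the toy data are assumptions made for illustration and are not taken from the report; the sketch only shows why a rank-based coefficient can flag a monotonic but nonlinear pattern that a linear coefficient understates.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scatterplot data: y increases monotonically but not linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
print("Pearson :", pearson(x, y))
print("Spearman:", spearman(x, y))
```

    On this toy input the rank coefficient reaches its maximum while the linear coefficient stays below it, which is the kind of difference the increasingly complex pattern tests are meant to pick up.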

  20. Automatic identification of bacterial types using statistical imaging methods

    NASA Astrophysics Data System (ADS)

    Trattner, Sigal; Greenspan, Hayit; Tepper, Gapi; Abboud, Shimon

    2003-05-01

    The objective of the current study is to develop an automatic tool to identify bacterial types using computer-vision and statistical modeling techniques. Bacteriophage (phage)-typing methods are used to identify and extract representative profiles of bacterial types, such as Staphylococcus aureus. Current systems rely on the subjective reading of plaque profiles by a human expert. This process is time-consuming and prone to errors, especially as technology is enabling an increase in the number of phages used for typing. The statistical methodology presented in this work provides for an automated, objective and robust analysis of visual data, along with the ability to cope with increasing data volumes.

  1. Analysis of blind identification methods for estimation of kinetic parameters in dynamic medical imaging

    NASA Astrophysics Data System (ADS)

    Riabkov, Dmitri

    Compartment modeling of dynamic medical image data implies that the concentration of the tracer over time in a particular region of the organ of interest is well-modeled as a convolution of the tissue response with the tracer concentration in the blood stream. The tissue response is different for different tissues while the blood input is assumed to be the same for different tissues. The kinetic parameters characterizing the tissue responses can be estimated by blind identification methods. These algorithms use the simultaneous measurements of concentration in separate regions of the organ; if the regions have different responses, the measurement of the blood input function may not be required. In this work it is shown that the blind identification problem has a unique solution for two-compartment model tissue response. For two-compartment model tissue responses in dynamic cardiac MRI imaging conditions with gadolinium-DTPA contrast agent, three blind identification algorithms are analyzed here to assess their utility: Eigenvector-based Algorithm for Multichannel Blind Deconvolution (EVAM), Cross Relations (CR), and Iterative Quadratic Maximum Likelihood (IQML). Comparisons of accuracy with conventional (not blind) identification techniques where the blood input is known are made as well. The statistical accuracies of estimation for the three methods are evaluated and compared for multiple parameter sets. The results show that the IQML method gives more accurate estimates than the other two blind identification methods. A proof is presented here that three-compartment model blind identification is not unique in the case of only two regions. It is shown that it is likely unique for the case of more than two regions, but this has not been proved analytically. For the three-compartment model the tissue responses in dynamic FDG PET imaging conditions are analyzed with the blind identification algorithms EVAM and Separable variables Least Squares (SLS). 
    A method of identification that assumes that FDG blood input in the brain can be modeled as a function of time and several parameters (IFM) is also analyzed. Nonuniform sampling SLS (NSLS) is developed due to the rapid change of the FDG concentration in the blood during the early postinjection stage. Comparisons of accuracy of EVAM, SLS, NSLS and IFM identification techniques are made.
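
    The convolution model described in this abstract, and the reason two regions can be used without measuring the blood input, can be written out in a short worked form. The symbols are assumptions introduced here for illustration: $b(t)$ is the shared blood input, $h_i(t)$ the tissue response of region $i$, and $y_i(t)$ the noise-free measured concentration in region $i$.

```latex
% Compartment model: each region's concentration is the shared blood
% input convolved with that region's tissue response.
y_i(t) = (h_i * b)(t), \qquad i = 1, 2

% Cross relation: convolution is commutative and associative, so the
% unknown input b(t) drops out when the two regions are combined.
(y_1 * h_2)(t) = (h_1 * b * h_2)(t) = (y_2 * h_1)(t)
```

    In the noise-free case the second line constrains $h_1$ and $h_2$ using only the measured $y_1$ and $y_2$, which is the general idea behind cross-relation approaches; the specific estimators analyzed in the work (EVAM, CR, IQML, SLS) are not reproduced here.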

  2. MALDI-TOF-MS with PLS Modeling Enables Strain Typing of the Bacterial Plant Pathogen Xanthomonas axonopodis

    NASA Astrophysics Data System (ADS)

    Sindt, Nathan M.; Robison, Faith; Brick, Mark A.; Schwartz, Howard F.; Heuberger, Adam L.; Prenni, Jessica E.

    2018-02-01

    Matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF-MS) is a fast and effective tool for microbial species identification. However, current approaches are limited to species-level identification even when genetic differences are known. Here, we present a novel workflow that applies the statistical method of partial least squares discriminant analysis (PLS-DA) to MALDI-TOF-MS protein fingerprint data of Xanthomonas axonopodis, an important bacterial plant pathogen of fruit and vegetable crops. Mass spectra of 32 X. axonopodis strains were used to create a mass spectral library and PLS-DA was employed to model the closely related strains. A robust workflow was designed to optimize the PLS-DA model by assessing the model performance over a range of signal-to-noise ratios (s/n) and mass filter (MF) thresholds. The optimized parameters were observed to be s/n = 3 and MF = 0.7. The model correctly classified 83% of spectra withheld from the model as a test set. A new decision rule was developed, termed the rolled-up Maximum Decision Rule (ruMDR), and this method improved identification rates to 92%. These results demonstrate that MALDI-TOF-MS protein fingerprints of bacterial isolates can be utilized to enable identification at the strain level. Furthermore, the open-source framework of this workflow allows for broad implementation across various instrument platforms as well as integration with alternative modeling and classification algorithms.

  3. The Effect of Photon Statistics and Pulse Shaping on the Performance of the Wiener Filter Crystal Identification Algorithm Applied to LabPET Phoswich Detectors

    NASA Astrophysics Data System (ADS)

    Yousefzadeh, Hoorvash Camilia; Lecomte, Roger; Fontaine, Réjean

    2012-06-01

    A fast Wiener filter-based crystal identification (WFCI) algorithm was recently developed to discriminate crystals with close scintillation decay times in phoswich detectors. Despite the promising performance of WFCI, the influence of various physical factors and electrical noise sources of the data acquisition chain (DAQ) on the crystal identification process was not fully investigated. This paper examines the effect of different noise sources, such as photon statistics, avalanche photodiode (APD) excess multiplication noise, and front-end electronic noise, as well as the influence of different shaping filters on the performance of the WFCI algorithm. To this end, a PET-like signal simulator based on a model of the LabPET DAQ, a small animal APD-based digital PET scanner, was developed. Simulated signals were generated under various noise conditions with CR-RC shapers of order 1, 3, and 5 having different time constants (τ). Applying the WFCI algorithm to these simulated signals showed that the non-stationary Poisson photon statistics is the main contributor to the identification error of WFCI algorithm. A shaping filter of order 1 with τ = 50 ns yielded the best WFCI performance (error 1%), while a longer shaping time of τ = 100 ns slightly degraded the WFCI performance (error 3%). Filters of higher orders with fast shaping time constants (10-33 ns) also produced good WFCI results (error 1.4% to 1.6%). This study shows the advantage of the pulse simulator in evaluating various DAQ conditions and confirms the influence of the detection chain on the WFCI performance.

  4. The Consequences of Model Misidentification in the Interrupted Time-Series Experiment.

    ERIC Educational Resources Information Center

    Padia, William L.

    Campbell (1969) argued for the interrupted time-series experiment as a useful methodology for testing intervention effects in the social sciences. The validity of the statistical hypothesis testing of time-series is, however, dependent upon the proper identification of the underlying stochastic nature of the data. Several types of model…

  5. Parental Characteristics and Resiliency in Identification Rates for Special Education

    ERIC Educational Resources Information Center

    Anderson, Jeffrey A.; Howland, Allison A.; McCoach, D. Betsy

    2015-01-01

    Even with increased risks, many children demonstrate resiliency and avoid being labeled for special education; however, research on risk and resilience has been problematic because of inadequate statistical models, limitations of available data, and the exclusion of key protective factors. This study used a national sample to examine the influence…

  6. Fingerprint identification: advances since the 2009 National Research Council report

    PubMed Central

    Champod, Christophe

    2015-01-01

    This paper will discuss the major developments in the area of fingerprint identification that followed the publication of the National Research Council (NRC, of the US National Academies of Sciences) report in 2009 entitled: Strengthening Forensic Science in the United States: A Path Forward. The report portrayed an image of a field of expertise used for decades without the necessary scientific research-based underpinning. The advances since the report and the needs in selected areas of fingerprinting will be detailed. It includes the measurement of the accuracy, reliability, repeatability and reproducibility of the conclusions offered by fingerprint experts. The paper will also pay attention to the development of statistical models allowing assessment of fingerprint comparisons. As a corollary of these developments, the next challenge is to reconcile a traditional practice dominated by deterministic conclusions with the probabilistic logic of any statistical model. There is a call for greater candour and fingerprint experts will need to communicate differently on the strengths and limitations of their findings. Their testimony will have to go beyond the blunt assertion of the uniqueness of fingerprints or the opinion delivered ipse dixit. PMID:26101284

  7. An M-estimator for reduced-rank system identification.

    PubMed

    Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S; Vogelstein, Joshua T

    2017-01-15

    High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification (MR. SID). A combination of low-rank approximations, ℓ1 and ℓ2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models.

  8. An M-estimator for reduced-rank system identification

    PubMed Central

    Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S.; Vogelstein, Joshua T.

    2018-01-01

    High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification (MR. SID). A combination of low-rank approximations, ℓ1 and ℓ2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models. PMID:29391659

  9. Time Series Expression Analyses Using RNA-seq: A Statistical Approach

    PubMed Central

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021

  10. Detection of reflecting surfaces by a statistical model

    NASA Astrophysics Data System (ADS)

    He, Qiang; Chu, Chee-Hung H.

    2009-02-01

    Remote sensing is widely used to assess the destruction from natural disasters and to plan relief and recovery operations. How to automatically extract useful features and segment interesting objects from digital images, including remote sensing imagery, becomes a critical task for image understanding. Unfortunately, current research on automated feature extraction is ignorant of contextual information. As a result, the fidelity of populating attributes corresponding to interesting features and objects cannot be satisfied. In this paper, we present an exploration on meaningful object extraction integrating reflecting surfaces. Detection of specular reflecting surfaces can be useful in target identification and then can be applied to environmental monitoring, disaster prediction and analysis, military, and counter-terrorism. Our method is based on a statistical model to capture the statistical properties of specular reflecting surfaces. The reflecting surfaces are then detected through cluster analysis.

  11. Time series expression analyses using RNA-seq: a statistical approach.

    PubMed

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.

  12. Analysis Monthly Import of Palm Oil Products Using Box-Jenkins Model

    NASA Astrophysics Data System (ADS)

    Ahmad, Nurul F. Y.; Khalid, Kamil; Saifullah Rusiman, Mohd; Ghazali Kamardan, M.; Roslan, Rozaini; Che-Him, Norziha

    2018-04-01

    The palm oil industry has been an important component of the national economy, especially the agriculture sector. The aims of this study are to identify the pattern of imports of palm oil products, to model the time series using the Box-Jenkins approach, and to forecast the monthly imports of palm oil products. The method includes a statistical test for verifying the equivalence model and statistical measures for three models, namely the Autoregressive (AR) model, the Moving Average (MA) model, and the Autoregressive Moving Average (ARMA) model. The identified model differs by product: AR(1) was found to be the best model for imports of palm oil, MA(3) for imports of palm kernel oil, and MA(4) for imports of palm kernel. The forecasts for the next four months for imports of palm oil, palm kernel oil, and palm kernel showed the most significant decrease compared to the actual data.
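
    As a rough illustration of what fitting an AR(1) model and forecasting four steps ahead involves, the sketch below estimates the intercept and coefficient of x[t] = c + phi * x[t-1] + e[t] by ordinary least squares on lagged pairs. This is an assumption-laden toy, not the study's procedure: the function names are hypothetical, the series is simulated rather than real import data, and a full Box-Jenkins analysis would also involve identification from autocorrelation plots and diagnostic checking.

```python
import random


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + e[t]."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, c, phi, steps):
    """Iterate the fitted recursion forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# Simulated monthly series (hypothetical values, for illustration only).
random.seed(0)
monthly = [100.0]
for _ in range(119):
    monthly.append(40.0 + 0.6 * monthly[-1] + random.gauss(0.0, 5.0))

c_hat, phi_hat = fit_ar1(monthly)
print("estimated c, phi:", c_hat, phi_hat)
print("next four months:", forecast(monthly, c_hat, phi_hat, 4))
```

    Least squares on lagged pairs is used here only because it needs no external library; maximum likelihood estimation, as provided by standard time-series packages, is the more usual choice and is also needed for MA and ARMA terms.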

  13. Hidden Markov models incorporating fuzzy measures and integrals for protein sequence identification and alignment.

    PubMed

    Bidargaddi, Niranjan P; Chetty, Madhu; Kamruzzaman, Joarder

    2008-06-01

    Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs is made under the statistical independence assumption of probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thus further extending the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in the protein structures make this fuzzy architecture based model a suitable candidate for building profiles of a given family, since the fuzzy set can handle uncertainties better than classical methods.

  14. Linear control of oscillator and amplifier flows*

    NASA Astrophysics Data System (ADS)

    Schmid, Peter J.; Sipp, Denis

    2016-08-01

    Linear control applied to fluid systems near an equilibrium point has important applications for many flows of industrial or fundamental interest. In this article we give an exposition of tools and approaches for the design of control strategies for globally stable or unstable flows. For unstable oscillator flows a feedback configuration and a model-based approach is proposed, while for stable noise-amplifier flows a feedforward setup and an approach based on system identification is advocated. Model reduction and robustness issues are addressed for the oscillator case; statistical learning techniques are emphasized for the amplifier case. Effective suppression of global and convective instabilities could be demonstrated for either case, even though the system-identification approach results in a superior robustness to off-design conditions.

  15. Pathways to Identity: Aiding Law Enforcement in Identification Tasks With Visual Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruce, Joseph R.; Scholtz, Jean; Hodges, Duncan

    The nature of identity has changed dramatically in recent years, and has grown in complexity. Identities are defined in multiple domains: biological and psychological elements strongly contribute, but also biographical and cyber elements are necessary to complete the picture. Law enforcement is beginning to adjust to these changes, recognizing identity’s importance in criminal justice. The SuperIdentity project seeks to aid law enforcement officials in their identification tasks through research of techniques for discovering identity traits, generation of statistical models of identity and analysis of identity traits through visualization. We present use cases compiled through user interviews in multiple fields, including law enforcement, as well as the modeling and visualization tools designed to aid in those use cases.

  16. Pathways to Identity. Using Visualization to Aid Law Enforcement in Identification Tasks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruce, Joseph R.; Scholtz, Jean; Hodges, Duncan

    The nature of identity has changed dramatically in recent years and has grown in complexity. Identities are defined in multiple domains: biological and psychological elements strongly contribute, but biographical and cyber elements also are necessary to complete the picture. Law enforcement is beginning to adjust to these changes, recognizing identity’s importance in criminal justice. The SuperIdentity project seeks to aid law enforcement officials in their identification tasks through research of techniques for discovering identity traits, generation of statistical models of identity and analysis of identity traits through visualization. We present use cases compiled through user interviews in multiple fields, including law enforcement, and describe the modeling and visualization tools designed to aid in those use cases.

  17. An in silico model for identification of small RNAs in whole bacterial genomes: characterization of antisense RNAs in pathogenic Escherichia coli and Streptococcus agalactiae strains.

    PubMed

    Pichon, Christophe; du Merle, Laurence; Caliot, Marie Elise; Trieu-Cuot, Patrick; Le Bouguénec, Chantal

    2012-04-01

    Characterization of small non-coding ribonucleic acids (sRNA) among the large volume of data generated by high-throughput RNA-seq or tiling microarray analyses remains a challenge. Thus, there is still a need for accurate in silico prediction methods to identify sRNAs within a given bacterial species. After years of effort, dedicated software was developed based on comparative genomic analyses or mathematical/statistical models. Although these genomic analyses enabled sRNAs in intergenic regions to be efficiently identified, they all failed to predict antisense sRNA genes (asRNA), i.e. RNA genes located on the DNA strand complementary to that which encodes the protein. The statistical models enabled any genomic region to be analyzed theoretically but not efficiently. We present a new model for in silico identification of sRNA and asRNA candidates within an entire bacterial genome. This model was successfully used to analyze the Gram-negative Escherichia coli and Gram-positive Streptococcus agalactiae. In both bacteria, numerous asRNAs are transcribed from the complementary strand of genes located in pathogenicity islands, strongly suggesting that these asRNAs are regulators of the virulence expression. In particular, we characterized an asRNA that acted as an enhancer-like regulator of the type 1 fimbriae production involved in the virulence of extra-intestinal pathogenic E. coli.

  18. An in silico model for identification of small RNAs in whole bacterial genomes: characterization of antisense RNAs in pathogenic Escherichia coli and Streptococcus agalactiae strains

    PubMed Central

    Pichon, Christophe; du Merle, Laurence; Caliot, Marie Elise; Trieu-Cuot, Patrick; Le Bouguénec, Chantal

    2012-01-01

    Characterization of small non-coding ribonucleic acids (sRNA) among the large volume of data generated by high-throughput RNA-seq or tiling microarray analyses remains a challenge. Thus, there is still a need for accurate in silico prediction methods to identify sRNAs within a given bacterial species. After years of effort, dedicated software was developed based on comparative genomic analyses or mathematical/statistical models. Although these genomic analyses enabled sRNAs in intergenic regions to be efficiently identified, they all failed to predict antisense sRNA genes (asRNA), i.e. RNA genes located on the DNA strand complementary to that which encodes the protein. The statistical models enabled any genomic region to be analyzed theoretically but not efficiently. We present a new model for in silico identification of sRNA and asRNA candidates within an entire bacterial genome. This model was successfully used to analyze the Gram-negative Escherichia coli and Gram-positive Streptococcus agalactiae. In both bacteria, numerous asRNAs are transcribed from the complementary strand of genes located in pathogenicity islands, strongly suggesting that these asRNAs are regulators of the virulence expression. In particular, we characterized an asRNA that acted as an enhancer-like regulator of the type 1 fimbriae production involved in the virulence of extra-intestinal pathogenic E. coli. PMID:22139924

  19. Binomial probability distribution model-based protein identification algorithm for tandem mass spectrometry utilizing peak intensity information.

    PubMed

    Xiao, Chuan-Le; Chen, Xiao-Zhou; Du, Yang-Li; Sun, Xuesong; Zhang, Gong; He, Qing-Yu

    2013-01-04

    Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .
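
    To make the idea of a binomial probability model for peak matching concrete, the sketch below computes the chance of seeing at least k matched peaks out of n theoretical peaks when each peak matches at random with probability p, and turns that tail probability into a score. This is a generic illustration under stated assumptions, not ProVerB's scoring function: the function names, the parameter values, and the use of a negative log10 score are all hypothetical, and the intensity weighting described in the abstract is not modeled.

```python
from math import comb, log10


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random matches."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n, k, p):
    """Higher score means the observed matches are less likely by chance."""
    return -log10(binomial_tail(n, k, p))


# Hypothetical spectrum: 20 theoretical fragment peaks, 5% random match rate.
for matched in (2, 6, 12):
    print(matched, "matched peaks -> score", match_score(20, matched, 0.05))
```

    The score grows as more theoretical peaks are matched, so candidate peptides can be ranked by how improbable their match count would be under random matching alone.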

  20. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: II. Experimental validation under varying temperature

    NASA Astrophysics Data System (ADS)

    Lin, Y. Q.; Ren, W. X.; Fang, S. E.

    2011-11-01

    Although most vibration-based damage detection methods can acquire satisfactory verification on analytical or numerical structures, most of them may encounter problems when applied to real-world structures under varying environments. The damage detection methods that directly extract damage features from the periodically sampled dynamic time history response measurements are desirable but relevant research and field application verification are still lacking. In this second part of a two-part paper, the robustness and performance of the statistics-based damage index using the forward innovation model by stochastic subspace identification of a vibrating structure proposed in the first part have been investigated against two prestressed reinforced concrete (RC) beams tested in the laboratory and a full-scale RC arch bridge tested in the field under varying environments. Experimental verification is focused on temperature effects. It is demonstrated that the proposed statistics-based damage index is insensitive to temperature variations but sensitive to the structural deterioration or state alteration. This makes it possible to detect the structural damage for the real-scale structures experiencing ambient excitations and varying environmental conditions.

  1. Generalized linear and generalized additive models in studies of species distributions: Setting the scene

    USGS Publications Warehouse

    Guisan, Antoine; Edwards, T.C.; Hastie, T.

    2002-01-01

    An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. © 2002 Elsevier Science B.V. All rights reserved.

  2. Data-Driven Learning of Q-Matrix

    PubMed Central

    Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2013-01-01

    The recent surge of interests in cognitive assessment has led to developments of novel statistical models for diagnostic classification. Central to many such models is the well-known Q-matrix, which specifies the item–attribute relationships. This article proposes a data-driven approach to identification of the Q-matrix and estimation of related model parameters. A key ingredient is a flexible T-matrix that relates the Q-matrix to response patterns. The flexibility of the T-matrix allows the construction of a natural criterion function as well as a computationally amenable algorithm. Simulations results are presented to demonstrate usefulness and applicability of the proposed method. Extension to handling of the Q-matrix with partial information is presented. The proposed method also provides a platform on which important statistical issues, such as hypothesis testing and model selection, may be formally addressed. PMID:23926363

  3. Time Series Model Identification by Estimating Information, Memory, and Quantiles.

    DTIC Science & Technology

    1983-07-01

    Standards, Sect. D, 68D, 937-951. Parzen, Emanuel (1969) "Multiple time series modeling" Multivariate Analysis - II, edited by P. Krishnaiah, Academic... Krishnaiah, North Holland: Amsterdam, 283-295. Parzen, Emanuel (1979) "Forecasting and Whitening Filter Estimation" TIMS Studies in the Management...principle. Applications of Statistics, P. R. Krishnaiah, ed. North Holland: Amsterdam, 27-41. Box, G. E. P. and Jenkins, G. M. (1970) Time Series Analysis

  4. Consonant and Vowel Identification in Cochlear Implant Users Measured by Nonsense Words: A Systematic Review and Meta-Analysis.

    PubMed

    Rødvik, Arne Kirkhorn; von Koss Torkildsen, Janne; Wie, Ona Bø; Storaker, Marit Aarvaag; Silvola, Juha Tapio

    2018-04-17

    The purpose of this systematic review and meta-analysis was to establish a baseline of the vowel and consonant identification scores in prelingually and postlingually deaf users of multichannel cochlear implants (CIs) tested with consonant-vowel-consonant and vowel-consonant-vowel nonsense syllables. Six electronic databases were searched for peer-reviewed articles reporting consonant and vowel identification scores in CI users measured by nonsense words. Relevant studies were independently assessed and screened by 2 reviewers. Consonant and vowel identification scores were presented in forest plots and compared between studies in a meta-analysis. Forty-seven articles with 50 studies, including 647 participants, of whom 581 were postlingually deaf and 66 prelingually deaf, met the inclusion criteria of this study. The mean performance on vowel identification tasks for the postlingually deaf CI users was 76.8% (N = 5), which was higher than the mean performance for the prelingually deaf CI users (67.7%; N = 1). The mean performance on consonant identification tasks for the postlingually deaf CI users was higher (58.4%; N = 44) than for the prelingually deaf CI users (46.7%; N = 6). The most common consonant confusions were found between those with same manner of articulation (/k/ as /t/, /m/ as /n/, and /p/ as /t/). The mean performance on consonant identification tasks for the prelingually and postlingually deaf CI users was found. There were no statistically significant differences between the scores for prelingually and postlingually deaf CI users. The consonants that were incorrectly identified were typically confused with other consonants with the same acoustic properties, namely, voicing, duration, nasality, and silent gaps. A univariate metaregression model, although not statistically significant, indicated that duration of implant use in postlingually deaf adults predicts a substantial portion of their consonant identification ability. As there is no ceiling effect, a nonsense syllable identification test may be a useful addition to the standard test battery in audiology clinics when assessing the speech perception of CI users.

  5. Mass Conservation and Inference of Metabolic Networks from High-Throughput Mass Spectrometry Data

    PubMed Central

    Bandaru, Pradeep; Bansal, Mukesh

    2011-01-01

    We present a step towards the metabolome-wide computational inference of cellular metabolic reaction networks from metabolic profiling data, such as mass spectrometry. The reconstruction is based on identification of irreducible statistical interactions among the metabolite activities using the ARACNE reverse-engineering algorithm and on constraining possible metabolic transformations to satisfy the conservation of mass. The resulting algorithms are validated on synthetic data from an abridged computational model of Escherichia coli metabolism. Precision rates upwards of 50% are routinely observed for identification of full metabolic reactions, and recalls upwards of 20% are also seen. PMID:21314454

  6. The use of machine learning for the identification of peripheral artery disease and future mortality risk.

    PubMed

    Ross, Elsie Gyang; Shah, Nigam H; Dalman, Ronald L; Nead, Kevin T; Cooke, John P; Leeper, Nicholas J

    2016-11-01

    A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses. Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise linear regression models. Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates. Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes. Copyright © 2016 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
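
    The area under the curve (AUC) values quoted above can be read as the probability that a randomly chosen patient with the condition receives a higher model score than a randomly chosen patient without it. The sketch below computes AUC directly from that pairwise definition; the function name and the example scores are hypothetical and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (the Mann-Whitney convention).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical risk scores for patients with and without the disease.
    with_disease = [0.9, 0.8, 0.4]
    without_disease = [0.7, 0.3]
    print(auc(with_disease, without_disease))  # 5 of 6 pairs ranked correctly
```

    The double loop is quadratic in the number of patients, which is fine for illustration; a rank-based formula would be preferable for large cohorts.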

  7. Observing Consistency in Online Communication Patterns for User Re-Identification.

    PubMed

    Adeyemi, Ikuesan Richard; Razak, Shukor Abd; Salleh, Mazleena; Venter, Hein S

    2016-01-01

    Comprehension of the statistical and structural mechanisms governing human dynamics in online interaction plays a pivotal role in online user identification, online profile development, and recommender systems. However, building a characteristic model of human dynamics on the Internet involves a complete analysis of the variations in human activity patterns, which is a complex process. This complexity is inherent in human dynamics and has not been extensively studied to reveal the structural composition of human behavior. A typical method of anatomizing such a complex system is viewing all independent interconnectivity that constitutes the complexity. An examination of the various dimensions of human communication pattern in online interactions is presented in this paper. The study employed reliable server-side web data from 31 known users to explore characteristics of human-driven communications. Various machine-learning techniques were explored. The results revealed that each individual exhibited a relatively consistent, unique behavioral signature and that the logistic regression model and model tree can be used to accurately distinguish online users. These results are applicable to one-to-one online user identification processes, insider misuse investigation processes, and online profiling in various areas.

  8. School district resources and identification of children with autistic disorder.

    PubMed

    Palmer, Raymond F; Blanchard, Stephen; Jean, Carlos R; Mandell, David S

    2005-01-01

    We estimated the effect of community and school district resources on the identification of children with autistic disorder. Latent growth curve regression models were applied to school district-level data from one large state. The rate of identification of autistic disorder increased on average by 1.0 child per 10000 per year (P<.001), with statistically significant district variation. After adjustment for district and community characteristics, each increase in decile of school revenue was associated with an increase of 0.16 per 10000 children identified with autistic disorder. The proportion of economically disadvantaged children per district was inversely associated with autistic disorder cases. District revenue was associated with higher proportions of children identified with autistic disorder at baseline and increasing rates of identification when measured longitudinally. Economically disadvantaged communities may need assistance to identify children with autistic spectrum disorders and other developmental delays that require attention.

  9. School District Resources and Identification of Children With Autistic Disorder

    PubMed Central

    Palmer, Raymond F.; Blanchard, Stephen; Jean, Carlos R.; Mandell, David S.

    2005-01-01

    Objectives. We estimated the effect of community and school district resources on the identification of children with autistic disorder. Methods. Latent growth curve regression models were applied to school district–level data from one large state. Results. The rate of identification of autistic disorder increased on average by 1.0 child per 10000 per year (P<.001), with statistically significant district variation. After adjustment for district and community characteristics, each increase in decile of school revenue was associated with an increase of 0.16 per 10000 children identified with autistic disorder. The proportion of economically disadvantaged children per district was inversely associated with autistic disorder cases. Conclusions. District revenue was associated with higher proportions of children identified with autistic disorder at baseline and increasing rates of identification when measured longitudinally. Economically disadvantaged communities may need assistance to identify children with autistic spectrum disorders and other developmental delays that require attention. PMID:15623872

  10. Supervised variational model with statistical inference and its application in medical image segmentation.

    PubMed

    Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David

    2015-01-01

    Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.
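
    The Dice similarity coefficient used for the comparison above measures the overlap between a predicted segmentation and a reference segmentation as twice the size of their intersection divided by the sum of their sizes. The sketch below assumes, for simplicity, that each mask is given as a collection of foreground pixel coordinates; that representation and the function name are choices made here, not taken from the paper.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two sets of foreground pixels."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Two empty masks agree perfectly by convention.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    predicted = {(0, 0), (0, 1)}
    reference = {(0, 1), (1, 1)}
    print(dice(predicted, reference))  # one shared pixel out of 2 + 2
```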

  11. Direct and Indirect Effects of Birth Order on Personality and Identity: Support for the Null Hypothesis

    ERIC Educational Resources Information Center

    Dunkel, Curtis S.; Harbke, Colin R.; Papini, Dennis R.

    2009-01-01

    The authors proposed that birth order affects psychosocial outcomes through differential investment from parent to child and differences in the degree of identification from child to parent. The authors conducted this study to test these 2 models. Despite the use of statistical and methodological procedures to increase sensitivity and reduce…

  12. The Kano Model: Identification of Handbook Attributes to Learn in Practice

    ERIC Educational Resources Information Center

    Szymczak, Michal; Kowal, Krzysztof

    2016-01-01

    Purpose: Statistics show alarming tendencies in people's unwillingness to develop themselves by reading books. The situation is even more serious if we look at companies and their employees. People want to be specialists, but in fact a reading culture in companies is rare. Many actions which are undertaken to reverse this trend may lead to sales…

  13. Mental Mechanisms for Topics Identification

    PubMed Central

    2014-01-01

    Topics identification (TI) is the process that consists in determining the main themes present in natural language documents. The current TI modeling paradigm aims at acquiring semantic information from statistic properties of large text datasets. We investigate the mental mechanisms responsible for the identification of topics in a single document given existing knowledge. Our main hypothesis is that topics are the result of accumulated neural activation of loosely organized information stored in long-term memory (LTM). We experimentally tested our hypothesis with a computational model that simulates LTM activation. The model assumes activation decay as an unavoidable phenomenon originating from the bioelectric nature of neural systems. Since decay should negatively affect the quality of topics, the model predicts the presence of short-term memory (STM) to keep the focus of attention on a few words, with the expected outcome of restoring quality to a baseline level. Our experiments measured topics quality of over 300 documents with various decay rates and STM capacity. Our results showed that accumulated activation of loosely organized information was an effective mental computational commodity to identify topics. It was furthermore confirmed that rapid decay is detrimental to topics quality but that limited capacity STM restores quality to a baseline level, even exceeding it slightly. PMID:24744775

  14. Assumption Trade-Offs When Choosing Identification Strategies for Pre-Post Treatment Effect Estimation: An Illustration of a Community-Based Intervention in Madagascar.

    PubMed

    Weber, Ann M; van der Laan, Mark J; Petersen, Maya L

    2015-03-01

    Failure (or success) in finding a statistically significant effect of a large-scale intervention may be due to choices made in the evaluation. To highlight the potential limitations and pitfalls of some common identification strategies used for estimating causal effects of community-level interventions, we apply a roadmap for causal inference to a pre-post evaluation of a national nutrition program in Madagascar. Selection into the program was non-random and strongly associated with the pre-treatment (lagged) outcome. Using structural causal models (SCM), directed acyclic graphs (DAGs) and simulated data, we illustrate that an estimand with the outcome defined as the post-treatment outcome controls for confounding by the lagged outcome but not by possible unmeasured confounders. Two separate differencing estimands (of the pre- and post-treatment outcome) have the potential to adjust for a certain type of unmeasured confounding, but introduce bias if the additional identification assumptions they rely on are not met. In order to illustrate the practical impact of choice between three common identification strategies and their corresponding estimands, we used observational data from the community nutrition program in Madagascar to estimate each of these three estimands. Specifically, we estimated the average treatment effect of the program on the community mean nutritional status of children 5 years and under and found that the estimate based on the post-treatment estimand was about a quarter of the magnitude of either of the differencing estimands (0.066 SD vs. 0.26-0.27 SD increase in mean weight-for-age z-score). Choice of estimand clearly has important implications for the interpretation of the success of the program to improve nutritional status of young children. A careful appraisal of the assumptions underlying the causal model is imperative before committing to a statistical model and progressing to estimation. However, knowledge about the data-generating process must be sufficient in order to choose the identification strategy that gets us closest to the truth.

  15. An Assessment of Land Surface and Lightning Characteristics Associated with Lightning-Initiated Wildfires

    NASA Technical Reports Server (NTRS)

    Coy, James; Schultz, Christopher J.; Case, Jonathan L.

    2017-01-01

    Can we use modeled information of the land surface and characteristics of lightning beyond flash occurrence to increase the identification and prediction of wildfires? The approach was to combine observed cloud-to-ground (CG) flashes with real-time land surface model output, and to compare the data with areas where lightning did not start a wildfire, in order to determine what land surface conditions and lightning characteristics were responsible for causing wildfires. Statistical differences between suspected fire-starters and non-fire-starters were peak-current dependent. Comparisons of 0-10 cm volumetric and relative soil moisture were statistically dependent to at least the p = 0.05 independence level for both flash polarities. Suspected fire-starters typically occurred in areas of lower soil moisture than non-fire-starters. GVF value comparisons were only found to be statistically dependent for -CG flashes. However, random sampling of the -CG non-fire-starter dataset revealed that this relationship may not always hold.

  16. [Application of a mathematical algorithm for the detection of electroneuromyographic results in the pathogenesis study of facial dyskinesia].

    PubMed

    Gribova, N P; Iudel'son, Ia B; Golubev, V L; Abramenkova, I V

    2003-01-01

    To carry out a differential diagnosis of two facial dyskinesia (FD) models, facial hemispasm (FH) and facial paraspasm (FP), a combined program of electroneuromyographic (ENMG) examination was created. It uses statistical analyses, including object identification based on a hybrid neural network with an adaptive fuzzy logic method, and standard statistical tests (Wilcoxon, Student). In FH, a lesion of the peripheral facial neuromotor apparatus predominated, with augmented function of inter-neurons at the segmental and upper segmental stem levels. In FP, primary afferent strengthening in the mimic muscles was accompanied by increased motor neuron activity and reciprocal augmentation of inter-neurons inhibiting the motor portion of the V pair. The mathematical algorithm for ENMG result recognition developed in the study provides a precise differentiation of the two FD models and opens possibilities for differential diagnosis of other facial motor disorders.

  17. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  18. Minimum length Pb/SCIN detector for efficient cosmic ray identification

    NASA Technical Reports Server (NTRS)

    Snyder, H. David

    1989-01-01

    A study was made of the performance of a minimal length cosmic ray shower detector that would be light enough for space flight and would provide efficient identification of positrons and protons. Cosmic ray positrons are mainly produced in the decay chain pion → muon → positron, and they provide a measure of the matter density traversed by primary protons. Present positron flux measurements are consistent with the Leaky Box and Halo models for sources of cosmic rays. Abundant protons in the space environment are a significant source of background that would wash out the positron signal. Protons and positrons produce very distinctive showers of particles when they enter matter; many studies have been published on their behavior in large calorimeter detectors. The challenge is to determine the minimal material necessary (minimal calorimeter depth) for positive particle identification. The primary instrument for the investigation is the Monte Carlo code GEANT, a library of programs from CERN that can be used to model experimental geometry, detector responses and particle interaction processes. The use of the Monte Carlo approach is crucial since statistical fluctuations in shower shape are significant. Studies conducted during the 1988 summer program showed that straightforward approaches to the problem achieved 85 to 90 percent correct identification, but left a residue of 10 to 15 percent misidentified particles. This percentage improved to a few percent when multiple shower-cut criteria were applied to the data. This summer, the same study was extended to employ several physical and statistical methods of identifying the response of the calorimeter, and the efficiency of the optimal shower cuts for off-normal incidence particles was determined.

  19. A data recipient centered de-identification method to retain statistical attributes.

    PubMed

    Gal, Tamas S; Tucker, Thomas C; Gangopadhyay, Aryya; Chen, Zhiyuan

    2014-08-01

    Privacy has always been a great concern of patients and medical service providers. As a result of the recent advances in information technology and the government's push for the use of Electronic Health Record (EHR) systems, a large amount of medical data is collected and stored electronically. This data needs to be made available for analysis but at the same time patient privacy has to be protected through de-identification. Although biomedical researchers often describe their research plans when they request anonymized data, most existing anonymization methods do not use this information when de-identifying the data. As a result, the anonymized data may not be useful for the planned research project. This paper proposes a data recipient centered approach to tailor the de-identification method based on input from the recipient of the data. We demonstrate our approach through an anonymization project for biomedical researchers with specific goals to improve the utility of the anonymized data for statistical models used for their research project. The selected algorithm improves a privacy protection method called Condensation by Aggarwal et al. Our methods were tested and validated on real cancer surveillance data provided by the Kentucky Cancer Registry. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. A theory of fine structure image models with an application to detection and classification of dementia.

    PubMed

    O'Neill, William; Penn, Richard; Werner, Michael; Thomas, Justin

    2015-06-01

    Estimation of stochastic process models from data is a common application of time series analysis methods. Such system identification processes are often cast as hypothesis testing exercises whose intent is to estimate model parameters and test them for statistical significance. Ordinary least squares (OLS) regression and the Levenberg-Marquardt algorithm (LMA) have proven invaluable computational tools for models being described by non-homogeneous, linear, stationary, ordinary differential equations. In this paper we extend stochastic model identification to linear, stationary, partial differential equations in two independent variables (2D) and show that OLS and LMA apply equally well to these systems. The method employs an original nonparametric statistic as a test for the significance of estimated parameters. We show gray scale and color images are special cases of 2D systems satisfying a particular autoregressive partial difference equation which estimates an analogous partial differential equation. Several applications to medical image modeling and classification illustrate the method by correctly classifying demented and normal OLS models of axial magnetic resonance brain scans according to subject Mini Mental State Exam (MMSE) scores. Comparison with 13 image classifiers from the literature indicates our classifier is at least 14 times faster than any of them and has a classification accuracy better than all but one. Our modeling method applies to any linear, stationary, partial differential equation and the method is readily extended to 3D whole-organ systems. Further, in addition to being a robust image classifier, estimated image models offer insights into which parameters carry the most diagnostic image information and thereby suggest finer divisions could be made within a class. Image models can be estimated in milliseconds which translate to whole-organ models in seconds; such runtimes could make real-time medicine and surgery modeling possible.

  1. ECG Identification System Using Neural Network with Global and Local Features

    ERIC Educational Resources Information Center

    Tseng, Kuo-Kun; Lee, Dachao; Chen, Charles

    2016-01-01

    This paper proposes a human identification system via extracted electrocardiogram (ECG) signals. Two hierarchical classification structures based on a global shape feature and a local statistical feature are used to extract ECG signals. The global shape feature represents the outline information of ECG signals, and the local statistical feature extracts the…

  2. An overview of the essential differences and similarities of system identification techniques

    NASA Technical Reports Server (NTRS)

    Mehra, Raman K.

    1991-01-01

    Information is given in the form of outlines, graphs, tables and charts. Topics include system identification, Bayesian statistical decision theory, Maximum Likelihood Estimation, identification methods, structural mode identification using a stochastic realization algorithm, and identification results regarding membrane simulations and X-29 flutter flight test data.

  3. Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model.

    PubMed

    Hu, Jianhua; Wright, Fred A

    2007-03-01

    The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.
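
    To make the small-sample problem concrete, the sketch below compares an ordinary pooled two-sample t-statistic with a version whose variance estimate is shrunk toward a prior value. This is a generic shrinkage illustration, not the authors' mean-variance model: the function name, the parameters `prior_var` and `prior_weight`, and the example values are all assumptions made here.

```python
from math import sqrt
from statistics import mean, variance


def t_stat(x, y, prior_var=None, prior_weight=0.0):
    """Pooled two-sample t-statistic, optionally with a shrunken variance.

    When prior_var is given, the pooled variance is averaged with it,
    weighting the data by its degrees of freedom and the prior by
    prior_weight (a hypothetical tuning parameter).
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    if prior_var is not None:
        pooled = (df * pooled + prior_weight * prior_var) / (df + prior_weight)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))


if __name__ == "__main__":
    # Two arrays per condition: the sample variance is tiny by chance,
    # so the ordinary t-statistic is inflated.
    a, b = [1.0, 1.1], [2.0, 2.1]
    print(t_stat(a, b))
    print(t_stat(a, b, prior_var=0.1, prior_weight=2.0))
```

    With only two arrays per group the ordinary statistic is driven by an implausibly small variance estimate; pulling the variance toward a value borrowed from other genes tempers it, which is the kind of guard the abstract describes.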

  4. Remote sensing-aided systems for snow qualification, evapotranspiration estimation, and their application in hydrologic models

    NASA Technical Reports Server (NTRS)

    Korram, S.

    1977-01-01

    The design of general remote sensing-aided methodologies was studied to provide estimates of several important inputs to water yield forecast models. These input parameters are snow area extent, snow water content, and evapotranspiration. The study area is the Feather River Watershed (780,000 hectares) in Northern California. The general approach involved a stepwise sequence of identification of the required information, sample design, measurement/estimation, and evaluation of results. All the relevant and available information types needed in the estimation process are being defined. These include Landsat, meteorological satellite, and aircraft imagery, topographic and geologic data, ground truth data, and climatic data from ground stations. A cost-effective multistage sampling approach was employed in the quantification of all the required parameters. Physical and statistical models for both snow quantification and evapotranspiration estimation were developed. These models use the information obtained from aerial and ground data through appropriate statistical sampling design.

  5. Coherent spectroscopic methods for monitoring pathogens, genetically modified products and nanostructured materials in colloidal solution

    NASA Astrophysics Data System (ADS)

    Moguilnaya, T.; Suminov, Y.; Botikov, A.; Ignatov, S.; Kononenko, A.; Agibalov, A.

    2017-01-01

    We developed a new automatic method that combines forced luminescence with stimulated Brillouin scattering. The method is used for monitoring pathogens, genetically modified products and nanostructured materials in colloidal solution. We carried out a statistical spectral analysis of pathogens, genetically modified soy and silver nanoparticles in water from different regions in order to determine the statistical errors of the method. We studied the spectral characteristics of these objects in water to perform the initial identification with 95% probability. These results were used to create a model of a device for monitoring pathogenic organisms and a working model of a device for detecting genetically modified soy in meat.

  6. Development of a Robust Identifier for NPPs Transients Combining ARIMA Model and EBP Algorithm

    NASA Astrophysics Data System (ADS)

    Moshkbar-Bakhshayesh, Khalil; Ghofrani, Mohammad B.

    2014-08-01

    This study introduces a novel identification method for recognition of nuclear power plants (NPPs) transients by combining the autoregressive integrated moving-average (ARIMA) model and the neural network with error backpropagation (EBP) learning algorithm. The proposed method consists of three steps. First, an EBP based identifier is adopted to distinguish the plant normal states from the faulty ones. In the second step, ARIMA models use the integrated (I) process to convert non-stationary data of the selected variables into stationary ones. Subsequently, ARIMA processes, including autoregressive (AR), moving-average (MA), or autoregressive moving-average (ARMA), are used to forecast time series of the selected plant variables. In the third step, to identify the type of transient, the forecasted time series are fed to the modular identifier, which has been developed using the latest advances of the EBP learning algorithm. Bushehr nuclear power plant (BNPP) transients are probed to analyze the ability of the proposed identifier. Recognition of a transient is based on the similarity of its statistical properties to those of the reference one, rather than on the values of input patterns. Greater robustness against noisy data and an improved balance between memorization and generalization are salient advantages of the proposed identifier. Reduction of false identification, sole dependency of identification on the sign of each output signal, selection of the plant variables for transients training independent of each other, and extendibility for identification of more transients without unfavorable effects are other merits of the proposed identifier.
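
    The integrated (I) step mentioned in this abstract amounts to differencing a series until its mean is stable. The Python sketch below shows only that generic step and its inverse; it is not the authors' implementation, and the function names and the toy series are assumptions made for illustration.

```python
def difference(series, d=1):
    """Apply d-th order differencing (the 'I' step of ARIMA)."""
    out = list(series)
    for _ in range(d):
        # each pass replaces the series by its consecutive increments
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(diffs, first_value):
    """Invert first-order differencing given the first original value."""
    out = [first_value]
    for step in diffs:
        out.append(out[-1] + step)
    return out


trend = [3, 5, 7, 9, 11]        # toy series with a linear trend
print(difference(trend))        # [2, 2, 2, 2]
print(difference(trend, d=2))   # [0, 0, 0]
```

    Forecasts made on the differenced series can be mapped back to the original scale with `undifference`, which is why the first observed value is kept.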

  7. Nonparametric method for failures detection and localization in the actuating subsystem of aircraft control system

    NASA Astrophysics Data System (ADS)

    Karpenko, S. S.; Zybin, E. Yu; Kosyanchuk, V. V.

    2018-02-01

    In this paper we design a nonparametric method for failure detection and localization in the aircraft control system that uses only the measurements of the control signals and the aircraft states. It doesn’t require a priori information about the aircraft model parameters, training or statistical calculations, and is based on algebraic solvability conditions for the aircraft model identification problem. This makes it possible to significantly increase the efficiency of solving the detection and localization problem by completely eliminating errors associated with aircraft model uncertainties.

  8. Soft and Robust Identification of Body Fluid Using Fourier Transform Infrared Spectroscopy and Chemometric Strategies for Forensic Analysis.

    PubMed

    Takamura, Ayari; Watanabe, Ken; Akutsu, Tomoko; Ozawa, Takeaki

    2018-05-31

    Body fluid (BF) identification is a critical part of a criminal investigation because of its ability to suggest how the crime was committed and to provide reliable origins of DNA. In contrast to current methods using serological and biochemical techniques, vibrational spectroscopic approaches provide alternative advantages for forensic BF identification, such as non-destructivity and versatility for various BF types and analytical interests. However, unexplored issues remain for its practical application to forensics; for example, a specific BF needs to be discriminated from all other suspicious materials as well as other BFs, and the method should be applicable even to aged BF samples. Herein, we describe an innovative modeling method for discriminating the ATR FT-IR spectra of various BFs, including peripheral blood, saliva, semen, urine and sweat, to meet the practical demands described above. Spectra from unexpected non-BF samples were efficiently excluded as outliers by adopting the Q-statistics technique. The robustness of the models against aged BFs was significantly improved by using the discrimination scheme of a dichotomous classification tree with hierarchical clustering. The present study advances the use of vibrational spectroscopy and a chemometric strategy for forensic BF identification.

  9. catcher: A Software Program to Detect Answer Copying in Multiple-Choice Tests Based on Nominal Response Model

    ERIC Educational Resources Information Center

    Kalender, Ilker

    2012-01-01

    catcher is a software program designed to compute the [omega] index, a common statistical index for the identification of collusions (cheating) among examinees taking an educational or psychological test. It requires (a) responses and (b) ability estimations of individuals, and (c) item parameters to make computations and outputs the results of…

  10. In-group and role identity influences on the initiation and maintenance of students' voluntary attendance at peer study sessions for statistics.

    PubMed

    White, Katherine M; O'Connor, Erin L; Hamilton, Kyra

    2011-06-01

    Although class attendance is linked to academic performance, questions remain about what determines students' decisions to attend or miss class. In addition to the constructs of a common decision-making model, the theory of planned behaviour, the present study examined the influence of student role identity and university student (in-group) identification for predicting both the initiation and maintenance of students' attendance at voluntary peer-assisted study sessions in a statistics subject. University students enrolled in a statistics subject were invited to complete a questionnaire at two time points across the academic semester. A total of 79 university students completed questionnaires at the first data collection point, with 46 students completing the questionnaire at the second data collection point. Twice during the semester, students' attitudes, subjective norm, perceived behavioural control, student role identity, in-group identification, and intention to attend study sessions were assessed via on-line questionnaires. Objective measures of class attendance records for each half-semester (or 'term') were obtained. Across both terms, students' attitudes predicted their attendance intentions, with intentions predicting class attendance. Earlier in the semester, in addition to perceived behavioural control, both student role identity and in-group identification predicted students' attendance intentions, with only role identity influencing intentions later in the semester. These findings highlight the possible chronology that different identity influences have in determining students' initial and maintained attendance at voluntary sessions designed to facilitate their learning. ©2010 The British Psychological Society.

  11. Behavioral biometrics for verification and recognition of malicious software agents

    NASA Astrophysics Data System (ADS)

    Yampolskiy, Roman V.; Govindaraju, Venu

    2008-04-01

    Homeland security requires technologies capable of positive and reliable identification of humans for law enforcement, government, and commercial applications. As artificially intelligent agents improve in their abilities and become a part of our everyday life, the possibility of using such programs for undermining homeland security increases. Virtual assistants, shopping bots, and game playing programs are used daily by millions of people. We propose applying statistical behavior modeling techniques developed by us for recognition of humans to the identification and verification of intelligent and potentially malicious software agents. Our experimental results demonstrate feasibility of such methods for both artificial agent verification and even for recognition purposes.

  12. Peptide identification

    DOEpatents

    Jarman, Kristin H [Richland, WA; Cannon, William R [Richland, WA; Jarman, Kenneth D [Richland, WA; Heredia-Langner, Alejandro [Richland, WA

    2011-07-12

    Peptides are identified from a list of candidates using collision-induced dissociation tandem mass spectrometry data. A probabilistic model for the occurrence of spectral peaks corresponding to frequently observed partial peptide fragment ions is applied. As part of the identification procedure, a probability score is produced that indicates the likelihood of any given candidate being the correct match. The statistical significance of the score is known without necessarily having reference to the actual identity of the peptide. In one form of the invention, a genetic algorithm is applied to candidate peptides using an objective function that takes into account the number of shifted peaks appearing in the candidate spectrum relative to the test spectrum.

  13. Heterogeneous path ensembles for conformational transitions in semi–atomistic models of adenylate kinase

    PubMed Central

    Bhatt, Divesh; Zuckerman, Daniel M.

    2010-01-01

    We performed “weighted ensemble” path–sampling simulations of adenylate kinase, using several semi–atomistic protein models. The models have an all–atom backbone with various levels of residue interactions. The primary result is that full statistically rigorous path sampling required only a few weeks of single–processor computing time with these models, indicating the addition of further chemical detail should be readily feasible. Our semi–atomistic path ensembles are consistent with previous biophysical findings: the presence of two distinct pathways, identification of intermediates, and symmetry of forward and reverse pathways. PMID:21660120

  14. Modelling of peak temperature during friction stir processing of magnesium alloy AZ91

    NASA Astrophysics Data System (ADS)

    Vaira Vignesh, R.; Padmanaban, R.

    2018-02-01

    Friction stir processing (FSP) is a solid state processing technique with potential to modify the properties of the material through microstructural modification. The study of heat transfer in FSP aids in the identification of defects like flash, inadequate heat input, poor material flow and mixing etc. In this paper, transient temperature distribution during FSP of magnesium alloy AZ91 was simulated using finite element modelling. The numerical model results were validated using the experimental results from the published literature. The model was used to predict the peak temperature obtained during FSP for various process parameter combinations. The simulated peak temperature results were used to develop a statistical model. The effect of process parameters namely tool rotation speed, tool traverse speed and shoulder diameter of the tool on the peak temperature was investigated using the developed statistical model. It was found that peak temperature was directly proportional to tool rotation speed and shoulder diameter and inversely proportional to tool traverse speed.

  15. Development of an errorable car-following driver model

    NASA Astrophysics Data System (ADS)

    Yang, H.-H.; Peng, H.

    2010-06-01

    An errorable car-following driver model is presented in this paper. An errorable driver model is one that emulates a human driver's functions and can generate both nominal (error-free) and devious (with error) behaviours. This model was developed for evaluation and design of active safety systems. The car-following data used for developing and validating the model were obtained from a large-scale naturalistic driving database. The stochastic car-following behaviour was first analysed and modelled as a random process. Three error-inducing behaviours were then introduced. First, human perceptual limitation was studied and implemented. Distraction due to non-driving tasks was then identified based on the statistical analysis of the driving data. Finally, time delay of human drivers was estimated through a recursive least-square identification process. By including these three error-inducing behaviours, rear-end collisions with the lead vehicle could occur. The simulated crash rate was found to be similar to, but somewhat higher than, that reported in traffic statistics.
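
    Recursive least-squares identification, named in this abstract, updates a parameter estimate one sample at a time. The Python sketch below shows the generic scalar form for a model y = theta * x; it is not the authors' time-delay estimator, and the function name, the initial covariance `p0`, and the forgetting factor `lam` are assumptions chosen for illustration.

```python
def rls_scalar(xs, ys, theta0=0.0, p0=1e6, lam=1.0):
    """Recursive least squares for the one-parameter model y = theta * x.

    p0 is a large initial covariance (weak prior on theta0); lam is a
    forgetting factor, with lam = 1.0 meaning no forgetting.
    """
    theta, p = theta0, p0
    for x, y in zip(xs, ys):
        k = p * x / (lam + x * p * x)      # gain for this sample
        theta = theta + k * (y - x * theta)  # correct by prediction error
        p = (p - k * x * p) / lam          # covariance update
    return theta


# Noise-free toy data generated with theta = 2
estimate = rls_scalar([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(round(estimate, 4))
```

    A forgetting factor below 1.0 would weight recent samples more heavily, which is the usual choice when the parameter is expected to drift.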

  16. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo

    2018-06-01

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  17. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  18. Nonparametric identification of nonlinear dynamic systems using a synchronisation-based method

    NASA Astrophysics Data System (ADS)

    Kenderi, Gábor; Fidlin, Alexander

    2014-12-01

    The present study proposes an identification method for highly nonlinear mechanical systems that does not require a priori knowledge of the underlying nonlinearities to reconstruct arbitrary restoring force surfaces between degrees of freedom. This approach is based on the master-slave synchronisation between a dynamic model of the system as the slave and the real system as the master using measurements of the latter. As the model synchronises to the measurements, it becomes an observer of the real system. The optimal observer algorithm in a least-squares sense is given by the Kalman filter. Using the well-known state augmentation technique, the Kalman filter can be turned into a dual state and parameter estimator to identify parameters of a priori characterised nonlinearities. The paper proposes an extension of this technique towards nonparametric identification. A general system model is introduced by describing the restoring forces as bilateral spring-dampers with time-variant coefficients, which are estimated as augmented states. The estimation procedure is followed by an a posteriori statistical analysis to reconstruct noise-free restoring force characteristics using the estimated states and their estimated variances. Observability is provided using only one measured mechanical quantity per degree of freedom, which makes this approach less demanding in the number of necessary measurement signals compared with truly nonparametric solutions, which typically require displacement, velocity and acceleration signals. Additionally, due to the statistical rigour of the procedure, it successfully addresses signals corrupted by significant measurement noise. In the present paper, the method is described in detail, which is followed by numerical examples of one degree of freedom (1DoF) and 2DoF mechanical systems with strong nonlinearities of vibro-impact type to demonstrate the effectiveness of the proposed technique.

  19. Evaluating the statistical power of DNA-based identification, exemplified by 'The missing grandchildren of Argentina'.

    PubMed

    Kling, Daniel; Egeland, Thore; Piñero, Mariana Herrera; Vigeland, Magnus Dehli

    2017-11-01

    Methods and implementations of DNA-based identification are well established in several forensic contexts. However, assessing the statistical power of these methods has been largely overlooked, except in the simplest cases. In this paper we outline general methods for such power evaluation, and apply them to a large set of family reunification cases, where the objective is to decide whether a person of interest (POI) is identical to the missing person (MP) in a family, based on the DNA profile of the POI and available family members. As such, this application closely resembles database searching and disaster victim identification (DVI). If parents or children of the MP are available, they will typically provide sufficient statistical evidence to settle the case. However, if one must resort to more distant relatives, it is not a priori obvious that a reliable conclusion is likely to be reached. In these cases power evaluation can be highly valuable, for instance in the recruitment of additional family members. To assess the power in an identification case, we advocate the combined use of two statistics: the Probability of Exclusion, and the Probability of Exceedance. The former is the probability that the genotypes of a random, unrelated person are incompatible with the available family data. If this is close to 1, it is likely that a conclusion will be achieved regarding general relatedness, but not necessarily the specific relationship. To evaluate the ability to recognize a true match, we use simulations to estimate exceedance probabilities, i.e. the probability that the likelihood ratio will exceed a given threshold, assuming that the POI is indeed the MP. All simulations are done conditionally on available family data. Such conditional simulations have a long history in medical linkage analysis, but to our knowledge this is the first systematic forensic genetics application. Also, for forensic markers mutations cannot be ignored and therefore current models and implementations must be extended. All the tools are freely available in Familias (http://www.familias.no) empowered by the R library paramlink. The above approach is applied to a large and important data set: 'The missing grandchildren of Argentina'. We evaluate the power of 196 families from the DNA reference databank (Banco Nacional de Datos Genéticos, http://www.bndg.gob.ar). As a result we show that 58 of the families have poor statistical power and require additional genetic data to enable a positive identification. Copyright © 2017 Elsevier B.V. All rights reserved.
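
    The exceedance probability described in this abstract is a Monte Carlo estimate: simulate likelihood ratios under the assumption that the POI is the MP and count how often they pass a threshold. The Python sketch below shows only that counting step. It does not use Familias or paramlink, and `toy_lr` is a hypothetical stand-in for a real simulator that would draw genotypes conditionally on the family data.

```python
import random


def exceedance_probability(simulate_lr, threshold, n_sims=10000, seed=0):
    """Estimate P(LR > threshold) from n_sims simulated likelihood ratios.

    simulate_lr is any callable that takes a random.Random instance and
    returns one simulated likelihood ratio.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_sims) if simulate_lr(rng) > threshold)
    return hits / n_sims


def toy_lr(rng):
    # Hypothetical stand-in: log10(LR) drawn from Normal(mean=4, sd=2).
    return 10 ** rng.gauss(4.0, 2.0)


print(exceedance_probability(toy_lr, threshold=1e4))
```

    With this toy distribution the threshold 1e4 sits at the median, so the estimate should land near one half; a real pedigree simulator would replace `toy_lr` without changing the counting logic.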

  20. Advanced building energy management system demonstration for Department of Defense buildings.

    PubMed

    O'Neill, Zheng; Bailey, Trevor; Dong, Bing; Shashanka, Madhusudana; Luo, Dong

    2013-08-01

    This paper presents an advanced building energy management system (aBEMS) that employs advanced methods of whole-building performance monitoring combined with statistical methods of learning and data analysis to enable identification of both gradual and discrete performance erosion and faults. This system assimilated data collected from multiple sources, including blueprints, reduced-order models (ROM) and measurements, and employed advanced statistical learning algorithms to identify patterns of anomalies. The results were presented graphically in a manner understandable to facilities managers. A demonstration of aBEMS was conducted in buildings at Naval Station Great Lakes. The facility building management systems were extended to incorporate the energy diagnostics and analysis algorithms, producing systematic identification of more efficient operation strategies. At Naval Station Great Lakes, greater than 20% savings were demonstrated for building energy consumption by improving facility manager decision support to diagnose energy faults and prioritize alternative, energy-efficient operation strategies. The paper concludes with recommendations for widespread aBEMS success. © 2013 New York Academy of Sciences.

  1. Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Ye; Ma, Xiaosong; Liu, Qing Gary

    2015-01-01

    Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.

  2. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    PubMed

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
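
    As a small worked illustration of the univariable case mentioned in this abstract, the Python sketch below fits a straight line by ordinary least squares and reports the coefficient of determination. It is a generic textbook computation rather than anything taken from the article; the function name and the toy data are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # share of the variance in y explained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r2 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept, r2)
```

    A high r-squared alone does not show that the model is appropriate, which is one of the interpretation pitfalls the abstract alludes to.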

  3. Observing Consistency in Online Communication Patterns for User Re-Identification

    PubMed Central

    Venter, Hein S.

    2016-01-01

    Comprehension of the statistical and structural mechanisms governing human dynamics in online interaction plays a pivotal role in online user identification, online profile development, and recommender systems. However, building a characteristic model of human dynamics on the Internet involves a complete analysis of the variations in human activity patterns, which is a complex process. This complexity is inherent in human dynamics and has not been extensively studied to reveal the structural composition of human behavior. A typical method of anatomizing such a complex system is viewing all independent interconnectivity that constitutes the complexity. An examination of the various dimensions of human communication pattern in online interactions is presented in this paper. The study employed reliable server-side web data from 31 known users to explore characteristics of human-driven communications. Various machine-learning techniques were explored. The results revealed that each individual exhibited a relatively consistent, unique behavioral signature and that the logistic regression model and model tree can be used to accurately distinguish online users. These results are applicable to one-to-one online user identification processes, insider misuse investigation processes, and online profiling in various areas. PMID:27918593

  4. Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation.

    PubMed

    Marcel, Sébastien; Millán, José Del R

    2007-04-01

    In this paper, we investigate the use of brain activity for person authentication. It has been shown in previous studies that the brain-wave pattern of every individual is unique and that the electroencephalogram (EEG) can be used for biometric identification. EEG-based biometry is an emerging research topic and we believe that it may open new research directions and applications in the future. However, very little work has been done in this area, and it has focused mainly on person identification rather than person authentication. Person authentication aims to accept or to reject a person claiming an identity, i.e., comparing biometric data to one template, while the goal of person identification is to match the biometric data against all the records in a database. We propose the use of a statistical framework based on Gaussian Mixture Models and Maximum A Posteriori model adaptation, successfully applied to speaker and face authentication, which can deal with only one training session. We perform intensive experimental simulations using several strict train/test protocols to show the potential of our method. We also show that there are some mental tasks that are more appropriate for person authentication than others.
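
    The distinction this abstract draws between authentication (one-to-one) and identification (one-to-many) can be sketched in a few lines of Python. The example uses a plain distance score instead of the Gaussian Mixture Model and MAP adaptation framework the authors propose; the function names, the templates, and the threshold are hypothetical.

```python
def score(sample, template):
    """Similarity as negative squared Euclidean distance (higher is closer)."""
    return -sum((s - t) ** 2 for s, t in zip(sample, template))


def authenticate(sample, claimed_id, templates, threshold):
    """One-to-one: accept only if the sample matches the claimed template."""
    return score(sample, templates[claimed_id]) >= threshold


def identify(sample, templates):
    """One-to-many: return the best-matching identity in the database."""
    return max(templates, key=lambda name: score(sample, templates[name]))


templates = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}  # toy feature vectors
sample = [0.9, 1.1]
print(identify(sample, templates))                     # bob
print(authenticate(sample, "alice", templates, -0.1))  # False
```

    Authentication needs a decision threshold because it can reject a claim, whereas this simple identification always returns some enrolled identity.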

  5. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    PubMed

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of the orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  6. DeltaSA tool for source apportionment benchmarking, description and sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Pernigotti, D.; Belis, C. A.

    2018-05-01

    DeltaSA is an R-package and a Java on-line tool developed at the EC-Joint Research Centre to assist and benchmark source apportionment applications. Its key functionalities support two critical tasks in this kind of study: the assignment of a factor to a source in factor analytical models (source identification) and the model performance evaluation. The source identification is based on the similarity between a given factor and source chemical profiles from public databases. The model performance evaluation is based on statistical indicators used to compare model output with reference values generated in intercomparison exercises. The reference values are calculated as the ensemble average of the results reported by participants that have passed a set of testing criteria based on chemical profiles and time series similarity. In this study, a sensitivity analysis of the model performance criteria is accomplished using the results of a synthetic dataset where "a priori" references are available. The consensus modulated standard deviation punc gives the best choice for the model performance evaluation when a conservative approach is adopted.

  7. Two-level structural sparsity regularization for identifying lattices and defects in noisy images

    DOE PAGES

    Li, Xin; Belianinov, Alex; Dyck, Ondrej E.; ...

    2018-03-09

    Here, this paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives. To minimize error, the model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm on determination of atom structures and identification of imaging distortions and atomic defects was demonstrated using three real STEM images. In conclusion, we believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.

  8. Two-level structural sparsity regularization for identifying lattices and defects in noisy images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Xin; Belianinov, Alex; Dyck, Ondrej E.

    Here, this paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives. To minimize error, the model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm on determination of atom structures and identification of imaging distortions and atomic defects was demonstrated using three real STEM images. In conclusion, we believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.

  9. A statistical framework for biomedical literature mining.

    PubMed

    Chung, Dongjun; Lawson, Andrew; Zheng, W Jim

    2017-09-30

    In systems biology, it is of great interest to identify new genes that were not previously reported to be associated with biological pathways related to various functions and diseases. Identification of these new pathway-modulating genes not only promotes understanding of pathway regulation mechanisms but also allows identification of novel targets for therapeutics. Recently, biomedical literature has been considered a valuable resource to investigate pathway-modulating genes. While the majority of currently available approaches are based on the co-occurrence of genes within an abstract, it has been reported that these approaches show only sub-optimal performance because 70% of abstracts contain information only for a single gene. To overcome this limitation, we propose a novel statistical framework based on the concept of ontology fingerprint that uses gene ontology to extract information from large biomedical literature data. The proposed framework simultaneously identifies pathway-modulating genes and facilitates interpreting functions of these new genes. We also propose a computationally efficient posterior inference procedure based on Metropolis-Hastings within Gibbs sampler for parameter updates and the poor man's reversible jump Markov chain Monte Carlo approach for model selection. We evaluate the proposed statistical framework with simulation studies, experimental validation, and an application to studies of pathway-modulating genes in yeast. The R implementation of the proposed model is currently available at https://dongjunchung.github.io/bayesGO/. Copyright © 2017 John Wiley & Sons, Ltd.

  10. Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08

    PubMed Central

    Casey, Martin F.; Neidell, Matthew

    2013-01-01

    Background: Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods and Findings: We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions: Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptide identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample.

  12. Rapid Identification of Candida Species by Using Nuclear Magnetic Resonance Spectroscopy and a Statistical Classification Strategy

    PubMed Central

    Himmelreich, Uwe; Somorjai, Ray L.; Dolenko, Brion; Lee, Ok Cha; Daniel, Heide-Marie; Murray, Ronan; Mountford, Carolyn E.; Sorrell, Tania C.

    2003-01-01

    Nuclear magnetic resonance (NMR) spectra were acquired from suspensions of clinically important yeast species of the genus Candida to characterize the relationship between metabolite profiles and species identification. Major metabolites were identified by using two-dimensional correlation NMR spectroscopy. One-dimensional proton NMR spectra were analyzed by using a staged statistical classification strategy. Analysis of NMR spectra from 442 isolates of Candida albicans, C. glabrata, C. krusei, C. parapsilosis, and C. tropicalis resulted in rapid, accurate identification when compared with conventional and DNA-based identification. Spectral regions used for the classification of the five yeast species revealed species-specific differences in relative amounts of lipids, trehalose, polyols, and other metabolites. Isolates of C. parapsilosis and C. glabrata with unusual PCR fingerprinting patterns also generated atypical NMR spectra, suggesting the possibility of intraspecies discontinuity. We conclude that NMR spectroscopy combined with a statistical classification strategy is a rapid, nondestructive, and potentially valuable method for identification and chemotaxonomic characterization that may be broadly applicable to fungi and other microorganisms. PMID:12902244

  13. Effect of a Simulation Exercise on Restorative Identification Skills of First Year Dental Hygiene Students.

    PubMed

    Lemaster, Margaret; Flores, Joyce M; Blacketer, Margaret S

    2016-02-01

    This study explored the effectiveness of simulated mouth models to improve identification and recording of dental restorations when compared to using traditional didactic instruction combined with 2-dimensional images. Simulation has been adopted into medical and dental education curricula to improve both student learning and patient safety outcomes. A 2-sample, independent t-test analysis of data was conducted to compare graded dental recordings of dental hygiene students using simulated mouth models and dental hygiene students using 2-dimensional photographs. Evaluations from graded dental charts were analyzed and compared between groups of students using the simulated mouth models containing random placement of custom preventive and restorative materials and traditional 2-dimensional representations of didactically described conditions. Results demonstrated a statistically significant (p≤0.0001) difference: the experimental group, students using the simulated mouth models to identify and record dental conditions, had a mean of 86.73 and a variance of 33.84. The control group, students using traditional 2-dimensional images, had a mean graded dental chart score of 74.43 and a variance of 14.25. Using modified simulation technology for dental charting identification may increase the level of dental charting skill competency in first year dental hygiene students. Copyright © 2016 The American Dental Hygienists’ Association.

  14. Feature Extraction with GMDH-Type Neural Networks for EEG-Based Person Identification.

    PubMed

    Schetinin, Vitaly; Jakaite, Livija; Nyah, Ndifreke; Novakovic, Dusica; Krzanowski, Wojtek

    2018-08-01

    The brain activity observed on EEG electrodes is influenced by volume conduction and functional connectivity of a person performing a task. When the task is a biometric test, the EEG signals represent the unique "brain print", which is defined by the functional connectivity that is represented by the interactions between electrodes, whilst the conduction components cause trivial correlations. Orthogonalization using autoregressive modeling minimizes the conduction components, and then the residuals are related to features correlated with the functional connectivity. However, the orthogonalization can be unreliable for high-dimensional EEG data. We have found that the dimensionality can be significantly reduced if the baselines required for estimating the residuals can be modeled by using relevant electrodes. In our approach, the required models are learnt by a Group Method of Data Handling (GMDH) algorithm which we have made capable of discovering reliable models from multidimensional EEG data. In our experiments on the EEG-MMI benchmark data, which include 109 participants, the proposed method has correctly identified all the subjects and provided a statistically significant ([Formula: see text]) improvement of the identification accuracy. The experiments have shown that the proposed GMDH method can learn new features from multi-electrode EEG data, which are capable of improving the accuracy of biometric identification.

  15. Statistical use of argonaute expression and RISC assembly in microRNA target identification.

    PubMed

    Stanhope, Stephen A; Sengupta, Srikumar; den Boon, Johan; Ahlquist, Paul; Newton, Michael A

    2009-09-01

    MicroRNAs (miRNAs) posttranscriptionally regulate targeted messenger RNAs (mRNAs) by inducing cleavage or otherwise repressing their translation. We address the problem of detecting m/miRNA targeting relationships in Homo sapiens from microarray data by developing statistical models that are motivated by the biological mechanisms used by miRNAs. The focus of our modeling is the construction, activity, and mediation of RNA-induced silencing complexes (RISCs) competent for targeted mRNA cleavage. We demonstrate that regression models accommodating RISC abundance and controlling for other mediating factors fit the expression profiles of known target pairs substantially better than models based on m/miRNA expressions alone, and lead to verifications of computational target pair predictions that are more sensitive than those based on marginal expression levels. Because our models are fully independent of exogenous results from sequence-based computational methods, they are appropriate for use as either a primary or secondary source of information regarding m/miRNA target pair relationships, especially in conjunction with high-throughput expression studies.

  16. Faster Mass Spectrometry-based Protein Inference: Junction Trees are More Efficient than Sampling and Marginalization by Enumeration

    PubMed Central

    Serang, Oliver; Noble, William Stafford

    2012-01-01

    The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The Python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862

  17. Using Remote Sensing Observations and Empirical-Statistical Methods to Understand the Present State and Predictable Future Changes in the State of Permafrost Distribution in North-Western Himalayas

    NASA Astrophysics Data System (ADS)

    Baral, P.; Haq, M. A.; Mangan, P.

    2017-12-01

    The impacts of climate change on the extent of permafrost degradation in the Himalayas and its effect upon the carbon cycle and ecosystem changes are not well understood due to a lack of historical ground-based observations. We have used high resolution optical and satellite radar observations and applied empirical-statistical methods for the estimation of spatial and altitudinal limits of permafrost distribution in North-Western Himalayas. Visual interpretations of morphological characteristics using high resolution optical images have been used for mapping, identification and classification of distinctive geomorphological landforms. Subsequently, we have created a detailed inventory of different types of rock glaciers and studied the contribution of topo-climatic factors in their occurrence and distribution through Logistic Regression modelling. This model establishes the relationship between presence of permafrost and topo-climatic factors like Mean Annual Air Temperature (MAAT), Potential Incoming Solar Radiation (PISR), altitude, aspect and slope. This relationship has been used to estimate the distributed probability of permafrost occurrence, within a GIS environment. The ability of the model to predict permafrost occurrence has been tested using locations of mapped rock glaciers and the area under the Receiver Operating Characteristic (ROC) curve. Additionally, interferometric properties of Sentinel and ALOS PALSAR datasets are used for the identification and assessment of rock glacier activity in the region.
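
    The logistic-regression and ROC steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' model: the function names, the feature names and every coefficient value are hypothetical, and a real study would fit the coefficients to the rock glacier inventory.

```python
import math

def permafrost_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive site gets the
    higher score; tied pairs count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical coefficients, for illustration only (not from the study).
COEFFICIENTS = {"maat_c": -0.8, "altitude_km": 1.5}
INTERCEPT = -5.0

cold_high_site = {"maat_c": -4.0, "altitude_km": 5.0}
warm_low_site = {"maat_c": 6.0, "altitude_km": 3.0}
p_cold = permafrost_probability(cold_high_site, COEFFICIENTS, INTERCEPT)
p_warm = permafrost_probability(warm_low_site, COEFFICIENTS, INTERCEPT)
```

    With a negative MAAT coefficient, the colder and higher site receives the larger probability; applying the same function to every grid cell would give a distributed probability map of the kind the abstract mentions.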

  18. The interprocess NIR sampling as an alternative approach to multivariate statistical process control for identifying sources of product-quality variability.

    PubMed

    Marković, Snežana; Kerč, Janez; Horvat, Matej

    2017-03-01

    We are presenting a new approach of identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields comparable results to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during manufacturing process, especially in cases when historical process data is not straightforwardly available. In the presented case the changes of lactose characteristics are influencing the performance of the extrusion/spheronization process step. The pellet cores produced by using one (considered as less suitable) lactose source were on average larger and more fragile, leading to consequent breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process leading to compromised film coating.

  19. Automatic stage identification of Drosophila egg chamber based on DAPI images

    PubMed Central

    Jia, Dongyu; Xu, Qiuping; Xie, Qian; Mio, Washington; Deng, Wu-Min

    2016-01-01

    The Drosophila egg chamber, whose development is divided into 14 stages, is a well-established model for developmental biology. However, visual stage determination can be a tedious, subjective and time-consuming task prone to errors. Our study presents an objective, reliable and repeatable automated method for quantifying cell features and classifying egg chamber stages based on DAPI images. The proposed approach is composed of two steps: 1) a feature extraction step and 2) a statistical modeling step. The egg chamber features used are egg chamber size, oocyte size, egg chamber ratio and distribution of follicle cells. Methods for determining the onset of the polytene stage and centripetal migration are also discussed. The statistical model uses linear and ordinal regression to explore the stage-feature relationships and classify egg chamber stages. Combined with machine learning, our method has great potential to enable discovery of hidden developmental mechanisms. PMID:26732176

  20. Estimation of dew point temperature using neuro-fuzzy and neural network techniques

    NASA Astrophysics Data System (ADS)

    Kisi, Ozgur; Kim, Sungwon; Shiri, Jalal

    2013-11-01

    This study investigates the ability of two different artificial neural network (ANN) models, generalized regression neural networks model (GRNNM) and Kohonen self-organizing feature maps neural networks model (KSOFM), and two different adaptive neural fuzzy inference system (ANFIS) models, ANFIS model with sub-clustering identification (ANFIS-SC) and ANFIS model with grid partitioning identification (ANFIS-GP), for estimating daily dew point temperature. The climatic data, which consisted of 8 years of daily records of air temperature, sunshine hours, wind speed, saturation vapor pressure, relative humidity, and dew point temperature from three weather stations, Daego, Pohang, and Ulsan, in South Korea, were used in the study. The estimates of the ANN and ANFIS models were compared using three statistics: root mean square error, mean absolute error, and coefficient of determination. Comparison results revealed that the ANFIS-SC, ANFIS-GP, and GRNNM models showed almost the same accuracy and that they performed better than the KSOFM model. Results also indicated that sunshine hours, wind speed, and saturation vapor pressure have little effect on dew point temperature. It was found that the dew point temperature could be successfully estimated by using the T_mean and RH variables.
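
    The three comparison statistics named in this abstract can be written out in a few lines. The sketch below is illustrative only; the function names are chosen here, and the coefficient of determination is taken as 1 - SS_res/SS_tot, which is one common definition (some studies instead report the squared correlation coefficient, and the abstract does not say which was used).

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up dew point values (degrees C), purely for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
scores = (rmse(observed, predicted), mae(observed, predicted),
          r_squared(observed, predicted))
```

    Lower RMSE and MAE and a higher coefficient of determination indicate a better model, which is the sense in which the abstract ranks the four models.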

  1. Application of PCA and SIMCA statistical analysis of FT-IR spectra for the classification and identification of different slag types with environmental origin.

    PubMed

    Stumpe, B; Engel, T; Steinweg, B; Marschner, B

    2012-04-03

    In the past, different slag materials were often used for landscaping and construction purposes or simply dumped. Nowadays German environmental laws strictly control the use of slags, but there is still a remaining 35% that is dumped uncontrolled in landfills. Since some slags have high heavy metal contents and different slag types have typical chemical and physical properties that will influence the risk potential and other characteristics of the deposits, an identification of the slag types is needed. We developed an FT-IR-based statistical method to identify different slag classes. Slag samples were collected at different sites throughout various cities within the industrial Ruhr area. Then, spectra of 35 samples from four different slag classes, ladle furnace (LF), blast furnace (BF), oxygen furnace steel (OF), and zinc furnace slags (ZF), were determined in the mid-infrared region (4000-400 cm⁻¹). The spectral data sets were subjected to statistical classification methods to separate the spectral data of the different slag classes. Principal component analysis (PCA) models for each slag class were developed and further used for soft independent modeling of class analogy (SIMCA). Precise classification of slag samples into four different slag classes was achieved using two different SIMCA models stepwise. At first, SIMCA 1 was used for classification of ZF as well as OF slags over the total spectral range. If no correct classification was found, then the spectrum was analyzed with SIMCA 2 at reduced wavenumbers for the classification of LF as well as BF spectra. As a result, we provide a time- and cost-efficient method based on FT-IR spectroscopy for processing and identifying large numbers of environmental slag samples.

  2. MixGF: spectral probabilities for mixture spectra from more than one peptide.

    PubMed

    Wang, Jian; Bourne, Philip E; Bandeira, Nuno

    2014-12-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. Evaluation of mass spectrometric data using principal component analysis for determination of the effects of organic lakes on protein binder identification.

    PubMed

    Hrdlickova Kuckova, Stepanka; Rambouskova, Gabriela; Hynek, Radovan; Cejnar, Pavel; Oltrogge, Doris; Fuchs, Robert

    2015-11-01

    Matrix-assisted laser desorption/ionisation-time of flight (MALDI-TOF) mass spectrometry is commonly used for the identification of proteinaceous binders and their mixtures in artworks. The determination of protein binders is based on a comparison between the m/z values of tryptic peptides in the unknown sample and a reference one (egg, casein, animal glues etc.), but this method has greater potential to study changes due to ageing and the influence of organic/inorganic components on protein identification. However, it is then necessary to carry out statistical evaluation on the obtained data. Until now, it has been complicated to routinely convert the mass spectrometric data into a statistical programme, and to extract and match the appropriate peaks. Only a few 'homemade' computer programmes without user-friendly interfaces are available for these purposes. In this paper, we would like to present our completely new, publicly available, non-commercial software, ms-alone and multiMS-toolbox, for principal component analyses of MALDI-TOF MS data for R software, and their application to the study of the influence of heterogeneous matrices (organic lakes) on protein identification. Using this new software, we determined the main factors that influence the protein analyses of artificially aged model mixtures of organic lakes and fish glue, prepared according to historical recipes that were used for book illumination, using MALDI-TOF peptide mass mapping. Copyright © 2015 John Wiley & Sons, Ltd.

  4. MixGF: Spectral Probabilities for Mixture Spectra from more than One Peptide

    PubMed Central

    Wang, Jian; Bourne, Philip E.; Bandeira, Nuno

    2014-01-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30–390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. PMID:25225354

  5. Speaker gender identification based on majority vote classifiers

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2017-03-01

    Speaker gender identification is considered among the most important tools in several multimedia applications, namely automatic speech recognition, interactive voice response systems and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models, namely decision tree, discriminant analysis, naive Bayes, support vector machine and k-nearest neighbor, were tested. The three best performing classifiers among the five contribute by majority voting between their scores. Experiments were performed on three different datasets spoken in three languages, English, German and Arabic, in order to validate the language independence of the proposed scheme. Results confirm that the presented system has reached a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
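
    The selection-then-voting scheme described in this abstract can be illustrated with a short sketch. It is a simplification under stated assumptions: the function names and the validation accuracies are hypothetical, and the vote here is taken over predicted labels, whereas the abstract speaks of voting between classifier scores.

```python
from collections import Counter

def select_top_classifiers(validation_accuracy, k=3):
    """Return the names of the k classifiers with the highest accuracy."""
    return sorted(validation_accuracy, key=validation_accuracy.get,
                  reverse=True)[:k]

def majority_vote(predicted_labels):
    """Return the label predicted by most classifiers.
    With three voters and two classes a tie cannot occur."""
    label, _count = Counter(predicted_labels).most_common(1)[0]
    return label

# Hypothetical validation accuracies, for illustration only.
accuracy = {"decision_tree": 0.85, "discriminant": 0.90, "naive_bayes": 0.80,
            "svm": 0.95, "knn": 0.88}
top_three = select_top_classifiers(accuracy)

# Hypothetical per-classifier predictions for one utterance.
predictions = {"svm": "female", "discriminant": "male", "knn": "female",
               "decision_tree": "male", "naive_bayes": "male"}
decision = majority_vote([predictions[name] for name in top_three])
```

    Only the three selected classifiers vote, so the two weaker models cannot overturn the decision even when they agree with each other.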

  6. Identification of microRNAs with regulatory potential using a matched microRNA-mRNA time-course data.

    PubMed

    Jayaswal, Vivek; Lutherborrow, Mark; Ma, David D F; Hwa Yang, Yee

    2009-05-01

    Over the past decade, a class of small RNA molecules called microRNAs (miRNAs) has been shown to regulate gene expression at the post-transcription stage. While early work focused on the identification of miRNAs using a combination of experimental and computational techniques, subsequent studies have focused on identification of miRNA-target mRNA pairs as each miRNA can have hundreds of mRNA targets. The experimental validation of some miRNAs as oncogenic has provided further motivation for research in this area. In this article we propose an odds-ratio (OR) statistic for identification of regulatory miRNAs. It is based on integrative analysis of matched miRNA and mRNA time-course microarray data. The OR-statistic was used for (i) identification of miRNAs with regulatory potential, (ii) identification of miRNA-target mRNA pairs and (iii) identification of time lags between changes in miRNA expression and those of its target mRNAs. We applied the OR-statistic to a cancer data set and identified a small set of miRNAs that were negatively correlated to mRNAs. A literature survey revealed that some of the miRNAs that were predicted to be regulatory, were indeed oncogenic or tumor suppressors. Finally, some of the predicted miRNA targets have been shown to be experimentally valid.

  7. Anomaly Detection in Gamma-Ray Vehicle Spectra with Principal Components Analysis and Mahalanobis Distances

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tardiff, Mark F.; Runkle, Robert C.; Anderson, K. K.

    2006-01-23

    The goal of primary radiation monitoring in support of routine screening and emergency response is to detect characteristics in vehicle radiation signatures that indicate the presence of potential threats. Two conceptual approaches to analyzing gamma-ray spectra for threat detection are isotope identification and anomaly detection. While isotope identification is the time-honored method, an emerging technique is anomaly detection that uses benign vehicle gamma-ray signatures to define an expectation of the radiation signature for vehicles that do not pose a threat. Newly acquired spectra are then compared to this expectation using statistical criteria that reflect acceptable false alarm rates and probabilities of detection. The gamma-ray spectra analyzed here were collected at a U.S. land Port of Entry (POE) using a NaI-based radiation portal monitor (RPM). The raw data were analyzed to develop a benign vehicle expectation by decimating the original pulse-height channels to 35 energy bins, extracting composite variables via principal components analysis (PCA), and estimating statistically weighted distances from the mean vehicle spectrum with the Mahalanobis distance (MD) metric. This paper reviews the methods used to establish the anomaly identification criteria and presents a systematic analysis of the response of the combined PCA and MD algorithm to modeled mono-energetic gamma-ray sources.
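
    The distance step of such an anomaly detector can be sketched as follows. This is only an illustration: it assumes two composite variables have already been extracted per vehicle (the PCA step is omitted), and the benign data and alarm threshold are hypothetical, not values from the report.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point from the benign mean."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse written out.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical composite-variable scores for benign vehicles.
benign = [(0.0, 0.0), (1.0, 0.5), (-1.0, -0.5), (0.5, -1.0), (-0.5, 1.0)]
mean, cov = mean_and_cov_2d(benign)

THRESHOLD = 3.0  # hypothetical alarm threshold, set from an acceptable false alarm rate
for spectrum in [(0.5, 0.5), (3.0, 3.0)]:
    d = mahalanobis_2d(spectrum, mean, cov)
    print(spectrum, round(d, 3), "ANOMALY" if d > THRESHOLD else "benign")
```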

  8. Probability of identification: adulteration of American Ginseng with Asian Ginseng.

    PubMed

    Harnly, James; Chen, Pei; Harrington, Peter De B

    2013-01-01

    The AOAC INTERNATIONAL guidelines for validation of botanical identification methods were applied to the detection of Asian Ginseng [Panax ginseng (PG)] as an adulterant for American Ginseng [P. quinquefolius (PQ)] using spectral fingerprints obtained by flow injection mass spectrometry (FIMS). Samples of 100% PQ and 100% PG were physically mixed to provide 90, 80, and 50% PQ. The multivariate FIMS fingerprint data were analyzed using soft independent modeling of class analogy (SIMCA) based on 100% PQ. The Q statistic, a measure of the degree of non-fit of the test samples with the calibration model, was used as the analytical parameter. FIMS was able to discriminate between 100% PQ and 100% PG, and between 100% PQ and 90, 80, and 50% PQ. The probability of identification (POI) curve was estimated based on the SD of 90% PQ. A digital model of adulteration, obtained by mathematically summing the experimentally acquired spectra of 100% PQ and 100% PG in the desired ratios, agreed well with the physical data and provided an easy and more accurate method for constructing the POI curve. Two chemometric modeling methods, SIMCA and fuzzy optimal associative memories, and two classification methods, partial least squares-discriminant analysis and fuzzy rule-building expert systems, were applied to the data. The modeling methods correctly identified the adulterated samples; the classification methods did not.
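
    The "digital model of adulteration" mentioned in this abstract, in which pure-sample spectra are summed in the desired ratios, can be sketched as a weighted sum. The function name and the three-channel spectra are hypothetical; real FIMS fingerprints have many more channels and may need normalization that is not shown here.

```python
def digital_mixture(pq_spectrum, pg_spectrum, pq_fraction):
    """Simulate a mixed sample as a weighted sum of two pure spectra.

    pq_fraction is the proportion of PQ (0.0 to 1.0); the rest is PG.
    """
    if not 0.0 <= pq_fraction <= 1.0:
        raise ValueError("pq_fraction must be between 0 and 1")
    if len(pq_spectrum) != len(pg_spectrum):
        raise ValueError("spectra must have the same number of channels")
    return [
        pq_fraction * pq + (1.0 - pq_fraction) * pg
        for pq, pg in zip(pq_spectrum, pg_spectrum)
    ]


# Hypothetical three-channel intensities for the pure materials.
pure_pq = [10.0, 0.0, 4.0]
pure_pg = [0.0, 8.0, 4.0]

for fraction in (0.9, 0.8, 0.5):
    print(fraction, digital_mixture(pure_pq, pure_pg, fraction))
```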

  9. Combination of statistical and physically based methods to assess shallow slide susceptibility at the basin scale

    NASA Astrophysics Data System (ADS)

    Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel

    2017-07-01

    Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generates a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km²) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.
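
    The cross-tabulation and integration step can be sketched as below. The abstract does not give the integration rules, so the rule used here (keep agreeing classes, flag low-versus-high conflicts as "uncertain", otherwise assign "moderate") is a hypothetical stand-in, as are the class labels and cell values.

```python
from collections import Counter


def cross_tabulate(stat_map, phys_map):
    """Contingency table: count of cells for each (statistical, physical) class pair."""
    return Counter(zip(stat_map, phys_map))


def combine_cell(stat_class, phys_class):
    """Hypothetical integration rule for one map cell."""
    if stat_class == phys_class:
        return stat_class
    if {stat_class, phys_class} == {"low", "high"}:
        return "uncertain"  # contradictory results from the two methods
    return "moderate"


# Hypothetical susceptibility classes for five map cells.
iv_map = ["low", "high", "high", "moderate", "low"]        # information value method
is_map = ["low", "high", "low", "high", "moderate"]        # infinite slope method

table = cross_tabulate(iv_map, is_map)
combined = [combine_cell(s, p) for s, p in zip(iv_map, is_map)]
print(dict(table))
print(combined)
```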

  10. Fuzzy model-based fault detection and diagnosis for a pilot heat exchanger

    NASA Astrophysics Data System (ADS)

    Habbi, Hacene; Kidouche, Madjid; Kinnaert, Michel; Zelmat, Mimoun

    2011-04-01

    This article addresses the design and real-time implementation of a fuzzy model-based fault detection and diagnosis (FDD) system for a pilot co-current heat exchanger. The design method is based on a three-step procedure which involves the identification of data-driven fuzzy rule-based models, the design of a fuzzy residual generator and the evaluation of the residuals for fault diagnosis using statistical tests. The fuzzy FDD mechanism has been implemented and validated on the real co-current heat exchanger, and has been proven to be efficient in detecting and isolating process, sensor and actuator faults.

  11. Models of Pilot Behavior and Their Use to Evaluate the State of Pilot Training

    NASA Astrophysics Data System (ADS)

    Jirgl, Miroslav; Jalovecky, Rudolf; Bradac, Zdenek

    2016-07-01

    This article discusses the possibilities of obtaining new information related to human behavior, namely the changes or progressive development of pilots' abilities during training. The main assumption is that a pilot's ability can be evaluated based on a corresponding behavioral model whose parameters are estimated using mathematical identification procedures. The mean values of the identified parameters are obtained via statistical methods. These parameters are then monitored and their changes evaluated. In this context, the paper introduces and examines relevant mathematical models of human (pilot) behavior, the pilot-aircraft interaction, and an example of the mathematical analysis.

  12. A case management agency and bank create a service innovation.

    PubMed

    Katz, K S; Stowe, A W

    1992-01-01

    Connecticut Community Care, Inc. (CCCI), a statewide, nonprofit case management agency, in collaboration with Connecticut National Bank (CNB), developed a unique model of delivering case management services to bank trust clients. No reports of such a collaborative model have been found in the published literature in the United States. The article presents a historical overview of this innovative initiative; the identification of the target population; the delivery of the assessment, coordination, and monitoring services; and the marketing techniques. Utilization statistics, a synopsis of the model outcomes as viewed by the trust officers, and suggestions for replication are also presented.

  13. Examining the effectiveness of discriminant function analysis and cluster analysis in species identification of male field crickets based on their calling songs.

    PubMed

    Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

    2013-01-01

    Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.

  14. Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution.

    PubMed

    Alves, Gelio; Yu, Yi-Kuo

    2016-09-01

    There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed. We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases. The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit (contact: yyu@ncbi.nlm.nih.gov). Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
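
    As a rough illustration of extreme value statistics for score significance, the sketch below fits a Gumbel distribution to a set of null best-match scores by the method of moments and computes the tail probability of an observed score. This is a textbook construction, not the algorithm of the paper (which avoids random-database searches), and the null scores are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Method-of-moments estimates (location mu, scale beta) of a Gumbel distribution."""
    beta = statistics.stdev(scores) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """P(S >= score) under Gumbel(mu, beta): 1 - exp(-exp(-(score - mu) / beta))."""
    z = (score - mu) / beta
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for tiny x.
    return -math.expm1(-math.exp(-z))


# Hypothetical best scores obtained from searches that can only match by chance.
null_scores = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 10.4, 13.0]
mu, beta = fit_gumbel_moments(null_scores)

for observed in (12.0, 20.0):
    print(observed, gumbel_p_value(observed, mu, beta))
```

    The accuracy of such p-values depends heavily on the parameter estimates, which is the weakness the abstract says the authors set out to address.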

  15. 40 CFR Appendix Xviii to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 20 2012-07-01 2012-07-01 false Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks Certifying to the Provisions of Part 86... (CONTINUED) AIR PROGRAMS (CONTINUED) CONTROL OF EMISSIONS FROM NEW AND IN-USE HIGHWAY VEHICLES AND ENGINES...

  16. 40 CFR Appendix Xviii to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 20 2013-07-01 2013-07-01 false Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks Certifying to the Provisions of Part 86... (CONTINUED) AIR PROGRAMS (CONTINUED) CONTROL OF EMISSIONS FROM NEW AND IN-USE HIGHWAY VEHICLES AND ENGINES...

  17. A Statistics-based Platform for Quantitative N-terminome Analysis and Identification of Protease Cleavage Products

    PubMed Central

    auf dem Keller, Ulrich; Prudova, Anna; Gioia, Magda; Butler, Georgina S.; Overall, Christopher M.

    2010-01-01

    Terminal amine isotopic labeling of substrates (TAILS), our recently introduced platform for quantitative N-terminome analysis, enables wide dynamic range identification of original mature protein N-termini and protease cleavage products. Modifying TAILS by use of isobaric tag for relative and absolute quantification (iTRAQ)-like labels for quantification together with a robust statistical classifier derived from experimental protease cleavage data, we report reliable and statistically valid identification of proteolytic events in complex biological systems in MS2 mode. The statistical classifier is supported by a novel parameter evaluating ion intensity-dependent quantification confidences of single peptide quantifications, the quantification confidence factor (QCF). Furthermore, the isoform assignment score (IAS) is introduced, a new scoring system for the evaluation of single peptide-to-protein assignments based on high confidence protein identifications in the same sample prior to negative selection enrichment of N-terminal peptides. By these approaches, we identified and validated, in addition to known substrates, low abundance novel bioactive MMP-2 targets including the plasminogen receptor S100A10 (p11) and the proinflammatory cytokine proEMAP/p43 that were previously undescribed. PMID:20305283

  18. Circularly-symmetric complex normal ratio distribution for scalar transmissibility functions. Part III: Application to statistical modal analysis

    NASA Astrophysics Data System (ADS)

    Yan, Wang-Ji; Ren, Wei-Xin

    2018-01-01

    This study applies the theoretical findings on the circularly-symmetric complex normal ratio distribution (Yan and Ren (2016) [1,2]) to transmissibility-based modal analysis from a statistical viewpoint. A probabilistic model of the transmissibility function in the vicinity of the resonant frequency is formulated in the modal domain, and some insightful comments are offered. It is revealed theoretically that the statistics of the transmissibility function around the resonant frequency depend solely on the 'noise-to-signal' ratio and the mode shapes. As a sequel to the development of the probabilistic model of the transmissibility function in the modal domain, this study poses the process of modal identification in a Bayesian framework by borrowing a novel paradigm. Implementation issues unique to the proposed approach are resolved by the Lagrange multiplier approach. This study also explores the possibility of applying Bayesian analysis to distinguish harmonic components from structural ones. The approaches are verified using simulated data and experimental test data. The uncertainty behavior due to variation of different factors is also discussed in detail.

  19. Bayesian 2-Stage Space-Time Mixture Modeling With Spatial Misalignment of the Exposure in Small Area Health Data.

    PubMed

    Lawson, Andrew B; Choi, Jungsoon; Cai, Bo; Hossain, Monir; Kirby, Russell S; Liu, Jihong

    2012-09-01

    We develop a new Bayesian two-stage space-time mixture model to investigate the effects of air pollution on asthma. The two-stage mixture model proposed allows for the identification of temporal latent structure as well as the estimation of the effects of covariates on health outcomes. In the paper, we also consider spatial misalignment of exposure and health data. A simulation study is conducted to assess the performance of the 2-stage mixture model. We apply our statistical framework to a county-level ambulatory care asthma data set in the US state of Georgia for the years 1999-2008.

  20. Accounting for cell lineage and sex effects in the identification of cell-specific DNA methylation using a Bayesian model selection algorithm.

    PubMed

    White, Nicole; Benton, Miles; Kennedy, Daniel; Fox, Andrew; Griffiths, Lyn; Lea, Rodney; Mengersen, Kerrie

    2017-01-01

    Cell- and sex-specific differences in DNA methylation are major sources of epigenetic variation in whole blood. Heterogeneity attributable to cell type has motivated the identification of cell-specific methylation at the CpG level; however, statistical methods for this purpose have been limited to pairwise comparisons between cell types or between the cell type of interest and whole blood. We developed a Bayesian model selection algorithm for the identification of cell-specific methylation profiles that incorporates knowledge of shared cell lineage and allows for the identification of differential methylation profiles in one or more cell types simultaneously. Under the proposed methodology, sex-specific differences in methylation by cell type are also assessed. Using publicly available, cell-sorted methylation data, we show that 51.3% of female CpG markers and 61.4% of male CpG markers identified were associated with differential methylation in more than one cell type. The impact of cell lineage on differential methylation was also highlighted. An evaluation of sex-specific differences revealed differences in CD56+NK methylation, within both single- and multi-cell dependent methylation patterns. Our findings demonstrate the need to account for cell lineage in studies of differential methylation and associated sex effects.

  1. ERBE Geographic Scene and Monthly Snow Data

    NASA Technical Reports Server (NTRS)

    Coleman, Lisa H.; Flug, Beth T.; Gupta, Shalini; Kizer, Edward A.; Robbins, John L.

    1997-01-01

    The Earth Radiation Budget Experiment (ERBE) is a multisatellite system designed to measure the Earth's radiation budget. The ERBE data processing system consists of several software packages or sub-systems, each designed to perform a particular task. The primary task of the Inversion Subsystem is to reduce satellite altitude radiances to fluxes at the top of the Earth's atmosphere. To accomplish this, angular distribution models (ADM's) are required. These ADM's are a function of viewing and solar geometry and of the scene type as determined by the ERBE scene identification algorithm which is a part of the Inversion Subsystem. The Inversion Subsystem utilizes 12 scene types which are determined by the ERBE scene identification algorithm. The scene type is found by combining the most probable cloud cover, which is determined statistically by the scene identification algorithm, with the underlying geographic scene type. This Contractor Report describes how the geographic scene type is determined on a monthly basis.

  2. Biophysical model for assessment of risk of acute exposures in combination with low level chronic irradiation

    NASA Astrophysics Data System (ADS)

    Smirnova, O. A.

    A biophysical model is developed which describes the mortality dynamics in mammalian populations unexposed and exposed to radiation. The model relates statistical biometric functions (mortality rate, life span probability density, and life span probability) with statistical characteristics and dynamics of a critical body system in individuals composing the population. The model describing the dynamics of thrombocytopoiesis in nonirradiated and irradiated mammals is also developed, this hematopoietic line being considered as the critical body system under the exposures in question. The mortality model constructed in the framework of the proposed approach was identified to reproduce the irradiation effects on populations of mice. Most parameters of the thrombocytopoiesis model were determined from the data available in the literature on hematology and radiobiology; the remaining parameters were evaluated by fitting some experimental data on the dynamics of this system in acutely irradiated mice. The thrombocytopoiesis model was successfully verified by quantitative comparison of the modeling predictions with experimental data on the dynamics of this system in mice exposed to either acute or chronic irradiation over wide ranges of doses and dose rates. It is important that only experimental data on the mortality rate in a nonirradiated population and the relevant statistical characteristics of the thrombocytopoiesis system in mice, which are also available in the literature on radiobiology, are needed for the final identification of

  3. PhenStat | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    PhenStat is a freely available R package that provides a variety of statistical methods for the identification of phenotypic associations from model organisms, developed for the International Mouse Phenotyping Consortium (IMPC, at www.mousephenotype.org). The methods have been developed for high-throughput phenotyping pipelines implemented across various experimental designs, with an emphasis on managing temporal variation, and are being adapted for analysis with PDX mouse strains.

  4. Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

    PubMed

    Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

    2013-01-01

    Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.

  5. Polypropylene Production Optimization in Fluidized Bed Catalytic Reactor (FBCR): Statistical Modeling and Pilot Scale Experimental Validation

    PubMed Central

    Khan, Mohammad Jakir Hossain; Hussain, Mohd Azlan; Mujtaba, Iqbal Mohammed

    2014-01-01

    Polypropylene is one type of plastic that is widely used in our everyday life. This study focuses on the identification and justification of the optimum process parameters for polypropylene production in a novel pilot-plant-based fluidized bed reactor. This first-of-its-kind statistical modeling with experimental validation for the process parameters of polypropylene production was conducted by applying the ANOVA (analysis of variance) method to Response Surface Methodology (RSM). Three important process variables, i.e., reaction temperature, system pressure and hydrogen percentage, were considered as the input factors for polypropylene production in the analysis performed. In order to examine the effect of process parameters and their interactions, the ANOVA method was utilized among a range of other statistical diagnostic tools such as the correlation between actual and predicted values, the residuals and predicted response, outlier t plot, 3D response surface and contour analysis plots. The statistical analysis showed that the proposed quadratic model had a good fit with the experimental results. At optimum conditions with temperature of 75°C, system pressure of 25 bar and hydrogen percentage of 2%, the highest polypropylene production obtained is 5.82% per pass. Hence it is concluded that the developed experimental design and proposed model can be successfully employed with over a 95% confidence level for optimum polypropylene production in a fluidized bed catalytic reactor (FBCR). PMID:28788576

  6. Effective connectivity: Influence, causality and biophysical modeling

    PubMed Central

    Valdes-Sosa, Pedro A.; Roebroeck, Alard; Daunizeau, Jean; Friston, Karl

    2011-01-01

    This is the final paper in a Comments and Controversies series dedicated to “The identification of interacting networks in the brain using fMRI: Model selection, causality and deconvolution”. We argue that discovering effective connectivity depends critically on state-space models with biophysically informed observation and state equations. These models have to be endowed with priors on unknown parameters and afford checks for model identifiability. We consider the similarities and differences among Dynamic Causal Modeling, Granger Causal Modeling and other approaches. We establish links between past and current statistical causal modeling, in terms of Bayesian dependency graphs and Wiener–Akaike–Granger–Schweder influence measures. We show that some of the challenges faced in this field have promising solutions and speculate on future developments. PMID:21477655

  7. Synthesis fidelity and time-varying spectral change in vowels

    NASA Astrophysics Data System (ADS)

    Assmann, Peter F.; Katz, William F.

    2005-02-01

    Recent studies have shown that synthesized versions of American English vowels are less accurately identified when the natural time-varying spectral changes are eliminated by holding the formant frequencies constant over the duration of the vowel. A limitation of these experiments has been that vowels produced by formant synthesis are generally less accurately identified than the natural vowels after which they are modeled. To overcome this limitation, a high-quality speech analysis-synthesis system (STRAIGHT) was used to synthesize versions of 12 American English vowels spoken by adults and children. Vowels synthesized with STRAIGHT were identified as accurately as the natural versions, in contrast with previous results from our laboratory showing identification rates 9%-12% lower for the same vowels synthesized using the cascade formant model. Consistent with earlier studies, identification accuracy was not reduced when the fundamental frequency was held constant across the vowel. However, elimination of time-varying changes in the spectral envelope using STRAIGHT led to a greater reduction in accuracy (23%) than was previously found with cascade formant synthesis (11%). A statistical pattern recognition model, applied to acoustic measurements of the natural and synthesized vowels, predicted both the higher identification accuracy for vowels synthesized using STRAIGHT compared to formant synthesis, and the greater effects of holding the formant frequencies constant over time with STRAIGHT synthesis. Taken together, the experiment and modeling results suggest that formant estimation errors and incorrect rendering of spectral and temporal cues by cascade formant synthesis contribute to lower identification accuracy and underestimation of the role of time-varying spectral change in vowels.

  8. Stochastic modeling of sunshine number data

    NASA Astrophysics Data System (ADS)

    Brabec, Marek; Paulescu, Marius; Badescu, Viorel

    2013-11-01

    In this paper, we present a unified statistical modeling framework for estimating and forecasting sunshine number (SSN) data. Sunshine number was proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and has since been shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be fitted relatively easily via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using the generalized additive model (GAM) approach, we can fit and compare models of various complexity while keeping the physical interpretation of the statistical model and its parts. After introducing the Markovian model and the general approach for identification of its parameters, we illustrate its use and performance on high-resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
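
    The simplest member of this model class can be sketched directly: a homogeneous first-order Markov chain on a binary series, with no external covariates. In that special case the maximum likelihood estimates of the transition probabilities are the observed transition frequencies, and they map onto a logistic regression of the current state on the previous state. The series below is hypothetical, and the covariate-dependent and GAM extensions described in the abstract are not shown.

```python
import math


def transition_probabilities(series):
    """MLE of P(current = 1 | previous = s) for s in {0, 1} from a binary series."""
    counts = {0: [0, 0], 1: [0, 0]}  # counts[previous][current]
    for prev, curr in zip(series, series[1:]):
        counts[prev][curr] += 1
    probs = {}
    for prev, (n0, n1) in counts.items():
        total = n0 + n1
        probs[prev] = n1 / total if total else float("nan")
    return probs


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical sunshine number series (1 = sun shining, 0 = sun covered).
ssn = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
probs = transition_probabilities(ssn)

# Equivalent logistic parameterization: logit P(1 | prev) = intercept + slope * prev.
intercept = logit(probs[0])
slope = logit(probs[1]) - logit(probs[0])
print(probs, intercept, slope)
```

    A positive slope indicates persistence: a sunny instant is more likely to follow a sunny instant than a covered one.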

  9. Stochastic modeling of sunshine number data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brabec, Marek, E-mail: mbrabec@cs.cas.cz; Paulescu, Marius; Badescu, Viorel

    2013-11-13

    In this paper, we will present a unified statistical modeling framework for estimating and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been a challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within a logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via a maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, the logistic Markov model class allows us to test hypotheses about how SSN depends on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using a generalized additive model (GAM) approach, we can fit and compare models of varying complexity that retain the physical interpretation of the statistical model and its parts. After introducing the Markovian model and the general approach for identification of its parameters, we will illustrate its use and performance on high-resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
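
    The link between Markov transition probabilities and logistic regression described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible case, which is not the authors' full model: a first-order chain with the previous state as the only covariate, logit P(y_t = 1 | y_{t-1}) = b0 + b1 * y_{t-1}. That model is saturated, so its maximum likelihood estimates follow directly from transition counts. The function name and the toy series are hypothetical.

```python
import math

def fit_logistic_markov(series):
    """ML fit of logit P(y_t=1 | y_{t-1}) = b0 + b1*y_{t-1} for a binary series.

    With the lagged state as the only covariate the model is saturated,
    so the ML estimates equal the empirical transition frequencies.
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(series, series[1:]):
        counts[(prev, cur)] += 1

    # Estimated transition probabilities into state 1 (sun shining).
    p01 = counts[(0, 1)] / (counts[(0, 0)] + counts[(0, 1)])
    p11 = counts[(1, 1)] / (counts[(1, 0)] + counts[(1, 1)])

    def logit(p):
        return math.log(p / (1.0 - p))

    b0 = logit(p01)        # intercept: log-odds of sun after a covered step
    b1 = logit(p11) - b0   # effect of the previous step being sunny
    return p01, p11, b0, b1

# Hypothetical toy SSN series (1 = sun visible, 0 = sun covered).
ssn = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
p01, p11, b0, b1 = fit_logistic_markov(ssn)
print(f"P(1|0)={p01:.3f}  P(1|1)={p11:.3f}  b0={b0:.3f}  b1={b1:.3f}")
```

    Covariates such as elevation angle or solar time would enter as further terms in the linear predictor; the estimates then no longer reduce to counts and need an iterative fit.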

  10. [The informational support of statistical observation related to children disability].

    PubMed

    Son, I M; Polikarpov, A V; Ogrizko, E V; Golubeva, T Yu

    2016-01-01

    Within the framework of the Convention on the Rights of Persons with Disabilities, the criteria for identifying disability in children are being revised and the system of medical and social expertise is being reformed in line with international standards for health and health-related indices. It is therefore important to consider the relationship between changes in the forms of Federal statistical monitoring used to register disabled children in the Russian Federation and the classifications of health and health-related indices applied in identifying disability. The article analyzes this relationship for the classifications used in identifying disability: the International Classification of Impairments, Disabilities and Handicap (ICDH), the International Classification of Functioning, Disability and Health (ICF), and the ICF version for children and youth (ICF-CY). Intersectoral interaction is considered within the framework of statistics on childhood disability.

  11. An error-dependent model of instrument-scanning behavior in commercial airline pilots. Ph.D. Thesis - May 1983

    NASA Technical Reports Server (NTRS)

    Jones, D. H.

    1985-01-01

    A new flexible model of pilot instrument scanning behavior is presented which assumes that the pilot uses a set of deterministic scanning patterns on the pilot's perception of error in the state of the aircraft, and the pilot's knowledge of the interactive nature of the aircraft's systems. Statistical analyses revealed that a three stage Markov process composed of the pilot's three predicted lookpoints (LP), occurring 1/30, 2/30, and 3/30 of a second prior to each LP, accurately modelled the scanning behavior of 14 commercial airline pilots while flying steep turn maneuvers in a Boeing 737 flight simulator. The modelled scanning data for each pilot were not statistically different from the observed scanning data in comparisons of mean dwell time, entropy, and entropy rate. These findings represent the first direct evidence that pilots are using deterministic scanning patterns during instrument flight. The results are interpreted as direct support for the error dependent model and suggestions are made for further research that could allow for identification of the specific scanning patterns suggested by the model.

  12. A theory of fine structure image models with an application to detection and classification of dementia

    PubMed Central

    Penn, Richard; Werner, Michael; Thomas, Justin

    2015-01-01

    Background: Estimation of stochastic process models from data is a common application of time series analysis methods. Such system identification processes are often cast as hypothesis testing exercises whose intent is to estimate model parameters and test them for statistical significance. Ordinary least squares (OLS) regression and the Levenberg-Marquardt algorithm (LMA) have proven invaluable computational tools for models being described by non-homogeneous, linear, stationary, ordinary differential equations. Methods: In this paper we extend stochastic model identification to linear, stationary, partial differential equations in two independent variables (2D) and show that OLS and LMA apply equally well to these systems. The method employs an original nonparametric statistic as a test for the significance of estimated parameters. Results: We show gray scale and color images are special cases of 2D systems satisfying a particular autoregressive partial difference equation which estimates an analogous partial differential equation. Several applications to medical image modeling and classification illustrate the method by correctly classifying demented and normal OLS models of axial magnetic resonance brain scans according to subject Mini Mental State Exam (MMSE) scores. Comparison with 13 image classifiers from the literature indicates our classifier is at least 14 times faster than any of them and has a classification accuracy better than all but one. Conclusions: Our modeling method applies to any linear, stationary, partial differential equation and the method is readily extended to 3D whole-organ systems. Further, in addition to being a robust image classifier, estimated image models offer insights into which parameters carry the most diagnostic image information and thereby suggest finer divisions could be made within a class. Image models can be estimated in milliseconds which translate to whole-organ models in seconds; such runtimes could make real-time medicine and surgery modeling possible. PMID:26029638

  13. Inclusion probability for DNA mixtures is a subjective one-sided match statistic unrelated to identification information

    PubMed Central

    Perlin, Mark William

    2015-01-01

    Background: DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned. Materials and Methods: The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI^(-1) value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI^(-1)) values were examined and compared with corresponding log(LR) values. Results: The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI^(-1) increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention. Conclusions: Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice. PMID:26605124
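
    The dependence of the inverse CPI on the number of loci can be made concrete with a short sketch. It uses the standard textbook definition, in which the probability of inclusion at a locus is the squared sum of the population frequencies of the alleles observed in the mixture and the CPI is the product over loci. The allele frequencies below are hypothetical and are not taken from the study.

```python
import math

def probability_of_inclusion(allele_freqs):
    """PI at one locus: chance that a random person carries only alleles
    observed in the mixture, i.e. (sum of their frequencies) squared."""
    return sum(allele_freqs) ** 2

def inverse_cpi(loci):
    """Inverse combined probability of inclusion over independent loci."""
    cpi = math.prod(probability_of_inclusion(freqs) for freqs in loci)
    return 1.0 / cpi

# Hypothetical locus: four mixture alleles, each with frequency 0.15.
locus = [0.15, 0.15, 0.15, 0.15]

# Every locus contributes a factor greater than 1, so log(1/CPI) is a sum of
# positive per-locus terms and grows with the number of loci tested.
results = {n: inverse_cpi([locus] * n) for n in (1, 5, 13)}
for n, value in results.items():
    print(f"{n:2d} loci: 1/CPI = {value:,.0f}")
```

    Nothing in this calculation depends on the genotype of the person being compared, which is the sense in which the abstract calls the statistic one-sided.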

  14. Inclusion probability for DNA mixtures is a subjective one-sided match statistic unrelated to identification information.

    PubMed

    Perlin, Mark William

    2015-01-01

    DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned. The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI^(-1) value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI^(-1)) values were examined and compared with corresponding log(LR) values. The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI^(-1) increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention. Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice.

  15. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 3: A stochastic rain fade control algorithm for satellite link power via nonlinear Markov filtering theory

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1991-01-01

    The dynamic and composite nature of propagation impairments that are incurred on Earth-space communications links at frequencies in and above 30/20 GHz Ka band, i.e., rain attenuation, cloud and/or clear air scintillation, etc., combined with the need to counter such degradations after the small link margins have been exceeded, necessitate the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) Project by the implementation of optimal processing schemes derived through the use of the Rain Attenuation Prediction Model and nonlinear Markov filtering theory.

  16. Proteomic Workflows for Biomarker Identification Using Mass Spectrometry — Technical and Statistical Considerations during Initial Discovery

    PubMed Central

    Orton, Dennis J.; Doucette, Alan A.

    2013-01-01

    Identification of biomarkers capable of differentiating between pathophysiological states of an individual is a laudable goal in the field of proteomics. Protein biomarker discovery generally employs high throughput sample characterization by mass spectrometry (MS), being capable of identifying and quantifying thousands of proteins per sample. While MS-based technologies have rapidly matured, the identification of truly informative biomarkers remains elusive, with only a handful of clinically applicable tests stemming from proteomic workflows. This underlying lack of progress is attributed in large part to erroneous experimental design, biased sample handling, as well as improper statistical analysis of the resulting data. This review will discuss in detail the importance of experimental design and provide some insight into the overall workflow required for biomarker identification experiments. Proper balance between the degree of biological vs. technical replication is required for confident biomarker identification. PMID:28250400

  17. A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics

    PubMed Central

    Stanley, Jeffrey R.; Adkins, Joshua N.; Slysz, Gordon W.; Monroe, Matthew E.; Purvine, Samuel O.; Karpievitch, Yuliya V.; Anderson, Gordon A.; Smith, Richard D.; Dabney, Alan R.

    2011-01-01

    Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, there is uncertainty in the identification of AMT tags, as this is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referred to as Statistical Tools for AMT tag Confidence (STAC). STAC additionally provides a Uniqueness Probability (UP) to help distinguish between multiple matches to an AMT tag and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download as both a command line and a Windows graphical application. PMID:21692516

  18. Multivariate Statistical Analysis of Orthogonal Mass Spectral Data for the Identification of Chemical Attribution Signatures of 3-Methylfentanyl

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Valdez, C. A.; DeHope, A. J.

    Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.

  19. The modified turning bands (MTB) model for space-time rainfall. I. Model definition and properties

    NASA Astrophysics Data System (ADS)

    Mellor, Dale

    1996-02-01

    A new stochastic model of space-time rainfall, the Modified Turning Bands (MTB) model, is proposed which reproduces, in particular, the movements and developments of rainbands, cluster potential regions and raincells, as well as their respective interactions. The ensemble correlation structure is unsuitable for practical estimation of the model parameters because the model is not ergodic in this statistic, and hence it cannot easily be measured from a single real storm. Thus, some general theory on the internal covariance structure of a class of stochastic models is presented, of which the MTB model is an example. It is noted that, for the MTB model, the internal covariance structure may be measured from a single storm, and can thus be used for model identification.

  20. Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Na, Seungjin; Payne, Samuel H.; Bandeira, Nuno

    The spectral networks approach enables the detection of pairs of spectra from related peptides and thus allows for the propagation of annotations from identified peptides to unidentified spectra. Beyond allowing for unbiased discovery of unexpected post-translational modifications, spectral networks are also applicable to multi-species comparative proteomics or metaproteomics to identify numerous orthologous versions of a protein. We present algorithmic and statistical advances in spectral networks that have made it possible to rigorously assess the statistical significance of spectral pairs and accurately estimate the error rate of identifications via propagation. In the analysis of three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides with highly divergent sequences with up to dozens of variants per peptide, including many novel peptides in species that lack a sequenced genome. Furthermore, spectral networks strongly suggested the presence of novel peptides even in genomically characterized species (i.e. missing from databases) in that a significant portion of unidentified multi-species networks included at least two polymorphic peptide variants.

  1. Automated Identification and Shape Analysis of Chorus Elements in the Van Allen Radiation Belts

    NASA Astrophysics Data System (ADS)

    Sen Gupta, Ananya; Kletzing, Craig; Howk, Robin; Kurth, William; Matheny, Morgan

    2017-12-01

    An important goal of the Van Allen Probes mission is to understand wave-particle interaction by chorus emissions in terrestrial Van Allen radiation belts. To test models, statistical characterization of chorus properties, such as amplitude variation and sweep rates, is an important scientific goal. The Electric and Magnetic Field Instrument Suite and Integrated Science (EMFISIS) instrumentation suite provides measurements of wave electric and magnetic fields as well as DC magnetic fields for the Van Allen Probes mission. However, manual inspection across terabytes of EMFISIS data is not feasible and as such introduces human confirmation bias. We present signal processing techniques for automated identification, shape analysis, and sweep rate characterization of high-amplitude whistler-mode chorus elements in the Van Allen radiation belts. Specifically, we develop signal processing techniques based on the radon transform that disambiguate chorus elements with a dominant sweep rate against hiss-like chorus. We present representative results validating our techniques and also provide statistical characterization of detected chorus elements across a case study of a 6 s epoch.

  2. Effective techniques for the identification and accommodation of disturbances

    NASA Technical Reports Server (NTRS)

    Johnson, C. D.

    1989-01-01

    The successful control of dynamic systems such as space stations, or launch vehicles, requires a controller design methodology that acknowledges and addresses the disruptive effects caused by external and internal disturbances that inevitably act on such systems. These disturbances, technically defined as uncontrollable inputs, typically vary with time in an uncertain manner and usually cannot be directly measured in real time. A relatively new non-statistical technique for modeling, and (on-line) identification, of those complex uncertain disturbances that are not as erratic and capricious as random noise is described. This technique applies to multi-input cases and to many of the practical disturbances associated with the control of space stations, or launch vehicles. Then, a collection of smart controller design techniques that allow controlled dynamic systems, with possible multi-input controls, to accommodate (cope with) such disturbances with extraordinary effectiveness are associated. These new smart controllers are designed by non-statistical techniques and typically turn out to be unconventional forms of dynamic linear controllers (compensators) with constant coefficients. The simplicity and reliability of linear, constant coefficient controllers is well-known in the aerospace field.

  3. Aftershock identification problem via the nearest-neighbor analysis for marked point processes

    NASA Astrophysics Data System (ADS)

    Gabrielov, A.; Zaliapin, I.; Wong, H.; Keilis-Borok, V.

    2007-12-01

    The centennial observations on the world seismicity have revealed a wide variety of clustering phenomena that unfold in the space-time-energy domain and provide the most reliable information about the earthquake dynamics. However, there is neither a unifying theory nor a convenient statistical apparatus that would naturally account for the different types of seismic clustering. In this talk we present a theoretical framework for nearest-neighbor analysis of marked processes and obtain new results on the hierarchical approach to studying seismic clustering introduced by Baiesi and Paczuski (2004). Recall that under this approach one defines an asymmetric distance D in the space-time-energy domain such that the nearest-neighbor spanning graph with respect to D becomes a time-oriented tree. We demonstrate how this approach can be used to detect earthquake clustering. We apply our analysis to the observed seismicity of California and synthetic catalogs from the ETAS model and show that the earthquake clustering part is statistically different from the homogeneous part. This finding may serve as a basis for an objective aftershock identification procedure.
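
    The asymmetric distance and the resulting time-oriented tree can be sketched as follows. The abstract does not state the form of D; the sketch assumes the commonly quoted Baiesi-Paczuski form, D_ij = t_ij * r_ij^d * 10^(-b * m_i) for a later event j and an earlier event i, and D_ij = infinity otherwise. The parameter values b and d, the flat (x, y) coordinates, and the toy catalog are all hypothetical.

```python
import math

def nn_distance(parent, child, b=1.0, d=1.6):
    """Assumed Baiesi-Paczuski-type distance from an earlier event to a later one.

    Infinite when the candidate parent is not strictly earlier, which is what
    makes the nearest-neighbor graph a time-oriented tree.
    """
    dt = child["t"] - parent["t"]
    if dt <= 0:
        return math.inf
    r = math.hypot(child["x"] - parent["x"], child["y"] - parent["y"])
    return dt * r ** d * 10.0 ** (-b * parent["m"])

def nearest_parents(events, b=1.0, d=1.6):
    """Index of each event's nearest earlier neighbor (None for tree roots)."""
    parents = []
    for child in events:
        dists = [nn_distance(e, child, b, d) for e in events]
        i = min(range(len(events)), key=dists.__getitem__)
        parents.append(i if math.isfinite(dists[i]) else None)
    return parents

# Hypothetical catalog: time in days, position in km, magnitude m.
events = [
    {"t": 0.0, "x": 0.0, "y": 0.0, "m": 6.0},    # large early event
    {"t": 1.0, "x": 2.0, "y": 0.0, "m": 3.0},
    {"t": 2.0, "x": 3.0, "y": 0.0, "m": 2.5},
    {"t": 3.0, "x": 200.0, "y": 0.0, "m": 4.0},  # distant later event
]
parents = nearest_parents(events)
print(parents)
```

    In practice an aftershock identification rule would also threshold the nearest-neighbor distance, so that events with a large distance to their parent are treated as background rather than as clustered events.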

  4. 3D topography measurements on correlation cells—a new approach to forensic ballistics identifications

    NASA Astrophysics Data System (ADS)

    Song, John; Chu, Wei; Tong, Mingsi; Soons, Johannes

    2014-06-01

    Based on three-dimensional (3D) topography measurements on correlation cells, the National Institute of Standards and Technology (NIST) has developed the ‘NIST Ballistics Identification System (NBIS)’ aimed at accurate ballistics identifications and fast ballistics evidence searches. The 3D topographies are divided into arrays of correlation cells to identify ‘valid correlation areas’ and eliminate ‘invalid correlation areas’ from the matching and identification procedure. A ‘congruent matching cells (CMC)’ method using three types of identification parameters of the paired correlation cells (cross correlation function maximum CCFmax, spatial registration position in x-y and registration angle θ) is used for high accuracy ballistics identifications. ‘Synchronous processing’ is proposed for correlating multiple cell pairs at the same time to increase the correlation speed. The proposed NBIS can be used for correlations of both geometrical topographies and optical intensity images. All the correlation parameters and algorithms are in the public domain and subject to open tests. An error rate reporting procedure has been developed that can greatly add to the scientific support for the firearm and toolmark identification specialty, and give confidence to the trier of fact in court proceedings. The NBIS is engineered to employ transparent identification parameters and criteria, statistical models and correlation algorithms. In this way, interoperability between different ballistics identification systems can be more easily achieved. This interoperability will make the NBIS suitable for ballistics identifications and evidence searches with large national databases, such as the National Integrated Ballistic Information Network in the United States.

  5. Experimental design and statistical methods for improved hit detection in high-throughput screening.

    PubMed

    Malo, Nathalie; Hanley, James A; Carlile, Graeme; Liu, Jing; Pelletier, Jerry; Thomas, David; Nadon, Robert

    2010-09-01

    Identification of active compounds in high-throughput screening (HTS) contexts can be substantially improved by applying classical experimental design and statistical inference principles to all phases of HTS studies. The authors present both experimental and simulated data to illustrate how true-positive rates can be maximized without increasing false-positive rates by the following analytical process. First, the use of robust data preprocessing methods reduces unwanted variation by removing row, column, and plate biases. Second, replicate measurements allow estimation of the magnitude of the remaining random error and the use of formal statistical models to benchmark putative hits relative to what is expected by chance. Receiver Operating Characteristic (ROC) analyses revealed superior power for data preprocessed by a trimmed-mean polish method combined with the RVM t-test, particularly for small- to moderate-sized biological hits.
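
    The first preprocessing step, removal of row and column biases from a plate, can be sketched as a polish procedure that alternately subtracts trimmed row means and trimmed column means. This is a generic analogue of median polish written for illustration; it is not claimed to be the authors' exact trimmed-mean polish, and the trimming fraction, iteration count, and toy plate are hypothetical.

```python
def trimmed_mean(values, trim=0.25):
    """Mean after dropping a fraction `trim` (< 0.5) of values from each end."""
    s = sorted(values)
    k = int(len(s) * trim)
    core = s[k:len(s) - k]
    return sum(core) / len(core)

def trimmed_mean_polish(plate, trim=0.25, n_iter=5):
    """Alternately remove trimmed row and column means; return the residuals."""
    resid = [row[:] for row in plate]
    n_cols = len(resid[0])
    for _ in range(n_iter):
        for row in resid:                      # sweep out row effects
            m = trimmed_mean(row, trim)
            for c in range(n_cols):
                row[c] -= m
        for c in range(n_cols):                # sweep out column effects
            m = trimmed_mean([row[c] for row in resid], trim)
            for row in resid:
                row[c] -= m
    return resid

# Hypothetical 4 x 5 plate: additive row and column biases plus one true hit.
plate = [[100.0 + 5.0 * r + 2.0 * c for c in range(5)] for r in range(4)]
plate[1][2] += 50.0

residuals = trimmed_mean_polish(plate)
for row in residuals:
    print(["%6.2f" % v for v in row])
```

    Because the extreme values are trimmed before each mean is taken, the hit has little influence on the estimated biases and remains visible in the residuals, which can then be passed to a replicate-based test.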

  6. Alphabetic letter identification: Effects of perceivability, similarity, and bias

    PubMed Central

    Mueller, Shane T.; Weidemann, Christoph T.

    2012-01-01

    The legibility of the letters in the Latin alphabet has been measured numerous times since the beginning of experimental psychology. To identify the theoretical mechanisms attributed to letter identification, we report a comprehensive review of literature, spanning more than a century. This review revealed that identification accuracy has frequently been attributed to a subset of three common sources: perceivability, bias, and similarity. However, simultaneous estimates of these values have rarely (if ever) been performed. We present the results of two new experiments which allow for the simultaneous estimation of these factors, and examine how the shape of a visual mask impacts each of them, as inferred through a new statistical model. Results showed that the shape and identity of the mask impacted the inferred perceivability, bias, and similarity space of a letter set, but that there were aspects of similarity that were robust to the choice of mask. The results illustrate how the psychological concepts of perceivability, bias, and similarity can be estimated simultaneously, and how each make powerful contributions to visual letter identification. PMID:22036587

  7. The Use of Growth Mixture Modeling for Studying Resilience to Major Life Stressors in Adulthood and Old Age: Lessons for Class Size and Identification and Model Selection.

    PubMed

    Infurna, Frank J; Grimm, Kevin J

    2017-12-15

    Growth mixture modeling (GMM) combines latent growth curve and mixture modeling approaches and is typically used to identify discrete trajectories following major life stressors (MLS). However, GMM is often applied to data that does not meet the statistical assumptions of the model (e.g., within-class normality) and researchers often do not test additional model constraints (e.g., homogeneity of variance across classes), which can lead to incorrect conclusions regarding the number and nature of the trajectories. We evaluate how these methodological assumptions influence trajectory size and identification in the study of resilience to MLS. We use data on changes in subjective well-being and depressive symptoms following spousal loss from the HILDA and HRS. Findings drastically differ when constraining the variances to be homogenous versus heterogeneous across trajectories, with overextraction being more common when constraining the variances to be homogeneous across trajectories. In instances, when the data are non-normally distributed, assuming normally distributed data increases the extraction of latent classes. Our findings showcase that the assumptions typically underlying GMM are not tenable, influencing trajectory size and identification and most importantly, misinforming conceptual models of resilience. The discussion focuses on how GMM can be leveraged to effectively examine trajectories of adaptation following MLS and avenues for future research.

  8. PSSMSearch: a server for modeling, visualization, proteome-wide discovery and annotation of protein motif specificity determinants.

    PubMed

    Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E

    2018-06-05

    There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
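
    The idea of building a PSSM from aligned motif-containing peptides and scanning a protein sequence with it can be sketched as below. The abstract notes that several scoring methods are available; this sketch assumes one generic choice, log-odds scores with a pseudocount against a uniform background, and does not reproduce the server's own implementation. The peptides, the sequence, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(peptides, background=None, pseudocount=1.0):
    """Log-odds PSSM (one dict per motif position) from equal-length peptides."""
    if background is None:
        # Assumed uniform background frequencies, for illustration only.
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        pssm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background[a])
            for a in AMINO_ACIDS
        })
    return pssm

def score_window(pssm, window):
    """Sum of position-specific scores for one window of motif length."""
    return sum(column[a] for column, a in zip(pssm, window))

def best_match(pssm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned motif instances and a hypothetical protein fragment.
peptides = ["PTAP", "PSAP", "PTAP", "PSAP"]
pssm = build_pssm(peptides)
best_score, best_start = best_match(pssm, "MKLPTAPEEG")
print(best_start, round(best_score, 2))
```

    A significance estimate, as described in the abstract, would additionally compare each window score with the score distribution expected across the proteome; that step is omitted here.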

  9. Statistical Use of Argonaute Expression and RISC Assembly in microRNA Target Identification

    PubMed Central

    Stanhope, Stephen A.; Sengupta, Srikumar; den Boon, Johan; Ahlquist, Paul; Newton, Michael A.

    2009-01-01

    MicroRNAs (miRNAs) posttranscriptionally regulate targeted messenger RNAs (mRNAs) by inducing cleavage or otherwise repressing their translation. We address the problem of detecting m/miRNA targeting relationships in Homo sapiens from microarray data by developing statistical models that are motivated by the biological mechanisms used by miRNAs. The focus of our modeling is the construction, activity, and mediation of RNA-induced silencing complexes (RISCs) competent for targeted mRNA cleavage. We demonstrate that regression models accommodating RISC abundance and controlling for other mediating factors fit the expression profiles of known target pairs substantially better than models based on m/miRNA expressions alone, and lead to verifications of computational target pair predictions that are more sensitive than those based on marginal expression levels. Because our models are fully independent of exogenous results from sequence-based computational methods, they are appropriate for use as either a primary or secondary source of information regarding m/miRNA target pair relationships, especially in conjunction with high-throughput expression studies. PMID:19779550

  10. Nonlinear identification of the total baroreflex arc: higher-order nonlinearity

    PubMed Central

    Moslehpour, Mohsen; Kawada, Toru; Sunagawa, Kenji; Sugimachi, Masaru

    2016-01-01

    The total baroreflex arc is the open-loop system relating carotid sinus pressure (CSP) to arterial pressure (AP). The nonlinear dynamics of this system were recently characterized. First, Gaussian white noise CSP stimulation was employed in open-loop conditions in normotensive and hypertensive rats with sectioned vagal and aortic depressor nerves. Nonparametric system identification was then applied to measured CSP and AP to establish a second-order nonlinear Uryson model. The aim in this study was to assess the importance of higher-order nonlinear dynamics via development and evaluation of a third-order nonlinear model of the total arc using the same experimental data. Third-order Volterra and Uryson models were developed by employing nonparametric and parametric identification methods. The R2 values between the AP predicted by the best third-order Volterra model and measured AP in response to Gaussian white noise CSP not utilized in developing the model were 0.69 ± 0.03 and 0.70 ± 0.03 for normotensive and hypertensive rats, respectively. The analogous R2 values for the best third-order Uryson model were 0.71 ± 0.03 and 0.73 ± 0.03. These R2 values were not statistically different from the corresponding values for the previously established second-order Uryson model, which were both 0.71 ± 0.03 (P > 0.1). Furthermore, none of the third-order models predicted well-known nonlinear behaviors including thresholding and saturation better than the second-order Uryson model. Additional experiments suggested that the unexplained AP variance was partly due to higher brain center activity. In conclusion, the second-order Uryson model sufficed to represent the sympathetically mediated total arc under the employed experimental conditions. PMID:27629885

  11. Climate Considerations Of The Electricity Supply Systems In Industries

    NASA Astrophysics Data System (ADS)

    Asset, Khabdullin; Zauresh, Khabdullina

    2014-12-01

    The study analyzes climate considerations of electricity supply systems in the pellet industry. The analysis model developed consists of two modules: one evaluating statistical data on active power losses and one evaluating climate aspects. The statistical data module is presented as a universal mathematical model of electrical systems and components of industrial load. It forms a basis for detailed accounting of power losses by voltage level. On the basis of the universal model, a set of programs is designed to perform the calculations and experimental research. It helps to obtain the statistical characteristics of the power losses and loads of the electricity supply systems and to define the nature of changes in these characteristics. Within the module, several methods and algorithms are developed for calculating the parameters of equivalent circuits of low- and high-voltage ADC and SD with a massive smooth rotor with laminated poles. The climate aspects module includes an analysis of experimental data from the power supply system in pellet production. It allows identification of the parameters relevant to GHG emission reduction: operating hours, type of electrical motors, load factor values, and deviation from the standard voltage value.

  12. Statistical prediction of September Arctic Sea Ice minimum based on stable teleconnections with global climate and oceanic patterns

    NASA Astrophysics Data System (ADS)

    Ionita, M.; Grosfeld, K.; Scholz, P.; Lohmann, G.

    2016-12-01

    Sea ice in both polar regions is an important indicator of global climate change and its polar amplification. Consequently, there is broad interest in information on sea ice, its coverage, variability and long-term change. Knowledge of sea ice requires high-quality data on ice extent, thickness and dynamics. However, its predictability depends on various climate parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal, we developed a robust statistical model based on ocean heat content, sea surface temperature and atmospheric variables to calculate an estimate of the September minimum sea ice extent for every year. Although previous statistical attempts at monthly/seasonal forecasts of the September sea ice minimum show relatively reduced skill, here it is shown that more than 97% (r = 0.98) of the September sea ice extent can be predicted three months in advance by using the previous months' conditions via a multiple linear regression model based on global sea surface temperature (SST), mean sea level pressure (SLP), air temperature at 850 hPa (TT850), surface winds and sea ice extent persistence. The statistical model is based on the identification of regions with stable teleconnections between the predictors (climatological parameters) and the predictand (here sea ice extent). The results based on our statistical model contribute to the sea ice prediction network for the sea ice outlook report (https://www.arcus.org/sipn) and could provide a tool for identifying relevant regions and climate parameters that are important for sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with a focus on sea ice formation.

  13. Dimensional Reduction for the General Markov Model on Phylogenetic Trees.

    PubMed

    Sumner, Jeremy G

    2017-03-01

    We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.

  14. Identification of Handicapped Students (Ages 12-17) Using Data from Teachers, Parents and Tests.

    ERIC Educational Resources Information Center

    Malgoire, Mary A.; And Others

    The report examines the identification of potentially handicapping conditions in an adolescent population using data from the National Center for Health Statistics' survey (approximately 8,000 adolescents selected in 1968). Identification of the following disabilities was examined: hearing impairment, vision problems, orthopedic handicaps, mental…

  15. Development and validation of a prediction model for functional decline in older medical inpatients.

    PubMed

    Takada, Toshihiko; Fukuma, Shingo; Yamamoto, Yosuke; Tsugihashi, Yukio; Nagano, Hiroyuki; Hayashi, Michio; Miyashita, Jun; Azuma, Teruhisa; Fukuhara, Shunichi

    2018-05-17

    To prevent functional decline in older inpatients, identification of high-risk patients is crucial. The aim of this study was to develop and validate a prediction model to assess the risk of functional decline in older medical inpatients. In this retrospective cohort study, patients ≥65 years admitted acutely to medical wards were included. The healthcare database of 246 acute care hospitals (n = 229,913) was used for derivation, and two acute care hospitals (n = 1767 and 5443, respectively) were used for validation. Data were collected using a national administrative claims and discharge database. Functional decline was defined as a decline of the Katz score at discharge compared with on admission. About 6% of patients in the derivation cohort and 9% and 2% in each validation cohort developed functional decline. A model with 7 items, age, body mass index, living in a nursing home, ambulance use, need for assistance in walking, dementia, and bedsore, was developed. On internal validation, it demonstrated a c-statistic of 0.77 (95% confidence interval (CI) = 0.767-0.771) and good fit on the calibration plot. On external validation, the c-statistics were 0.79 (95% CI = 0.77-0.81) and 0.75 (95% CI = 0.73-0.77) for each cohort, respectively. Calibration plots showed good fit in one cohort and overestimation in the other one. A prediction model for functional decline in older medical inpatients was derived and validated. It is expected that use of the model would lead to early identification of high-risk patients and introducing early intervention. Copyright © 2018 Elsevier B.V. All rights reserved.
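
    The c-statistic reported above is the probability that a randomly chosen patient who experienced the outcome was assigned a higher predicted risk than a randomly chosen patient who did not. A minimal sketch of that pairwise definition follows; the function name and the toy risks and outcomes are hypothetical and are not taken from the study.

```python
def c_statistic(risks, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Counts, over all (event, non-event) pairs, how often the event case
    has the higher predicted risk. Tied risks count as half-concordant.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data (hypothetical): predicted risks and observed decline (1 = yes).
toy_risks = [0.9, 0.8, 0.3, 0.2]
toy_outcomes = [1, 0, 1, 0]
toy_c = c_statistic(toy_risks, toy_outcomes)
```

    The double loop is quadratic in the number of patients; for a cohort the size of the derivation set, a rank-based formula would be the practical choice.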

  16. Examining the Effectiveness of Discriminant Function Analysis and Cluster Analysis in Species Identification of Male Field Crickets Based on Their Calling Songs

    PubMed Central

    Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

    2013-01-01

    Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6–7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. 
Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification. PMID:24086666

  17. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the skill level of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales. 
Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) in 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution, is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions. 
Acknowledgments: The research project is implemented within the framework of the Action "Supporting Postdoctoral Researchers" of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
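
    The quantile-quantile (Q-Q) correction compared in the abstract above can be illustrated with empirical quantile mapping: a model value is replaced by the observed value that has the same non-exceedance probability over the calibration period. The sketch below assumes a nearest-rank empirical quantile and uses hypothetical names and toy numbers; the abstract does not specify which variant of Q-Q correction the authors applied.

```python
from bisect import bisect_right


def quantile_map(value, model_calib, obs_calib):
    """Map a model value onto the observed distribution (empirical Q-Q).

    Assumption: nearest-rank quantiles, no interpolation and no
    extrapolation beyond the calibration range.
    """
    model_sorted = sorted(model_calib)
    obs_sorted = sorted(obs_calib)
    n_model = len(model_sorted)
    n_obs = len(obs_sorted)
    # Number of calibration model values that do not exceed `value`.
    count = bisect_right(model_sorted, value)
    # Nearest-rank position of the same probability in the observations,
    # computed in integer arithmetic (ceiling of count * n_obs / n_model).
    rank = max(1, (count * n_obs + n_model - 1) // n_model)
    return obs_sorted[rank - 1]


# Toy calibration samples (hypothetical daily rainfall, mm).
model_rain = [1.0, 2.0, 3.0, 4.0]
observed_rain = [10.0, 20.0, 30.0, 40.0]
corrected = quantile_map(2.0, model_rain, observed_rain)
```

    Because values beyond the calibration range are clamped to the smallest or largest observed value, a mapping of this kind depends on the calibration period for extremes, which is the sensitivity the abstract reports for yearly rainfall maxima.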

  18. Model identification using stochastic differential equation grey-box models in diabetes.

    PubMed

    Duun-Henriksen, Anne Katrine; Schmidt, Signe; Røge, Rikke Meldgaard; Møller, Jonas Bech; Nørgaard, Kirsten; Jørgensen, John Bagterp; Madsen, Henrik

    2013-03-01

    The acceptance of virtual preclinical testing of control algorithms is growing and thus also the need for robust and reliable models. Models based on ordinary differential equations (ODEs) can rarely be validated with standard statistical tools. Stochastic differential equations (SDEs) offer the possibility of building models that can be validated statistically and that are capable of predicting not only a realistic trajectory, but also the uncertainty of the prediction. In an SDE, the prediction error is split into two noise terms. This separation ensures that the errors are uncorrelated and provides the possibility to pinpoint model deficiencies. An identifiable model of the glucoregulatory system in a type 1 diabetes mellitus (T1DM) patient is used as the basis for development of a stochastic-differential-equation-based grey-box model (SDE-GB). The parameters are estimated on clinical data from four T1DM patients. The optimal SDE-GB is determined from likelihood-ratio tests. Finally, parameter tracking is used to track the variation in the "time to peak of meal response" parameter. We found that the transformation of the ODE model into an SDE-GB resulted in a significant improvement in the prediction and uncorrelated errors. Tracking of the "peak time of meal absorption" parameter showed that the absorption rate varied according to meal type. This study shows the potential of using SDE-GBs in diabetes modeling. Improved model predictions were obtained due to the separation of the prediction error. SDE-GBs offer a solid framework for using statistical tools for model validation and model development. © 2013 Diabetes Technology Society.

  19. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: (1) collect statistics from biological data; (2) build a computational model; (3) solve a computational modeling problem; and (4) test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data are usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  20. Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps

    PubMed Central

    Silver, Matt; Montana, Giovanni

    2012-01-01

    Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small. PMID:22499682
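
    The group lasso penalty mentioned above selects or discards whole groups of coefficients at once. Its core computational step is block soft-thresholding, sketched below for non-overlapping groups. This is a generic illustration, not the P-GLAW algorithm: the function name, the per-group weights argument and the toy numbers are assumptions, and the abstract does not say how overlapping pathways are handled internally.

```python
import math


def group_soft_threshold(z, groups, threshold, weights=None):
    """Proximal operator of the (weighted) group lasso penalty.

    z         : list of coefficients
    groups    : list of index lists, assumed non-overlapping
    threshold : step size times regularization strength
    weights   : optional per-group weights (default 1.0 each)

    Each group is shrunk toward zero by its Euclidean norm; groups whose
    norm falls below the cutoff are set exactly to zero.
    """
    out = list(z)
    for g, indices in enumerate(groups):
        weight = 1.0 if weights is None else weights[g]
        norm = math.sqrt(sum(z[i] ** 2 for i in indices))
        cutoff = threshold * weight
        scale = 0.0 if norm <= cutoff else 1.0 - cutoff / norm
        for i in indices:
            out[i] = scale * z[i]
    return out


# Toy example (hypothetical): two "pathways" of two SNP coefficients each.
shrunk = group_soft_threshold(
    [3.0, 4.0, 0.1, 0.2], groups=[[0, 1], [2, 3]], threshold=1.0
)
```

    In the toy call the first group (norm 5) is scaled by 0.8 and survives, while the second group (norm below 1) is removed entirely, which is how a penalty of this type performs pathway-level selection.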

  1. Methodological development for selection of significant predictors explaining fatal road accidents.

    PubMed

    Dadashova, Bahar; Arenas-Ramírez, Blanca; Mira-McWilliams, José; Aparicio-Izquierdo, Francisco

    2016-05-01

    Identification of the most relevant factors for explaining road accident occurrence is an important issue in road safety research, particularly for future decision-making processes in transport policy. However, model selection for this particular purpose is still a subject of ongoing research. In this paper we propose a methodological development for model selection which addresses both explanatory variable selection and adequate model selection. A variable selection procedure, the TIM (two-input model) method, is carried out by combining neural network design and statistical approaches. The error structure of the fitted model is assumed to follow an autoregressive process. All models are estimated using the Markov Chain Monte Carlo method, where the model parameters are assigned non-informative prior distributions. The final model is built using the results of the variable selection. For the application of the proposed methodology, the number of fatal accidents in Spain during 2000-2011 was used. This indicator has experienced the maximum reduction internationally during the indicated years, thus making it an interesting time series from a road safety policy perspective. Hence the identification of the variables that have affected this reduction is of particular interest for future decision making. The results of the variable selection process show that the selected variables are main subjects of road safety policy measures. Published by Elsevier Ltd.

  2. Anthropometric correlations between parts of the upper and lower limb: models for personal identification in a Sudanese population.

    PubMed

    Ahmed, Altayeb Abdalla

    2016-09-01

    Identification of a deceased individual is an essential component of medicolegal practice. However, personal identification based on commingled limbs or parts of limbs, necessary in investigations of mass disasters or some crimes, is a difficult task. Limb measurements have been utilized in the development of biological parameters for personal identification, but the possibility to estimate the dimensions of parts of limbs other than hands and feet has not been assessed. The present study proposes an approach to estimate the dimensions of various parts of limbs based on other limb measurements. The study included 320 Sudanese adults, with equal representation of men and women. Nine limb dimensions were measured (five based on the upper limb, four based on the lower limb), and extensive statistical analysis of the distribution of values was performed. The results showed that all of the measured dimensions were sexually dimorphic and that there was a significant positive correlation between the dimensions of various parts of limbs. Regression models (direct and stepwise) were developed to estimate the dimensions of parts of limbs based on measurements pertaining to one or more other parts of limbs. The study revealed that the dimensions of parts of the upper and lower limb can be estimated from one another. These findings can be used in medicolegal practice and extended to constructive surgery, orthopedics, and prosthesis design for lost limbs.

  3. Label-free sensor for automatic identification of erythrocytes using digital in-line holographic microscopy and machine learning.

    PubMed

    Go, Taesik; Byeon, Hyeokjun; Lee, Sang Joon

    2018-04-30

    Cell types of erythrocytes should be identified because they are closely related to their functionality and viability. Conventional methods for classifying erythrocytes are time consuming and labor intensive. Therefore, an automatic and accurate erythrocyte classification system is indispensable in healthcare and biomedical fields. In this study, we proposed a new label-free sensor for automatic identification of erythrocyte cell types using a digital in-line holographic microscopy (DIHM) combined with machine learning algorithms. A total of 12 features, including information on intensity distributions, morphological descriptors, and optical focusing characteristics, is quantitatively obtained from numerically reconstructed holographic images. All individual features for discocytes, echinocytes, and spherocytes are statistically different. To improve the performance of cell type identification, we adopted several machine learning algorithms, such as decision tree model, support vector machine, linear discriminant classification, and k-nearest neighbor classification. With the aid of these machine learning algorithms, the extracted features are effectively utilized to distinguish erythrocytes. Among the four tested algorithms, the decision tree model exhibits the best identification performance for the training sets (n = 440, 98.18%) and test sets (n = 190, 97.37%). This proposed methodology, which smartly combined DIHM and machine learning, would be helpful for sensing abnormal erythrocytes and computer-aided diagnosis of hematological diseases in clinic. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Feature discrimination/identification based upon SAR return variations

    NASA Technical Reports Server (NTRS)

    Rasco, W. A., Sr.; Pietsch, R.

    1978-01-01

    The look-to-look variation statistics in the returns recorded in flight by a digital, real-time SAR system are analyzed. That the variations in the look-to-look returns from different classes carry information content unique to the classes was illustrated by a model based on four variants derived from the four-look in-flight SAR data under study. The model was limited to four classes of returns: mowed grass on an athletic field, rough unmowed grass and weeds on a large vacant field, young fruit trees in a large orchard, and metal mobile homes and storage buildings in a large mobile home park. The data population, in excess of 1000 returns, represented over 250 individual pixels from the four classes. The multivariant discriminant model operated on the set of returns for each pixel and assigned that pixel to one of the four classes, based on the target variants and the probability distribution function of the four variants for each class.

  5. Line identification studies using traditional techniques and wavelength coincidence statistics

    NASA Technical Reports Server (NTRS)

    Cowley, Charles R.; Adelman, Saul J.

    1990-01-01

    Traditional line identification techniques result in the assignment of individual lines to an atomic or ionic species. These methods may be supplemented by wavelength coincidence statistics (WCS). The strengths and weaknesses of these methods are discussed using spectra of a number of normal and peculiar B and A stars that have been studied independently by both methods. The present results support the overall findings of some earlier studies. WCS would be most useful in a first survey, before traditional methods have been applied. WCS can quickly make a global search for all species and in this way may enable identification of an unexpected spectrum that could easily be omitted entirely from a traditional study. This is illustrated by O I. WCS is subject to the well-known weaknesses of any statistical technique; for example, a predictable number of spurious results is to be expected. The dangers of small-number statistics are illustrated. WCS is at its best relative to traditional methods in finding a line-rich atomic species that is only weakly present in a complicated stellar spectrum.

  6. Development of the statistical ARIMA model: an application for predicting the upcoming of MJO index

    NASA Astrophysics Data System (ADS)

    Hermawan, Eddy; Nurani Ruchjana, Budi; Setiawan Abdullah, Atje; Gede Nyoman Mindra Jaya, I.; Berliana Sipayung, Sinta; Rustiana, Shailla

    2017-10-01

    This study concerns the Madden-Julian Oscillation (MJO), one of the most important equatorial atmospheric phenomena, which has strong impacts on extreme rainfall anomalies over the Indonesian Maritime Continent (IMC). We focus on the big floods over Jakarta and the surrounding area in 1996, 2002, and 2007, which are suspected to have been caused by the MJO. We develop a statistical Box-Jenkins (ARIMA) model of the MJO index, namely the RMM (Real-time Multivariate MJO) index as represented by its two components, RMM1 and RMM2. The model is developed in several steps, from identification of the data through parameter estimation and model determination, before it is finally applied to investigate the big floods that occurred in Jakarta in 1996, 2002, and 2007. We found that the best estimated model for RMM1 and RMM2 prediction is ARIMA(2,1,2). The detailed steps by which the model is derived and applied to predict rainfall anomalies over Jakarta 3 to 6 months ahead are discussed in this paper.
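
    An ARIMA(2,1,2) model, the order selected in the abstract above, applies an ARMA(2,2) recursion to the first differences of the series and then sums the predicted differences to return to the original scale. The sketch below shows only the forecasting step, under stated assumptions: the coefficients are supplied by the caller (the abstract reports none), residuals are initialized at zero, future shocks are set to zero, and the function name and toy series are hypothetical.

```python
def arima_d1_forecast(series, phi, theta, steps, const=0.0):
    """Point forecasts from an ARIMA(p,1,q) model with known coefficients.

    phi   : AR coefficients (two values give p = 2)
    theta : MA coefficients (two values give q = 2)
    The differenced series w follows
        w[t] = const + sum(phi[k] * w[t-k]) + e[t] + sum(theta[k] * e[t-k]).
    """
    w = [b - a for a, b in zip(series, series[1:])]  # first differences
    if len(w) < max(len(phi), len(theta), 1):
        raise ValueError("series too short for the requested model order")

    # In-sample one-step residuals (missing past values treated as zero).
    eps = []
    for t in range(len(w)):
        pred = const
        for k, p in enumerate(phi, start=1):
            if t - k >= 0:
                pred += p * w[t - k]
        for k, q in enumerate(theta, start=1):
            if t - k >= 0:
                pred += q * eps[t - k]
        eps.append(w[t] - pred)

    # Forecast the differences, then integrate back to the level.
    level = series[-1]
    forecasts = []
    for _ in range(steps):
        t = len(w)
        pred = const
        pred += sum(p * w[t - k] for k, p in enumerate(phi, start=1))
        pred += sum(q * eps[t - k] for k, q in enumerate(theta, start=1))
        w.append(pred)
        eps.append(0.0)  # expected value of a future shock
        level += pred
        forecasts.append(level)
    return forecasts


# Toy index values and illustrative coefficients (both hypothetical).
toy_index = [0.2, 0.5, 0.9, 1.1, 1.0, 0.7]
ahead = arima_d1_forecast(toy_index, phi=(0.5, -0.2), theta=(0.3, 0.1), steps=3)
```

    In practice the coefficients would be estimated from the data, for example by maximum likelihood, before any forecast is made; that estimation step is outside this sketch.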

  7. Direct and indirect effects of birth order on personality and identity: support for the null hypothesis.

    PubMed

    Dunkel, Curtis S; Harbke, Colin R; Papini, Dennis R

    2009-06-01

    The authors proposed that birth order affects psychosocial outcomes through differential investment from parent to child and differences in the degree of identification from child to parent. The authors conducted this study to test these 2 models. Despite the use of statistical and methodological procedures to increase sensitivity and reduce error, the authors did not find support for the models. They discuss results in the context of the mixed-research findings regarding birth order and suggest further research on the proposed developmental dynamics that may produce birth-order effects.

  8. MDD-carb: a combinatorial model for the identification of protein carbonylation sites with substrate motifs.

    PubMed

    Kao, Hui-Ju; Weng, Shun-Long; Huang, Kai-Yao; Kaunang, Fergie Joanda; Hsu, Justin Bo-Kai; Huang, Chien-Hsun; Lee, Tzong-Yi

    2017-12-21

    Carbonylation, which takes place through oxidation by reactive oxygen species (ROS) on specific residues, is an irreversible oxidative modification of proteins. Carbonylation has been reported to be related to a number of metabolic and aging diseases, including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Because computational methods dedicated to exploring motif signatures of protein carbonylation sites are lacking, we were motivated to use an iterative statistical method to characterize and identify carbonylated sites with motif signatures. By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. In order to examine the informative attributes for distinguishing carbonylated from non-carbonylated sites, a variety of features, including composition of twenty amino acids (AAC), composition of amino acid pairs (AAPC), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM), were investigated in this study. Additionally, to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid compositions between specific positions around substrate sites. A profile hidden Markov model (HMM) was then trained as a predictive model for each motif signature. Moreover, a support vector machine (SVM) was used to construct an integrative model by combining the bit scores obtained from the profile HMMs. The combinatorial model provided enhanced performance, with balanced predictive sensitivity and specificity, in cross-validation and independent testing. 
This study provides a new scheme for exploring potential motif signatures at substrate sites of protein carbonylation. The usefulness of the revealed motifs in the identification of carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. Finally, these substrate motifs were adopted to build an available online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/ ) and are also anticipated to facilitate the study of large-scale carbonylated proteomes.
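
    Illustrative sketch (not taken from the cited work): of the features named in the abstract, the amino acid composition (AAC) is the simplest to show. The code below assumes AAC means the fraction of each of the twenty standard residues within a sequence window around a candidate site. The function name, the residue ordering, and the example window are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard residues, one-letter codes

def amino_acid_composition(window):
    """Return a 20-element vector of residue fractions for a sequence window.

    Characters outside the twenty standard residues (e.g. padding symbols)
    are ignored, so the fractions sum to 1 whenever any residue is present.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in window.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]

# Hypothetical 7-residue window centred on a lysine (K).
features = amino_acid_composition("GAKLKRA")
```

    A vector of this kind could serve as one input to a classifier; the PSSM, PWM, and profile-HMM features mentioned in the abstract need additional data and are not sketched here.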

  9. Development of a conceptual integrated traffic safety problem identification database

    DOT National Transportation Integrated Search

    1999-12-01

    The project conceptualized a traffic safety risk management information system and statistical database for improved problem-driver identification, countermeasure development, and resource allocation. The California Department of Motor Vehicles Drive...

  10. [Dermatoglyphics in the prognostication of constitutional and physical traits in humans].

    PubMed

    Mazur, E S; Sidorenko, A G

    2009-01-01

    The present study was designed to elucidate the relationship between palmar and digital dermatoglyphic patterns and descriptive signs of human appearance based on the results of comprehensive anthropometric examination of 2620 men and 380 women. A battery of different methods were used to statistically treat the results of dactyloscopic records. They demonstrated correlation between skin patterns and external body features that can be used to construct diagnostic models for the purpose of personality identification.

  11. Statistical Analysis of Physiological Signals

    NASA Astrophysics Data System (ADS)

    Ruiz, María G.; Pérez, Leticia

    2003-07-01

    In spite of two hundred years of clinical practice, homeopathy still lacks a scientific basis. Its fundamental laws, the similia principle and the activity of the so-called ultra-high dilutions, are controversial issues that do not fit into mainstream medicine or the current field of physical chemistry. Aside from its clinical efficacy, the identification of physicochemical parameters as markers of the homeopathic effect would allow mathematical models to be constructed [1], which in turn could provide clues regarding the mechanism involved.

  12. Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

    NASA Astrophysics Data System (ADS)

    Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

    2010-10-01

    Studying periodic patterns would be expected to be a standard line of attack for recognizing genes in DNA sequences and for similar problems, yet surprisingly little significant work has been done in this direction. This paper studies statistical properties of DNA sequences of complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and a standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
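
    Illustrative sketch (not the authors' technique): the abstract describes mapping a DNA sequence to numbers and applying a Fourier transform to study periodicity. The code below assumes one common choice, binary indicator sequences for each base, and measures spectral power at the period-3 frequency, which is known to be elevated in protein-coding regions. The function names and toy sequences are hypothetical.

```python
import cmath

def indicator(seq, base):
    """Binary mapping: 1.0 where seq has `base`, else 0.0."""
    return [1.0 if ch == base else 0.0 for ch in seq]

def power_at(signal, k):
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2

def period3_power(seq):
    """Spectral power at frequency index N/3, summed over the four bases."""
    k = len(seq) // 3
    return sum(power_at(indicator(seq, base), k) for base in "ACGT")

periodic = "ATG" * 30            # toy sequence with exact period 3
aperiodic = "AAAT" * 22 + "GG"   # same length (90), no period-3 structure

strong = period3_power(periodic)
weak = period3_power(aperiodic)
```

    On these toy inputs the period-3 power is far larger for the 3-periodic sequence. Real analyses would slide a window along the genome and compare the statistic between annotated coding and non-coding segments.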

  13. Advancements in robust algorithm formulation for speaker identification of whispered speech

    NASA Astrophysics Data System (ADS)

    Fan, Xing

    Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard/made public. Due to the profound differences between whispered and neutral speech in production mechanism and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech by using no or limited whispered adaptation data from non-target speakers. This dissertation proposes the concept of "High"/"Low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from these acoustic analyses are new in this area and also serve as guidance for developing robust speaker identification systems for whispered speech. This dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing. A two-dimensional feature space is proposed to search for "Low"-quality performance based whispered utterances and separate feature mapping functions are applied to vowels and consonants respectively in order to retain the speaker's information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training. The proposed method generates pseudo whispered features from neutral features by using the statistical information contained in a whispered Universal Background model (UBM) trained from extra collected whispered data from non-target speakers.
Four modeling methods are proposed for the transformation estimation in order to generate the pseudo whispered features. Both of the above two systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed to providing a scientific understanding of the differences between whispered and neutral speech as well as improved front-end processing and modeling method for speaker identification of whispered speech. Such advancements will ultimately contribute to improve the robustness of speech processing systems.

  14. Standardized residual as response function for order identification of multi input intervention analysis

    NASA Astrophysics Data System (ADS)

    Suhartono, Lee, Muhammad Hisyam; Rezeki, Sri

    2017-05-01

    Intervention analysis is a statistical model in the group of time series analysis which is widely used to describe the effect of an intervention caused by external or internal factors. An example of external factors that often occurs in Indonesia is a disaster, whether natural or man-made. The main purpose of this paper is to provide the results of theoretical studies on the identification step for determining the order of multi inputs intervention analysis for evaluating the magnitude and duration of the impact of interventions on time series data. The theoretical result showed that the standardized residuals could be used properly as response function for determining the order of multi inputs intervention model. Then, these results are applied for evaluating the impact of a disaster on a real case in Indonesia, i.e. the magnitude and duration of the impact of the Lapindo mud on the volume of vehicles on the highway. Moreover, the empirical results showed that the multi inputs intervention model can describe and explain accurately the magnitude and duration of the impact of disasters on time series data.

  15. Extended Kalman filtering for the detection of damage in linear mechanical structures

    NASA Astrophysics Data System (ADS)

    Liu, X.; Escamilla-Ambrosio, P. J.; Lieven, N. A. J.

    2009-09-01

    This paper addresses the problem of assessing the location and extent of damage in a vibrating structure by means of vibration measurements. Frequency domain identification methods (e.g. finite element model updating) have been widely used in this area while time domain methods such as the extended Kalman filter (EKF) method, are more sparsely represented. The difficulty of applying EKF in mechanical system damage identification and localisation lies in: the high computational cost, the dependence of estimation results on the initial estimation error covariance matrix P(0), the initial value of parameters to be estimated, and on the statistics of measurement noise R and process noise Q. To resolve these problems in the EKF, a multiple model adaptive estimator consisting of a bank of EKF in modal domain was designed, each filter in the bank is based on different P(0). The algorithm was iterated by using the weighted global iteration method. A fuzzy logic model was incorporated in each filter to estimate the variance of the measurement noise R. The application of the method is illustrated by simulated and real examples.
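
    Illustrative sketch (not the authors' algorithm): the abstract does not say how the filters in the bank are combined. The code below assumes the classical multiple model adaptive estimation rule, in which each filter's weight is multiplied by the Gaussian likelihood of its measurement residual and the weights are then renormalized. It is limited to scalar residuals, and all names and numbers are hypothetical.

```python
import math

def gaussian_likelihood(residual, variance):
    """Density of a zero-mean Gaussian with the given variance at `residual`."""
    return math.exp(-0.5 * residual ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def update_weights(weights, residuals, variances):
    """One Bayesian weight update across the filter bank (scalar measurements)."""
    unnormalized = [w * gaussian_likelihood(r, s)
                    for w, r, s in zip(weights, residuals, variances)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def blended_estimate(weights, estimates):
    """Weighted average of the per-filter parameter estimates."""
    return sum(w * x for w, x in zip(weights, estimates))

# Three filters with equal prior weight; the first explains the measurement best.
prior = [1.0 / 3.0] * 3
posterior = update_weights(prior, residuals=[0.1, 1.0, 3.0], variances=[1.0, 1.0, 1.0])
stiffness = blended_estimate(posterior, estimates=[10.0, 12.0, 20.0])
```

    Filters whose residuals stay small accumulate weight over successive measurements, so the blended estimate drifts towards the best-matched initialization. The fuzzy-logic noise estimation and the weighted global iteration described in the abstract are not represented here.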

  16. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  17. Identification of abnormal accident patterns at intersections

    DOT National Transportation Integrated Search

    1999-08-01

    This report presents the findings and recommendations based on the Identification of Abnormal Accident Patterns at Intersections. This project used a statistically valid sampling method to determine whether a specific intersection has an abnormally h...

  18. Towards simplification of hydrologic modeling: Identification of dominant processes

    USGS Publications Warehouse

    Markstrom, Steven; Hay, Lauren E.; Clark, Martyn P.

    2016-01-01

    The Precipitation–Runoff Modeling System (PRMS), a distributed-parameter hydrologic model, has been applied to the conterminous US (CONUS). Parameter sensitivity analysis was used to identify: (1) the sensitive input parameters and (2) particular model output variables that could be associated with the dominant hydrologic process(es). Sensitivity values of 35 PRMS calibration parameters were computed using the Fourier amplitude sensitivity test procedure on 110 000 independent hydrologically based spatial modeling units covering the CONUS and then summarized to process (snowmelt, surface runoff, infiltration, soil moisture, evapotranspiration, interflow, baseflow, and runoff) and model performance statistic (mean, coefficient of variation, and autoregressive lag 1). Identified parameters and processes provide insight into model performance at the location of each unit and allow the modeler to identify the most dominant process on the basis of which processes are associated with the most sensitive parameters. The results of this study indicate that: (1) the choice of performance statistic and output variables has a strong influence on parameter sensitivity, (2) the apparent model complexity to the modeler can be reduced by focusing on those processes that are associated with sensitive parameters and disregarding those that are not, (3) different processes require different numbers of parameters for simulation, and (4) some sensitive parameters influence only one hydrologic process, while others may influence many

  19. The Statistical Analysis Techniques to Support the NGNP Fuel Performance Experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bihn T. Pham; Jeffrey J. Einerson

    2010-06-01

    This paper describes the development and application of statistical analysis techniques to support the AGR experimental program on NGNP fuel performance. The experiments conducted in the Idaho National Laboratory’s Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks and the target quantity (fuel/graphite temperature) is regulated by the He-Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the SAS-based NGNP Data Management and Analysis System (NDMAS) for automated processing and qualification of the AGR measured data. The NDMAS also stores daily neutronic (power) and thermal (heat transfer) code simulation results along with the measurement data, allowing for their combined use and comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show the three statistical analysis techniques providing a complementary capability to warn of thermocouple failures. It also suggests that the regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content), to effectively maintain the target quantity (fuel temperature) within a given range.
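
    Illustrative sketch (not the NDMAS implementation, which the abstract describes as SAS-based): control charting, the first of the three techniques named above, can be shown with a basic Shewhart-style chart. The code below assumes limits at the baseline mean plus or minus three sample standard deviations. The temperature values and function names are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline, n_sigma=3.0):
    """Lower and upper control limits from in-control baseline readings."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - n_sigma * spread, center + n_sigma * spread

def out_of_control(readings, limits):
    """Indices of readings that fall outside the control limits."""
    lower, upper = limits
    return [i for i, value in enumerate(readings) if value < lower or value > upper]

# Hypothetical thermocouple readings in degrees Celsius.
baseline_temps = [800.0, 802.0, 798.0, 801.0, 799.0]
limits = control_limits(baseline_temps)
flagged = out_of_control([800.5, 812.0, 797.0], limits)
```

    Here only the second new reading lies outside the limits. A flagged reading is a prompt for investigation, for example of a drifting thermocouple, rather than proof of failure.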

  20. A video multitracking system for quantification of individual behavior in a large fish shoal: advantages and limits.

    PubMed

    Delcourt, Johann; Becco, Christophe; Vandewalle, Nicolas; Poncin, Pascal

    2009-02-01

    The capability of a new multitracking system to track a large number of unmarked fish (up to 100) is evaluated. This system extrapolates a trajectory from each individual and analyzes recorded sequences that are several minutes long. This system is very efficient in statistical individual tracking, where the individual's identity is important for a short period of time in comparison with the duration of the track. Individual identification is typically greater than 99%. Identification is largely efficient (more than 99%) when the fish images do not cross the image of a neighbor fish. When the images of two fish merge (occlusion), we consider that the spot on the screen has a double identity. Consequently, there are no identification errors during occlusions, even though the measurement of the positions of each individual is imprecise. When the images of these two merged fish separate (separation), individual identification errors are more frequent, but their effect is very low in statistical individual tracking. On the other hand, in complete individual tracking, where individual fish identity is important for the entire trajectory, each identification error invalidates the results. In such cases, the experimenter must observe whether the program assigns the correct identification, and, when an error is made, must edit the results. This work is not too costly in time because it is limited to the separation events, accounting for fewer than 0.1% of individual identifications. Consequently, in both statistical and rigorous individual tracking, this system allows the experimenter to gain time by measuring the individual position automatically. It can also analyze the structural and dynamic properties of an animal group with a very large sample, with precision and sampling that are impossible to obtain with manual measures.

  1. Inferring Characteristics of Sensorimotor Behavior by Quantifying Dynamics of Animal Locomotion

    NASA Astrophysics Data System (ADS)

    Leung, KaWai

    Locomotion is one of the most well-studied topics in animal behavioral studies. Much fundamental and clinical research makes use of the locomotion of an animal model to explore various aspects of sensorimotor behavior. In the past, most of these studies focused on the population average of a specific trait due to limitations in data collection and processing power. With recent advances in computer vision and statistical modeling techniques, it is now possible to track and analyze large amounts of behavioral data. In this thesis, I present two projects that aim to infer the characteristics of sensorimotor behavior by quantifying the dynamics of locomotion of the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, shedding light on the statistical dependence between sensing and behavior. In the first project, I investigate the possibility of inferring noxious sensory information from the behavior of Caenorhabditis elegans. I develop a statistical model to infer the heat stimulus level perceived by individual animals from their stereotyped escape responses after stimulation by an IR laser. The model allows quantification of analgesic-like effects of chemical agents or genetic mutations in the worm. At the same time, the method is able to differentiate perturbations of locomotion behavior that go beyond affecting the sensory system. With this model I propose experimental designs that allow statistically significant identification of analgesic-like effects. In the second project, I investigate the relationship of energy budget and stability of locomotion in determining the walking speed distribution of Drosophila melanogaster during aging. The locomotion stability of different age groups is estimated from video recordings using Floquet theory. I calculate the power consumption at different locomotion speeds using a biomechanics model. In conclusion, the power consumption, not stability, predicts the locomotion speed distribution at different ages.

  2. Gait patterns for crime fighting: statistical evaluation

    NASA Astrophysics Data System (ADS)

    Sulovská, Kateřina; Bělašková, Silvie; Adámek, Milan

    2013-10-01

    Criminality has been omnipresent throughout human history. Modern technology brings novel opportunities for identification of a perpetrator. One of these opportunities is the analysis of video recordings, which may be taken during the crime itself or before/after the crime. Video analysis can be classed as an identification analysis, that is, identification of a person via external features. Analysis of bipedal locomotion focuses on human movement on the basis of anatomical and physiological features. Nowadays, human gait is tested by many laboratories to learn whether identification via bipedal locomotion is possible or not. The aim of our study is to use 2D components out of 3D data from the VICON Mocap system for deep statistical analyses. This paper introduces recent results of a fundamental study focused on various gait patterns under different conditions. The study contains data from 12 participants. Curves obtained from these measurements were sorted, averaged and statistically tested to estimate the stability and distinctiveness of this biometric. Results show satisfactory distinctness of some chosen points, while others do not show a significant difference. However, the results presented in this paper are from the initial phase of further, deeper and more exacting analyses of gait patterns under different conditions.

  3. Analysis of survival data from telemetry projects

    USGS Publications Warehouse

    Bunck, C.M.; Winterstein, S.R.; Pollock, K.H.

    1985-01-01

    Telemetry techniques can be used to study the survival rates of animal populations and are particularly suitable for species or settings for which band recovery models are not. Statistical methods for estimating survival rates and parameters of survival distributions from observations of radio-tagged animals will be described. These methods have been applied to medical and engineering studies and to the study of nest success. Estimates and tests based on discrete models, originally introduced by Mayfield, and on continuous models, both parametric and nonparametric, will be described. Generalizations, including staggered entry of subjects into the study and identification of mortality factors will be considered. Additional discussion topics will include sample size considerations, relocation frequency for subjects, and use of covariates.
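
    Illustrative sketch (a textbook form, not code from the cited work): the Mayfield approach mentioned in the abstract estimates a constant daily survival rate as one minus the number of deaths divided by the total exposure days, and survival over a longer interval as that rate raised to the number of days. The function names and example counts below are hypothetical.

```python
def mayfield_daily_survival(deaths, exposure_days):
    """Mayfield estimate of the daily survival rate.

    exposure_days is the total number of animal-days during which
    radio-tagged animals were monitored and known to be at risk.
    """
    if exposure_days <= 0:
        raise ValueError("exposure_days must be positive")
    return 1.0 - deaths / exposure_days

def period_survival(daily_rate, days):
    """Survival probability over `days`, assuming a constant daily rate."""
    return daily_rate ** days

# Hypothetical study: 4 deaths observed over 400 animal-days of tracking.
daily = mayfield_daily_survival(deaths=4, exposure_days=400)
thirty_day = period_survival(daily, days=30)
```

    Because exposure is counted in animal-days, animals that enter the study late (staggered entry) simply contribute fewer days to the denominator. The constant-rate assumption is the main limitation of this discrete model.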

  4. Likelihood Ratio, Optimal Decision Rules, and Relationship between Proportion Correct and d' in the Dual-Pair AB vs BA identification Paradigm

    PubMed Central

    Micheyl, Christophe; Dai, Huanping

    2010-01-01

    The equal-variance Gaussian signal-detection-theory (SDT) decision model for the dual-pair change-detection (or “4IAX”) paradigm has been described in earlier publications. In this note, we consider the equal-variance Gaussian SDT model for the related dual-pair AB vs BA identification paradigm. The likelihood ratios, optimal decision rules, receiver operating characteristics (ROCs), and relationships between d' and proportion-correct (PC) are analyzed for two special cases: that of statistically independent observations, which is likely to apply in constant-stimuli experiments, and that of highly correlated observations, which is likely to apply in experiments where stimuli are roved widely across trials or pairs. A surprising outcome of this analysis is that although these two situations lead to different optimal decision rules, the predicted ROCs and proportions of correct responses (PCs) for these two cases are not substantially different, and are either identical or similar to those observed in the basic Yes-No paradigm. PMID:19633356
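
    Illustrative sketch (the Yes-No baseline only, not the dual-pair derivation of the cited note): for an unbiased observer under the equal-variance Gaussian model, proportion correct in the basic Yes-No task is PC = Φ(d'/2), where Φ is the standard normal cumulative distribution function. The code below evaluates this reference relationship and its inverse. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def yes_no_proportion_correct(d_prime):
    """PC for an unbiased observer in the equal-variance Gaussian Yes-No task."""
    return STANDARD_NORMAL.cdf(d_prime / 2.0)

def yes_no_d_prime(proportion_correct):
    """Inverse mapping: the d' that yields the given PC (0 < PC < 1)."""
    return 2.0 * STANDARD_NORMAL.inv_cdf(proportion_correct)

# Tabulate the baseline curve for a few sensitivities.
for d_prime in (0.0, 1.0, 2.0):
    print(d_prime, round(yes_no_proportion_correct(d_prime), 4))
```

    A curve of this kind is the reference against which the abstract compares the predictions for the dual-pair AB vs BA paradigm.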

  5. Goldstone radio spectrum signal identification, March 1980 - March 1982

    NASA Technical Reports Server (NTRS)

    Gaudian, B. A.

    1982-01-01

    The signal identification process is described. The Goldstone radio spectrum environment contains signals that are a potential source of electromagnetic interference to the Goldstone tracking receivers. The identification of these signals is accomplished by the use of signal parameters and environment parameters. Statistical data on the Goldstone radio spectrum environment from 2285 to 2305 MHz are provided.

  6. Identification of market trends with string and D2-brane maps

    NASA Astrophysics Data System (ADS)

    Bartoš, Erik; Pinčák, Richard

    2017-08-01

    The multidimensional string objects are introduced as a new alternative for an application of string models for time series forecasting in trading on financial markets. The objects are represented by open string with 2-endpoints and D2-brane, which are continuous enhancement of 1-endpoint open string model. We show how new object properties can change the statistics of the predictors, which makes them the candidates for modeling a wide range of time series systems. String angular momentum is proposed as another tool to analyze the stability of currency rates except the historical volatility. To show the reliability of our approach with application of string models for time series forecasting we present the results of real demo simulations for four currency exchange pairs.

  7. Identification of Chinese plague foci from long-term epidemiological data

    PubMed Central

    Ben-Ari, Tamara; Neerinckx, Simon; Agier, Lydiane; Cazelles, Bernard; Xu, Lei; Zhang, Zhibin; Fang, Xiye; Wang, Shuchun; Liu, Qiyong; Stenseth, Nils C.

    2012-01-01

    Carrying out statistical analysis over an extensive dataset of human plague reports in Chinese villages from 1772 to 1964, we identified plague endemic territories in China (i.e., plague foci). Analyses rely on (i) a clustering method that groups time series based on their time-frequency resemblances and (ii) an ecological niche model that helps identify plague-suitable territories characterized by value ranges for a set of predefined environmental variables. Results from both statistical tools indicate the existence of two disconnected plague territories corresponding to Northern and Southern China. Altogether, at least four well defined independent foci are identified. Their contours compare favorably with field observations. The potential and limitations of inferring plague foci and dynamics using epidemiological data are discussed. PMID:22570501

  8. Computational Identification of Genomic Features That Influence 3D Chromatin Domain Formation.

    PubMed

    Mourad, Raphaël; Cuvier, Olivier

    2016-05-01

    Recent advances in long-range Hi-C contact mapping have revealed the importance of the 3D structure of chromosomes in gene expression. A current challenge is to identify the key molecular drivers of this 3D structure. Several genomic features, such as architectural proteins and functional elements, were shown to be enriched at topological domain borders using classical enrichment tests. Here we propose multiple logistic regression to identify those genomic features that positively or negatively influence domain border establishment or maintenance. The model is flexible, and can account for statistical interactions among multiple genomic features. Using both simulated and real data, we show that our model outperforms enrichment test and non-parametric models, such as random forests, for the identification of genomic features that influence domain borders. Using Drosophila Hi-C data at a very high resolution of 1 kb, our model suggests that, among architectural proteins, BEAF-32 and CP190 are the main positive drivers of 3D domain borders. In humans, our model identifies well-known architectural proteins CTCF and cohesin, as well as ZNF143 and Polycomb group proteins as positive drivers of domain borders. The model also reveals the existence of several negative drivers that counteract the presence of domain borders including P300, RXRA, BCL11A and ELK1.
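
    Illustrative sketch (not the authors' model or data): the abstract proposes multiple logistic regression to find features that positively or negatively influence domain borders. The code below fits such a regression by plain gradient ascent on a made-up table of genomic bins in which one binary feature raises and another lowers the chance of a border. The feature names, counts, and fitting settings are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, steps=2000, lr=0.1):
    """Fit intercept and coefficients by gradient ascent on the mean log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            grad[0] += y - p
            for j, v in enumerate(x):
                grad[j + 1] += (y - p) * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def toy_bins():
    """(feature_a, feature_b) -> (bins with a border, bins without a border)."""
    counts = {(1, 0): (8, 2), (0, 0): (5, 5), (0, 1): (2, 8), (1, 1): (5, 5)}
    rows, labels = [], []
    for features, (with_border, without_border) in counts.items():
        rows += [list(features)] * (with_border + without_border)
        labels += [1] * with_border + [0] * without_border
    return rows, labels

intercept, coef_a, coef_b = fit_logistic(*toy_bins())
```

    In this toy table the fitted coefficient for feature_a is positive and the one for feature_b is negative, which is how a positive and a negative driver would appear. Interaction terms could be added as extra product columns.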

  9. Computational Identification of Genomic Features That Influence 3D Chromatin Domain Formation

    PubMed Central

    Mourad, Raphaël; Cuvier, Olivier

    2016-01-01

    Recent advances in long-range Hi-C contact mapping have revealed the importance of the 3D structure of chromosomes in gene expression. A current challenge is to identify the key molecular drivers of this 3D structure. Several genomic features, such as architectural proteins and functional elements, were shown to be enriched at topological domain borders using classical enrichment tests. Here we propose multiple logistic regression to identify those genomic features that positively or negatively influence domain border establishment or maintenance. The model is flexible, and can account for statistical interactions among multiple genomic features. Using both simulated and real data, we show that our model outperforms enrichment test and non-parametric models, such as random forests, for the identification of genomic features that influence domain borders. Using Drosophila Hi-C data at a very high resolution of 1 kb, our model suggests that, among architectural proteins, BEAF-32 and CP190 are the main positive drivers of 3D domain borders. In humans, our model identifies well-known architectural proteins CTCF and cohesin, as well as ZNF143 and Polycomb group proteins as positive drivers of domain borders. The model also reveals the existence of several negative drivers that counteract the presence of domain borders including P300, RXRA, BCL11A and ELK1. PMID:27203237

  10. AIR PARTICULATE POLLUTION CARDIOVASCULAR TOXICITY: HAZARD IDENTIFICATION AND MECHANISMS OF ACTION

    EPA Science Inventory


    The overall weight of evidence from epidemiological studies has shown statistical associations between air particulate pollution exposure and mortality/morbidity, particularly within individuals with cardiovascular disease (1-4). Identification of causal particle properties ...

  11. A neural network for the identification of measured helicopter noise

    NASA Technical Reports Server (NTRS)

    Cabell, R. H.; Fuller, C. R.; O'Brien, W. F.

    1991-01-01

    The results of a preliminary study of the components of a novel acoustic helicopter identification system are described. The identification system uses the relationship between the amplitudes of the first eight harmonics in the main rotor noise spectrum to distinguish between helicopter types. Two classification algorithms are tested; a statistically optimal Bayes classifier, and a neural network adaptive classifier. The performance of these classifiers is tested using measured noise of three helicopters. The statistical classifier can correctly identify the helicopter an average of 67 percent of the time, while the neural network is correct an average of 65 percent of the time. These results indicate the need for additional study of the envelope of harmonic amplitudes as a component of a helicopter identification system. Issues concerning the implementation of the neural network classifier, such as training time and structure of the network, are discussed.

  12. A critique of the use of indicator-species scores for identifying thresholds in species responses

    USGS Publications Warehouse

    Cuffney, Thomas F.; Qian, Song S.

    2013-01-01

    Identification of ecological thresholds is important both for theoretical and applied ecology. Recently, Baker and King (2010, King and Baker 2010) proposed a method, threshold indicator analysis (TITAN), to calculate species and community thresholds based on indicator species scores adapted from Dufrêne and Legendre (1997). We tested the ability of TITAN to detect thresholds using models with (broken-stick, disjointed broken-stick, dose-response, step-function, Gaussian) and without (linear) definitive thresholds. TITAN accurately and consistently detected thresholds in step-function models, but not in models characterized by abrupt changes in response slopes or response direction. Threshold detection in TITAN was very sensitive to the distribution of 0 values, which caused TITAN to identify thresholds associated with relatively small differences in the distribution of 0 values while ignoring thresholds associated with large changes in abundance. Threshold identification and tests of statistical significance were based on the same data permutations resulting in inflated estimates of statistical significance. Application of bootstrapping to the split-point problem that underlies TITAN led to underestimates of the confidence intervals of thresholds. Bias in the derivation of the z-scores used to identify TITAN thresholds and skewness in the distribution of data along the gradient produced TITAN thresholds that were much more similar than the actual thresholds. This tendency may account for the synchronicity of thresholds reported in TITAN analyses. The thresholds identified by TITAN represented disparate characteristics of species responses that, when coupled with the inability of TITAN to identify thresholds accurately and consistently, do not support the aggregation of individual species thresholds into a community threshold.

  13. Individualized statistical learning from medical image databases: application to identification of brain lesions.

    PubMed

    Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos

    2014-04-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. Individualized Statistical Learning from Medical Image Databases: Application to Identification of Brain Lesions

    PubMed Central

    Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos

    2014-01-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564

  15. Identification of reliable gridded reference data for statistical downscaling methods in Alberta

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Gupta, A.

    2017-12-01

    Climate models provide essential information to assess impacts of climate change at regional and global scales. However, statistical downscaling methods have been applied to prepare climate model data for various applications such as hydrologic and ecologic modelling at a watershed scale. As the reliability and (spatial and temporal) resolution of statistically downscaled climate data mainly depend on the reference data, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for key climate variables which are main input data to regional modelling systems. However, inconsistencies in these climate products, for example, different combinations of climate variables, varying data domains and data lengths and data accuracy varying with physiographic characteristics of the landscape, have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing various observation-based daily gridded climate products available in public domain, i.e. thin plate spline regression products (ANUSPLIN and TPS), inverse distance method (Alberta Townships), and numerical climate model (North American Regional Reanalysis) and an optimum interpolation technique (Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparing with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation, minimum and maximum temperature over the province of Alberta. Based on the performance of climate products at AHCCD stations, we ranked the reliability of these publicly available climate products corresponding to the elevations of stations discretized into several classes. According to the rank of climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points.
    A web-based system was developed to allow users to easily select the most reliable reference climate data at each target point based on the elevation of the grid cell. By constructing the best combination of reference data for the study domain, the accuracy and reliability of statistically downscaled climate projections could be significantly improved.

  16. The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-07-08

    In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA>48 hours following hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated years 2009-2011, compared to 2006, and months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05). 
    The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital personnel. The identification of specific years and months with increased MRSA rates may be attributable to several hospital level factors including the presence of other pathogens. Within hospitals, the incorporation of the temporal scan statistic into standard surveillance techniques is a valuable tool for healthcare workers to evaluate surveillance strategies and aid in the identification of MRSA clusters.

  17. Mass spectra and fusion cross sections for ²⁰Ne+²⁴Mg interaction at 55 and 85 MeV

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grotowski, K.; Belery, P.; Delbar, T.

    1981-06-01

    Inclusive γ spectra from the ²⁰Ne+²⁴Mg interaction have been measured using 55- and 85-MeV ²⁰Ne ions. The identification of γ lines allows the determination of mass spectra in the region 12 ≤ A ≤ 43. Experimental results are compared with statistical model calculations. The total reaction and fusion cross sections are extracted. Cross sections for inelastic scattering, few nucleon transfers, and deep inelastic scattering are estimated.

  18. Redesign of the Stabilized Pitch Control System of a Semi-Active Terminal Homing Missile System.

    DTIC Science & Technology

    1979-04-20

    " AIEE Trans. Application and Industry, pp. 65-77, May 1961. [3] L. S. Shieh, "An Algebraic Approach to System Identification and Compensator Design..."A Quick Method for Estimating Closed-Loop Poles of Control Systems," Trans. AIEE, Applications and Industry, Vol. 76, pp. 80-87, May 1957. [101 C...Mathematical and Statistical Library). [16] C. J. Huang and L. S. Shieh, "Modeling Large Dynamical Systems with Industrial Specifications," Int. J

  19. A Situational-Awareness System For Networked Infantry Including An Accelerometer-Based Shot-Identification Algorithm For Direct-Fire Weapons

    DTIC Science & Technology

    2016-09-01

    noise density and temperature sensitivity of these devices are all on the same order of magnitude. Even the worst-case noise density of the GCDC...accelerations from a handgun firing were distinct from other impulsive events on the wrist, such as using a hammer. Loeffler first identified potential shots by...spikes, taking various statistical parameters. He used a logistic regression model on these parameters and was able to classify 98.9% of shots

  20. Machine Learning Methods for Production Cases Analysis

    NASA Astrophysics Data System (ADS)

    Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

    2018-03-01

    An approach to the analysis of events occurring during the production process is proposed. The described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of the applied models. k-Nearest Neighbors and Random Forest methods were used to illustrate and analyze the proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.
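
    The three metrics named in this record can be computed from a binary confusion matrix as in the sketch below. The labels and predictions are hypothetical and are not data from the paper; the function name is also an assumption made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical example: 1 = hazardous event, 0 = normal event.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
precision, recall, accuracy = binary_metrics(actual, predicted)
```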

  1. Developing a Zebrafish Model of NF1 for Structure-Function Analysis and Identification of Modifier Genes

    DTIC Science & Technology

    2010-04-01

    equipped with a spinning-disc confocal system (Yokogawa) was used. The statistical significance of changes to OPC cell numbers and migration upon nf1...that they are expressed in overlapping tissues. We examined the expression of both genes by whole mount in situ hybridization between the 4-cell stage...sorted cells confirmed expression, particularly in the vascular endothelium (Figure 4E-G), while RNA from 1-cell embryos indicate that both genes are

  2. An evolving computational platform for biological mass spectrometry: workflows, statistics and data mining with MASSyPup64.

    PubMed

    Winkler, Robert

    2015-01-01

    In biological mass spectrometry, crude instrumental data need to be converted into meaningful theoretical models. Several data processing and data evaluation steps are required to come to the final results. These operations are often difficult to reproduce, because of too specific computing platforms. This effect, known as 'workflow decay', can be diminished by using a standardized informatic infrastructure. Thus, we compiled an integrated platform, which contains ready-to-use tools and workflows for mass spectrometry data analysis. Apart from general unit operations, such as peak picking and identification of proteins and metabolites, we put a strong emphasis on the statistical validation of results and Data Mining. MASSyPup64 includes e.g., the OpenMS/TOPPAS framework, the Trans-Proteomic-Pipeline programs, the ProteoWizard tools, X!Tandem, Comet and SpiderMass. The statistical computing language R is installed with packages for MS data analyses, such as XCMS/metaXCMS and MetabR. The R package Rattle provides a user-friendly access to multiple Data Mining methods. Further, we added the non-conventional spreadsheet program teapot for editing large data sets and a command line tool for transposing large matrices. Individual programs, console commands and modules can be integrated using the Workflow Management System (WMS) taverna. We explain the useful combination of the tools by practical examples: (1) A workflow for protein identification and validation, with subsequent Association Analysis of peptides, (2) Cluster analysis and Data Mining in targeted Metabolomics, and (3) Raw data processing, Data Mining and identification of metabolites in untargeted Metabolomics. Association Analyses reveal relationships between variables across different sample sets. We present its application for finding co-occurring peptides, which can be used for target proteomics, the discovery of alternative biomarkers and protein-protein interactions. 
    Data Mining derived models displayed a higher robustness and accuracy for classifying sample groups in targeted Metabolomics than cluster analyses. Random Forest models do not only provide predictive models, which can be deployed for new data sets, but also the variable importance. We demonstrate that the latter is especially useful for tracking down significant signals and affected pathways in untargeted Metabolomics. Thus, Random Forest modeling supports the unbiased search for relevant biological features in Metabolomics. Our results clearly manifest the importance of Data Mining methods to disclose non-obvious information in biological mass spectrometry. The application of a Workflow Management System and the integration of all required programs and data in a consistent platform make the presented data analysis strategies reproducible for non-expert users. The simple remastering process and the Open Source licenses of MASSyPup64 (http://www.bioprocess.org/massypup/) enable the continuous improvement of the system.

  3. Identification of elastic, dielectric, and piezoelectric constants in piezoceramic disks.

    PubMed

    Perez, Nicolas; Andrade, Marco A B; Buiochi, Flavio; Adamowski, Julio C

    2010-12-01

    Three-dimensional modeling of piezoelectric devices requires a precise knowledge of piezoelectric material parameters. The commonly used piezoelectric materials belong to the 6mm symmetry class, which has ten independent constants. In this work, a methodology to obtain precise material constants over a wide frequency band through finite element analysis of a piezoceramic disk is presented. Given an experimental electrical impedance curve and a first estimate for the piezoelectric material properties, the objective is to find the material properties that minimize the difference between the electrical impedance calculated by the finite element method and that obtained experimentally by an electrical impedance analyzer. The methodology consists of four basic steps: experimental measurement, identification of vibration modes and their sensitivity to material constants, a preliminary identification algorithm, and final refinement of the material constants using an optimization algorithm. The application of the methodology is exemplified using a hard lead zirconate titanate piezoceramic. The same methodology is applied to a soft piezoceramic. The errors in the identification of each parameter are statistically estimated in both cases, and are less than 0.6% for elastic constants, and less than 6.3% for dielectric and piezoelectric constants.

  4. An evaluation of talker localization based on direction of arrival estimation and statistical sound source identification

    NASA Astrophysics Data System (ADS)

    Nishiura, Takanobu; Nakamura, Satoshi

    2002-11-01

    It is very important to capture distant-talking speech for a hands-free speech interface with high quality. A microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. Conventional talker localization algorithms in multiple sound source environments not only have difficulty localizing the multiple sound sources accurately, but also have difficulty localizing the target talker among known multiple sound source positions. To cope with these problems, we propose a new talker localization algorithm consisting of two algorithms. One is a DOA (direction of arrival) estimation algorithm for multiple sound source localization based on the CSP (cross-power spectrum phase) coefficient addition method. The other is a statistical sound source identification algorithm based on a GMM (Gaussian mixture model) for localizing the target talker position among localized multiple sound sources. In this paper, we particularly focus on the talker localization performance based on the combination of these two algorithms with a microphone array. We conducted evaluation experiments in real noisy reverberant environments. As a result, we confirmed that multiple sound signals can be identified accurately as "speech" or "non-speech" by the proposed algorithm. [Work supported by ATR, and MEXT of Japan.]

  5. Markov vs. Hurst-Kolmogorov behaviour identification in hydroclimatic processes

    NASA Astrophysics Data System (ADS)

    Dimitriadis, Panayiotis; Gournari, Naya; Koutsoyiannis, Demetris

    2016-04-01

    Hydroclimatic processes are usually modelled either by exponential decay of the autocovariance function, i.e., Markovian behaviour, or power type decay, i.e., long-term persistence (or else Hurst-Kolmogorov behaviour). For the identification and quantification of such behaviours several graphical stochastic tools can be used such as the climacogram (i.e., plot of the variance of the averaged process vs. scale), autocovariance, variogram, power spectrum etc. with the former usually exhibiting smaller statistical uncertainty as compared to the others. However, most methodologies including these tools are based on the expected value of the process. In this analysis, we explore a methodology that combines both the practical use of a graphical representation of the internal structure of the process as well as the statistical robustness of the maximum-likelihood estimation. For validation and illustration purposes, we apply this methodology to fundamental stochastic processes, such as Markov and Hurst-Kolmogorov type ones. Acknowledgement: This research is conducted within the frame of the undergraduate course "Stochastic Methods in Water Resources" of the National Technical University of Athens (NTUA). The School of Civil Engineering of NTUA provided moral support for the participation of the students in the Assembly.
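
    The climacogram mentioned in this record (variance of the averaged process vs. scale) can be sketched as follows. This is only the naive empirical estimator on non-overlapping blocks, not the maximum-likelihood methodology of the abstract; the function name and the synthetic white-noise series are assumptions made for illustration.

```python
import random
import statistics


def climacogram(series, scales):
    """Map each scale k to the sample variance of non-overlapping k-averages."""
    result = {}
    for k in scales:
        n_blocks = len(series) // k
        if n_blocks < 2:
            continue  # at least two block means are needed for a variance
        block_means = [
            sum(series[i * k:(i + 1) * k]) / k for i in range(n_blocks)
        ]
        result[k] = statistics.variance(block_means)
    return result


# Hypothetical input: independent Gaussian noise, for which the variance of
# the k-averaged process is expected to fall roughly as 1/k.
rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
variance_by_scale = climacogram(white_noise, scales=[1, 10, 100])
```

    On a log-log plot of variance against scale, white noise gives a slope near -1, while a slope that flattens towards zero at large scales is the usual sign of long-term persistence.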

  6. Spatial diffusion of influenza outbreak-related climate factors in Chiang Mai Province, Thailand.

    PubMed

    Nakapan, Supachai; Tripathi, Nitin Kumar; Tipdecho, Taravudh; Souris, Marc

    2012-10-24

    Influenza is one of the most important leading causes of respiratory illness in the countries located in the tropical areas of South East Asia and Thailand. In this study the climate factors associated with influenza incidence in Chiang Mai Province, Northern Thailand, were investigated. Identification of factors responsible for influenza outbreaks and the mapping of potential risk areas in Chiang Mai are long overdue. This work examines the association between yearly climate patterns between 2001 and 2008 and influenza outbreaks in the Chiang Mai Province. The climatic factors included the amount of rainfall, percent of rainy days, relative humidity, maximum and minimum temperatures, and temperature difference. The study develops a statistical analysis to quantitatively assess the relationship between climate and influenza outbreaks and then evaluate its suitability for predicting influenza outbreaks. A multiple linear regression technique was used to fit the statistical model. The Inverse Distance Weighted (IDW) interpolation and Geographic Information System (GIS) techniques were used in mapping the spatial diffusion of influenza risk zones. The results show that there is a significant correlation between influenza outbreaks and climate factors for the majority of the studied area. A statistical analysis was conducted to assess the validity of the model comparing model outputs and actual outbreaks.
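
    The Inverse Distance Weighted interpolation named in this record can be sketched as below. The power parameter of 2 is a common default and an assumption here, since the abstract does not report it; the station coordinates and values are hypothetical.

```python
import math


def idw(points, values, query, power=2.0):
    """Inverse Distance Weighted estimate at `query` from known (x, y) points."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(points, values):
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            return value  # the query coincides with a known point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations and risk scores.
stations = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
risk = [10.0, 20.0, 30.0]
estimate = idw(stations, risk, query=(1.0, 0.0))
```

    Nearer stations dominate the estimate, so the value at (1, 0) lies closer to the two stations on the x-axis than to the more distant third one.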

  7. A statistical pixel intensity model for segmentation of confocal laser scanning microscopy images.

    PubMed

    Calapez, Alexandre; Rosa, Agostinho

    2010-09-01

    Confocal laser scanning microscopy (CLSM) has been widely used in the life sciences for the characterization of cell processes because it allows the recording of the distribution of fluorescence-tagged macromolecules on a section of the living cell. It is in fact the cornerstone of many molecular transport and interaction quantification techniques where the identification of regions of interest through image segmentation is usually a required step. In many situations, because of the complexity of the recorded cellular structures or because of the amounts of data involved, image segmentation either is too difficult or inefficient to be done by hand and automated segmentation procedures have to be considered. Given the nature of CLSM images, statistical segmentation methodologies appear as natural candidates. In this work we propose a model to be used for statistical unsupervised CLSM image segmentation. The model is derived from the CLSM image formation mechanics and its performance is compared to the existing alternatives. Results show that it provides a much better description of the data on classes characterized by their mean intensity, making it suitable not only for segmentation methodologies with known number of classes but also for use with schemes aiming at the estimation of the number of classes through the application of cluster selection criteria.

  8. 75 FR 8363 - Office for Civil Rights; Workshop on the HIPAA Privacy Rule's De-Identification Standard; Notice...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-02-24

    ... Recovery and Reinvestment Act of 2009 (ARRA), requires HHS to issue guidance on methods for de...). --Methodological Issues Associated with HIPAA Privacy Rule De-Identification. --Statistical Disclosure Control and...

  9. 77 FR 18689 - Changes to Standard Numbering System, Vessel Identification System, and Boating Accident Report...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-28

    ... requires States to compile and send us reports, information, and statistics on casualties reported to them... data and statistical information received from the current collection to establish National... accident prevention programs; and publish accident statistics in accordance with Title 46 U.S.C. 6102...

  10. Economic Statistics and Information Concerning the Japanese Auto Industry

    DOT National Transportation Integrated Search

    1980-12-01

    The report examines the following aspects of the Japanese automobile industry: Identification of Japanese agencies that receive statistical data on the automobile industry; Determination of research and development and capital investment procedures; ...

  11. Palatal rugae pattern: An aid for sex identification

    PubMed Central

    Gadicherla, Prahlad; Saini, Divya; Bhaskar, Milana

    2017-01-01

    Background: Palatal rugoscopy, or palatoscopy, is the process by which human identification can be obtained by inspecting the transverse palatal rugae inside the mouth. Aim: The aim of the study is to investigate the potential of using palatal rugae as an aid for sex identification in Bengaluru population. Materials and Methods: One hundred plaster casts equally distributed between males and females belonging to age range of 4–16 years were examined for different rugae patterns. Thomas and Kotze classification was adopted for identification of these rugae patterns. Statistical Analysis: The data obtained were subjected to discriminant function analysis to determine the applicability of palatal rugae pattern as an aid for sex identification. Results: Difference in unification patterns among males and females was found to be statistically significant. No significant difference was found between males and females in terms of number of rugae. Overall, wavy and curvy were the most predominant type of rugae seen. Discriminant function analysis enabled sex identification with an accuracy of 80%. Conclusion: This preliminary study undertaken showed the existence of a distinct pattern of distribution of palatal rugae between males and females of Bengaluru population. This study opens scope for further research with a larger sample size to establish palatal rugae as a valuable tool for sex identification for forensic purposes. PMID:28584485

  12. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.

  13. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

    PubMed Central

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

    2011-01-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses. PMID:21488652

  14. A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

    PubMed

    Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang

    2017-08-23

    Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question of such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their occurring probabilities in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Existing meta-analysis methods were revealed to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its united framework and controllable way to deal with study heterogeneity.

  15. Threshold Values for Identification of Contamination Predicted by Reduced-Order Models

    DOE PAGES

    Last, George V.; Murray, Christopher J.; Bott, Yi-Ju; ...

    2014-12-31

    The U.S. Department of Energy’s (DOE’s) National Risk Assessment Partnership (NRAP) Project is developing reduced-order models to evaluate potential impacts on underground sources of drinking water (USDWs) if CO2 or brine leaks from deep CO2 storage reservoirs. Threshold values, below which there would be no predicted impacts, were determined for portions of two aquifer systems. These threshold values were calculated using an interwell approach for determining background groundwater concentrations that is an adaptation of methods described in the U.S. Environmental Protection Agency’s Unified Guidance for Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities.

  16. An intervention to decrease patient identification band errors in a children's hospital.

    PubMed

    Hain, Paul D; Joers, B; Rush, M; Slayton, J; Throop, P; Hoagg, S; Allen, L; Grantham, J; Deshpande, J K

    2010-06-01

    Patient misidentification continues to be a quality and safety issue. There is a paucity of US data describing interventions to reduce identification band error rates. Monroe Carell Jr Children's Hospital at Vanderbilt. Percentage of patients with defective identification bands. Web-based surveys were sent, asking hospital personnel to anonymously identify perceived barriers to reaching zero defects with identification bands. Corrective action plans were created and implemented with ideas from leadership, front-line staff and the online survey. Data from unannounced audits of patient identification bands were plotted on statistical process control charts and shared monthly with staff. All hospital personnel were expected to "stop the line" if there were any patient identification questions. The first audit showed a defect rate of 20.4%. The original mean defect rate was 6.5%. After interventions and education, the new mean defect rate was 2.6%. (a) The initial rate of patient identification band errors in the hospital was higher than expected. (b) The action resulting in most significant improvement was staff awareness of the problem, with clear expectations to immediately stop the line if a patient identification error was present. (c) Staff surveys are an excellent source of suggestions for combating patient identification issues. (d) Continued audit and data collection is necessary for sustainable staff focus and continued improvement. (e) Statistical process control charts are both an effective method to track results and an easily understood tool for sharing data with staff.
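
    The statistical process control charts mentioned in this record could take the form of a proportion (p) chart, sketched below. The abstract does not say which chart type or audit size was used, so the three-sigma p-chart, the function name and the audit size of 200 bands are assumptions; only the 6.5% and 2.6% mean defect rates come from the record.

```python
import math


def p_chart_limits(mean_defect_rate, audit_size):
    """Return (lower, center, upper) three-sigma limits for a p-chart."""
    sigma = math.sqrt(mean_defect_rate * (1.0 - mean_defect_rate) / audit_size)
    lower = max(0.0, mean_defect_rate - 3.0 * sigma)  # a proportion cannot be negative
    upper = min(1.0, mean_defect_rate + 3.0 * sigma)
    return lower, mean_defect_rate, upper


# Mean defect rates from the abstract; the audit size is hypothetical.
before = p_chart_limits(0.065, audit_size=200)
after = p_chart_limits(0.026, audit_size=200)
```

    An audit whose defect proportion falls above the upper limit would signal a change worth investigating rather than ordinary audit-to-audit variation.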

  17. Resolving the double tension: Toward a new approach to measurement modeling in cross-national research

    NASA Astrophysics Data System (ADS)

    Medina, Tait Runnfeldt

    The increasing global reach of survey research provides sociologists with new opportunities to pursue theory building and refinement through comparative analysis. However, comparison across a broad array of diverse contexts introduces methodological complexities related to the development of constructs (i.e., measurement modeling) that if not adequately recognized and properly addressed undermine the quality of research findings and cast doubt on the validity of substantive conclusions. The motivation for this dissertation arises from a concern that the availability of cross-national survey data has outpaced sociologists' ability to appropriately analyze and draw meaningful conclusions from such data. I examine the implicit assumptions and detail the limitations of three commonly used measurement models in cross-national analysis---summative scale, pooled factor model, and multiple-group factor model with measurement invariance. Using the orienting lens of the double tension I argue that a new approach to measurement modeling that incorporates important cross-national differences into the measurement process is needed. Two such measurement models---multiple-group factor model with partial measurement invariance (Byrne, Shavelson and Muthen 1989) and the alignment method (Asparouhov and Muthen 2014; Muthen and Asparouhov 2014)---are discussed in detail and illustrated using a sociologically relevant substantive example. I demonstrate that the former approach is vulnerable to an identification problem that arbitrarily impacts substantive conclusions. I conclude that the alignment method is built on model assumptions that are consistent with theoretical understandings of cross-national comparability and provides an approach to measurement modeling and construct development that is uniquely suited for cross-national research. 
    The dissertation makes three major contributions: First, it provides theoretical justification for a new cross-national measurement model and explicates a link between theoretical conceptions of cross-national comparability and a statistical method. Second, it provides a clear and detailed discussion of model identification in multiple-group confirmatory factor analysis that is missing from the literature. This discussion sets the stage for the introduction of the identification problem within multiple-group confirmatory factor analysis with partial measurement invariance and the alternative approach to model identification employed by the alignment method. Third, it offers the first pedagogical presentation of the alignment method using a sociologically relevant example.

  18. Comparing the landscapes of common retroviral insertion sites across tumor models

    NASA Astrophysics Data System (ADS)

    Weishaupt, Holger; Čančer, Matko; Engström, Cristopher; Silvestrov, Sergei; Swartling, Fredrik J.

    2017-01-01

    Retroviral tagging represents an important technique, which allows researchers to screen for candidate cancer genes. The technique is based on the integration of retroviral sequences into the genome of a host organism, which might then lead to the artificial inhibition or expression of proximal genetic elements. The identification of potential cancer genes in this framework involves the detection of genomic regions (common insertion sites; CIS) which contain a number of such viral integration sites that is greater than expected by chance. During the last two decades, a number of different methods have been discussed for the identification of such loci and the respective techniques have been applied to a variety of different retroviruses and/or tumor models. We have previously established a retrovirus driven brain tumor model and reported the CISs which were found based on a Monte Carlo statistics derived detection paradigm. In this study, we consider a recently proposed alternative graph theory based method for identifying CISs and compare the resulting CIS landscape in our brain tumor dataset to those obtained when using the Monte Carlo approach. Finally, we also employ the graph-based method to compare the CIS landscape in our brain tumor model with those of other published retroviral tumor models.

  19. Statistical analysis of texture in trunk images for biometric identification of tree species.

    PubMed

    Bressane, Adriano; Roveda, José A F; Martins, Antônio C G

    2015-04-01

    The identification of tree species is a key step for sustainable management plans of forest resources, as well as for several other applications that are based on such surveys. However, the currently available techniques depend on the presence of tree structures, such as flowers, fruits, and leaves, limiting the identification process to certain periods of the year. Therefore, this article introduces a study on the application of statistical parameters for texture classification of tree trunk images. To that end, 540 samples from five Brazilian native deciduous species were acquired, and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the textures. A biometric species identification system built with a decision tree achieved an average precision of 0.84 for species classification, with 0.83 accuracy and 0.79 agreement. Thus, the use of texture in trunk images can represent an important advance in tree identification, since the limitations of the current techniques can be overcome.
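
    The six texture measures named in the abstract are commonly computed from the gray-level histogram of an image patch. The sketch below assumes those common histogram-based definitions; the abstract does not give the formulas the authors used, and the function name `texture_stats` and the sample patch are hypothetical.

```python
import math
from collections import Counter

def texture_stats(pixels, levels=256):
    """Histogram-based texture descriptors of a flat list of gray levels."""
    n = len(pixels)
    # Normalized histogram: probability of each gray level that occurs.
    probs = {z: c / n for z, c in Counter(pixels).items()}
    mean = sum(z * p for z, p in probs.items())
    var = sum((z - mean) ** 2 * p for z, p in probs.items())
    third = sum((z - mean) ** 3 * p for z, p in probs.items())
    # Variance rescaled to [0, 1] so smoothness does not depend on bit depth.
    norm_var = var / (levels - 1) ** 2
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "smoothness": 1 - 1 / (1 + norm_var),
        "third_moment": third,  # asymmetry of the histogram
        "uniformity": sum(p * p for p in probs.values()),
        "entropy": -sum(p * math.log2(p) for p in probs.values()),
    }

# Hypothetical 2x2 patch with two equally frequent gray levels.
stats = texture_stats([0, 0, 255, 255])
```

    Feature vectors of this kind, one per trunk image, would then be passed to a classifier such as the decision tree mentioned in the abstract.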

  20. A response surface methodology based damage identification technique

    NASA Astrophysics Data System (ADS)

    Fang, S. E.; Perera, R.

    2009-06-01

    Response surface methodology (RSM) is a combination of statistical and mathematical techniques to represent the relationship between the inputs and outputs of a physical system by explicit functions. This methodology has been widely employed in many applications such as design optimization, response prediction and model validation. But so far the literature related to its application in structural damage identification (SDI) is scarce. Therefore this study attempts to present a systematic SDI procedure comprising four sequential steps of feature selection, parameter screening, primary response surface (RS) modeling and updating, and reference-state RS modeling with SDI realization using the factorial design (FD) and the central composite design (CCD). The last two steps imply the implementation of inverse problems by model updating in which the RS models substitute the FE models. The proposed method was verified against a numerical beam, a tested reinforced concrete (RC) frame and an experimental full-scale bridge with the modal frequency being the output responses. It was found that the proposed RSM-based method performs well in predicting the damage of both numerical and experimental structures having single and multiple damage scenarios. The screening capacity of the FD can provide quantitative estimation of the significance levels of updating parameters. Meanwhile, the second-order polynomial model established by the CCD provides adequate accuracy in expressing the dynamic behavior of a physical system.

  1. Perceptual context and individual differences in the language proficiency of preschool children.

    PubMed

    Banai, Karen; Yifat, Rachel

    2016-02-01

    Although the contribution of perceptual processes to language skills during infancy is well recognized, the role of perception in linguistic processing beyond infancy is not well understood. In the experiments reported here, we asked whether manipulating the perceptual context in which stimuli are presented across trials influences how preschool children perform visual (shape-size identification; Experiment 1) and auditory (syllable identification; Experiment 2) tasks. Another goal was to determine whether the sensitivity to perceptual context can explain part of the variance in oral language skills in typically developing preschool children. Perceptual context was manipulated by changing the relative frequency with which target visual (Experiment 1) and auditory (Experiment 2) stimuli were presented in arrays of fixed size, and identification of the target stimuli was tested. Oral language skills were assessed using vocabulary, word definition, and phonological awareness tasks. Changes in perceptual context influenced the performance of the majority of children on both identification tasks. Sensitivity to perceptual context accounted for 7% to 15% of the variance in language scores. We suggest that context effects are an outcome of a statistical learning process. Therefore, the current findings demonstrate that statistical learning can facilitate both visual and auditory identification processes in preschool children. Furthermore, consistent with previous findings in infants and in older children and adults, individual differences in statistical learning were found to be associated with individual differences in language skills of preschool children. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Forensic Hair Differentiation Using Attenuated Total Reflection Fourier Transform Infrared (ATR FT-IR) Spectroscopy.

    PubMed

    Manheim, Jeremy; Doty, Kyle C; McLaughlin, Gregory; Lednev, Igor K

    2016-07-01

    Hair and fibers are common forms of trace evidence found at crime scenes. The current methodology of microscopic examination of potential hair evidence is absent of statistical measures of performance, and examiner results for identification can be subjective. Here, attenuated total reflection (ATR) Fourier transform-infrared (FT-IR) spectroscopy was used to analyze synthetic fibers and natural hairs of human, cat, and dog origin. Chemometric analysis was used to differentiate hair spectra from the three different species, and to predict unknown hairs to their proper species class, with a high degree of certainty. A species-specific partial least squares discriminant analysis (PLSDA) model was constructed to discriminate human hair from cat and dog hairs. This model was successful in distinguishing between the three classes and, more importantly, all human samples were correctly predicted as human. An external validation resulted in zero false positive and false negative assignments for the human class. From a forensic perspective, this technique would be complementary to microscopic hair examination, and in no way replace it. As such, this methodology is able to provide a statistical measure of confidence to the identification of a sample of human, cat, and dog hair, which was called for in the 2009 National Academy of Sciences report. More importantly, this approach is non-destructive, rapid, can provide reliable results, and requires no sample preparation, making it of ample importance to the field of forensic science. © The Author(s) 2016.

  3. Development of a quantitative multivariable radiographic method to evaluate anatomic changes associated with laminitis in the forefeet of donkeys.

    PubMed

    Collins, Simon N; Dyson, Sue J; Murray, Rachel C; Newton, J Richard; Burden, Faith; Trawford, Andrew F

    2012-08-01

    To establish and validate an objective method of radiographic diagnosis of anatomic changes in laminitic forefeet of donkeys on the basis of data from a comprehensive series of radiographic measurements. 85 donkeys with and 85 without forelimb laminitis for baseline data determination; a cohort of 44 donkeys with and 18 without forelimb laminitis was used for validation analyses. For each donkey, lateromedial radiographic views of 1 weight-bearing forelimb were obtained; images from 11 laminitic and 2 nonlaminitic donkeys were excluded (motion artifact) from baseline data determination. Data from an a priori selection of 19 measurements of anatomic features of laminitic and nonlaminitic donkey feet were analyzed by use of a novel application of multivariate statistical techniques. The resultant diagnostic models were validated in a blinded manner with data from the separate cohort of laminitic and nonlaminitic donkeys. Data were modeled, and robust statistical rules were established for the diagnosis of anatomic changes within laminitic donkey forefeet. Component 1 scores ≤ -3.5 were indicative of extreme anatomic change, and scores from -2.0 to 0.0 denoted modest change. Nonlaminitic donkeys with a score from 0.5 to 1.0 should be considered as at risk for laminitis. Results indicated that the radiographic procedures evaluated can be used for the identification, assessment, and monitoring of anatomic changes associated with laminitis. Screening assessments by use of this method may enable early detection of mild anatomic change and identification of at-risk donkeys.

  4. Development of statistical linear regression model for metals from transportation land uses.

    PubMed

    Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung

    2009-01-01

    Transportation land uses with impervious surfaces, such as highways, parking lots, roads, and bridges, are recognized as highly polluted non-point sources (NPSs) in urban areas. Many pollutants from urban transportation accumulate on paved surfaces during dry periods and are washed off during storms. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has been engaged in several research and monitoring projects to develop stormwater management policies and treatment systems for future implementation. Data from 131 storm events recorded at eleven sites between May 2004 and September 2008 were analyzed to identify correlations between particulates and metals, and to develop a simple linear regression (SLR) model to estimate event mean concentration (EMC). Results indicate that there was no significant relationship between metals and total suspended solids (TSS) EMC. However, the SLR estimation models, although not providing useful results, are valuable indicators of the high uncertainties that NPS pollution possesses. Therefore, long-term monitoring employing proper methods and precise statistical analysis of the data should be undertaken to eliminate these uncertainties.
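
    A simple linear regression of a metal EMC on the TSS EMC can be fitted by ordinary least squares, and the coefficient of determination shows how little variance such a model may explain. The sketch below is an illustration only: the function name `fit_slr` and the five data points are hypothetical and were chosen to mimic a weak relationship, not taken from the study.

```python
def fit_slr(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance explained
    return slope, intercept, r_squared

# Hypothetical event mean concentrations in mg/L (not study data).
tss_emc = [50, 80, 120, 160, 200]
zinc_emc = [0.30, 0.10, 0.35, 0.12, 0.28]

slope, intercept, r_squared = fit_slr(tss_emc, zinc_emc)
```

    For these made-up values the coefficient of determination is close to zero, which is the kind of outcome the abstract describes as "no significant relationship".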

  5. The Link between Basing Self-Worth on Academics and Student Performance Depends on Domain Identification and Academic Setting

    ERIC Educational Resources Information Center

    Lawrence, Jason S.; Charbonneau, Joseph

    2009-01-01

    Two studies showed that the link between how much students base their self-worth on academics and their math performance depends on whether their identification with math was statistically controlled and whether the task measured ability or not. Study 1 showed that, when math identification was uncontrolled and the task was ability-diagnostic,…

  6. An approach for the assessment of the statistical aspects of the SEA coupling loss factors and the vibrational energy transmission in complex aircraft structures: Experimental investigation and methods benchmark

    NASA Astrophysics Data System (ADS)

    Bouhaj, M.; von Estorff, O.; Peiffer, A.

    2017-09-01

    In the application of Statistical Energy Analysis (SEA) to complex assembled structures, a purely predictive model often exhibits errors. These errors are mainly due to a lack of accurate modelling of the power transmission mechanism described through the Coupling Loss Factors (CLF). Experimental SEA (ESEA) is used in practice by the automotive and aerospace industries to verify and update the model or to derive the CLFs for use in an SEA predictive model when analytical estimates cannot be made. This work is particularly motivated by the lack of procedures that allow an estimate to be made of the variance and confidence intervals of the statistical quantities when using the ESEA technique. The aim of this paper is to introduce procedures enabling a statistical description of measured power input, vibration energies and the derived SEA parameters. Particular emphasis is placed on the identification of structural CLFs of complex built-up structures by comparing different methods. By adopting a Stochastic Energy Model (SEM), the ensemble average in ESEA is also addressed. For this purpose, expressions are obtained to randomly perturb the energy matrix elements and generate individual samples for the Monte Carlo (MC) technique applied to derive the ensemble averaged CLF. From results of ESEA tests conducted on an aircraft fuselage section, the SEM approach provides better estimates of the CLFs than classical matrix inversion methods. The expected range of CLF values and the synthesized energy are used as quality criteria of the matrix inversion, making it possible to assess critical SEA subsystems, which might require a more refined statistical description of the excitation and the response fields. Moreover, the impact of the variance of the normalized vibration energy on the uncertainty of the derived CLFs is outlined.

  7. Machinery running state identification based on discriminant semi-supervised local tangent space alignment for feature fusion and extraction

    NASA Astrophysics Data System (ADS)

    Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua

    2017-04-01

    Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. To address this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition (WPD), and a mixed-domain feature set consisting of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and WPD energy spectrum is extracted to comprehensively characterize the properties of machinery running state(s). Then, the mixed-domain feature set is input into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract intrinsic structure information of both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and the blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated within the dimension reduction process in a semi-supervised manner to improve the sensitivity of the extracted fusion features. Lastly, the extracted fusion features are input into a pattern recognition algorithm to achieve the running state identification. The effectiveness of the proposed method is verified by a running state identification case in a gearbox, and the results confirm the improved accuracy of the running state identification.

  8. Blind identification of image manipulation type using mixed statistical moments

    NASA Astrophysics Data System (ADS)

    Jeong, Bo Gyu; Moon, Yong Ho; Eom, Il Kyu

    2015-01-01

    We present a blind identification of image manipulation types such as blurring, scaling, sharpening, and histogram equalization. Motivated by the fact that image manipulations can change the frequency characteristics of an image, we introduce three types of feature vectors composed of statistical moments. The proposed statistical moments are generated from separated wavelet histograms, the characteristic functions of the wavelet variance, and the characteristic functions of the spatial image. Our method can solve the n-class classification problem. Through experimental simulations, we demonstrate that our proposed method can achieve high performance in manipulation type detection. The average rate of the correctly identified manipulation types is as high as 99.22%, using 10,800 test images and six manipulation types including the authentic image.
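
    One of the feature families in the abstract is built from moments of characteristic functions. A common formulation in the image forensics literature treats the discrete Fourier transform of a histogram as its characteristic function and takes frequency-weighted moments of the magnitude; the sketch below assumes that formulation, which may differ from the authors' exact definition. The name `cf_moments` and the two 8-bin histograms are hypothetical.

```python
import cmath

def cf_moments(hist, orders=(1, 2, 3)):
    """Moments of a histogram's characteristic function (DFT magnitude)."""
    k = len(hist)
    total = sum(hist)
    probs = [h / total for h in hist]
    # Positive frequencies only; the DC term (frequency 0) is skipped.
    freqs = range(1, k // 2 + 1)
    mags = [abs(sum(p * cmath.exp(-2j * cmath.pi * f * z / k)
                    for z, p in enumerate(probs)))
            for f in freqs]
    denom = sum(mags)
    if denom == 0:
        return [0.0 for _ in orders]
    # n-th moment: magnitude-weighted mean of the n-th power of frequency.
    return [sum((f / k) ** n * m for f, m in zip(freqs, mags)) / denom
            for n in orders]

# Hypothetical histograms: one concentrated in a single bin, one spread
# over neighboring bins as a smoothing operation might produce.
sharp = cf_moments([0, 0, 4, 0, 0, 0, 0, 0])
blurred = cf_moments([0, 1, 2, 1, 0, 0, 0, 0])
```

    The spread histogram yields smaller moments because smoothing suppresses high-frequency content, which is the frequency-domain effect the abstract uses as motivation.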

  9. Statistical analysis of RHIC beam position monitors performance

    NASA Astrophysics Data System (ADS)

    Calaga, R.; Tomás, R.

    2004-04-01

    A detailed statistical analysis of beam position monitors (BPM) performance at RHIC is a critical factor in improving regular operations and future runs. Robust identification of malfunctioning BPMs plays an important role in any orbit or turn-by-turn analysis. Singular value decomposition and Fourier transform methods, which have evolved as powerful numerical techniques in signal processing, will aid in such identification from BPM data. This is the first attempt at RHIC to use a large set of data to statistically enhance the capability of these two techniques and determine BPM performance. A comparison from run 2003 data shows striking agreement between the two methods and hence can be used to improve BPM functioning at RHIC and possibly other accelerators.

  10. Identification of Chemical Attribution Signatures of Fentanyl Syntheses Using Multivariate Statistical Analysis of Orthogonal Analytical Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Mew, D. A.; DeHope, A.

    Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product - all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LC-MS/MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.

  11. A study on identification of bacteria in environmental samples using single-cell Raman spectroscopy: feasibility and reference libraries.

    PubMed

    Baritaux, Jean-Charles; Simon, Anne-Catherine; Schultz, Emmanuelle; Emain, C; Laurent, P; Dinten, Jean-Marc

    2016-05-01

    We report on our recent efforts towards identifying bacteria in environmental samples by means of Raman spectroscopy. We established a database of Raman spectra from bacteria submitted to various environmental conditions. This dataset was used to verify that Raman typing is possible from measurements performed in non-ideal conditions. Starting from the same dataset, we then varied the phenotype and matrix diversity content included in the reference library used to train the statistical model. The results show that it is possible to obtain models with an extended coverage of spectral variabilities, compared to environment-specific models trained on spectra from a restricted set of conditions. Broad coverage models are desirable for environmental samples since the exact conditions of the bacteria cannot be controlled.

  12. Identification of Nasal Bone Fractures on Conventional Radiography and Facial CT: Comparison of the Diagnostic Accuracy in Different Imaging Modalities and Analysis of Interobserver Reliability.

    PubMed

    Baek, Hye Jin; Kim, Dong Wook; Ryu, Ji Hwa; Lee, Yoo Jin

    2013-09-01

    There has been no study to compare the diagnostic accuracy of an experienced radiologist with a trainee in nasal bone fracture. To compare the diagnostic accuracy between conventional radiography and computed tomography (CT) for the identification of nasal bone fractures and to evaluate the interobserver reliability between a staff radiologist and a trainee. A total of 108 patients who underwent conventional radiography and CT after acute nasal trauma were included in this retrospective study. Two readers, a staff radiologist and a second-year resident, independently assessed the results of the imaging studies. Of the 108 patients, the presence of a nasal bone fracture was confirmed in 88 (81.5%) patients. The number of non-depressed fractures was higher than the number of depressed fractures. In nine (10.2%) patients, nasal bone fractures were only identified on conventional radiography, including three depressed and six non-depressed fractures. CT was more accurate than conventional radiography for the identification of nasal bone fractures as determined by both readers (P < 0.05). All diagnostic indices of the experienced radiologist were similar to or higher than those of the trainee, and κ statistics showed moderate agreement between the two diagnostic tools for both readers. There was no statistical difference in the assessment of interobserver reliability for both imaging modalities in the identification of nasal bone fractures. For the identification of nasal bone fractures, CT was significantly superior to conventional radiography. Although a staff radiologist showed better values in the identification of nasal bone fracture and differentiation between depressed and non-depressed fractures than a trainee, there was no statistically significant difference in the interpretation of conventional radiography and CT between a radiologist and a trainee.
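
    The κ statistic mentioned in the abstract measures agreement beyond what chance alone would produce. A minimal sketch of Cohen's kappa for a square agreement table follows; the function name `cohens_kappa` and the counts are hypothetical, since the abstract does not report the underlying cross-tabulation.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts (rows: A, columns: B)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total
    # Chance agreement: product of matching row and column marginals.
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical cross-tabulation of two modalities (not study data):
# rows = radiography (fracture, no fracture), columns = CT.
agreement = [[40, 5],
             [6, 9]]
kappa = cohens_kappa(agreement)
```

    Under the widely used Landis and Koch convention, values between 0.41 and 0.60 are labelled moderate agreement, and this made-up table falls in that band.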

  13. Application of statistical process control and process capability analysis procedures in orbiter processing activities at the Kennedy Space Center

    NASA Technical Reports Server (NTRS)

    Safford, Robert R.; Jackson, Andrew E.; Swart, William W.; Barth, Timothy S.

    1994-01-01

    Successful ground processing at KSC requires that flight hardware and ground support equipment conform to specifications at tens of thousands of checkpoints. Knowledge of conformance is an essential requirement for launch. That knowledge of conformance at every requisite point does not, however, enable identification of past problems with equipment, or potential problem areas. This paper describes how the introduction of Statistical Process Control and Process Capability Analysis identification procedures into existing shuttle processing procedures can enable identification of potential problem areas and candidates for improvements to increase processing performance measures. Results of a case study describing application of the analysis procedures to Thermal Protection System processing are used to illustrate the benefits of the approaches described in the paper.

  14. Efficient identification and referral of low-income women at high risk for hereditary breast cancer: a practice-based approach.

    PubMed

    Joseph, G; Kaplan, C; Luce, J; Lee, R; Stewart, S; Guerra, C; Pasick, R

    2012-01-01

    Identification of low-income women with the rare but serious risk of hereditary cancer and their referral to appropriate services presents an important public health challenge. We report the results of formative research to reach thousands of women for efficient identification of those at high risk and expedient access to free genetic services. External validity is maximized by emphasizing intervention fit with the two end-user organizations who must connect to make this possible. This study phase informed the design of a subsequent randomized controlled trial. We conducted a randomized controlled pilot study (n = 38) to compare two intervention models for feasibility and impact. The main outcome was receipt of genetic counseling during a two-month intervention period. Model 1 was based on the usual outcall protocol of an academic hospital genetic risk program, and Model 2 drew on the screening and referral procedures of a statewide toll-free phone line through which large numbers of high-risk women can be identified. In Model 1, the risk program proactively calls patients to schedule genetic counseling; for Model 2, women are notified of their eligibility for counseling and make the call themselves. We also developed and pretested a family history screener for administration by phone to identify women appropriate for genetic counseling. There was no statistically significant difference in receipt of genetic counseling between women randomized to Model 1 (3/18) compared with Model 2 (3/20) during the intervention period. However, when unresponsive women in Model 2 were called after 2 months, 7 more obtained counseling; 4 women from Model 1 were also counseled after the intervention. Thus, the intervention model that closely aligned with the risk program's outcall to high-risk women was found to be feasible and brought more low-income women to free genetic counseling. 
    Our screener was easy to administer by phone and appeared to identify high-risk callers effectively. The model and screener are now in use in the main trial to test the effectiveness of this screening and referral intervention. A validation analysis of the screener is also underway. Identification of intervention strategies and tools, and their systematic comparison for impact and efficiency in the context where they will ultimately be used are critical elements of practice-based research. Copyright © 2012 S. Karger AG, Basel.
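
    The abstract reports counseling uptake of 3/18 in Model 1 and 3/20 in Model 2 during the intervention period, without naming the test behind "no statistically significant difference". One test suited to such small counts is Fisher's exact test; the sketch below assumes that choice, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Unnormalized hypergeometric weight of every table with these margins.
    weights = {k: comb(row1, k) * comb(row2, col1 - k)
               for k in range(lowest, highest + 1)}
    observed = weights[a]
    # Sum over all tables that are no more likely than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(row1 + row2, col1)

# Counts from the abstract: counseled vs. not counseled in each arm.
p_value = fisher_exact_two_sided(3, 15, 3, 17)
```

    Integer weights are compared instead of floating-point probabilities so that ties with the observed table are counted exactly.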

  15. Raman spectroscopy coupled with advanced statistics for differentiating menstrual and peripheral blood.

    PubMed

    Sikirzhytskaya, Aliaksandra; Sikirzhytski, Vitali; Lednev, Igor K

    2014-01-01

    Body fluids are a common and important type of forensic evidence. In particular, the identification of menstrual blood stains is often a key step during the investigation of rape cases. Here, we report on the application of near-infrared Raman microspectroscopy for differentiating menstrual blood from peripheral blood. We observed that the menstrual and peripheral blood samples have similar but distinct Raman spectra. Advanced statistical analysis of the multiple Raman spectra that were acquired automatically (Raman mapping) from the 40 dried blood stains (20 donors for each group) allowed us to build a classification model with maximum (100%) sensitivity and specificity. We also demonstrated that despite certain common constituents, menstrual blood can be readily distinguished from vaginal fluid. All of the classification models were verified using cross-validation methods. The proposed method overcomes the problems associated with currently used biochemical methods, which are destructive, time consuming and expensive. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. What's statistical about learning? Insights from modelling statistical learning as a set of memory processes

    PubMed Central

    2017-01-01

    Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926–1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745–2750; Thiessen & Yee 2010 Child Development 81, 1287–1303; Saffran 2002 Journal of Memory and Language 47, 172–196; Misyak & Christiansen 2012 Language Learning 62, 302–331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246–263; Thiessen et al. 2013 Psychological Bulletin 139, 792–814). 
From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik 2013 Cognitive Science 37, 310–343). This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. PMID:27872374
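
    The abstract mentions transitional probability as an example of a sequential statistic. As a minimal sketch of what that quantity is (not of the memory-based framework the article proposes), the Python below estimates P(next | current) from adjacent pairs. The function name and the syllable stream are hypothetical and chosen only for illustration.

```python
from collections import Counter

def transitional_probabilities(sequence):
    """Return {(a, b): P(b | a)} estimated from adjacent pairs in `sequence`."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Count each element only where it has a successor, so the
    # probabilities leaving any element sum to 1.
    first_counts = Counter(sequence[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical stream built from three made-up "words": bi-da, ku-pa, ti-go.
stream = ["bi", "da", "ku", "pa", "ti", "go", "bi", "da",
          "ti", "go", "ku", "pa", "bi", "da"]
tp = transitional_probabilities(stream)
```

    In this toy stream the within-word transition bi -> da has probability 1.0, while transitions that cross a word boundary (for example da -> ku) have probability 0.5, which is the kind of dip that segmentation accounts based on sequential statistics rely on.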

  17. What's statistical about learning? Insights from modelling statistical learning as a set of memory processes.

    PubMed

    Thiessen, Erik D

    2017-01-05

    Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745-2750; Thiessen & Yee 2010 Child Development 81, 1287-1303; Saffran 2002 Journal of Memory and Language 47, 172-196; Misyak & Christiansen 2012 Language Learning 62, 302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246-263; Thiessen et al. 2013 Psychological Bulletin 139, 792-814). 
From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik 2013 Cognitive Science 37, 310-343). This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'. © 2016 The Author(s).

  18. Inter-speaker speech variability assessment using statistical deformable models from 3.0 tesla magnetic resonance images.

    PubMed

    Vasconcelos, Maria J M; Ventura, Sandra M R; Freitas, Diamantino R S; Tavares, João Manuel R S

    2012-03-01

    The morphological and dynamic characterisation of the vocal tract during speech production has been gaining greater attention, motivated by the latest improvements in magnetic resonance (MR) imaging; namely, the use of higher magnetic fields, such as 3.0 Tesla. In this work, the automatic study of the vocal tract from 3.0 Tesla MR images was assessed through the application of statistical deformable models. The primary goal was the analysis of the shape of the vocal tract during the articulation of European Portuguese sounds, followed by the evaluation of the results concerning the automatic segmentation, i.e. identification of the vocal tract in new MR images. With respect to speech production, this is the first attempt to automatically characterise and reconstruct the vocal tract shape of 3.0 Tesla MR images by using deformable models; particularly, by using active and appearance shape models. The achieved results clearly demonstrate the adequacy and advantage of these deformable models for the automatic analysis of 3.0 Tesla MR images in order to extract the vocal tract shape and assess the involved articulatory movements. These achievements are needed, for example, for a better knowledge of speech production, mainly of patients suffering from articulatory disorders, and to build enhanced speech synthesizer models.

  19. Acoustic Analogy and Alternative Theories for Jet Noise Prediction

    NASA Technical Reports Server (NTRS)

    Morris, Philip J.; Farassat, F.

    2002-01-01

    Several methods for the prediction of jet noise are described. All but one of the noise prediction schemes are based on Lighthill's or Lilley's acoustic analogy, whereas the other is the jet noise generation model recently proposed by Tam and Auriault. In all of the approaches, some assumptions must be made concerning the statistical properties of the turbulent sources. In each case the characteristic scales of the turbulence are obtained from a solution of the Reynolds-averaged Navier-Stokes equation using a k-epsilon turbulence model. It is shown that, for the same level of empiricism, Tam and Auriault's model yields better agreement with experimental noise measurements than the acoustic analogy. It is then shown that this result is not because of some fundamental flaw in the acoustic analogy approach, but instead is associated with the assumptions made in the approximation of the turbulent source statistics. If consistent assumptions are made, both the acoustic analogy and Tam and Auriault's model yield identical noise predictions. In conclusion, a proposal is presented for an acoustic analogy that provides a clearer identification of the equivalent source mechanisms, as is a discussion of noise prediction issues that remain to be resolved.

  20. The Acoustic Analogy and Alternative Theories for Jet Noise Prediction

    NASA Technical Reports Server (NTRS)

    Morris, Philip J.; Farassat, F.

    2002-01-01

    This paper describes several methods for the prediction of jet noise. All but one of the noise prediction schemes are based on Lighthill's or Lilley's acoustic analogy while the other is the jet noise generation model recently proposed by Tam and Auriault. In all the approaches some assumptions must be made concerning the statistical properties of the turbulent sources. In each case the characteristic scales of the turbulence are obtained from a solution of the Reynolds-averaged Navier-Stokes equation using a k-epsilon turbulence model. It is shown that, for the same level of empiricism, Tam and Auriault's model yields better agreement with experimental noise measurements than the acoustic analogy. It is then shown that this result is not because of some fundamental flaw in the acoustic analogy approach but is associated with the assumptions made in the approximation of the turbulent source statistics. If consistent assumptions are made, both the acoustic analogy and Tam and Auriault's model yield identical noise predictions. The paper concludes with a proposal for an acoustic analogy that provides a clearer identification of the equivalent source mechanisms and a discussion of noise prediction issues that remain to be resolved.

  1. Identifying pleiotropic genes in genome-wide association studies from related subjects using the linear mixed model and Fisher combination function.

    PubMed

    Yang, James J; Williams, L Keoki; Buu, Anne

    2017-08-24

    A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.
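
    The second step of the approach rests on the Fisher combination statistic, T = -2 * sum(ln p_i). The sketch below, in Python, computes T and its p-value only under the textbook assumption of independent tests, where T follows a chi-square distribution with 2k degrees of freedom. It is not the authors' procedure: the abstract states that they estimate the null distribution from the correlation between mixed-model residuals, which this sketch does not attempt. The function names are hypothetical.

```python
import math

def fisher_statistic(p_values):
    """Fisher combination statistic T = -2 * sum(ln p_i)."""
    return -2.0 * sum(math.log(p) for p in p_values)

def fisher_combined_p_independent(p_values):
    """Combined p-value assuming the k tests are independent.

    Under independence T ~ chi-square with 2k degrees of freedom, whose
    survival function has the closed form
    exp(-T/2) * sum_{j=0}^{k-1} (T/2)^j / j!.
    """
    k = len(p_values)
    half = fisher_statistic(p_values) / 2.0
    term, total = 1.0, 0.0
    for j in range(k):
        total += term            # adds (T/2)^j / j!
        term *= half / (j + 1)   # advance to the next series term
    return math.exp(-half) * total

# Illustrative marginal p-values for four phenotypes (made-up numbers).
combined = fisher_combined_p_independent([0.04, 0.10, 0.03, 0.20])
```

    With correlated phenotypes the independence assumption tends to overstate significance, which is why a null distribution that accounts for the correlation, as the abstract describes, is needed in practice.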

  2. The Acoustic Analogy and Alternative Theories for Jet Noise Prediction

    NASA Technical Reports Server (NTRS)

    Morris, Philip J.; Farassat, F.

    2002-01-01

    This paper describes several methods for the prediction of jet noise. All but one of the noise prediction schemes are based on Lighthill's or Lilley's acoustic analogy while the other is the jet noise generation model recently proposed by Tam and Auriault. In all the approaches some assumptions must be made concerning the statistical properties of the turbulent sources. In each case the characteristic scales of the turbulence are obtained from a solution of the Reynolds-averaged Navier-Stokes equation using a k-epsilon turbulence model. It is shown that, for the same level of empiricism, Tam and Auriault's model yields better agreement with experimental noise measurements than the acoustic analogy. It is then shown that this result is not because of some fundamental flaw in the acoustic analogy approach but is associated with the assumptions made in the approximation of the turbulent source statistics. If consistent assumptions are made, both the acoustic analogy and Tam and Auriault's model yield identical noise predictions. The paper concludes with a proposal for an acoustic analogy that provides a clearer identification of the equivalent source mechanisms and a discussion of noise prediction issues that remain to be resolved.

  3. The geostatistical approach for structural and stratigraphic framework analysis of offshore NW Bonaparte Basin, Australia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wahid, Ali, E-mail: ali.wahid@live.com; Salim, Ahmed Mohamed Ahmed, E-mail: mohamed.salim@petronas.com.my; Yusoff, Wan Ismail Wan, E-mail: wanismail-wanyusoff@petronas.com.my

    2016-02-01

    Geostatistics, or the statistical approach, is based on the study of temporal and spatial trends and relies on spatial relationships to model known information about variable(s) at unsampled locations. The statistical technique known as kriging was used for petrophysical and facies analysis; it assumes a spatial relationship to model the geological continuity between the known data and the unknown, producing a single best guess of the unknown. Kriging is also known as an optimal interpolation technique, which generates the best linear unbiased estimate of each horizon. The idea is to construct a numerical model of the lithofacies and rock properties that honors the available data and to integrate it further with the interpretation of seismic sections, a tectonostratigraphy chart with a sea level curve (short term) and the regional tectonics of the study area to find the structural and stratigraphic growth history of the NW Bonaparte Basin. Using the kriging technique, models were built that help to estimate different parameters such as horizons, facies, and porosities in the study area. Variograms were used to identify the spatial relationship between data, which helps to find the depositional history of the North West (NW) Bonaparte Basin.
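
    The variogram step mentioned in this abstract can be illustrated with the classical empirical semivariogram, gamma(h) = sum((z_i - z_j)^2) / (2 N(h)), taken over the N(h) pairs whose separation falls in the lag bin for h. The Python below is a minimal sketch under that textbook definition; the function name, the binning scheme and the sample data are assumptions for illustration and do not come from the study.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, bin_width):
    """Return {lag_bin_center: gamma} for all point pairs, binned by distance."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])   # pair separation
            b = int(h // bin_width)               # index of the lag bin
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    # Semivariance is half the mean squared difference within each bin.
    return {(b + 0.5) * bin_width: sums[b] / (2 * counts[b])
            for b in sorted(counts)}

# Made-up porosity-like values sampled along a line, with a linear trend.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 3.0, 4.0]
gamma = empirical_semivariogram(points, values, bin_width=1.0)
```

    A model fitted to such an empirical curve is what supplies the spatial weights in kriging; the fitting and the kriging system itself are outside this sketch.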

  4. Dynamic rain fade compensation techniques for the advanced communications technology satellite

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1992-01-01

    The dynamic and composite nature of propagation impairments that are incurred on earth-space communications links at frequencies in and above the 30/20 GHz Ka band necessitate the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) project by the implementation of optimal processing schemes derived through the use of the ACTS Rain Attenuation Prediction Model and nonlinear Markov filtering theory. The ACTS Rain Attenuation Prediction Model discerns climatological variations on the order of 0.5 deg in latitude and longitude in the continental U.S. The time-dependent portion of the model gives precise availability predictions for the 'spot beam' links of ACTS. However, the structure of the dynamic portion of the model, which yields performance parameters such as fade duration probabilities, is isomorphic to the state-variable approach of stochastic control theory and is amenable to the design of such statistical fade processing schemes which can be made specific to the particular climatological location at which they are employed.

  5. Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content

    PubMed Central

    Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven

    2015-01-01

    Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855

  6. PharmMapper 2017 update: a web server for potential drug target identification with a comprehensive target pharmacophore database.

    PubMed

    Wang, Xia; Shen, Yihang; Wang, Shiwei; Li, Shiliang; Zhang, Weilin; Liu, Xiaofeng; Lai, Luhua; Pei, Jianfeng; Li, Honglin

    2017-07-03

    The PharmMapper online tool is a web server for potential drug target identification by reversed pharmacophore matching the query compound against an in-house pharmacophore model database. The original version of PharmMapper includes more than 7000 target pharmacophores derived from complex crystal structures with corresponding protein target annotations. In this article, we present a new version of the PharmMapper web server, of which the backend pharmacophore database is six times larger than the earlier one, with a total of 23 236 proteins covering 16 159 druggable pharmacophore models and 51 431 ligandable pharmacophore models. The expanded target data cover 450 indications and 4800 molecular functions compared to 110 indications and 349 molecular functions in our last update. In addition, the new web server is united with the statistically meaningful ranking of the identified drug targets, which is achieved through the use of standard scores. It also features an improved user interface. The proposed web server is freely available at http://lilab.ecust.edu.cn/pharmmapper/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Advances in Medical Analytics Solutions for Autonomous Medical Operations on Long-Duration Missions

    NASA Technical Reports Server (NTRS)

    Thompson, David E.; Lindsey, Antonia Edward

    2017-01-01

    A review will be presented on the progress made under STMD Game Changing Development Program funding towards the development of a Medical Decision Support System (MDSS) for augmenting crew capabilities during long-duration missions, such as Mars Transit. To create an MDSS, initial work requires acquiring images and developing models that analyze and assess the features in such medical biosensor images that support medical assessment of pathologies. For FY17, the project has focused on ultrasound images towards cardiac pathologies: namely, evaluation and assessment of pericardial effusion identification and discrimination from related pneumothorax and even bladder-induced infections that cause inflammation around the heart. This identification is substantially changed by uncertainty arising from conditions of fluid behavior under space microgravity. This talk will present and discuss the work-to-date in this Project, recognizing conditions under which various machine learning technologies, deep-learning via convolutional neural nets, and statistical learning methods for feature identification and classification can be employed and conditioned to graphical format in preparation for attachment to an inference engine that eventually creates decision support recommendations to remote crew in a triage setting.

  8. Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York

    PubMed Central

    Goovaerts, Pierre; Jacquez, Geoffrey M

    2004-01-01

    Background Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. 
In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
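
    The local Moran statistic used in this record can be written, in one common form, as I_i = (z_i / m2) * sum_j(w_ij * z_j), where z_i is the deviation of region i from the mean, m2 is the mean squared deviation, and w_ij are row-standardized neighbor weights. The Python sketch below computes only that statistic. It does not reproduce the paper's contribution, which is the set of neutral models used as null hypotheses for testing significance. The function name and the four-region example are hypothetical.

```python
def local_moran(values, weights):
    """Local Moran's I for each region, using row-standardized weights.

    `weights[i][j]` is nonzero when regions i and j are neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    result = []
    for i in range(n):
        row_sum = sum(weights[i])
        # Spatial lag: weighted average of the neighbors' deviations.
        lag = (sum(w * zj for w, zj in zip(weights[i], z)) / row_sum
               if row_sum else 0.0)
        result.append(z[i] * lag / m2)
    return result

# Four made-up regions in a row; each is adjacent to its immediate neighbors.
rates = [10.0, 10.0, 0.0, 0.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = local_moran(rates, adjacency)
```

    Positive scores mark regions that resemble their neighbors (candidate clusters) and negative scores mark regions that differ from them (candidate outliers). Whether a score is significant depends on the null model, which is the question the paper addresses.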

  9. Current genetic methodologies in the identification of disaster victims and in forensic analysis.

    PubMed

    Ziętkiewicz, Ewa; Witt, Magdalena; Daca, Patrycja; Zebracka-Gala, Jadwiga; Goniewicz, Mariusz; Jarząb, Barbara; Witt, Michał

    2012-02-01

    This review presents the basic problems and currently available molecular techniques used for genetic profiling in disaster victim identification (DVI). The environmental conditions of a mass disaster often result in severe fragmentation, decomposition and intermixing of the remains of victims. In such cases, traditional identification based on the anthropological and physical characteristics of the victims is frequently inconclusive. This is the reason why DNA profiling became the gold standard for victim identification in mass-casualty incidents (MCIs) or any forensic cases where human remains are highly fragmented and/or degraded beyond recognition. The review provides general information about the sources of genetic material for DNA profiling, the genetic markers routinely used during genetic profiling (STR markers, mtDNA and single-nucleotide polymorphisms [SNP]) and the basic statistical approaches used in DNA-based disaster victim identification. Automated technological platforms that allow the simultaneous analysis of a multitude of genetic markers used in genetic identification (oligonucleotide microarray techniques and next-generation sequencing) are also presented. Forensic and population databases containing information on human variability, routinely used for statistical analyses, are discussed. The final part of this review is focused on recent developments, which offer particularly promising tools for forensic applications (mRNA analysis, transcriptome variation in individuals/populations and genetic profiling of specific cells separated from mixtures).

  10. Enhanced detection and visualization of anomalies in spectral imagery

    NASA Astrophysics Data System (ADS)

    Basener, William F.; Messinger, David W.

    2009-05-01

    Anomaly detection algorithms applied to hyperspectral imagery are able to reliably identify man-made objects from a natural environment based on statistical/geometric likelihood. The process is more robust than target identification, which requires precise prior knowledge of the object of interest, but has an inherently higher false alarm rate. Standard anomaly detection algorithms measure deviation of pixel spectra from a parametric model (either statistical or linear mixing) estimating the image background. The topological anomaly detector (TAD) creates a fully non-parametric, graph theory-based, topological model of the image background and measures deviation from this background using codensity. In this paper we present a large-scale comparative test of TAD against 80+ targets in four full HYDICE images using the entire canonical target set for generation of ROC curves. TAD will be compared against several statistics-based detectors including local RX and subspace RX. Even a perfect anomaly detection algorithm would have a high practical false alarm rate in most scenes simply because the user/analyst is not interested in every anomalous object. To assist the analyst in identifying and sorting objects of interest, we investigate coloring of the anomalies with principal components projections using statistics computed from the anomalies. This gives a very useful colorization of anomalies in which objects of similar material tend to have the same color, enabling an analyst to quickly sort and identify anomalies of highest interest.

  11. Framework for Uncertainty Assessment - Hanford Site-Wide Groundwater Flow and Transport Modeling

    NASA Astrophysics Data System (ADS)

    Bergeron, M. P.; Cole, C. R.; Murray, C. J.; Thorne, P. D.; Wurstner, S. K.

    2002-05-01

    Pacific Northwest National Laboratory is in the process of development and implementation of an uncertainty estimation methodology for use in future site assessments that addresses parameter uncertainty as well as uncertainties related to the groundwater conceptual model. The long-term goals of the effort are development and implementation of an uncertainty estimation methodology for use in future assessments and analyses being made with the Hanford site-wide groundwater model. The basic approach in the framework developed for uncertainty assessment consists of: 1) Alternate conceptual model (ACM) identification to identify and document the major features and assumptions of each conceptual model. The process must also include a periodic review of the existing and proposed new conceptual models as data or understanding become available. 2) ACM development of each identified conceptual model through inverse modeling with historical site data. 3) ACM evaluation to identify which of the conceptual models are plausible and should be included in any subsequent uncertainty assessments. 4) ACM uncertainty assessments will only be carried out for those ACMs determined to be plausible through comparison with historical observations and model structure identification measures. The parameter uncertainty assessment process generally involves: a) Model Complexity Optimization - to identify the important or relevant parameters for the uncertainty analysis; b) Characterization of Parameter Uncertainty - to develop the pdfs for the important uncertain parameters including identification of any correlations among parameters; c) Propagation of Uncertainty - to propagate parameter uncertainties (e.g., by first order second moment methods if applicable or by a Monte Carlo approach) through the model to determine the uncertainty in the model predictions of interest. 
5) Estimation of combined ACM and scenario uncertainty by a double sum with each component of the inner sum (an individual CCDF) representing parameter uncertainty associated with a particular scenario and ACM and the outer sum enumerating the various plausible ACM and scenario combinations in order to represent the combined estimate of uncertainty (a family of CCDFs). A final important part of the framework includes identification, enumeration, and documentation of all the assumptions, which include those made during conceptual model development, required by the mathematical model, required by the numerical model, made during the spatial and temporal discretization process, needed to assign the statistical model and associated parameters that describe the uncertainty in the relevant input parameters, and finally those assumptions required by the propagation method. Pacific Northwest National Laboratory is operated for the U.S. Department of Energy under Contract DE-AC06-76RL01830.
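
    Step 4c (Monte Carlo propagation) and the individual CCDFs of step 5 can be sketched for a single conceptual model and scenario as follows. This Python example is an illustration only: the travel-time model, the input distributions and every numeric value are hypothetical stand-ins and have no connection to the Hanford site-wide model. Repeating the procedure for each plausible ACM and scenario combination would give the family of CCDFs the abstract describes.

```python
import random

def propagate(model, samplers, n_draws, seed=0):
    """Draw each parameter from its sampler and evaluate the model n_draws times."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        params = {name: draw(rng) for name, draw in samplers.items()}
        outputs.append(model(**params))
    return outputs

def ccdf(outputs, threshold):
    """Empirical complementary CDF: fraction of outputs exceeding `threshold`."""
    return sum(1 for y in outputs if y > threshold) / len(outputs)

# Hypothetical toy model: advective travel time over a fixed path, using the
# seepage velocity v = K * gradient / porosity.
PATH_LENGTH_M = 1000.0
GRADIENT = 0.001

def travel_time_years(conductivity_m_per_day, porosity):
    velocity = conductivity_m_per_day * GRADIENT / porosity
    return PATH_LENGTH_M / velocity / 365.25

# Illustrative input pdfs; the distributions and their parameters are assumptions.
samplers = {
    "conductivity_m_per_day": lambda rng: rng.lognormvariate(3.0, 0.5),
    "porosity": lambda rng: rng.uniform(0.1, 0.3),
}

times = propagate(travel_time_years, samplers, n_draws=2000, seed=42)
curve = [(t, ccdf(times, t)) for t in (10, 20, 50, 100)]
```

    The sketch samples the two inputs independently. The abstract notes that correlations among parameters should be identified, and honoring them would require a joint sampler instead.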

  12. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE PAGES

    Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...

    2014-12-02

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  13. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe{sub 0.55}Se{sub 0.45} (T{sub c} = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe{sub 1−x}Se{sub x} structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  14. The Accuracy of Parameter Estimation in System Identification of Noisy Aircraft Load Measurement. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Kong, Jeffrey

    1994-01-01

    This thesis focuses on the subject of the accuracy of parameter estimation and system identification techniques. Motivated by a complicated load measurement from NASA Dryden Flight Research Center, advanced system identification techniques are needed. The objective of this problem is to accurately predict the load experienced by the aircraft wing structure during flight determined from a set of calibrated load and gage response relationships. We can then model the problem as a black box input-output system identification from which the system parameter has to be estimated. Traditional LS (Least Squares) techniques and the issues of noisy data and model accuracy are addressed. A statistical bound reflecting the change in residual is derived in order to understand the effects of the perturbations on the data. Due to the intrinsic nature of the LS problem, the LS solution faces the dilemma of the trade-off between model accuracy and noise sensitivity. A method of conflicting performance indices is presented, thus allowing us to improve the noise sensitivity while at the same time configuring the degradation of the model accuracy. SVD techniques for data reduction are studied and the equivalence of the Correspondence Analysis (CA) and Total Least Squares Criteria are proved. We also looked at nonlinear LS problems with the NASA F-111 data set as an example. Conventional methods are neither easily applicable nor suitable for the specific load problem since the exact model of the system is unknown. Neural Network (NN) does not require prior information on the model of the system. This robustness motivated us to apply the NN techniques on our load problem. Simulation results for the NN methods used in both the single load and the 'warning signal' problems are both useful and encouraging. The performance of the NN (for single load estimate) is better than the LS approach, whereas no conventional approach was tried for the 'warning signals' problems. 
The NN design methodology is also presented. SVD, CA and Collinearity Index methods are used to reduce the number of neurons in a layer.

  15. Corpus Approaches to Language Ideology

    ERIC Educational Resources Information Center

    Vessey, Rachelle

    2017-01-01

    This paper outlines how corpus linguistics--and more specifically the corpus-assisted discourse studies approach--can add useful dimensions to studies of language ideology. First, it is argued that the identification of words of high, low, and statistically significant frequency can help in the identification and exploration of language ideologies…

  16. 21 CFR 820.200 - Servicing.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... reports with appropriate statistical methodology in accordance with § 820.100. (c) Each manufacturer who... chapter shall automatically consider the report a complaint and shall process it in accordance with the... device serviced; (2) Any device identification(s) and control number(s) used; (3) The date of service; (4...

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kolker, Eugene

    Our project focused primarily on analysis of different types of data produced by global high-throughput technologies, data integration of gene annotation, and gene and protein expression information, as well as on getting a better functional annotation of Shewanella genes. Specifically, four of our numerous major activities and achievements include the development of: statistical models for identification and expression proteomics, superior to currently available approaches (including our own earlier ones); approaches to improve gene annotations on the whole-organism scale; standards for annotation, transcriptomics and proteomics approaches; and generalized approaches for data integration of gene annotation, gene and protein expression information.

  18. Model selection and parameter estimation in structural dynamics using approximate Bayesian computation

    NASA Astrophysics Data System (ADS)

    Ben Abdessalem, Anis; Dervilis, Nikolaos; Wagg, David; Worden, Keith

    2018-01-01

    This paper will introduce the use of the approximate Bayesian computation (ABC) algorithm for model selection and parameter estimation in structural dynamics. ABC is a likelihood-free method typically used when the likelihood function is either intractable or cannot be approached in a closed form. To circumvent the evaluation of the likelihood function, simulation from a forward model is at the core of the ABC algorithm. The algorithm offers the possibility to use different metrics and summary statistics representative of the data to carry out Bayesian inference. The efficacy of the algorithm in structural dynamics is demonstrated through three different illustrative examples of nonlinear system identification: cubic and cubic-quintic models, the Bouc-Wen model and the Duffing oscillator. The obtained results suggest that ABC is a promising alternative to deal with model selection and parameter estimation issues, specifically for systems with complex behaviours.
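    The likelihood-free idea described in this abstract can be illustrated with the simplest ABC variant, rejection sampling. The sketch below is not the authors' algorithm or their structural models: the toy Gaussian problem, the function names (abc_rejection, prior_sample, simulate, distance), the tolerance and the sample sizes are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, distance, tolerance, n_draws, rng):
    """Keep the prior draws whose simulated summary lies within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # candidate parameter drawn from the prior
        simulated = simulate(theta, rng)   # forward-model run; no likelihood is evaluated
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: infer the mean of a unit-variance Gaussian.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(50)]
observed_summary = statistics.fmean(data)  # summary statistic: the sample mean


def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)  # assumed flat prior on the mean


def simulate(theta, rng):
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(50)])


def distance(a, b):
    return abs(a - b)


posterior = abc_rejection(observed_summary, simulate, prior_sample, distance,
                          tolerance=0.1, n_draws=5000, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

    The accepted draws approximate the posterior; tightening the tolerance sharpens the approximation at the cost of fewer accepted samples.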

  19. Individual-based model for radiation risk assessment

    NASA Astrophysics Data System (ADS)

    Smirnova, O.

    A mathematical model is developed which enables one to predict the life span probability for mammals exposed to radiation. It relates statistical biometric functions with statistical and dynamic characteristics of an organism's critical system. To calculate the dynamics of the latter, the respective mathematical model is used too. This approach is applied to describe the effects of low level chronic irradiation on mice when the hematopoietic system (namely, thrombocytopoiesis) is the critical one. For identification of the joint model, experimental data on hematopoiesis in nonirradiated and irradiated mice, as well as on mortality dynamics of those in the absence of radiation are utilized. The life span probability and life span shortening predicted by the model agree with corresponding experimental data. Modeling results show the significance of accounting for the variability of the individual radiosensitivity of critical system cells when estimating the radiation risk. These findings are corroborated by clinical data on persons involved in the elimination of the Chernobyl catastrophe after-effects. All this makes it feasible to use the model for radiation risk assessments for cosmonauts and astronauts on long-term missions such as a voyage to Mars or a lunar colony. In this case the model coefficients have to be determined by making use of the available data for humans. Scenarios for the dynamics of dose accumulation during space flights should also be taken into account.

  20. Dynamics of essential collective motions in proteins: Theory

    NASA Astrophysics Data System (ADS)

    Stepanova, Maria

    2007-11-01

    A general theoretical background is introduced for characterization of conformational motions in protein molecules, and for building reduced coarse-grained models of proteins, based on the statistical analysis of their phase trajectories. Using the projection operator technique, a system of coupled generalized Langevin equations is derived for essential collective coordinates, which are generated by principal component analysis of molecular dynamic trajectories. The number of essential degrees of freedom is not limited in the theory. An explicit analytic relation is established between the generalized Langevin equation for essential collective coordinates and that for the all-atom phase trajectory projected onto the subspace of essential collective degrees of freedom. The theory introduced is applied to identify correlated dynamic domains in a macromolecule and to construct coarse-grained models representing the conformational motions in a protein through a few interacting domains embedded in a dissipative medium. A rigorous theoretical background is provided for identification of dynamic correlated domains in a macromolecule. Examples of domain identification in protein G are given and employed to interpret NMR experiments. Challenges and potential outcomes of the theory are discussed.

  1. A trauma-like model of political extremism: psycho-political fault lines in Israel.

    PubMed

    Laor, Nathaniel; Yanay-Shani, Alma; Wolmer, Leo; Khoury, Oula

    2010-10-01

    This study examines a trauma-like model of potentially violent political extremism among Jewish Israelis. We study the psychosocial characteristics of political extremists that may lie at the root of sociopolitical instability and assess personal (gender, stressful life events, Holocaust family background, and political activism) and psychological parameters (self- and political transcendence, perceived political threats, in/out-group identification ratio) that may predict readiness to engage in destructive political behavior. We examine the ideological zeal of various political groups, the relationship between the latter and perceived political threats, and the predictors of extreme political activism. Results showed that the extreme political poles displayed high levels of ideological and morbid transcendence. Right extremists displayed higher perceived threats to physical existence and national identity. Left extremists scored highest on perceived moral integrity threat. Higher perceived threats to national identity and moral integrity, risk, and self-transcendence statistically explain morbid transcendence. When fear conjures up extremely skewed sociopolitical identifications across political boundaries, morbid transcendence may manifest itself in destructive political activity. © 2010 Association for Research in Nervous and Mental Disease.

  2. Inductive reasoning 2.0.

    PubMed

    Hayes, Brett K; Heit, Evan

    2018-05-01

    Inductive reasoning entails using existing knowledge to make predictions about novel cases. The first part of this review summarizes key inductive phenomena and critically evaluates theories of induction. We highlight recent theoretical advances, with a special emphasis on the structured statistical approach, the importance of sampling assumptions in Bayesian models, and connectionist modeling. A number of new research directions in this field are identified including comparisons of inductive and deductive reasoning, the identification of common core processes in induction and memory tasks and induction involving category uncertainty. The implications of induction research for areas as diverse as complex decision-making and fear generalization are discussed. This article is categorized under: Psychology > Reasoning and Decision Making Psychology > Learning. © 2017 Wiley Periodicals, Inc.

  3. Ship speeds and sea ice forecasts - how are they related?

    NASA Astrophysics Data System (ADS)

    Loeptien, Ulrike; Axell, Lars

    2014-05-01

    The Baltic Sea is a shallow marginal sea, located in northern Europe. A seasonally occurring sea ice cover has the potential to hinder the intense ship traffic substantially. There are thus considerable efforts to fore- and nowcast ice conditions. Here we take a somewhat opposite approach and relate ship speeds, as observed via the Automatic Identification System (AIS) network, back to the prevailing sea ice conditions. We show that this information is useful to constrain fore- and nowcasts. More specifically we find, by fitting a statistical model (mixed effect model) for a test region in the Bothnian Bay, that the forecasted ice properties can explain 60-65% of the ship speed variations (based on 25-minute averages).

  4. Statistical analysis of whole-body absorption depending on anatomical human characteristics at a frequency of 2.1 GHz.

    PubMed

    Habachi, A El; Conil, E; Hadjem, A; Vazquez, E; Wong, M F; Gati, A; Fleury, G; Wiart, J

    2010-04-07

    In this paper, we propose identification of the morphological factors that may impact the whole-body averaged specific absorption rate (WBSAR). This study is conducted for the case of exposure to a front plane wave at a 2100 MHz frequency carrier. This study is based on the development of different regression models for estimating the WBSAR as a function of morphological factors. For this purpose, a database of 12 anatomical human models (phantoms) has been considered. Also, 18 supplementary phantoms obtained using the morphing technique were generated to build the required relation. This paper presents three models based on external morphological factors such as the body surface area, the body mass index or the body mass. These models show good results in estimating the WBSAR (<10%) for families obtained by the morphing technique, but these are still less accurate (30%) when applied to different original phantoms. This study stresses the importance of the internal morphological factors such as muscle and fat proportions in characterization of the WBSAR. The regression models are then improved using internal morphological factors with an estimation error of approximately 10% on the WBSAR. Finally, this study is suitable for establishing the statistical distribution of the WBSAR for a given population characterized by its morphology.

  5. Statistical analysis of whole-body absorption depending on anatomical human characteristics at a frequency of 2.1 GHz

    NASA Astrophysics Data System (ADS)

    El Habachi, A.; Conil, E.; Hadjem, A.; Vazquez, E.; Wong, M. F.; Gati, A.; Fleury, G.; Wiart, J.

    2010-04-01

    In this paper, we propose identification of the morphological factors that may impact the whole-body averaged specific absorption rate (WBSAR). This study is conducted for the case of exposure to a front plane wave at a 2100 MHz frequency carrier. This study is based on the development of different regression models for estimating the WBSAR as a function of morphological factors. For this purpose, a database of 12 anatomical human models (phantoms) has been considered. Also, 18 supplementary phantoms obtained using the morphing technique were generated to build the required relation. This paper presents three models based on external morphological factors such as the body surface area, the body mass index or the body mass. These models show good results in estimating the WBSAR (<10%) for families obtained by the morphing technique, but these are still less accurate (30%) when applied to different original phantoms. This study stresses the importance of the internal morphological factors such as muscle and fat proportions in characterization of the WBSAR. The regression models are then improved using internal morphological factors with an estimation error of approximately 10% on the WBSAR. Finally, this study is suitable for establishing the statistical distribution of the WBSAR for a given population characterized by its morphology.

  6. Stochastic system identification in structural dynamics

    USGS Publications Warehouse

    Safak, Erdal

    1988-01-01

    Recently, new identification methods have been developed by using the concept of optimal-recursive filtering and stochastic approximation. These methods, known as stochastic identification, are based on the statistical properties of the signal and noise, and do not require the assumptions of current methods. The criterion for stochastic system identification is that the difference between the recorded output and the output from the identified system (i.e., the residual of the identification) should be equal to white noise. In this paper, first a brief review of the theory is given. Then, an application of the method is presented by using ambient vibration data from a nine-story building.
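    The whiteness criterion stated in this abstract can be checked numerically. The abstract does not say which test the paper uses; the sketch below assumes the common sample-autocorrelation check with the approximate 95% band of ±1.96/√N, and the function names, lag count and synthetic signals are all hypothetical.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of sequence x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def fraction_within_band(residual, max_lag=10, z=1.96):
    """Fraction of lags 1..max_lag whose autocorrelation lies inside +/- z / sqrt(N)."""
    bound = z / len(residual) ** 0.5
    inside = sum(abs(autocorrelation(residual, k)) <= bound
                 for k in range(1, max_lag + 1))
    return inside / max_lag


rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # residual of a good fit (assumed)

# Coloured residual: a first-order autoregressive signal, as left by a poor fit (assumed).
coloured = [0.0]
for e in white[1:]:
    coloured.append(0.9 * coloured[-1] + e)

print(fraction_within_band(white), fraction_within_band(coloured))
```

    A residual that behaves like white noise keeps nearly all lags inside the band, while the correlated residual falls outside it at every short lag.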

  7. Identification of statistically independent climatic pattern in GRACE and hydrological model data over West-Africa

    NASA Astrophysics Data System (ADS)

    Kusche, J.; Forootan, E.; Eicker, A.; Hoffmann-Dobrev, H.

    2012-04-01

    West-African countries have been exposed to changes in rainfall patterns over the last decades, including a significant negative trend. This causes adverse effects on water resources, for instance reduced freshwater availability, and changes in the frequency, duration and magnitude of droughts and floods. Extracting the main patterns of water storage change in West Africa from remote sensing and linking them to climate variability, is therefore an essential step to understand the hydrological aspects of the region. In this study, the higher order statistical method of Independent Component Analysis (ICA) is employed to extract statistically independent water storage patterns from monthly Gravity Recovery And Climate Experiment (GRACE), from the WaterGAP Global Hydrology Model (WGHM) and from Tropical Rainfall Measuring Mission (TRMM) products over West Africa, for the period 2002-2012. Then, to reveal the influences of climatic teleconnections on the individual patterns, these results were correlated to the El Nino-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD) indices. To study the predictability of water storage changes, advanced statistical methods were applied on the main independent Sea Surface Temperature (SST) patterns over the Atlantic and Indian Oceans for the period 2002-2012 and the ICA results. Our results show a water storage decrease over the coastal regions of West Africa (including Sierra Leone, Liberia, Togo and Nigeria), associated with rainfall decrease. The comparison between GRACE estimations and WGHM results indicates some inconsistencies that underline the importance of forcing data for hydrological modeling of West Africa. Keywords: West Africa; GRACE-derived water storage; ICA; ENSO; IOD

  8. Attempting to physically explain space-time correlation of extremes

    NASA Astrophysics Data System (ADS)

    Bernardara, Pietro; Gailhard, Joel

    2010-05-01

    Spatial and temporal clustering of hydro-meteorological extreme events is well supported by scientific evidence. Moreover, the statistical parameters characterizing their local frequencies of occurrence show clear spatial patterns. Thus, in order to robustly assess the hydro-meteorological hazard, statistical models need to be able to take into account spatial and temporal dependencies. Statistical models considering long term correlation for quantifying and qualifying temporal and spatial dependencies are available, such as the multifractal approach. Furthermore, the development of regional frequency analysis techniques allows estimating the frequency of occurrence of extreme events taking into account spatial patterns in the behaviour of the extreme quantiles. However, in order to understand the origin of spatio-temporal clustering, an attempt should be made to find a physical explanation. Here, some statistical evidence of spatio-temporal correlation and spatial patterns of extreme behaviour is given on a large database of more than 400 rainfall and discharge series in France. In particular, the spatial distribution of multifractal and Generalized Pareto distribution parameters shows evident correlation patterns in the behaviour of frequency of occurrence of extremes. It is then shown that the identification of atmospheric circulation patterns (weather types) can physically explain the temporal clustering of extreme rainfall events (seasonality) and the spatial pattern of the frequency of occurrence. Moreover, by coupling this information with the hydrological modelling of a watershed (as in the Schadex approach), an explanation of the spatio-temporal distribution of extreme discharge can also be provided. We finally show that a hydro-meteorological approach (such as the Schadex approach) can explain and take into account space and time dependencies of hydro-meteorological extreme events.

  9. Opportunities for collaborative phenotyping for disease resistance traits in a large beef cattle resource population.

    PubMed

    Thallman, R M; Kuehn, L A; Allan, M F; Bennett, G L; Koohmaraie, M

    2008-01-01

    The Germplasm Evaluation (GPE) Project at the US Meat Animal Research Center (USMARC) is planned to produce about 3,000 calves per year in support of the following objectives: identification and validation of genetic polymorphisms related to economically relevant traits (ERT), estimation of breed and heterosis effects among 16 breeds for ERT, and estimation of genetic correlations among ERT and physiological indicator traits (PIT). Opportunities exist for collaboration in the development and collection of PIT phenotypes for disease resistance. Other areas of potential collaboration include detailed diagnosis (identification of disease causing organisms, etc.) of treated animals, collaborative development of epidemiological statistical models that would extract more information from the records of diagnoses and treatments, or pharmacogenetics. Concentrating a variety of different phenotypes and research approaches on the same population makes each component much more valuable than it would be individually.

  10. Logistic Regression in the Identification of Hazards in Construction

    NASA Astrophysics Data System (ADS)

    Drozd, Wojciech

    2017-10-01

    The construction site and its elements create circumstances that are conducive to the formation of risks to safety during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site, in order to indicate their importance in defining the circumstances of accidents at work. The study includes sites inspected in 2014 - 2016 by the employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling in the identification of hazards in construction and identifying those of the analysed characteristics that are important in an accident. In terms of methodology, resource data analysis using statistical classifiers, in the form of logistic regression, was the method used.

  11. (Multi-)strange hadron and light (anti-)nuclei production with ALICE at the LHC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lea, Ramona

    Thanks to its excellent tracking performance and particle identification capabilities, the ALICE detector allows for the identification of light (anti-)(hyper)nuclei and for the measurement of (multi-)strange particles over a wide range of transverse momentum. Deuterons, ³He and ⁴He and their corresponding anti-nuclei are identified via their specific energy loss in the Time Projection Chamber and the velocity measurement provided by the Time-Of-Flight detector. Strange and multi-strange baryons and mesons as well as (anti-)hypertritons are reconstructed via their topological decays. Detailed measurements of (multi-)strange hadron production in pp, p–Pb and Pb–Pb collisions and of light (anti-)nuclei and (anti-)hypertritons in Pb–Pb collisions with ALICE at the LHC are presented. The experimental results will be compared with the predictions of both statistical hadronization and coalescence models.

  12. Evaluation of a parallel implementation of the learning portion of the backward error propagation neural network: experiments in artifact identification.

    PubMed Central

    Sittig, D. F.; Orr, J. A.

    1991-01-01

    Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607
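    The timing figures quoted in this abstract can be cross-checked with the usual definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by processor count). The helper name parallel_metrics is hypothetical, and the inputs are the rounded values from the abstract, so the result only approximates the reported run-time.

```python
def parallel_metrics(serial_hours, speedup, n_processors):
    """Return (parallel run-time in hours, parallel efficiency) from a measured speedup."""
    parallel_hours = serial_hours / speedup  # speedup = serial time / parallel time
    efficiency = speedup / n_processors      # fraction of ideal linear scaling achieved
    return parallel_hours, efficiency


hours, eff = parallel_metrics(serial_hours=6.4, speedup=4.06, n_processors=6)
print(f"{hours:.2f} h, efficiency {eff:.2f}")  # prints "1.58 h, efficiency 0.68"
```

    The computed 1.58 hours is in line with the abstract's approximate figure of 1.5 hours, and the efficiency indicates that about two thirds of ideal six-processor scaling was reached.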

  13. Myth 3: A Family of Identification Myths--Your Sample Must Be the Same as the Population. There Is a "Silver Bullet" in Identification. There Must Be "Winners" and "Losers" in Identification and Programming

    ERIC Educational Resources Information Center

    Callahan, Carolyn M.

    2009-01-01

    The evolution of several interrelated myths reflects a combination of misinterpretation of statistics, the commendable intention of ensuring that bias and prejudice do not play roles in the provision of services to underrepresented populations of gifted students, and misapplication of programming options for gifted students. Separately, these…

  14. Crop identification technology assessment for remote sensing. (CITARS) Volume 9: Statistical analysis of results

    NASA Technical Reports Server (NTRS)

    Davis, B. J.; Feiveson, A. H.

    1975-01-01

    Results are presented of CITARS data processing in raw form. Tables of descriptive statistics are given along with descriptions and results of inferential analyses. The inferential results are organized by questions which CITARS was designed to answer.

  15. Identification of damage in composite structures using Gaussian mixture model-processed Lamb waves

    NASA Astrophysics Data System (ADS)

    Wang, Qiang; Ma, Shuxian; Yue, Dong

    2018-04-01

    Composite materials have comprehensively better properties than traditional materials, and therefore have been more and more widely used, especially because of their higher strength-weight ratio. However, the damage of composite structures is usually varied and complicated. In order to ensure the security of these structures, it is necessary to monitor and distinguish the structural damage in a timely manner. Lamb wave-based structural health monitoring (SHM) has been proved to be effective in online structural damage detection and evaluation; furthermore, the characteristic parameters of the multi-mode Lamb wave vary in response to different types of damage in the composite material. This paper studies the damage identification approach for composite structures using the Lamb wave and the Gaussian mixture model (GMM). The algorithm and principle of the GMM, and the parameter estimation, are introduced. Multi-statistical characteristic parameters of the excited Lamb waves are extracted, and the parameter space with reduced dimensions is adopted by principal component analysis (PCA). The damage identification system using the GMM is then established through training. Experiments on a glass fiber-reinforced epoxy composite laminate plate are conducted to verify the feasibility of the proposed approach in terms of damage classification. The experimental results show that different types of damage can be identified according to the value of the likelihood function of the GMM.

  16. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those occurring within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  17. Use of 16S rRNA gene for identification of a broad range of clinically relevant bacterial pathogens

    DOE PAGES

    Srinivasan, Ramya; Karaoz, Ulas; Volegova, Marina; ...

    2015-02-06

    According to World Health Organization statistics of 2011, infectious diseases remain in the top five causes of mortality worldwide. However, despite sophisticated research tools for microbial detection, rapid and accurate molecular diagnostics for identification of infection in humans have not been extensively adopted. Time-consuming culture-based methods remain to the forefront of clinical microbial detection. The 16S rRNA gene, a molecular marker for identification of bacterial species, is ubiquitous to members of this domain and, thanks to ever-expanding databases of sequence information, a useful tool for bacterial identification. In this study, we assembled an extensive repository of clinical isolates (n = 617), representing 30 medically important pathogenic species and originally identified using traditional culture-based or non-16S molecular methods. This strain repository was used to systematically evaluate the ability of 16S rRNA for species level identification. To enable the most accurate species level classification based on the paucity of sequence data accumulated in public databases, we built a Naïve Bayes classifier representing a diverse set of high-quality sequences from medically important bacterial organisms. We show that for species identification, a model-based approach is superior to an alignment based method. Overall, between 16S gene based and clinical identities, our study shows a genus-level concordance rate of 96% and a species-level concordance rate of 87.5%. We point to multiple cases of probable clinical misidentification with traditional culture based identification across a wide range of gram-negative rods and gram-positive cocci as well as common gram-negative cocci.

  18. Use of 16S rRNA Gene for Identification of a Broad Range of Clinically Relevant Bacterial Pathogens

    PubMed Central

    Srinivasan, Ramya; Karaoz, Ulas; Volegova, Marina; MacKichan, Joanna; Kato-Maeda, Midori; Miller, Steve; Nadarajan, Rohan; Brodie, Eoin L.; Lynch, Susan V.

    2015-01-01

    According to World Health Organization statistics of 2011, infectious diseases remain in the top five causes of mortality worldwide. However, despite sophisticated research tools for microbial detection, rapid and accurate molecular diagnostics for identification of infection in humans have not been extensively adopted. Time-consuming culture-based methods remain to the forefront of clinical microbial detection. The 16S rRNA gene, a molecular marker for identification of bacterial species, is ubiquitous to members of this domain and, thanks to ever-expanding databases of sequence information, a useful tool for bacterial identification. In this study, we assembled an extensive repository of clinical isolates (n = 617), representing 30 medically important pathogenic species and originally identified using traditional culture-based or non-16S molecular methods. This strain repository was used to systematically evaluate the ability of 16S rRNA for species level identification. To enable the most accurate species level classification based on the paucity of sequence data accumulated in public databases, we built a Naïve Bayes classifier representing a diverse set of high-quality sequences from medically important bacterial organisms. We show that for species identification, a model-based approach is superior to an alignment based method. Overall, between 16S gene based and clinical identities, our study shows a genus-level concordance rate of 96% and a species-level concordance rate of 87.5%. We point to multiple cases of probable clinical misidentification with traditional culture based identification across a wide range of gram-negative rods and gram-positive cocci as well as common gram-negative cocci. PMID:25658760

  19. Identification of SNPs associated with variola virus virulence.

    PubMed

    Hoen, Anne Gatewood; Gardner, Shea N; Moore, Jason H

    2013-02-14

    Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity.

  20. Identification of SNPs associated with variola virus virulence

    PubMed Central

    2013-01-01

    Background: Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Findings: Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. Conclusions: We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity. PMID:23410064

  1. Identification of structural damage using wavelet-based data classification

    NASA Astrophysics Data System (ADS)

    Koh, Bong-Hwan; Jeong, Min-Joong; Jung, Uk

    2008-03-01

    Predicted time-history responses from a finite-element (FE) model provide a baseline map where damage locations are clustered and classified by extracted damage-sensitive wavelet coefficients such as vertical energy threshold (VET) positions having large silhouette statistics. Likewise, the measured data from a damaged structure are also decomposed and rearranged according to the most dominant positions of wavelet coefficients. Having projected the coefficients to the baseline map, the true localization of damage can be identified by investigating the level of closeness between the measurement and predictions. The statistical confidence of the baseline map improves as the number of prediction cases increases. The simulation results of damage detection in a truss structure show that the approach proposed in this study can be successfully applied for locating structural damage even in the presence of a considerable amount of process and measurement noise.

  2. Systematic review of statistically-derived models of immunological response in HIV-infected adults on antiretroviral therapy in Sub-Saharan Africa.

    PubMed

    Sempa, Joseph B; Ujeneza, Eva L; Nieuwoudt, Martin

    2017-01-01

    In Sub-Saharan African (SSA) resource limited settings, Cluster of Differentiation 4 (CD4) counts continue to be used for clinical decision making in antiretroviral therapy (ART). Here, HIV-infected people often remain with CD4 counts <350 cells/μL even after 5 years of viral load suppression. Ongoing immunological monitoring is necessary. Due to varying statistical modeling methods, comparing immune response to ART across different cohorts is difficult. We systematically review such models and detail the similarities, differences and problems. 'Preferred Reporting Items for Systematic Review and Meta-Analyses' guidelines were used. Only studies of immune-response after ART initiation from SSA in adults were included. Data was extracted from each study and tabulated. Outcomes were categorized into 3 groups: 'slope', 'survival', and 'asymptote' models. Wordclouds were drawn wherein the frequency of variables occurring in the reviewed models is indicated by their size and color. 69 covariates were identified in the final models of 35 studies. Effect sizes of covariates were not directly quantitatively comparable in view of the combination of differing variables and scale transformation methods across models. Wordclouds enabled the identification of qualitative and semi-quantitative covariate sets for each outcome category. Comparison across categories identified sex, baseline age, baseline log viral load, baseline CD4, ART initiation regimen and ART duration as a minimal consensus set. Most models were different with respect to covariates included, variable transformations and scales, model assumptions, modelling strategies and reporting methods, even for the same outcomes. To enable comparison across cohorts, statistical models would benefit from the application of more uniform modelling techniques. Historic efforts have produced results that are anecdotal to individual cohorts only. This study was able to define 'prior' knowledge in the Bayesian sense. Such information has value for prospective modelling efforts.

  3. Searching for the elusive gift: advances in talent identification in sport.

    PubMed

    Mann, David L; Dehghansai, Nima; Baker, Joseph

    2017-08-01

    The incentives for sport organizations to identify talented athletes from a young age continue to grow, yet effective talent identification remains a challenging task. This opinion paper examines recent advances in talent identification, focusing in particular on the emergence of new approaches that may offer promise to identify talent (e.g., small-sided games, genetic testing, and advanced statistical analyses). We appraise new multi-disciplinary and large-scale population studies of talent identification, provide a consideration of the most recent psychological predictors of performance, examine the emergence of new approaches that strive to diminish biases in talent identification, and look at the rise in interest in talent identification in Paralympic sport. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.

  4. 21st Century Projections of High Streamflow Events in the UK and Germany

    NASA Astrophysics Data System (ADS)

    Cioffi, Francesco; Rosario Conticello, Federico; Lall, Upmanu; Merz, Bruno

    2017-04-01

    Radiative effects of anthropogenic changes in atmospheric composition are expected to enhance the hydrological cycle leading to more frequent and intense floods. To explore if there will be an increased risk of river flooding in the future, 21st century projections under global warming scenarios of High Streamflow Events (HSEs) for UK and German rivers are carried out, using a model that statistically relates large-scale atmospheric predictors - 850 hPa Geopotential Height (GPH850) and Integrated Water Vapor Transport (IVT) - to the occurrence of HSEs in one or simultaneously in several streamflow gauges. Here, HSE is defined as the streamflow exceeding the 99th percentile of daily flowrate time series measured at streamflow gauges. For the common period 1960-2012, historical data from 57 streamflow gauges in UK and 61 streamflow gauges in Germany, as well as reanalysis data of GPH850 and IVT fields, bounded from 90W to 70E and from 20N to 80N, are used. The link between GPH850 configurations and HSEs, and more precisely, identification of the GPH850 states potentially able to generate HSEs, is performed by a combined Kohonen Networks (Self Organized Map, SOM) and Event Synchronization approach. Complex network and modularity methods are used to cluster streamflow gauges that share common GPH850 configurations. Then a model based on a conditional Poisson distribution, in which the parameter of the Poisson distribution is assumed to be a nonlinear function of GPH850 and IVT, allows for the identification of the GPH850 state and the IVT threshold beyond which the probability of an HSE is highest. Using that model, projections of 21st century changes in the frequency of HSE occurrence in UK and Germany are estimated using the simulated fields of GPH850 and IVT from selected GCMs belonging to the Coupled Model Inter-comparison Project Phase 5 (CMIP5). Among the different GCMs, only those whose retrospective predictor fields have statistics consistent with the corresponding reanalysis data are selected.
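    The HSE definition in the abstract above (daily streamflow exceeding the 99th percentile of the gauge's record) can be sketched in a few lines. This is only an illustration of the thresholding step: the function name and the synthetic log-normal flow series are assumptions, and the conditional Poisson model and SOM steps are not reproduced.

```python
import numpy as np


def high_streamflow_events(daily_flow, percentile=99.0):
    """Return the percentile threshold and a boolean mask of exceedance days."""
    flow = np.asarray(daily_flow, dtype=float)
    threshold = np.percentile(flow, percentile)
    return threshold, flow > threshold


# Synthetic daily flow record (made-up data, not from any gauge).
rng = np.random.default_rng(0)
synthetic_flow = rng.lognormal(mean=3.0, sigma=0.8, size=10_000)

threshold, is_hse = high_streamflow_events(synthetic_flow)
n_events = int(is_hse.sum())  # about 1% of the days by construction
```

    By construction roughly 1% of days are flagged at each gauge, so differences between gauges or periods show up in when the events occur and how they cluster, not in how many there are over the reference record.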

  5. Attaining insight into interactions between hydrologic model parameters and geophysical attributes for national-scale model parameter estimation

    NASA Astrophysics Data System (ADS)

    Mizukami, N.; Clark, M. P.; Newman, A. J.; Wood, A.; Gutmann, E. D.

    2017-12-01

    Estimating spatially distributed model parameters is a grand challenge for large domain hydrologic modeling, especially in the context of hydrologic model applications such as streamflow forecasting. Multi-scale Parameter Regionalization (MPR) is a promising technique that accounts for the effects of fine-scale geophysical attributes (e.g., soil texture, land cover, topography, climate) on model parameters and nonlinear scaling effects on model parameters. MPR computes model parameters with transfer functions (TFs) that relate geophysical attributes to model parameters at the native input data resolution and then scales them using scaling functions to the spatial resolution of the model implementation. One of the biggest challenges in the use of MPR is identification of TFs for each model parameter: both functional forms and geophysical predictors. TFs used to estimate the parameters of hydrologic models typically rely on previous studies or were derived in an ad-hoc, heuristic manner, potentially not utilizing the maximum information content contained in the geophysical attributes for optimal parameter identification. Thus, it is necessary to first uncover relationships among geophysical attributes, model parameters, and hydrologic processes (i.e., hydrologic signatures) to obtain insight into which geophysical attributes are related to model parameters, and to what extent. We perform multivariate statistical analysis on a large-sample catchment data set including various geophysical attributes as well as constrained VIC model parameters at 671 unimpaired basins over the CONUS. We first calibrate the VIC model at each catchment to obtain constrained parameter sets. Additionally, parameter sets sampled during the calibration process are used for sensitivity analysis using various hydrologic signatures as objectives to understand the relationships among geophysical attributes, parameters, and hydrologic processes.

  6. A New Zero-Inflated Negative Binomial Methodology for Latent Category Identification

    ERIC Educational Resources Information Center

    Blanchard, Simon J.; DeSarbo, Wayne S.

    2013-01-01

    We introduce a new statistical procedure for the identification of unobserved categories that vary between individuals and in which objects may span multiple categories. This procedure can be used to analyze data from a proposed sorting task in which individuals may simultaneously assign objects to multiple piles. The results of a synthetic…

  7. Improved method for reliable HMW-GS identification by RP-HPLC and SDS-PAGE in common wheat cultivars

    USDA-ARS's Scientific Manuscript database

    The accurate identification of alleles for high-molecular weight glutenins (HMW-GS) is critical for wheat breeding programs targeting end-use quality. RP-HPLC methods were optimized for separation of HMW-GS, resulting in enhanced resolution of 1By and 1Dx subunits. Statistically significant differe...

  8. Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment

    PubMed Central

    Dasari, Surendra; Chambers, Matthew C.; Martinez, Misti A.; Carpenter, Kristin L.; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo J.; Tabb, David L.

    2012-01-01

    Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines. PMID:22217208
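    To make the dot product score mentioned in the abstract above concrete, here is a minimal sketch of a normalized dot product between two binned MS/MS spectra. The bin width, the m/z range, the function names, and the peak lists are assumptions chosen for illustration; this is not the scoring code of Pepitome or SpectraST.

```python
import numpy as np


def bin_spectrum(mz, intensity, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins."""
    n_bins = int(max_mz / bin_width) + 1
    vec = np.zeros(n_bins)
    idx = (np.asarray(mz, dtype=float) / bin_width).astype(int)
    keep = idx < n_bins  # drop peaks beyond the binned range
    np.add.at(vec, idx[keep], np.asarray(intensity, dtype=float)[keep])
    return vec


def dot_product_score(spec_a, spec_b, bin_width=1.0):
    """Cosine-style similarity in [0, 1] between two (mz, intensity) spectra."""
    a = bin_spectrum(*spec_a, bin_width=bin_width)
    b = bin_spectrum(*spec_b, bin_width=bin_width)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0


# Made-up peak lists: (m/z values, intensities).
query = ([100.1, 200.2, 300.3], [10.0, 50.0, 20.0])
library = ([100.4, 200.9, 450.0], [12.0, 40.0, 5.0])

score = dot_product_score(query, library)
self_score = dot_product_score(query, query)
```

    In this toy example the peaks at m/z 200.2 and 200.9 land in the same bin and count as a perfect match, which illustrates the abstract's point that the plain dot product ignores fragment ion m/z discrepancies.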

  9. [Road Extraction in Remote Sensing Images Based on Spectral and Edge Analysis].

    PubMed

    Zhao, Wen-zhi; Luo, Li-qun; Guo, Zhou; Yue, Jun; Yu, Xue-ying; Liu, Hui; Wei, Jing

    2015-10-01

    Roads are typical man-made objects in urban areas. Road extraction from high-resolution images has important applications for urban planning and transportation development. However, due to the confusion of spectral characteristics, it is difficult to distinguish roads from other objects by merely using traditional classification methods that mainly depend on spectral information. Edge is an important feature for the identification of linear objects (e.g., roads). The distribution patterns of edges vary greatly among different objects. It is crucial to merge edge statistical information with spectral information. In this study, a new method that combines spectral information and edge statistical features is proposed. First, edge detection is conducted by using a self-adaptive mean-shift algorithm on the panchromatic band, which can greatly reduce pseudo-edges and noise effects. Then, edge statistical features are obtained from the edge statistical model, which measures the length and angle distribution of edges. Finally, by integrating the spectral and edge statistical features, an SVM algorithm is used to classify the image and roads are ultimately extracted. A series of experiments are conducted and the results show that the overall accuracy of the proposed method is 93%, compared with only 78% for the traditional method. The results demonstrate that the proposed method is efficient and valuable for road extraction, especially on high-resolution images.

  10. Methods and application of system identification in shock and vibration.

    NASA Technical Reports Server (NTRS)

    Collins, J. D.; Young, J. P.; Kiefling, L.

    1972-01-01

    A logical picture is presented of current useful system identification techniques in the shock and vibration field. A technology tree diagram is developed for the purpose of organizing and categorizing the widely varying approaches according to the fundamental nature of each. Specific examples of accomplished activity for each identification category are noted and discussed. To provide greater insight into the most current trends in the system identification field, a somewhat detailed description is presented of the essential features of a recently developed technique that is based on making the maximum use of all statistically known information about a system.

  11. Apparatus and method for identification and recognition of an item with ultrasonic patterns from item subsurface micro-features

    DOEpatents

    Perkins, Richard W.; Fuller, James L.; Doctor, Steven R.; Good, Morris S.; Heasler, Patrick G.; Skorpik, James R.; Hansen, Norman H.

    1995-01-01

    The present invention is a means and method for identification and recognition of an item by ultrasonic imaging of material microfeatures and/or macrofeatures within the bulk volume of a material. The invention is based upon ultrasonic interrogation and imaging of material microfeatures within the body of material by accepting only reflected ultrasonic energy from a preselected plane or volume within the material. An initial interrogation produces an identification reference. Subsequent new scans are statistically compared to the identification reference for making a match/non-match decision.

  12. Apparatus and method for identification and recognition of an item with ultrasonic patterns from item subsurface micro-features

    DOEpatents

    Perkins, R.W.; Fuller, J.L.; Doctor, S.R.; Good, M.S.; Heasler, P.G.; Skorpik, J.R.; Hansen, N.H.

    1995-09-26

    The present invention is a means and method for identification and recognition of an item by ultrasonic imaging of material microfeatures and/or macrofeatures within the bulk volume of a material. The invention is based upon ultrasonic interrogation and imaging of material microfeatures within the body of material by accepting only reflected ultrasonic energy from a preselected plane or volume within the material. An initial interrogation produces an identification reference. Subsequent new scans are statistically compared to the identification reference for making a match/non-match decision. 15 figs.

  13. [Projection of prisoner numbers].

    PubMed

    Metz, Rainer; Sohn, Werner

    2015-01-01

    The past and future development of occupancy rates in prisons is of crucial importance for the judicial administration of every country. Basic factors for planning the required penal facilities are seasonal fluctuations, minimum, maximum and average occupancy as well as the present situation and potential development of certain imprisonment categories. As the prisoner number of a country is determined by a complex set of interdependent conditions, it has turned out to be difficult to provide any theoretical explanations. The idea accepted in criminology for a long time that prisoner numbers are interdependent with criminal policy must be regarded as having failed. Statistical and time series analyses may help, however, to identify the factors having influenced the development of prisoner numbers in the past. The analyses presented here first describe such influencing factors from a criminological perspective and then deal with their statistical identification and modelling. Using the development of prisoner numbers in Hesse as an example, it has been found that modelling methods in which the independent variables predict the dependent variable with a time lag are particularly helpful. A potential complication is, however, that for predicting the number of prisoners the different dynamics in German and foreign prisoners require the development of further models.

  14. Enhancing the Biological Relevance of Secretome-Based Proteomics by Linking Tumor Cell Proliferation and Protein Secretion.

    PubMed

    Gregori, Josep; Méndez, Olga; Katsila, Theodora; Pujals, Mireia; Salvans, Cándida; Villarreal, Laura; Arribas, Joaquin; Tabernero, Josep; Sánchez, Alex; Villanueva, Josep

    2014-07-15

    Secretome profiling has become a methodology of choice for the identification of tumor biomarkers. We hypothesized that due to the dynamic nature of secretomes cellular perturbations could affect their composition but also change the global amount of protein secreted per cell. We confirmed our hypothesis by measuring the levels of secreted proteins taking into account the amount of proteome produced per cell. Then, we established a correlation between cell proliferation and protein secretion that explained the observed changes in global protein secretion. Next, we implemented a normalization correcting the statistical results of secretome studies by the global protein secretion of cells into a generalized linear model (GLM). The application of the normalization to two biological perturbations on tumor cells resulted in drastic changes in the list of statistically significant proteins. Furthermore, we found that known epithelial-to-mesenchymal transition (EMT) effectors were only statistically significant when the normalization was applied. Therefore, the normalization proposed here increases the sensitivity of statistical tests by increasing the number of true-positives. From an oncology perspective, the correlation between protein secretion and cellular proliferation suggests that slow-growing tumors could have high-protein secretion rates and consequently contribute strongly to tumor paracrine signaling.

  15. Adaptive filtering in biological signal processing.

    PubMed

    Iyer, V K; Ploysongsang, Y; Ramamoorthy, P A

    1990-01-01

    The high dependence of conventional optimal filtering methods on a priori knowledge of the signal and noise statistics renders them ineffective in dealing with signals whose statistics cannot be predetermined accurately. Adaptive filtering methods offer a better alternative, since a priori knowledge of the statistics is less critical, real-time processing is possible, and the computations are less expensive for this approach. Adaptive filtering methods compute the filter coefficients "on-line", converging to the optimal values in the least-mean square (LMS) error sense. Adaptive filtering is therefore apt for dealing with the "unknown" statistics situation and has been applied extensively in areas like communication, speech, radar, sonar, seismology, and biological signal processing and analysis for channel equalization, interference and echo canceling, line enhancement, signal detection, system identification, spectral analysis, beamforming, modeling, control, etc. In this review article adaptive filtering in the context of biological signals is reviewed. An intuitive approach to the underlying theory of adaptive filters and its applicability are presented. Applications of the principles in biological signal processing are discussed in a manner that brings out the key ideas involved. Current and potential future directions in adaptive biological signal processing are also discussed.
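    The "on-line" coefficient update described in the abstract above can be illustrated with the standard LMS recursion, w <- w + mu * e[n] * u[n], applied here to a system identification toy problem. The tap count, step size, synthetic input, and the "unknown" FIR system are assumptions for illustration and are not taken from the review.

```python
import numpy as np


def lms_filter(x, d, n_taps=4, mu=0.01):
    """Adapt FIR weights so the filtered input x tracks the desired signal d."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(n_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]  # input window, most recent sample first
        y[n] = w @ u                        # current filter output
        e[n] = d[n] - y[n]                  # instantaneous error
        w = w + mu * e[n] * u               # LMS weight update
    return w, y, e


# Hypothetical unknown system: a 4-tap FIR filter driven by white noise.
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
true_w = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, true_w)[:len(x)]

w_est, _, err = lms_filter(x, d, n_taps=4, mu=0.01)
```

    No signal or noise statistics are supplied in advance; the weights are learned sample by sample. The step size mu trades convergence speed against stability and steady-state error, and in practice it has to be chosen relative to the input power.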

  16. High-accuracy user identification using EEG biometrics.

    PubMed

    Koike-Akino, Toshiaki; Mahajan, Ruhi; Marks, Tim K; Ye Wang; Watanabe, Shinji; Tuzel, Oncel; Orlik, Philip

    2016-08-01

    We analyze brain waves acquired through a consumer-grade EEG device to investigate its capabilities for user identification and authentication. First, we show the statistical significance of the P300 component in event-related potential (ERP) data from 14-channel EEGs across 25 subjects. We then apply a variety of machine learning techniques, comparing the user identification performance of various different combinations of a dimensionality reduction technique followed by a classification algorithm. Experimental results show that an identification accuracy of 72% can be achieved using only a single 800 ms ERP epoch. In addition, we demonstrate that the user identification accuracy can be significantly improved to more than 96.7% by joint classification of multiple epochs.

  17. Geometry of behavioral spaces: A computational approach to analysis and understanding of agent based models and agent behaviors

    NASA Astrophysics Data System (ADS)

    Cenek, Martin; Dahl, Spencer K.

    2016-11-01

    Systems with non-linear dynamics frequently exhibit emergent system behavior, which is important to find and specify rigorously to understand the nature of the modeled phenomena. Through this analysis, it is possible to characterize phenomena such as how systems assemble or dissipate and what behaviors lead to specific final system configurations. Agent Based Modeling (ABM) is one of the modeling techniques used to study the interaction dynamics between a system's agents and its environment. Although the methodology of ABM construction is well understood and practiced, there are no computational, statistically rigorous, comprehensive tools to evaluate an ABM's execution. Often, a human has to observe an ABM's execution in order to analyze how the ABM functions, identify the emergent processes in the agent's behavior, or study a parameter's effect on the system-wide behavior. This paper introduces a new statistically based framework to automatically analyze agents' behavior, identify common system-wide patterns, and record the probability of agents changing their behavior from one pattern of behavior to another. We use network based techniques to analyze the landscape of common behaviors in an ABM's execution. Finally, we test the proposed framework with a series of experiments featuring increasingly emergent behavior. The proposed framework will allow computational comparison of ABM executions, exploration of a model's parameter configuration space, and identification of the behavioral building blocks in a model's dynamics.

  18. Geometry of behavioral spaces: A computational approach to analysis and understanding of agent based models and agent behaviors.

    PubMed

    Cenek, Martin; Dahl, Spencer K

    2016-11-01

    Systems with non-linear dynamics frequently exhibit emergent system behavior, which is important to find and specify rigorously to understand the nature of the modeled phenomena. Through this analysis, it is possible to characterize phenomena such as how systems assemble or dissipate and what behaviors lead to specific final system configurations. Agent Based Modeling (ABM) is one of the modeling techniques used to study the interaction dynamics between a system's agents and its environment. Although the methodology of ABM construction is well understood and practiced, there are no computational, statistically rigorous, comprehensive tools to evaluate an ABM's execution. Often, a human has to observe an ABM's execution in order to analyze how the ABM functions, identify the emergent processes in the agent's behavior, or study a parameter's effect on the system-wide behavior. This paper introduces a new statistically based framework to automatically analyze agents' behavior, identify common system-wide patterns, and record the probability of agents changing their behavior from one pattern of behavior to another. We use network based techniques to analyze the landscape of common behaviors in an ABM's execution. Finally, we test the proposed framework with a series of experiments featuring increasingly emergent behavior. The proposed framework will allow computational comparison of ABM executions, exploration of a model's parameter configuration space, and identification of the behavioral building blocks in a model's dynamics.

  19. A statistical model for monitoring shell disease in inshore lobster fisheries: A case study in Long Island Sound

    PubMed Central

    Chen, Yong

    2017-01-01

    The expansion of shell disease is an emerging threat to the inshore lobster fisheries in the northeastern United States. The development of models to improve the efficiency and precision of existing monitoring programs is advocated as an important step in mitigating its harmful effects. The objective of this study is to construct a statistical model that could enhance the existing monitoring effort through (1) identification of potential disease-associated abiotic and biotic factors, and (2) estimation of spatial variation in disease prevalence in the lobster fishery. A delta-generalized additive modeling (GAM) approach was applied using bottom trawl survey data collected from 2001–2013 in Long Island Sound, a tidal estuary between New York and Connecticut states. Spatial distribution of shell disease prevalence was found to be strongly influenced by the interactive effects of latitude and longitude, possibly indicative of a geographic origin of shell disease. Bottom temperature, bottom salinity, and depth were also important factors affecting the spatial variability in shell disease prevalence. The delta-GAM projected high disease prevalence in non-surveyed locations. Additionally, a potential spatial discrepancy was found between modeled disease hotspots and survey-based gravity centers of disease prevalence. This study provides a modeling framework to enhance research, monitoring and management of emerging and continuing marine disease threats. PMID:28196150

  20. Identification of Nasal Bone Fractures on Conventional Radiography and Facial CT: Comparison of the Diagnostic Accuracy in Different Imaging Modalities and Analysis of Interobserver Reliability

    PubMed Central

    Baek, Hye Jin; Kim, Dong Wook; Ryu, Ji Hwa; Lee, Yoo Jin

    2013-01-01

    Background: There has been no study to compare the diagnostic accuracy of an experienced radiologist with a trainee in nasal bone fracture. Objectives: To compare the diagnostic accuracy between conventional radiography and computed tomography (CT) for the identification of nasal bone fractures and to evaluate the interobserver reliability between a staff radiologist and a trainee. Patients and Methods: A total of 108 patients who underwent conventional radiography and CT after acute nasal trauma were included in this retrospective study. Two readers, a staff radiologist and a second-year resident, independently assessed the results of the imaging studies. Results: Of the 108 patients, the presence of a nasal bone fracture was confirmed in 88 (81.5%) patients. The number of non-depressed fractures was higher than the number of depressed fractures. In nine (10.2%) patients, nasal bone fractures were only identified on conventional radiography, including three depressed and six non-depressed fractures. CT was more accurate as compared to conventional radiography for the identification of nasal bone fractures as determined by both readers (P < 0.05), all diagnostic indices of an experienced radiologist were similar to or higher than those of a trainee, and κ statistics showed moderate agreement between the two diagnostic tools for both readers. There was no statistical difference in the assessment of interobserver reliability for both imaging modalities in the identification of nasal bone fractures. Conclusion: For the identification of nasal bone fractures, CT was significantly superior to conventional radiography. Although a staff radiologist showed better values in the identification of nasal bone fracture and differentiation between depressed and non-depressed fractures than a trainee, there was no statistically significant difference in the interpretation of conventional radiography and CT between a radiologist and a trainee. PMID:24348599
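    The abstract above reports κ statistics without naming the variant. Assuming the usual two-rater Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance, a minimal sketch looks like this. The function name and the ten binary readings are made up for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases where the raters give the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up readings: 1 = fracture reported, 0 = no fracture reported.
reader_a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
reader_b = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]

kappa = cohens_kappa(reader_a, reader_b)
```

    Here the readers agree on 8 of 10 cases (p_o = 0.8) while chance agreement is p_e = 0.52, giving a kappa of about 0.58, well below the raw 80% agreement.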

  1. Mapping model behaviour using Self-Organizing Maps

    NASA Astrophysics Data System (ADS)

    Herbst, M.; Gupta, H. V.; Casper, M. C.

    2009-03-01

    Hydrological model evaluation and identification essentially involves extracting and processing information from model time series. However, the type of information extracted by statistical measures has only very limited meaning because it does not relate to the hydrological context of the data. To overcome this inadequacy we exploit the diagnostic evaluation concept of Signature Indices, in which model performance is measured using theoretically relevant characteristics of system behaviour. In our study, a Self-Organizing Map (SOM) is used to process the Signatures extracted from Monte-Carlo simulations generated by the distributed conceptual watershed model NASIM. The SOM creates a hydrologically interpretable mapping of overall model behaviour, which immediately reveals deficits and trade-offs in the ability of the model to represent the different functional behaviours of the watershed. Further, it facilitates interpretation of the hydrological functions of the model parameters and provides preliminary information regarding their sensitivities. Most notably, we use this mapping to identify the set of model realizations (among the Monte-Carlo data) that most closely approximate the observed discharge time series in terms of the hydrologically relevant characteristics, and to confine the parameter space accordingly. Our results suggest that Signature Index based SOMs could potentially serve as tools for decision makers inasmuch as model realizations with specific Signature properties can be selected according to the purpose of the model application. Moreover, given that the approach helps to represent and analyze multi-dimensional distributions, it could be used to form the basis of an optimization framework that uses SOMs to characterize the model performance response surface. As such it provides a powerful and useful way to conduct model identification and model uncertainty analyses.

  2. Mapping model behaviour using Self-Organizing Maps

    NASA Astrophysics Data System (ADS)

    Herbst, M.; Gupta, H. V.; Casper, M. C.

    2008-12-01

    Hydrological model evaluation and identification essentially depends on the extraction of information from model time series and its processing. However, the type of information extracted by statistical measures has only very limited meaning because it does not relate to the hydrological context of the data. To overcome this inadequacy we exploit the diagnostic evaluation concept of Signature Indices, in which model performance is measured using theoretically relevant characteristics of system behaviour. In our study, a Self-Organizing Map (SOM) is used to process the Signatures extracted from Monte-Carlo simulations generated by a distributed conceptual watershed model. The SOM creates a hydrologically interpretable mapping of overall model behaviour, which immediately reveals deficits and trade-offs in the ability of the model to represent the different functional behaviours of the watershed. Further, it facilitates interpretation of the hydrological functions of the model parameters and provides preliminary information regarding their sensitivities. Most notably, we use this mapping to identify the set of model realizations (among the Monte-Carlo data) that most closely approximate the observed discharge time series in terms of the hydrologically relevant characteristics, and to confine the parameter space accordingly. Our results suggest that Signature Index based SOMs could potentially serve as tools for decision makers inasmuch as model realizations with specific Signature properties can be selected according to the purpose of the model application. Moreover, given that the approach helps to represent and analyze multi-dimensional distributions, it could be used to form the basis of an optimization framework that uses SOMs to characterize the model performance response surface. As such it provides a powerful and useful way to conduct model identification and model uncertainty analyses.

  3. Search-based model identification of smart-structure damage

    NASA Technical Reports Server (NTRS)

    Glass, B. J.; Macalou, A.

    1991-01-01

    This paper describes the use of a combined model and parameter identification approach, based on modal analysis and artificial intelligence (AI) techniques, for identifying damage or flaws in a rotating truss structure incorporating embedded piezoceramic sensors. This smart structure example is representative of a class of structures commonly found in aerospace systems and next generation space structures. Artificial intelligence techniques of classification, heuristic search, and an object-oriented knowledge base are used in an AI-based model identification approach. A finite model space is classified into a search tree, over which a variant of best-first search is used to identify the model whose stored response most closely matches that of the input. Newly encountered models can be incorporated into the model space. This adaptiveness demonstrates the potential for learning control. Following this output-error model identification, numerical parameter identification is used to further refine the identified model. Given the rotating truss example in this paper, noisy data corresponding to various damage configurations are input to both this approach and a conventional parameter identification method. The combination of the AI-based model identification with parameter identification is shown to lead to smaller parameter corrections than required by the use of parameter identification alone.

  4. Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis.

    PubMed

    Rigaill, Guillem; Balzergue, Sandrine; Brunaud, Véronique; Blondet, Eddy; Rau, Andrea; Rogier, Odile; Caius, José; Maugis-Rabusseau, Cathy; Soubigou-Taconnat, Ludivine; Aubourg, Sébastien; Lurin, Claire; Martin-Magniette, Marie-Laure; Delannoy, Etienne

    2018-01-01

    Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models. Our results show the relevance of a proper modeling of the mean by using linear or generalized linear modeling. Once the mean is properly modeled, the impact of the other parameters on the performance of the test is much less important. Finally, we propose to use the simple visualization of the raw P-value histogram as a practical evaluation criterion of the performance of differential analysis methods on real data sets. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  5. Field-theoretic approach to fluctuation effects in neural networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buice, Michael A.; Cowan, Jack D.; Mathematics Department, University of Chicago, Chicago, Illinois 60637

    A well-defined stochastic theory for neural activity, which permits the calculation of arbitrary statistical moments and equations governing them, is a potentially valuable tool for theoretical neuroscience. We produce such a theory by analyzing the dynamics of neural activity using field theoretic methods for nonequilibrium statistical processes. Assuming that neural network activity is Markovian, we construct the effective spike model, which describes both neural fluctuations and response. This analysis leads to a systematic expansion of corrections to mean field theory, which for the effective spike model is a simple version of the Wilson-Cowan equation. We argue that neural activity governed by this model exhibits a dynamical phase transition which is in the universality class of directed percolation. More general models (which may incorporate refractoriness) can exhibit other universality classes, such as dynamic isotropic percolation. Because of the extremely high connectivity in typical networks, it is expected that higher-order terms in the systematic expansion are small for experimentally accessible measurements, and thus, consistent with measurements in neocortical slice preparations, we expect mean field exponents for the transition. We provide a quantitative criterion for the relative magnitude of each term in the systematic expansion, analogous to the Ginsburg criterion. Experimental identification of dynamic universality classes in vivo is an outstanding and important question for neuroscience.

  6. [Study on the automatic parameters identification of water pipe network model].

    PubMed

    Jia, Hai-Feng; Zhao, Qi-Feng

    2010-01-01

    An analysis of the problems encountered in developing and applying water pipe network models shows that automatic identification of model parameters is a key bottleneck for the use of such models in water supply enterprises. A methodology for automatic parameter identification of water pipe network models, based on GIS and SCADA databases, is proposed. The core algorithm is then studied: RSA (Regionalized Sensitivity Analysis) is used to recognize sensitive parameters automatically, and MCS (Monte-Carlo Sampling) is used to identify the parameters automatically. The detailed technical route based on RSA and MCS is presented, and a module for automatic parameter identification of water pipe network models is developed. Finally, a typical water pipe network is selected as a case study, and satisfactory results are achieved.
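
    The RSA and MCS steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model, the parameter names a and b, the sampling ranges and the tolerance are all assumptions made for illustration. Parameters are sampled uniformly (Monte-Carlo Sampling), each run is classed as behavioural or non-behavioural by its error against a synthetic observation, and the Kolmogorov-Smirnov distance between the two groups is used as the sensitivity measure, as in the usual formulation of Regionalized Sensitivity Analysis.

```python
import random


def toy_model(a, b, x):
    # Hypothetical stand-in for a pipe network simulation.
    # By construction the output depends on a only, so b is insensitive.
    return a * x


def ks_distance(xs, ys):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def regionalized_sensitivity(n_samples=2000, tol=2.0, seed=0):
    rng = random.Random(seed)
    x = 10.0
    observed = toy_model(1.0, 0.0, x)  # synthetic observation, true a = 1.0
    behavioural, non_behavioural = [], []
    for _ in range(n_samples):
        # Monte-Carlo sampling from assumed uniform prior ranges.
        a, b = rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)
        error = abs(toy_model(a, b, x) - observed)
        (behavioural if error < tol else non_behavioural).append((a, b))
    ks = {
        "a": ks_distance([p[0] for p in behavioural],
                         [p[0] for p in non_behavioural]),
        "b": ks_distance([p[1] for p in behavioural],
                         [p[1] for p in non_behavioural]),
    }
    return ks, behavioural


ks, behavioural = regionalized_sensitivity()
print(ks)                # a large distance marks a sensitive parameter
print(len(behavioural))  # behavioural runs are the candidate parameter sets
```

    In this sketch the behavioural runs play the role of the identified parameter sets, and only parameters with a large distance would be carried forward for identification.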

  7. DNA analysis in Disaster Victim Identification.

    PubMed

    Montelius, Kerstin; Lindblom, Bertil

    2012-06-01

    DNA profiling and matching is one of the primary methods to identify missing persons in a disaster, as defined by the Interpol Disaster Victim Identification Guide. The process to identify a victim by DNA includes: the collection of the best possible ante-mortem (AM) samples, the choice of post-mortem (PM) samples, DNA-analysis, matching and statistical weighting of the genetic relationship or match. Each disaster has its own scenario, and each scenario defines its own methods for identification of the deceased.

  8. Identification of a Class of Filtered Poisson Processes.

    DTIC Science & Technology

    1981-01-01

    AD-A135 371 IDENTIFICATION OF A CLASS OF FILTERED POISSON PROCESSES (U) NORTH CAROLINA UNIV AT CHAPEL HILL DEPT OF STATISTICS D DE BRUCQ ET AL 1981... TITLE: IDENTIFICATION OF A CLASS OF FILTERED POISSON PROCESSES Authors: DE BRUCQ Denis - GUALTIEROTTI Antonio... filtered Poisson processes is introduced: the amplitude has a law which is spherically invariant and the filter is real, linear and causal. It is shown

  9. Implementation and evaluation of a community-based interprofessional learning activity.

    PubMed

    Luebbers, Ellen L; Dolansky, Mary A; Vehovec, Anton; Petty, Gayle

    2017-01-01

    Implementation of large-scale, meaningful interprofessional learning activities for pre-licensure students has significant barriers and requires novel approaches to ensure success. To accomplish this goal, faculty at Case Western Reserve University, Ohio, USA, used the Ottawa Model of Research Use (OMRU) framework to create, improve, and sustain a community-based interprofessional learning activity for large numbers of medical students (N = 177) and nursing students (N = 154). The model guided the process and included identification of context-specific barriers and facilitators, continual monitoring and improvement using data, and evaluation of student learning outcomes as well as programme outcomes. First year Case Western Reserve University medical students and undergraduate nursing students participated in team-structured prevention screening clinics in the Cleveland Metropolitan Public School District. Identification of barriers and facilitators assisted with overcoming logistic and scheduling issues, large class size, differing ages and skill levels of students and creating sustainability. Continual monitoring led to three distinct phases of improvement and resulted in the creation of an authentic team structure, role clarification, and relevance for students. Evaluation of student learning included both qualitative and quantitative methods, resulting in statistically significant findings and qualitative themes of learner outcomes. The OMRU implementation model provided a useful framework for successful implementation resulting in a sustainable interprofessional learning activity.

  10. GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays

    PubMed Central

    Li, Ao; Liu, Zongzhi; Lezon-Geyda, Kimberly; Sarkar, Sudipa; Lannin, Donald; Schulz, Vincent; Krop, Ian; Winer, Eric; Harris, Lyndsay; Tuck, David

    2011-01-01

    There is an increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays for profiling chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity with high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and the presence of GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinct feature of GPHMM is that the issues mentioned above are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and show that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the estimation of global parameters in GPHMM provides information about the biological characteristics of tumor samples and the quality of genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies. PMID:21398628

  11. The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: I. Statistically annotated datasets for peptide sequences and proteins identified via the application of ICAT and tandem mass spectrometry to proteins copurifying with T cell lipid rafts.

    PubMed

    von Haller, Priska D; Yi, Eugene; Donohoe, Samuel; Vaughn, Kelly; Keller, Andrew; Nesvizhskii, Alexey I; Eng, Jimmy; Li, Xiao-jun; Goodlett, David R; Aebersold, Ruedi; Watts, Julian D

    2003-07-01

    Lipid rafts were prepared according to standard protocols from Jurkat T cells stimulated via T cell receptor/CD28 cross-linking and from control (unstimulated) cells. Co-isolating proteins from the control and stimulated cell preparations were labeled with isotopically normal (d0) and heavy (d8) versions of the same isotope-coded affinity tag (ICAT) reagent, respectively. Samples were combined, proteolyzed, and resultant peptides fractionated via cation exchange chromatography. Cysteine-containing (ICAT-labeled) peptides were recovered via the biotin tag component of the ICAT reagents by avidin-affinity chromatography. On-line micro-capillary liquid chromatography tandem mass spectrometry was performed on both avidin-affinity (ICAT-labeled) and flow-through (unlabeled) fractions. Initial peptide sequence identification was by searching recorded tandem mass spectrometry spectra against a human sequence data base using SEQUEST software. New statistical data modeling algorithms were then applied to the SEQUEST search results. These allowed for discrimination between likely "correct" and "incorrect" peptide assignments, and from these the inferred proteins that they collectively represented, by calculating estimated probabilities that each peptide assignment and subsequent protein identification was a member of the "correct" population. For convenience, the resultant lists of peptide sequences assigned and the proteins to which they corresponded were filtered at an arbitrarily set cut-off of 0.5 (i.e. 50% likely to be "correct") and above and compiled into two separate datasets. In total, these data sets contained 7667 individual peptide identifications, which represented 2669 unique peptide sequences, corresponding to 685 proteins and related protein groups.
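
    The filtering step described above (keep peptide assignments with an estimated probability of 0.5 or above, then count unique peptide sequences and proteins) can be sketched as follows. The records are hypothetical placeholders, not data from the study, and the tuple layout is an assumption made for illustration.

```python
# Hypothetical (peptide sequence, protein, estimated probability) records.
assignments = [
    ("AAACK", "protein_1", 0.97),
    ("AAACK", "protein_1", 0.62),   # same peptide identified twice
    ("GGCLR", "protein_2", 0.49),   # below the cut-off, dropped
    ("TTCVK", "protein_3", 0.50),   # exactly at the cut-off, kept
]


def filter_assignments(records, cutoff=0.5):
    """Keep assignments whose probability is at or above the cut-off."""
    return [r for r in records if r[2] >= cutoff]


kept = filter_assignments(assignments)
unique_peptides = {seq for seq, _, _ in kept}
proteins = {prot for _, prot, _ in kept}

print(len(kept), len(unique_peptides), len(proteins))
```

    The three counts correspond to the three figures the abstract reports: individual identifications, unique peptide sequences, and proteins.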

  12. Attenuation of Storm Surge Flooding By Wetlands in the Chesapeake Bay: An Integrated Geospatial Framework Evaluating Impacts to Critical Infrastructure

    NASA Astrophysics Data System (ADS)

    Khalid, A.; Haddad, J.; Lawler, S.; Ferreira, C.

    2014-12-01

    Areas along the Chesapeake Bay and its tributaries are extremely vulnerable to hurricane flooding, as evidenced by the costly effects and severe impacts of recent storms along the Virginia coast, such as Hurricane Isabel in 2003 and Hurricane Sandy in 2012. Coastal wetlands, in addition to their ecological importance, are expected to mitigate the impact of storm surge by acting as a natural protection against hurricane flooding. Quantifying such interactions helps to provide a sound scientific basis to support planning and decision making. Using storm surge flooding from various historical hurricanes, simulated using a coupled hydrodynamic wave model (ADCIRC-SWAN), we propose an integrated framework yielding a geospatial identification of the capacity of Chesapeake Bay wetlands to protect critical infrastructure. Spatial identification of Chesapeake Bay wetlands is derived from the National Wetlands Inventory (NWI), National Land Cover Database (NLCD), and the Coastal Change Analysis Program (C-CAP). Inventories of population and critical infrastructure are extracted from US Census block data and FEMA's HAZUS-Multi Hazard geodatabase. Geospatial and statistical analyses are carried out to develop a relationship between wetland land cover, hurricane flooding, population and infrastructure vulnerability. These analyses result in the identification and quantification of populations and infrastructure in flooded areas that lie within a reasonable buffer surrounding the identified wetlands. Our analysis thus produces a spatial perspective on the potential for wetlands to attenuate hurricane flood impacts in critical areas. Statistical analysis will support hypothesis testing to evaluate the benefits of wetlands from a flooding and storm-surge attenuation perspective. Results from geospatial analysis are used to identify where interactions with critical infrastructure are relevant in the Chesapeake Bay.

  13. Quantifying animal movement for caching foragers: the path identification index (PII) and cougars, Puma concolor.

    PubMed

    Ironside, Kirsten E; Mattson, David J; Theimer, Tad; Jansen, Brian; Holton, Brandon; Arundel, Terence; Peters, Michael; Sexton, Joseph O; Edwards, Thomas C

    2017-01-01

    Many studies of animal movement have focused on directed versus area-restricted movement, which rely on correlations between step-length and turn-angles and on stationarity through time to define behavioral states. Although these approaches might apply well to grazing in patchy landscapes, species that either feed for short periods on large, concentrated food sources or cache food exhibit movements that are difficult to model using the traditional metrics of turn-angle and step-length alone. We used GPS telemetry collected from a prey-caching predator, the cougar (Puma concolor, Linnaeus), to test whether combining metrics of site recursion, spatiotemporal clustering, speed, and turning into an index of movement using partial sums improves the ability to identify caching behavior. The index was used to identify changes in movement characteristics over time and segment paths into behavioral classes. The identification of behaviors from the Path Identification Index (PII) was evaluated using field investigations of cougar activities at GPS locations. We tested for statistical stationarity across behaviors for use of topographic view-sheds. Changes in the frequency and duration of PII were useful for identifying seasonal activities such as migration, gestation, and denning. The comparison of field investigations of cougar activities to behavioral PII classes resulted in an overall classification accuracy of 81%. Changes in behaviors were reflected in cougars' use of topographic view-sheds, resulting in statistical nonstationarity over time, and revealed important aspects of hunting behavior. Incorporating metrics of site recursion and spatiotemporal clustering revealed the temporal structure in movements of a caching forager. The movement index, PII, shows promise for identifying behaviors in species that frequently return to specific locations such as food caches, watering holes, or dens, and highlights the potential role memory and cognitive abilities play in determining animal movements.
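
    The two traditional metrics that this abstract contrasts with the PII, step-length and turn-angle, can be computed from consecutive fixes as in the sketch below. It assumes the GPS locations have already been projected to planar coordinates in metres; the function name and the sample track are hypothetical. It does not implement the PII itself, whose definition is not given in the abstract.

```python
import math


def step_lengths_and_turns(track):
    """Return step lengths and signed turn angles (radians) for a planar track.

    track is a list of (x, y) positions in time order. Turn angles are the
    change in heading between consecutive steps, wrapped to [-pi, pi).
    """
    steps, headings = [], []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        steps.append(math.hypot(x1 - x0, y1 - y0))
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        delta = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        turns.append(delta)
    return steps, turns


# Hypothetical track: east 1 m, then north 1 m (a left turn of 90 degrees).
steps, turns = step_lengths_and_turns([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(steps, [math.degrees(t) for t in turns])
```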

  14. Terrestrial Environment (Climatic) Criteria Handbook for Use in Aerospace Vehicle Development

    NASA Technical Reports Server (NTRS)

    Johnson, Dale L.; Vaughan, William W.

    2004-01-01

    Aerospace Meteorology provides the identification of that aspect of meteorology that is concerned with the definition and modeling of atmospheric parameters for use in aerospace vehicle development, mission planning and operational capability assessments. One of the principal sources of this information is the NASA-HDBK-1001 "Terrestrial Environment (Climatic) Criteria Handbook for Use in Aerospace Vehicle Development". This handbook was approved by the NASA Chief Engineer in 2000 as a NASA Preferred Technical Standard. Its technical contents were based on natural environment statistics/models and criteria developed mostly in the early 1990s. A task was approved to completely update the handbook to reflect the current state-of-the-art in the various terrestrial environment climatic areas.

  15. Statistics for the Relative Detectability of Chemicals in Weak Gaseous Plumes in LWIR Hyperspectral Imagery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.

    2008-10-30

    The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data from the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.

  16. Identification of Patient Zero in Static and Temporal Networks: Robustness and Limitations

    NASA Astrophysics Data System (ADS)

    Antulov-Fantulin, Nino; Lančić, Alen; Šmuc, Tomislav; Štefančić, Hrvoje; Šikić, Mile

    2015-06-01

    Detection of patient zero can give new insights to epidemiologists about the nature of first transmissions into a population. In this Letter, we study the statistical inference problem of detecting the source of epidemics from a snapshot of spreading on an arbitrary network structure. By using exact analytic calculations and Monte Carlo estimators, we demonstrate the detectability limits for the susceptible-infected-recovered model, which primarily depend on the spreading process characteristics. Finally, we demonstrate the applicability of the approach in a case of a simulated sexually transmitted infection spreading over an empirical temporal network of sexual interactions.

  17. Healthcare Information Systems for the epidemiologic surveillance within the community.

    PubMed

    Diomidous, Marianna; Pistolis, John; Mechili, Aggelos; Kolokathi, Aikaterini; Zimeras, Stelios

    2013-01-01

    Public health and health care are important issues for developing countries and access to health care is a significant factor that contributes to a healthy population. In response to these issues, the World Health Organization (WHO) has been working on the development of methods and models for measuring physical accessibility to health care using several layers of information integrated in a GIS. This paper describes the methodological approach for the development of a real time electronic health record, based on the statistical and geographic information for the identification of various diseases and accidents that can happen in a specific place.

  18. Do employers reward physical attractiveness in transition countries?

    PubMed

    Mavisakalyan, Astghik

    2018-02-01

    This paper studies the labour market returns to physical attractiveness using data from three transition countries of the Caucasus: Armenia, Azerbaijan and Georgia. I estimate a large positive effect of attractive looks on males' probability of employment. Results from the most comprehensive model suggest a marginal effect of 11.1 percentage points. Using a partial identification approach, I show that this relationship is likely to be causal. After accounting for covariates, particularly measures of human capital, there is no evidence for a statistically significant link between females' attractiveness and employment. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. SEDA: A software package for the Statistical Earthquake Data Analysis

    NASA Astrophysics Data System (ADS)

    Lombardi, A. M.

    2017-03-01

    In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to easily interact with the application, and a computational core of Fortran codes, to guarantee the maximum speed. The primary factor driving the development of SEDA is to guarantee the research reproducibility, which is a growing movement among scientists and highly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to produce accurate and fast outputs. Less care has been taken for the graphic appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to the ETAS modeling. SEDAv1.0 contains a set of consistent tools on ETAS, allowing the estimation of parameters, the testing of model on data, the simulation of catalogs, the identification of sequences and forecasts calculation. The peculiarities of routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package.

  20. SEDA: A software package for the Statistical Earthquake Data Analysis

    PubMed Central

    Lombardi, A. M.

    2017-01-01

    In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to easily interact with the application, and a computational core of Fortran codes, to guarantee the maximum speed. The primary factor driving the development of SEDA is to guarantee the research reproducibility, which is a growing movement among scientists and highly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to produce accurate and fast outputs. Less care has been taken for the graphic appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to the ETAS modeling. SEDAv1.0 contains a set of consistent tools on ETAS, allowing the estimation of parameters, the testing of model on data, the simulation of catalogs, the identification of sequences and forecasts calculation. The peculiarities of routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package. PMID:28290482

  1. [Is ultrasound equal to X-ray in pediatric fracture diagnosis?].

    PubMed

    Moritz, J D; Hoffmann, B; Meuser, S H; Sehr, D H; Caliebe, A; Heller, M

    2010-08-01

    Ultrasound is currently not established for the diagnosis of fractures. The aim of this study was to compare ultrasound and X-ray beyond their use solely for the identification of fractures, i.e., for the detection of fracture type and dislocation for pediatric fracture diagnosis. Limb bones of dead young pigs served as a model for pediatric bones. The fractured bones were examined with ultrasound, X-ray, and CT, which served as the gold standard. 162 of 248 bones were fractured. 130 fractures were identified using ultrasound, and 148 using X-ray. There were some advantages of X-ray over ultrasound in the detection of fracture type (80 correct results using X-ray, 66 correct results using ultrasound). Ultrasound, however, was superior to X-ray for dislocation identification (41 correct results using X-ray, 51 correct results using ultrasound). Both findings were not statistically significant after adjustment for multiple testing. Ultrasound not only has comparable sensitivity to that of X-ray for the identification of limb fractures but is also equally effective for the diagnosis of fracture type and dislocation. Thus, ultrasound can be used as an adequate alternative method to X-ray for pediatric fracture diagnosis. Georg Thieme Verlag KG Stuttgart, New York.
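
    The sensitivities implied by the counts in this abstract can be worked out directly. The sketch below assumes that every fracture reported as identified was a true positive among the 162 CT-confirmed fractures; the abstract does not report false positives, so specificity cannot be derived from it.

```python
def sensitivity(true_positives, total_positives):
    """Fraction of gold-standard positives that the method detected."""
    return true_positives / total_positives


fractured = 162  # fractured bones according to CT (gold standard)

ultrasound = sensitivity(130, fractured)
xray = sensitivity(148, fractured)

print(f"ultrasound: {ultrasound:.3f}, X-ray: {xray:.3f}")
```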

  2. An efficient identification approach for stable and unstable nonlinear systems using Colliding Bodies Optimization algorithm.

    PubMed

    Pal, Partha S; Kar, R; Mandal, D; Ghoshal, S P

    2015-11-01

    This paper presents an efficient approach for identifying stable, practically useful Hammerstein models, as well as an unstable nonlinear process together with its stable closed-loop counterpart, using an evolutionary algorithm, Colliding Bodies Optimization (CBO). The precision and accuracy of the CBO-based approach are supported by the minimum output mean square error (MSE), which indicates that the bias and variance in the output domain are also the smallest. Minimizing the output MSE in the presence of outliers consistently yielded very close estimates of the output parameters, which supports the general applicability of the CBO algorithm to the system identification problem and the practical usefulness of the approach. The optimum MSE values, computational times, and MSE statistics are all superior to those of comparable stochastic-algorithm-based approaches reported in recent literature, which establishes the robustness and efficiency of the CBO-based identification scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  3. Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks

    PubMed Central

    Bandeira, Nuno

    2016-01-01

    Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective as they lack corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides from highly divergent sequences from networks with dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected the presence of many novel putative peptides even in genomically characterized species, thus suggesting the possibility of gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software. PMID:27609420

  4. A View to the Future: A Novel Approach for 3D-3D Superimposition and Quantification of Differences for Identification from Next-Generation Video Surveillance Systems.

    PubMed

    Gibelli, Daniele; De Angelis, Danilo; Poppa, Pasquale; Sforza, Chiarella; Cattaneo, Cristina

    2017-03-01

    Techniques of 2D-3D superimposition are widely used in cases of personal identification from video surveillance systems. However, the progressive improvement of 3D image acquisition technology will enable operators to perform also 3D-3D facial superimposition. This study aims at analyzing the possible applications of 3D-3D superimposition to personal identification, although from a theoretical point of view. Twenty subjects underwent a facial 3D scan by stereophotogrammetry twice at different time periods. Scans were superimposed two by two according to nine landmarks, and root-mean-square (RMS) value of point-to-point distances was calculated. When the two superimposed models belonged to the same individual, RMS value was 2.10 mm, while it was 4.47 mm in mismatches with a statistically significant difference (p < 0.0001). This experiment shows the potential of 3D-3D superimposition: Further studies are needed to ascertain technical limits which may occur in practice and to improve methods useful in the forensic practice. © 2016 American Academy of Forensic Sciences.
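
    The RMS measure mentioned above can be sketched as follows. This is a minimal example that assumes the two scans are already superimposed and that point correspondences are known; the function name rms_point_distance and the coordinates are hypothetical, and the study's actual registration and distance software is not described in the abstract.

```python
import math


def rms_point_distance(scan_a, scan_b):
    """Root-mean-square of Euclidean distances between corresponding 3D points."""
    if len(scan_a) != len(scan_b) or not scan_a:
        raise ValueError("scans must be non-empty and of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(scan_a, scan_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical, already-registered point sets (coordinates in mm):
# the second scan is the first one shifted by 2 mm along z.
scan_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
scan_b = [(0.0, 0.0, 2.0), (10.0, 0.0, 2.0), (0.0, 10.0, 2.0)]
print(rms_point_distance(scan_a, scan_b))
```

    A lower RMS value indicates a closer fit between the two surfaces, which is the basis for the match versus mismatch comparison reported in the abstract.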

  5. Two-dimensional correlation spectroscopy reveals the underlying compositions for FT-NIR identification of the medicinal bulbs of the genus Fritillaria

    NASA Astrophysics Data System (ADS)

    Chen, Jianbo; Wang, Yue; Liu, Aoxue; Rong, Lixin; Wang, Jingjuan

    2018-03-01

    Fritillariae Bulbus, the dried bulbs of several species of the genus Fritillaria, is often used in traditional Chinese medicine for the treatment of cough and pulmonary diseases. However, the similar appearances make it difficult to identify different kinds of Fritillariae Bulbus. In this research, Fourier transform near-infrared (FT-NIR) spectroscopy with a reflection fiber probe is employed for the direct testing and automatic identification of different kinds of Fritillariae Bulbus to ensure the authenticity, efficacy and safety. The bulbs can be measured directly without pulverizing. According to the two-dimensional (2D) correlation analysis and statistical analysis, the height ratio of the two peaks near 4860 cm-1 and 4750 cm-1 in the second derivative spectra is specific to the species of Fritillariae Bulbus. This indicates that the relative amount of protein and carbohydrate may be critical to identify Fritillariae Bulbus. With the help of the SIMCA model, the four kinds of Fritillariae Bulbus can be identified correctly by FT-NIR spectroscopy. The results show the potential of FT-NIR spectroscopy with a reflection fiber probe in the rapid testing and identification of Fritillariae Bulbus.

  6. Workflow for Criticality Assessment Applied in Biopharmaceutical Process Validation Stage 1.

    PubMed

    Zahel, Thomas; Marschall, Lukas; Abad, Sandra; Vasilieva, Elena; Maurer, Daniel; Mueller, Eric M; Murphy, Patrick; Natschläger, Thomas; Brocard, Cécile; Reinisch, Daniela; Sagmeister, Patrick; Herwig, Christoph

    2017-10-12

    Identification of critical process parameters that impact product quality is a central task during regulatory requested process validation. Commonly, this is done via design of experiments and identification of parameters significantly impacting product quality (rejection of the null hypothesis that the effect equals 0). However, parameters which show a large uncertainty and might result in an undesirable product quality limit critical to the product, may be missed. This might occur during the evaluation of experiments since residual/un-modelled variance in the experiments is larger than expected a priori. Estimation of such a risk is the task of the presented novel retrospective power analysis permutation test. This is evaluated using a data set for two unit operations established during characterization of a biopharmaceutical process in industry. The results show that, for one unit operation, the observed variance in the experiments is much larger than expected a priori, resulting in low power levels for all non-significant parameters. Moreover, we present a workflow of how to mitigate the risk associated with overlooked parameter effects. This enables a statistically sound identification of critical process parameters. The developed workflow will substantially support industry in delivering constant product quality, reduce process variance and increase patient safety.

  7. Automated Identification and Characterization of Secondary & Tertiary gamma’ Precipitates in Nickel-Based Superalloys (PREPRINT)

    DTIC Science & Technology

    2010-01-01

    and intensity information from the EFTEM images. The microstructural statistics obtained from the segmented γ’ precipitates agreed with those of the...is its ability to automate segmentation of precipitates in a reproducible manner for acquiring microstructural statistics that relate to both...were identified using a combination of visual inspection and intensity information from the EFTEM images. The microstructural statistics obtained

  8. General purpose simulation system of the data management system for Space Shuttle mission 18

    NASA Technical Reports Server (NTRS)

    Bengtson, N. M.; Mellichamp, J. M.; Smith, O. C.

    1976-01-01

    A simulation program for the flow of data through the Data Management System of Spacelab and Space Shuttle was presented. The science, engineering, command and guidance, navigation and control data were included. The programming language used was General Purpose Simulation System V (OS). The science and engineering data flow was modeled from its origin at the experiments and subsystems to transmission from Space Shuttle. Command data flow was modeled from the point of reception onboard and from the CDMS Control Panel to the experiments and subsystems. The GN&C data flow model handled data between the General Purpose Computer and the experiments and subsystems. Mission 18 was the particular flight chosen for simulation. The general structure of the program is presented, followed by a user's manual. Input data required to make runs are discussed followed by identification of the output statistics. The appendices contain a detailed model configuration, program listing and results.

  9. Computational Approaches to Chemical Hazard Assessment

    PubMed Central

    Luechtefeld, Thomas; Hartung, Thomas

    2018-01-01

    Summary Computational prediction of toxicity has reached new heights as a result of decades of growth in the magnitude and diversity of biological data. Public packages for statistics and machine learning make model creation faster. New theory in machine learning and cheminformatics enables integration of chemical structure, toxicogenomics, simulated and physical data in the prediction of chemical health hazards, and other toxicological information. Our earlier publications have characterized a toxicological dataset of unprecedented scale resulting from the European REACH legislation (Registration Evaluation Authorisation and Restriction of Chemicals). These publications dove into potential use cases for regulatory data and some models for exploiting this data. This article analyzes the options for the identification and categorization of chemicals, moves on to the derivation of descriptive features for chemicals, discusses different kinds of targets modeled in computational toxicology, and ends with a high-level perspective of the algorithms used to create computational toxicology models. PMID:29101769

  10. Mechanism-based model of a mass rapid transit system: A perspective

    NASA Astrophysics Data System (ADS)

    Legara, Erika Fille; Khoon, Lee Kee; Guang, Hung Gih; Monterola, Christopher

    2015-01-01

    In this paper, we discuss our findings on the spatiotemporal dynamics within the mass rapid transit (MRT) system of Singapore. We show that the trip distribution of Origin-Destination (OD) station pairs follows a power-law, implying the existence of critical OD pairs. We then present and discuss the empirically validated agent-based model (ABM) we have developed. The model allows recreation of the observed statistics and the setting up of various scenarios and their effects on the system, such as increasing the commuter population and the propagation of travel delays within the transportation network. The proposed model further enables identification of bottlenecks that can cause the MRT to break down, and consequently provide foresight on how such disruptions can possibly be managed. This can potentially provide a versatile approach for transport planners and government regulators to make quantifiable policies that optimally balance cost and convenience as a function of the number of the commuting public.

  11. Emotion Awareness and Identification Skills in Adolescent Girls with Bulimia Nervosa

    ERIC Educational Resources Information Center

    Sim, Leslie; Zeman, Janice

    2004-01-01

    This study examined emotion-identification skills in 19 adolescent girls (M age = 16 years, 8 months) diagnosed with a Diagnostic and Statistical Manual of Mental Disorders (4th ed. [DSM-IV], American Psychiatric Association, 1994) diagnosis of bulimia nervosa or eating disorder not otherwise specified in the bulimic spectrum, 19 age-matched girls…

  12. Complex network theory for the identification and assessment of candidate protein targets.

    PubMed

    McGarry, Ken; McDonald, Sharon

    2018-06-01

    In this work we use complex network theory to provide a statistical model of the connectivity patterns of human proteins and their interaction partners. Our intention is to identify important proteins that may be predisposed to be potential candidates as drug targets for therapeutic interventions. Target proteins usually have more interaction partners than non-target proteins, but there are no hard-and-fast rules for defining the actual number of interactions. We devise a statistical measure for identifying hub proteins, and we score our target proteins with gene ontology annotations. The important druggable protein targets are likely to have similar biological functions that can be assessed for their potential therapeutic value. Our system provides a statistical analysis of the local and distant neighborhood protein interactions of the potential targets using complex network measures. This approach builds a more accurate model of drug-to-target activity and therefore the likely impact on treating diseases. We integrate high quality protein interaction data from the HINT database and disease associated proteins from the DrugTarget database. Other sources include biological knowledge from Gene Ontology and drug information from DrugBank. The problem is a very challenging one since the data are highly imbalanced between target proteins and the more numerous nontargets. We use undersampling on the training data and build Random Forest classifier models which are used to identify previously unclassified target proteins. We validate and corroborate these findings from the available literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
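
    The undersampling step mentioned above can be illustrated with a minimal sketch of random undersampling, in which every class is reduced to the size of the smallest one before a classifier is trained. The function name undersample, the labels, and the protein identifiers are hypothetical; the abstract does not state which undersampling scheme or sampling ratio the authors used.

```python
import random


def undersample(records, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    n_min = min(len(recs) for recs in by_class.values())
    balanced = []
    for lab, recs in by_class.items():
        # Sample without replacement so no record is duplicated.
        balanced.extend((rec, lab) for rec in rng.sample(recs, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced training set: 2 targets, 8 nontargets.
proteins = [f"P{i}" for i in range(10)]
labels = ["target"] * 2 + ["nontarget"] * 8
balanced = undersample(proteins, labels)
print(len(balanced))
```

    Applying the reduction to the training split only, as the abstract indicates, leaves the test data with its natural class proportions.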

  13. Characterizing the D2 statistic: word matches in biological sequences.

    PubMed

    Forêt, Sylvain; Wilson, Susan R; Burden, Conrad J

    2009-01-01

    Word matches are often used in sequence comparison methods, either as a measure of sequence similarity or in the first search steps of algorithms such as BLAST or BLAT. The D2 statistic is the number of matches of words of k letters between two sequences. Recent advances have been made in the characterization of this statistic and in the approximation of its distribution. Here, these results are extended to the case of approximate word matches. We compute the exact value of the variance of the D2 statistic for the case of a uniform letter distribution, and introduce a method to provide accurate approximations of the variance in the remaining cases. This enables the distribution of D2 to be approximated for typical situations arising in biological research. We apply these results to the identification of cis-regulatory modules, and show that this method detects such sequences with a high accuracy. The ability to approximate the distribution of D2 for both exact and approximate word matches will enable the use of this statistic in a more precise manner for sequence comparison, database searches, and identification of transcription factor binding sites.
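
    The definition of D2 given above (the number of matches of words of k letters between two sequences) can be computed directly for the exact-match case. The sketch below is an illustration only; the function names kmer_counts and d2 are chosen for this example, and the approximate-match extension studied in the paper is not covered.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of exact k-word matches between all position pairs.

    Summing count_a(w) * count_b(w) over words w is equivalent to
    counting every pair (i, j) where the k-word at position i of
    seq_a equals the k-word at position j of seq_b.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


print(d2("ACGT", "ACGA", 2))
```

    In this toy case the sequences share the words AC and CG once each, so two word pairs match.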

  14. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    NASA Astrophysics Data System (ADS)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. 
    The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic catalog into background seismicity and individual sequences of earthquake clusters, also in areas characterized by moderate seismic activity, where the standard declustering techniques may turn out rather gross approximations. With these results acquired, the main statistical features of seismic clusters are explored, including complex interdependence of related events, with the aim to characterize the space-time patterns of earthquakes occurrence in North-Eastern Italy and capture their basic differences with Central Italy sequences.

  15. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project.

    PubMed

    Gerstein, Mark B; Lu, Zhi John; Van Nostrand, Eric L; Cheng, Chao; Arshinoff, Bradley I; Liu, Tao; Yip, Kevin Y; Robilotto, Rebecca; Rechtsteiner, Andreas; Ikegami, Kohta; Alves, Pedro; Chateigner, Aurelien; Perry, Marc; Morris, Mitzi; Auerbach, Raymond K; Feng, Xin; Leng, Jing; Vielle, Anne; Niu, Wei; Rhrissorrakrai, Kahn; Agarwal, Ashish; Alexander, Roger P; Barber, Galt; Brdlik, Cathleen M; Brennan, Jennifer; Brouillet, Jeremy Jean; Carr, Adrian; Cheung, Ming-Sin; Clawson, Hiram; Contrino, Sergio; Dannenberg, Luke O; Dernburg, Abby F; Desai, Arshad; Dick, Lindsay; Dosé, Andréa C; Du, Jiang; Egelhofer, Thea; Ercan, Sevinc; Euskirchen, Ghia; Ewing, Brent; Feingold, Elise A; Gassmann, Reto; Good, Peter J; Green, Phil; Gullier, Francois; Gutwein, Michelle; Guyer, Mark S; Habegger, Lukas; Han, Ting; Henikoff, Jorja G; Henz, Stefan R; Hinrichs, Angie; Holster, Heather; Hyman, Tony; Iniguez, A Leo; Janette, Judith; Jensen, Morten; Kato, Masaomi; Kent, W James; Kephart, Ellen; Khivansara, Vishal; Khurana, Ekta; Kim, John K; Kolasinska-Zwierz, Paulina; Lai, Eric C; Latorre, Isabel; Leahey, Amber; Lewis, Suzanna; Lloyd, Paul; Lochovsky, Lucas; Lowdon, Rebecca F; Lubling, Yaniv; Lyne, Rachel; MacCoss, Michael; Mackowiak, Sebastian D; Mangone, Marco; McKay, Sheldon; Mecenas, Desirea; Merrihew, Gennifer; Miller, David M; Muroyama, Andrew; Murray, John I; Ooi, Siew-Loon; Pham, Hoang; Phippen, Taryn; Preston, Elicia A; Rajewsky, Nikolaus; Rätsch, Gunnar; Rosenbaum, Heidi; Rozowsky, Joel; Rutherford, Kim; Ruzanov, Peter; Sarov, Mihail; Sasidharan, Rajkumar; Sboner, Andrea; Scheid, Paul; Segal, Eran; Shin, Hyunjin; Shou, Chong; Slack, Frank J; Slightam, Cindie; Smith, Richard; Spencer, William C; Stinson, E O; Taing, Scott; Takasaki, Teruaki; Vafeados, Dionne; Voronina, Ksenia; Wang, Guilin; Washington, Nicole L; Whittle, Christina M; Wu, Beijing; Yan, Koon-Kiu; Zeller, Georg; Zha, Zheng; Zhong, Mei; Zhou, Xingliang; Ahringer, Julie; Strome, Susan; Gunsalus, Kristin C; 
    Micklem, Gos; Liu, X Shirley; Reinke, Valerie; Kim, Stuart K; Hillier, LaDeana W; Henikoff, Steven; Piano, Fabio; Snyder, Michael; Stein, Lincoln; Lieb, Jason D; Waterston, Robert H

    2010-12-24

    We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

  16. A basket two-part model to analyze medical expenditure on interdependent multiple sectors.

    PubMed

    Sugawara, Shinya; Wu, Tianyi; Yamanishi, Kenji

    2018-05-01

    This study proposes a novel statistical methodology to analyze expenditure on multiple medical sectors using consumer data. Conventionally, medical expenditure has been analyzed by two-part models, which separately consider purchase decision and amount of expenditure. We extend the traditional two-part models by adding the step of basket analysis for dimension reduction. This new step enables us to analyze complicated interdependence between multiple sectors without an identification problem. As an empirical application for the proposed method, we analyze data of 13 medical sectors from the Medical Expenditure Panel Survey. In comparison with the results of previous studies that analyzed multiple sectors independently, our method provides more detailed implications of the impacts of individual socioeconomic status on the composition of joint purchases from multiple medical sectors; our method also has better prediction performance.
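
    The conventional two-part decomposition described above can be sketched without covariates: the first part estimates the probability of any purchase, the second the mean amount among purchasers, and their product gives the expected expenditure. The function name two_part_expectation and the figures are hypothetical; real two-part models fit each part with a regression on covariates, and the basket-analysis step proposed in the paper is not shown.

```python
def two_part_expectation(expenditures):
    """Return (P(purchase), mean amount given purchase, expected expenditure)."""
    if not expenditures:
        raise ValueError("expenditures must be non-empty")
    positive = [y for y in expenditures if y > 0]
    # Part 1: purchase decision, estimated as the share of positive amounts.
    p_purchase = len(positive) / len(expenditures)
    # Part 2: amount of expenditure, estimated among purchasers only.
    mean_if_purchase = sum(positive) / len(positive) if positive else 0.0
    return p_purchase, mean_if_purchase, p_purchase * mean_if_purchase


# Hypothetical annual expenditures for one medical sector.
spending = [0.0, 0.0, 100.0, 300.0]
print(two_part_expectation(spending))
```

    Separating the two parts lets the large share of zero expenditures be modeled apart from the skewed distribution of positive amounts.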

  17. Causality

    NASA Astrophysics Data System (ADS)

    Pearl, Judea

    2000-03-01

    Written by one of the pre-eminent researchers in the field, this book provides a comprehensive exposition of modern analysis of causation. It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, philosophy, cognitive science, and the health and social sciences. Pearl presents a unified account of the probabilistic, manipulative, counterfactual and structural approaches to causation, and devises simple mathematical tools for analyzing the relationships between causal connections, statistical associations, actions and observations. The book will open the way for including causal analysis in the standard curriculum of statistics, artificial intelligence, business, epidemiology, social science and economics. Students in these areas will find natural models, simple identification procedures, and precise mathematical definitions of causal concepts that traditional texts have tended to evade or make unduly complicated. This book will be of interest to professionals and students in a wide variety of fields. Anyone who wishes to elucidate meaningful relationships from data, predict effects of actions and policies, assess explanations of reported events, or form theories of causal understanding and causal speech will find this book stimulating and invaluable.

  18. PEPA test: fast and powerful differential analysis from relative quantitative proteomics data using shared peptides.

    PubMed

    Jacob, Laurent; Combes, Florence; Burger, Thomas

    2018-06-18

    We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry based relative quantification. An important feature of this type of high-throughput analysis is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous homology sequences, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide-protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in linear time with the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedures outperform state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.

  19. The Sport Students’ Ability of Literacy and Statistical Reasoning

    NASA Astrophysics Data System (ADS)

    Hidayah, N.

    2017-03-01

    The ability of literacy and statistical reasoning is very important for students of sport education colleges because the materials of statistical learning can be taken from their many activities, such as sport competitions, the results of tests and measurements, predicting achievement based on training, finding connections among variables, and others. This research tries to describe sport education college students’ ability of literacy and statistical reasoning related to the identification of data type, probability, table interpretation, description and explanation using bar or pie graphics, explanation of variability, interpretation, and the calculation and explanation of mean, median, and mode through an instrument. This instrument was tested on 50 college students majoring in sport, with the result that only 26% of all students have an ability above 30% while the others are still below 30%. Across all subjects, 56% of students have the ability to identify data classification, 49% of students have the ability to read, display and interpret tables through graphics, 27% of students have the ability in probability, 33% of students have the ability to describe variability, and 16.32% of students have the ability to read, count and describe mean, median and mode. The results of this research show that the sport students’ ability of literacy and statistical reasoning is not yet adequate and that students’ statistical study has not reached comprehension of concepts, literacy ability training and statistical reasoning, so it is critical to increase the sport students’ ability of literacy and statistical reasoning.

  20. The Role of Occupational Identification During Post-Merger Integration

    PubMed Central

    Kroon, David P.; Noorderhaven, Niels G.

    2016-01-01

    Integration processes after mergers are fraught with difficulties, and constitute a main cause of merger failure. This study focuses on the human aspect of post-merger integration, and in particular, on the role of occupational identification. We theorize and empirically demonstrate by means of a survey design that employees’ identification with their occupation is positively related to their willingness to cooperate in the post-merger integration process, over and above the effect of organization members’ organizational identification. This positive effect of occupational identification is stronger for uniformed personnel but attenuates in the course of the integration process. Qualitative interviews further explore and interpret the results from our statistical analysis. Together, these findings have important practical implications and suggest future research directions. PMID:29568214

  1. Information-dependent enrichment analysis reveals time-dependent transcriptional regulation of the estrogen pathway of toxicity.

    PubMed

    Pendse, Salil N; Maertens, Alexandra; Rosenberg, Michael; Roy, Dipanwita; Fasani, Rick A; Vantangoli, Marguerite M; Madnick, Samantha J; Boekelheide, Kim; Fornace, Albert J; Odwin, Shelly-Ann; Yager, James D; Hartung, Thomas; Andersen, Melvin E; McMullen, Patrick D

    2017-04-01

    The twenty-first century vision for toxicology involves a transition away from high-dose animal studies to in vitro and computational models (NRC in Toxicity testing in the 21st century: a vision and a strategy, The National Academies Press, Washington, DC, 2007). This transition requires mapping pathways of toxicity by understanding how in vitro systems respond to chemical perturbation. Uncovering transcription factors/signaling networks responsible for gene expression patterns is essential for defining pathways of toxicity, and ultimately, for determining the chemical modes of action through which a toxicant acts. Traditionally, transcription factor identification is achieved via chromatin immunoprecipitation studies and summarized by calculating which transcription factors are statistically associated with up- and downregulated genes. These lists are commonly determined via statistical or fold-change cutoffs, a procedure that is sensitive to statistical power and may not be as useful for determining transcription factor associations. To move away from an arbitrary statistical or fold-change-based cutoff, we developed, in the context of the Mapping the Human Toxome project, an enrichment paradigm called information-dependent enrichment analysis (IDEA) to guide identification of the transcription factor network. We used a test case of activation in MCF-7 cells by 17β estradiol (E2). Using this new approach, we established a time course for transcriptional and functional responses to E2. ERα and ERβ were associated with short-term transcriptional changes in response to E2. Sustained exposure led to recruitment of additional transcription factors and alteration of cell cycle machinery. TFAP2C and SOX2 were the transcription factors most highly correlated with dose. E2F7, E2F1, and Foxm1, which are involved in cell proliferation, were enriched only at 24 h. IDEA should be useful for identifying candidate pathways of toxicity. 
    IDEA outperforms gene set enrichment analysis (GSEA) and provides similar results to weighted gene correlation network analysis, a platform that helps to identify genes not annotated to pathways.

  2. Constructing Compact Takagi-Sugeno Rule Systems: Identification of Complex Interactions in Epidemiological Data

    PubMed Central

    Zhou, Shang-Ming; Lyons, Ronan A.; Brophy, Sinead; Gravenor, Mike B.

    2012-01-01

    The Takagi-Sugeno (TS) fuzzy rule system is a widely used data mining technique, and is of particular use in the identification of non-linear interactions between variables. However the number of rules increases dramatically when applied to high dimensional data sets (the curse of dimensionality). Few robust methods are available to identify important rules while removing redundant ones, and this results in limited applicability in fields such as epidemiology or bioinformatics where the interaction of many variables must be considered. Here, we develop a new parsimonious TS rule system. We propose three statistics: R, L, and ω-values, to rank the importance of each TS rule, and a forward selection procedure to construct a final model. We use our method to predict how key components of childhood deprivation combine to influence educational achievement outcome. We show that a parsimonious TS model can be constructed, based on a small subset of rules, that provides an accurate description of the relationship between deprivation indices and educational outcomes. The selected rules shed light on the synergistic relationships between the variables, and reveal that the effect of targeting specific domains of deprivation is crucially dependent on the state of the other domains. Policy decisions need to incorporate these interactions, and deprivation indices should not be considered in isolation. The TS rule system provides a basis for such decision making, and has wide applicability for the identification of non-linear interactions in complex biomedical data. PMID:23272108

  3. Constructing compact Takagi-Sugeno rule systems: identification of complex interactions in epidemiological data.

    PubMed

    Zhou, Shang-Ming; Lyons, Ronan A; Brophy, Sinead; Gravenor, Mike B

    2012-01-01

    The Takagi-Sugeno (TS) fuzzy rule system is a widely used data mining technique, and is of particular use in the identification of non-linear interactions between variables. However the number of rules increases dramatically when applied to high dimensional data sets (the curse of dimensionality). Few robust methods are available to identify important rules while removing redundant ones, and this results in limited applicability in fields such as epidemiology or bioinformatics where the interaction of many variables must be considered. Here, we develop a new parsimonious TS rule system. We propose three statistics: R, L, and ω-values, to rank the importance of each TS rule, and a forward selection procedure to construct a final model. We use our method to predict how key components of childhood deprivation combine to influence educational achievement outcome. We show that a parsimonious TS model can be constructed, based on a small subset of rules, that provides an accurate description of the relationship between deprivation indices and educational outcomes. The selected rules shed light on the synergistic relationships between the variables, and reveal that the effect of targeting specific domains of deprivation is crucially dependent on the state of the other domains. Policy decisions need to incorporate these interactions, and deprivation indices should not be considered in isolation. The TS rule system provides a basis for such decision making, and has wide applicability for the identification of non-linear interactions in complex biomedical data.

  4. The Essential Genome of Escherichia coli K-12.

    PubMed

    Goodall, Emily C A; Robinson, Ashley; Johnston, Iain G; Jabbari, Sara; Turner, Keith A; Cunningham, Adam F; Lund, Peter A; Cole, Jeffrey A; Henderson, Ian R

    2018-02-20

    Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. IMPORTANCE Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in Escherichia coli, we constructed a transposon mutant library of unprecedented density. Initial automated analysis of the resulting data revealed many discrepancies compared to the literature. 
We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone, and provides a new standard for the analysis of TraDIS data. Copyright © 2018 Goodall et al.

  5. A Novel Approach for Adaptive Signal Processing

    NASA Technical Reports Server (NTRS)

    Chen, Ya-Chin; Juang, Jer-Nan

    1998-01-01

    Adaptive linear predictors have been used extensively in practice in a wide variety of forms. In the main, their theoretical development is based upon the assumption of stationarity of the signals involved, particularly with respect to the second order statistics. On this basis, the well-known normal equations can be formulated. If high-order statistical stationarity is assumed, then the equivalent normal equations involve high-order signal moments. In either case, the cross moments (second or higher) are needed. This renders the adaptive prediction procedure non-blind. A novel procedure for blind adaptive prediction has been proposed and considerable implementation has been made in our contributions in the past year. The approach is based upon a suitable interpretation of blind equalization methods that satisfy the constant modulus property and offers significant deviations from the standard prediction methods. These blind adaptive algorithms are derived by formulating Lagrange equivalents from mechanisms of constrained optimization. In this report, other new update algorithms are derived from the fundamental concepts of advanced system identification to carry out the proposed blind adaptive prediction. The results of the work can be extended to a number of control-related problems, such as disturbance identification. The basic principles are outlined in this report and differences from other existing methods are discussed. The applications implemented are speech processing, such as coding and synthesis. Simulations are included to verify the novel modelling method.

  6. The usefulness of administrative databases for identifying disease cohorts is increased with a multivariate model.

    PubMed

    van Walraven, Carl; Austin, Peter C; Manuel, Douglas; Knoll, Greg; Jennings, Allison; Forster, Alan J

    2010-12-01

    Administrative databases commonly use codes to indicate diagnoses. These codes alone are often inadequate to accurately identify patients with particular conditions. In this study, we determined whether we could quantify the probability that a person has a particular disease (in this case, renal failure) using other routinely collected information available in an administrative data set. This would allow the accurate identification of a disease cohort in an administrative database. We determined whether patients in a randomly selected 100,000 hospitalizations had kidney disease (defined as two or more sequential serum creatinines or the single admission creatinine indicating a calculated glomerular filtration rate less than 60 mL/min/1.73 m²). The independent association of patient- and hospitalization-level variables with renal failure was measured using a multivariate logistic regression model in a random 50% sample of the patients. The model was validated in the remaining patients. Twenty thousand seven hundred thirteen patients had kidney disease (20.7%). A diagnostic code of kidney disease was strongly associated with kidney disease (relative risk: 34.4), but the accuracy of the code was poor (sensitivity: 37.9%; specificity: 98.9%). Twenty-nine patient- and hospitalization-level variables entered the kidney disease model. This model had excellent discrimination (c-statistic: 90.1%) and accurately predicted the probability of true renal failure. The probability threshold that maximized sensitivity and specificity for the identification of true kidney disease was 21.3% (sensitivity: 80.0%; specificity: 82.2%). Multiple variables available in administrative databases can be combined to quantify the probability that a person has a particular disease. This process permits accurate identification of a disease cohort in an administrative database. 
These methods may be extended to other diagnoses or procedures and could both facilitate and clarify the use of administrative databases for research and quality improvement. Copyright © 2010 Elsevier Inc. All rights reserved.
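
    The threshold step described in this abstract can be illustrated with a minimal sketch. It assumes that "maximized sensitivity and specificity" means maximizing their sum (Youden's index), which the abstract does not state explicitly. The function names and the toy probabilities are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Return (sensitivity, specificity) when disease is predicted for prob >= threshold.

    Assumes labels contains at least one 1 and at least one 0.
    """
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(probs, labels):
    """Pick the observed probability that maximizes sensitivity + specificity."""
    best = None
    for t in sorted(set(probs)):
        sens, spec = sensitivity_specificity(probs, labels, t)
        if best is None or sens + spec > best[0]:
            best = (sens + spec, t, sens, spec)
    return best[1], best[2], best[3]


# Toy predicted probabilities and true disease status (illustrative only).
probs = [0.05, 0.10, 0.15, 0.30, 0.40, 0.80]
labels = [0, 0, 0, 1, 0, 1]
threshold, sens, spec = best_threshold(probs, labels)
print(threshold, sens, spec)
```

    In practice the probabilities would come from a fitted logistic regression model evaluated on a validation sample, as the abstract describes; the sketch covers only the threshold search.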

  7. A versatile mathematical work-flow to explore how Cancer Stem Cell fate influences tumor progression.

    PubMed

    Fornari, Chiara; Balbo, Gianfranco; Halawani, Sami M; Ba-Rukab, Omar; Ahmad, Ab Rahman; Calogero, Raffaele A; Cordero, Francesca; Beccuti, Marco

    2015-01-01

    Nowadays multidisciplinary approaches combining mathematical models with experimental assays are becoming relevant for the study of biological systems. Indeed, in cancer research multidisciplinary approaches are successfully used to understand the crucial aspects implicated in tumor growth. In particular, the Cancer Stem Cell (CSC) biology represents an area particularly suited to be studied through multidisciplinary approaches, and modeling has significantly contributed to pinpointing the crucial aspects implicated in this theory. More generally, to acquire new insights on a biological system it is necessary to have an accurate description of the phenomenon, such that making accurate predictions on its future behaviors becomes more likely. In this context, the identification of the parameters influencing model dynamics can be advantageous to increase model accuracy and to provide hints in designing wet experiments. Different techniques, ranging from statistical methods to analytical studies, have been developed. Their applications depend on case-specific aspects, such as the availability and quality of experimental data, and the dimension of the parameter space. The study of a new model on the CSC-based tumor progression has been the motivation to design a new work-flow that helps to characterize possible system dynamics and to identify those parameters influencing such behaviors. In detail, we extended our recent model on CSC-dynamics creating a new system capable of describing tumor growth during the different stages of cancer progression. Indeed, tumor cells appear to progress through lineage stages like those of normal tissues, with their division auto-regulated by internal feedback mechanisms. These new features have introduced some non-linearities in the model, making it more difficult to study by analytical techniques alone. Our new work-flow, based on statistical methods, was used to identify the parameters which influence the tumor growth. 
The effectiveness of the presented work-flow was first verified on two well-known models and then applied to investigate our extended CSC model. We propose a new work-flow to study complex systems in a practical and informative way, allowing an easy identification, interpretation, and visualization of the key model parameters. Our methodology is useful to investigate possible model behaviors and to establish factors driving model dynamics. Analyzing our new CSC model guided by the proposed work-flow, we found that the deregulation of CSC asymmetric proliferation contributes to cancer initiation, in accordance with experimental evidence. Specifically, model results indicated that the probability of CSC symmetric proliferation is responsible for a switching-like behavior which discriminates between tumorigenesis and unsustainable tumor growth.

  8. Statistics of Shared Components in Complex Component Systems

    NASA Astrophysics Data System (ADS)

    Mazzolini, Andrea; Gherardi, Marco; Caselle, Michele; Cosentino Lagomarsino, Marco; Osella, Matteo

    2018-04-01

    Many complex systems are modular. Such systems can be represented as "component systems," i.e., sets of elementary components, such as LEGO bricks in LEGO sets. The bricks found in a LEGO set reflect a target architecture, which can be built following a set-specific list of instructions. In other component systems, instead, the underlying functional design and constraints are not obvious a priori, and their detection is often a challenge of both scientific and practical importance, requiring a clear understanding of component statistics. Importantly, some quantitative invariants appear to be common to many component systems, most notably a common broad distribution of component abundances, which often resembles the well-known Zipf's law. Such "laws" affect in a general and nontrivial way the component statistics, potentially hindering the identification of system-specific functional constraints or generative processes. Here, we specifically focus on the statistics of shared components, i.e., the distribution of the number of components shared by different system realizations, such as the common bricks found in different LEGO sets. To account for the effects of component heterogeneity, we consider a simple null model, which builds system realizations by random draws from a universe of possible components. Under general assumptions on abundance heterogeneity, we provide analytical estimates of component occurrence, which quantify exhaustively the statistics of shared components. Surprisingly, this simple null model can positively explain important features of empirical component-occurrence distributions obtained from large-scale data on bacterial genomes, LEGO sets, and book chapters. Specific architectural features and functional constraints can be detected from occurrence patterns as deviations from these null predictions, as we show for the illustrative case of the "core" genome in bacteria.
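
    A random-draw null model of the kind described in this abstract can be sketched as follows. This is an illustrative simplification, not the authors' model. It assumes Zipf-like abundances (weight proportional to 1/rank), a fixed number of independent draws with replacement per realization, and hypothetical function names and parameter values. Under those assumptions a component with abundance p appears in a realization of M draws with probability 1 - (1 - p)^M, which the simulation is checked against.

```python
import random


def zipf_weights(n_components):
    """Zipf-like abundances: the component at rank r has weight proportional to 1/r."""
    raw = [1.0 / rank for rank in range(1, n_components + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def expected_occurrence(weights, draws_per_realization):
    """Probability that each component appears at least once in one realization."""
    return [1.0 - (1.0 - p) ** draws_per_realization for p in weights]


def simulated_occurrence(weights, draws_per_realization, n_realizations, seed=0):
    """Fraction of simulated realizations that contain each component."""
    rng = random.Random(seed)
    indices = range(len(weights))
    counts = [0] * len(weights)
    for _ in range(n_realizations):
        # One realization: independent draws with replacement from the universe.
        present = set(rng.choices(indices, weights=weights, k=draws_per_realization))
        for i in present:
            counts[i] += 1
    return [c / n_realizations for c in counts]


weights = zipf_weights(50)
expected = expected_occurrence(weights, draws_per_realization=30)
simulated = simulated_occurrence(weights, draws_per_realization=30, n_realizations=2000)
print(round(expected[0], 3), round(expected[-1], 3))
```

    Empirical occurrence patterns that depart strongly from such null expectations are the kind of deviation the abstract interprets as a signature of system-specific constraints.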

  9. A Non-Destructive Method for Distinguishing Reindeer Antler (Rangifer tarandus) from Red Deer Antler (Cervus elaphus) Using X-Ray Micro-Tomography Coupled with SVM Classifiers

    PubMed Central

    Lefebvre, Alexandre; Rochefort, Gael Y.; Santos, Frédéric; Le Denmat, Dominique; Salmon, Benjamin; Pétillon, Jean-Marc

    2016-01-01

    Over the last decade, biomedical 3D-imaging tools have gained widespread use in the analysis of prehistoric bone artefacts. While initial attempts to characterise the major categories used in osseous industry (i.e. bone, antler, and dentine/ivory) have been successful, the taxonomic determination of prehistoric artefacts remains to be investigated. The distinction between reindeer and red deer antler can be challenging, particularly in cases of anthropic and/or taphonomic modifications. In addition to the range of destructive physicochemical identification methods available (mass spectrometry, isotopic ratio, and DNA analysis), X-ray micro-tomography (micro-CT) provides convincing non-destructive 3D images and analyses. This paper presents the experimental protocol (sample scans, image processing, and statistical analysis) we have developed in order to identify modern and archaeological antler collections (from Isturitz, France). This original method is based on bone microstructure analysis combined with advanced statistical support vector machine (SVM) classifiers. A combination of six microarchitecture biomarkers (bone volume fraction, trabecular number, trabecular separation, trabecular thickness, trabecular bone pattern factor, and structure model index) were screened using micro-CT in order to characterise internal alveolar structure. Overall, reindeer alveoli presented a tighter mesh than red deer alveoli, and statistical analysis allowed us to distinguish archaeological antler by species with an accuracy of 96%, regardless of anatomical location on the antler. In conclusion, micro-CT combined with SVM classifiers proves to be a promising additional non-destructive method for antler identification, suitable for archaeological artefacts whose degree of human modification and cultural heritage or scientific value has previously made it impossible (tools, ornaments, etc.). PMID:26901355

  10. Identification of biogeochemical hot spots using time-lapse hydrogeophysics

    NASA Astrophysics Data System (ADS)

    Franz, T. E.; Loecke, T.; Burgin, A.

    2016-12-01

    The identification and monitoring of biogeochemical hot spots and hot moments is difficult using point based sampling techniques and sensors. Without proper monitoring and accounting of water, energy, and trace gas fluxes it is difficult to assess the environmental footprint of land management practices. One key limitation is optimal placement of sensors/chambers that adequately capture the point scale fluxes and thus a reasonable integration to landscape scale flux. In this work we present time-lapse hydrogeophysical imaging at an old agricultural field converted into a wetland mitigation bank near Dayton, Ohio. While the wetland was previously instrumented with a network of soil sensors and surface chambers to capture a suite of state variables and fluxes, we hypothesize that time-lapse hydrogeophysical imaging is an underutilized and critical reconnaissance tool for effective network design and landscape scaling. Here we combine the time-lapse hydrogeophysical imagery with the multivariate statistical technique of Empirical Orthogonal Functions (EOF) in order to isolate the spatial and temporal components of the imagery. Comparisons of soil core information (e.g. soil texture, soil carbon) from around the study site and organized within like spatial zones reveal statistically different mean values of soil properties. Moreover, the like spatial zones can be used to identify a finite number of future sampling locations, evaluation of the placement of existing sensors/chambers, upscale/downscale observations, all of which are desirable techniques for commercial use in precision agriculture. Finally, we note that combining the EOF analysis with continuous monitoring from point sensors or remote sensing products may provide a robust statistical framework for scaling observations through time as well as provide appropriate datasets for use in landscape biogeochemical models.

  11. Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2011-01-01

    We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…

  12. Deriving injury risk curves using survival analysis from biomechanical experiments.

    PubMed

    Yoganandan, Narayan; Banerjee, Anjishnu; Hsu, Fang-Chi; Bass, Cameron R; Voo, Liming; Pintar, Frank A; Gayzik, F Scott

    2016-10-03

    Injury risk curves from biomechanical experimental data analysis are used in automotive studies to improve crashworthiness and advance occupant safety. Metrics such as acceleration and deflection coupled with outcomes such as fractures and anatomical disruptions from impact tests are used in simple binary regression models. As an improvement, the International Standards Organization suggested a different approach. It was based on survival analysis. While probability curves for side-impact-induced thorax and abdominal injuries and frontal impact-induced foot-ankle-leg injuries are developed using this approach, deficiencies are apparent. The objective of this study is to present an improved, robust and generalizable methodology in an attempt to resolve these issues. It includes: (a) statistical identification of the most appropriate independent variable (metric) from a pool of candidate metrics, measured and/or derived during experimentation and analysis processes, based on the highest area under the receiver operating characteristic curve, (b) quantitative determination of the optimal probability distribution based on the lowest Akaike information criterion, (c) supplementing the qualitative/visual inspection method for comparing the selected distribution with a non-parametric distribution with objective measures, (d) identification of overly influential observations using different methods, and (e) estimation of confidence intervals using techniques more appropriate to the underlying survival statistical model. These clear and quantified details can be easily implemented with commercial/open source packages. They can be used in retrospective analysis and prospective design of experiments, and in applications to different loading scenarios such as underbody blast events. The feasibility of the methodology is demonstrated using post mortem human subject experiments and 24 metrics associated with thoracic/abdominal injuries in side-impacts. Published by Elsevier Ltd.
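
    Step (b), choosing the distribution with the lowest Akaike information criterion (AIC = 2k - 2 ln L), can be sketched as below. This is a simplified illustration, not the study's procedure. It assumes every observation is exact (uncensored), whereas injury data in survival analysis are typically censored. It compares only two candidate distributions that have closed-form maximum likelihood estimates, and the function names and metric values are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def exponential_loglik(values):
    """Maximized log-likelihood of an exponential distribution (rate = 1 / mean)."""
    n = len(values)
    rate = n / sum(values)
    return n * math.log(rate) - rate * sum(values)


def lognormal_loglik(values):
    """Maximized log-likelihood of a log-normal distribution (MLE of mu and sigma^2)."""
    n = len(values)
    logs = [math.log(v) for v in values]
    mu = sum(logs) / n
    sigma2 = sum((x - mu) ** 2 for x in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * sigma2) - 0.5 * n


def select_distribution(values):
    """Return the candidate with the lowest AIC and the AIC of every candidate."""
    aics = {
        "exponential": aic(exponential_loglik(values), n_params=1),
        "lognormal": aic(lognormal_loglik(values), n_params=2),
    }
    return min(aics, key=aics.get), aics


# Hypothetical injury-metric values at failure (illustrative only, uncensored).
metric_values = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.9]
best, aics = select_distribution(metric_values)
print(best, {name: round(value, 2) for name, value in aics.items()})
```

    With censored observations the likelihood would also need survival-function terms, which is usually left to a dedicated survival analysis package.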

  13. A Model-Based Approach to Infer Shifts in Regional Fire Regimes Over Time Using Sediment Charcoal Records

    NASA Astrophysics Data System (ADS)

    Itter, M.; Finley, A. O.; Hooten, M.; Higuera, P. E.; Marlon, J. R.; McLachlan, J. S.; Kelly, R.

    2016-12-01

    Sediment charcoal records are used in paleoecological analyses to identify individual local fire events and to estimate fire frequency and regional biomass burned at centennial to millennial time scales. Methods to identify local fire events based on sediment charcoal records have been well developed over the past 30 years; however, an integrated statistical framework for fire identification is still lacking. We build upon existing paleoecological methods to develop a hierarchical Bayesian point process model for local fire identification and estimation of fire return intervals. The model is unique in that it combines sediment charcoal records from multiple lakes across a region in a spatially-explicit fashion leading to estimation of a joint, regional fire return interval in addition to lake-specific local fire frequencies. Further, the model estimates a joint regional charcoal deposition rate free from the effects of local fires that can be used as a measure of regional biomass burned over time. Finally, the hierarchical Bayesian approach allows for tractable error propagation such that estimates of fire return intervals reflect the full range of uncertainty in sediment charcoal records. Specific sources of uncertainty addressed include sediment age models, the separation of local versus regional charcoal sources, and generation of a composite charcoal record. The model is applied to sediment charcoal records from a dense network of lakes in the Yukon Flats region of Alaska. The multivariate joint modeling approach results in improved estimates of regional charcoal deposition with reduced uncertainty in the identification of individual fire events and local fire return intervals compared to individual lake approaches. Modeled individual-lake fire return intervals range from 100 to 500 years with a regional interval of roughly 200 years. Regional charcoal deposition to the network of lakes is correlated up to 50 kilometers. 
Finally, the joint regional charcoal deposition rate exhibits changes over time coincident with major climatic and vegetation shifts over the past 10,000 years. Ongoing work will use the regional charcoal deposition rate to estimate changes in biomass burned as a function of climate variability and regional vegetation pattern.

  14. Identification badges: a potential fomite?

    PubMed

    Ota, Kaede; Profiti, Raffaela; Smaill, Fiona; Matlow, Anne G; Smieja, Marek

    2007-01-01

    Staff identification badges are mandatory in all hospitals. The purpose of this study was to assess microbial contamination of identification badges at a Canadian tertiary centre. Risk factors for badge contamination were also investigated. Badges were cultured from 118 subjects including secretaries, physicians, nurses, and allied health workers. Subjects also completed a demographic questionnaire. Badge contamination was analyzed according to profession, workplace, duration of badge use, presence of a plastic cover, how the badge was worn, and cleaning frequency. 13.6% of the badges were contaminated with significant pathogens. S. aureus was isolated in 6.8% of the badges, gram-negative bacilli in 5.9%. Contamination was highest in nurses (21.4% versus 9.4-14.3% in other professions) and in the ICU (22.6% versus 8.3%-14.3% at other locations). Neither association was statistically significant. Covered and non-covered badges had similar contamination rates (12% and 17.1%) as did badges worn around the neck compared with those worn clipped to clothing (13.0% versus 14.6%). Contamination of recently cleaned badges was not statistically different from those that had not. Identification badges do not appear to be a major reservoir for pathogenic organisms. Badges can, however, harbour disease-causing organisms and should be cleaned regularly.

  15. Estimation of survival of adult Florida manatees in the Crystal River, at Blue Spring, and on the Atlantic Coast

    USGS Publications Warehouse

    O'Shea, Thomas J.; Langtimm, Catherine A.; O'Shea, Thomas J.; Ackerman, B.B.; Percival, H. Franklin

    1995-01-01

    We applied Cormack-Jolly-Seber open population models to manatee (Trichechus manatus latirostris) photo-identification databases to estimate adult survival probabilities. The computer programs JOLLY and RECAPCO were used to estimate survival of 677 individuals in three study areas: Crystal River (winters 1977-78 to 1990-91), Blue Spring (winters 1977-78 to 1990-91), and the Atlantic Coast (winters 1984-85 to 1990-91). We also estimated annual survival from observations of 111 manatees tagged for studies with radiotelemetry. Survival estimated from observations with telemetry had broader confidence intervals than survival estimated with the Cormack-Jolly-Seber models. Annual probabilities of capture based on photo-identification records were generally high. The mean annual adult survival estimated from sighting-resighting records was 0.959-0.962 in the Crystal River and 0.936-0.948 at Blue Spring and may be high enough to permit population growth, given the values of other life-history parameters. On the Atlantic Coast, the estimated annual adult survival (range of means = 0.877-0.885) may signify a declining population. However, for several reasons, interpretation of data from the latter study group should be tempered with caution. Adult survivorship seems to be constant with age in all three study groups. No strong differences were apparent between adult survival of males and females in the Crystal River or at Blue Spring; the basis of significant differences between sexes on the Atlantic Coast is unclear. Future research into estimating survival with photo-identification and the Cormack-Jolly-Seber models should be vigorously pursued. Estimates of annual survival can provide an additional indication of Florida manatee population status with a stronger statistical basis than aerial counts and carcass totals.

  16. Analysis of select Dalbergia and trade timber using direct analysis in real time and time-of-flight mass spectrometry for CITES enforcement.

    PubMed

    Lancaster, Cady; Espinoza, Edgard

    2012-05-15

    International trade of several Dalbergia wood species is regulated by The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). In order to supplement morphological identification of these species, a rapid chemical method of analysis was developed. Using Direct Analysis in Real Time (DART) ionization coupled with Time-of-Flight (TOF) Mass Spectrometry (MS), selected Dalbergia and common trade species were analyzed. Each of the 13 wood species was classified using principal component analysis and linear discriminant analysis (LDA). These statistical data clusters served as reliable anchors for species identification of unknowns. Analysis of 20 or more samples from the 13 species studied in this research indicates that the DART-TOFMS results are reproducible. Statistical analysis of the most abundant ions gave good classifications that were useful for identifying unknown wood samples. DART-TOFMS and LDA analysis of 13 species of selected timber samples and the statistical classification allowed for the correct assignment of unknown wood samples. This method is rapid and can be useful when anatomical identification is difficult but needed in order to support CITES enforcement. Published 2012. This article is a US Government work and is in the public domain in the USA.

  17. Identification of propulsion systems

    NASA Technical Reports Server (NTRS)

    Merrill, Walter; Guo, Ten-Huei; Duyar, Ahmet

    1991-01-01

    This paper presents a tutorial on the use of model identification techniques for the identification of propulsion system models. These models are important for control design, simulation, parameter estimation, and fault detection. Propulsion system identification is defined in the context of the classical description of identification as a four step process that is unique because of special considerations of data and error sources. Propulsion system models are described along with the dependence of system operation on the environment. Propulsion system simulation approaches are discussed as well as approaches to propulsion system identification with examples for both air breathing and rocket systems.

  18. Identification and influence of spatio-temporal outliers in urban air quality measurements.

    PubMed

    O'Leary, Brendan; Reiners, John J; Xu, Xiaohong; Lemke, Lawrence D

    2016-12-15

    Forty-eight potential outliers in air pollution measurements taken simultaneously in Detroit, Michigan, USA and Windsor, Ontario, Canada in 2008 and 2009 were identified using four independent methods: box plots, variogram clouds, difference maps, and the Local Moran's I statistic. These methods were subsequently used in combination to reduce and select a final set of 13 outliers for nitrogen dioxide (NO2), volatile organic compounds (VOCs), total benzene, toluene, ethyl benzene, and xylene (BTEX), and particulate matter in two size fractions (PM2.5 and PM10). The selected outliers were excluded from the measurement datasets and used to revise air pollution models. In addition, a set of temporally-scaled air pollution models was generated using time series measurements from community air quality monitors, with and without the selected outliers. The influence of outlier exclusion on associations with asthma exacerbation rates aggregated at a postal zone scale in both cities was evaluated. Results demonstrate that the inclusion or exclusion of outliers influences the strength of observed associations between intraurban air quality and asthma exacerbation in both cities. The box plot, variogram cloud, and difference map methods largely determined the final list of outliers, due to the high degree of conformity among their results. The Moran's I approach was not useful for outlier identification in the datasets studied. Removing outliers changed the spatial distribution of modeled concentration values and derivative exposure estimates averaged over postal zones. Overall, associations between air pollution and acute asthma exacerbation rates were weaker with outliers removed, but improved with the addition of temporal information. Decreases in statistically significant associations between air pollution and asthma resulted, in part, from smaller pollutant concentration ranges used for linear regression. 
Nevertheless, the practice of identifying outliers through congruence among multiple methods strengthens confidence in the analysis of outlier presence and influence in environmental datasets. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
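
    Of the four methods named in this abstract, the box plot criterion is the simplest to illustrate. The sketch below assumes the conventional Tukey rule, which flags values more than 1.5 interquartile ranges beyond the quartiles. The abstract does not state the exact rule used, and the function name and concentration values are hypothetical.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return values lying beyond the box plot whiskers (Tukey's rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical pollutant concentrations at seven monitoring sites (illustrative only).
concentrations = [10, 12, 11, 13, 12, 11, 40]
print(boxplot_outliers(concentrations))
```

    A value flagged this way would, following the abstract's approach, be retained as an outlier only if the other methods agreed.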

  19. On the effectiveness of noise masks: naturalistic vs. un-naturalistic image statistics.

    PubMed

    Hansen, Bruce C; Hess, Robert F

    2012-05-01

    It has been argued that the human visual system is optimized for identification of broadband objects embedded in stimuli possessing orientation averaged power spectra fall-offs that obey the 1/f^β relationship typically observed in natural scene imagery (i.e., β=2.0 on logarithmic axes). Here, we were interested in whether individual spatial channels leading to recognition are functionally optimized for narrowband targets when masked by noise possessing naturalistic image statistics (β=2.0). The current study therefore explores the impact of variable β noise masks on the identification of narrowband target stimuli ranging in spatial complexity, while simultaneously controlling for physical or perceived differences between the masks. The results show that β=2.0 noise masks produce the largest identification thresholds regardless of target complexity, and thus do not seem to yield functionally optimized channel processing. The differential masking effects are discussed in the context of contrast gain control. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. Systematic and fully automated identification of protein sequence patterns.

    PubMed

    Hart, R K; Royyuru, A K; Stolovitzky, G; Califano, A

    2000-01-01

    We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families which are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is sufficiently rapid to enable its use for daily curation of existing motif and profile databases. Third, our results show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate this method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at http://www.research.ibm.com/spat/.

  1. Using the domain identification model to study major and career decision-making processes

    NASA Astrophysics Data System (ADS)

    Tendhar, Chosang; Singh, Kusum; Jones, Brett D.

    2018-03-01

    The purpose of this study was to examine the extent to which (1) a domain identification model could be used to predict students' engineering major and career intentions and (2) the MUSIC Model of Motivation components could be used to predict domain identification. The data for this study were collected from first-year engineering students. We used a structural equation model to test the hypothesised relationship between variables in the partial domain identification model. The findings suggested that engineering identification significantly predicted engineering major intentions and career intentions and had the highest effect on those two variables compared to other motivational constructs. Furthermore, results suggested that success, interest, and caring are plausible contributors to students' engineering identification. Overall, there is strong evidence that the domain identification model can be used as a lens to study career decision-making processes in engineering, and potentially, in other fields as well.

  2. Development of rotorcraft interior noise control concepts. Phase 3: Development of noise control concepts

    NASA Technical Reports Server (NTRS)

    Yoerkie, Charles A.; Gintoli, P. J.; Ingraham, S. T.; Moore, J. A.

    1986-01-01

    The goal of this research is the understanding of helicopter internal noise mechanisms and the development, design, and testing of noise control concepts which will produce significant reductions in the acoustic environment to which passengers are exposed. The Phase 3 effort involved the identification and evaluation of current and advanced treatment concepts, including isolation of structure-borne paths. In addition, a plan was devised for the full-scale evaluation of an isolation concept. Specific objectives were as follows: (1) identification and characterization of various noise control concepts; (2) implementation of noise control concepts within the S-76 SEA (statistical energy analysis) model; (3) definition and evaluation of a preliminary acoustic isolation design to reduce structure-borne transmission of acoustic frequency main gearbox gear clash vibrations into the airframe; (4) formulation of a plan for the full-scale validation of the isolation concept; and (5) prediction of the cabin noise environment with various noise control concepts installed.

  3. A Bayesian Approach for Sensor Optimisation in Impact Identification

    PubMed Central

    Mallardo, Vincenzo; Sharif Khodaei, Zahra; Aliabadi, Ferri M. H.

    2016-01-01

    This paper presents a Bayesian approach for optimizing the position of sensors aimed at impact identification in composite structures under operational conditions. The uncertainty in the sensor data has been represented by statistical distributions of the recorded signals. An optimisation strategy based on the genetic algorithm is proposed to find the best sensor combination aimed at locating impacts on composite structures. A Bayesian-based objective function is adopted in the optimisation procedure as an indicator of the performance of meta-models developed for different sensor combinations to locate various impact events. To represent a real structure under operational load and to increase the reliability of the Structural Health Monitoring (SHM) system, the probability of malfunctioning sensors is included in the optimisation. The reliability and the robustness of the procedure are tested with experimental and numerical examples. Finally, the proposed optimisation algorithm is applied to a composite stiffened panel for both the uniform and non-uniform probability of impact occurrence. PMID:28774064

  4. Cross-Identification of Astronomical Catalogs on Multiple GPUs

    NASA Astrophysics Data System (ADS)

    Lee, M. A.; Budavári, T.

    2013-10-01

    One of the most fundamental problems in observational astronomy is the cross-identification of sources. Observations are made in different wavelengths, at different times, and from different locations and instruments, resulting in a large set of independent observations. The scientific outcome is often limited by our ability to quickly perform meaningful associations between detections. The matching, however, is difficult scientifically, statistically, as well as computationally. The former two require detailed physical modeling and advanced probabilistic concepts; the latter is due to the large volumes of data and the problem's combinatorial nature. In order to tackle the computational challenge and to prepare for future surveys, whose measurements will be exponentially increasing in size past the scale of feasible CPU-based solutions, we developed a new implementation which addresses the issue by performing the associations on multiple Graphics Processing Units (GPUs). Our implementation utilizes up to 6 GPUs in combination with the Thrust library to achieve an over 40x speedup versus the previous best implementation running on a multi-CPU SQL Server.

  5. Analysis and automatic identification of sleep stages using higher order spectra.

    PubMed

    Acharya, U Rajendra; Chua, Eric Chern-Pin; Chua, Kuang Chua; Min, Lim Choo; Tamura, Toshiyo

    2010-12-01

    Electroencephalogram (EEG) signals are widely used to study the activity of the brain, such as to determine sleep stages. These EEG signals are nonlinear and non-stationary in nature. It is difficult to perform sleep staging by visual interpretation and linear techniques. Thus, we use a nonlinear technique, higher order spectra (HOS), to extract hidden information in the sleep EEG signal. In this study, unique bispectrum and bicoherence plots for various sleep stages were proposed. These can be used as visual aids for various diagnostic applications. A number of HOS-based features were extracted from these plots during the various sleep stages (Wakefulness, Rapid Eye Movement (REM), Stage 1-4 Non-REM) and they were found to be statistically significant, with p-values lower than 0.001 using the ANOVA test. These features were fed to a Gaussian mixture model (GMM) classifier for automatic identification. Our results indicate that the proposed system is able to identify sleep stages with an accuracy of 88.7%.

  6. Probing the top-quark width using the charge identification of b jets

    DOE PAGES

    Giardino, Pier Paolo; Zhang, Cen

    2017-07-18

    We propose a new method for measuring the top-quark width based on the on-/off-shell ratio of b-charge asymmetry in pp → Wbj production at the LHC. The charge asymmetry removes virtually all backgrounds and related uncertainties, while remaining systematic and theoretical uncertainties can be taken under control by the ratio of cross sections. Limited only by statistical error, in an optimistic scenario, we find that our approach leads to good precision at high integrated luminosity, at a few hundred MeV assuming 300–3000 fb⁻¹ at the LHC. The approach directly probes the total width, in such a way that model-dependence can be minimized. It is complementary to existing cross section measurements which always leave a degeneracy between the total rate and the branching ratio, and provides valuable information about the properties of the top quark. Here, the proposal opens up new opportunities for precision top measurements using a b-charge identification algorithm.

  7. The Assessment of Climatological Impacts on Agricultural Production and Residential Energy Demand

    NASA Astrophysics Data System (ADS)

    Cooter, Ellen Jean

    The assessment of climatological impacts on selected economic activities is presented as a multi-step, interdisciplinary problem. The assessment process which is addressed explicitly in this report focuses on (1) user identification, (2) direct impact model selection, (3) methodological development, (4) product development and (5) product communication. Two user groups of major economic importance were selected for study: agriculture and gas utilities. The broad agricultural sector is further defined as U.S.A. corn production. The general category of utilities is narrowed to Oklahoma residential gas heating demand. The CERES physiological growth model was selected as the process model for corn production. The statistical analysis for corn production suggests that (1) although this is a statistically complex model, it can yield useful impact information, (2) as a result of output distributional biases, traditional statistical techniques are not adequate analytical tools, (3) the model yield distribution as a whole is probably non-Gaussian, particularly in the tails and (4) there appear to be identifiable weekly patterns of forecasted yields throughout the growing season. Agricultural quantities developed include point yield impact estimates and distributional characteristics, geographic corn weather distributions, return period estimates, decision making criteria (confidence limits) and time series of indices. These products were communicated in economic terms through the use of a Bayesian decision example and an econometric model. The NBSLD energy load model was selected to represent residential gas heating consumption. A cursory statistical analysis suggests relationships among weather variables across the Oklahoma study sites. No linear trend in "technology-free" modeled energy demand or input weather variables which would correspond to that contained in observed state-level residential energy use was detected. It is suggested that this trend is largely the result of non-weather factors such as population and home usage patterns rather than regional climate change. Year-to-year changes in modeled residential heating demand on the order of 10^6 Btu per household were determined and later related to state-level components of the Oklahoma economy. Products developed include the definition of regional forecast areas, likelihood estimates of extreme seasonal conditions and an energy/climate index. This information is communicated in economic terms through an input/output model which is used to estimate changes in Gross State Product and Household income attributable to weather variability.

  8. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system.

    PubMed

    Suchard, Marc A; Zorych, Ivan; Simpson, Shawn E; Schuemie, Martijn J; Ryan, Patrick B; Madigan, David

    2013-10-01

    The self-controlled case series (SCCS) offers potential as a statistical method for risk identification involving medical products from large-scale observational healthcare data. However, analytic design choices remain in encoding the longitudinal health records into the SCCS framework and its risk identification performance across real-world databases is unknown. To evaluate the performance of SCCS and its design choices as a tool for risk identification in observational healthcare data. We examined the risk identification performance of SCCS across five design choices using 399 drug-health outcome pairs in five real observational databases (four administrative claims and one electronic health records). In these databases, the pairs involve 165 positive controls and 234 negative controls. We also consider several synthetic databases with known relative risks between drug-outcome pairs. We evaluate risk identification performance through estimating the area under the receiver-operator characteristics curve (AUC) and bias and coverage probability in the synthetic examples. The SCCS achieves strong predictive performance. Twelve of the twenty health outcome-database scenarios return AUCs >0.75 across all drugs. Including all adverse events instead of just the first per patient and applying a multivariate adjustment for concomitant drug use are the most important design choices. However, the SCCS as applied here returns relative risk point-estimates biased towards the null value of 1 with low coverage probability. The SCCS recently extended to apply a multivariate adjustment for concomitant drug use offers promise as a statistical tool for risk identification in large-scale observational healthcare databases. Poor estimator calibration dampens enthusiasm, but on-going work should correct this short-coming.
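
    As an illustration of the AUC metric mentioned in the record above, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen positive control receives a higher score than a randomly chosen negative control. The function name and the example scores are hypothetical and are not taken from the study.

```python
def auc_from_controls(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both control groups must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical effect estimates for illustration only (not study data).
positives = [0.9, 0.4, 1.2, 0.1]
negatives = [0.0, 0.3, -0.2, 0.1]
print(auc_from_controls(positives, negatives))  # prints 0.90625
```

    An AUC of 0.5 corresponds to chance-level ranking, and 1.0 to perfect separation of positive from negative controls.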

  9. graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture.

    PubMed

    Chung, Dongjun; Kim, Hang J; Zhao, Hongyu

    2017-02-01

    Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share a common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on a smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.

  10. The fourth radiation transfer model intercomparison (RAMI-IV): Proficiency testing of canopy reflectance models with ISO-13528

    NASA Astrophysics Data System (ADS)

    Widlowski, J.-L.; Pinty, B.; Lopatka, M.; Atzberger, C.; Buzica, D.; Chelle, M.; Disney, M.; Gastellu-Etchegorry, J.-P.; Gerboles, M.; Gobron, N.; Grau, E.; Huang, H.; Kallel, A.; Kobayashi, H.; Lewis, P. E.; Qin, W.; Schlerf, M.; Stuckens, J.; Xie, D.

    2013-07-01

    The radiation transfer model intercomparison (RAMI) activity aims at assessing the reliability of physics-based radiative transfer (RT) models under controlled experimental conditions. RAMI focuses on computer simulation models that mimic the interactions of radiation with plant canopies. These models are increasingly used in the development of satellite retrieval algorithms for terrestrial essential climate variables (ECVs). Rather than applying ad hoc performance metrics, RAMI-IV makes use of existing ISO standards to enhance the rigor of its protocols evaluating the quality of RT models. ISO-13528 was developed "to determine the performance of individual laboratories for specific tests or measurements." More specifically, it aims to guarantee that measurement results fall within specified tolerance criteria from a known reference. Of particular interest to RAMI is that ISO-13528 provides guidelines for comparisons where the true value of the target quantity is unknown. In those cases, "truth" must be replaced by a reliable "conventional reference value" to enable absolute performance tests. This contribution will show, for the first time, how the ISO-13528 standard developed by the chemical and physical measurement communities can be applied to proficiency testing of computer simulation models. Step by step, the pre-screening of data, the identification of reference solutions, and the choice of proficiency statistics will be discussed and illustrated with simulation results from the RAMI-IV "abstract canopy" scenarios. Detailed performance statistics of the participating RT models will be provided and the role of the accuracy of the reference solutions as well as the choice of the tolerance criteria will be highlighted.
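
    A proficiency statistic of the kind discussed in the record above can be sketched as a z-score of a model result against a reference value. The thresholds below (|z| <= 2 satisfactory, 2 < |z| < 3 questionable, |z| >= 3 unsatisfactory) follow the interpretation commonly used in proficiency testing; the function names and numbers are illustrative assumptions, not values from RAMI-IV.

```python
def z_score(result, reference, sigma_pt):
    """Deviation of a result from the reference, in units of the tolerance sigma_pt."""
    if sigma_pt <= 0:
        raise ValueError("sigma_pt must be positive")
    return (result - reference) / sigma_pt


def classify(z):
    """Conventional three-level interpretation of a proficiency z-score."""
    magnitude = abs(z)
    if magnitude <= 2.0:
        return "satisfactory"
    if magnitude < 3.0:
        return "questionable"
    return "unsatisfactory"


# Hypothetical simulated value, reference value, and tolerance.
z = z_score(result=12.0, reference=10.0, sigma_pt=4.0)
print(z, classify(z))  # prints 0.5 satisfactory
```

    Both the reference value and the tolerance drive the outcome, which is why the abstract stresses the accuracy of the reference solutions and the choice of tolerance criteria.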

  11. A Statistical Testing Approach for Quantifying Software Reliability; Application to an Example System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chu, Tsong-Lun; Varuttamaseni, Athi; Baek, Joo-Seok

    The U.S. Nuclear Regulatory Commission (NRC) encourages the use of probabilistic risk assessment (PRA) technology in all regulatory matters, to the extent supported by the state-of-the-art in PRA methods and data. Although much has been accomplished in the area of risk-informed regulation, risk assessment for digital systems has not been fully developed. The NRC established a plan for research on digital systems to identify and develop methods, analytical tools, and regulatory guidance for (1) including models of digital systems in the PRAs of nuclear power plants (NPPs), and (2) incorporating digital systems in the NRC's risk-informed licensing and oversight activities. Under NRC's sponsorship, Brookhaven National Laboratory (BNL) explored approaches for addressing the failures of digital instrumentation and control (I and C) systems in the current NPP PRA framework. Specific areas investigated included PRA modeling digital hardware, development of a philosophical basis for defining software failure, and identification of desirable attributes of quantitative software reliability methods. Based on the earlier research, statistical testing is considered a promising method for quantifying software reliability. This paper describes a statistical software testing approach for quantifying software reliability and applies it to the loop-operating control system (LOCS) of an experimental loop of the Advanced Test Reactor (ATR) at Idaho National Laboratory (INL).

  12. iCFD: Interpreted Computational Fluid Dynamics - Degeneration of CFD to one-dimensional advection-dispersion models using statistical experimental design - The secondary clarifier.

    PubMed

    Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy

    2015-10-15

    The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. 
    For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii) assessment of modelling the onset of transient and compression settling. Furthermore, the optimal level of model discretization both in 2-D and 1-D was undertaken. Results suggest that the iCFD model developed for the SST through the proposed methodology is able to predict solid distribution with high accuracy - taking a reasonable computational effort - when compared to multi-dimensional numerical experiments, under a wide range of flow and design conditions. iCFD tools could play a crucial role in reliably predicting systems' performance under normal and shock events. Copyright © 2015 Elsevier Ltd. All rights reserved.
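
    The Latin Hypercube Sampling step named in the record above can be illustrated with a minimal sketch on the unit hypercube: each dimension is cut into as many equal strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per dimension. The function name, the seed, and the choice of three factors are assumptions for illustration; scaling to physical design and flow ranges is left out.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum per dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# 50 sets of conditions over 3 hypothetical factors, as unit-interval coordinates.
samples = latin_hypercube(50, 3)
print(len(samples), samples[0])
```

    Compared with plain random sampling, this guarantees that every factor is covered evenly across its range even with a modest number of simulation runs.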

  13. Characterization of Strong Light-Matter Coupling in Semiconductor Quantum-Dot Microcavities via Photon-Statistics Spectroscopy

    NASA Astrophysics Data System (ADS)

    Schneebeli, L.; Kira, M.; Koch, S. W.

    2008-08-01

    It is shown that spectrally resolved photon-statistics measurements of the resonance fluorescence from realistic semiconductor quantum-dot systems allow for high contrast identification of the two-photon strong-coupling states. Using a microscopic theory, the second-rung resonance of Jaynes-Cummings ladder is analyzed and optimum excitation conditions are determined. The computed photon-statistics spectrum displays gigantic, experimentally robust resonances at the energetic positions of the second-rung emission.

  14. Structurally Dynamic Spin Market Networks

    NASA Astrophysics Data System (ADS)

    Horváth, Denis; Kuscsik, Zoltán

    The agent-based model of stock price dynamics on a directed evolving complex network is suggested and studied by direct simulation. The stationary regime is maintained as a result of the balance between the extremal dynamics, adaptivity of strategic variables and reconnection rules. The inherent structure of node agent "brain" is modeled by a recursive neural network with local and global inputs and feedback connections. For specific parametric combination the complex network displays small-world phenomenon combined with scale-free behavior. The identification of a local leader (network hub, agent whose strategies are frequently adapted by its neighbors) is carried out by repeated random walk process through network. The simulations show empirically relevant dynamics of price returns and volatility clustering. The additional emerging aspects of stylized market statistics are Zipfian distributions of fitness.

  15. Assessment of statistical methods used in library-based approaches to microbial source tracking.

    PubMed

    Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

    2003-12-01

    Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.

  16. Organizational justice, trust, and identification and their effects on organizational commitment in hospital nursing staff.

    PubMed

    Chen, Su-Yueh; Wu, Wen-Chuan; Chang, Ching-Sheng; Lin, Chia-Tzu; Kung, Jung-Yuan; Weng, Hui-Ching; Lin, Yu-Tz; Lee, Shu-I

    2015-09-07

    It is of importance and urgency for hospitals to retain excellent nursing staff in order to improve patient satisfaction and hospital performance. However, it was found that simply increasing the salary is not the best method to resolve the problem of lacking nursing staff; it is necessary to focus on the impact of non-monetary factors. The delicate relationship between organizational justice, organizational trust, organizational identification, and organizational commitment requires investigation and clarification from more studies if application in nursing practice is to be expected. Therefore, this study was to investigate how the organizational justice perception could affect nurses' organizational trust and organizational identification, and whether the organizational trust and organizational identification could encourage nurses to willingly remain in their jobs and commit themselves to the hospitals. A cross-sectional design was used. Questionnaires were distributed in 2013 to a convenience sample of 400 registered nurses in one teaching hospital in Taiwan: 392 were retrieved. Of these, 386 questionnaires were valid, which was a 96.5% response rate. The SPSS 17.0 and Amos 17.0 (structural equation modeling) statistical software packages were used for data analysis. The organizational justice perceived by nurses significantly and positively affects their organizational trust (γ₁₁ = 0.49) and organizational identification (γ₂₁ = 0.58). Organizational trust (β₃₁ = 0.62) and organizational identification (β₃₂ = 0.53) significantly and positively affect organizational commitment. Hospital managers can enhance the service concepts and attitudes of frontline nursing personnel by maximizing organizational justice, organizational trust and organizational identification. Nursing personnel would then be motivated to provide feedback to the attention and care provided by hospital management by demonstrating substantial improvements in their extra-role performance. 
    Improved service concepts and attitudes would also facilitate teamwork among colleagues, boost the morale of the nursing faculty and reduce resignations and career changes.

  17. An Automated Method for Landmark Identification and Finite-Element Modeling of the Lumbar Spine.

    PubMed

    Campbell, Julius Quinn; Petrella, Anthony J

    2015-11-01

    The purpose of this study was to develop a method for the automated creation of finite-element models of the lumbar spine. Custom scripts were written to extract bone landmarks of lumbar vertebrae and assemble L1-L5 finite-element models. End-plate borders, ligament attachment points, and facet surfaces were identified. Landmarks were identified to maintain mesh correspondence between meshes for later use in statistical shape modeling. 90 lumbar vertebrae were processed creating 18 subject-specific finite-element models. Finite-element model surfaces and ligament attachment points were reproduced within 1e-5 mm of the bone surface, including the critical contact surfaces of the facets. Element quality exceeded specifications in 97% of elements for the 18 models created. The current method is capable of producing subject-specific finite-element models of the lumbar spine with good accuracy, quality, and robustness. The automated methods developed represent advancement in the state of the art of subject-specific lumbar spine modeling to a scale not possible with prior manual and semiautomated methods.

  18. Discrete dynamical system modelling for gene regulatory networks of 5-hydroxymethylfurfural tolerance for ethanologenic yeast.

    PubMed

    Song, M; Ouyang, Z; Liu, Z L

    2009-05-01

    Composed of linear difference equations, a discrete dynamical system (DDS) model was designed to reconstruct transcriptional regulations in gene regulatory networks (GRNs) for ethanologenic yeast Saccharomyces cerevisiae in response to 5-hydroxymethylfurfural (HMF), a bioethanol conversion inhibitor. The modelling aims at identification of a system of linear difference equations to represent temporal interactions among significantly expressed genes. Power stability is imposed on a system model under the normal condition in the absence of the inhibitor. Non-uniform sampling, typical in a time-course experimental design, is addressed by a log-time domain interpolation. A statistically significant DDS model of the yeast GRN derived from time-course gene expression measurements by exposure to HMF, revealed several verified transcriptional regulation events. These events implicate Yap1 and Pdr3, transcription factors consistently known for their regulatory roles by other studies or postulated by independent sequence motif analysis, suggesting their involvement in yeast tolerance and detoxification of the inhibitor.

  19. Statistics of Magnetic Reconnection X-Lines in Kinetic Turbulence

    NASA Astrophysics Data System (ADS)

    Haggerty, C. C.; Parashar, T.; Matthaeus, W. H.; Shay, M. A.; Wan, M.; Servidio, S.; Wu, P.

    2016-12-01

    In this work we examine the statistics of magnetic reconnection (x-lines) and their associated reconnection rates in intermittent current sheets generated in turbulent plasmas. Although such statistics have been studied previously for fluid simulations (e.g. [1]), they have not yet been generalized to fully kinetic particle-in-cell (PIC) simulations. A significant problem with PIC simulations, however, is electrostatic fluctuations generated due to numerical particle counting statistics. We find that analyzing gradients of the magnetic vector potential from the raw PIC field data identifies numerous artificial (or non-physical) x-points. Using small Orszag-Tang vortex PIC simulations, we analyze x-line identification and show that these artificial x-lines can be removed using sub-Debye length filtering of the data. We examine how turbulent properties such as the magnetic spectrum and scale dependent kurtosis are affected by particle noise and sub-Debye length filtering. We subsequently apply these analysis methods to a large scale kinetic PIC turbulent simulation. Consistent with previous fluid models, we find a range of normalized reconnection rates as large as ½ but with the bulk of the rates being less than approximately 0.1. [1] Servidio, S., W. H. Matthaeus, M. A. Shay, P. A. Cassak, and P. Dmitruk (2009), Magnetic reconnection and two-dimensional magnetohydrodynamic turbulence, Phys. Rev. Lett., 102, 115003.

  20. 75 FR 94 - Amendments to the Section 7216 Regulations-Disclosure or Use of Information by Preparers of Returns

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-04

    ... use of statistical compilations of data under section 7216 of the Internal Revenue Code (Code) by a... preparation business, including identification of additional limited circumstances when a tax return preparer... tax return business under Sec. 301.7216-2(n); disclose and use statistical compilations of data...

  1. Identification and Definition of Lexically Ambiguous Words in Statistics by Tutors and Students

    ERIC Educational Resources Information Center

    Richardson, Alice M.; Dunn, Peter K.; Hutchins, Rene

    2013-01-01

    Lexical ambiguity arises when a word from everyday English is used differently in a particular discipline, such as statistics. This paper reports on a project that begins by identifying tutors' perceptions of words that are potentially lexically ambiguous to students, in two different ways. Students' definitions of nine lexically ambiguous words…

  2. Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data.

    PubMed

    Zhang, Yun; Baheti, Saurabh; Sun, Zhifu

    2018-05-01

    High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the basis for the next step of differentially methylated region identification. The ratio data have the flexibility of fitting to many linear models, but the raw count data take coverage information into account. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistical methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is the preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.

  3. Tooth enamel oxygen "isoscapes" show a high degree of human mobility in prehistoric Britain.

    PubMed

    Pellegrini, Maura; Pouncett, John; Jay, Mandy; Pearson, Mike Parker; Richards, Michael P

    2016-10-07

    A geostatistical model to predict human skeletal oxygen isotope values (δ18Op) in Britain is presented here based on a new dataset of Chalcolithic and Early Bronze Age human teeth. The spatial statistics which underpin this model allow the identification of individuals interpreted as 'non-local' to the areas where they were buried (spatial outliers). A marked variation in δ18Op is observed in several areas, including the Stonehenge region, the Peak District, and the Yorkshire Wolds, suggesting a high degree of human mobility. These areas, rich in funerary and ceremonial monuments, may have formed focal points for people, some of whom would have travelled long distances, ultimately being buried there. The dataset and model represent a baseline for future archaeological studies, avoiding the complex conversions from skeletal to water δ18O values, a process known to be problematic.

  4. Tooth enamel oxygen “isoscapes” show a high degree of human mobility in prehistoric Britain

    PubMed Central

    Pellegrini, Maura; Pouncett, John; Jay, Mandy; Pearson, Mike Parker; Richards, Michael P.

    2016-01-01

    A geostatistical model to predict human skeletal oxygen isotope values (δ18Op) in Britain is presented here based on a new dataset of Chalcolithic and Early Bronze Age human teeth. The spatial statistics which underpin this model allow the identification of individuals interpreted as ‘non-local’ to the areas where they were buried (spatial outliers). A marked variation in δ18Op is observed in several areas, including the Stonehenge region, the Peak District, and the Yorkshire Wolds, suggesting a high degree of human mobility. These areas, rich in funerary and ceremonial monuments, may have formed focal points for people, some of whom would have travelled long distances, ultimately being buried there. The dataset and model represent a baseline for future archaeological studies, avoiding the complex conversions from skeletal to water δ18O values–a process known to be problematic. PMID:27713538

  5. Tooth enamel oxygen “isoscapes” show a high degree of human mobility in prehistoric Britain

    NASA Astrophysics Data System (ADS)

    Pellegrini, Maura; Pouncett, John; Jay, Mandy; Pearson, Mike Parker; Richards, Michael P.

    2016-10-01

    A geostatistical model to predict human skeletal oxygen isotope values (δ18Op) in Britain is presented here based on a new dataset of Chalcolithic and Early Bronze Age human teeth. The spatial statistics which underpin this model allow the identification of individuals interpreted as ‘non-local’ to the areas where they were buried (spatial outliers). A marked variation in δ18Op is observed in several areas, including the Stonehenge region, the Peak District, and the Yorkshire Wolds, suggesting a high degree of human mobility. These areas, rich in funerary and ceremonial monuments, may have formed focal points for people, some of whom would have travelled long distances, ultimately being buried there. The dataset and model represent a baseline for future archaeological studies, avoiding the complex conversions from skeletal to water δ18O values-a process known to be problematic.

  6. Modelling runway incursion severity.

    PubMed

    Wilke, Sabine; Majumdar, Arnab; Ochieng, Washington Y

    2015-06-01

    Analysis of the causes underlying runway incursions is fundamental for the development of effective mitigation measures. However, there are significant weaknesses in the current methods to model these factors. This paper proposes a structured framework for modelling causal factors and their relationship to severity, which includes a description of the airport surface system architecture, establishment of terminological definitions, the determination and collection of appropriate data, the analysis of occurrences for severity and causes, and the execution of a statistical analysis framework. It is implemented in the context of U.S. airports, enabling the identification of a number of priority interventions, including the need for better investigation and causal factor capture, recommendations for airfield design, operating scenarios and technologies, and better training for human operators in the system. The framework is recommended for the analysis of runway incursions to support safety improvements and the methodology is transferable to other areas of aviation safety risk analysis. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Workshop on Algorithms for Time-Series Analysis

    NASA Astrophysics Data System (ADS)

    Protopapas, Pavlos

    2012-04-01

    This Workshop covered the four major subjects listed below in two 90-minute sessions. Each talk or tutorial allowed questions, and concluded with a discussion. Classification: Automatic classification using machine-learning methods is becoming a standard in surveys that generate large datasets. Ashish Mahabal (Caltech) reviewed various methods, and presented examples of several applications. Time-Series Modelling: Suzanne Aigrain (Oxford University) discussed autoregressive models and multivariate approaches such as Gaussian Processes. Meta-classification/mixture of expert models: Karim Pichara (Pontificia Universidad Católica, Chile) described the substantial promise which machine-learning classification methods are now showing in automatic classification, and discussed how the various methods can be combined together. Event Detection: Pavlos Protopapas (Harvard) addressed methods of fast identification of events with low signal-to-noise ratios, enlarging on the characterization and statistical issues of low signal-to-noise ratios and rare events.

  8. The effects of a hardiness educational intervention on hardiness and perceived stress of junior baccalaureate nursing students.

    PubMed

    Jameson, Paula R

    2014-04-01

    Baccalaureate nursing education is stressful. The stress encompasses a range of academic, personal, clinical, and social reasons. A hardiness educational program, a tool for stress management, based on theory, research, and practice, exists to enhance the attitudes and coping strategies of hardiness (Maddi, 2007; Maddi et al., 2002). Research has shown that students who completed the hardiness educational program subsequently improved in grade point average (GPA), college retention rates, and health (Maddi et al., 2002). Little research has been done to explore the effects of hardiness education with junior baccalaureate nursing students. Early identification of hardiness, the need for hardiness education, or stress management in this population may influence persistence in and completion of a nursing program (Hensel and Stoelting-Gettelfinger, 2011). Therefore, the aims were to determine if an increase in hardiness and a decrease in perceived stress in junior baccalaureate nursing students occurred in those who participated in a hardiness intervention. The application of the Hardiness Model and the Roy Adaptation Model established connections and conceptual collaboration among stress, stimuli, adaptation, and hardi-coping. A quasi-experimental non-equivalent control group with pre-test and post-test was used with a convenience sample of full-time junior level baccalaureate nursing students. Data were collected from August 2011 to December 2011. Results of statistical analyses by paired t-tests revealed that the hardiness intervention did not have a statistically significant effect on increasing hardiness scores. The hardiness intervention did have a statistically significant effect on decreasing perceived stress scores. The significant decrease in perceived stress was congruent with the Hardiness Model and the Roy Adaptation Model. Further hardiness research among junior baccalaureate nursing students, utilizing the entire hardiness intervention, was recommended. © 2013.

  9. Virtual Beach 3: user's guide

    USGS Publications Warehouse

    Cyterski, Mike; Brooks, Wesley; Galvin, Mike; Wolfe, Kurt; Carvin, Rebecca; Roddick, Tonia; Fienen, Mike; Corsi, Steve

    2014-01-01

    Virtual Beach version 3 (VB3) is a decision support tool that constructs site-specific statistical models to predict fecal indicator bacteria (FIB) concentrations at recreational beaches. VB3 is primarily designed for beach managers responsible for making decisions regarding beach closures or the issuance of swimming advisories due to pathogen contamination. However, researchers, scientists, engineers, and students interested in studying relationships between water quality indicators and ambient environmental conditions will find VB3 useful. VB3 reads input data from a text file or Excel document, assists the user in preparing the data for analysis, enables automated model selection using a wide array of possible model evaluation criteria, and provides predictions using a chosen model parameterized with new data. With an integrated mapping component to determine the geographic orientation of the beach, the software can automatically decompose wind/current/wave speed and magnitude information into along-shore and onshore/offshore components for use in subsequent analyses. Data can be examined using simple scatter plots to evaluate relationships between the response and independent variables (IVs). VB3 can produce interaction terms between the primary IVs, and it can also test an array of transformations to maximize the linearity of the relationship. The software includes search routines for finding the "best" models from an array of possible choices. Automated censoring of statistical models with highly correlated IVs occurs during the selection process. Models can be constructed either using previously collected data or forecasted environmental information. VB3 has residual diagnostics for regression models, including automated outlier identification and removal using DFFITs or Cook's Distances.
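
    The decomposition of a wind, current or wave vector into along-shore and cross-shore components can be sketched as a rotation into the beach's frame of reference. The function name and the angle conventions below are assumptions made for illustration; they are not taken from the VB3 software or its user's guide.

```python
import math

def decompose_shore_components(speed, direction_deg, beach_orientation_deg):
    """Split a vector into along-shore and cross-shore components.

    Assumed conventions (hypothetical, not VB3's documented ones):
      direction_deg         -- bearing the vector points toward, degrees clockwise from north
      beach_orientation_deg -- compass bearing of the shoreline
    A positive cross-shore value points to the right of the shoreline bearing;
    whether that is onshore or offshore depends on which side the water lies.
    """
    theta = math.radians(direction_deg - beach_orientation_deg)
    along_shore = speed * math.cos(theta)
    cross_shore = speed * math.sin(theta)
    return along_shore, cross_shore

# Hypothetical example: a 5 m/s wind at bearing 75 degrees, shoreline bearing 45 degrees.
along, cross = decompose_shore_components(5.0, 75.0, 45.0)
```

    The two components preserve the magnitude of the original vector, so they can stand in for the raw speed and direction as independent variables in a regression.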

  10. 3D QSAR models built on structure-based alignments of Abl tyrosine kinase inhibitors.

    PubMed

    Falchi, Federico; Manetti, Fabrizio; Carraro, Fabio; Naldini, Antonella; Maga, Giovanni; Crespan, Emmanuele; Schenone, Silvia; Bruno, Olga; Brullo, Chiara; Botta, Maurizio

    2009-06-01

    Quality QSAR: A combination of docking calculations and a statistical approach toward Abl inhibitors resulted in a 3D QSAR model, the analysis of which led to the identification of ligand portions important for affinity. New compounds designed on the basis of the model were found to have very good affinity for the target, providing further validation of the model itself. The X-ray crystallographic coordinates of the Abl tyrosine kinase domain in its active, inactive, and Src-like inactive conformations were used as targets to simulate the binding mode of a large series of pyrazolo[3,4-d]pyrimidines (known Abl inhibitors) by means of GOLD software. Receptor-based alignments provided by molecular docking calculations were submitted to a GRID-GOLPE protocol to generate 3D QSAR models. Analysis of the results showed that the models based on the inactive and Src-like inactive conformations had very poor statistical parameters, whereas the sole model based on the active conformation of Abl was characterized by significant internal and external predictive ability. Subsequent analysis of GOLPE PLS pseudo-coefficient contour plots of this model gave us a better understanding of the relationships between structure and affinity, providing suggestions for the next optimization process. On the basis of these results, new compounds were designed according to the hydrophobic and hydrogen bond donor and acceptor contours, and were found to have improved enzymatic and cellular activity with respect to parent compounds. Additional biological assays confirmed the important role of the selected compounds as inhibitors of cell proliferation in leukemia cells.

  11. Population models for passerine birds: structure, parameterization, and analysis

    USGS Publications Warehouse

    Noon, B.R.; Sauer, J.R.; McCullough, D.R.; Barrett, R.H.

    1992-01-01

    Population models have great potential as management tools, as they use information about the life history of a species to summarize estimates of fecundity and survival into a description of population change. Models provide a framework for projecting future populations, determining the effects of management decisions on future population dynamics, evaluating extinction probabilities, and addressing a variety of questions of ecological and evolutionary interest. Even when insufficient information exists to allow complete identification of the model, the modelling procedure is useful because it forces the investigator to consider the life history of the species when determining what parameters should be estimated from field studies and provides a context for evaluating the relative importance of demographic parameters. Models have been little used in the study of the population dynamics of passerine birds because of: (1) widespread misunderstandings of the model structures and parameterizations, (2) a lack of knowledge of life histories of many species, (3) difficulties in obtaining statistically reliable estimates of demographic parameters for most passerine species, and (4) confusion about functional relationships among demographic parameters. As a result, studies of passerine demography are often designed inappropriately and fail to provide essential data. We review appropriate models for passerine bird populations and illustrate their possible uses in evaluating the effects of management or other environmental influences on population dynamics. We identify parameters that must be estimated from field data, briefly review existing statistical methods for obtaining valid estimates, and evaluate the present status of knowledge of these parameters.

  12. Language acquisition and use: learning and applying probabilistic constraints.

    PubMed

    Seidenberg, M S

    1997-03-14

    What kinds of knowledge underlie the use of language and how is this knowledge acquired? Linguists equate knowing a language with knowing a grammar. Classic "poverty of the stimulus" arguments suggest that grammar identification is an intractable inductive problem and that acquisition is possible only because children possess innate knowledge of grammatical structure. An alternative view is emerging from studies of statistical and probabilistic aspects of language, connectionist models, and the learning capacities of infants. This approach emphasizes continuity between how language is acquired and how it is used. It retains the idea that innate capacities constrain language learning, but calls into question whether they include knowledge of grammatical structure.

  13. Cascaded Amplitude Modulations in Sound Texture Perception

    PubMed Central

    McWalter, Richard; Dau, Torsten

    2017-01-01

    Sound textures, such as crackling fire or chirping crickets, represent a broad class of sounds defined by their homogeneous temporal structure. It has been suggested that the perception of texture is mediated by time-averaged summary statistics measured from early auditory representations. In this study, we investigated the perception of sound textures that contain rhythmic structure, specifically second-order amplitude modulations that arise from the interaction of different modulation rates, previously described as “beating” in the envelope-frequency domain. We developed an auditory texture model that utilizes a cascade of modulation filterbanks that capture the structure of simple rhythmic patterns. The model was examined in a series of psychophysical listening experiments using synthetic sound textures—stimuli generated using time-averaged statistics measured from real-world textures. In a texture identification task, our results indicated that second-order amplitude modulation sensitivity enhanced recognition. Next, we examined the contribution of the second-order modulation analysis in a preference task, where the proposed auditory texture model was preferred over a range of model deviants that lacked second-order modulation rate sensitivity. Lastly, the discriminability of textures that included second-order amplitude modulations appeared to be perceived using a time-averaging process. Overall, our results demonstrate that the inclusion of second-order modulation analysis generates improvements in the perceived quality of synthetic textures compared to the first-order modulation analysis considered in previous approaches. PMID:28955191

  14. Applied Prevalence Ratio estimation with different Regression models: An example from a cross-national study on substance use research.

    PubMed

    Espelt, Albert; Marí-Dell'Olmo, Marc; Penelo, Eva; Bosque-Prous, Marina

    2016-06-14

    To examine the differences between Prevalence Ratio (PR) and Odds Ratio (OR) in a cross-sectional study and to provide tools to calculate PR using two statistical packages widely used in substance use research (STATA and R). We used cross-sectional data from 41,263 participants of 16 European countries participating in the Survey on Health, Ageing and Retirement in Europe (SHARE). The dependent variable, hazardous drinking, was calculated using the Alcohol Use Disorders Identification Test - Consumption (AUDIT-C). The main independent variable was gender. Other variables used were: age, educational level and country of residence. PR of hazardous drinking in men relative to women was estimated using the Mantel-Haenszel method, log-binomial regression models and Poisson regression models with robust variance. These estimations were compared to the OR calculated using logistic regression models. Prevalence of hazardous drinkers varied among countries. Generally, men have higher prevalence of hazardous drinking than women [PR=1.43 (1.38-1.47)]. Estimated PR was identical independently of the method and the statistical package used. However, OR overestimated PR, depending on the prevalence of hazardous drinking in the country. In cross-sectional studies, where comparisons between countries with differences in the prevalence of the disease or condition are made, it is advisable to use PR instead of OR.
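
    The way OR drifts away from PR as prevalence rises can be seen from crude 2x2 tables. The sketch below uses hypothetical counts, not SHARE data, and computes unadjusted ratios only; it does not reproduce the Mantel-Haenszel or regression estimates reported in the abstract.

```python
def prevalence_ratio(a, b, c, d):
    """PR for a 2x2 table.

    a, b -- group 1 with / without the outcome
    c, d -- group 2 with / without the outcome
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR for the same 2x2 table layout."""
    return (a * d) / (b * c)

# Hypothetical counts: both settings have PR = 1.5, but very different prevalence.
low_prevalence = (30, 970, 20, 980)     # 3% versus 2%
high_prevalence = (450, 550, 300, 700)  # 45% versus 30%

for label, table in (("low", low_prevalence), ("high", high_prevalence)):
    print(label, round(prevalence_ratio(*table), 3), round(odds_ratio(*table), 3))
```

    With these made-up counts the OR stays close to the PR when the outcome is rare and moves well above it when the outcome is common, which is the pattern the abstract describes across countries.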

  15. Cascaded Amplitude Modulations in Sound Texture Perception.

    PubMed

    McWalter, Richard; Dau, Torsten

    2017-01-01

    Sound textures, such as crackling fire or chirping crickets, represent a broad class of sounds defined by their homogeneous temporal structure. It has been suggested that the perception of texture is mediated by time-averaged summary statistics measured from early auditory representations. In this study, we investigated the perception of sound textures that contain rhythmic structure, specifically second-order amplitude modulations that arise from the interaction of different modulation rates, previously described as "beating" in the envelope-frequency domain. We developed an auditory texture model that utilizes a cascade of modulation filterbanks that capture the structure of simple rhythmic patterns. The model was examined in a series of psychophysical listening experiments using synthetic sound textures-stimuli generated using time-averaged statistics measured from real-world textures. In a texture identification task, our results indicated that second-order amplitude modulation sensitivity enhanced recognition. Next, we examined the contribution of the second-order modulation analysis in a preference task, where the proposed auditory texture model was preferred over a range of model deviants that lacked second-order modulation rate sensitivity. Lastly, the discriminability of textures that included second-order amplitude modulations appeared to be perceived using a time-averaging process. Overall, our results demonstrate that the inclusion of second-order modulation analysis generates improvements in the perceived quality of synthetic textures compared to the first-order modulation analysis considered in previous approaches.

  16. Accurate prediction of vaccine stability under real storage conditions and during temperature excursions.

    PubMed

    Clénet, Didier

    2018-04-01

    Due to their thermosensitivity, most vaccines must be kept refrigerated from production to use. To successfully carry out global immunization programs, ensuring the stability of vaccines is crucial. In this context, two important issues are critical, namely: (i) predicting vaccine stability and (ii) preventing product damage due to excessive temperature excursions outside of the recommended storage conditions (cold chain break). We applied a combination of advanced kinetics and statistical analyses on vaccine forced degradation data to accurately describe the loss of antigenicity for a multivalent freeze-dried inactivated virus vaccine containing three variants. The screening of large amounts of kinetic models combined with a statistical model selection approach resulted in the identification of two-step kinetic models. Predictions based on kinetic analysis and experimental stability data were in agreement, with approximately five percentage points difference from real values for long-term stability storage conditions, after excursions of temperature and during experimental shipments of freeze-dried products. Results showed that modeling a few months of forced degradation can be used to predict various time and temperature profiles endured by vaccines, i.e. long-term stability, short time excursions outside the labeled storage conditions or shipments at ambient temperature, with high accuracy. Pharmaceutical applications of the presented kinetics-based approach are discussed. Copyright © 2018 The Author. Published by Elsevier B.V. All rights reserved.

  17. Stochastic reduced order models for inverse problems under uncertainty

    PubMed Central

    Warner, James E.; Aquino, Wilkins; Grigoriu, Mircea D.

    2014-01-01

    This work presents a novel methodology for solving inverse problems under uncertainty using stochastic reduced order models (SROMs). Given statistical information about an observed state variable in a system, unknown parameters are estimated probabilistically through the solution of a model-constrained, stochastic optimization problem. The point of departure and crux of the proposed framework is the representation of a random quantity using a SROM - a low dimensional, discrete approximation to a continuous random element that permits efficient and non-intrusive stochastic computations. Characterizing the uncertainties with SROMs transforms the stochastic optimization problem into a deterministic one. The non-intrusive nature of SROMs facilitates efficient gradient computations for random vector unknowns and relies entirely on calls to existing deterministic solvers. Furthermore, the method is naturally extended to handle multiple sources of uncertainty in cases where state variable data, system parameters, and boundary conditions are all considered random. The new and widely-applicable SROM framework is formulated for a general stochastic optimization problem in terms of an abstract objective function and constraining model. For demonstration purposes, however, we study its performance in the specific case of inverse identification of random material parameters in elastodynamics. We demonstrate the ability to efficiently recover random shear moduli given material displacement statistics as input data. We also show that the approach remains effective for the case where the loading in the problem is random as well. PMID:25558115

  18. The utility of modeling word identification from visual input within models of eye movements in reading

    PubMed Central

    Bicknell, Klinton; Levy, Roger

    2012-01-01

    Decades of empirical work have shown that a range of eye movement phenomena in reading are sensitive to the details of the process of word identification. Despite this, major models of eye movement control in reading do not explicitly model word identification from visual input. This paper presents an argument for developing models of eye movements that do include detailed models of word identification. Specifically, we argue that insights into eye movement behavior can be gained by understanding which phenomena naturally arise from an account in which the eyes move for efficient word identification, and that one important use of such models is to test which eye movement phenomena can be understood this way. As an extended case study, we present evidence from an extension of a previous model of eye movement control in reading that does explicitly model word identification from visual input, Mr. Chips (Legge, Klitz, & Tjan, 1997), to test two proposals for the effect of using linguistic context on reading efficiency. PMID:23074362

  19. Fabrication and optimization of a conducting polymer sensor array using stored grain model volatiles.

    PubMed

    Hossain, Md Eftekhar; Rahman, G M Aminur; Freund, Michael S; Jayas, Digvir S; White, Noel D G; Shafai, Cyrus; Thomson, Douglas J

    2012-03-21

    During storage, grain can experience significant degradation in quality due to a variety of physical, chemical, and biological interactions. Most commonly, these losses are associated with insects or fungi. Continuous monitoring and an ability to differentiate between sources of spoilage are critical for rapid and effective intervention to minimize deterioration or losses. Therefore, there is a keen interest in developing a straightforward, cost-effective, and efficient method for monitoring of stored grain. Sensor arrays are currently used for classifying liquors, perfumes, and the quality of food products by mimicking the mammalian olfactory system. The use of this technology for monitoring of stored grain and identification of the source of spoilage is a new application, which has the potential for broad impact. The main focus of the work described herein is on the fabrication and optimization of a carbon black (CB) polymer sensor array to monitor stored grain model volatiles associated with insect secretions (benzene derivatives) and fungi (aliphatic hydrocarbon derivatives). Various methods of statistical analysis (RSD, PCA, LDA, t test) were used to select polymers for the array that were optimum for distinguishing between important compound classes (quinones, alcohols) and to minimize the sensitivity for other parameters such as humidity. The performance of the developed sensor array was satisfactory to demonstrate identification and separation of stored grain model volatiles at ambient conditions.

  20. Racialized identity and health in Canada: results from a nationally representative survey.

    PubMed

    Veenstra, Gerry

    2009-08-01

    This article uses survey data to investigate health effects of racialization in Canada. The operative sample was comprised of 91,123 Canadians aged 25 and older who completed the 2003 Canadian Community Health Survey. A "racial and cultural background" survey question contributed a variable that differentiated respondents who identified with Aboriginal, Black, Chinese, Filipino, Latin American, South Asian, White, or jointly Aboriginal and White racial/cultural backgrounds. Indicators of diabetes, hypertension and self-rated health were used to assess health. The healthy immigrant effect suppressed some disparity in risk for diabetes by racial/cultural identification. In logistic regression models also containing gender, age, and immigrant status, no racial/cultural identifications corresponded with significantly better health outcomes than those reported by survey respondents identifying as White. Subsequent models indicated that residential locale did little to explain the associations between racial/cultural background and health and that socioeconomic status was only implicated in relatively poor health outcomes for respondents identifying as Aboriginal or Aboriginal/White. Sizable and statistically significant relative risks for poor health for respondents identifying as Aboriginal, Aboriginal/White, Black, Chinese, or South Asian remained unexplained by the models, suggesting that other explanations for health disparities by racialized identity in Canada - perhaps pertaining to experiences with institutional racism and/or the wear and tear of experiences of racism and discrimination in everyday life - also deserve empirical investigation in this context.

  1. Stochastic approaches for time series forecasting of boron: a case study of Western Turkey.

    PubMed

    Durdu, Omer Faruk

    2010-10-01

    In the present study, seasonal and non-seasonal predictions of boron concentration time series data for the period 1996-2004 from the Büyük Menderes river in western Turkey are addressed by means of linear stochastic models. The methodology presented here is to develop adequate linear stochastic models, known as autoregressive integrated moving average (ARIMA) and multiplicative seasonal autoregressive integrated moving average (SARIMA) models, to predict boron content in the Büyük Menderes catchment. Initially, box-whisker plots and Kendall's tau test are used to identify trends during the study period. The measurement locations do not show a significant overall trend in boron concentrations, though marginal increasing and decreasing trends are observed for certain periods at some locations. The ARIMA modeling approach involves the following three steps: model identification, parameter estimation, and diagnostic checking. In the model identification step, different ARIMA models are identified by considering the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the boron data series. The model that gives the minimum Akaike information criterion (AIC) is selected as the best-fit model. The parameter estimation step indicates that the estimated model parameters are significantly different from zero. The diagnostic check step is applied to the residuals of the selected ARIMA models, and the results indicate that the residuals are independent, normally distributed, and homoscedastic. For model validation purposes, the predicted results using the best ARIMA models are compared to the observed data. The predicted data show reasonably good agreement with the actual data. The comparison of the mean and variance of 3-year (2002-2004) observed data vs predicted data from the selected best models shows that the boron model from the ARIMA modeling approach could be used in a safe manner, since the predicted values from these models preserve the basic statistics of the observed data in terms of mean. The ARIMA modeling approach is recommended for predicting boron concentration series of a river.
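
    The identification step described in this record relies on inspecting the ACF and PACF of the series. The sketch below is not the author's code and does not use the boron data; it is a minimal illustration, assuming a simulated AR(1) series, of how a sample ACF can be computed and turned into a PACF with the Durbin-Levinson recursion. The function names (sample_acf, pacf_from_acf, simulate_ar1) are hypothetical.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf_from_acf(r):
    """Partial autocorrelations from an ACF via the Durbin-Levinson recursion.

    r must satisfy r[0] == 1; the result is [1.0, pacf_1, ..., pacf_K].
    """
    pacf = [1.0]
    prev = []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(r)):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


def simulate_ar1(phi, n, seed=0):
    """Simulated AR(1) series x_t = phi * x_{t-1} + e_t (illustrative data only)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


series = simulate_ar1(0.7, 2000)
acf = sample_acf(series, 5)
pacf = pacf_from_acf(acf)
print("ACF :", [round(v, 2) for v in acf])
print("PACF:", [round(v, 2) for v in pacf])
```

    For an AR(1) process the ACF decays gradually while the PACF cuts off after lag 1, which is the kind of pattern used to shortlist candidate orders before comparing them by AIC.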

  2. Statistical Modeling of Extreme Values and Evidence of Presence of Dragon King (DK) in Solar Wind

    NASA Astrophysics Data System (ADS)

    Gomes, T.; Ramos, F.; Rempel, E. L.; Silva, S.; C-L Chian, A.

    2017-12-01

    The solar wind constitutes a nonlinear dynamical system, presenting intermittent turbulence, multifractality and chaotic dynamics. One characteristic shared by many such complex systems is the presence of extreme events, which play an important role in several geophysical phenomena and whose statistical characterization is a problem of great practical relevance. This work investigates the presence of extreme events in time series of the modulus of the interplanetary magnetic field measured by Cluster spacecraft on February 2, 2002. One of the main results is that the solar wind near the Earth's bow shock can be modeled by the Generalized Pareto (GP) and Generalized Extreme Value (GEV) distributions. Both models present a statistically significant positive shape parameter, which implies a heavy tail in the probability distribution functions and an unbounded growth in return values as return periods become long. There is evidence that current sheets are mainly responsible for positive values of the shape parameter. It is also shown that magnetic reconnection at the interface between two interplanetary magnetic flux ropes in the solar wind can be considered as Dragon Kings (DK), a class of extreme events whose formation mechanisms are fundamentally different from others. As long as magnetic reconnection can be classified as a Dragon King, there is the possibility of its identification and even its prediction. Dragon kings had previously been identified in time series of financial crashes, nuclear power generation accidents, stock market and so on. It is believed that they are associated with the occurrence of extreme events in dynamical systems at phase transition, bifurcation, crises or tipping points.
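
    The link between a positive shape parameter and unbounded return values can be made concrete with the standard GEV return-level formula. The sketch below is an illustration only: the parameter values (mu = 0, sigma = 1, xi = ±0.2) are assumptions chosen for the example, not estimates from the Cluster data, and the function names are hypothetical.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV cumulative distribution function; xi == 0 is the Gumbel case."""
    s = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(period, mu, sigma, xi):
    """Level exceeded on average once every `period` blocks (period > 1)."""
    y = -math.log(1.0 - 1.0 / period)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


for period in (10, 100, 1000, 10**6):
    heavy = gev_return_level(period, 0.0, 1.0, 0.2)     # positive shape
    bounded = gev_return_level(period, 0.0, 1.0, -0.2)  # negative shape
    print(period, round(heavy, 2), round(bounded, 2))
```

    With a positive shape parameter the return level keeps growing with the return period, whereas with a negative shape parameter it approaches the finite upper bound mu - sigma/xi (5 in this example).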

  3. Alternatives for jet engine control

    NASA Technical Reports Server (NTRS)

    Sain, M. K.

    1983-01-01

    Tensor model order reduction, recursive tensor model identification, input design for tensor model identification, software development for nonlinear feedback control laws based upon tensors, and development of the CATNAP software package for tensor modeling, identification and simulation were studied. The last of these are discussed.

  4. A numerical study of sensory-guided multiple views for improved object identification

    NASA Astrophysics Data System (ADS)

    Blakeslee, B. A.; Zelnio, E. G.; Koditschek, D. E.

    2014-06-01

    We explore the potential on-line adjustment of sensory controls for improved object identification and discrimination in the context of a simulated high resolution camera system carried onboard a maneuverable robotic platform that can actively choose its observational position and pose. Our early numerical studies suggest the significant efficacy and enhanced performance achieved by even very simple feedback-driven iteration of the view in contrast to identification from a fixed pose, uninformed by any active adaptation. Specifically, we contrast the discriminative performance of the same conventional classification system when informed by: a random glance at a vehicle; two random glances at a vehicle; or a random glance followed by a guided second look. After each glance, edge detection algorithms isolate the most salient features of the image and template matching is performed through the use of the Hausdorff distance, comparing the simulated sensed images with reference images of the vehicles. We present initial simulation statistics that overwhelmingly favor the third scenario. We conclude with a sketch of our near-future steps in this study that will entail: the incorporation of more sophisticated image processing and template matching algorithms; more complex discrimination tasks such as distinguishing between two similar vehicles or vehicles in motion; more realistic models of the observer's mobility including platform dynamics and eventually environmental constraints; and expanding the sensing task beyond the identification of a specified object selected from a pre-defined library of alternatives.
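
    The template-matching step in this record uses the Hausdorff distance between edge images. As a minimal sketch, assuming each edge map has already been reduced to a list of (x, y) pixel coordinates, the symmetric Hausdorff distance can be computed by brute force as below. The point sets are made up for illustration and the authors' actual implementation is not described in the abstract.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical edge points: the sensed image has one extra outlying edge pixel.
template = [(0, 0), (1, 0)]
sensed = [(0, 0), (1, 0), (1, 3)]

print(directed_hausdorff(template, sensed))
print(directed_hausdorff(sensed, template))
print(hausdorff(template, sensed))
```

    The brute-force form costs O(len(a) * len(b)) distance evaluations, which is acceptable for small edge sets; it also shows why the measure is sensitive to a single outlying edge pixel.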

  5. Identification of organ tissue types and skin from forensic samples by microRNA expression analysis.

    PubMed

    Sauer, Eva; Extra, Antje; Cachée, Philipp; Courts, Cornelius

    2017-05-01

    The identification of organ tissues in traces recovered from scenes and objects with regard to violent crimes involving serious injuries can be of considerable relevance in forensic investigations. Molecular genetic approaches are provably superior to histological and immunological assays in characterizing organ tissues, and micro-RNAs (miRNAs), due to their cell type specific expression patterns and stability against degradation, emerged as a promising molecular species for forensic analyses, with a range of tried and tested indicative markers. Thus, herein we present the first miRNA based approach for the forensic identification of organ tissues. Using quantitative PCR employing an empirically derived strategy for data normalization and unbiased statistical decision making, we assessed the differential expression of 15 preselected miRNAs in tissues of brain, kidney, lung, liver, heart muscle, skeletal muscle and skin. We show that not only can miRNA expression profiling be used to reliably differentiate between organ tissues but also that this method, which is compatible with and complementary to forensic DNA analysis, is applicable to realistic forensic samples e.g. mixtures, aged and degraded material as well as traces generated by mock stabbings and experimental shootings at ballistic models. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Comparison between traditional strategies and classification technique (SIMCA) in the identification of old proteinaceous binders.

    PubMed

    Checa-Moreno, R; Manzano, E; Mirón, G; Capitan-Vallvey, L F

    2008-05-15

    In this paper, we performed a comparison between commonly used strategies, namely amino acid ratios (Aa ratios), two-dimensional ratio plots (2D-Plot) and statistical correlation factor (SCF), and a classification technique, soft independent modelling of class analogy (SIMCA), to identify protein binders present in old artwork samples. To do this, we used a natural standard collection of proteinaceous binders prepared in our laboratory using old recipes and eleven samples coming from Cultural Heritage, such as mural and easel paintings, manuscripts and polychrome sculptures from the 15th-18th centuries. Protein binder samples were hydrolyzed and their constitutive amino acids were determined as PITC-derivatives using HPLC-DAD. Amino acid profile data were used to perform the comparison between the four different strategies mentioned above. Traditional strategies can lead to ambiguous or non-conclusive results. With SIMCA, it is possible to provide a more robust and less subjective identification knowing the confidence level of identification. As a standard, we used proteinaceous albumin (whole egg, yolk and glair), casein (goat, cow and sheep) and collagen (mammalian and fish). The process results in a more robust understanding of proteinaceous binding media in old artworks that makes it possible to distinguish them according to their origin.

  7. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics

    PubMed Central

    Nesvizhskii, Alexey I.

    2010-01-01

    This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues. PMID:20816881
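
    To give a concrete sense of a global error rate estimate, the sketch below uses the target-decoy idea: matches to a decoy database above a score threshold are counted as a proxy for false matches among the target hits. This specific estimator is an assumption made for illustration (the abstract above only names false discovery rates in general), and the scores are invented.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR among target matches scoring at or above `threshold`.

    psms is a list of (score, is_decoy) pairs, one per peptide-spectrum match.
    The estimate is (#decoy hits) / (#target hits) at that threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scored matches: (score, is_decoy)
psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]

for t in (9, 7, 5):
    print(t, target_decoy_fdr(psms, t))
```

    Lowering the score threshold admits more matches but raises the estimated error rate, which is the trade-off that global FDR control makes explicit.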

  8. Developing optimal input design strategies in cancer systems biology with applications to microfluidic device engineering.

    PubMed

    Menolascina, Filippo; Bellomo, Domenico; Maiwald, Thomas; Bevilacqua, Vitoantonio; Ciminelli, Caterina; Paradiso, Angelo; Tommasi, Stefania

    2009-10-15

    Mechanistic models are becoming more and more popular in Systems Biology; identification and control of models underlying biochemical pathways of interest in oncology is a primary goal in this field. Unfortunately the scarce availability of data still limits our understanding of the intrinsic characteristics of complex pathologies like cancer: acquiring information for a system understanding of complex reaction networks is time consuming and expensive. Stimulus response experiments (SRE) have been used to gain a deeper insight into the details of biochemical mechanisms underlying cell life and functioning. Optimisation of the input time-profile, however, still remains a major area of research due to the complexity of the problem and its relevance for the task of information retrieval in systems biology-related experiments. We have addressed the problem of quantifying the information associated with an experiment using the Fisher Information Matrix and we have proposed an optimal experimental design strategy based on an evolutionary algorithm to cope with the problem of information gathering in Systems Biology. On the basis of the theoretical results obtained in the field of control systems theory, we have studied the dynamical properties of the signals to be used in cell stimulation. The results of this study have been used to develop a microfluidic device for the automation of the process of cell stimulation for system identification. We have applied the proposed approach to the Epidermal Growth Factor Receptor pathway and we observed that it minimises the amount of parametric uncertainty associated with the identified model. A statistical framework based on Monte-Carlo estimations of the uncertainty ellipsoid confirmed the superiority of optimally designed experiments over canonical inputs. The proposed approach can be easily extended to multiobjective formulations that can also take advantage of identifiability analysis. Moreover, the availability of fully automated microfluidic platforms explicitly developed for the task of biochemical model identification will hopefully reduce the effects of the 'data rich--data poor' paradox in Systems Biology.
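
    The role of the Fisher Information Matrix (FIM) in comparing experiments can be shown on a toy problem. The sketch below is not the authors' pathway model or their evolutionary algorithm: it assumes a two-parameter decay model y(t) = a * exp(-b * t) with independent Gaussian noise, and compares two hypothetical sampling schedules by the determinant of the FIM (the D-optimality criterion).

```python
import math


def fisher_information(times, a, b, sigma):
    """2x2 FIM for y(t) = a * exp(-b * t) with i.i.d. Gaussian noise of s.d. sigma."""
    f11 = f12 = f22 = 0.0
    for t in times:
        da = math.exp(-b * t)            # sensitivity dy/da
        db = -a * t * math.exp(-b * t)   # sensitivity dy/db
        f11 += da * da
        f12 += da * db
        f22 += db * db
    s2 = sigma ** 2
    return [[f11 / s2, f12 / s2], [f12 / s2, f22 / s2]]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Two hypothetical designs with the same number of measurements.
spread = [0.0, 1.0]
clustered = [0.9, 1.0]

det_spread = det2(fisher_information(spread, a=1.0, b=1.0, sigma=1.0))
det_clustered = det2(fisher_information(clustered, a=1.0, b=1.0, sigma=1.0))
print(det_spread, det_clustered)
```

    A larger determinant corresponds to a smaller volume of the parameter uncertainty ellipsoid, so under these assumptions the spread-out schedule is the more informative of the two.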

  9. Books and Balls: Antecedents and Outcomes of College Identification

    ERIC Educational Resources Information Center

    Porter, Thomas; Hartman, Katherine; Johnson, John Seth

    2011-01-01

    Identification plays a central role in models of giving to an organization. This study presents and tests a general model of giving that highlights status based and affect based drivers of identification. The model was tested using a sample of 114 alumni from 74 different colleges who participated in an online survey. Identification was found to…

  10. Source-Modeling Auditory Processes of EEG Data Using EEGLAB and Brainstorm.

    PubMed

    Stropahl, Maren; Bauer, Anna-Katharina R; Debener, Stefan; Bleichner, Martin G

    2018-01-01

    Electroencephalography (EEG) source localization approaches are often used to disentangle the spatial patterns mixed up in scalp EEG recordings. However, approaches differ substantially between experiments, may be strongly parameter-dependent, and results are not necessarily meaningful. In this paper we provide a pipeline for EEG source estimation, from raw EEG data pre-processing using EEGLAB functions up to source-level analysis as implemented in Brainstorm. The pipeline is tested using a data set of 10 individuals performing an auditory attention task. The analysis approach estimates sources of 64-channel EEG data without the prerequisite of individual anatomies or individually digitized sensor positions. First, we show advanced EEG pre-processing using EEGLAB, which includes artifact attenuation using independent component analysis (ICA). ICA is a linear decomposition technique that aims to reveal the underlying statistical sources of mixed signals and is further a powerful tool to attenuate stereotypical artifacts (e.g., eye movements or heartbeat). Data submitted to ICA are pre-processed to facilitate good-quality decompositions. Aiming toward an objective approach on component identification, the semi-automatic CORRMAP algorithm is applied for the identification of components representing prominent and stereotypic artifacts. Second, we present a step-wise approach to estimate active sources of auditory cortex event-related processing, on a single subject level. The presented approach assumes that no individual anatomy is available and therefore the default anatomy ICBM152, as implemented in Brainstorm, is used for all individuals. Individual noise modeling in this dataset is based on the pre-stimulus baseline period. For EEG source modeling we use the OpenMEEG algorithm as the underlying forward model based on the symmetric Boundary Element Method (BEM). We then apply the method of dynamical statistical parametric mapping (dSPM) to obtain physiologically plausible EEG source estimates. Finally, we show how to perform group level analysis in the time domain on anatomically defined regions of interest (auditory scout). The proposed pipeline needs to be tailored to the specific datasets and paradigms. However, the straightforward combination of EEGLAB and Brainstorm analysis tools may be of interest to others performing EEG source localization.

  11. An AI-based approach to structural damage identification by modal analysis

    NASA Technical Reports Server (NTRS)

    Glass, B. J.; Hanagud, S.

    1990-01-01

    Flexible-structure damage is presently addressed by a combined model- and parameter-identification approach which employs the AI methodologies of classification, heuristic search, and object-oriented model knowledge representation. The conditions for model-space search convergence to the best model are discussed in terms of search-tree organization and initial model parameter error. In the illustrative example of a truss structure presented, the use of both model and parameter identification is shown to lead to smaller parameter corrections than would be required by parameter identification alone.

  12. Discriminant analysis of Raman spectra for body fluid identification for forensic purposes.

    PubMed

    Sikirzhytski, Vitali; Virkler, Kelly; Lednev, Igor K

    2010-01-01

    Detection and identification of blood, semen and saliva stains, the most common body fluids encountered at a crime scene, are very important aspects of forensic science today. This study targets the development of a nondestructive, confirmatory method for body fluid identification based on Raman spectroscopy coupled with advanced statistical analysis. Dry traces of blood, semen and saliva obtained from multiple donors were probed using a confocal Raman microscope with a 785-nm excitation wavelength under controlled laboratory conditions. Results demonstrated the capability of Raman spectroscopy to identify an unknown substance to be semen, blood or saliva with high confidence.

  13. Non-coding cancer driver candidates identified with a sample- and position-specific model of the somatic mutation rate

    PubMed Central

    Juul, Malene; Bertl, Johanna; Guo, Qianyun; Nielsen, Morten Muhlig; Świtnicki, Michał; Hornshøj, Henrik; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou

    2017-01-01

    Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5’UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance. DOI: http://dx.doi.org/10.7554/eLife.21778.001 PMID:28362259

  14. THE CAUSAL ANALYSIS / DIAGNOSIS DECISION ...

    EPA Pesticide Factsheets

    CADDIS is an on-line decision support system that helps investigators in the regions, states and tribes find, access, organize, use and share information to produce causal evaluations in aquatic systems. It is based on the US EPA's Stressor Identification process which is a formal method for identifying causes of impairments in aquatic systems. CADDIS 2007 increases access to relevant information useful for causal analysis and provides methods and tools that practitioners can use to analyze their own data. The new Candidate Cause section provides overviews of commonly encountered causes of impairments to aquatic systems: metals, sediments, nutrients, flow alteration, temperature, ionic strength, and low dissolved oxygen. CADDIS includes new Conceptual Models that illustrate the relationships from sources to stressors to biological effects. An Interactive Conceptual Model for phosphorus links the diagram with supporting literature citations. The new Analyzing Data section helps practitioners analyze their data sets and interpret and use those results as evidence within the USEPA causal assessment process. Downloadable tools include a graphical user interface statistical package (CADStat), and programs for use with the freeware R statistical package, and a Microsoft Excel template. These tools can be used to quantify associations between causes and biological impairments using innovative methods such as species-sensitivity distributions, biological inferenc

  15. Quantitative assessment model for gastric cancer screening

    PubMed Central

    Chen, Kun; Yu, Wei-Ping; Song, Liang; Zhu, Yi-Min

    2005-01-01

    AIM: To set up a mathematical model for gastric cancer screening and to evaluate its function in mass screening for gastric cancer. METHODS: A case control study was carried out in 66 patients and 198 normal people, then the risk and protective factors of gastric cancer were determined, including heavy manual work, foods such as small yellow-fin tuna, dried small shrimps, squills, crabs, mothers suffering from gastric diseases, spouse alive, use of refrigerators and hot food, etc. According to some principles and methods of probability and fuzzy mathematics, a quantitative assessment model was established as follows: first, we selected some statistically significant factors and calculated a weight coefficient for each one by two different methods; second, the population space was divided into a gastric cancer fuzzy subset and a non gastric cancer fuzzy subset, then a mathematical model for each subset was established, and we obtained a mathematical expression of attribute degree (AD). RESULTS: Based on the data of 63 patients and 693 normal people, the AD of each subject was calculated. Considering the sensitivity and specificity, the thresholds of the AD values calculated were set at 0.20 and 0.17, respectively. According to these thresholds, the sensitivity and specificity of the quantitative model were about 69% and 63%. Moreover, a statistical test showed that the identification outcomes of these two different calculation methods were identical (P>0.05). CONCLUSION: The validity of this method is satisfactory. It is convenient, feasible, economic and can be used to determine individual and population risks of gastric cancer. PMID:15655813
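
    The reported sensitivity and specificity follow from applying a threshold to each subject's AD value. The sketch below shows that calculation on a handful of invented scores; the data are hypothetical, only the threshold of 0.20 is taken from the abstract, and the rule "screen positive when AD is at or above the threshold" is an assumption.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity of the rule `score >= threshold`.

    labels use 1 for a true case and 0 for a non-case.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        positive = score >= threshold
        if label == 1:
            if positive:
                tp += 1
            else:
                fn += 1
        else:
            if positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical AD values for three cases and three non-cases.
ad_values = [0.30, 0.25, 0.10, 0.22, 0.05, 0.15]
is_case = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(ad_values, is_case, threshold=0.20)
print(sens, spec)
```

    Raising the threshold generally trades sensitivity for specificity, which is why the abstract describes the thresholds as chosen with both quantities in mind.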

  16. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy)

    NASA Astrophysics Data System (ADS)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-11-01

    The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
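
    Step vii) above evaluates the models with ROC curves and AUC. As a minimal sketch, the AUC can be computed directly as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen stable cell, with ties counted as one half. The scores below are invented for illustration and are not outputs of the authors' models.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. This pairwise form is O(n_pos * n_neg).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical susceptibility scores for landslide and non-landslide cells.
landslide = [0.9, 0.8, 0.4]
stable = [0.7, 0.3, 0.2]

auc = roc_auc(landslide, stable)
print(auc)
```

    An AUC of 0.5 corresponds to a model that ranks cells no better than chance, and 1.0 to a model that ranks every landslide cell above every stable cell.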

  17. Accounting for standard errors of vision-specific latent trait in regression models.

    PubMed

    Wong, Wan Ling; Li, Xiang; Li, Jialiang; Wong, Tien Yin; Cheng, Ching-Yu; Lamoureux, Ecosse L

    2014-07-11

    To demonstrate the effectiveness of Hierarchical Bayesian (HB) approach in a modeling framework for association effects that accounts for SEs of vision-specific latent traits assessed using Rasch analysis. A systematic literature review was conducted in four major ophthalmic journals to evaluate Rasch analysis performed on vision-specific instruments. The HB approach was used to synthesize the Rasch model and multiple linear regression model for the assessment of the association effects related to vision-specific latent traits. The effectiveness of this novel HB one-stage "joint-analysis" approach allows all model parameters to be estimated simultaneously and was compared with the frequently used two-stage "separate-analysis" approach in our simulation study (Rasch analysis followed by traditional statistical analyses without adjustment for SE of latent trait). Sixty-six reviewed articles performed evaluation and validation of vision-specific instruments using Rasch analysis, and 86.4% (n = 57) performed further statistical analyses on the Rasch-scaled data using traditional statistical methods; none took into consideration SEs of the estimated Rasch-scaled scores. The two models on real data differed for effect size estimations and the identification of "independent risk factors." Simulation results showed that our proposed HB one-stage "joint-analysis" approach produces greater accuracy (average of 5-fold decrease in bias) with comparable power and precision in estimation of associations when compared with the frequently used two-stage "separate-analysis" procedure despite accounting for greater uncertainty due to the latent trait. Patient-reported data, using Rasch analysis techniques, do not take into account the SE of latent trait in association analyses. The HB one-stage "joint-analysis" is a better approach, producing accurate effect size estimations and information about the independent association of exposure variables with vision-specific latent traits. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  18. Project T.E.A.M. (Technical Education Advancement Modules). Introduction to Statistical Process Control.

    ERIC Educational Resources Information Center

    Billings, Paul H.

    This instructional guide, one of a series developed by the Technical Education Advancement Modules (TEAM) project, is a 6-hour introductory module on statistical process control (SPC), designed to develop competencies in the following skill areas: (1) identification of the three classes of SPC use; (2) understanding a process and how it works; (3)…

  19. Preliminary results from a method to update timber resource statistics in North Carolina

    Treesearch

    Glenn P. Catts; Noel D. Cost; Raymond L. Czaplewski; Paul W. Snook

    1987-01-01

    Forest Inventory and Analysis units of the USDA Forest Service produce timber resource statistics every 8 to 10 years. Midcycle surveys are often performed to update inventory estimates. This requires timely identification of forest lands. There are several kinds of remotely sensed data that are suitable for this purpose. Medium scale color infrared aerial photography...

  20. DNA Damage and Genetic Instability as Harbingers of Prostate Cancer

    DTIC Science & Technology

    2013-01-01

    incidence of prostate cancer as compared to placebo. Primary analysis of this trial indicated no statistically significant effect of selenium...Identification, isolation, staining, processing, and statistical analysis of slides for ERG and PTEN markers (aim 1) and interpretation of these results...participating in this study being conducted under Investigational New Drug #29829 from the Food and Drug Administration. STANDARD TREATMENT Patients
