Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.
2012-01-01
This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.
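The specific-conductance separation described above rests on a two-component mixing model. A minimal sketch follows; the end-member conductances and the 15-minute readings are hypothetical illustrations, not values from the study's gages:

```python
import numpy as np

def baseflow_fraction(sc_stream, sc_runoff, sc_baseflow):
    """Two-component conductivity mass-balance separation.

    Streamflow is treated as a mix of surface runoff (low specific
    conductance) and base flow (high specific conductance):
        SC_stream = f * SC_baseflow + (1 - f) * SC_runoff
    Solve for the base-flow fraction f and clip it to [0, 1].
    """
    f = (sc_stream - sc_runoff) / (sc_baseflow - sc_runoff)
    return np.clip(f, 0.0, 1.0)

# Hypothetical 15-minute readings (microsiemens per cm) at one gage,
# with assumed runoff and base-flow end-member conductances
sc = np.array([180.0, 150.0, 120.0, 160.0])
frac = baseflow_fraction(sc, sc_runoff=50.0, sc_baseflow=200.0)
```

Averaging such fractions over a long record gives the base-flow proportion reported per watershed.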
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
HT-FRTC: a fast radiative transfer code using kernel regression
NASA Astrophysics Data System (ADS)
Thelen, Jean-Claude; Havemann, Stephan; Lewis, Warren
2016-09-01
The HT-FRTC is a principal component based fast radiative transfer code that can be used across the electromagnetic spectrum from the microwave through to the ultraviolet to calculate transmittance, radiance and flux spectra. The principal components cover the spectrum at a very high spectral resolution, which allows very fast line-by-line, hyperspectral and broadband simulations for satellite-based, airborne and ground-based sensors. The principal components are derived during a code training phase from line-by-line simulations for a diverse set of atmosphere and surface conditions. The derived principal components are sensor independent, i.e. no extra training is required to include additional sensors. During the training phase we also derive the predictors which are required by the fast radiative transfer code to determine the principal component scores from the monochromatic radiances (or fluxes, transmittances). These predictors are calculated for each training profile at a small number of frequencies, which are selected by a k-means cluster algorithm during the training phase. Until recently the predictors were calculated using a linear regression. However, during a recent rewrite of the code the linear regression was replaced by a Gaussian Process (GP) regression which resulted in a significant increase in accuracy when compared to the linear regression. The HT-FRTC has been trained with a large variety of gases, surface properties and scatterers. Rayleigh scattering as well as scattering by frozen/liquid clouds, hydrometeors and aerosols have all been included. The scattering phase function can be fully accounted for by an integrated line-by-line version of the Edwards-Slingo spherical harmonics radiation code or approximately by a modification to the extinction (Chou scaling).
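The training-and-prediction pipeline described above (principal components from training simulations, k-means selection of predictor channels, then regression from predictor radiances to component scores) can be sketched with generic libraries. The synthetic "spectra", channel counts, and profile parameters below are stand-ins, not the HT-FRTC's actual training set:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Hypothetical training set: 200 simulated "spectra" over 500 channels,
# generated from 3 underlying profile parameters
t = np.linspace(0.0, 1.0, 500)
params = rng.uniform(0.5, 1.5, size=(200, 3))
spectra = np.stack([a * np.exp(-t) + b * np.sin(6 * t) + c * t
                    for a, b, c in params])

# Training phase 1: derive principal components from the simulations
pca = PCA(n_components=3).fit(spectra)
scores = pca.transform(spectra)

# Training phase 2: k-means picks a small set of predictor channels
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(spectra.T)
chan = np.array([int(np.argmin(((spectra.T - c) ** 2).sum(axis=1)))
                 for c in km.cluster_centers_])

# Training phase 3: GP regression from predictor radiances to scores
# (the step that replaced the earlier linear regression)
gp = GaussianProcessRegressor().fit(spectra[:, chan], scores)

# Fast forward model: a few radiances -> scores -> full spectrum
truth = 1.1 * np.exp(-t) + 0.9 * np.sin(6 * t) + 0.7 * t
recon = pca.inverse_transform(gp.predict(truth[None, chan]))[0]
err = float(np.abs(recon - truth).max())
```

The speed gain comes from evaluating the expensive physics at only the few selected channels and reconstructing the rest through the components.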
On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP.
Winkler, Irene; Debener, Stefan; Müller, Klaus-Robert; Tangermann, Michael
2015-01-01
Standard artifact removal methods for electroencephalographic (EEG) signals are either based on Independent Component Analysis (ICA) or they regress out ocular activity measured at electrooculogram (EOG) channels. Successful ICA-based artifact reduction relies on suitable pre-processing. Here we systematically evaluate the effects of high-pass filtering at different frequencies. Offline analyses were based on event-related potential data from 21 participants performing a standard auditory oddball task and an automatic artifactual component classifier method (MARA). As a pre-processing step for ICA, high-pass filtering between 1-2 Hz consistently produced good results in terms of signal-to-noise ratio (SNR), single-trial classification accuracy and the percentage of `near-dipolar' ICA components. Relative to no artifact reduction, ICA-based artifact removal significantly improved SNR and classification accuracy. This was not the case for a regression-based approach to remove EOG artifacts.
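The filter-then-ICA pipeline evaluated above can be sketched on synthetic data. Everything here is illustrative: the sources, mixing matrix, and sampling rate are invented, and the known artifact waveform stands in for a component classifier such as MARA:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
fs = 250.0                              # hypothetical sampling rate, Hz
t = np.arange(0.0, 20.0, 1.0 / fs)

# Hypothetical sources: a neural oscillation, eye-blink-like pulses,
# and slow drift, mixed into four "EEG channels"
neural = np.sin(2 * np.pi * 10 * t)
ocular = (np.sin(2 * np.pi * 0.7 * t) > 0.95).astype(float)
drift = 0.5 * t / t.max()
S = np.c_[neural, ocular, drift]
A = rng.normal(size=(4, 3))
X = S @ A.T + 0.01 * rng.normal(size=(t.size, 4))

# High-pass filter at 1 Hz before ICA (within the 1-2 Hz range that
# worked well in the study)
b, a = butter(4, 1.0 / (fs / 2.0), btype="highpass")
Xf = filtfilt(b, a, X, axis=0)

ica = FastICA(n_components=3, random_state=0, max_iter=1000)
sources = ica.fit_transform(Xf)

# Flag the component most correlated with the known artifact, zero it,
# and project back to channel space
corr = [abs(np.corrcoef(sources[:, k], ocular)[0, 1]) for k in range(3)]
bad = int(np.argmax(corr))
sources[:, bad] = 0.0
X_clean = ica.inverse_transform(sources)
```

On real EEG the artifact component is of course not known in advance, which is exactly why automated classifiers and suitable pre-processing matter.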
[New method of mixed gas infrared spectrum analysis based on SVM].
Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua
2007-07-01
A new method of infrared spectrum analysis based on support vector machine (SVM) for mixture gas was proposed. The kernel function in SVM was used to map the seriously overlapping absorption spectra into a high-dimensional space; after this transformation, the high-dimensional data could be processed in the original space, so a regression calibration model was established. This model was then applied to analyze the concentration of each component gas. It was also shown that the SVM regression calibration model can be used for component recognition of mixture gas. The method was applied to the analysis of different data samples. Factors that affect the model, such as scan interval, wavelength range, kernel function, and penalty coefficient C, were discussed. Experimental results show that the maximal mean absolute error of component concentration is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectra, of using the same method for both qualitative and quantitative analysis, and of a limited number of training samples were solved. The method could be used in other mixture gas infrared spectrum analyses and shows promise in both theory and application.
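The calibration idea — learn component concentrations from heavily overlapping absorption bands with an RBF-kernel support vector regression — can be sketched as follows. The band shapes, concentrations, and SVR hyperparameters are all hypothetical:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
wavenumber = np.linspace(0.0, 1.0, 120)

def spectrum(c1, c2):
    """Two heavily overlapping absorption bands (hypothetical shapes),
    weighted by the component concentrations."""
    band1 = np.exp(-((wavenumber - 0.45) / 0.12) ** 2)
    band2 = np.exp(-((wavenumber - 0.55) / 0.12) ** 2)
    return c1 * band1 + c2 * band2

# Training mixtures with known concentrations, plus small noise
c_train = rng.uniform(0.1, 1.0, size=(80, 2))
X_train = np.stack([spectrum(*c) for c in c_train])
X_train += 0.002 * rng.normal(size=X_train.shape)

# RBF-kernel SVR calibration model for component 1's concentration
model = SVR(kernel="rbf", C=100.0, epsilon=0.001).fit(X_train,
                                                      c_train[:, 0])

c_true = (0.3, 0.8)
pred = float(model.predict(spectrum(*c_true)[None, :])[0])
abs_err = abs(pred - c_true[0])
```

A second SVR (or a multi-output wrapper) would handle the other component; classification of which gases are present can reuse the same kernel machinery.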
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces the indices used for multicollinearity diagnosis, the basic principle of principal component regression, and the method for determining the 'best' equation. An example illustrates how to perform principal component regression analysis with SPSS 10.0, covering all calculation steps of the principal component regression and the operation of the linear regression, factor analysis, descriptives, compute variable, and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance of multicollinearity, and performing it with SPSS makes the analysis simpler, faster, and statistically accurate.
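The two steps behind principal component regression — project the standardized predictors onto their leading components, then run ordinary least squares on the scores — are the same whatever the software. A minimal sketch (in Python rather than SPSS) with hypothetical collinear data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data with severe multicollinearity: x2 nearly copies x1
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.c_[x1, x2, x3]
y = 2.0 * x1 + 1.0 * x3 + 0.1 * rng.normal(size=n)

# Step 1: standardize, then extract principal components
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
V = eigvec[:, np.argsort(eigval)[::-1][:2]]   # keep the top 2 PCs
T = Z @ V                                     # component scores

# Step 2: ordinary least squares on the scores, not the raw predictors
D = np.c_[np.ones(n), T]
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
y_hat = D @ coef
r2 = 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Because the near-duplicate predictors collapse into one component, the score regression is well conditioned where the raw-predictor regression is not.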
Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.
2015-01-01
Mean long-term hydrologic budget components, such as recharge and base flow, are often difficult to estimate because they can vary substantially in space and time. Mean long-term fluxes were calculated in this study for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow using long-term estimates of mean ET and precipitation and the assumption that the relative change in storage over that 30-year period is small compared to the total ET or precipitation. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance (SC) data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971-2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. A new approach to estimate riparian ET using seasonal SC data gave results consistent with those from other methods. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia. The method has the potential to be applied in many other states in the U.S. or in other regions or countries of the world where climate and stream flow data are plentiful.
Analysis and improvement measures of flight delay in China
NASA Astrophysics Data System (ADS)
Zang, Yuhang
2017-03-01
First, this paper establishes a principal component regression model to analyze the data quantitatively, using principal component analysis to obtain three principal component factors of flight delays. The least squares method is then applied to these factors, and the regression equation is obtained by substitution; the analysis shows that the main cause of flight delays is the airlines, followed by weather and traffic. To address these problems, the paper focuses on improving the controllable aspect, traffic flow control. For traffic flow control, an adaptive genetic queuing model is established for the runway terminal area, and an optimization method is developed for fifteen planes landing on three runways, based on Beijing Capital International Airport. Comparing the results with the existing first-come-first-served (FCFS) algorithm demonstrates the superiority of the model.
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
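The data-generating process underlying such a Monte Carlo comparison is a random-intercept logistic model. A sketch of the simulation step only (the fitting would then be handed to glmer, xtlogit, NLMIXED, BUGS, etc.); all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_clustered_binary(n_clusters, cluster_size,
                              beta0=-1.0, beta1=0.5, sigma_u=1.0):
    """Random-intercept logistic model of the kind fitted in such
    Monte Carlo studies:
        logit P(y_ij = 1) = beta0 + beta1 * x_ij + u_j,
        u_j ~ N(0, sigma_u^2)   (sigma_u^2 is the variance component).
    """
    u = rng.normal(0.0, sigma_u, size=n_clusters)
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    x = rng.normal(size=cluster.size)
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x + u[cluster])))
    y = rng.binomial(1, p)
    return cluster, x, y

# A low-cluster scenario like those the study flags as difficult
cluster, x, y = simulate_clustered_binary(n_clusters=5, cluster_size=50)
```

With only five clusters there are only five draws of u_j, which is why estimating sigma_u reliably is so hard regardless of estimation procedure.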
Oil and gas pipeline construction cost analysis and developing regression models for cost estimation
NASA Astrophysics Data System (ADS)
Thaduri, Ravi Kiran
In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, labor, right-of-way (ROW), and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple nonlinear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The compressor stations are analyzed based on capacity, year of completion and location. Unlike pipeline costs, material costs dominate the total costs in the construction of compressor stations, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor stations for various capacities and locations.
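A typical nonlinear cost-estimation model of this kind is a multiplicative power law in the size variables, fitted by nonlinear least squares. The data, coefficients, and functional form below are hypothetical, not the study's fitted models:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)

# Hypothetical pipeline records: length (miles), diameter (inches),
# and a labor-cost component ($ million) with multiplicative noise
length = rng.uniform(5.0, 200.0, size=60)
diameter = rng.uniform(8.0, 42.0, size=60)
cost = (0.05 * length ** 0.9 * diameter ** 1.1
        * rng.lognormal(0.0, 0.1, size=60))

def cost_model(X, a, b, c):
    """Multiplicative power-law cost model: cost = a * L**b * D**c."""
    L, D = X
    return a * L ** b * D ** c

popt, _ = curve_fit(cost_model, (length, diameter), cost,
                    p0=(0.1, 1.0, 1.0))
a_hat, b_hat, c_hat = popt
```

Separate fits per cost component (material, labor, ROW, miscellaneous) and per location class mirror the structure described in the abstract.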
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR), is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid- and near-infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS).
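The abstract does not spell out ECPR's exact estimator, but one plausible reading of "ECM as a penalization term" is a ridge-style closed form in which the identity penalty is replaced by the error covariance matrix. The sketch below is that interpretation, with invented data, and should not be taken as the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 8

# True signal plus correlated (non-iid) measurement error with a
# known error covariance matrix Sigma (AR(1)-style here)
beta = rng.normal(size=p)
Sigma = 0.1 * 0.8 ** np.abs(np.subtract.outer(np.arange(p),
                                              np.arange(p)))
X_true = rng.normal(size=(n, p))
X = X_true + rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = X_true @ beta + 0.05 * rng.normal(size=n)

def ecpr_like(X, y, Sigma, lam):
    """Ridge-like penalized regression with the ECM as penalty metric:
    argmin ||y - Xb||^2 + lam * b' Sigma b
    =>  b = (X'X + lam * Sigma)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * Sigma, X.T @ y)

b_ecpr = ecpr_like(X, y, Sigma, lam=10.0)
b_ridge = np.linalg.solve(X.T @ X + 10.0 * np.eye(p), X.T @ y)
err_ecpr = float(np.linalg.norm(b_ecpr - beta))
```

The comparison against plain ridge shows where the two differ: only the metric of the penalty changes.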
ERIC Educational Resources Information Center
Mugrage, Beverly; And Others
Three ridge regression solutions are compared with ordinary least squares regression and with principal components regression using all components. Ridge regression, particularly the Lawless-Wang solution, outperformed ordinary least squares regression and the principal components solution on the criteria of stability of coefficients and closeness…
Additivity of nonlinear biomass equations
Bernard R. Parresol
2001-01-01
Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun
2018-03-01
A multivariate regression statistical strategy was developed to clarify the multi-component content-effect correlation of Panax ginseng saponins extract and to predict the pharmacological effect from component contents. In example 1, we first compared pharmacological effects between Panax ginseng saponins extract and individual saponin combinations. Second, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, 18 common peaks were first identified in ten batches of Panax ginseng saponins extracts from different origins. We then investigated the anti-myocardial ischemia-reperfusion injury effects of the ten extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples, the relationship between component contents and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.
Krishna P. Poudel; Temesgen Hailemariam
2016-01-01
Using data from destructively sampled Douglas-fir and lodgepole pine trees, we evaluated the performance of regional volume and component biomass equations in terms of bias and RMSE. The volume and component biomass equations were calibrated using three different adjustment methods that used: (a) a correction factor based on ordinary least squares regression through...
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
Third molar development (TMD) has been widely utilized as a radiographic method for dental age estimation. Using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated into the TMD regression model. This study aims to evaluate the performance of dental age estimation in the individual method models and the combined model (TMD and TME), based on classic multiple linear regression and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and by Olze were employed to stage TMD and TME, respectively. The data were divided to develop three respective models based on the two regression approaches, multiple linear regression and principal component regression. The trained models were then validated on the test sample, and the accuracy of age prediction was compared between the models. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² increased in the linear regressions of the combined model compared with the individual models. An overall decrease in RMSE was detected in the combined model compared with TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, the combined model exhibited low adjusted R² and high RMSE, except in males. Dental age is thus better estimated using the combined model in multiple linear regression.
USDA-ARS?s Scientific Manuscript database
Selective principal component regression analysis (SPCR) uses a subset of the original image bands for principal component transformation and regression. For optimal band selection before the transformation, this paper used genetic algorithms (GA). In this case, the GA process used the regression co...
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric components. The model comprises several equations, each with two components, one parametric and one nonparametric. The model used here has a linear function as the parametric component and a polynomial truncated spline as the nonparametric component, so it can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic conditions on the use of information technology. Specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer, while the predictor variables are the percentage of literate people, the percentage of electrification, and the percentage of economic growth. Based on identification of the relationship between response and predictor variables, economic growth is treated as the nonparametric predictor and the others as parametric predictors. The results show that multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination of 90 percent.
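A single-response version of the semiparametric structure — linear terms for the parametric predictors plus a degree-1 truncated spline basis for the nonparametric one — reduces to least squares on an augmented design matrix. All data, the knot location, and coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 150

# Hypothetical regional data: two parametric predictors and one
# nonparametric predictor (economic growth) with a kink at 5 percent
literacy = rng.uniform(80.0, 100.0, size=n)
electrification = rng.uniform(60.0, 100.0, size=n)
growth = rng.uniform(0.0, 10.0, size=n)
y = (0.3 * literacy + 0.2 * electrification
     + np.where(growth > 5.0, 2.0 * (growth - 5.0), 0.0)
     + rng.normal(0.0, 0.5, size=n))

def truncated_linear(x, knot):
    """Degree-1 truncated polynomial spline basis: (x - knot)_+."""
    return np.maximum(x - knot, 0.0)

# Design matrix: intercept + linear (parametric) terms + spline term
D = np.c_[np.ones(n), literacy, electrification,
          growth, truncated_linear(growth, 5.0)]
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
y_hat = D @ coef
r2 = 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

The multiresponse case stacks one such equation per response; the spline lets the growth effect bend at the knot while the other predictors stay strictly linear.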
Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M
2017-05-01
Four accurate, sensitive and reliable stability-indicating chemometric methods were developed for the quantitative determination of agomelatine (AGM), whether in pure form or in pharmaceutical formulations. Two supervised learning machine methods, linear artificial neural networks preceded by principal component analysis (PC-linANN) and linear support vector regression (linSVR), were compared with two principal-component-based methods, principal component regression (PCR) and partial least squares (PLS), for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits of using linear learning machine methods and the inherent merits of their algorithms in handling overlapped noisy spectral data, especially during the challenging determination of the AGM alkaline and acidic degradants (DG1 and DG2). Relative mean squared errors of prediction (RMSEP) for the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, linSVR and PC-linANN, respectively. The results showed the superiority of the supervised learning machine methods over the principal-component-based methods and suggested that linANN is the method of choice for determination of components present in low amounts with similar overlapped spectra and a narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed comparable performance and quantification power.
Revisiting tests for neglected nonlinearity using artificial neural networks.
Cho, Jin Seo; Ishida, Isao; White, Halbert
2011-05-01
Tests for regression neglected nonlinearity based on artificial neural networks (ANNs) have so far been studied by separately analyzing the two ways in which the null of regression linearity can hold. This implies that the asymptotic behavior of general ANN-based tests for neglected nonlinearity is still an open question. Here we analyze a convenient ANN-based quasi-likelihood ratio statistic for testing neglected nonlinearity, paying careful attention to both components of the null. We derive the asymptotic null distribution under each component separately and analyze their interaction. Somewhat remarkably, it turns out that the previously known asymptotic null distribution for the type 1 case still applies, but under somewhat stronger conditions than previously recognized. We present Monte Carlo experiments corroborating our theoretical results and showing that standard methods can yield misleading inference when our new, stronger regularity conditions are violated.
Regression to fuzziness method for estimation of remaining useful life in power plant components
NASA Astrophysics Data System (ADS)
Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.
2014-10-01
Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data are not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify its effectiveness, the methodology was benchmarked against a data-based simple linear regression model used for predictions, which was shown to perform equally well or worse. Furthermore, the comparison highlighted the improvement in estimation offered by the adoption of appropriate fuzzy sets for parameter representation.
Multicomponent analysis of a digital Trail Making Test.
Fellows, Robert P; Dahmen, Jessamyn; Cook, Diane; Schmitter-Edgecombe, Maureen
2017-01-01
The purpose of the current study was to use a newly developed digital tablet-based variant of the TMT to isolate component cognitive processes underlying TMT performance. Similar to the paper-based trail making test, this digital variant consists of two conditions, Part A and Part B. However, this digital version automatically collects additional data to create component subtest scores to isolate cognitive abilities. Specifically, in addition to the total time to completion and number of errors, the digital Trail Making Test (dTMT) records several unique components including the number of pauses, pause duration, lifts, lift duration, time inside each circle, and time between circles. Participants were community-dwelling older adults who completed a neuropsychological evaluation including measures of processing speed, inhibitory control, visual working memory/sequencing, and set-switching. The abilities underlying TMT performance were assessed through regression analyses of component scores from the dTMT with traditional neuropsychological measures. Results revealed significant correlations between paper and digital variants of Part A (rs = .541, p < .001) and paper and digital versions of Part B (rs = .799, p < .001). Regression analyses with traditional neuropsychological measures revealed that Part A components were best predicted by speeded processing, while inhibitory control and visual/spatial sequencing were predictors of specific components of Part B. Exploratory analyses revealed that specific dTMT-B components were associated with a performance-based medication management task. Taken together, these results elucidate specific cognitive abilities underlying TMT performance, as well as the utility of isolating digital components.
Aaron Weiskittel; Jereme Frank; David Walker; Phil Radtke; David Macfarlane; James Westfall
2015-01-01
Prediction of forest biomass and carbon is becoming an important issue in the United States. However, estimating forest biomass and carbon is difficult and relies on empirically derived regression equations. Based on recent findings from a national gap analysis and comprehensive assessment of the USDA Forest Service Forest Inventory and Analysis (USFS-FIA) component...
Experimental investigation of fuel regression rate in a HTPB based lab-scale hybrid rocket motor
NASA Astrophysics Data System (ADS)
Li, Xintian; Tian, Hui; Yu, Nanjia; Cai, Guobiao
2014-12-01
The fuel regression rate is an important parameter in the design process of the hybrid rocket motor. Additives in the solid fuel may have influences on the fuel regression rate, which will affect the internal ballistics of the motor. A series of firing experiments have been conducted on lab-scale hybrid rocket motors with 98% hydrogen peroxide (H2O2) oxidizer and hydroxyl terminated polybutadiene (HTPB) based fuels in this paper. An innovative fuel regression rate analysis method is established to diminish the errors caused by start and tailing stages in a short time firing test. The effects of the metal Mg, Al, aromatic hydrocarbon anthracene (C14H10), and carbon black (C) on the fuel regression rate are investigated. The fuel regression rate formulas of different fuel components are fitted according to the experiment data. The results indicate that the influence of C14H10 on the fuel regression rate of HTPB is not evident. However, the metal additives in the HTPB fuel can increase the fuel regression rate significantly.
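Fuel-regression-rate formulas of the kind fitted from such firing data classically take the power-law form r = a·G^n in the oxidizer mass flux G, estimated in log space. The flux values, coefficients, and scatter below are hypothetical, not the paper's measurements:

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical firing-test data: oxidizer mass flux G (kg m^-2 s^-1)
# versus measured fuel regression rate r (mm/s), with 3% scatter
G = np.array([40.0, 60.0, 90.0, 130.0, 180.0, 250.0])
r = 0.12 * G ** 0.6 * (1.0 + 0.03 * rng.normal(size=G.size))

# Classical hybrid-motor correlation r = a * G^n, fitted in log space:
# ln r = ln a + n * ln G  is a straight line
n_hat, log_a = np.polyfit(np.log(G), np.log(r), 1)
a_hat = float(np.exp(log_a))
```

Fitting a and n separately for each fuel formulation (e.g. with and without metal additives) gives the per-component formulas the study compares.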
A Method to Measure the Transverse Magnetic Field and Orient the Rotational Axis of Stars
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leone, Francesco; Scalia, Cesare; Gangi, Manuele
Direct measurements of stellar magnetic fields are based on the splitting of spectral lines into polarized Zeeman components. With a few exceptions, Zeeman signatures are hidden in data noise, and a number of methods have been developed to measure the average, over the visible stellar disk, of longitudinal components of the magnetic field. At present, faint stars are only observable via low-resolution spectropolarimetry, which is a method based on the regression of the Stokes V signal against the first derivative of Stokes I. Here, we present an extension of this method to obtain a direct measurement of the transverse component of stellar magnetic fields by the regression of high-resolution Stokes Q and U as a function of the second derivative of Stokes I. We also show that it is possible to determine the orientation in the sky of the rotation axis of a star on the basis of the periodic variability of the transverse component due to its rotation. The method is applied to data, obtained with the Catania Astrophysical Observatory Spectropolarimeter, along the rotational period of the well known magnetic star β CrB.
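The regression at the heart of the method — linear polarization against the second derivative of Stokes I in the weak-field regime — can be sketched numerically. The line profile, noise level, and scaling constant are invented, and the constant k merely stands in for the field-dependent prefactor:

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical Stokes I profile: a Gaussian absorption line
wl = np.linspace(-1.0, 1.0, 400)
I = 1.0 - 0.6 * np.exp(-(wl / 0.2) ** 2)

# In the weak-field regime the linear polarization scales with the
# second derivative of I; k bundles the field and atomic constants
k_true = 0.003
d2I = np.gradient(np.gradient(I, wl), wl)
Q = k_true * d2I + 1e-4 * rng.normal(size=wl.size)

# Slope of the no-intercept regression of Q on d2I/dlambda^2 recovers
# the scaling, and hence the transverse field component
k_hat = float((Q @ d2I) / (d2I @ d2I))
```

Repeating the regression at many rotational phases and tracking the periodic modulation of the recovered slope is what lets the rotation-axis orientation be constrained.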
Kwok, Sylvia Lai Yuk Ching; Shek, Daniel Tan Lei
2010-03-05
Utilizing Daniel Goleman's theory of emotional competence, Beck's cognitive theory, and Rudd's cognitive-behavioral theory of suicidality, the relationships between hopelessness (cognitive component), social problem solving (cognitive-behavioral component), emotional competence (emotive component), and adolescent suicidal ideation were examined. Based on the responses of 5,557 Secondary 1 to Secondary 4 students from 42 secondary schools in Hong Kong, results showed that suicidal ideation was positively related to adolescent hopelessness, but negatively related to emotional competence and social problem solving. While standard regression analyses showed that all the above variables were significant predictors of suicidal ideation, hierarchical regression analyses showed that hopelessness was the most important predictor of suicidal ideation, followed by social problem solving and emotional competence. Further regression analyses found that all four subscales of emotional competence, i.e., empathy, social skills, self-management of emotions, and utilization of emotions, were important predictors of male adolescent suicidal ideation. However, the subscale of social skills was not a significant predictor of female adolescent suicidal ideation. Standard regression analysis also revealed that all three subscales of social problem solving, i.e., negative problem orientation, rational problem solving, and impulsiveness/carelessness style, were important predictors of suicidal ideation. Theoretical and practice implications of the findings are discussed.
Ng, S K; McLachlan, G J
2003-04-15
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol.
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
NASA Astrophysics Data System (ADS)
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity among the explanatory variables biases the parameter estimates and also degrades classification accuracy. Stepwise regression is commonly used to overcome multicollinearity, but an alternative that retains all variables for prediction is Principal Component Analysis (PCA). Classical PCA, however, applies only to numeric data; when the data are categorical, one solution is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Indonesia Demographic and Health Survey (IDHS) 2012, and the research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under the Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, classification of contraceptive-method use by the stepwise method (58.66%) is better than by the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the logistic regression results using sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on classification of the major class (women using a contraceptive method), CATPCA was selected as the final model, since it raises the accuracy of the major class.
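The idea behind the paper can be sketched as follows: replace correlated predictors with principal-component scores before logistic regression, then compare AUC. The data here are synthetic stand-ins for the IDHS survey, and classical PCA is used as a proxy since CATPCA (for categorical variables) is not available in scikit-learn.

```python
# Sketch: PCA scores as predictors to sidestep multicollinearity in
# logistic regression, evaluated by AUC. Synthetic data, not the IDHS survey.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           n_redundant=4, random_state=0)  # redundant -> collinear
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

plain = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)
pcr = make_pipeline(StandardScaler(), PCA(n_components=4),
                    LogisticRegression()).fit(X_tr, y_tr)

auc_plain = roc_auc_score(y_te, plain.predict_proba(X_te)[:, 1])
auc_pcr = roc_auc_score(y_te, pcr.predict_proba(X_te)[:, 1])
print(f"AUC plain={auc_plain:.3f}  AUC PCA+logit={auc_pcr:.3f}")
```

The PCA step removes the exact collinearity among the redundant features while keeping all of them in play, which is the property the paper exploits.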
NASA Astrophysics Data System (ADS)
Nakamuta, Y.; Urata, K.; Shibata, Y.; Kuwahara, Y.
2017-03-01
In Lindsley's thermometry, a revised sequence for the calculation of components is proposed for clinopyroxene, in which the kosmochlor component is added. Temperatures obtained for components calculated by the revised method are about 50 °C lower than those obtained for components calculated by Lindsley's original method, and agree well with temperatures obtained from orthopyroxenes. Ca-partitioning between clino- and orthopyroxenes is therefore thought to be equilibrated in type 5 to 7 ordinary chondrites. The temperatures for Tuxtuac (LL5), Dhurmsala (LL6), NWA 2092 (LL6/7), and Dho 011 (LL7) are 767-793°, 818-835°, 872-892°, and 917-936°C, respectively, suggesting that chondrites of higher petrographic type show higher equilibrium temperatures of pyroxenes. Regression equations relating temperature to the Wo and Fs contents in the 1-atm temperature-contoured pyroxene quadrilateral of Lindsley (1983) were also determined by the least-squares method. Temperatures can be reproduced with an error of less than 20 °C (2SE) using these regression equations.
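The final step of the abstract, fitting regression equations that relate temperature to Wo and Fs contents, can be sketched with ordinary least squares. The pyroxene values below are hypothetical placeholders, not Lindsley's (1983) contour data.

```python
# Sketch: least-squares fit of T as a linear function of Wo and Fs contents.
# All numbers are hypothetical, for illustration of the fitting step only.
import numpy as np

wo = np.array([44.0, 43.2, 42.1, 41.0, 39.8])   # Wo content, mol% (hypothetical)
fs = np.array([10.0, 12.5, 15.0, 17.5, 20.0])   # Fs content, mol% (hypothetical)
t  = np.array([800., 850., 900., 950., 1000.])  # temperature, deg C (hypothetical)

A = np.column_stack([np.ones_like(wo), wo, fs])  # design matrix [1, Wo, Fs]
coef, *_ = np.linalg.lstsq(A, t, rcond=None)
t_hat = A @ coef
rmse = np.sqrt(np.mean((t - t_hat) ** 2))
print(f"T = {coef[0]:.1f} + {coef[1]:.2f}*Wo + {coef[2]:.2f}*Fs  (RMSE {rmse:.1f} deg C)")
```

The paper's quoted 2SE of under 20 °C corresponds to the residual scatter of such a fit against the quadrilateral's temperature contours.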
Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi
2018-04-01
Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data-consistency assumption underlying hydrological computation, and both factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation-coefficient-based method for significance evaluation of hydrological dependence based on an auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds for the correlation coefficient, the method divides the significance of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deriving the relationship between the correlation coefficient and the auto-correlation coefficients at each order of the series, we found that the correlation coefficient is mainly determined by the magnitudes of the auto-correlation coefficients from order 1 to order p, which clarifies the theoretical basis of the method. With first-order and second-order auto-regression models as examples, the reasonability of the derived formula was verified through Monte Carlo experiments classifying the relationship between the correlation coefficient and the auto-correlation coefficients. The method was used to analyze three observed hydrological time series; the results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
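A minimal numerical check of the idea: for an AR(1) process the correlation between the series and its fitted dependence component (phi times the lagged series) is governed by the lag-1 autocorrelation, so the sample correlation should sit near phi. This is only the first-order special case of the relationship the paper derives.

```python
# Sketch: correlation between an AR(1) series and its dependence component.
import numpy as np

rng = np.random.default_rng(42)
phi, n = 0.8, 5000
x = np.zeros(n)
for t in range(1, n):                        # simulate x_t = phi*x_{t-1} + e_t
    x[t] = phi * x[t - 1] + rng.normal()

phi_hat = np.corrcoef(x[1:], x[:-1])[0, 1]   # lag-1 autocorrelation estimate
dependence = phi_hat * x[:-1]                # fitted AR(1) dependence component
r = np.corrcoef(x[1:], dependence)[0, 1]     # corr(series, dependence component)
print(f"phi_hat={phi_hat:.3f}  r={r:.3f}")
```

Because correlation is scale-invariant, r equals the lag-1 autocorrelation exactly here, which is the p = 1 case of the deduced formula.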
Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk
Czarnota, Jenna; Gennings, Chris; Wheeler, David C
2015-01-01
In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
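The index-building idea behind WQS can be sketched very crudely: score each chemical into quartiles, fit a linear model on the quartile scores, and normalize the non-negative coefficients into weights. The actual WQS estimator uses constrained optimization over bootstrap samples; this toy version only conveys the intuition, on synthetic data rather than the NCI-SEER exposures.

```python
# Crude sketch of the WQS idea: quartile scores -> weighted index.
# Not the paper's constrained bootstrap estimator; synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=n)     # correlated co-occurring chemicals
true_w = np.array([0.7, 0.0, 0.3, 0.0, 0.0])     # chemicals 0 and 2 are the bad actors

Q = np.column_stack([np.searchsorted(np.quantile(X[:, j], [0.25, 0.5, 0.75]), X[:, j])
                     for j in range(p)])          # quartile scores 0..3
y = 2.0 * (Q @ true_w) + rng.normal(size=n)       # outcome driven by the weighted index

A = np.column_stack([np.ones(n), Q])
beta = np.linalg.lstsq(A, y, rcond=None)[0][1:]   # per-chemical slopes
w_hat = np.clip(beta, 0, None)
w_hat /= w_hat.sum()                              # weights constrained to sum to one
print("estimated weights:", np.round(w_hat, 2))
```

Even with two correlated chemicals, the dominant weight lands on the true bad actor, which is the behavior the simulation study quantifies as sensitivity and specificity.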
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
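The nonlinear modeling route can be sketched with support vector regression on a handful of molecular descriptors. The descriptors and Tg values below are synthetic; the paper's data set of 71 drug-like compounds is not reproduced here.

```python
# Sketch: SVR predicting Tg from four descriptors, on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n = 200
desc = rng.normal(size=(n, 4))                      # four descriptors, as in the paper
tg = (320 + 15 * desc[:, 0] - 10 * desc[:, 1]
      + 5 * np.tanh(desc[:, 2])
      + rng.normal(scale=3, size=n))                # synthetic Tg in kelvin

X_tr, X_te, y_tr, y_te = train_test_split(desc, tg, random_state=1)
model = make_pipeline(StandardScaler(), SVR(C=100.0, epsilon=1.0)).fit(X_tr, y_tr)
rmse = np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2))
print(f"test RMSE = {rmse:.1f} K")
```

If only the melting temperature is available, the paper's simpler route is an ordinary linear regression of Tg on Tm instead of the descriptor-based SVR.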
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved through high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. The analysis and application of Poisson mixture regression models are addressed under two different classes: standard and concomitant-variable mixture regression models. Results show that a two-component concomitant-variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary generalized linear Poisson regression model, owing to its low Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model for heart disease prediction overall, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. It is deduced that heart disease prediction can be done effectively by identifying the major risks componentwise using a Poisson mixture regression model.
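A minimal EM sketch of a two-component Poisson mixture (without covariates) illustrates the clustering idea behind these models. Adding a log-linear regression for each component's rate would turn this into the mixture regression of the paper; the counts here are simulated, not patient data.

```python
# Sketch: EM for a two-component Poisson mixture (low-risk vs high-risk rates).
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(7)
y = np.concatenate([rng.poisson(2.0, 700), rng.poisson(9.0, 300)])  # low/high risk

pi, lam = np.array([0.5, 0.5]), np.array([1.0, 10.0])   # initial guesses
for _ in range(200):
    # E-step: responsibility of each component for each observation
    resp = pi * poisson.pmf(y[:, None], lam)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing proportions and component rates
    pi = resp.mean(axis=0)
    lam = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)

print("mixing proportions:", np.round(pi, 2), " rates:", np.round(lam, 2))
```

The responsibilities from the E-step are what assign individuals to the high- or low-risk cluster; the componentwise rates are then read off per cluster.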
Fingeret, Abbey L; Martinez, Rebecca H; Hsieh, Christine; Downey, Peter; Nowygrod, Roman
2016-02-01
We aim to determine whether observed operations or internet-based video review predicts improved performance in the surgery clerkship. A retrospective review of students' usage of surgical videos, observed operations, evaluations, and examination scores was used to construct an exploratory principal component analysis. Multivariate regression was used to determine factors predictive of clerkship performance. Case-log data for 231 students revealed a median of 25 observed cases. Students accessed the web-based video platform a median of 15 times. Principal component analysis yielded 4 factors contributing 74% of the variability, with a Kaiser-Meyer-Olkin coefficient of .83. Multivariate regression found shelf score (P < .0001), internal clinical skills examination score (P < .0001), subjective evaluations (P < .001), and video website utilization (P < .001), but not observed cases, to be significantly associated with overall performance. Utilization of a web-based operative video platform during a surgical clerkship is independently associated with improved clinical reasoning, fund of knowledge, and overall evaluation. Thus, this modality can serve as a useful adjunct to live observation. Copyright © 2016 Elsevier Inc. All rights reserved.
Weight estimation techniques for composite airplanes in general aviation industry
NASA Technical Reports Server (NTRS)
Paramasivam, T.; Horn, W. J.; Ritter, J.
1986-01-01
Currently available weight estimation methods for general aviation airplanes were investigated. New equations with explicit material properties were developed for the weight estimation of aircraft components such as wing, fuselage and empennage. Regression analysis was applied to the basic equations for a data base of twelve airplanes to determine the coefficients. The resulting equations can be used to predict the component weights of either metallic or composite airplanes.
Symplectic geometry spectrum regression for prediction of noisy time series
NASA Astrophysics Data System (ADS)
Xie, Hong-Bo; Dokos, Socrates; Sivakumar, Bellie; Mengersen, Kerrie
2016-05-01
We present the symplectic geometry spectrum regression (SGSR) technique as well as a regularized method based on SGSR for prediction of nonlinear time series. The main tool of analysis is the symplectic geometry spectrum analysis, which decomposes a time series into the sum of a small number of independent and interpretable components. The key to successful regularization is to damp higher order symplectic geometry spectrum components. The effectiveness of SGSR and its superiority over local approximation using ordinary least squares are demonstrated through prediction of two noisy synthetic chaotic time series (Lorenz and Rössler series), and then tested for prediction of three real-world data sets (Mississippi River flow data and electromyographic and mechanomyographic signal recorded from human body).
Variable Selection for Regression Models of Percentile Flows
NASA Astrophysics Data System (ADS)
Fouad, G.
2017-12-01
Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.
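One of the compared selection routes, random forests, can be sketched on synthetic data: rank candidate basin characteristics by importance, keep the top few, and fit the percentile-flow regression on that subset. The features here are generic stand-ins, not the study's basin characteristics.

```python
# Sketch: random-forest importance ranking as a variable-selection step
# before a linear regression, on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y, coef = make_regression(n_samples=400, n_features=15, n_informative=3,
                             noise=5.0, coef=True, random_state=3)

rf = RandomForestRegressor(n_estimators=200, random_state=3).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:3]      # keep 3 strongest variables
r2 = cross_val_score(LinearRegression(), X[:, top], y, cv=5).mean()
print("selected:", sorted(top.tolist()),
      " truly informative:", sorted(np.flatnonzero(coef).tolist()))
print(f"cross-validated R^2 on selected variables: {r2:.3f}")
```

Note that, unlike the multicollinearity filter criticized in the abstract, this route never rejects a model merely because its selected variables are cross-correlated.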
QSAR modeling of flotation collectors using principal components extracted from topological indices.
Natarajan, R; Nirdosh, Inderjit; Basak, Subhash C; Mills, Denise R
2002-01-01
Several topological indices were calculated for substituted-cupferrons that were tested as collectors for the froth flotation of uranium. The principal component analysis (PCA) was used for data reduction. Seven principal components (PC) were found to account for 98.6% of the variance among the computed indices. The principal components thus extracted were used in stepwise regression analyses to construct regression models for the prediction of separation efficiencies (Es) of the collectors. A two-parameter model with a correlation coefficient of 0.889 and a three-parameter model with a correlation coefficient of 0.913 were formed. PCs were found to be better than partition coefficient to form regression equations, and inclusion of an electronic parameter such as Hammett sigma or quantum mechanically derived electronic charges on the chelating atoms did not improve the correlation coefficient significantly. The method was extended to model the separation efficiencies of mercaptobenzothiazoles (MBT) and aminothiophenols (ATP) used in the flotation of lead and zinc ores, respectively. Five principal components were found to explain 99% of the data variability in each series. A three-parameter equation with correlation coefficient of 0.985 and a two-parameter equation with correlation coefficient of 0.926 were obtained for MBT and ATP, respectively. The amenability of separation efficiencies of chelating collectors to QSAR modeling using PCs based on topological indices might lead to the selection of collectors for synthesis and testing from a virtual database.
NASA Astrophysics Data System (ADS)
Ying, Yibin; Liu, Yande; Fu, Xiaping; Lu, Huishan
2005-11-01
Artificial neural networks (ANNs) have been used successfully in applications such as pattern recognition, image processing, automation, and control, and the majority of today's ANN applications are back-propagation feed-forward networks (BP-ANN). In this paper, BP-ANNs were applied to model the soluble solids content (SSC) of intact pears from their Fourier transform near-infrared (FT-NIR) spectra. One hundred and sixty-four pear samples were used to build the calibration models and evaluate their predictive ability. The results are compared to classical calibration approaches, i.e., principal component regression (PCR), partial least squares (PLS), and nonlinear PLS (NPLS). The effects of the optimal choice of training parameters on the prediction model were also investigated. BP-ANN combined with principal component regression (PCR) always performed better than the classical PCR, PLS, and Weight-PLS methods from the point of view of predictive ability. Based on the results, it can be concluded that FT-NIR spectroscopy and BP-ANN models can properly be employed for rapid and nondestructive determination of fruit internal quality.
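The BP-ANN-on-PCR-scores idea can be sketched by compressing correlated spectra with PCA and training a small feed-forward network on the scores. The "spectra" below are synthetic; the FT-NIR data of the 164 pears are not reproduced.

```python
# Sketch: PCA compression of spectra followed by a feed-forward network,
# standing in for the paper's BP-ANN + PCR combination. Synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n, wavelengths = 300, 100
latent = rng.normal(size=(n, 3))                        # a few underlying factors
loadings = rng.normal(size=(3, wavelengths))
spectra = latent @ loadings + 0.05 * rng.normal(size=(n, wavelengths))
ssc = 12 + latent[:, 0] - 0.5 * latent[:, 1] + 0.1 * rng.normal(size=n)  # soluble solids

X_tr, X_te, y_tr, y_te = train_test_split(spectra, ssc, random_state=5)
model = make_pipeline(StandardScaler(), PCA(n_components=5),
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                   random_state=5)).fit(X_tr, y_tr)
r2_test = model.score(X_te, y_te)
print(f"test R^2 = {r2_test:.3f}")
```

The PCA step plays the role of PCR's score extraction; the network then models any nonlinearity between scores and SSC.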
Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E
2014-06-01
Leaf chlorophyll content provides valuable information about the physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination, but this measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB-component analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll determined by the standard laboratory procedure. Single and multiple regression models using the RGB color components as independent variables were tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method, and the sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quicker, more effective, and lower in cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true foliar chlorophyll content, with less noise across the whole range of chlorophyll studied, than SPAD and other leaf-image-processing-based models when applied to quinoa and amaranth.
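The RGB route can be sketched by regressing laboratory chlorophyll values on the mean R, G, and B channels of leaf images. All numbers below are synthetic placeholders for the quinoa/amaranth measurements.

```python
# Sketch: multiple linear regression of chlorophyll on mean RGB channel values.
# Synthetic data; coefficients are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 120
green = rng.uniform(60, 180, n)                 # mean G channel of leaf pixels
red = green * 0.6 + rng.normal(0, 8, n)         # mean R channel, correlated with G
blue = rng.uniform(20, 60, n)                   # mean B channel
chl = 45 - 0.20 * green + 0.05 * red + rng.normal(0, 1.5, n)  # synthetic chlorophyll

X = np.column_stack([red, green, blue])
model = LinearRegression().fit(X, chl)
r2 = model.score(X, chl)
print(f"R^2 of the RGB regression: {r2:.3f}")
```

In practice the R² and RMSEP of such a model against wet-chemistry values are what the paper compares to SPAD readings.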
Lin, Lili; Yan, Rong; Liu, Yongqiang; Jiang, Wenju
2010-11-01
Artificial biomass samples based on three biomass components (cellulose, hemicellulose, and lignin) were developed on the basis of a simplex-lattice approach. Together with a natural biomass sample, they were employed in enzymatic hydrolysis experiments. Different combinations of two commercial enzymes (ACCELLERASE 1500 and OPTIMASH BG) showed the potential to hydrolyze hemicellulose completely. Negligible interactions among the three components were observed, and the enzyme ACCELLERASE 1500 was proven to be weakly lignin-binding. On this basis, a multiple linear-regression equation was established for predicting the reducing-sugar yield from the component proportions in a biomass. The hemicellulose and cellulose in a biomass sample were found to make different contributions in staged hydrolysis over different time periods. Furthermore, the hydrolysis of rice straw was conducted to validate the computational approach, with alkaline-solution pretreatment and the combined enzyme function taken into account, so as to better understand the nature of biomass hydrolysis from the aspect of the three biomass components.
Mi, Baibing; Dang, Shaonong; Li, Qiang; Zhao, Yaling; Yang, Ruihai; Wang, Duolao; Yan, Hong
2015-07-01
Hypertensive patients have more complex health care needs and are more likely to have poorer health-related quality of life than normotensive people, and the awareness of hypertension itself could reduce health-related quality of life. We propose the use of quantile regression to explore in more detail the relationship between awareness of hypertension and health-related quality of life. In a cross-sectional, population-based study, 2737 participants (1035 hypertensive patients and 1702 normotensive participants) completed the Short-Form Health Survey. A quantile regression model was employed to investigate the association of physical component summary (PCS) and mental component summary (MCS) scores with awareness of hypertension and to evaluate the associated factors. Patients who were aware of hypertension (N = 554) had lower scores than patients who were unaware of it (N = 481): the median (IQR) PCS scores were 48.20 (13.88) versus 53.27 (10.79), P < 0.01, and the MCS scores were 50.68 (15.09) versus 51.70 (10.65), P = 0.03. After adjusting for covariates, the quantile regression results suggest that awareness of hypertension was associated with most PCS quantiles (P < 0.05 except the 10th and 20th quantiles), with β-estimates ranging from -2.14 (95% CI: -3.80 to -0.48) to -1.45 (95% CI: -2.42 to -0.47); the same significant trend appeared at some of the poorer MCS quantiles, with β-estimates ranging from -3.47 (95% CI: -6.65 to -0.39) to -2.18 (95% CI: -4.30 to -0.06). Awareness of hypertension had a greater effect on those with intermediate PCS status: the β-estimate was -2.04 (95% CI: -3.51 to -0.57, P < 0.05) at the 40th quantile and decreased further to -1.45 (95% CI: -2.42 to -0.47, P < 0.01) at the 90th quantile. Awareness of hypertension was negatively related to health-related quality of life in hypertensive patients in rural western China, with a greater effect on MCS scores among those with poorer status and on PCS scores among those with intermediate status.
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
Abbreviations used include: …regression trees; Daisy, DISsimilAritY; PAM, partitioning around medoids; PMA, penalized multivariate analysis; SPC, sparse principal components; UPGMA, unweighted pair-group average method. The UPGMA method measures dissimilarities between all objects in two clusters and takes the average value.
Statistical process control of cocrystallization processes: A comparison between OPLS and PLS.
Silva, Ana F T; Sarraguça, Mafalda Cruz; Ribeiro, Paulo R; Santos, Adenilson O; De Beer, Thomas; Lopes, João Almeida
2017-03-30
Orthogonal partial least squares regression (OPLS) is being increasingly adopted as an alternative to partial least squares (PLS) regression because of the better generalization that can be achieved. Particularly in multivariate batch statistical process control (BSPC), the use of OPLS for estimating nominal trajectories is advantageous. In OPLS, the nominal process trajectories are expected to be captured in a single predictive principal component, while uncorrelated variations are filtered out to orthogonal principal components. In theory, OPLS will yield a better estimation of the Hotelling's T² statistic and the corresponding control limits, thus lowering the number of false positives and false negatives when assessing process disturbances. Although the advantages of OPLS have been demonstrated in the context of regression, its use in BSPC has seldom been reported. This study proposes an OPLS-based approach for BSPC of a cocrystallization process between hydrochlorothiazide and p-aminobenzoic acid monitored on-line with near-infrared spectroscopy, and compares its fault-detection performance with the same approach based on PLS. A series of cocrystallization batches with imposed disturbances was used to test the ability of the OPLS- and PLS-based BSPC methods to detect abnormal situations. Results demonstrated that OPLS was generally superior in terms of sensitivity and specificity in most situations; in some abnormal batches, the imposed disturbances were detected only with OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data on seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature, and sun radiation time) and ozone values were used for statistical analysis. The meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods, with data for the period 1999-2004 analyzed jointly by the two methods. For all periods, the temperature-dependent variables were highly correlated with one another but all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables, and a variable selection method based on high loadings of varimax-rotated principal components was used to obtain subsets of the predictor variables for the linear regression model. In 1999, 2001, and 2002, one of the meteorological variables was weakly but predominantly influenced by the ozone concentrations. For the year 2000, however, the model showed no predominant influence of the ozone concentrations on the meteorological variables, pointing instead to variation in sun radiation; this could be due to other factors that were not explicitly considered in this study.
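The selection step described above can be sketched as follows: rotate a component solution with varimax and keep, for each rotated component, the variable that loads highest on it. scikit-learn's `FactorAnalysis(rotation="varimax")` stands in for the rotated PCA of the paper, and the two latent "temperature" and "humidity" factors are synthetic.

```python
# Sketch: pick one representative variable per varimax-rotated component.
# FactorAnalysis stands in for rotated PCA; synthetic meteorological factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 500
temp = rng.normal(size=n)                       # a latent "temperature" factor
humid = rng.normal(size=n)                      # a latent "humidity" factor
X = np.column_stack([temp + 0.1 * rng.normal(size=n),      # dry temperature
                     temp + 0.1 * rng.normal(size=n),      # max temperature
                     temp + 0.1 * rng.normal(size=n),      # min temperature
                     humid + 0.1 * rng.normal(size=n),     # relative humidity
                     -humid + 0.1 * rng.normal(size=n)])   # inverse-humidity proxy

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=6)
fa.fit(StandardScaler().fit_transform(X))
selected = np.argmax(np.abs(fa.components_), axis=1)   # one variable per component
print("variables selected as regression predictors:", selected.tolist())
```

After rotation, each component loads cleanly on one block of variables, so picking the top loader per component yields one temperature variable and one humidity variable as predictors.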
Ocaña-Peinado, Francisco M; Valderrama, Mariano J; Bouzas, Paula R
2013-05-01
The problem of developing a 2-week-ahead forecast of atmospheric cypress pollen levels is tackled in this paper by developing a principal component multiple regression model involving several climatic variables. The efficacy of the proposed model is validated by means of an application to real data on Cupressaceae pollen concentration in the city of Granada (southeast Spain). The model was applied to data from 11 consecutive years (1995-2005), with 2006 being used to validate the forecasts. Based on the work of different authors, factors such as temperature, humidity, hours of sun, and wind speed were incorporated in the model. This methodology explains approximately 75-80% of the variability in the airborne Cupressaceae pollen concentration.
Weighted functional linear regression models for gene-based association analysis.
Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I
2018-01-01
Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10⁻⁶) when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
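The weighting scheme can be sketched directly: variant weights come from a beta density of the minor allele frequency, so rarer (presumably more informative) variants receive larger weights before entering the functional regression. Beta(1, 25) is the common choice from the kernel-based association literature and is used here as an assumption, not a detail confirmed by the abstract.

```python
# Sketch: allele-frequency-based variant weights via the beta density.
# Beta(1, 25) parameters are the conventional choice, assumed here.
import numpy as np
from scipy.stats import beta

maf = np.array([0.001, 0.01, 0.05, 0.2, 0.4])   # minor allele frequencies
weights = beta.pdf(maf, 1, 25)                   # larger weight for rarer variants
print(dict(zip(maf.tolist(), np.round(weights, 2).tolist())))
```

With Beta(1, 25), the density is 25(1 - x)^24, so the weight decays monotonically as the allele frequency grows, which is exactly the "advantage to more informative components" described above.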
Modeling and managing risk early in software development
NASA Technical Reports Server (NTRS)
Briand, Lionel C.; Thomas, William M.; Hetmanski, Christopher J.
1993-01-01
In order to improve the quality of the software development process, we need to be able to build empirical multivariate models based on data collectable early in the software process. These models need to be both useful for prediction and easy to interpret, so that remedial actions may be taken in order to control and optimize the development process. We present an automated modeling technique which can be used as an alternative to regression techniques. We show how it can be used to facilitate the identification and aid the interpretation of the significant trends which characterize 'high risk' components in several Ada systems. Finally, we evaluate the effectiveness of our technique based on a comparison with logistic regression based models.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and inflates their variances. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed instead. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to demonstrate the use of the RPCR and RSIMPLS methods on an econometric data set, comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-Validation (R-RMSECV), a robust R2 value, and the Robust Component Selection (RCS) statistic.
Brown, C. Erwin
1993-01-01
Correlation analysis, in conjunction with principal-component and multiple-regression analyses, was applied to laboratory chemical and petrographic data to assess the usefulness of these techniques in evaluating selected physical and hydraulic properties of carbonate-rock aquifers in central Pennsylvania. Correlation and principal-component analyses were used to establish relations and associations among variables, to determine the dimensions of property variation of samples, and to filter out variables containing similar information. Principal-component and correlation analyses showed that porosity is related to the other measured variables and that permeability is most related to porosity and grain size. Four principal components were found to be significant in explaining the variance of the data. Stepwise multiple-regression analysis was used to see how well the measured variables could predict porosity and (or) permeability for this suite of rocks. The variation in permeability and porosity is not totally predicted by the other variables, but the regression is significant at the 5% significance level.
Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun
2018-06-01
Traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in regional geochemical datasets. Furthermore, such models are built on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source's contributions. To overcome these limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. The new method was then applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) the RAPCS-RGWR model performed better than the APCS-MLR model in identifying the major sources of soil metal elements, and (ii) source contributions estimated by the RAPCS-RGWR model were closer to the true soil metal concentrations than those estimated by the APCS-MLR model. The proposed RAPCS-RGWR model is thus a more effective source apportionment method than the non-robust, global APCS-MLR model in dealing with regional geochemical datasets. Copyright © 2018 Elsevier B.V. All rights reserved.
Salvatore, Stefania; Bramness, Jørgen Gustav; Reid, Malcolm J; Thomas, Kevin Victor; Harman, Christopher; Røislien, Jo
2015-01-01
Wastewater-based epidemiology (WBE) is a new methodology for estimating the drug load in a population. Simple summary statistics and specification tests have typically been used to analyze WBE data, comparing differences between weekday and weekend loads. Such standard statistical methods may, however, overlook important nuanced information in the data. In this study, we apply functional data analysis (FDA) to WBE data and compare the results to those obtained from more traditional summary measures. We analysed temporal WBE data from 42 European cities, using sewage samples collected daily for one week in March 2013. For each city, the main temporal features of two selected drugs were extracted using functional principal component (FPC) analysis, along with simpler measures such as the area under the curve (AUC). The individual cities' scores on each of the temporal FPCs were then used as outcome variables in multiple linear regression analysis with various city and country characteristics as predictors. The results were compared to those of functional analysis of variance (FANOVA). The first three FPCs explained more than 99% of the temporal variation. The first component (FPC1) represented the level of the drug load, while the second and third temporal components represented the level and the timing of a weekend peak. AUC was highly correlated with FPC1, but other temporal characteristics were not captured by the simple summary measures. FANOVA was less flexible than the FPCA-based regression, although the two gave concordant results. Geographical location was the main predictor of the general level of the drug load. FDA of WBE data extracts more detailed information about drug load patterns during the week, information that is not identified by more traditional statistical methods. The results also suggest that regression based on FPC scores is a valuable addition to FANOVA for estimating associations between temporal patterns and covariate information.
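The FPC extraction step described above can be sketched with an SVD on centered curves. The city count matches the study (42 cities, 7 days), but the curve shapes and noise levels are invented for illustration:

```python
import numpy as np

# Minimal FPCA sketch on simulated weekly drug-load curves (42 cities x 7 days).
rng = np.random.default_rng(1)
days = np.arange(7)
weekend = (days >= 5).astype(float)                 # weekend indicator
level = rng.normal(10.0, 2.0, size=(42, 1))         # city-specific overall level
peak = rng.normal(2.0, 0.5, size=(42, 1))           # city-specific weekend peak
curves = level + peak * weekend + rng.normal(0, 0.2, size=(42, 7))

centered = curves - curves.mean(axis=0)             # remove the mean curve
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / (s**2).sum()                     # variance explained per FPC
scores = U * s                                      # per-city FPC scores
```

In the study's workflow, each column of `scores` would serve as an outcome variable in a regression on city and country characteristics.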
NASA Astrophysics Data System (ADS)
Keshtpoor, M.; Carnacina, I.; Yablonsky, R. M.
2016-12-01
Extratropical cyclones (ETCs) are the primary driver of storm surge events along the UK and northwest mainland Europe coastlines. In an effort to evaluate the storm surge risk to coastal communities in this region, a stochastic catalog is developed by perturbing the historical storm seeds of European ETCs to account for 10,000 years of possible ETCs. Numerical simulation of the storm surge generated by the full 10,000-year stochastic catalog, however, is computationally expensive and may take several months to complete with available computational resources. A new statistical regression model is developed to select the major surge-generating events from the stochastic ETC catalog. This regression model is based on the maximum storm surge, obtained via numerical simulations using a calibrated version of the Delft3D-FM hydrodynamic model with a relatively coarse mesh, of 1750 historical ETC events that occurred over the past 38 years in Europe. These numerically simulated surge values were regressed against the local sea level pressure and the U and V components of the wind field at the locations of 196 tide gauge stations near the UK and northwest mainland Europe coastal areas. The regression model suggests that storm surge values in the area of interest are highly correlated with the U and V components of wind speed, as well as the sea level pressure. Based on these correlations, the regression model was then used to select surge-generating storms from the 10,000-year stochastic catalog. Results suggest that roughly 105,000 events out of 480,000 stochastic storms are surge-generating and need to be considered for numerical simulation using a hydrodynamic model. The selected stochastic storms were then simulated in Delft3D-FM, and the final refinement of the storm population was performed based on return-period analysis of the 1750 historical event simulations at each of the 196 tide gauges, in preparation for Delft3D-FM fine-mesh simulations.
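The screening regression described above can be sketched for a single gauge. Everything below is simulated; the coefficients, units, and noise level are illustrative assumptions, not the study's fitted values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hedged sketch: peak surge at one tide gauge modeled from local sea level
# pressure and wind U/V components, for 1750 simulated "historical" events.
rng = np.random.default_rng(2)
n = 1750
slp = rng.normal(1000.0, 15.0, n)               # sea level pressure (hPa)
u = rng.normal(0.0, 10.0, n)                    # zonal wind component (m/s)
v = rng.normal(0.0, 10.0, n)                    # meridional wind component (m/s)
surge = 0.01 * (1013.0 - slp) + 0.04 * u + 0.06 * v + rng.normal(0, 0.05, n)

X = np.column_stack([slp, u, v])
model = LinearRegression().fit(X, surge)
r2 = model.score(X, surge)                      # a high r2 justifies using the
                                                # regression as a fast event screen
```

A cheap model like this can then score all 480,000 stochastic events and pass only the predicted surge-generating ones to the expensive hydrodynamic simulation.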
NASA Astrophysics Data System (ADS)
Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.
2013-06-01
This study encompasses columnar ozone modelling in peninsular Malaysia. Data for eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the period 2003-2008, were employed to develop models to predict the value of columnar ozone (O3) in the study area. A combined method, based on multiple regression together with principal component analysis (PCA), was used to predict columnar ozone and to improve the prediction accuracy. Separate analyses were carried out for the northeast monsoon (NEM) and southwest monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM seasons. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings on varimax-rotated principal components was used to acquire subsets of the predictor variables to be included in the linear regression model. It was found that an increase in columnar O3 is associated with an increase in AST, SSKT, AT, and CO and with a drop in CH4, H2Ovapour, RH, and MSP. Fitting the best models for columnar O3 using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4, and RH, and the principal precursor of columnar O3 in both the NEM and SWM seasons was SSKT.
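The variable-selection idea, picking the highest-loading variable on each varimax-rotated component and then fitting a linear regression, can be sketched as follows. The data are random stand-ins for the eight AIRS parameters, and the varimax routine is a textbook implementation, not the study's software:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def varimax(loadings, n_iter=100):
    """Textbook varimax rotation of a PCA loadings matrix (illustrative)."""
    p, k = loadings.shape
    R = np.eye(k)
    for _ in range(n_iter):
        L = loadings @ R
        u, _, vt = np.linalg.svd(
            loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p))
        R = u @ vt                      # orthogonal rotation update
    return loadings @ R

# Simulated stand-ins for 72 monthly observations of 8 atmospheric parameters.
rng = np.random.default_rng(3)
X = rng.normal(size=(72, 8))
y = X @ rng.normal(size=8) + rng.normal(0, 0.5, 72)   # columnar-ozone stand-in

pca = PCA(n_components=3).fit(X)
rotated = varimax(pca.components_.T)                  # 8 x 3 rotated loadings
selected = np.unique(np.abs(rotated).argmax(axis=0))  # top variable per component
model = LinearRegression().fit(X[:, selected], y)
```

Rotation is orthogonal, so it redistributes loadings across components without changing the total variance they carry, which makes the "highest loading" selection easier to interpret.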
NASA Astrophysics Data System (ADS)
Wei, Wenjuan; Liu, Jiangang; Dai, Ruwei; Feng, Lu; Li, Ling; Tian, Jie
2014-03-01
Previous behavioral research has shown that individuals process own- and other-race faces differently. One well-known effect is the other-race effect (ORE), whereby individuals categorize other-race faces more accurately and faster than own-race faces. Existing functional magnetic resonance imaging (fMRI) studies of the other-race effect have mainly focused on racial prejudice and socio-affective differences towards own- and other-race faces. In the present fMRI study, we adopted a race-categorization task to determine the differences in activation level between categorizing own- and other-race faces. Thirty-one Chinese participants, who live in China with Chinese as the majority and who had no direct contact with Caucasian individuals, were recruited. We used group independent component analysis (ICA), a method of blind source-signal separation that has proven promising for the analysis of fMRI data. We separated the data into 56 components, a number estimated from one subject using the Minimum Description Length (MDL) criterion. The components were sorted using the multiple linear regression temporal sorting criterion, and the fitted regression parameters were used in statistical tests to evaluate the task-relatedness of the components. A one-way ANOVA was performed to test the significance of the component time courses under the different conditions. Our results showed that areas whose coordinates are similar to the right FFA coordinates reported in previous studies were more strongly activated for own-race faces than for other-race faces, while the precuneus showed greater activation for other-race faces than for own-race faces.
Ostro, Bart; Feng, Wen-Ying; Broadwin, Rachel; Green, Shelley; Lipsett, Michael
2007-01-01
Several epidemiologic studies provide evidence of an association between daily mortality and particulate matter < 2.5 μm in diameter (PM2.5). Little is known, however, about the relative effects of PM2.5 constituents. We examined associations between 19 PM2.5 components and daily mortality in six California counties. We obtained daily data from 2000 to 2003 on mortality and PM2.5 mass and components, including elemental and organic carbon (EC and OC), nitrates, sulfates, and various metals. We examined associations of PM2.5 and its constituents with daily counts of several mortality categories: all-cause, cardiovascular, respiratory, and mortality at age > 65 years. Poisson regressions incorporating natural splines were used to control for time-varying covariates. Effect estimates were determined for each component in each county and then combined using a random-effects model. PM2.5 mass and several constituents were associated with multiple mortality categories, especially cardiovascular deaths. For example, for a 3-day lag, the latter increased by 1.6, 2.1, 1.6, and 1.5% for PM2.5, EC, OC, and nitrates based on interquartile ranges of 14.6, 0.8, 4.6, and 5.5 μg/m³, respectively. Stronger associations were observed between mortality and additional pollutants, including sulfates and several metals, during the cool season. This multicounty analysis adds to the growing body of evidence linking PM2.5 with mortality and indicates that excess risks may vary among specific PM2.5 components. Therefore, the use of regression coefficients based on PM2.5 mass may underestimate associations with some PM2.5 components. Also, our findings support the hypothesis that combustion-associated pollutants are particularly important in California.
High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics
Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike
2010-01-01
We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139
Time series modeling by a regression approach based on a latent process.
Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice
2009-01-01
Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.
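The predictive form of the model described above can be sketched directly: K polynomial regimes mixed by a logistic (softmax) process over time. The EM/IRLS fitting is omitted, and all parameter values below are illustrative, not estimated:

```python
import numpy as np

# Sketch of the regression model with a discrete hidden logistic process:
# smooth softmax gates pi_k(t) activate K polynomial regimes over time.
t = np.linspace(0.0, 1.0, 200)
W = np.array([[8.0, -20.0],        # logistic-process parameters (intercept, slope)
              [4.0, -4.0],
              [0.0, 0.0]])         # last regime as the reference class
logits = W[:, [0]] + W[:, [1]] * t[None, :]          # shape (K, T)
gates = np.exp(logits - logits.max(axis=0))
gates /= gates.sum(axis=0)                           # pi_k(t): regime weights

B = np.array([[0.0, 1.0],          # per-regime polynomial coefficients
              [2.0, -1.0],         # (linear polynomials for simplicity)
              [0.5, 0.0]])
regimes = B[:, [0]] + B[:, [1]] * t[None, :]
y_hat = (gates * regimes).sum(axis=0)                # the model's mean curve
```

Steep slopes in `W` make the transitions between regimes abrupt; shallow slopes make them smooth, which is the flexibility the abstract highlights.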
Bayesian nonparametric regression with varying residual density
Pati, Debdeep; Dunson, David B.
2013-01-01
We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053
Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali Aldabaa, Abdalsamad Abdalsatar; Ghosh, Rakesh Kumar; Paul, Sathi; Nasim Ali, Md
2015-05-01
Using 108 petroleum-contaminated soil samples, this pilot study proposed a new analytical approach combining visible near-infrared diffuse reflectance spectroscopy (VisNIR DRS) and portable X-ray fluorescence spectrometry (PXRF) for rapid and improved quantification of soil petroleum contamination. Results indicated that an advanced fused model, in which VisNIR DRS spectra-based penalized spline regression (PSR) was used to predict total petroleum hydrocarbon and PXRF elemental data-based random forest regression was used to model the PSR residuals, outperformed (R² = 0.78, residual prediction deviation (RPD) = 2.19) all other models tested, even producing better generalization than VisNIR DRS alone (RPDs of 1.64, 1.86, and 1.96 for random forest, penalized spline regression, and partial least squares regression, respectively). Additionally, unsupervised principal component analysis using the PXRF+VisNIR DRS system qualitatively separated contaminated soils from control samples. Fusion of PXRF elemental data and VisNIR derivative spectra produced an optimized model for total petroleum hydrocarbon quantification in soils. Copyright © 2015 Elsevier B.V. All rights reserved.
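The two-stage fusion idea, one sensor's model predicts the target and a second sensor's model predicts the first model's residuals, can be sketched as below. The data are simulated, and ridge regression stands in for penalized spline regression:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

# Hedged sketch of residual fusion: stage 1 on spectra, stage 2 (random
# forest on elemental data) on the stage-1 residuals.
rng = np.random.default_rng(5)
n = 108
spectra = rng.normal(size=(n, 50))            # VisNIR-like predictors
elements = rng.normal(size=(n, 8))            # PXRF-like predictors
tph = spectra[:, :5].sum(axis=1) + np.sin(2 * elements[:, 0]) \
      + rng.normal(0, 0.2, n)                 # total petroleum hydrocarbon stand-in

stage1 = Ridge(alpha=1.0).fit(spectra, tph)
residuals = tph - stage1.predict(spectra)
stage2 = RandomForestRegressor(n_estimators=200, random_state=0).fit(elements,
                                                                     residuals)

fused = stage1.predict(spectra) + stage2.predict(elements)
rmse_fused = np.sqrt(np.mean((fused - tph) ** 2))
rmse_stage1 = np.sqrt(np.mean((stage1.predict(spectra) - tph) ** 2))
```

The fused prediction can only help when the second sensor carries information the first one misses, which is the premise of combining VisNIR DRS with PXRF.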
Igne, Benoît; de Juan, Anna; Jaumot, Joaquim; Lallemand, Jordane; Preys, Sébastien; Drennen, James K; Anderson, Carl A
2014-10-01
The implementation of a blend monitoring and control method based on a process analytical technology such as near-infrared spectroscopy requires the selection and optimization of numerous criteria that affect the monitoring outputs and expected blend end-point. Using a five-component formulation, the present article contrasts the modeling strategies and end-point determination of a traditional quantitative method, based on the prediction of the blend parameters employing partial least-squares regression, with a qualitative strategy based on principal component analysis, Hotelling's T², and residual distance to the model, called Prototype. The possibility of monitoring and controlling blend homogeneity with multivariate curve resolution was also assessed. The implementation of the above methods was tested in the presence of designed experiments (with variation of the amount of active ingredient and excipients) and with normal operating condition samples (nominal concentrations of the active ingredient and excipients). The impact of the criteria used to stop the blends (related to precision and/or accuracy) was assessed. Results demonstrated that while all methods showed similarities in their outputs, some approaches were preferred for decision making. The selectivity of regression-based methods was also contrasted with the capacity of qualitative methods to determine the homogeneity of the entire formulation. Copyright © 2014. Published by Elsevier B.V.
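The qualitative monitoring statistics mentioned above can be sketched generically: Hotelling's T² measures variation inside the PCA model plane, while Q (the residual distance to the model) measures variation outside it. The calibration spectra below are random stand-ins, not blend data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative PCA-based monitoring: fit the model on "good" calibration
# spectra, then score new spectra with T^2 and Q statistics.
rng = np.random.default_rng(6)
calibration = rng.normal(size=(60, 10))            # spectra from acceptable blends
pca = PCA(n_components=3).fit(calibration)

def t2_and_q(x):
    """Hotelling's T^2 and residual distance Q for one spectrum x."""
    scores = pca.transform(x[None, :])[0]
    t2 = np.sum(scores**2 / pca.explained_variance_)
    recon = pca.inverse_transform(scores[None, :])[0]
    q = np.sum((x - recon) ** 2)                   # squared distance to the model
    return t2, q

t2_ok, q_ok = t2_and_q(calibration[0])
t2_bad, q_bad = t2_and_q(calibration[0] + 5.0)     # shifted spectrum, off-model
```

In a monitoring chart, both statistics would be compared against control limits derived from the calibration set; an excursion in Q flags behavior the model has never seen.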
Carbonell, Felix; Bellec, Pierre; Shmuel, Amir
2011-01-01
The influence of the global average signal (GAS) on functional-magnetic resonance imaging (fMRI)-based resting-state functional connectivity is a matter of ongoing debate. The global average fluctuations increase the correlation between functional systems beyond the correlation that reflects their specific functional connectivity. Hence, removal of the GAS is a common practice for facilitating the observation of network-specific functional connectivity. This strategy relies on the implicit assumption of a linear-additive model according to which global fluctuations, irrespective of their origin, and network-specific fluctuations are super-positioned. However, removal of the GAS introduces spurious negative correlations between functional systems, bringing into question the validity of previous findings of negative correlations between fluctuations in the default-mode and the task-positive networks. Here we present an alternative method for estimating global fluctuations, immune to the complications associated with the GAS. Principal components analysis was applied to resting-state fMRI time-series. A global-signal effect estimator was defined as the principal component (PC) that correlated best with the GAS. The mean correlation coefficient between our proposed PC-based global effect estimator and the GAS was 0.97±0.05, demonstrating that our estimator successfully approximated the GAS. In 66 out of 68 runs, the PC that showed the highest correlation with the GAS was the first PC. Since PCs are orthogonal, our method provides an estimator of the global fluctuations, which is uncorrelated to the remaining, network-specific fluctuations. Moreover, unlike the regression of the GAS, the regression of the PC-based global effect estimator does not introduce spurious anti-correlations beyond the decrease in seed-based correlation values allowed by the assumed additive model. 
After regressing this PC-based estimator out of the original time-series, we observed robust anti-correlations between resting-state fluctuations in the default-mode and the task-positive networks. We conclude that resting-state global fluctuations and network-specific fluctuations are uncorrelated, supporting a Resting-State Linear-Additive Model. In addition, we conclude that the network-specific resting-state fluctuations of the default-mode and task-positive networks show artifact-free anti-correlations.
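The estimator described above can be sketched with plain linear algebra: find the principal component most correlated with the global average signal (GAS) and regress it out of every voxel time series. The data below are simulated with an artificial global fluctuation:

```python
import numpy as np

# Sketch of the PC-based global effect estimator (simulated data).
rng = np.random.default_rng(7)
T, V = 240, 500                                   # time points, voxels
global_fluct = rng.normal(size=T)
data = 0.8 * global_fluct[:, None] + rng.normal(size=(T, V))

gas = data.mean(axis=1)                           # global average signal
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = U * s                                       # temporal PCs as columns
corrs = np.abs([np.corrcoef(pc, gas)[0, 1] for pc in pcs.T[:10]])
g = pcs[:, np.argmax(corrs)]                      # PC best correlated with the GAS

# regress g out of every voxel time series
beta = centered.T @ g / (g @ g)
cleaned = centered - np.outer(g, beta)
```

Because PCs are mutually orthogonal, the removed component is uncorrelated with the remaining ones, which is the property the abstract argues makes this estimator preferable to regressing out the GAS itself.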
Comprehensive database of diameter-based biomass regressions for North American tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2004-01-01
A database is presented consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...
NASA Astrophysics Data System (ADS)
Lisenko, S. A.; Kugeiko, M. M.
2013-01-01
The ability to determine noninvasively microphysical parameters (MPPs) of skin characteristic of malignant melanoma was demonstrated. The MPPs were the melanin content in dermis, saturation of tissue with blood vessels, and concentration and effective size of tissue scatterers. The proposed method was based on spatially resolved spectral measurements of skin diffuse reflectance and multiple regressions between linearly independent measurement components and skin MPPs. The regressions were established by modeling radiation transfer in skin with a wide variation of its MPPs. Errors in the determination of skin MPPs were estimated using fiber-optic measurements of its diffuse reflectance at wavelengths of commercially available semiconductor diode lasers (578, 625, 660, 760, and 806 nm) at source-detector separations of 0.23-1.38 mm.
Template based rotation: A method for functional connectivity analysis with a priori templates☆
Schultz, Aaron P.; Chhatwal, Jasmeer P.; Huijbers, Willem; Hedden, Trey; van Dijk, Koene R.A.; McLaren, Donald G.; Ward, Andrew M.; Wigman, Sarah; Sperling, Reisa A.
2014-01-01
Functional connectivity magnetic resonance imaging (fcMRI) is a powerful tool for understanding the network level organization of the brain in research settings and is increasingly being used to study large-scale neuronal network degeneration in clinical trial settings. Presently, a variety of techniques, including seed-based correlation analysis and group independent components analysis (with either dual regression or back projection) are commonly employed to compute functional connectivity metrics. In the present report, we introduce template based rotation, a novel analytic approach optimized for use with a priori network parcellations, which may be particularly useful in clinical trial settings. Template based rotation was designed to leverage the stable spatial patterns of intrinsic connectivity derived from out-of-sample datasets by mapping data from novel sessions onto the previously defined a priori templates. We first demonstrate the feasibility of using previously defined a priori templates in connectivity analyses, and then compare the performance of template based rotation to seed based and dual regression methods by applying these analytic approaches to an fMRI dataset of normal young and elderly subjects. We observed that template based rotation and dual regression are approximately equivalent in detecting fcMRI differences between young and old subjects, demonstrating similar effect sizes for group differences and similar reliability metrics across 12 cortical networks. Both template based rotation and dual-regression demonstrated larger effect sizes and comparable reliabilities as compared to seed based correlation analysis, though all three methods yielded similar patterns of network differences. 
When performing inter-network and sub-network connectivity analyses, we observed that template based rotation offered greater flexibility, larger group differences, and more stable connectivity estimates as compared to dual regression and seed based analyses. This flexibility owes to the reduced spatial and temporal orthogonality constraints of template based rotation as compared to dual regression. These results suggest that template based rotation can provide a useful alternative to existing fcMRI analytic methods, particularly in clinical trial settings where predefined outcome measures and conserved network descriptions across groups are at a premium. PMID:25150630
Wavelet regression model in forecasting crude oil price
NASA Astrophysics Data System (ADS)
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of a wavelet multiple linear regression (WMLR) technique in daily crude oil price forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-series at different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA), and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, the WMLR model performs better than the other forecasting techniques tested in this study.
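The decompose-then-regress idea can be sketched with a hand-rolled one-level Haar transform standing in for a full wavelet library; the "price" series below is a simulated random walk, and the lag structure is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative WMLR-style sketch: one-level Haar DWT, then MLR on lagged
# wavelet coefficients. Not the study's wavelet family or lag selection.
def haar_level1(x):
    x = x[: len(x) // 2 * 2]                      # even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)     # low-frequency sub-series
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)     # high-frequency sub-series
    return approx, detail

rng = np.random.default_rng(8)
price = np.cumsum(rng.normal(0, 1, 512)) + 50     # random-walk "crude oil" series
approx, detail = haar_level1(price)

# predict the next approximation coefficient from lagged approx/detail values
lags = 3
rows = [np.r_[approx[i - lags:i], detail[i - lags:i]]
        for i in range(lags, len(approx))]
X, y = np.array(rows), approx[lags:]
model = LinearRegression().fit(X, y)
```

In the study's setup the decomposition uses a proper multilevel DWT and correlation analysis picks which sub-series enter the regression; the structure of the pipeline is the same.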
NASA Astrophysics Data System (ADS)
Jiang, Weiping; Ma, Jun; Li, Zhao; Zhou, Xiaohui; Zhou, Boye
2018-05-01
Analysis of the correlations between the noise in different components of GPS station time series has positive significance for obtaining more accurate uncertainties of station velocities. Previous research into noise in GPS position time series focused mainly on single-component evaluation, which affects the acquisition of precise station positions, the velocity field, and its uncertainty. In this study, before and after removing the common-mode error (CME), we performed one-dimensional linear regression analysis of the noise amplitude vectors in different components of 126 GPS stations in Southern California, using a combination of white noise, flicker noise, and random walk noise. The results show that, on the one hand, there are above-moderate correlations between the white noise amplitude vectors in all components of the stations before and after removal of the CME, while the correlations between flicker noise amplitude vectors in the horizontal and vertical components are enhanced from uncorrelated to moderately correlated by removing the CME. On the other hand, significance tests show that all of the obtained linear regression equations, each representing a unique function of the noise amplitudes in two components, are of practical value after removing the CME. Given the noise amplitude estimates in two components and the linear regression equations, more accurate noise amplitudes can be acquired for both components.
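The one-dimensional regression between noise amplitudes in two components can be sketched in a few lines. The amplitudes below are simulated across a hypothetical 126-station network, not estimated from real GPS series:

```python
import numpy as np

# Toy sketch: linear regression between white-noise amplitude estimates in
# two components (e.g., north vs. up) across stations. Values are invented.
rng = np.random.default_rng(9)
n_stations = 126
amp_north = rng.uniform(1.0, 3.0, n_stations)                 # mm, hypothetical
amp_up = 2.5 * amp_north + rng.normal(0, 0.3, n_stations)     # correlated vertical

slope, intercept = np.polyfit(amp_north, amp_up, 1)           # 1-D regression
r = np.corrcoef(amp_north, amp_up)[0, 1]                      # correlation strength
```

With such an equation in hand, an amplitude estimated in one component yields a prediction for the other, which is how the study proposes refining noise amplitudes.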
Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua
2013-01-01
Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs), which are difficult to diagnose with the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistic regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied, yielding accuracy rates of 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model, respectively. Overall, the highest accuracy rate was achieved with SVM using the dimension-reduced curvelet-based textural features together with the clinical predictors. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.
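The dimensionality-reduction step can be sketched as follows. The feature matrix here is random stand-in data, not curvelet features, and 12 components are kept only to mirror the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
# 40 nodules x 60 hypothetical texture features (stand-in data)
features = rng.normal(size=(40, 60))

# Principal component analysis via SVD of the centered feature matrix
centered = features - features.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt[:12].T                   # first 12 principal components
explained = (s[:12] ** 2).sum() / (s ** 2).sum()  # variance retained
```

The `scores` matrix (40 samples x 12 components) would then feed a classifier such as an SVM in place of the original 60 features.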
Batorsky, Benjamin; Van Stolk, Christian; Liu, Hangsheng
2016-10-01
Assess whether adding more components to a workplace wellness program is associated with better outcomes by measuring the relationship of program components to one another and to employee participation and perceptions of program effectiveness. Data came from a 2014 survey of 24,393 employees of 81 employers about services offered, leadership, incentives, and promotion. Logistic regressions were used to model the relationship between program characteristics and outcomes. Components individually are related to better outcomes, but this relationship is weaker in the presence of other components and non-significant for incentives. Within components, a moderate level of services and work time participation opportunities are associated with higher participation and effectiveness. The "more of everything" approach does not appear to be advisable for all programs. Programs should focus on providing ample opportunities for employees to participate and initiatives like results-based incentives.
Physiological and anthropometric determinants of rhythmic gymnastics performance.
Douda, Helen T; Toubekis, Argyris G; Avloniti, Alexandra A; Tokmakidis, Savvas P
2008-03-01
To identify the physiological and anthropometric predictors of rhythmic gymnastics performance, which was defined from the total ranking score of each athlete in a national competition. Thirty-four rhythmic gymnasts were divided into 2 groups, elite (n = 15) and nonelite (n = 19), and they underwent a battery of anthropometric, physical fitness, and physiological measurements. The principal-components analysis extracted 6 components: anthropometric, flexibility, explosive strength, aerobic capacity, body dimensions, and anaerobic metabolism. These were used in a simultaneous multiple-regression procedure to determine which best explain the variance in rhythmic gymnastics performance. Based on the principal-component analysis, the anthropometric component explained 45% of the total variance, flexibility 12.1%, explosive strength 9.2%, aerobic capacity 7.4%, body dimensions 6.8%, and anaerobic metabolism 4.6%. The anthropometric (r = .50) and aerobic capacity (r = .49) components were significantly correlated with performance (P < .01). When the multiple-regression model, y = 10.708 + (0.0005121 × VO2max) + (0.157 × arm span) + (0.814 × midthigh circumference) − (0.293 × body mass), was applied to elite gymnasts, 92.5% of the variation was explained by VO2max (58.9%), arm span (12%), midthigh circumference (13.1%), and body mass (8.5%). Selected anthropometric characteristics, aerobic power, flexibility, and explosive strength are important determinants of successful performance. These findings might have practical implications for both training and talent identification in rhythmic gymnastics.
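Plugging illustrative gymnast measurements into the reported elite-group equation looks like this; the input values are hypothetical, and the units are assumed to match those used in the study.

```python
# Reported elite-gymnast model (coefficients taken from the abstract):
# y = 10.708 + 0.0005121*VO2max + 0.157*arm_span + 0.814*mid_thigh - 0.293*body_mass
def predicted_score(vo2max, arm_span_cm, mid_thigh_cm, body_mass_kg):
    return (10.708 + 0.0005121 * vo2max + 0.157 * arm_span_cm
            + 0.814 * mid_thigh_cm - 0.293 * body_mass_kg)

# Illustrative (hypothetical) gymnast values
score = predicted_score(vo2max=45.0, arm_span_cm=165.0,
                        mid_thigh_cm=45.0, body_mass_kg=48.0)
```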
Odegård, J; Klemetsdal, G; Heringstad, B
2005-04-01
Several selection criteria for reducing incidence of mastitis were developed from a random regression sire model for test-day somatic cell score (SCS). For comparison, sire transmitting abilities were also predicted based on a cross-sectional model for lactation mean SCS. Only first-crop daughters were used in genetic evaluation of SCS, and the different selection criteria were compared based on their correlation with incidence of clinical mastitis in second-crop daughters (measured as mean daughter deviations). Selection criteria were predicted based on both complete and reduced first-crop daughter groups (261 or 65 daughters per sire, respectively). For complete daughter groups, predicted transmitting abilities at around 30 d in milk showed the best predictive ability for incidence of clinical mastitis, closely followed by average predicted transmitting abilities over the entire lactation. Both of these criteria were derived from the random regression model. These selection criteria improved accuracy of selection by approximately 2% relative to a cross-sectional model. However, for reduced daughter groups, the cross-sectional model yielded increased predictive ability compared with the selection criteria based on the random regression model. This result may be explained by the cross-sectional model being more robust, i.e., less sensitive to precision of (co)variance components estimates and effects of data structure.
J.B. St. Clair
1993-01-01
Logarithmic regression equations were developed to predict component biomass and leaf area for an 18-yr-old genetic test of Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco var. menziesii) based on stem diameter or cross-sectional sapwood area. Equations did not differ among open-pollinated families in slope, but intercepts...
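A logarithmic (log-log) biomass equation of the kind described can be fitted as below; the diameters and biomass values are synthetic and follow an exact power law purely to make the idea concrete.

```python
import numpy as np

# Hypothetical stem diameters (cm) and component biomass (kg);
# generated from an exact power law for illustration
diameter = np.array([8.0, 10.0, 12.0, 15.0, 18.0, 22.0])
biomass = 0.05 * diameter ** 2.4

# Fit ln(biomass) = a + b * ln(diameter): linear regression in log space
b, a = np.polyfit(np.log(diameter), np.log(biomass), 1)
```

The fitted slope `b` is the allometric exponent and `exp(a)` the scale factor; with real data a bias correction is usually applied when back-transforming predictions.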
Hydrological predictions at a watershed scale are commonly based on extrapolation and upscaling of hydrological behavior at plot and hillslope scales. Yet, dominant hydrological drivers at a hillslope may not be as dominant at the watershed scale because of the heterogeneity of w...
Boosting structured additive quantile regression for longitudinal childhood obesity data.
Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael
2013-07-25
Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer-intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, at both the population and the individual-specific level, adjusted for further risk factors, and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.
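A minimal sketch of component-wise functional gradient boosting for a quantile, using the pinball (check) loss and simple linear base learners. The data, step size, and iteration count are invented, and the full model in the paper also handles smooth, varying-coefficient, and individual-specific effects.

```python
import numpy as np

def pinball(y, f, tau):
    """Pinball (check) loss for the tau-th quantile."""
    r = y - f
    return np.mean(np.where(r > 0, tau * r, (tau - 1) * r))

def boost_quantile(X, y, tau=0.9, steps=200, nu=0.1):
    """Component-wise functional gradient boosting for the tau-th quantile.
    Each step fits one simple (no-intercept) linear base learner per covariate
    to the negative gradient of the pinball loss and updates only the best one,
    so unselected components stay shrunken at zero."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = np.quantile(y, tau)            # offset: empirical tau-quantile
    for _ in range(steps):
        f = intercept + X @ coef
        u = np.where(y > f, tau, tau - 1.0)    # negative gradient of pinball loss
        fits = [(X[:, j] @ u) / (X[:, j] @ X[:, j]) for j in range(p)]
        errs = [np.sum((u - fits[j] * X[:, j]) ** 2) for j in range(p)]
        best = int(np.argmin(errs))
        coef[best] += nu * fits[best]          # shrunken update of one component
    return intercept, coef

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                  # hypothetical risk factors
y = 1.5 * X[:, 0] + rng.normal(size=200)       # only the first one matters
b0, coef = boost_quantile(X, y)
```

Because only one component moves per iteration, early stopping performs implicit variable selection, which is the property the abstract highlights.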
2014-01-01
Background Support vector regression (SVR) and Gaussian process regression (GPR) were used for the analysis of electroanalytical experimental data to estimate diffusion coefficients. Results For simulated cyclic voltammograms based on the EC, Eqr, and EqrC mechanisms, these regression algorithms in combination with nonlinear kernel/covariance functions yielded diffusion coefficients with higher accuracy than the standard approach of calculating diffusion coefficients from the Nicholson-Shain equation. The level of accuracy achieved by SVR and GPR is virtually independent of the rate constants governing the respective reaction steps. Further, reducing the high-dimensional voltammetric signals by manual selection of typical voltammetric peak features decreased the performance of both regression algorithms compared with reduction by downsampling or principal component analysis. After training on simulated data sets, diffusion coefficients were estimated by the regression algorithms for experimental data comprising voltammetric signals for three organometallic complexes. Conclusions The estimated diffusion coefficients closely matched the values determined by the parameter fitting method, but reduced the required computational time considerably for one of the reaction mechanisms. The automated processing of voltammograms by the regression algorithms yields better results than the conventional analysis of peak-related data. PMID:24987463
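A minimal Gaussian process regression sketch with an RBF (squared-exponential) kernel, applied to a smooth stand-in signal rather than real voltammograms; the kernel length scale and noise level are assumed values, not tuned hyperparameters from the study.

```python
import numpy as np

def gpr_predict(x_train, y_train, x_test, length=1.0, sigma_n=0.1):
    """Posterior mean of Gaussian process regression with an RBF kernel;
    sigma_n is the assumed observation-noise standard deviation."""
    def k(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = k(x_train, x_train) + sigma_n ** 2 * np.eye(len(x_train))
    return k(x_test, x_train) @ np.linalg.solve(K, y_train)

# Stand-in signal: a smooth curve sampled at 20 "potentials"
x = np.linspace(0.0, 4.0, 20)
y = np.sin(x)
smoothed = gpr_predict(x, y, x)
```

In the study's setting, the inputs would be downsampled or PCA-reduced voltammograms and the target the diffusion coefficient; the algebra is the same.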
Optical system for tablet variety discrimination using visible/near-infrared spectroscopy
NASA Astrophysics Data System (ADS)
Shao, Yongni; He, Yong; Hu, Xingyue
2007-12-01
An optical system based on visible/near-infrared spectroscopy (Vis/NIRS) for variety discrimination of ginkgo (Ginkgo biloba L.) tablets was developed. The system consisted of a light source, a beam-splitter system, a sample chamber, an optical detector (diffuse reflection detector), and a data-collection unit. The tablet varieties used in the research include Da na kang, Xin bang, Tian bao ning, Yi kang, Hua na xing, Dou le, Lv yuan, Hai wang, and Ji yao. All samples (n=270) were scanned in the Vis/NIR region between 325 and 1075 nm using a spectrograph. The chemometrics method of principal component artificial neural network (PC-ANN) was used to establish discrimination models. In the PC-ANN models, the scores of the principal components were chosen as the input nodes for the input layer of the ANN, and the best discrimination rate of 91.1% was reached. Principal component analysis was also used to select several optimal wavelengths based on loading values. Wavelengths at 481, 458, 466, 570, 1000, 662, and 400 nm were then used as the input data for stepwise multiple linear regression; the regression equation for the ginkgo tablets was obtained, and the discrimination rate reached 84.4%. The results indicated that this optical system can be applied to discriminating ginkgo (Ginkgo biloba L.) tablets, supplying a new method for fast ginkgo tablet variety discrimination.
NASA Astrophysics Data System (ADS)
Werth, Alexandra; Liakat, Sabbir; Dong, Anqi; Woods, Callie M.; Gmachl, Claire F.
2018-05-01
An integrating sphere is used to enhance the collection of backscattered light in a noninvasive glucose sensor based on quantum cascade laser spectroscopy. The sphere enhances signal stability by roughly an order of magnitude, allowing us to use a thermoelectrically (TE) cooled detector while maintaining comparable glucose prediction accuracy levels. Using a smaller TE-cooled detector reduces form factor, creating a mobile sensor. Principal component analysis has predicted principal components of spectra taken from human subjects that closely match the absorption peaks of glucose. These principal components are used as regressors in a linear regression algorithm to make glucose concentration predictions, over 75% of which are clinically accurate.
Regression Models for Identifying Noise Sources in Magnetic Resonance Images
Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.
2009-01-01
Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
Variable selection and model choice in geoadditive regression models.
Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard
2009-06-01
Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
A regression-adjusted approach can estimate competing biomass
James H. Miller
1983-01-01
A method is presented for estimating above-ground herbaceous and woody biomass on competition research plots. On a set of destructively-sampled plots, an ocular estimate of biomass by vegetative component is first made, after which vegetation is clipped, dried, and weighed. Linear regressions are then calculated for each component between estimated and actual weights...
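The calibration idea can be sketched in a few lines: fit a line between ocular estimates and clipped-and-weighed actual weights on the destructively sampled plots, then use it to adjust ocular-only estimates elsewhere. The plot weights below are hypothetical.

```python
import numpy as np

# Destructively sampled calibration plots: ocular estimates vs. clipped,
# dried, and weighed actual biomass (hypothetical values, kg per plot)
estimated = np.array([1.2, 2.5, 3.1, 4.0, 5.2, 6.8])
actual    = np.array([1.0, 2.9, 3.0, 4.4, 5.0, 7.5])

# Calibration line for this vegetative component: actual = a + b * estimated
b, a = np.polyfit(estimated, actual, 1)

# Adjust an ocular-only estimate from a non-destructive plot
adjusted = a + b * 3.6
```

In the study a separate regression is computed for each vegetative component (herbaceous, woody, etc.).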
NASA Astrophysics Data System (ADS)
Kumar, Vandhna; Meyssignac, Benoit; Melet, Angélique; Ganachaud, Alexandre
2017-04-01
Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western South Pacific, where the total sea level rise over the last 60 years is up to 3 times the global average. In this study, we attempt to reconstruct sea levels at selected sites in the region (Suva and Lautoka in Fiji, and Noumea in New Caledonia) as a multiple linear regression of atmospheric and oceanic variables. We focus on interannual-to-decadal scale variability and lower frequencies (including the global mean sea level rise) over the 1979-2014 period. Sea levels are taken from tide gauge records and the ORAS4 reanalysis dataset, and are expressed as a sum of steric and mass changes as a preliminary step. The key development in our methodology is using wind stress curl as a leading proxy for the thermosteric component. This is based on the knowledge that wind stress curl anomalies can modulate the thermocline depth and the resultant sea levels via Rossby wave propagation. The analysis is primarily based on correlation between local sea level and selected predictors, the dominant one being wind stress curl. In the first step, proxy boxes for wind stress curl are determined from the regions of highest correlation. The proportion of sea level explained via linear regression is then removed, leaving a residual. This residual is then correlated with other locally acting potential predictors: halosteric sea level, the zonal and meridional wind stress components, and sea surface temperature. The statistically significant predictors are used in a multiple linear regression function to simulate the observed sea level. The method is able to reproduce between 40 and 80% of the variance in observed sea level. Based on this skill, the model has high potential for use in sea level projection and downscaling studies.
Ximenes, Sofia; Silva, Ana; Soares, António; Flores-Colen, Inês; de Brito, Jorge
2016-05-04
Statistical models using multiple linear regression are some of the most widely used methods to study the influence of independent variables in a given phenomenon. This study's objective is to understand the influence of the various components of aerogel-based renders on their thermal and mechanical performance, namely cement (three types), fly ash, aerial lime, silica sand, expanded clay, type of aerogel, expanded cork granules, expanded perlite, air entrainers, resins (two types), and rheological agent. The statistical analysis was performed using SPSS (Statistical Package for Social Sciences), based on 85 mortar mixes produced in the laboratory and on their values of thermal conductivity and compressive strength obtained using tests in small-scale samples. The results showed that aerial lime assumes the main role in improving the thermal conductivity of the mortars. Aerogel type, fly ash, expanded perlite and air entrainers are also relevant components for a good thermal conductivity. Expanded clay can improve the mechanical behavior and aerogel has the opposite effect.
Yang, Liyang; Shin, Hyun-Sang; Hur, Jin
2014-01-01
This study aimed at monitoring the changes of fluorescent components in wastewater samples from 22 Korean biological wastewater treatment plants and exploring their prediction capabilities for total organic carbon (TOC), dissolved organic carbon (DOC), biochemical oxygen demand (BOD), chemical oxygen demand (COD), and the biodegradability of the wastewater using an optical sensing technique based on fluorescence excitation emission matrices and parallel factor analysis (EEM-PARAFAC). Three fluorescent components were identified from the samples by using EEM-PARAFAC, including protein-like (C1), fulvic-like (C2) and humic-like (C3) components. C1 showed the highest removal efficiencies for all the treatment types investigated here (69% ± 26%–81% ± 8%), followed by C2 (37% ± 27%–65% ± 35%), while humic-like component (i.e., C3) tended to be accumulated during the biological treatment processes. The percentage of C1 in total fluorescence (%C1) decreased from 54% ± 8% in the influents to 28% ± 8% in the effluents, while those of C2 and C3 (%C2 and %C3) increased from 43% ± 6% to 62% ± 9% and from 3% ± 7% to 10% ± 8%, respectively. The concentrations of TOC, DOC, BOD, and COD were the most correlated with the fluorescence intensity (Fmax) of C1 (r = 0.790–0.817), as compared with the other two fluorescent components. The prediction capability of C1 for TOC, BOD, and COD were improved by using multiple regression based on Fmax of C1 and suspended solids (SS) (r = 0.856–0.865), both of which can be easily monitored in situ. The biodegradability of organic matter in BOD/COD were significantly correlated with each PARAFAC component and their combinations (r = −0.598–0.613, p < 0.001), with the highest correlation coefficient shown for %C1. The estimation capability was further enhanced by using multiple regressions based on %C1, %C2 and C3/C2 (r = −0.691). PMID:24448170
NASA Astrophysics Data System (ADS)
Merkord, C. L.; Liu, Y.; DeVos, M.; Wimberly, M. C.
2015-12-01
Malaria early detection and early warning systems are important tools for public health decision makers in regions where malaria transmission is seasonal and varies from year to year with fluctuations in rainfall and temperature. Here we present a new data-driven dynamic linear model based on the Kalman filter with time-varying coefficients that are used to identify malaria outbreaks as they occur (early detection) and predict the location and timing of future outbreaks (early warning). We fit linear models of malaria incidence with trend and Fourier form seasonal components using three years of weekly malaria case data from 30 districts in the Amhara Region of Ethiopia. We identified past outbreaks by comparing the modeled prediction envelopes with observed case data. Preliminary results demonstrated the potential for improved accuracy and timeliness over commonly-used methods in which thresholds are based on simpler summary statistics of historical data. Other benefits of the dynamic linear modeling approach include robustness to missing data and the ability to fit models with relatively few years of training data. To predict future outbreaks, we started with the early detection model for each district and added a regression component based on satellite-derived environmental predictor variables including precipitation data from the Tropical Rainfall Measuring Mission (TRMM) and land surface temperature (LST) and spectral indices from the Moderate Resolution Imaging Spectroradiometer (MODIS). We included lagged environmental predictors in the regression component of the model, with lags chosen based on cross-correlation of the one-step-ahead forecast errors from the first model. Our results suggest that predictions of future malaria outbreaks can be improved by incorporating lagged environmental predictors.
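A minimal local-level dynamic linear model with a Kalman filter can flag weeks whose case counts exceed the one-step-ahead prediction envelope, which is the core of the early-detection step described above. The case counts and noise variances below are invented for illustration; the full model adds trend, seasonal, and lagged environmental regression components.

```python
import numpy as np

def local_level_filter(y, q=0.5, r=2.0):
    """Kalman filter for a local-level dynamic linear model:
    latent incidence level m_t evolves by a random walk (variance q);
    observation y_t = m_t + noise (variance r).
    Returns one-step-ahead forecast means and forecast variances."""
    m, P = y[0], 1.0
    means, variances = [], []
    for obs in y[1:]:
        m_pred, P_pred = m, P + q          # predict step
        S = P_pred + r                     # one-step-ahead forecast variance
        means.append(m_pred)
        variances.append(S)
        K = P_pred / S                     # Kalman gain
        m = m_pred + K * (obs - m_pred)    # update step
        P = (1 - K) * P_pred
    return np.array(means), np.array(variances)

cases = np.array([4.0, 5, 3, 6, 5, 4, 18, 6, 5])   # week 7 is an outbreak spike
f, S = local_level_filter(cases)
upper = f + 2 * np.sqrt(S)                 # prediction envelope
alerts = cases[1:] > upper                 # weeks flagged as outbreaks
```

Because the filter updates recursively, missing weeks can be handled by skipping the update step, one of the robustness properties noted in the abstract.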
NASA Astrophysics Data System (ADS)
Li, Jiangtong; Luo, Yongdao; Dai, Honglin
2018-01-01
Water is the source of life and the essential foundation of all life. With the development of industrialization, water pollution is becoming more and more frequent, directly affecting human survival and development. Water quality detection is one of the necessary measures to protect water resources. Ultraviolet (UV) spectral analysis is an important research method in the field of water quality detection, in which partial least squares regression (PLSR) has become the predominant technique; in some special cases, however, PLSR analysis produces considerable errors. To solve this problem, the traditional principal component regression (PCR) analysis method was improved in this paper by using the principle of PLSR. The experimental results show that for some special experimental data sets, the improved PCR analysis method performs better than PLSR. PCR and PLSR are the focus of this paper. Firstly, principal component analysis (PCA) is performed in MATLAB to reduce the dimensionality of the spectral data; on the basis of a large number of experiments, the optimized principal components, which carry most of the original data information, are extracted by using the principle of PLSR. Secondly, the linear regression analysis of the principal components is carried out with the Statistical Package for the Social Sciences (SPSS), from which the coefficients and relations of the principal components can be obtained. Finally, the same water spectral data set is analyzed by both PLSR and improved PCR; the two methods give similar results for most data, but improved PCR is better than PLSR for data near the detection limit. Both PLSR and improved PCR can therefore be used in ultraviolet spectral analysis of water, with improved PCR preferable near the detection limit.
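A plain (unimproved) PCR baseline can be sketched as follows: project centered spectra onto the leading principal components, then regress the response on the component scores. The "spectra" are synthetic data with a low-dimensional latent structure, not real UV measurements, and the component count is an assumption.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: project centered spectra onto the top-k
    principal components, then ordinary least squares on the component scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    gamma, *_ = np.linalg.lstsq(Xc @ V, y - y_mean, rcond=None)
    beta = V @ gamma                       # regression vector in wavelength space
    return beta, y_mean - x_mean @ beta

rng = np.random.default_rng(7)
latent = rng.normal(size=(30, 3))          # 3 hidden constituents, 30 samples
loadings = rng.normal(size=(3, 100))       # 100 hypothetical UV wavelengths
spectra = latent @ loadings + 0.05 * rng.normal(size=(30, 100))
conc = 2.0 * latent[:, 0] + 0.05 * rng.normal(size=30)

beta, b0 = pcr_fit(spectra, conc, k=3)
pred = spectra @ beta + b0
```

The improvement the paper proposes lies in how the components are chosen (guided by the PLSR principle rather than by variance alone); the regression step itself is unchanged.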
ERIC Educational Resources Information Center
Beauducel, Andre
2007-01-01
It was investigated whether commonly used factor score estimates lead to the same reproduced covariance matrix of observed variables. This was achieved by means of Schonemann and Steiger's (1976) regression component analysis, since it is possible to compute the reproduced covariance matrices of the regression components corresponding to different…
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley P.
2004-01-01
Propulsion ground test facilities face the daily challenges of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Due to budgetary and schedule constraints, NASA and industry customers are pushing to test more components, for less money, in a shorter period of time. As these new rocket engine component test programs are undertaken, the lack of technology maturity in the test articles, combined with pushing the test facilities' capabilities to their limits, tends to lead to an increase in facility breakdowns and unsuccessful tests. Over the last five years, Stennis Space Center's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and broken numerous test facility and test article parts. While various initiatives have been implemented to provide better propulsion test techniques and improve the quality, reliability, and maintainability of goods and parts used in the propulsion test facilities, unexpected failures during testing still occur quite regularly due to the harsh environment in which the propulsion test facilities operate. Previous attempts at modeling the lifecycle of a propulsion component test project have met with little success. Each of the attempts suffered from incomplete or inconsistent data on which to base the models. By focusing on the actual test phase of the test project rather than the formulation, design, or construction phases, the quality and quantity of available data increase dramatically. A logistic regression model has been developed from the data collected over the last five years, allowing the probability of successfully completing a rocket propulsion component test to be calculated.
A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X(sub 1), X(sub 2),...,X(sub k) to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure. Logistic regression has primarily been used in the fields of epidemiology and biomedical research but lends itself to many other applications. As indicated, the use of logistic regression is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from the models provide project managers with insight into, and confidence in, the effectiveness of rocket engine component ground test projects. The initial success in modeling rocket propulsion ground test projects clears the way for more complex models to be developed in this area.
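Once fitted, such a model reduces to evaluating a logistic function of a linear predictor. The coefficients and predictor values below are hypothetical, not those of the model actually fitted at Stennis.

```python
import math

def success_probability(beta0, betas, xs):
    """Logistic model: P(Success) = 1 / (1 + exp(-(b0 + sum b_i * x_i)))."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two made-up predictors, e.g. a test-article
# maturity score and a facility-margin score
p = success_probability(0.8, [1.2, -0.9], [1.0, 0.5])
```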
Genetic variation and seed transfer guidelines for ponderosa pine in central Oregon.
Frank C. Sorensen
1994-01-01
Adaptive genetic variation in seed and seedling traits for ponderosa pine from the east slopes of the Cascade Range in Oregon was analyzed by using 307 families from 227 locations. Factor scores from three principal components based on seed and seedling traits were related by multiple regression to latitude, distance from the Cascade crest, elevation, slope, and...
NASA Astrophysics Data System (ADS)
Zhao, Hong; Li, Changjun; Li, Hongping; Lv, Kebo; Zhao, Qinghui
2016-06-01
The sea surface salinity (SSS) is a key parameter in monitoring ocean states, and observing SSS can promote the understanding of the global water cycle. This paper provides a new approach for retrieving sea surface salinity from Soil Moisture and Ocean Salinity (SMOS) satellite data. Based on the principal component regression (PCR) model, SSS can be retrieved from the brightness temperature data of SMOS L2 measurements and auxiliary data. Twenty-six pairs of matchup data were used in model validation for the South China Sea (4°-25°N, 105°-125°E). The RMSE of the PCR-retrieved SSS is 0.37 psu (practical salinity units), compared with 1.65 psu for SMOS SSS1 when validated against in-situ SSS. The corresponding Argo daily salinity data from April to June 2013 were also used in the validation, giving an RMSE of 0.46 psu compared to 1.82 psu for the daily averaged SMOS L2 products. This indicates that the PCR model is valid and may provide a good approach for retrieving SSS from SMOS satellite data.
Li, Hong Zhi; Tao, Wei; Gao, Ting; Li, Hui; Lu, Ying Hua; Su, Zhong Min
2011-01-01
We propose a generalized regression neural network (GRNN) approach based on grey relational analysis (GRA) and principal component analysis (PCA) (GP-GRNN) to improve the accuracy of density functional theory (DFT) calculations of the homolysis bond dissociation energies (BDE) of the Y-NO bond. As a demonstration, this combined quantum chemistry calculation with the GP-GRNN approach has been applied to evaluate the homolysis BDE of 92 Y-NO organic molecules. The results show that the full-descriptor GRNN without GRA and PCA (F-GRNN) and the GRNN with GRA (G-GRNN) reduce the root-mean-square (RMS) error of the calculated homolysis BDE of the 92 organic molecules from 5.31 to 0.49 and 0.39 kcal mol(-1), respectively, for the B3LYP/6-31G(d) calculation. The newly developed GP-GRNN approach further reduces the RMS error to 0.31 kcal mol(-1). Thus, the GP-GRNN correction on top of B3LYP/6-31G(d) can improve the accuracy of calculating homolysis BDE in quantum chemistry and can predict homolysis BDE values that cannot be obtained experimentally.
Sun, You-Wen; Liu, Wen-Qing; Wang, Shi-Mei; Huang, Shu-Hua; Yu, Xiao-Man
2011-10-01
A method of interference correction for nondispersive infrared (NDIR) multi-component gas analysis is described. According to the successive integral gas absorption models and methods, the influence of temperature and pressure on the integral line strengths and line shapes was considered, and, based on detuned Lorentzian line shapes, the absorption cross sections and response coefficients of H2O, CO2, CO, and NO on each filter channel were obtained. Four-dimensional linear regression equations for interference correction were established from the response coefficients, and the absorption cross-interference was corrected by solving these multi-dimensional linear regression equations; after interference correction, the pure absorbance signal on each filter channel is controlled only by the corresponding target gas concentration. When the sample cell was filled with a gas mixture containing CO, NO, and CO2 in fixed concentration proportions, the pure absorbance after interference correction was used for concentration inversion; the inversion concentration errors were 2.0% for CO2, 1.6% for CO, and 1.7% for NO. Both theory and experiment prove that the proposed interference correction method for NDIR multi-component gas analysis is feasible.
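The correction amounts to solving a small linear system linking per-channel absorbances to per-gas contributions. The response-coefficient matrix and measured absorbances below are hypothetical placeholders, not the values obtained in the study.

```python
import numpy as np

# Hypothetical response-coefficient matrix: rows = filter channels
# (H2O, CO2, CO, NO), columns = contribution of each gas on that channel.
A = np.array([
    [1.00, 0.08, 0.03, 0.02],   # H2O channel
    [0.05, 1.00, 0.10, 0.04],   # CO2 channel
    [0.02, 0.12, 1.00, 0.06],   # CO channel
    [0.01, 0.05, 0.07, 1.00],   # NO channel
])
measured = np.array([0.42, 0.55, 0.30, 0.21])  # raw absorbances per channel

# Solving the four-dimensional linear system recovers the pure per-gas
# absorbance on each channel, removing the cross-interference.
pure = np.linalg.solve(A, measured)
```

Each pure absorbance can then be inverted to a concentration using the single-gas calibration for that channel.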
Diagnosis of edge condition based on force measurement during milling of composites
NASA Astrophysics Data System (ADS)
Felusiak, Agata; Twardowski, Paweł
2018-04-01
The present paper presents comparative results of forecasting cutting tool wear using different methods of diagnostic inference based on the measurement of cutting force components. The research was carried out during milling of the Duralcan F3S.10S aluminum-ceramic composite. Tool wear was predicted using one-variable and two-variable regression models and Multilayer Perceptron (MLP) and Radial Basis Function (RBF) neural networks. Forecasting the condition of the cutting tool on the basis of cutting forces yielded very satisfactory results.
Libiger, Ondrej; Schork, Nicholas J.
2015-01-01
It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. 
Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061
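Comparisons of this kind rest on simulation-based power estimation: repeatedly spike the abundance in one group, run the test, and record the fraction of rejections. A minimal sketch using a Welch t-statistic with a normal approximation to its null distribution; the effect sizes and sample sizes are invented and far simpler than the skewed microbiome setting.

```python
import random, math, statistics

def welch_t(x, y):
    """Welch two-sample t-statistic (unequal variances allowed)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

def power(effect, n=30, sims=200, seed=1):
    """Fraction of simulations in which the spiked group is detected.
    Uses the normal critical value 1.96, a reasonable approximation at n=30."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided alpha = 0.05 under the normal approximation
    hits = 0
    for _ in range(sims):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(effect, 1) for _ in range(n)]  # spiked abundance
        if abs(welch_t(a, b)) > z_crit:
            hits += 1
    return hits / sims

print(power(1.0))  # large effect: power near 1
print(power(0.0))  # no effect: rejection rate near the nominal alpha
```

Swapping the t-test for any of the compared procedures (PLS regression, distance-based regression, etc.) inside the loop is exactly how such power rankings are produced.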
Lee, Ji Eun; Kim, Hyun Woong; Lee, Sang Joon; Lee, Joo Eun
2015-05-01
To investigate vascular structural changes of choroidal neovascularization (CNV) following intravitreal ranibizumab injections using indocyanine green angiography. A total of 31 patients with exudative age-related macular degeneration and CNV whose structures were identifiable on indocyanine green angiography were included. Ranibizumab was injected into the vitreous cavity once a month for 3 months and then as needed for the next 3 months prospectively. Indocyanine green angiography was performed at baseline, 3, and 6 months. Early- to mid-phase images of the indocyanine green angiography in which the details of the vascular structure of the CNV were discerned best were used in the image analysis. Vascular structures of CNV were described as arteriovenular and capillary components, and structural changes were assessed. Arteriovenular components were observed in 29 eyes (94%). Regression of the capillary components was observed in most cases. Although regression of the arteriovenular component was noted in 14 eyes (48%), complete resolution was not observed. The eyes were categorized into 3 groups according to CNV structural changes: the regressed (Group R, 10 eyes, 31%), the matured (Group M, 7 eyes, 23%), and the growing (Group G, 14 eyes, 45%). In Group R, no regrowth of CNV was found at 6 months. In Group M, distinct vascular structures were observed at 3 months and persisted without apparent changes at 6 months. In Group G, growth or reperfusion of capillary components from the persisting arteriovenular components was noted at 6 months. Both capillary and arteriovenular components regressed during monthly ranibizumab injections. However, CNV regrowth was observed in a group of patients during the as-needed treatment phase.
Kabir, Alamgir; Rahman, Md. Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D. W.; Labrique, Alain B.; Rashid, Mahbubur; Christian, Parul; West, Keith P.
2017-01-01
Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using the partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (standardized β = -0.29 to -0.19; p<0.001). Scatter plots of the scores of the first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than the ordinary least squares regression and provided the effect measure of the covariates with greater accuracy as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when either there are multiple outcome measurements in the same study, or the covariates are correlated, or both situations exist in a dataset. PMID:29261760
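The key step that lets PLS tolerate multicollinearity is projecting the predictors onto a direction of maximum covariance with the outcome before regressing. Below is a minimal single-component, single-outcome PLS1 sketch (far simpler than the multivariate, multi-component PLS used in the study), demonstrated on perfectly collinear predictors where ordinary least squares has no unique solution.

```python
def pls1_one_component(X, y):
    """Single-component PLS1: project onto the direction of maximum
    covariance with y, then regress y on the resulting score.
    Returns coefficients b for centered data, y_hat = mean(y) + Xc @ b."""
    n, p = len(X), len(X[0])
    # Center the predictors and the outcome
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    yc = [v - ym for v in y]
    # Weight vector w proportional to X'y (max-covariance direction)
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Score t = Xc w, then the least-squares slope of y on t
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return [q * wj for wj in w]

# Two identical columns: OLS is ill-posed, PLS shares the weight between them
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
y = [2.0, 4.0, 6.0, 8.0]
print(pls1_one_component(X, y))
```

Here the fitted coefficients come out equal across the collinear columns, which is the stability property the abstract highlights.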
Study of Rapid-Regression Liquefying Hybrid Rocket Fuels
NASA Technical Reports Server (NTRS)
Zilliac, Greg; DeZilwa, Shane; Karabeyoglu, M. Arif; Cantwell, Brian J.; Castellucci, Paul
2004-01-01
A report describes experiments directed toward the development of paraffin-based hybrid rocket fuels that burn at regression rates greater than those of conventional hybrid rocket fuels like hydroxyl-terminated polybutadiene (HTPB). The basic approach followed in this development is to use materials such that a hydrodynamically unstable liquid layer forms on the melting surface of a burning fuel body. Entrainment of droplets from the liquid/gas interface can substantially increase the rate of fuel mass transfer, leading to surface regression faster than can be achieved using conventional fuels. The higher regression rate eliminates the need for the complex multi-port grain structures of conventional solid rocket fuels, making it possible to obtain acceptable performance from single-port structures. The high-regression-rate fuels contain no toxic or otherwise hazardous components and can be shipped commercially as non-hazardous commodities. Among the experiments performed on these fuels were scale-up tests using gaseous oxygen. The data from these tests were found to agree with data from small-scale, low-pressure and low-mass-flux laboratory tests and to confirm the expectation that these fuels would burn at high regression rates at chamber pressures and mass fluxes representative of full-scale rocket motors.
Electrophysiology and optical coherence tomography to evaluate Parkinson disease severity.
Garcia-Martin, Elena; Rodriguez-Mena, Diego; Satue, Maria; Almarcegui, Carmen; Dolz, Isabel; Alarcia, Raquel; Seral, Maria; Polo, Vicente; Larrosa, Jose M; Pablo, Luis E
2014-02-04
To evaluate correlations between visual evoked potentials (VEP), pattern electroretinogram (PERG), and macular and retinal nerve fiber layer (RNFL) thickness measured by optical coherence tomography (OCT) and the severity of Parkinson disease (PD). Forty-six PD patients and 33 age- and sex-matched healthy controls were enrolled, and underwent VEP, PERG, and OCT measurements of macular and RNFL thicknesses, and evaluation of PD severity using the Hoehn and Yahr scale to measure PD symptom progression, the Schwab and England Activities of Daily Living Scale (SE-ADL) to evaluate patient quality of life (QOL), and disease duration. Logistic regression was performed to analyze which measures, if any, could predict PD symptom progression or effect on QOL. Visual functional parameters (best corrected visual acuity, mean deviation of the visual field, amplitudes of the PERG positive component at 50 ms (P50) and negative component at 95 ms (N95), and latency of the PERG P50 component) and structural parameters (OCT measurements of RNFL and retinal thickness) were decreased in PD patients compared with healthy controls. OCT measurements were significantly negatively correlated with the Hoehn and Yahr scale, and significantly positively correlated with the SE-ADL scale. Based on logistic regression analysis, foveal thickness measured by OCT predicted PD severity and QOL, and the amplitude of the PERG N95 component predicted a lower SE-ADL score. Patients with greater damage in the RNFL tend to have lower QOL and more severe PD symptoms. Foveal thickness and the PERG N95 component provide good biomarkers for predicting QOL and disease severity.
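Severity prediction of this kind is a standard logistic-regression setup: a binary outcome (e.g., severe vs. mild disease) modeled as a sigmoid of a linear predictor. A self-contained sketch fitted by plain gradient ascent; the standardized "foveal thickness" values and labels below are invented, not the study's data.

```python
import math

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Logistic regression P(y=1|x) = sigmoid(b0 + b1*x),
    fitted by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

# Hypothetical: thinner fovea (negative standardized value) -> severe (y = 1)
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
b0, b1 = fit_logistic(xs, ys)
print(b1 < 0)  # negative slope: lower thickness raises the odds of severity
```

In practice one would add more covariates and use a regularized solver; this shows only the model form behind "logistic regression predicted PD severity".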
Carbonell, Felix; Bellec, Pierre
2011-01-01
Abstract The influence of the global average signal (GAS) on functional-magnetic resonance imaging (fMRI)–based resting-state functional connectivity is a matter of ongoing debate. The global average fluctuations increase the correlation between functional systems beyond the correlation that reflects their specific functional connectivity. Hence, removal of the GAS is a common practice for facilitating the observation of network-specific functional connectivity. This strategy relies on the implicit assumption of a linear-additive model according to which global fluctuations, irrespective of their origin, and network-specific fluctuations are super-positioned. However, removal of the GAS introduces spurious negative correlations between functional systems, bringing into question the validity of previous findings of negative correlations between fluctuations in the default-mode and the task-positive networks. Here we present an alternative method for estimating global fluctuations, immune to the complications associated with the GAS. Principal components analysis was applied to resting-state fMRI time-series. A global-signal effect estimator was defined as the principal component (PC) that correlated best with the GAS. The mean correlation coefficient between our proposed PC-based global effect estimator and the GAS was 0.97±0.05, demonstrating that our estimator successfully approximated the GAS. In 66 out of 68 runs, the PC that showed the highest correlation with the GAS was the first PC. Since PCs are orthogonal, our method provides an estimator of the global fluctuations, which is uncorrelated to the remaining, network-specific fluctuations. Moreover, unlike the regression of the GAS, the regression of the PC-based global effect estimator does not introduce spurious anti-correlations beyond the decrease in seed-based correlation values allowed by the assumed additive model. 
After regressing this PC-based estimator out of the original time-series, we observed robust anti-correlations between resting-state fluctuations in the default-mode and the task-positive networks. We conclude that resting-state global fluctuations and network-specific fluctuations are uncorrelated, supporting a Resting-State Linear-Additive Model. In addition, we conclude that the network-specific resting-state fluctuations of the default-mode and task-positive networks show artifact-free anti-correlations. PMID:22444074
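The estimator described above can be reproduced in miniature: compute the time course of the leading principal component of the signals-by-time matrix and check its correlation with the global average signal (GAS). A pure-Python sketch on synthetic data in which ten signals share one global sinusoid; the data and dimensions are invented.

```python
import math, random

def first_pc_timecourse(X, iters=200, seed=0):
    """Time course of the leading principal component of a signals-by-time
    matrix: power iteration on X'X after centering each signal over time."""
    n_t = len(X[0])
    Xc = [[v - sum(row) / n_t for v in row] for row in X]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_t)]  # random init avoids a zero start
    for _ in range(iters):
        proj = [sum(r[t] * v[t] for t in range(n_t)) for r in Xc]  # X v
        v = [sum(Xc[s][t] * proj[s] for s in range(len(Xc)))       # X'(X v)
             for t in range(n_t)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v

def corr(a, b):
    """Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

# Synthetic resting-state-like data: 10 signals sharing a global sinusoid
rng = random.Random(42)
T = 120
global_src = [math.sin(0.2 * t) for t in range(T)]
signals = [[global_src[t] + 0.3 * rng.gauss(0, 1) for t in range(T)]
           for _ in range(10)]
gas = [sum(s[t] for s in signals) / len(signals) for t in range(T)]  # GAS
pc1 = first_pc_timecourse(signals)
print(abs(corr(pc1, gas)))  # close to 1 (the sign of a PC is arbitrary)
```

The high absolute correlation mirrors the paper's reported 0.97 average between the PC-based estimator and the GAS; regressing `pc1` out of each signal would then remove the global fluctuation without the anti-correlation artifact.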
Xian, George; Homer, Collin G.; Aldridge, Cameron L.
2012-01-01
An approach that can generate sagebrush habitat change estimates for monitoring large-area sagebrush ecosystems has been developed and tested in southwestern Wyoming, USA. This prototype method uses a satellite-based image change detection algorithm and regression models to estimate sub-pixel percentage cover for five sagebrush habitat components: bare ground, herbaceous, litter, sagebrush and shrub. Landsat images from three different months in 1988, 1996 and 2006 were selected to identify potential landscape change during these time periods using change vector (CV) analysis incorporated with an image normalization algorithm. Regression tree (RT) models were used to estimate percentage cover for five components on all change areas identified in 1988 and 1996, using unchanged 2006 baseline data as training for both estimates. Over the entire study area (24 950 km2), a net increase of 98.83 km2, or 0.7%, for bare ground was measured between 1988 and 2006. Over the same period, the other four components had net losses of 20.17 km2, or 0.6%, for herbaceous vegetation; 30.16 km2, or 0.7%, for litter; 32.81 km2, or 1.5%, for sagebrush; and 33.34 km2, or 1.2%, for shrubs. The overall accuracy for shrub vegetation change between 1988 and 2006 was 89.56%. Change patterns within sagebrush habitat components differ spatially and quantitatively from each other, potentially indicating unique responses by these components to disturbances imposed upon them.
Liu, Weijian; Wang, Yilong; Chen, Yuanchen; Tao, Shu; Liu, Wenxin
2017-07-01
The total concentrations and component profiles of polycyclic aromatic hydrocarbons (PAHs) in ambient air, surface soil and wheat grain collected from wheat fields near a large steel-smelting manufacturer in Northern China were determined. Based on the specific isomeric ratios of paired species in ambient air, principal component analysis and multivariate linear regression, the main emission source of local PAHs was identified as a mixture of industrial and domestic coal combustion, biomass burning and traffic exhaust. The total organic carbon (TOC) fraction was considerably correlated with the total and individual PAH concentrations in surface soil. The total concentrations of PAHs in wheat grain were relatively low, dominated by low-molecular-weight constituents, and the compositional profile was more similar to that in ambient air than in topsoil. Combined with the more significant results from partial correlation and linear regression models, this suggests that the contribution of air PAHs to grain PAHs may be greater than that of soil PAHs. Copyright © 2016. Published by Elsevier B.V.
Effect of Malmquist bias on correlation studies with IRAS data base
NASA Technical Reports Server (NTRS)
Verter, Frances
1993-01-01
The relationships between galaxy properties in the sample of Trinchieri et al. (1989) are reexamined with corrections for Malmquist bias. The linear correlations are tested and linear regressions are fit for log-log plots of L(FIR), L(H-alpha), and L(B), as well as ratios of these quantities. The linear correlations are corrected for Malmquist bias using the method of Verter (1988), in which each galaxy observation is weighted by the inverse of its sampling volume. The linear regressions are corrected for Malmquist bias by a new method, introduced here, in which each galaxy observation is weighted by its sampling volume. The results of the correlations and regressions are significantly changed in the anticipated sense: the corrected correlation confidences are lower, and the corrected slopes of the linear regressions are shallower. The elimination of Malmquist bias removes the nonlinear rise in luminosity that had led some authors to hypothesize additional components of FIR emission.
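In practice, the Verter-style correction reduces to weighting each galaxy by the inverse of its sampling volume (a 1/V_max weight) when computing sample statistics. A sketch of a volume-weighted correlation coefficient; the log-luminosities and sampling volumes below are hypothetical, not the Trinchieri et al. sample.

```python
import math

def weighted_corr(x, y, w):
    """Pearson correlation with per-object weights (here 1/V_max,
    down-weighting intrinsically bright objects sampled over large volumes)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

# Hypothetical log-luminosities and sampling volumes V_max
logL_fir = [9.0, 9.5, 10.0, 10.5, 11.0]
logL_b   = [9.2, 9.4, 10.1, 10.3, 11.2]
v_max    = [1.0, 3.0, 10.0, 30.0, 100.0]  # brighter objects: larger volumes
weights  = [1.0 / v for v in v_max]
print(weighted_corr(logL_fir, logL_b, weights))
```

Comparing this against the unweighted correlation (all weights equal) shows how the bias correction shifts both the strength of the correlation and, in a weighted regression, the fitted slope.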
Vindimian, Éric; Garric, Jeanne; Flammarion, Patrick; Thybaud, Éric; Babut, Marc
1999-10-01
The evaluation of the ecotoxicity of effluents requires a battery of biological tests on several species. In order to derive a summary parameter from such a battery, a single endpoint was calculated for all the tests: the EC10, obtained by nonlinear regression, with bootstrap evaluation of the confidence intervals. Principal component analysis was used to characterize and visualize the correlation between the tests. The table of the toxicity of the effluents was then submitted to a panel of experts, who classified the effluents according to the test results. Partial least squares (PLS) regression was used to fit the average value of the experts' judgements to the toxicity data, using a simple equation. Furthermore, PLS regression on partial data sets and other considerations resulted in an optimum battery, with two chronic tests and one acute test. The index is intended to be used for the classification of effluents based on their toxicity to aquatic species. Copyright © 1999 SETAC.
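An EC10 with bootstrap confidence intervals can be sketched in a few lines if the nonlinear regression is replaced by simple interpolation of the mean dose-response curve; the dilution series and replicate effect fractions below are invented, and real analyses would fit a parametric (e.g., log-logistic) model instead.

```python
import random, statistics

def ec10_interp(concs, effects):
    """Concentration producing a 10% effect, by linear interpolation between
    the two test concentrations that bracket the 10% level (a stand-in for
    the paper's nonlinear regression fit)."""
    pairs = list(zip(concs, effects))
    for (c0, e0), (c1, e1) in zip(pairs, pairs[1:]):
        if e0 <= 0.10 <= e1:
            return c0 + (0.10 - e0) * (c1 - c0) / (e1 - e0)
    raise ValueError("10% effect level not bracketed by tested concentrations")

def bootstrap_ec10_ci(concs, replicates, n_boot=500, seed=7):
    """Percentile bootstrap CI: resample replicate observations within each
    concentration and recompute the EC10 each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        means = [statistics.mean(rng.choices(reps, k=len(reps)))
                 for reps in replicates]
        estimates.append(ec10_interp(concs, means))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]

# Hypothetical effluent dilution series: effect fractions from 3 replicates each
concs = [1.0, 10.0, 100.0]
replicates = [[0.00, 0.02, 0.01], [0.18, 0.22, 0.20], [0.55, 0.65, 0.60]]
point = ec10_interp(concs, [statistics.mean(r) for r in replicates])
lo, hi = bootstrap_ec10_ci(concs, replicates)
print(point, lo, hi)
```

The same resampling loop works unchanged around any fitted model, which is how the paper attaches confidence intervals to its nonlinear-regression EC10s.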
Mathematical Modelling of Optimization of Structures of Monolithic Coverings Based on Liquid Rubbers
NASA Astrophysics Data System (ADS)
Turgumbayeva, R. Kh; Abdikarimov, M. N.; Mussabekov, R.; Sartayev, D. T.
2018-05-01
The paper considers the optimization of monolithic-coating compositions using a computer and MPE methods. The goal was to construct a mathematical model of the complete factorial experiment, taking into account its plan and conditions. Several regression equations were obtained. The dependence of the rubber parameters on the content of the components, as well as on the quantity of rubber crumb, was considered. An optimal composition for manufacturing the material of monolithic coatings was recommended on the basis of the experimental data.
B. Desta Fekedulegn; J.J. Colbert; R.R. Hicks, Jr.; Michael E. Schuckers
2002-01-01
The theory and application of principal components regression, a method for coping with multicollinearity among independent variables in analyzing ecological data, is exhibited in detail. A concrete example of the complex procedures that must be carried out in developing a diagnostic growth-climate model is provided. We use tree radial increment data taken from breast...
Smith, David V.; Utevsky, Amanda V.; Bland, Amy R.; Clement, Nathan; Clithero, John A.; Harsch, Anne E. W.; Carter, R. McKell; Huettel, Scott A.
2014-01-01
A central challenge for neuroscience lies in relating inter-individual variability to the functional properties of specific brain regions. Yet, considerable variability exists in the connectivity patterns between different brain areas, potentially producing reliable group differences. Using sex differences as a motivating example, we examined two separate resting-state datasets comprising a total of 188 human participants. Both datasets were decomposed into resting-state networks (RSNs) using a probabilistic spatial independent components analysis (ICA). We estimated voxelwise functional connectivity with these networks using a dual-regression analysis, which characterizes the participant-level spatiotemporal dynamics of each network while controlling for (via multiple regression) the influence of other networks and sources of variability. We found that males and females exhibit distinct patterns of connectivity with multiple RSNs, including both visual and auditory networks and the right frontal-parietal network. These results replicated across both datasets and were not explained by differences in head motion, data quality, brain volume, cortisol levels, or testosterone levels. Importantly, we also demonstrate that dual-regression functional connectivity is better at detecting inter-individual variability than traditional seed-based functional connectivity approaches. Our findings characterize robust—yet frequently ignored—neural differences between males and females, pointing to the necessity of controlling for sex in neuroscience studies of individual differences. Moreover, our results highlight the importance of employing network-based models to study variability in functional connectivity. PMID:24662574
Zaremba, Dario; Enneking, Verena; Meinert, Susanne; Förster, Katharina; Bürger, Christian; Dohm, Katharina; Grotegerd, Dominik; Redlich, Ronny; Dietsche, Bruno; Krug, Axel; Kircher, Tilo; Kugel, Harald; Heindel, Walter; Baune, Bernhard T; Arolt, Volker; Dannlowski, Udo
2018-02-08
Patients with major depression show reduced hippocampal volume compared to healthy controls. However, the contribution of patients' cumulative illness severity to hippocampal volume has rarely been investigated. It was the aim of our study to find a composite score of cumulative illness severity that is associated with hippocampal volume in depression. We estimated hippocampal gray matter volume using 3-tesla brain magnetic resonance imaging in 213 inpatients with acute major depression according to DSM-IV criteria (employing the SCID interview) and 213 healthy controls. Patients' cumulative illness severity was ascertained by six clinical variables via structured clinical interviews. A principal component analysis was conducted to identify components reflecting cumulative illness severity. Regression analyses and a voxel-based morphometry approach were used to investigate the influence of patients' individual component scores on hippocampal volume. Principal component analysis yielded two main components of cumulative illness severity: Hospitalization and Duration of Illness. While the component Hospitalization incorporated information from the intensity of inpatient treatment, the component Duration of Illness was based on the duration and frequency of illness episodes. We could demonstrate a significant inverse association of patients' Hospitalization component scores with bilateral hippocampal gray matter volume. This relationship was not found for Duration of Illness component scores. Variables associated with patients' history of psychiatric hospitalization seem to be accurate predictors of hippocampal volume in major depression and reliable estimators of patients' cumulative illness severity. Future studies should pay attention to these measures when investigating hippocampal volume changes in major depression.
Ghosh, Sudipta; Dosaev, Tasbulat; Prakash, Jai; Livshits, Gregory
2017-04-01
The major aim of this study was to conduct a comparative quantitative-genetic analysis of body composition (BCP) and somatotype (STP) variation, as well as their correlations with blood pressure (BP), in two ethnically, culturally and geographically different populations: the Santhal, an indigenous ethnic group from India, and the Chuvash, an indigenous population from Russia. Correspondingly, two pedigree-based samples were collected from 1,262 Santhal and 1,558 Chuvash individuals, respectively. At the first stage of the study, descriptive statistics and a series of univariate regression analyses were calculated. Finally, multiple and multivariate regression (MMR) analyses, with BP measurements as dependent variables and age, sex, BCP and STP as independent variables, were carried out in each sample separately. The significant and independent covariates of BP were identified and used for re-examination in pedigree-based variance decomposition analysis. Despite clear and significant differences between the populations in BCP/STP, both the Santhal and the Chuvash were found to be predominantly mesomorphic irrespective of sex. According to the MMR analyses, variation in BP depended significantly on age and the mesomorphic component in both samples, and in addition on sex, ectomorphy and fat mass index in the Santhal sample and on fat-free mass index in the Chuvash sample, respectively. An additive genetic component contributes to a substantial proportion of blood pressure and body composition variance. Variance component analysis, in addition to the above-mentioned results, suggests that additive genetic factors significantly influence BP and BCP/STP associations. © 2017 Wiley Periodicals, Inc.
Zhao, Ni; Chen, Jun; Carroll, Ian M.; Ringel-Kulka, Tamar; Epstein, Michael P.; Zhou, Hua; Zhou, Jin J.; Ringel, Yehuda; Li, Hongzhe; Wu, Michael C.
2015-01-01
High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Distance-based analysis is a popular strategy for evaluating the overall association between microbiome diversity and outcome, wherein the phylogenetic distance between individuals’ microbiome profiles is computed and tested for association via permutation. Despite their practical popularity, distance-based approaches suffer from important challenges, especially in selecting the best distance and extending the methods to alternative outcomes, such as survival outcomes. We propose the microbiome regression-based kernel association test (MiRKAT), which directly regresses the outcome on the microbiome profiles via the semi-parametric kernel machine regression framework. MiRKAT allows for easy covariate adjustment and extension to alternative outcomes while non-parametrically modeling the microbiome through a kernel that incorporates phylogenetic distance. It uses a variance-component score statistic to test for the association with analytical p value calculation. The model also allows simultaneous examination of multiple distances, alleviating the problem of choosing the best distance. Our simulations demonstrated that MiRKAT provides correctly controlled type I error and adequate power in detecting overall association. “Optimal” MiRKAT, which considers multiple candidate distances, is robust in that it suffers from little power loss in comparison to when the best distance is used and can achieve tremendous power gain in comparison to when a poor distance is chosen. Finally, we applied MiRKAT to real microbiome datasets to show that microbial communities are associated with smoking and with fecal protease levels after confounders are controlled for. PMID:25957468
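The variance-component score statistic at the heart of kernel association tests of this kind is Q = r'Kr, where r are the residuals from the null model and K is the kernel of pairwise similarities; significance can also be assessed by permutation when an analytic p value is not used. A toy sketch with an intercept-only null model and a block-structured kernel (both the kernel and the outcomes are invented, and MiRKAT itself uses phylogenetic-distance kernels and analytic p values):

```python
import random

def score_stat(y, K):
    """Variance-component score statistic Q = r' K r, with r the residuals
    from the null model (here intercept-only, so r = y - mean(y))."""
    n = len(y)
    ybar = sum(y) / n
    r = [v - ybar for v in y]
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))

def perm_pvalue(y, K, n_perm=500, seed=3):
    """Permutation p value: shuffle outcomes, recompute Q each time."""
    rng = random.Random(seed)
    q_obs = score_stat(y, K)
    hits = 0
    for _ in range(n_perm):
        yp = y[:]
        rng.shuffle(yp)
        if score_stat(yp, K) >= q_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical similarity kernel: two clusters of samples; outcome tracks cluster
n = 10
K = [[1.0 if (i < 5) == (j < 5) else 0.0 for j in range(n)] for i in range(n)]
y = [1.0, 1.2, 0.9, 1.1, 1.0, -1.0, -1.1, -0.9, -1.2, -1.0]
print(perm_pvalue(y, K))  # small: the outcome follows the kernel structure
```

Covariate adjustment replaces the intercept-only residuals with residuals from a regression of y on the confounders, which is exactly the "easy covariate adjustment" the framework provides.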
Coelho, Lúcia H G; Gutz, Ivano G R
2006-03-15
A chemometric method for the analysis of conductometric titration data was introduced to extend its applicability to lower concentrations and more complex acid-base systems. Auxiliary pH measurements were made during the titration to assist the calculation of the distribution of protonatable species on the basis of known or guessed equilibrium constants. Conductivity values of each ionized or ionizable species possibly present in the sample were introduced into a general equation in which the only unknown parameters were the total concentrations of (conjugated) bases and of strong electrolytes not involved in acid-base equilibria. All these concentrations were adjusted by a multiparametric nonlinear regression (NLR) method based on the Levenberg-Marquardt algorithm. This first conductometric titration method with NLR analysis (CT-NLR) was successfully applied to simulated conductometric titration data and to synthetic samples with multiple components at concentrations as low as those found in rainwater (approximately 10 micromol L(-1)). It was possible to resolve and quantify mixtures containing a strong acid, formic acid, acetic acid, ammonium ion, bicarbonate and an inert electrolyte with an accuracy of 5% or better.
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neuroscience using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for the analysis of less story-driven content. To optimize the linkage between our fMRI data, collected during viewing of the deliberately non-narrative silent film 'At Land' by Maya Deren (1944), and its annotated content, we combined elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time series were fitted with the time series of 36 binary-valued annotations and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression to avoid over-fitting due to the multicollinearity of the regressors; the results were compared against both partial least-squares (PLS) regression and un-regularized full-model regression. A non-parametric permutation-testing scheme was applied to evaluate the statistical significance of the regression. We found statistically significant correlation between the annotation model and 9 of the 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net-based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs.
Along with the ISC ranking methods, our regression analysis proved to be a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. In contrast to hypothesis-driven manual pre-selection and observation of individual regressors, which is biased by choice, the novelty of our method lies in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
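The elastic-net step described above can be sketched with scikit-learn. The data here are synthetic stand-ins, not the fMRI time series or annotations from the study; the dimensions (200 samples, 36 correlated regressors) are illustrative only:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n, p = 200, 36                      # 36 regressors, echoing the annotation count
base = rng.standard_normal((n, 6))
# Build a deliberately multicollinear design: 36 columns from 6 latent factors
X = base @ rng.standard_normal((6, p)) + 0.1 * rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.standard_normal(n)

# Cross-validated elastic net: l1_ratio mixes ridge (0) and lasso (1) penalties,
# so correlated regressors are shrunk jointly rather than over-fitted
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
r2 = model.score(X, y)
```

Cross-validation selects the penalty strength, which is what guards against the over-fitting the abstract mentions.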
Evolution of the Marine Officer Fitness Report: A Multivariate Analysis
This thesis explores the evaluation behavior of United States Marine Corps (USMC) Reporting Seniors (RSs) from 2010 to 2017. Using fitness report...RSs evaluate the performance of subordinate active component unrestricted officer MROs over time. I estimate logistic regression models of the...lowest. However, these correlations indicating the effects of race matching on FITREP evaluations narrow in significance when performance-based factors
An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.
ERIC Educational Resources Information Center
Kim, Sung-Ho
Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups, under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat faster tune on the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both the number of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual percent returns and in the presence of economic indicators.
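The EM-based mixture clustering idea can be illustrated with scikit-learn's generic GaussianMixture, which fits a Gaussian mixture by standard EM. This is a simplified stand-in for the paper's Gaussian autoregressive mixture and its APECM algorithm, using synthetic series generated from two different regression slopes:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)
# Two groups of short series with different regression lines on a common axis
t = np.linspace(0.0, 1.0, 12)
g1 = 1.0 + 3.0 * t + 0.1 * rng.standard_normal((40, 12))
g2 = 2.0 - 1.0 * t + 0.1 * rng.standard_normal((40, 12))
data = np.vstack([g1, g2])

# Standard EM fit of a two-component Gaussian mixture to the stacked series
gm = GaussianMixture(n_components=2, random_state=0).fit(data)
labels = gm.predict(data)
```

With well-separated groups the EM labels recover the two generating regressions exactly; the computational cost the abstract targets grows with dimension and component count.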
NASA Astrophysics Data System (ADS)
Liu, Yande; Ying, Yibin; Lu, Huishan; Fu, Xiaping
2004-12-01
This work evaluates the feasibility of Fourier transform near-infrared (FT-NIR) spectrometry for rapidly determining the total soluble solids content and acidity of apple fruit. Intact apple fruit were measured by reflectance FT-NIR in the 800-2500 nm range. FT-NIR models were developed based on partial least squares (PLS) regression and principal component regression (PCR) with respect to the reflectance and its first derivative, and the logarithm of the reciprocal of reflectance and its second derivative. These regression models related the FT-NIR spectra to soluble solids content (SSC), titratable acidity (TA), and available acidity (pH). The best combination, based on the prediction results, was the PLS model with respect to the logarithm of the reciprocal of reflectance. Predictions with PLS models resulted in standard errors of prediction (SEP) of 0.455, 0.044, and 0.068, and correlation coefficients of 0.968, 0.728, and 0.831 for SSC, TA, and pH, respectively. It was concluded that by using the FT-NIR spectrometry measurement system, in the appropriate spectral range, it is possible to nondestructively assess the maturity factors of apple fruit.
NASA Astrophysics Data System (ADS)
Gholizadeh, H.; Robeson, S. M.
2015-12-01
Empirical models have been widely used to estimate global chlorophyll content from remotely sensed data. Here, we focus on the standard NASA empirical models that use blue-green band ratios. These band-ratio ocean color (OC) algorithms take the form of fourth-order polynomials, and the parameters of these polynomials (i.e., coefficients) are estimated from the NASA bio-Optical Marine Algorithm Data set (NOMAD). Most of the points in this data set have been sampled from tropical and temperate regions. However, polynomial coefficients obtained from this data set are used to estimate chlorophyll content in all ocean regions, despite differing properties such as sea-surface temperature, salinity, and downwelling/upwelling patterns. Further, the polynomial terms in these models are highly correlated. In sum, the limitations of these empirical models are as follows: 1) the independent variables within the empirical models, in their current form, are correlated (multicollinear), and 2) current algorithms are global approaches based on a spatial stationarity assumption, so they are independent of location. The multicollinearity problem is resolved using partial least squares (PLS). PLS, which transforms the data into a set of independent components, can be considered a combination of principal component regression (PCR) and multiple regression. Geographically weighted regression (GWR) is also used to investigate the validity of the spatial stationarity assumption. GWR solves a regression model at each sample point using the observations within its neighbourhood. The PLS results show that the empirical method underestimates chlorophyll content in high latitudes, including the Southern Ocean region (see Figure 1). Cluster analysis of the GWR coefficients also shows that the spatial stationarity assumption in the empirical models is likely not valid.
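A minimal geographically weighted regression can be sketched directly in NumPy: a separate weighted least-squares fit at each sample location, with Gaussian distance weights. The data, bandwidth, and coefficient field below are hypothetical, not drawn from NOMAD:

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit a weighted least-squares model at each location; weights decay
    with distance via a Gaussian kernel (minimal GWR sketch)."""
    Xd = np.column_stack([np.ones(len(y)), X])          # add an intercept column
    betas = np.empty((len(y), Xd.shape[1]))
    for i, c in enumerate(coords):
        d2 = ((coords - c) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))        # Gaussian spatial kernel
        sw = np.sqrt(w)
        betas[i], *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return betas

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(80, 2))
x = rng.standard_normal(80)
slope = 1.0 + 0.2 * coords[:, 0]                        # spatially varying coefficient
y = slope * x + 0.05 * rng.standard_normal(80)
betas = gwr_coefficients(coords, x[:, None], y, bandwidth=2.0)
```

Spatial variation in the recovered local slopes is exactly what a clustering of GWR coefficients, as in the abstract, would reveal when stationarity fails.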
Monopole and dipole estimation for multi-frequency sky maps by linear regression
NASA Astrophysics Data System (ADS)
Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.
2017-01-01
We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
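Per sky patch, the T-T plot reduces to an ordinary linear regression between two frequency maps, where the intercept estimates the relative monopole and the slope the scaling between frequencies. A toy sketch with simulated maps (all values hypothetical):

```python
import numpy as np

# Two simulated "frequency maps" over the same sky pixels: map2 is a scaled
# version of map1 plus a monopole offset
rng = np.random.default_rng(3)
map1 = rng.standard_normal(5000)
true_gain, true_offset = 1.8, 0.7
map2 = true_gain * map1 + true_offset + 0.01 * rng.standard_normal(5000)

# T-T regression: slope = relative gain, intercept = relative monopole
gain, offset = np.polyfit(map1, map2, 1)
```

Repeating this regression locally in patches, as described above, is what separates the monopole/dipole terms from genuine spectral variation of the foreground.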
Tu, Yu-Kang; Krämer, Nicole; Lee, Wen-Chung
2012-07-01
In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.
Zhonggang, Liang; Hong, Yan
2006-10-01
A new method of calculating the fractal dimension of short-term heart rate variability (HRV) signals is presented. The method is based on the wavelet transform and filter banks. The method proceeds as follows: first, the fractal component is extracted from the HRV signal using the wavelet transform. Next, the power spectrum of the fractal component is estimated using an auto-regressive model, and the spectral exponent γ is estimated by the least-squares method. Finally, the fractal dimension of the HRV signal is computed from the formula D = 2 - (γ - 1)/2. To validate the stability and reliability of the proposed method, 24 fractal signals with a known fractal dimension of 1.6 were simulated using fractional Brownian motion; the results show that the method is stable and reliable.
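The final step is a direct evaluation of the stated formula; a trivial sketch:

```python
def fractal_dimension(gamma):
    """Fractal dimension from the spectral exponent: D = 2 - (gamma - 1) / 2."""
    return 2 - (gamma - 1) / 2

# A spectral exponent of 1.8 corresponds to D = 1.6, the dimension of the
# simulated validation signals
d = fractal_dimension(1.8)
```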
NASA Astrophysics Data System (ADS)
Fouad, Geoffrey; Skupin, André; Hope, Allen
2016-04-01
The flow duration curve (FDC) is one of the most widely used tools to quantify streamflow. Its percentile flows are often required for water resource applications, but these values must be predicted for ungauged basins with insufficient or no streamflow data. Regional regression is a commonly used approach for predicting percentile flows that involves identifying hydrologic regions and calibrating regression models to each region. The independent variables used to describe the physiographic and climatic setting of the basins are a critical component of regional regression, yet few studies have investigated their effect on resulting predictions. In this study, the complexity of the independent variables needed for regional regression is investigated. Different levels of variable complexity are applied for a regional regression consisting of 918 basins in the US. Both the hydrologic regions and regression models are determined according to the different sets of variables, and the accuracy of resulting predictions is assessed. The different sets of variables include (1) a simple set of three variables strongly tied to the FDC (mean annual precipitation, potential evapotranspiration, and baseflow index), (2) a traditional set of variables describing the average physiographic and climatic conditions of the basins, and (3) a more complex set of variables extending the traditional variables to include statistics describing the distribution of physiographic data and temporal components of climatic data. The latter set of variables is not typically used in regional regression, and is evaluated for its potential to predict percentile flows. The simplest set of only three variables performed similarly to the other more complex sets of variables. Traditional variables used to describe climate, topography, and soil offered little more to the predictions, and the experimental set of variables describing the distribution of basin data in more detail did not improve predictions. 
These results are largely reflective of cross-correlation existing in hydrologic datasets, and highlight the limited predictive power of many traditionally used variables for regional regression. A parsimonious approach including fewer variables chosen based on their connection to streamflow may be more efficient than a data mining approach including many different variables. Future regional regression studies may benefit from having a hydrologic rationale for including different variables and attempting to create new variables related to streamflow.
Tussing-Humphreys, Lisa; Thomson, Jessica L; Mayo, Tanyatta; Edmond, Emanuel
2013-06-06
Obesity, diabetes, and hypertension have reached epidemic levels in the largely rural Lower Mississippi Delta (LMD) region. We assessed the effectiveness of a 6-month, church-based diet and physical activity intervention, conducted during 2010 through 2011, for improving diet quality (measured by the Healthy Eating Index-2005) and increasing physical activity of African American adults in the LMD region. We used a quasi-experimental design in which 8 self-selected eligible churches were assigned to intervention or control. Assessments included dietary, physical activity, anthropometric, and clinical measures. Statistical tests for group comparisons included χ(2), Fisher's exact, and McNemar's tests for categorical variables, and mixed-model regression analysis for continuous variables and modeling intervention effects. Retention rates were 85% (176 of 208) for control and 84% (163 of 195) for intervention churches. Diet quality components, including total fruit, total vegetables, and total quality improved significantly in both control (mean [standard deviation], 0.3 [1.8], 0.2 [1.1], and 3.4 [9.6], respectively) and intervention (0.6 [1.7], 0.3 [1.2], and 3.2 [9.7], respectively) groups, while significant increases in aerobic (22%) and strength/flexibility (24%) physical activity indicators were apparent in the intervention group only. Regression analysis indicated that intervention participation level and vehicle ownership were significant positive predictors of change for several diet quality components. This church-based diet and physical activity intervention may be effective in improving diet quality and increasing physical activity of LMD African American adults. Components key to the success of such programs are participant engagement in educational sessions and vehicle access.
NASA Astrophysics Data System (ADS)
Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried
2018-03-01
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis (PCA), and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used covered a period of 36 years, from 1980 to 2015. Similar to the observed decrease (P < 0.001) in rice yield, pan evaporation, solar radiation, and wind speed declined significantly. Eight principal components exhibited an eigenvalue > 1 and explained 83.1% of the total variance of the predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in the rice yield data, and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first principal component (PC1) exhibited a clear long-term trend and, at times, short-term variance similar to that of rice yield. Short-term fluctuations of the PC1 scores are closely coupled to those of rice yield during the 1986-1993 and 2006-2013 periods, revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during the monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain-filling stages in the study area. The outcome is expected to provide a more in-depth, region-specific climate-rice linkage for screening better cultivars that can respond positively to future climate fluctuations, as well as information that may help optimize planting dates for improved radiation-use efficiency in the study area.
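The "SVM regression on the first principal component" step can be sketched as a scikit-learn pipeline. The climate matrix and yield series below are synthetic stand-ins built around one dominant mode, not the Nigerian data, and the SVR settings are arbitrary:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVR

rng = np.random.default_rng(8)
mode = rng.standard_normal(36)                    # dominant climate mode, 36 "years"
loadings = rng.uniform(0.5, 1.5, 10)              # how 10 climate variables load on it
climate = np.outer(mode, loadings) + 0.1 * rng.standard_normal((36, 10))
rice_yield = 2.0 + mode + 0.1 * rng.standard_normal(36)

# Standardize, project onto PC1, then fit a support vector regression on its scores
model = make_pipeline(StandardScaler(), PCA(n_components=1), SVR(C=10.0))
model.fit(climate, rice_yield)
r2 = model.score(climate, rice_yield)
```

When a single mode dominates the predictors, PC1 scores carry most of the usable signal, which is consistent with the abstract's 75% explained variance from PC1 alone.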
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics, and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research on the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive, and thus underdeveloped, in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using publicly available benchmark data sets.
Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija
2018-01-01
The present study is based on quantitative structure-activity relationship (QSAR) analysis of the binding affinity of quinacrine, pyridine dicarbonitrile, diphenylthiazole, and diphenyloxazole analogs toward human prion protein (huPrPC), applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression, and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor contributing to the binding affinity. Principal component analysis was used to reveal similarities or dissimilarities among the studied compounds. An analysis of in silico absorption, distribution, metabolism, excretion, and toxicity (ADMET) parameters was conducted. The studied analogs were ranked on the basis of their ADMET parameters by applying the sum of ranking differences, a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking proved to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.
Accounting for measurement error in log regression models with applications to accelerated testing.
Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M
2018-01-01
In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
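The approximation by a weighted regression model estimated with Iteratively Re-weighted Least Squares can be sketched generically. The Huber-style weight function and the data below are illustrative choices, not the weighting derived in the paper:

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Downweight observations with large residuals (illustrative choice)."""
    return np.minimum(1.0, c / np.maximum(np.abs(r), 1e-9))

def irls(X, y, weight_fn, n_iter=20):
    """Iteratively Re-weighted Least Squares: refit weighted least squares
    with weights recomputed from the current residuals (minimal sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = weight_fn(y - X @ beta)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(100), rng.standard_normal(100)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(100)
y[:5] += 15.0                              # a few gross errors in the response
beta = irls(X, y, huber_weights)
```

The same refit-with-updated-weights loop applies when the weights instead encode measurement error or an additive-versus-multiplicative error structure, as in the abstract.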
NASA Astrophysics Data System (ADS)
Yuniarto, Budi; Kurniawan, Robert
2017-03-01
PLS Path Modeling (PLS-PM) differs from covariance-based SEM in that it uses a variance- or component-based approach; PLS-PM is therefore also known as component-based SEM. Multiblock Partial Least Squares (MBPLS) is a PLS regression method that can be used in PLS Path Modeling, where it is known as Multiblock PLS Path Modeling (MBPLS-PM). This method uses an iterative procedure in its algorithm. This research aims to modify MBPLS-PM with a Back Propagation Neural Network approach. The result is that the MBPLS-PM algorithm can be modified using the Back Propagation Neural Network approach to replace the iterative process in the backward and forward steps for obtaining the matrices t and u. With this modification, the model parameters obtained are not significantly different from those obtained by the original MBPLS-PM algorithm.
Practical aspects of estimating energy components in rodents
van Klinken, Jan B.; van den Berg, Sjoerd A. A.; van Dijk, Ko Willems
2013-01-01
Recently there has been an increasing interest in exploiting computational and statistical techniques for the purpose of component analysis of indirect calorimetry data. Using these methods it becomes possible to dissect daily energy expenditure into its components and to assess the dynamic response of the resting metabolic rate (RMR) to nutritional and pharmacological manipulations. To perform robust component analysis, however, is not straightforward and typically requires the tuning of parameters and the preprocessing of data. Moreover the degree of accuracy that can be attained by these methods depends on the configuration of the system, which must be properly taken into account when setting up experimental studies. Here, we review the methods of Kalman filtering, linear, and penalized spline regression, and minimal energy expenditure estimation in the context of component analysis and discuss their results on high resolution datasets from mice and rats. In addition, we investigate the effect of the sample time, the accuracy of the activity sensor, and the washout time of the chamber on the estimation accuracy. We found that on the high resolution data there was a strong correlation between the results of Kalman filtering and penalized spline (P-spline) regression, except for the activity respiratory quotient (RQ). For low resolution data the basal metabolic rate (BMR) and resting RQ could still be estimated accurately with P-spline regression, having a strong correlation with the high resolution estimate (R2 > 0.997; sample time of 9 min). In contrast, the thermic effect of food (TEF) and activity related energy expenditure (AEE) were more sensitive to a reduction in the sample rate (R2 > 0.97). 
In conclusion, for component analysis on data generated by single channel systems with continuous data acquisition both Kalman filtering and P-spline regression can be used, while for low resolution data from multichannel systems P-spline regression gives more robust results. PMID:23641217
Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady
2017-09-01
Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging, and biomimetic sensors have gained popularity as rapid and efficient methods for assessing food quality, safety, and authentication, and as a sensible alternative to expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated by such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithm before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application, is presented that automates the procedure of identifying the best machine learning method for comparing data from several analytical techniques, to predict the counts of microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, up to 7 regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least squares regression, principal component regression, support vector regression, random forest, and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with electronic nose, HPLC, FT-IR, GC-MS, and multispectral imaging instruments. Populations of total viable counts, lactic acid bacteria, pseudomonads, Enterobacteriaceae, and B. thermosphacta were predicted. As a result, recommendations were obtained of which analytical platforms are suitable to predict each type of bacteria and which machine learning methods to use in each case. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Simulation Investigation of Principal Component Regression.
ERIC Educational Resources Information Center
Allen, David E.
Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…
NASA Astrophysics Data System (ADS)
Canitano, Alexandre; Hsu, Ya-Ju; Lee, Hsin-Ming; Linde, Alan T.; Sacks, Selwyn
2018-03-01
We propose an approach for calibrating the horizontal tidal shear components [differential extension (γ_1) and engineering shear (γ_2)] of two Sacks-Evertson (in Pap Meteorol Geophys 22:195-208, 1971) SES-3 borehole strainmeters installed in the Longitudinal Valley in eastern Taiwan. The method is based on the waveform reconstruction of the Earth and ocean tidal shear signals through linear regressions on strain gauge signals, with variable sensor azimuth. This method allows us to derive the orientation of the sensor without any initial constraints and to calibrate the shear strain components γ_1 and γ_2 against the M_2 tidal constituent. The results illustrate the potential of tensor strainmeters for recording horizontal tidal shear strain.
James R. Wallis
1965-01-01
Written in Fortran IV and MAP, this computer program can handle up to 120 variables, and retain 40 principal components. It can perform simultaneous regression of up to 40 criterion variables upon the varimax rotated factor weight matrix. The columns and rows of all output matrices are labeled by six-character alphanumeric names. Data input can be from punch cards or...
[Studies on the brand traceability of milk powder based on NIR spectroscopy technology].
Guan, Xiao; Gu, Fang-Qing; Liu, Jing; Yang, Yong-Jian
2013-10-01
The brand traceability of several different kinds of milk powder was studied by combining near-infrared diffuse reflectance spectroscopy with soft independent modeling of class analogy (SIMCA). The near-infrared spectra of 138 samples, including 54 Guangming, 43 Netherlands, 33 Nestle, and 8 Yili milk powder samples, were collected. After pretreatment of the full-spectrum data variables in the training set, principal component analysis was performed, and the cumulative variance contribution of the first three principal components was about 99.07%. A milk powder principal component regression model based on SIMCA was established and used to classify the milk powder samples in the prediction sets. The results showed that the recognition rates for Guangming, Netherlands, and Nestle milk powder were 78%, 75%, and 100%, and the rejection rates were 100%, 87%, and 88%, respectively. Therefore, near-infrared spectroscopy combined with a SIMCA model can classify milk powder with high accuracy and is a promising method for identifying milk powder variety.
Steiner, Genevieve Z.; Barry, Robert J.; Gonsalvez, Craig J.
2016-01-01
In oddball tasks, increasing the time between stimuli within a particular condition (target-to-target interval, TTI; nontarget-to-nontarget interval, NNI) systematically enhances N1, P2, and P300 event-related potential (ERP) component amplitudes. This study examined the mechanism underpinning these effects in ERP components recorded from 28 adults who completed a conventional three-tone oddball task. Bivariate correlations, partial correlations and multiple regression explored component changes due to preceding ERP component amplitudes and intervals found within the stimulus series, rather than constraining the task with experimentally constructed intervals, which has been adequately explored in prior studies. Multiple regression showed that for targets, N1 and TTI predicted N2, TTI predicted P3a and P3b, and Processing Negativity (PN), P3b, and TTI predicted reaction time. For rare nontargets, P1 predicted N1, NNI predicted N2, and N1 predicted Slow Wave (SW). Findings show that the mechanism is operating on separate stages of stimulus-processing, suggestive of either increased activation within a number of stimulus-specific pathways, or very long component generator recovery cycles. These results demonstrate the extent to which matching-stimulus intervals influence ERP component amplitudes and behavior in a three-tone oddball task, and should be taken into account when designing similar studies. PMID:27445774
Ma, Wan-Li; Sun, De-Zhi; Shen, Wei-Guo; Yang, Meng; Qi, Hong; Liu, Li-Yan; Shen, Ji-Min; Li, Yi-Fan
2011-07-01
A comprehensive sampling campaign was carried out to study the atmospheric concentration of polycyclic aromatic hydrocarbons (PAHs) in Beijing and to evaluate the effectiveness of source control strategies in reducing PAH pollution after the 29th Olympic Games. The sub-cooled liquid vapor pressure (logP(L)(o))-based model and the octanol-air partition coefficient (K(oa))-based model were applied to each seasonal dataset. Regression analysis among log K(P), logP(L)(o), and log K(oa) exhibited highly significant correlations for all four seasons. Source factors were identified by principal component analysis and their contributions were further estimated by multiple linear regression. Pyrogenic sources and coke oven emission were identified as major sources for both the non-heating and heating seasons. Compared with previously reported values, the mean PAH concentrations after the 29th Olympic Games were reduced by more than 60%, indicating that the source control measures were effective for reducing PAH pollution in Beijing. Copyright © 2011 Elsevier Ltd. All rights reserved.
Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui
2016-01-01
Phellinus is a genus of fungi known as an elemental component of drugs used to prevent cancer. To find optimized culture conditions for Phellinus production in the laboratory, numerous single-factor experiments were conducted, generating a large volume of experimental data. In this work, we use the data collected from these experiments for regression analysis, obtaining a mathematical model that predicts Phellinus production. Subsequently, a gene-set-based genetic algorithm is developed to optimize the values of the culture-condition parameters, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. The optimized parameter values accord with biological experimental results, indicating that our method has good predictive power for culture-condition optimization. PMID:27610365
Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro
2012-11-01
Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to residents of Arakawa Ward, Tokyo, aged 20-69 years. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, and two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks, and the two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.
Copyright © 2012 Elsevier Ltd. All rights reserved.
Soccer and sexual health education: a promising approach for reducing adolescent births in Haiti.
Kaplan, Kathryn C; Lewis, Judy; Gebrian, Bette; Theall, Katherine
2015-05-01
To explore the effect of an innovative, integrative program in female sexual reproductive health (SRH) and soccer (or fútbol, in Haitian Creole) in rural Haiti by measuring the rate of births among program participants 15-19 years old and their nonparticipant peers. A retrospective cohort study was conducted using 2006-2009 data from the computerized data-tracking system of the Haitian Health Foundation (HHF), a U.S.-based nongovernmental organization serving urban and rural populations in Haiti, to assess births among girls 15-19 years old who participated during 2006-2009 in HHF's GenNext program (n = 4,251). GenNext is a combined education-soccer program for youth, based on the SRH classes that HHF nurses and community workers had been conducting in Haiti for mothers, fathers, and youth, together with girl-centered health screenings and an all-female summer soccer league. Bivariate and multiple logistic regression analyses were carried out to assess differences in the rate of births among program participants according to their level of participation (SRH component only ("EDU") versus both the SRH and soccer components ("SO")) compared to their village peers who did not participate. Hazard ratios (HRs) of birth rates were estimated using Cox regression analysis of childbearing data for the three groups. In the multiple logistic regression analysis, only the girls in the "EDU" group had significantly fewer births than the nonparticipants after adjusting for confounders (odds ratio = 0.535; 95% confidence interval (CI) = 0.304, 0.940). The Cox regression analysis demonstrated that those in the EDU group (HR = 0.893; 95% CI = 0.802, 0.994) and, to a greater degree, those in the SO group (HR = 0.631; 95% CI = 0.558, 0.714) were significantly protected against childbearing between the ages of 15 and 19 years.
HHF's GenNext program demonstrates the effectiveness of utilizing nurse educators, community mobilization, and youth participation in sports, education, and structured youth groups to promote and sustain health for adolescent girls and young women.
Dirichlet Component Regression and its Applications to Psychiatric Data.
Gueorguieva, Ralitza; Rosenheck, Robert; Zelterman, Daniel
2008-08-15
We describe a Dirichlet multivariable regression method useful for modeling data representing components as a percentage of a total. This model is motivated by the unmet need in psychiatry and other areas to simultaneously assess the effects of covariates on the relative contributions of different components of a measure. The model is illustrated using the Positive and Negative Syndrome Scale (PANSS) for assessment of schizophrenia symptoms, which, like many other metrics in psychiatry, is composed of a sum of scores on several components, each, in turn, made up of sums of evaluations on several questions. We simultaneously examine the effects of baseline socio-demographic and co-morbid correlates on all of the components of the total PANSS score of patients from a schizophrenia clinical trial and identify variables associated with increasing or decreasing relative contributions of each component. Several definitions of residuals are provided. Diagnostics include measures of overdispersion, Cook's distance, and a local jackknife influence metric.
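The Dirichlet regression idea above can be sketched numerically. The following is a minimal maximum-likelihood fit in which each component's concentration parameter is log-linear in the covariates; the data, dimensions, and coefficient values are synthetic illustrations, not the PANSS analysis from the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)

# Toy composition data: n subjects, intercept + one covariate, K = 3 components
# whose proportions sum to 1 for each subject.
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([[1.0, 0.5], [1.0, -0.5], [1.0, 0.0]])   # K x p
alpha_true = np.exp(X @ beta_true.T)                          # n x K
Y = np.vstack([rng.dirichlet(a) for a in alpha_true])         # rows sum to 1

def neg_loglik(flat_beta):
    """Negative Dirichlet log-likelihood with alpha_ij = exp(x_i' beta_j)."""
    beta = flat_beta.reshape(K, X.shape[1])
    a = np.exp(np.clip(X @ beta.T, -20.0, 20.0))  # clip guards against overflow
    ll = (gammaln(a.sum(axis=1)) - gammaln(a).sum(axis=1)
          + ((a - 1.0) * np.log(Y)).sum(axis=1))
    return -ll.sum()

fit = minimize(neg_loglik, np.zeros(K * X.shape[1]), method="BFGS")
beta_hat = fit.x.reshape(K, X.shape[1])
```

With enough subjects, the fitted slopes recover which components' relative contributions rise or fall with the covariate, which is the interpretation the abstract emphasizes.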
Smith, David V; Utevsky, Amanda V; Bland, Amy R; Clement, Nathan; Clithero, John A; Harsch, Anne E W; McKell Carter, R; Huettel, Scott A
2014-07-15
A central challenge for neuroscience lies in relating inter-individual variability to the functional properties of specific brain regions. Yet, considerable variability exists in the connectivity patterns between different brain areas, potentially producing reliable group differences. Using sex differences as a motivating example, we examined two separate resting-state datasets comprising a total of 188 human participants. Both datasets were decomposed into resting-state networks (RSNs) using a probabilistic spatial independent component analysis (ICA). We estimated voxel-wise functional connectivity with these networks using a dual-regression analysis, which characterizes the participant-level spatiotemporal dynamics of each network while controlling for (via multiple regression) the influence of other networks and sources of variability. We found that males and females exhibit distinct patterns of connectivity with multiple RSNs, including both visual and auditory networks and the right frontal-parietal network. These results replicated across both datasets and were not explained by differences in head motion, data quality, brain volume, cortisol levels, or testosterone levels. Importantly, we also demonstrate that dual-regression functional connectivity is better at detecting inter-individual variability than traditional seed-based functional connectivity approaches. Our findings characterize robust, yet frequently ignored, neural differences between males and females, pointing to the necessity of controlling for sex in neuroscience studies of individual differences. Moreover, our results highlight the importance of employing network-based models to study variability in functional connectivity. Copyright © 2014 Elsevier Inc. All rights reserved.
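The two dual-regression stages described above can be sketched on synthetic data. The dimensions, noise level, and "group maps" below are invented for illustration; real pipelines operate on 4D fMRI volumes rather than flat arrays.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_vox, n_comp = 120, 500, 4

# Simulated group-level spatial maps (components x voxels) and one subject's
# data generated from them plus noise.
group_maps = rng.normal(size=(n_comp, n_vox))
true_tc = rng.normal(size=(n_time, n_comp))           # subject time courses
data = true_tc @ group_maps + 0.1 * rng.normal(size=(n_time, n_vox))

# Stage 1: spatial regression -- for each volume, regress the voxel pattern on
# the group maps to estimate subject-specific network time courses.
tc, *_ = np.linalg.lstsq(group_maps.T, data.T, rcond=None)
tc = tc.T                                             # time x components

# Stage 2: temporal regression -- regress each voxel's series on all time
# courses jointly (thus controlling for the other networks, as the abstract
# notes) to obtain subject-specific spatial maps.
subj_maps, *_ = np.linalg.lstsq(tc, data, rcond=None)  # components x voxels
```

Because all networks enter each regression simultaneously, shared variance is partitioned rather than double-counted, which is the key difference from seed-based connectivity.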
Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos
2013-01-01
In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and optimize model parameters. Predicted cognitive scores using our regression algorithm based on the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions with LTM showing stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal, and occipito-temporal areas. The PET modality had higher contribution to most cognitive domains except manipulation, which had higher WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive status.
A Nonlinear Model for Gene-Based Gene-Environment Interaction.
Sa, Jian; Liu, Xu; He, Tao; Liu, Guifen; Cui, Yuehua
2016-06-04
A vast amount of literature has confirmed the role of gene-environment (G×E) interaction in the etiology of complex human diseases. Traditional methods are predominantly focused on the analysis of interaction between a single nucleotide polymorphism (SNP) and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single SNP effects) are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single-variant approaches, in this work we proposed a sparse principal component regression (sPCR) model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for SNPs in a gene; then the effect of each principal component was modeled by a varying-coefficient (VC) model. The model can jointly model variants in a gene whose effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR) model has a nice interpretation property, since the sparsity of the principal component loadings indicates the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset in a Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction method and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a systems approach to evaluate gene-based G×E interaction.
Classical Testing in Functional Linear Models.
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression (Wald, score, likelihood ratio, and F tests) to functional linear regression for testing the null hypothesis that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155
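The re-expression the abstract describes (truncate the FPCA expansion, then apply a classical test to the scores) can be sketched as follows. The curves, coefficient function, and truncation level k = 3 are illustrative assumptions; only the F test of the four is shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 150
grid = np.linspace(0.0, 1.0, 50)

# Functional covariates: randomly phased sinusoids observed on a dense grid;
# the scalar response is the integral of X(t) against beta(t) = sin(2*pi*t).
phase = rng.uniform(size=n)
X = (np.sin(2 * np.pi * (grid[None, :] + phase[:, None]))
     + 0.3 * rng.normal(size=(n, grid.size)))
beta_fun = np.sin(2 * np.pi * grid)
y = X @ beta_fun / grid.size + 0.2 * rng.normal(size=n)

# FPCA via SVD of the centered curves; keep the leading k scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
scores = Xc @ Vt[:k].T

# Re-expressed standard linear model; classical F test of "no association".
Z = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
rss1 = ((y - Z @ coef) ** 2).sum()          # residual SS under the alternative
rss0 = ((y - y.mean()) ** 2).sum()          # residual SS under the null
F = ((rss0 - rss1) / k) / (rss1 / (n - k - 1))
pval = stats.f.sf(F, k, n - k - 1)
```

Under the null, F follows an F(k, n-k-1) distribution; with the simulated association present, the test rejects decisively.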
A Semiparametric Change-Point Regression Model for Longitudinal Observations.
Xing, Haipeng; Ying, Zhiliang
2012-12-01
Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when the experimental environment undergoes abrupt changes, or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in the regression coefficients. In this connection, we propose a semiparametric change-point regression model, in which the error process (stochastic component) is nonparametric, the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations, and magnitudes of the change-points are unknown and need to be estimated. We further develop an estimation procedure that combines recent advances in semiparametric analysis based on counting-process arguments with multiple change-point inference, and discuss its large-sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.
Rein, Thomas R; Harvati, Katerina; Harrison, Terry
2015-01-01
Uncovering links between skeletal morphology and locomotor behavior is an essential component of paleobiology because it allows researchers to infer the locomotor repertoire of extinct species based on preserved fossils. In this study, we explored ulnar shape in anthropoid primates using 3D geometric morphometrics to discover novel aspects of shape variation that correspond to observed differences in the relative amount of forelimb suspensory locomotion performed by species. The ultimate goal of this research was to construct an accurate predictive model that can be applied to infer the significance of these behaviors. We studied ulnar shape variation in extant species using principal component analysis. Species mainly clustered into phylogenetic groups along the first two principal components. Upon closer examination, the results showed that the position of species within each major clade corresponded closely with the proportion of forelimb suspensory locomotion that they have been observed to perform in nature. We used principal component regression to construct a predictive model for the proportion of these behaviors that would be expected to occur in the locomotor repertoire of anthropoid primates. We then applied this regression analysis to Pliopithecus vindobonensis, a stem catarrhine from the Miocene of central Europe, and found strong evidence that this species was adapted to perform a proportion of forelimb suspensory locomotion similar to that observed in the extant woolly monkey, Lagothrix lagothricha. Copyright © 2014 Elsevier Ltd. All rights reserved.
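The principal component regression workflow described above (PCA of shape variables, then regression of a behavioral proportion on the leading scores, then prediction for a new specimen) can be sketched on synthetic data. The "species", shape variables, and suspensory proportions below are invented stand-ins for the 3D geometric morphometric data in the study.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for ulnar shape data: 30 "species" x 10 shape variables driven
# by one latent shape axis, plus an observed forelimb-suspension proportion.
n, p = 30, 10
latent = rng.normal(size=n)
shapes = np.outer(latent, rng.normal(size=p)) + 0.2 * rng.normal(size=(n, p))
suspensory = 0.5 + 0.1 * latent + 0.02 * rng.normal(size=n)

# PCA of the centered shape variables; species scores on the leading k axes.
mu = shapes.mean(axis=0)
U, s, Vt = np.linalg.svd(shapes - mu, full_matrices=False)
k = 2
scores = (shapes - mu) @ Vt[:k].T

# Principal component regression: regress the behavioral proportion on the
# leading PC scores; a fossil's shape can then be projected onto the same
# axes and its behavior predicted from the fitted coefficients.
Z = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(Z, suspensory, rcond=None)
pred = Z @ coef
r2 = 1.0 - ((suspensory - pred) ** 2).sum() / ((suspensory - suspensory.mean()) ** 2).sum()
```

Applying the fitted model to a fossil requires only its projection onto the extant-taxa PC axes, which is how the study places Pliopithecus within the extant range.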
NASA Technical Reports Server (NTRS)
Siconolfi, Steven F. (Inventor)
2000-01-01
Method and apparatus are described for determining volumes of body fluids in a subject using bioelectrical response spectroscopy. The human body is represented using an electrical circuit. Intra-cellular water is represented by a resistor in series with a capacitor; extra-cellular water is represented by a resistor in series with two parallel inductors. The parallel inductors represent the resistance due to vascular fluids. An alternating, low-amperage, multifrequency signal is applied to determine a subject's impedance and resistance. From these data, statistical regression is used to determine a 1% impedance, where the subject's impedance changes by no more than 1% over a 25 kHz interval. Circuit components of the human-body circuit are determined based on the 1% impedance. Equations for calculating total body water, extra-cellular water, total blood volume, and plasma volume are developed based on the circuit components.
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices play a significant role in the global economy and are a key input into option-pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries at different scales. Then, principal component analysis (PCA) is used to process the subseries data in the MLR model for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time-series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.
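The hybrid pipeline above can be sketched with heavy hedging: the paper uses the Mallat wavelet transform and PSO-tuned parameters, whereas this sketch substitutes a one-level Haar split for the wavelet decomposition and ordinary least squares for PSO, and the "price" series is simulated rather than real WTI data.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated daily price series (a toy stand-in for WTI crude oil prices).
t = np.arange(512)
price = 50 + 10 * np.sin(t / 40) + np.cumsum(0.2 * rng.normal(size=t.size))

# One-level Haar decomposition: an "approximation" (trend) subseries and a
# "detail" (fluctuation) subseries, each half the original length.
approx = (price[0::2] + price[1::2]) / 2
detail = (price[0::2] - price[1::2]) / 2

# Build lagged features from both subseries, compress them with PCA, and fit
# a multiple linear regression to forecast the next approximation value.
lags = 4
Xf = np.column_stack(
    [approx[i:len(approx) - lags + i] for i in range(lags)]
    + [detail[i:len(detail) - lags + i] for i in range(lags)])
y = approx[lags:]

Xc = Xf - Xf.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
scores = Xc @ Vt[:k].T

Z = np.column_stack([np.ones(len(y)), scores])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
pred = Z @ coef
r2 = 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

The design choice mirrors the abstract: decompose, reduce with PCA, then regress, so that each scale contributes features without inflating the regression's dimensionality.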
Bio-inspired adaptive feedback error learning architecture for motor control.
Tolu, Silvia; Vanegas, Mauricio; Luque, Niceto R; Garrido, Jesús A; Ros, Eduardo
2012-10-01
This study proposes an adaptive control architecture based on an accurate regression method called Locally Weighted Projection Regression (LWPR) and on a bio-inspired module, namely a cerebellar-like engine. This hybrid architecture takes full advantage of the machine learning module (LWPR kernel) to abstract an optimized representation of the sensorimotor space, while the cerebellar component integrates this representation to generate corrective terms in the framework of a control task. Furthermore, we illustrate how the use of a simple adaptive error feedback term makes it possible to use the proposed architecture even in the absence of an accurate analytic reference model. The presented approach achieves accurate control with low-gain corrective terms (suitable for compliant control schemes). We evaluate the contribution of the different components of the proposed scheme by comparing the obtained performance with alternative approaches. Then, we show that the presented architecture can be used for accurate manipulation of different objects when their physical properties are not directly known by the controller. We evaluate how the scheme scales for simulated plants with a high number of degrees of freedom (7 DOFs).
Scampicchio, Matteo; Mimmo, Tanja; Capici, Calogero; Huck, Christian; Innocente, Nadia; Drusch, Stephan; Cesco, Stefano
2012-11-14
Stable isotope values were used to develop a new analytical approach enabling the simultaneous identification of milk samples either processed with different heating regimens or from different geographical origins. The samples consisted of raw, pasteurized (HTST), and ultrapasteurized (UHT) milk of different Italian origins. The approach consisted of the analysis of the isotope ratios δ¹³C and δ¹⁵N for the milk samples and their fractions (fat, casein, and whey). The main finding of this work is that, as heat processing affects the composition of the milk fractions, changes in δ¹³C and δ¹⁵N were also observed. These changes were used as markers to develop pattern recognition maps based on principal component analysis and supervised classification models, such as linear discriminant analysis (LDA), multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS). The results give proof of concept that isotope-ratio mass spectrometry can discriminate simultaneously between milk samples according to their geographical origin and type of processing.
Savary, Serge; Delbac, Lionel; Rochas, Amélie; Taisant, Guillaume; Willocquet, Laetitia
2009-08-01
Dual epidemics are defined as epidemics developing on two or several plant organs in the course of a cropping season. Agricultural pathosystems where such epidemics develop are often very important, because the harvestable part is one of the organs affected. These epidemics also are often difficult to manage, because the linkage between epidemiological components occurring on different organs is poorly understood, and because prediction of the risk toward the harvestable organs is difficult. In the case of downy mildew (DM) and powdery mildew (PM) of grapevine, nonlinear modeling and logistic regression indicated nonlinearity in the foliage-cluster relationships. Nonlinear modeling enabled the parameterization of a transmission coefficient that numerically links the two components, leaves and clusters, in DM and PM epidemics. Logistic regression analysis yielded a series of probabilistic models that enabled predicting preset levels of cluster infection risks based on DM and PM severities on the foliage at successive crop stages. The usefulness of this framework for tactical decision-making for disease control is discussed.
Barba, Lida; Sánchez-Macías, Davinia; Barba, Iván; Rodríguez, Nibaldo
2018-06-01
Guinea pig meat consumption is increasing exponentially worldwide. Evaluating the contribution of carcass components to carcass quality can allow estimation of the value added to foods of animal origin and make research in guinea pigs more practicable. The aim of this study was to propose a methodology for modelling the contribution of different carcass components to the overall carcass quality of guinea pigs by using non-invasive pre- and post-mortem carcass measurements. Predictors were selected through correlation analysis and statistical significance, whereas the prediction models were based on multiple linear regression. Predictions were more accurate when carcass component contributions were expressed in grams rather than as a percentage of carcass quality components. The proposed prediction models can be useful for the guinea pig meat industry and research institutions, using non-invasive, time- and cost-efficient carcass component measuring techniques. Copyright © 2018 Elsevier Ltd. All rights reserved.
Cespedes, Elizabeth M.; Horan, Christine M.; Gillman, Matthew W.; Gortmaker, Steven L.; Price, Sarah; Rifas-Shiman, Sheryl L.; Mitchell, Kathleen; Taveras, Elsie M.
2014-01-01
Objective To evaluate the High Five for Kids intervention effect on television (TV) within subgroups, examine participant characteristics associated with process measures and assess perceived helpfulness of TV intervention components. Method High Five (RCT of 445 overweight/obese 2–7 year-olds in Massachusetts [2006–2008]) reduced TV by 0.36 hours/day. 1-year effects on TV, stratified by subgroup, were assessed using linear regression. Among intervention participants (n=253), associations of intervention component helpfulness with TV reduction were examined using linear regression, and associations of participant characteristics with processes linked to TV reduction (choosing TV and completing intervention visits) were examined using logistic regression. Results High Five reduced TV across subgroups. Parents of Latino (v. white) children had lower odds of completing >=2 study visits (OR 0.39 [95% CI: 0.18, 0.84]). Parents of black (v. white) children had higher odds of choosing TV (OR: 2.23 [95% CI: 1.08, 4.59]), as did parents of obese (v. overweight) children and children watching >=2 hours/day (v. <2) at baseline. Greater perceived helpfulness was associated with greater TV reduction. Conclusion Clinic-based motivational interviewing reduces TV in children. Low-cost education approaches (e.g., printed materials) may be well received. Parents of children at higher obesity risk could be more motivated to reduce TV. PMID:24518002
Duarte-Tagles, Héctor; Salinas-Rodríguez, Aarón; Idrovo, Álvaro J; Búrquez, Alberto; Corral-Verdugo, Víctor
2015-08-01
Depression is a highly prevalent illness among adults, and it is the second most frequently reported mental disorder in urban settings in México. Exposure to natural environments and its components may improve the mental health of the population. To evaluate the association between biodiversity indicators and the prevalence of depressive symptoms among the adult population (20 to 65 years of age) in México. Information from the Encuesta Nacional de Salud y Nutrición 2006 (ENSANUT 2006) and the Compendio de Estadísticas Ambientales 2008 was analyzed. A biodiversity index was constructed based on the species richness and ecoregions in each state. A multilevel logistic regression model was built with random intercepts and a multiple logistic regression was generated with clustering by state. The factors associated with depressive symptoms were being female, self-perceived as indigenous, lower education level, not living with a partner, lack of steady paid work, having a chronic illness and drinking alcohol. The biodiversity index was found to be inversely associated with the prevalence of depressive symptoms when defined as a continuous variable, and the results from the regression were grouped by state (OR=0.71; 95% CI = 0.59-0.87). Although the design was cross-sectional, this study adds to the evidence of the potential benefits to mental health from contact with nature and its components.
Söhn, Matthias; Alber, Markus; Yan, Di
2007-09-01
The variability of dose-volume histogram (DVH) shapes in a patient population can be quantified using principal component analysis (PCA). We applied this to rectal DVHs of prostate cancer patients and investigated the correlation of the PCA parameters with late bleeding. PCA was applied to the rectal wall DVHs of 262 patients who had been treated with a four-field box conformal adaptive radiotherapy technique. The correlated changes in the DVH pattern were revealed as "eigenmodes," which were ordered by their importance in representing data set variability. Each DVH is uniquely characterized by its principal components (PCs). The correlation of the first three PCs with chronic rectal bleeding of Grade 2 or greater was investigated with uni- and multivariate logistic regression analyses. Rectal wall DVHs in four-field conformal RT can primarily be represented by the first two or three PCs, which describe approximately 94% or 96% of the DVH shape variability, respectively. The first eigenmode models the total irradiated rectal volume; thus, PC1 correlates with the mean dose. Mode 2 describes the interpatient differences of the relative rectal volume in the two- or four-field overlap region. Mode 3 reveals correlations of volumes with intermediate doses (approximately 40-45 Gy) and volumes with doses >70 Gy; thus, PC3 is associated with the maximal dose. According to univariate logistic regression analysis, only PC2 correlated significantly with toxicity. However, multivariate logistic regression analysis with the first two or three PCs revealed an increased probability of bleeding for DVHs with more than one large PC. PCA can reveal the correlation structure of DVHs for a patient population as imposed by the treatment technique and provide information about its relationship to toxicity. It proves useful for augmenting normal tissue complication probability modeling approaches.
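The PCA-of-DVHs construction can be sketched as follows. The one-parameter exponential DVH family and the patient count are invented stand-ins for real rectal-wall DVHs; the sketch only illustrates how eigenmodes and per-patient PCs arise, and why the first PC tracks mean dose.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy cumulative DVHs: fractional volume receiving at least dose D, for 100
# "patients", each with a patient-specific dose falloff scale.
dose = np.linspace(0.0, 80.0, 81)
scale = rng.uniform(20.0, 50.0, size=100)
dvh = np.exp(-dose[None, :] / scale[:, None])        # 100 x 81, V(0) = 1

# PCA: eigenmodes of the centered DVH set; each patient's scores are its PCs.
mean_dvh = dvh.mean(axis=0)
U, s, Vt = np.linalg.svd(dvh - mean_dvh, full_matrices=False)
eigenmodes = Vt                                      # ordered by variance explained
pcs = (dvh - mean_dvh) @ Vt.T
explained = s ** 2 / (s ** 2).sum()

# For a cumulative DVH, the area under V(D) equals the mean dose, so PC1
# should track mean dose closely (as the abstract reports for mode 1).
mean_dose = dvh.sum(axis=1) * (dose[1] - dose[0])
corr = np.corrcoef(pcs[:, 0], mean_dose)[0, 1]
```

Each patient's DVH is then mean_dvh plus its PC-weighted sum of eigenmodes, so a handful of PCs can feed a logistic regression on toxicity exactly as described above.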
Analysis of Forest Foliage Using a Multivariate Mixture Model
NASA Technical Reports Server (NTRS)
Hlavka, C. A.; Peterson, David L.; Johnson, L. F.; Ganapol, B.
1997-01-01
Data with wet chemical measurements and near infrared spectra of ground leaf samples were analyzed to test a multivariate regression technique, based on a linear mixture model for absorbance, for estimating component spectra. The resulting unmixed spectra for carbohydrates, lignin, and protein resemble the spectra of extracted plant starches, cellulose, lignin, and protein. The unmixed protein spectrum has prominent absorption features at wavelengths that have been associated with nitrogen bonds.
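The absorbance mixture model behind this unmixing can be illustrated with a small numpy sketch. The "pure" spectra, band positions, and chemical weights below are invented for illustration, not measured plant chemistry; the point is that with known composition weights, component spectra fall out of a multivariate least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pure-component absorbance spectra (Gaussian bands) for
# "carbohydrate", "lignin" and "protein"; wavelengths are arbitrary.
wl = np.linspace(1100, 2500, 200)

def band(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

pure = np.stack([band(1450, 60) + band(2100, 80),   # carbohydrate-like
                 band(1680, 50) + band(2270, 70),   # lignin-like
                 band(2050, 40) + band(2180, 60)])  # protein-like

# Linear mixture model: each sample's absorbance is a weighted sum of the
# pure spectra plus noise; weights play the role of chemical fractions.
n = 40
weights = rng.dirichlet([2, 2, 2], size=n)
spectra = weights @ pure + rng.normal(0, 0.005, (n, len(wl)))

# With the wet-chemistry weights known, the component spectra are
# recovered by multivariate linear regression (least squares),
# mirroring the unmixing idea in the abstract.
est_pure, *_ = np.linalg.lstsq(weights, spectra, rcond=None)

err = np.abs(est_pure - pure).max()
print("max absolute error of unmixed spectra:", err)
```

Each row of `est_pure` is an "unmixed" component spectrum, analogous to the recovered carbohydrate, lignin, and protein spectra in the paper.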
NASA Astrophysics Data System (ADS)
Bigdeli, Behnaz; Pahlavani, Parham
2017-01-01
Interpretation of synthetic aperture radar (SAR) data is difficult because the geometry and spectral range of SAR differ from those of optical imagery. At the same time, SAR imaging can complement multispectral (MS) optical remote sensing because it does not depend on solar illumination and weather conditions. This study presents a multisensor fusion of SAR and MS data based on the use of classification and regression tree (CART) and support vector machine (SVM) classifiers through a decision fusion system. First, different feature extraction strategies were applied to the SAR and MS data to produce more spectral and textural information. To overcome redundancy and correlation between features, an intrinsic dimension estimation method based on the noise-whitened Harsanyi, Farrand, and Chang approach determines the proper dimension of the features. Then, principal component analysis and independent component analysis were utilized on the stacked feature space of the two data sets. Afterward, SVM and CART classified each reduced feature space. Finally, a fusion strategy was utilized to fuse the classification results. To show the effectiveness of the proposed methodology, single classification on each data set was compared to the obtained results. A coregistered Radarsat-2 and WorldView-2 data set from San Francisco, USA, was available to examine the effectiveness of the proposed method. The results show that combining SAR data with optical sensor data based on the proposed methodology improves the classification results for most of the classes. The proposed fusion method achieved overall accuracies of approximately 93.24% and 95.44% for two different areas of the data.
Asano, Junichi; Hirakawa, Akihiro; Hamada, Chikuma
2014-01-01
A cure rate model is a survival model incorporating the cure rate under the assumption that the population contains both uncured and cured individuals. It is a powerful statistical tool for prognostic studies, especially in cancer. The cure rate is important for making treatment decisions in clinical practice. The proportional hazards (PH) cure model can predict the cure rate for each patient; it contains a logistic regression component for the cure rate and a Cox regression component to estimate the hazard for uncured patients. A measure for quantifying the predictive accuracy of the cure rate estimated by the Cox PH cure model is required, as there has been a lack of previous research in this area. We used the Cox PH cure model for breast cancer data; however, the area under the receiver operating characteristic curve (AUC) could not be estimated because many patients were censored. In this study, we used imputation-based AUCs to assess the predictive accuracy of the cure rate from the PH cure model. We examined the precision of these AUCs using simulation studies. The results demonstrated that the imputation-based AUCs were estimable and their biases were negligibly small in many cases, although the ordinary AUC could not be estimated. Additionally, we introduced a bias-correction method for the imputation-based AUCs and found that the bias-corrected estimate successfully compensated for the overestimation in the simulation studies. We also illustrated the estimation of the imputation-based AUCs using breast cancer data. Copyright © 2014 John Wiley & Sons, Ltd.
A guide to understanding meta-analysis.
Israel, Heidi; Richter, Randy R
2011-07-01
With the focus on evidence-based practice in healthcare, a well-conducted systematic review that includes a meta-analysis where indicated represents a high level of evidence for treatment effectiveness. The purpose of this commentary is to assist clinicians in understanding meta-analysis as a statistical tool, using both published articles and explanations of the components of the technique. We describe what meta-analysis is, what heterogeneity is and how it affects meta-analysis, effect size, the modeling techniques of meta-analysis, and the strengths and weaknesses of meta-analysis. Also covered are common components such as forest plot interpretation and the software that may be used; special cases for meta-analysis, such as subgroup analysis, individual patient data, and meta-regression; and a discussion of criticisms.
Factors Associated with Clinician Participation in TF-CBT Post-workshop Training Components.
Pemberton, Joy R; Conners-Burrow, Nicola A; Sigel, Benjamin A; Sievers, Chad M; Stokes, Lauren D; Kramer, Teresa L
2017-07-01
For proficiency in an evidence-based treatment (EBT), mental health professionals (MHPs) need training activities extending beyond a one-time workshop. Using data from 178 MHPs participating in a statewide TF-CBT dissemination project, we used five variables assessed at the workshop, via multiple linear and logistic regression, to predict participation in three post-workshop training components. Perceived in-workshop learning and client-treatment mismatch were predictive of consultation call participation and case presentation, respectively. Attitudes toward EBTs were predictive of trauma assessment utilization, although only with non-call participants removed from the analysis. Productivity requirements and confidence in TF-CBT skills were not associated with participation in post-workshop activities.
Regression and multivariate models for predicting particulate matter concentration level.
Nazif, Amina; Mohammed, Nurul Izma; Malakahmad, Amirhossein; Abualqumboz, Motasem S
2018-01-01
The devastating health effects of particulate matter (PM10) exposure on susceptible populations have made it necessary to evaluate PM10 pollution. Meteorological parameters and seasonal variation increase PM10 concentration levels, especially in areas with multiple anthropogenic activities. Hence, stepwise regression (SR), multiple linear regression (MLR) and principal component regression (PCR) analyses were used to analyse daily average PM10 concentration levels. The analyses were carried out using daily average PM10 concentration, temperature, humidity, wind speed and wind direction data from 2006 to 2010. The data were from an industrial air quality monitoring station in Malaysia. The SR analysis established that meteorological parameters had limited influence on PM10 concentration levels, with coefficients of determination (R2) between 23% and 29% for the seasoned and unseasoned analyses. The prediction analysis showed that the PCR models had better R2 results than the MLR models: based on both seasoned and unseasoned data, the MLR models had R2 values from 0.50 to 0.60, while the PCR models had R2 values from 0.66 to 0.89. In addition, validation against 2016 data also confirmed that the PCR model outperformed the MLR model, with the PCR model for the seasoned analysis performing best. These analyses will aid in achieving sustainable air quality management strategies.
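The MLR-versus-PCR comparison can be sketched as follows. The meteorological values, coefficients, and noise level below are invented, so only the mechanics mirror the study: ordinary least squares on the raw predictors versus least squares on the leading principal components of the standardised predictors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for daily meteorology: temperature, humidity,
# wind speed and wind direction (values and coefficients are invented).
n = 500
temp = rng.normal(28, 3, n)
hum = 0.7 * temp + rng.normal(0, 2, n)      # correlated with temperature
wind = rng.gamma(2.0, 1.5, n)
wdir = rng.uniform(0, 360, n)
X = np.column_stack([temp, hum, wind, wdir])
pm10 = 40 + 1.2 * temp + 0.8 * hum - 2.0 * wind + rng.normal(0, 5, n)

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Multiple linear regression (MLR): ordinary least squares on raw predictors.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, pm10, rcond=None)
r2_mlr = r2(pm10, A @ beta)

# Principal component regression (PCR): regress on the leading PCs of the
# standardised predictors, which handles their collinearity.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = Z @ Vt[:3].T                           # keep 3 of 4 components
B = np.column_stack([np.ones(n), pcs])
gamma, *_ = np.linalg.lstsq(B, pm10, rcond=None)
r2_pcr = r2(pm10, B @ gamma)

print(r2_mlr, r2_pcr)
```

Because the retained PCs span a subspace of the full predictor space, the in-sample R2 of MLR is at least that of PCR; PCR's advantage in the study shows up out of sample, where discarding noisy, collinear directions stabilises the fit.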
Kendrick, Sarah K; Zheng, Qi; Garbett, Nichola C; Brock, Guy N
2017-01-01
Differential scanning calorimetry (DSC) is used to determine thermally induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to DSC data from individuals in the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and to use functional data analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves. Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched to diseased individuals on sex, race, and age. First, functional regression with a functional response (the thermogram) and a categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates, including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with osteoarthritis and rheumatoid arthritis but without lupus was also calculated to determine the ability of the classifier to differentiate between lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test sets for evaluation of predictions. Finally, derivatives of the thermogram curves were included in the models to determine whether they aided in prediction of disease status.
Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results with a mean 79.22% correct classification rate with a mean sensitivity = 79.70%, and specificity = 81.48%. The model correctly classified OA and RA patients without Lupus as controls at a rate of 75.92% on average with a mean sensitivity = 79.70% and specificity = 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates. Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
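The FPCA-scores-into-logistic-regression pipeline can be sketched as below, assuming synthetic single-peak "thermograms" in place of the LFRR data. Ordinary PCA of the discretised curves stands in for a full FPCA basis expansion, and the logistic fit uses plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "thermogram" curves on a temperature grid: cases and controls
# differ in the position of one denaturation peak (illustrative only).
temps = np.linspace(45, 90, 120)
n_per = 60

def curves(shift, n):
    centers = 62 + shift + rng.normal(0, 1.0, n)
    heights = 1.0 + rng.normal(0, 0.1, n)
    peak = np.exp(-0.5 * ((temps - centers[:, None]) / 4) ** 2)
    return heights[:, None] * peak + rng.normal(0, 0.02, (n, temps.size))

X = np.vstack([curves(0.0, n_per), curves(4.0, n_per)])   # controls, cases
y = np.repeat([0, 1], n_per)

# Functional PCA (here: ordinary PCA of the discretised curves) gives
# low-dimensional scores per subject.
centred = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
scores = (U * s)[:, :3]                                   # first 3 FPCA scores

# Logistic regression on the scores, fitted by plain gradient ascent
# on the log-likelihood (gradient = Z'(y - p)).
Z = np.column_stack([np.ones(len(y)), scores])
w = np.zeros(Z.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-Z @ w))
    w += 0.01 * Z.T @ (y - p) / len(y)

accuracy = np.mean(((1 / (1 + np.exp(-Z @ w))) > 0.5) == y)
print("in-sample accuracy:", accuracy)
```

In the study the scores additionally feed a train/test split repeated 1000 times; here a single in-sample fit suffices to show the mechanics.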
Mimura, Natsuki; Isogai, Atsuko; Iwashita, Kazuhiro; Bamba, Takeshi; Fukusaki, Eiichiro
2014-10-01
Sake is a traditional Japanese alcoholic beverage produced by simultaneous saccharification and alcohol fermentation of polished and steamed rice by Aspergillus oryzae and Saccharomyces cerevisiae. About 300 compounds have been identified in sake, and the contributions of individual components to sake flavor have been examined. However, only a few characteristics can be explained by single compounds, and the basis of most attributes remains unclear. The purpose of this study was to examine the relationship between the component profile and the attributes of sake. Gas chromatography coupled with mass spectrometry (GC/MS)-based non-targeted analysis was employed to obtain the low-molecular-weight component profile of Japanese sake, including both nonvolatile and volatile compounds. Sake attributes and overall quality were assessed by an analytical descriptive sensory test, and a model predicting the sensory scores from the component profile was constructed by means of orthogonal projections to latent structures (OPLS) regression analysis. Our results showed that 12 sake attributes [ginjo-ka (aroma of premium ginjo sake), grassy/aldehydic odor, sweet aroma/caramel/burnt odor, sulfury odor, sour taste, umami, bitter taste, body, amakara (dryness), aftertaste, pungent/smoothness and appearance] and overall quality were accurately explained by the component profiles. In addition, we were able to select statistically significant components according to variable importance on projection (VIP). Our methodology clarified the correlation between sake attributes and 200 low-molecular-weight components and established the importance of each component, thus providing new insights for the study of sake flavor. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
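A plain PLS1 regression by the NIPALS algorithm can stand in for the OPLS analysis as a sketch of predicting a sensory score from a component profile; both methods project the component matrix onto y-predictive latent variables. The sample matrix, loadings, and score weights below are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented example: 60 sake samples x 30 component intensities, with a
# sensory score driven by a few latent flavour factors plus noise.
n, p = 60, 30
latent = rng.normal(size=(n, 3))
loadings = rng.normal(size=(3, p))
X = latent @ loadings + rng.normal(0, 0.3, (n, p))
score = latent @ np.array([1.5, -1.0, 0.5]) + rng.normal(0, 0.3, n)

# PLS1 regression by NIPALS: repeatedly extract the X direction most
# covariant with y, then deflate both X and y.
Xc, yc = X - X.mean(axis=0), score - score.mean()
yhat = np.full(n, score.mean())
for _ in range(3):                      # extract 3 latent variables
    w = Xc.T @ yc
    w /= np.linalg.norm(w)              # weight vector
    t = Xc @ w                          # scores of this latent variable
    pl = Xc.T @ t / (t @ t)             # X loadings
    q = yc @ t / (t @ t)                # y loading
    Xc -= np.outer(t, pl)               # deflate X
    yc -= q * t                         # deflate y
    yhat += q * t

r2 = 1 - np.sum((score - yhat) ** 2) / np.sum((score - score.mean()) ** 2)
print("in-sample R2 with 3 PLS components:", r2)
```

OPLS additionally separates y-orthogonal variation from the predictive part, which aids interpretation (and underlies the VIP selection) but leaves the predictive fit essentially the same as PLS.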
NASA Astrophysics Data System (ADS)
Vitanovski, Dime; Tsymbal, Alexey; Ionasec, Razvan; Georgescu, Bogdan; Zhou, Shaohua K.; Hornegger, Joachim; Comaniciu, Dorin
2011-03-01
Congenital heart defect (CHD) is the most common birth defect and a frequent cause of death for children. Tetralogy of Fallot (ToF) is the most frequently occurring CHD, affecting in particular the pulmonary valve and trunk. Emerging interventional methods enable percutaneous pulmonary valve implantation, which constitutes an alternative to open heart surgery. While minimally invasive methods become common practice, imaging and non-invasive assessment tools become crucial components in the clinical setting. Cardiac computed tomography (CT) and cardiac magnetic resonance imaging (cMRI) are techniques with complementary properties, able to acquire the multiple non-invasive and accurate scans required for advanced evaluation and therapy planning. In contrast to CT, which covers the full 4D information over the cardiac cycle, cMRI often acquires only partial information, for example a single 3D scan of the whole heart in the end-diastolic phase and two 2D planes (long and short axes) over the whole cardiac cycle. Data acquired in this way are called sparse cMRI. In this paper, we propose a regression-based approach for the reconstruction of the full 4D pulmonary trunk model from sparse MRI. The reconstruction approach is based on learning a distance function between the sparse MRI, which needs to be completed, and the 4D CT data with the full information used as the training set. The distance is based on the intrinsic Random Forest similarity, which is learnt for the corresponding regression problem of predicting the coordinates of unseen mesh points. Extensive experiments performed on 80 cardiac CT and MR sequences demonstrated an average speed of 10 seconds and an accuracy of 0.1053 mm mean absolute error for the proposed approach.
Using the case retrieval workflow and local nearest neighbour regression with the learnt distance function appears to be competitive with respect to "black box" regression with immediate prediction of coordinates, while providing transparency to the predictions made.
Trnovec, Tomáš; Jusko, Todd A; Šovčíková, Eva; Lancz, Kinga; Chovancová, Jana; Patayová, Henrieta; Palkovičová, L'ubica; Drobná, Beata; Langer, Pavel; Van den Berg, Martin; Dedik, Ladislav; Wimmerová, Soňa
2013-08-01
Toxic equivalency factors (TEFs) are an important component in the risk assessment of dioxin-like human exposures. At present, this concept is based mainly on in vivo animal experiments using oral dosage. Consequently, the current human TEFs derived from mammalian experiments are applicable only for exposure situations in which oral ingestion occurs. Nevertheless, these "intake" TEFs are commonly, but incorrectly, used by regulatory authorities to calculate "systemic" toxic equivalents (TEQs) based on human blood and tissue concentrations, which are used as biomarkers for either exposure or effect. We sought to determine relative effect potencies (REPs) for systemic human concentrations of dioxin-like mixture components using thyroid volume or serum free thyroxine (FT4) concentration as the outcomes of interest. We used a benchmark concentration and a regression-based approach to compare the strength of association between each dioxin-like compound and the thyroid end points in 320 adults residing in an organochlorine-polluted area of eastern Slovakia. REPs calculated from thyroid volume and FT4 were similar. The regression coefficient (β)-derived REP data from thyroid volume and FT4 level were correlated with the World Health Organization (WHO) TEF values (Spearman r = 0.69, p = 0.01 and r = 0.62, p = 0.03, respectively). The calculated REPs were mostly within the minimum and maximum values for in vivo REPs derived by other investigators. Our REPs calculated from thyroid end points realistically reflect human exposure scenarios because they are based on chronic, low-dose human exposures and on biomarkers reflecting body burden. Compared with previous results, our REPs suggest higher sensitivity to the effects of dioxin-like compounds.
NASA Astrophysics Data System (ADS)
Gilmore, A. M.
2015-12-01
This study describes a method based on simultaneous absorbance and fluorescence excitation-emission mapping for rapidly and accurately monitoring dissolved organic carbon concentration and disinfection by-product formation potential for surface water sourced drinking water treatment. The method enables real-time monitoring of the Dissolved Organic Carbon (DOC), absorbance at 254 nm (UVA), the Specific UV Absorbance (SUVA) as well as the Simulated Distribution System Trihalomethane (THM) Formation Potential (SDS-THMFP) for the source and treated water among other component parameters. The method primarily involves Parallel Factor Analysis (PARAFAC) decomposition of the high and lower molecular weight humic and fulvic organic component concentrations. The DOC calibration method involves calculating a single slope factor (with the intercept fixed at 0 mg/l) by linear regression for the UVA divided by the ratio of the high and low molecular weight component concentrations. This method thus corrects for the changes in the molecular weight component composition as a function of the source water composition and coagulation treatment effects. The SDS-THMFP calibration involves a multiple linear regression of the DOC, organic component ratio, chlorine residual, pH and alkalinity. Both the DOC and SDS-THMFP correlations over a period of 18 months exhibited adjusted correlation coefficients with r2 > 0.969. The parameters can be reported as a function of compliance rules associated with required % removals of DOC (as a function of alkalinity) and predicted maximum contaminant levels (MCL) of THMs. The single instrument method, which is compatible with continuous flow monitoring or grab sampling, provides a rapid (2-3 minute) and precise indicator of drinking water disinfectant treatability without the need for separate UV photometric and DOC meter measurements or independent THM determinations.
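The zero-intercept calibration step can be sketched directly. Assuming invented UVA, component-concentration, and DOC values (the real calibration uses PARAFAC-derived concentrations), the single slope factor for a through-origin model has the closed form sum(x·y)/sum(x·x).

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented calibration data: UV absorbance at 254 nm (UVA) and stand-ins
# for the high- and low-molecular-weight organic component concentrations.
n = 50
c_high = rng.uniform(0.5, 3.0, n)
c_low = rng.uniform(0.5, 3.0, n)
true_slope = 4.2                                  # arbitrary
uva = rng.uniform(0.02, 0.3, n)                   # absorbance, 1/cm
ratio = c_high / c_low
doc = true_slope * uva / ratio + rng.normal(0, 0.05, n)   # mg/L

# Zero-intercept least squares: for the model doc = k * (UVA / ratio),
# the closed-form slope is sum(x*y) / sum(x*x).
predictor = uva / ratio
k = np.sum(predictor * doc) / np.sum(predictor * predictor)
fitted = k * predictor

r = np.corrcoef(doc, fitted)[0, 1]
print("slope:", k, "correlation:", r)
```

Dividing UVA by the component ratio is what corrects the DOC estimate for shifts in molecular-weight composition between source and treated water.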
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS Data Center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, six greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates of richness for amphibians, birds, all vertebrates, reptiles, and mammals. The variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the predictions resulting from the regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated the models were well fit to the data.
Agho, Kingsley E; Ezeh, Osita K; Ogbo, Felix A; Enoma, Anthony I; Raynes-Greenow, Camille
2018-05-01
Antenatal care (ANC) is an essential intervention to improve maternal and child health. In Nigeria, no population-based studies have investigated predictors of poor receipt of components and uptake of ANC at the national level to inform targeted maternal health initiatives. This study aimed to examine factors associated with inadequate receipt of components and use of ANC in Nigeria. The study used information on 20 405 singleton live-born infants of the mothers from the 2013 Nigeria Demographic and Health Survey. Multivariable logistic regression analyses that adjusted for cluster and survey weights were used to determine potential factors associated with inadequate receipt of components and use of ANC. The prevalence of underutilization and inadequate components of ANC were 47.5% (95% CI: 45.2 to 49.9) and 92.6% (95% CI: 91.8 to 93.2), respectively. Common risk factors for underutilization and inadequate components of ANC in Nigeria included residence in rural areas, no maternal education, maternal unemployment, long distance to health facilities and less maternal exposure to the media. Other risk factors for underutilization of ANC were home births and low household wealth. The study suggests that underutilization and inadequate receipt of the components of ANC were associated with amenable factors in Nigeria. Subsidized maternal services and well-guided health educational messages or financial support from the government will help to improve uptake of ANC services.
Dirichlet Component Regression and its Applications to Psychiatric Data
Gueorguieva, Ralitza; Rosenheck, Robert; Zelterman, Daniel
2011-01-01
We describe a Dirichlet multivariable regression method useful for modeling data representing components as a percentage of a total. This model is motivated by the unmet need in psychiatry and other areas to simultaneously assess the effects of covariates on the relative contributions of different components of a measure. The model is illustrated using the Positive and Negative Syndrome Scale (PANSS) for assessment of schizophrenia symptoms which, like many other metrics in psychiatry, is composed of a sum of scores on several components, each, in turn, made up of sums of evaluations on several questions. We simultaneously examine the effects of baseline socio-demographic and co-morbid correlates on all of the components of the total PANSS score of patients from a schizophrenia clinical trial and identify variables associated with increasing or decreasing relative contributions of each component. Several definitions of residuals are provided. Diagnostics include measures of overdispersion, Cook's distance, and a local jackknife influence metric. PMID:22058582
Hanley, James A
2008-01-01
Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to demystify the structure.
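One widely taught version of these components is the Breslow estimator, whose pieces are simple ratios: at each event time, the baseline hazard increment is the number of events divided by the sum of exp(beta*x) over subjects still at risk. A toy sketch (invented data, and a fixed beta in place of a partial-likelihood fit) is:

```python
import math

# Toy data: follow-up times, event indicators (1 = event, 0 = censored),
# and a single covariate; beta would normally come from maximising the
# Cox partial likelihood.
times = [2.0, 3.0, 3.0, 5.0, 8.0, 9.0, 12.0, 15.0]
events = [1, 1, 0, 1, 1, 0, 1, 0]
x = [0.5, -0.2, 1.1, 0.0, 0.3, -0.5, 0.8, -1.0]
beta = 0.4

# Breslow estimator: cumulative baseline hazard H0 jumps at each distinct
# event time t by d(t) / sum of exp(beta*x) over the risk set at t.
risk = [math.exp(beta * xi) for xi in x]
event_times = sorted({t for t, e in zip(times, events) if e == 1})
H0, surv0 = 0.0, []
for t in event_times:
    d = sum(1 for ti, ei in zip(times, events) if ei == 1 and ti == t)
    at_risk = sum(r for ti, r in zip(times, risk) if ti >= t)
    H0 += d / at_risk
    surv0.append((t, math.exp(-H0)))    # baseline survivor S0(t) = exp(-H0(t))

for t, s in surv0:
    print(t, round(s, 4))
```

The heuristic is visible in the loop: each component of the baseline survivor function is just "observed events over expected hazard contributions", exponentiated.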
Prediction of sweetness and amino acid content in soybean crops from hyperspectral imagery
NASA Astrophysics Data System (ADS)
Monteiro, Sildomar Takahashi; Minekawa, Yohei; Kosugi, Yukio; Akazawa, Tsuneya; Oda, Kunio
Hyperspectral image data provide a powerful tool for non-destructive crop analysis. This paper investigates a hyperspectral image data-processing method to predict the sweetness and amino acid content of soybean crops. Regression models based on artificial neural networks were developed to calculate the levels of sucrose, glucose, fructose, and nitrogen concentrations, which can be related to the sweetness and amino acid content of vegetables. A performance analysis was conducted comparing regression models obtained using different preprocessing methods, namely raw reflectance, second derivative, and principal components analysis. The method is demonstrated using high-resolution hyperspectral data at wavelengths ranging from the visible to the near infrared, acquired from an experimental field of green vegetable soybeans. The best predictions were achieved using a nonlinear regression model on the second-derivative-transformed dataset. Glucose could be predicted with the greatest accuracy, followed by sucrose, fructose and nitrogen. The proposed method makes it possible to produce relatively accurate maps of the chemical content of soybean crop fields.
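The second-derivative preprocessing can be sketched with finite differences, which removes additive baselines and emphasises narrow absorption bands. A ridge-regularised linear model stands in here for the paper's neural network, and the reflectance data below are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

# Invented hyperspectral reflectance for soybean samples: a flat per-sample
# baseline plus a narrow absorption feature whose depth tracks sugar content.
wl = np.linspace(400, 1000, 150)
n = 80
sugar = rng.uniform(0, 1, n)
baseline = rng.uniform(0.2, 0.6, n)[:, None] * np.ones((n, wl.size))
feature = sugar[:, None] * np.exp(-0.5 * ((wl - 720) / 10) ** 2)
refl = baseline - feature + rng.normal(0, 0.003, (n, wl.size))

# Second-derivative preprocessing (simple finite differences): the additive
# baseline drops out, leaving the narrow-band chemistry signal.
d2 = np.diff(refl, n=2, axis=1)

# Ridge regression in place of the paper's neural network: closed-form
# solution of (X'X + lam*I) w = X'y on mean-centred data.
Xc = d2 - d2.mean(axis=0)
yc = sugar - sugar.mean()
lam = 1e-3
w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ yc)
pred = Xc @ w + sugar.mean()

r2 = 1 - np.sum((sugar - pred) ** 2) / np.sum((sugar - sugar.mean()) ** 2)
print("in-sample R2:", r2)
```

In practice a smoothing derivative (e.g. Savitzky-Golay) would be preferred over raw differences, since differencing amplifies spectral noise.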
NASA Astrophysics Data System (ADS)
Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.
2014-09-01
Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands caused by mixed pixels when a medium-spatial-resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables comprising original bands, fraction imagery, the Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and the green vegetation fraction (from LSMA) was the best AGB predictor (adjusted R2 = 0.632; cross-validated root-mean-squared error of the estimated AGB = 13.3 Mg ha-1, or 37.7%), outperforming other combinations of the above-cited independent variables. Results indicated that using ASTER fraction images in regression models improves AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
Are low wages risk factors for hypertension?
Du, Juan
2012-01-01
Objective: Socio-economic status (SES) is strongly correlated with hypertension. But SES has several components, including income, and correlations in cross-sectional data need not imply SES is a risk factor. This study investigates whether wages, the largest category within income, are risk factors. Methods: We analysed longitudinal, nationally representative US data from four waves (1999, 2001, 2003 and 2005) of the Panel Study of Income Dynamics. The overall sample was restricted to employed persons age 25-65 years, n = 17 295. Separate subsamples were constructed of persons within two age groups (25-44 and 45-65 years) and genders. Hypertension incidence was self-reported based on physician diagnosis. Our study was prospective, since data from three base years (1999, 2001, 2003) were used to predict newly diagnosed hypertension for three subsequent years (2001, 2003, 2005). In separate analyses, data from the first base year were used to predict time-to-reporting hypertension. Logistic regressions with random effects and Cox proportional hazards regressions were run. Results: Negative and strongly statistically significant correlations between wages and hypertension were found in both logistic and Cox regressions, especially for subsamples containing the younger age group (25-44 years) and women. Correlations were stronger when three health variables (obesity, subjective measures of health and number of co-morbidities) were excluded from the regressions. Doubling the wage was associated with 25-30% lower chances of hypertension for persons aged 25-44 years. Conclusions: The strongest evidence for low wages being risk factors for hypertension among working people was for women and persons aged 25-44 years. PMID:22262559
Are low wages risk factors for hypertension?
Leigh, J Paul; Du, Juan
2012-12-01
Socio-economic status (SES) is strongly correlated with hypertension. But SES has several components, including income, and correlations in cross-sectional data need not imply SES is a risk factor. This study investigates whether wages, the largest category within income, are risk factors. We analysed longitudinal, nationally representative US data from four waves (1999, 2001, 2003 and 2005) of the Panel Study of Income Dynamics. The overall sample was restricted to employed persons age 25-65 years, n = 17 295. Separate subsamples were constructed of persons within two age groups (25-44 and 45-65 years) and genders. Hypertension incidence was self-reported based on physician diagnosis. Our study was prospective, since data from three base years (1999, 2001, 2003) were used to predict newly diagnosed hypertension for three subsequent years (2001, 2003, 2005). In separate analyses, data from the first base year were used to predict time-to-reporting hypertension. Logistic regressions with random effects and Cox proportional hazards regressions were run. Negative and strongly statistically significant correlations between wages and hypertension were found in both logistic and Cox regressions, especially for subsamples containing the younger age group (25-44 years) and women. Correlations were stronger when three health variables (obesity, subjective measures of health and number of co-morbidities) were excluded from the regressions. Doubling the wage was associated with 25-30% lower chances of hypertension for persons aged 25-44 years. The strongest evidence for low wages being risk factors for hypertension among working people was for women and persons aged 25-44 years.
Biomass relations for components of five Minnesota shrubs.
Richard R. Buech; David J. Rugg
1995-01-01
Presents equations for estimating biomass of six components on five species of shrubs common to northeastern Minnesota. Regression analysis is used to compare the performance of three estimators of biomass.
Lee, Sung-Sahn; Lee, Yong-In; Kim, Dong-Uk; Lee, Dae-Hee; Moon, Young-Wan
2018-01-01
Achieving proper rotational alignment of the femoral component in total knee arthroplasty (TKA) for the valgus knee is challenging because of lateral condylar hypoplasia and lateral cartilage erosion. Gap-based navigation-assisted TKA enables surgeons to determine the angle of femoral component rotation (FCR) based on the posterior condylar axis. This study evaluated possible factors that affect the rotational alignment of the femoral component based on the posterior condylar axis. Between 2008 and 2016, 28 knees were enrolled. The dependent variable for this study was FCR based on the posterior condylar axis, which was obtained from the navigation system archives. Multiple regression analysis was conducted to identify factors that might predict FCR, including body mass index (BMI), Kellgren-Lawrence grade (K-L grade), lateral distal femoral angles obtained from the navigation system and radiographs (NaviLDFA, XrayLDFA), hip-knee-ankle (HKA) axis, lateral gap under varus stress (LGVS), medial gap under valgus stress (MGVS), and side-to-side difference (STSD, MGVS - LGVS). The mean FCR was 6.1° ± 2.0°. Of all the potentially predictive factors evaluated in this study, only NaviLDFA (β = -0.668) and XrayLDFA (β = -0.714) significantly predicted FCR. The LDFAs, as determined using radiographs and the navigation system, were both predictive of the rotational alignment of the femoral component based on the posterior condylar axis in gap-based TKA for the valgus knee. A 1° increment in NaviLDFA led to a 0.668° decrement in FCR, and a 1° increment in XrayLDFA led to a 0.714° decrement. This suggests that symmetrical lateral condylar hypoplasia of the posterior and distal sides occurs in lateral-compartment end-stage osteoarthritis with valgus deformity.
NASA Astrophysics Data System (ADS)
Baraldi, P.; Bonfanti, G.; Zio, E.
2018-03-01
The identification of the current degradation state of an industrial component and the prediction of its future evolution are fundamental steps for the development of condition-based and predictive maintenance approaches. The objective of the present work is to propose a general method for extracting a health indicator to measure the amount of component degradation from a set of signals measured during operation. The proposed method is based on the combined use of feature extraction techniques, such as Empirical Mode Decomposition and Auto-Associative Kernel Regression, and a multi-objective Binary Differential Evolution (BDE) algorithm for selecting the subset of features optimal for the definition of the health indicator. The objectives of the optimization are desired characteristics of the health indicator, such as monotonicity, trendability and prognosability. A case study is considered, concerning the prediction of the remaining useful life of turbofan engines. The obtained results confirm that the method is capable of extracting health indicators suitable for accurate prognostics.
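Monotonicity, one of the health-indicator objectives named in the abstract above, is commonly scored as the normalized difference between positive and negative increments of the indicator. A minimal sketch of that metric follows; this is a common formulation from the prognostics literature, not necessarily the exact statistic used by the authors:

```python
import numpy as np

def monotonicity(hi):
    """Score a candidate health indicator in [0, 1]:
    0 for no net trend, 1 for a strictly monotonic series."""
    d = np.diff(np.asarray(hi, dtype=float))
    return abs((d > 0).sum() - (d < 0).sum()) / len(d)

# a steadily degrading indicator scores 1.0; an oscillating one scores 0
assert monotonicity([0.1, 0.3, 0.4, 0.9]) == 1.0
assert monotonicity([0.1, 0.9, 0.1, 0.9, 0.1]) == 0.0
```

In a multi-objective search such as the BDE described above, scores like this one are computed per candidate feature subset and traded off against trendability and prognosability.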
Identification of different bacterial species in biofilms using confocal Raman microscopy
NASA Astrophysics Data System (ADS)
Beier, Brooke D.; Quivey, Robert G.; Berger, Andrew J.
2010-11-01
Confocal Raman microspectroscopy is used to discriminate between different species of bacteria grown in biofilms. Tests are performed using two bacterial species, Streptococcus sanguinis and Streptococcus mutans, which are major components of oral plaque and of particular interest due to their association with healthy and cariogenic plaque, respectively. Dehydrated biofilms of these species are studied as a simplified model of dental plaque. A prediction model based on principal component analysis and logistic regression is calibrated using pure biofilms of each species and validated on pure biofilms grown months later, achieving 96% accuracy in prospective classification. When biofilms of the two species are partially mixed together, Raman-based identifications are achieved within ~2 μm of the boundaries between species with 97% accuracy. This combination of spatial resolution and prediction accuracy should be suitable for forming images of species distributions within intact two-species biofilms.
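The classification pipeline described here, principal component analysis followed by logistic regression, can be sketched with scikit-learn. The two-class "spectra" below are synthetic stand-ins (Gaussian noise plus a characteristic band per class), not Raman measurements:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# two synthetic "species", each with a characteristic band on a 200-point spectrum
spectra_a = rng.normal(0.0, 0.05, (40, 200)); spectra_a[:, 50:60] += 1.0
spectra_b = rng.normal(0.0, 0.05, (40, 200)); spectra_b[:, 120:130] += 1.0
X = np.vstack([spectra_a, spectra_b])
y = np.array([0] * 40 + [1] * 40)

# project onto a few principal components, then classify with logistic regression
clf = make_pipeline(PCA(n_components=5), LogisticRegression())
clf.fit(X, y)
assert clf.score(X, y) >= 0.95
```

The PCA step compresses each spectrum to a handful of scores before the logistic model is fitted, which is what makes prospective classification of later-grown biofilms feasible with few calibration samples.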
A preliminary case-mix classification system for Medicare home health clients.
Branch, L G; Goldberg, H B
1993-04-01
In this study, a hierarchical case-mix model was developed for grouping Medicare home health beneficiaries homogeneously, based on the allowed charges for their home care. Based on information from a two-page form from 2,830 clients from ten states and using the classification and regression trees method, a four-component model was developed that yielded 11 case-mix groups and explained 22% of the variance for the test sample of 1,929 clients. The four components are rehabilitation, special care, skilled-nurse monitoring, and paralysis; each is categorized as present or absent. The range of mean allowed charges for the 11 groups in the total sample was $473 to $2,562 with a mean of $847. Of the six groups with mean charges above $1,000, none exceeded 5.2% of clients; thus, the high-cost groups are relatively rare.
Kajbafnezhad, H; Ahadi, H; Heidarie, A; Askari, P; Enayati, M
2012-10-01
The aim of this study was to predict athletic success motivation from mental skills, emotional intelligence and its components. The research sample consisted of 153 male athletes who were selected through random multistage sampling. The subjects completed the Mental Skills Questionnaire, the Bar-On Emotional Intelligence Questionnaire and the Perception of Sport Success Questionnaire. Data were analyzed using the Pearson correlation coefficient and multiple regression. Regression analysis shows that, between the two variables of mental skill and emotional intelligence, mental skill is the better predictor of athletic success motivation and has a better ability to predict the success rate of the participants. Regression analysis also showed that, among all the components of emotional intelligence, self-respect had a significantly higher ability to predict athletic success motivation. The use of psychological skills and of emotional intelligence as a mediating, regulating, and organizing factor leads to improved performance and can help athletes make suitable and effective decisions for reaching a desired goal.
Separation mechanism of nortriptyline and amitriptyline in RPLC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gritti, Fabrice; Guiochon, Georges A
2005-08-01
The single and the competitive equilibrium isotherms of nortriptyline and amitriptyline were acquired by frontal analysis (FA) on the C{sub 18}-bonded Discovery column, using a 28/72 (v/v) mixture of acetonitrile and water buffered with phosphate (20 mM, pH 2.70). The adsorption energy distributions (AED) of each compound were calculated from the raw adsorption data. Both the fitting of the adsorption data using multi-linear regression analysis and the AEDs are consistent with a trimodal isotherm model. The single-component isotherm data fit well to the tri-Langmuir isotherm model. The extension to a competitive two-component tri-Langmuir isotherm model based on the best parameters of the single-component isotherms does not account well for the breakthrough curves nor for the overloaded band profiles measured for mixtures of nortriptyline and amitriptyline. However, it was possible to derive adjusted parameters of a competitive tri-Langmuir model based on the fitting of the adsorption data obtained for these mixtures. A very good agreement was then found between the calculated and the experimental overloaded band profiles of all the mixtures injected.
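A single-component tri-Langmuir isotherm of the kind fitted here sums three independent Langmuir terms, q(C) = Σ qs_i·b_i·C / (1 + b_i·C). A sketch of fitting it by nonlinear least squares follows; the parameter values are invented for illustration, not the FA measurements from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def tri_langmuir(c, qs1, b1, qs2, b2, qs3, b3):
    """q(C) = sum over three site types of qs_i * b_i * C / (1 + b_i * C)."""
    return sum(qs * b * c / (1.0 + b * c)
               for qs, b in ((qs1, b1), (qs2, b2), (qs3, b3)))

# synthetic isotherm data from assumed saturation capacities and constants
c = np.linspace(0.01, 50.0, 200)
q = tri_langmuir(c, 10.0, 0.05, 3.0, 1.0, 0.5, 20.0)

popt, _ = curve_fit(tri_langmuir, c, q, p0=(8, 0.1, 2, 2, 1, 10),
                    bounds=(0, np.inf))
assert np.max(np.abs(tri_langmuir(c, *popt) - q)) < 0.05 * q.max()
```

In practice the fitted single-component parameters would then seed the competitive two-component model, which, as the abstract notes, may still require adjustment against mixture data.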
Rehkämper, Gerd; Frahm, Heiko D; Cnotka, Julia
2008-01-01
Brain sizes and brain component sizes of five domesticated pigeon breeds including homing (racing) pigeons are compared with rock doves (Columba livia) based on an allometric approach to test the influence of domestication on brain and brain component size. Net brain volume, the volumes of cerebellum and telencephalon as a whole are significantly smaller in almost all domestic pigeons. Inside the telencephalon, mesopallium, nidopallium (+ entopallium + arcopallium) and septum are smaller as well. The hippocampus is significantly larger, particularly in homing pigeons. This finding is in contrast to the predictions of the 'regression hypothesis' of brain alteration under domestication. Among the domestic pigeons homing pigeons have significantly larger olfactory bulbs. These data are interpreted as representing a functional adaptation to homing that is based on spatial cognition and sensory integration. We argue that domestication as seen in domestic pigeons is not principally different from evolution in the wild, but represents a heuristic model to understand the evolutionary process in terms of adaptation and optimization. Copyright 2007 S. Karger AG, Basel.
NASA Technical Reports Server (NTRS)
Trejo, Leonard J.; Shensa, Mark J.; Remington, Roger W. (Technical Monitor)
1998-01-01
This report describes the development and evaluation of mathematical models for predicting human performance from discrete wavelet transforms (DWT) of event-related potentials (ERP) elicited by task-relevant stimuli. The DWT was compared to principal components analysis (PCA) for representation of ERPs in linear regression and neural network models developed to predict a composite measure of human signal detection performance. Linear regression models based on coefficients of the decimated DWT predicted signal detection performance with half as many free parameters as comparable models based on PCA scores. In addition, the DWT-based models were more resistant to model degradation due to over-fitting than PCA-based models. Feed-forward neural networks were trained using the backpropagation algorithm to predict signal detection performance based on raw ERPs, PCA scores, or high-power coefficients of the DWT. Neural networks based on high-power DWT coefficients trained with fewer iterations, generalized to new data better, and were more resistant to overfitting than networks based on raw ERPs. Networks based on PCA scores did not generalize to new data as well as either the DWT network or the raw ERP network. The results show that wavelet expansions represent the ERP efficiently and extract behaviorally important features for use in linear regression or neural network models of human performance. The efficiency of the DWT is discussed in terms of its decorrelation and energy compaction properties. In addition, the DWT models provided evidence that a pattern of low-frequency activity (1 to 3.5 Hz) occurring at specific times and scalp locations is a reliable correlate of human signal detection performance.
NASA Technical Reports Server (NTRS)
Trejo, L. J.; Shensa, M. J.
1999-01-01
This report describes the development and evaluation of mathematical models for predicting human performance from discrete wavelet transforms (DWT) of event-related potentials (ERP) elicited by task-relevant stimuli. The DWT was compared to principal components analysis (PCA) for representation of ERPs in linear regression and neural network models developed to predict a composite measure of human signal detection performance. Linear regression models based on coefficients of the decimated DWT predicted signal detection performance with half as many free parameters as comparable models based on PCA scores. In addition, the DWT-based models were more resistant to model degradation due to over-fitting than PCA-based models. Feed-forward neural networks were trained using the backpropagation algorithm to predict signal detection performance based on raw ERPs, PCA scores, or high-power coefficients of the DWT. Neural networks based on high-power DWT coefficients trained with fewer iterations, generalized to new data better, and were more resistant to overfitting than networks based on raw ERPs. Networks based on PCA scores did not generalize to new data as well as either the DWT network or the raw ERP network. The results show that wavelet expansions represent the ERP efficiently and extract behaviorally important features for use in linear regression or neural network models of human performance. The efficiency of the DWT is discussed in terms of its decorrelation and energy compaction properties. In addition, the DWT models provided evidence that a pattern of low-frequency activity (1 to 3.5 Hz) occurring at specific times and scalp locations is a reliable correlate of human signal detection performance. Copyright 1999 Academic Press.
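The modelling idea in the two reports above, regressing a performance measure on a decimated set of wavelet coefficients rather than on PCA scores, can be sketched with a hand-rolled Haar DWT. The `erps` and `score` arrays below are synthetic stand-ins, not the reports' recordings:

```python
import numpy as np

def haar_dwt(x):
    """Full Haar decomposition of a length-2^k signal,
    returned coarse-to-fine (final approximation first)."""
    x = np.asarray(x, dtype=float)
    levels = []
    while len(x) > 1:
        levels.append((x[0::2] - x[1::2]) / np.sqrt(2))   # detail coefficients
        x = (x[0::2] + x[1::2]) / np.sqrt(2)              # approximation
    levels.append(x)
    return np.concatenate(levels[::-1])

rng = np.random.default_rng(1)
erps = rng.normal(size=(30, 64))                # 30 trials x 64 samples (synthetic)
score = erps[:, :8].mean(axis=1) + 0.1 * rng.normal(size=30)

# decimated DWT: keep only the 16 coarsest coefficients as regressors
features = np.array([haar_dwt(e)[:16] for e in erps])
X = np.column_stack([np.ones(30), features])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
r = np.corrcoef(X @ beta, score)[0, 1]
assert r > 0.9
```

Because the Haar basis is orthogonal, the coarse coefficients compactly capture low-frequency structure, which is exactly the decorrelation and energy-compaction property the reports credit for the DWT models' parsimony.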
NASA Astrophysics Data System (ADS)
Nordemann, D. J. R.; Rigozo, N. R.; de Souza Echer, M. P.; Echer, E.
2008-11-01
We present here an implementation of a least squares iterative regression method applied to the sine functions embedded in the principal components extracted from geophysical time series. This method appears to be a useful improvement for the quantitative analysis of periodicities in non-stationary time series. The principal components determination followed by the least squares iterative regression method was implemented in an algorithm written in the Scilab (2006) language. The main result of the method is to obtain the set of sine functions embedded in the series analyzed in decreasing order of significance, from the most important ones, likely to represent the physical processes involved in the generation of the series, to the less important ones that represent noise components. Taking into account the need of a deeper knowledge of the Sun's past history and its implication to global climate change, the method was applied to the Sunspot Number series (1750-2004). With the threshold and parameter values used here, the application of the method leads to a total of 441 explicit sine functions, among which 65 were considered as being significant and were used for a reconstruction that gave a normalized mean squared error of 0.146.
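The core step, iterative least-squares extraction of embedded sine functions in decreasing order of significance, can be sketched as follows. The example fits a synthetic two-sine series rather than principal components of the sunspot record, and the FFT-seeded initialization is one reasonable choice, not necessarily the authors' Scilab implementation:

```python
import numpy as np
from scipy.optimize import curve_fit

def sine(t, a, f, ph):
    return a * np.sin(2 * np.pi * f * t + ph)

def extract_sines(t, y, n):
    """Greedy least-squares extraction of sines in decreasing order of
    significance: fit the dominant sine, subtract it, repeat on the residual."""
    resid, comps = np.asarray(y, dtype=float).copy(), []
    for _ in range(n):
        spec = np.abs(np.fft.rfft(resid))
        f0 = np.fft.rfftfreq(len(t), d=t[1] - t[0])[1:][spec[1:].argmax()]
        # linear pre-fit at the seed frequency gives good amplitude/phase starts
        basis = np.column_stack([np.sin(2 * np.pi * f0 * t),
                                 np.cos(2 * np.pi * f0 * t)])
        s, c = np.linalg.lstsq(basis, resid, rcond=None)[0]
        p, _ = curve_fit(sine, t, resid, p0=(np.hypot(s, c), f0, np.arctan2(c, s)))
        comps.append(p)
        resid = resid - sine(t, *p)
    return comps, resid

t = np.linspace(0.0, 10.0, 500)
y = 2.0 * np.sin(2 * np.pi * 0.5 * t + 0.3) + 0.7 * np.sin(2 * np.pi * 1.8 * t + 1.0)
comps, resid = extract_sines(t, y, 2)
assert abs(comps[0][1] - 0.5) < 0.01          # dominant 0.5 Hz term found first
assert np.abs(resid).max() < 0.1              # two sines explain the series
```

The components emerge strongest-first, mirroring the paper's ordering of physically meaningful periodicities ahead of noise terms.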
NASA Astrophysics Data System (ADS)
Ťupek, Boris; Launiainen, Samuli; Peltoniemi, Mikko; Heikkinen, Jukka; Lehtonen, Aleksi
2016-04-01
In most process-based soil carbon models, litter decomposition rates are affected by environmental conditions, are linked with soil heterotrophic CO2 emissions, and serve for estimating soil carbon sequestration. By the mass balance equation, the variation in measured litter inputs and measured heterotrophic soil CO2 effluxes should therefore indicate the soil carbon stock changes needed by soil carbon management for mitigation of anthropogenic CO2 emissions, provided the sensitivity functions of the applied model suit the environmental conditions, e.g., soil temperature and moisture. We evaluated the response forms of autotrophic and heterotrophic forest floor respiration to soil temperature and moisture in four boreal forest sites of the International Cooperative Programme on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests) by a soil-trenching experiment during 2015 in southern Finland. As expected, both autotrophic and heterotrophic forest floor respiration components were primarily controlled by soil temperature, and exponential regression models generally explained more than 90% of the variance. Soil moisture regression models on average explained less than 10% of the variance, and the response forms varied between Gaussian for the autotrophic forest floor respiration component and linear for the heterotrophic forest floor respiration component. Although the percentage of variance in soil heterotrophic respiration explained by soil moisture was small, the observed reduction of CO2 emissions at higher moisture levels suggests that the soil moisture responses of soil carbon models that do not account for reduction under excessive moisture should be re-evaluated in order to estimate correct levels of soil carbon stock changes. Our further study will include evaluation of process-based soil carbon models against the annual heterotrophic respiration and soil carbon stocks.
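The exponential temperature response fitted to the respiration components is often parameterized as a Q10 model, R(T) = R10 · Q10^((T-10)/10). A sketch of that regression on synthetic efflux data follows; the Q10 form and the numbers are assumptions for illustration, not the study's fitted models:

```python
import numpy as np
from scipy.optimize import curve_fit

def q10_model(t_soil, r10, q10):
    """Exponential respiration response: rate at 10 degC scaled by Q10 per 10 degC."""
    return r10 * q10 ** ((t_soil - 10.0) / 10.0)

rng = np.random.default_rng(2)
t_soil = rng.uniform(2.0, 22.0, 80)                  # soil temperature, degC
resp = q10_model(t_soil, 1.5, 2.3) * rng.lognormal(0.0, 0.05, 80)  # synthetic efflux

popt, _ = curve_fit(q10_model, t_soil, resp, p0=(1.0, 2.0))
r10_hat, q10_hat = popt
assert abs(q10_hat - 2.3) < 0.3
```

A moisture term (Gaussian for the autotrophic component, linear for the heterotrophic one, per the abstract) would enter as a multiplicative modifier on this temperature response.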
Time-Dependent Moment Tensors of the First Four Source Physics Experiments (SPE) Explosions
NASA Astrophysics Data System (ADS)
Yang, X.
2015-12-01
We use mainly vertical-component geophone data within 2 km from the epicenter to invert for time-dependent moment tensors of the first four SPE explosions: SPE-1, SPE-2, SPE-3 and SPE-4Prime. We employ a one-dimensional (1D) velocity model developed from P- and Rg-wave travel times for Green's function calculations. The attenuation structure of the model is developed from P- and Rg-wave amplitudes. We select data for the inversion based on the criterion that they show travel times and amplitude behavior consistent with those predicted by the 1D model. Due to limited azimuthal coverage of the sources and the mostly vertical-component-only nature of the dataset, only long-period, diagonal components of the moment tensors are well constrained. Nevertheless, the moment tensors, particularly their isotropic components, provide reasonable estimates of the long-period source amplitudes as well as estimates of corner frequencies, albeit with larger uncertainties. The estimated corner frequencies, however, are consistent with estimates from ratios of seismogram spectra from different explosions. These long-period source amplitudes and corner frequencies cannot be fit by classical P-wave explosion source models. The results motivate the development of new P-wave source models suitable for these chemical explosions. To that end, we fit inverted moment-tensor spectra by modifying the classical explosion model using regressions of estimated source parameters. Although the number of data points used in the regression is small, the approach suggests a way for the new-model development when more data are collected.
Face aging effect simulation model based on multilayer representation and shearlet transform
NASA Astrophysics Data System (ADS)
Li, Yuancheng; Li, Yan
2017-09-01
In order to extract detailed facial features, we build a face aging effect simulation model based on multilayer representation and shearlet transform. The face is divided into three layers: the global layer of the face, the local features layer, and the texture layer, each of which has its own aging model. First, the training samples are classified according to different age groups, and we use the active appearance model (AAM) at the global level to obtain facial features. The regression equations of shape and texture with age are obtained by fitting support vector machine regression based on the radial basis function. We use AAM to simulate the aging of facial organs. Then, for the texture detail layer, we acquire the significant high-frequency characteristic components of the face by using the multiscale shearlet transform. Finally, we obtain the final simulated aging images of the face using a fusion algorithm. Experiments are carried out on the FG-NET dataset, and the results show that the simulated face images differ little from the original images and achieve a good face aging simulation effect.
NASA Astrophysics Data System (ADS)
de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia
Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative data base from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.
Robitaille, Yvonne; Fournier, Michel; Laforest, Sophie; Gauvin, Lise; Filiatrault, Johanne; Corriveau, Hélène
2012-08-01
To examine the effect of a fall prevention program offered under real-world conditions on balance maintenance several months after the program, and to explore the program's impact on falls. A quasi-experimental study was conducted among community-dwelling seniors, with pre- and postintervention measures of balance performance and self-reported falls. Ten community-based organizations offered the intervention (98 participants) and 7 recruited participants to the study's control arm (102 participants). An earlier study examined balance immediately after the 12-week program. The present study focuses on the 12-month effect. Linear regression (balance) and negative binomial regression (falls) procedures were performed. During the 12-month study period, experimental participants improved and maintained their balance as reflected by their scores on three performance tests. There was no evidence of an effect on falls. Structured group exercise programs offered in community-based settings can maintain selected components of balance for several months after the program's end.
Saunders, Christina T; Blume, Jeffrey D
2017-10-26
Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.
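For linear models, the causal mediation quantities discussed above reduce to changes in the exposure coefficient between nested regressions. A minimal numpy sketch of that classical difference-in-coefficients decomposition follows; it illustrates the idea on simulated data and is not the paper's single-model EMC machinery:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)                     # exposure
m = 0.5 * x + rng.normal(size=n)           # mediator
y = 0.3 * x + 0.4 * m + rng.normal(size=n) # outcome

def ols_coefs(cols, target):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(target))] + list(cols))
    return np.linalg.lstsq(X, target, rcond=None)[0]

c_total = ols_coefs([x], y)[1]             # total effect of x on y
c_direct = ols_coefs([x, m], y)[1]         # direct effect, adjusting for m
indirect = c_total - c_direct              # change in the exposure coefficient
assert abs(indirect - 0.5 * 0.4) < 0.05    # matches the product a*b = 0.2
```

The paper's contribution is to obtain such effect estimates and their variances from a single model fit rather than from the system of separate equations used in the Baron-Kenny framework.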
Research on Fault Rate Prediction Method of T/R Component
NASA Astrophysics Data System (ADS)
Hou, Xiaodong; Yang, Jiangping; Bi, Zengjun; Zhang, Yu
2017-07-01
The T/R component is an important part of the large phased-array radar antenna; because of its large numbers and high fault rate, fault prediction for it has important practical significance. To address the problems of the traditional grey model GM(1,1) in practical operation, this paper establishes a discrete grey model based on the original model, introduces an optimization factor to optimize the background value, and adds a linear term to the prediction model, yielding an improved discrete grey model with linear regression. Finally, an example is simulated and compared with other models. The results show that the method proposed in this paper has higher accuracy, is simple to solve, and has a wider scope of application.
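For reference, the traditional GM(1,1) that the paper improves upon can be sketched in a few lines: accumulate the series, fit the whitened equation by least squares on the background values, then difference the time response to forecast. The fault-rate numbers below are illustrative, not the paper's example:

```python
import numpy as np

def gm11_predict(x0, horizon):
    """Classical grey model GM(1,1): fit on the accumulated series, forecast ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                                 # accumulating generation
    z = 0.5 * (x1[1:] + x1[:-1])                       # background values
    B = np.column_stack([-z, np.ones(len(z))])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]   # developing/grey coefficients
    k = np.arange(1, len(x0) + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # time-response function
    return np.diff(np.concatenate([[x0[0]], x1_hat]))[len(x0) - 1:]

# a roughly exponential fault-rate series (illustrative numbers)
series = [2.87, 3.28, 3.34, 3.79, 3.94, 4.31]
preds = gm11_predict(series, 2)
assert len(preds) == 2 and preds[1] > preds[0] > 4.31
```

The paper's improvements replace this continuous time response with a discrete form, optimize the 0.5 weighting of the background values, and add a linear regression term.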
Namba, Shushi; Kabir, Russell S.; Miyatani, Makoto; Nakao, Takashi
2017-01-01
While numerous studies have examined the relationships between facial actions and emotions, they have yet to account for the ways that specific spontaneous facial expressions map onto emotional experiences induced without expressive intent. Moreover, previous studies emphasized that a fine-grained investigation of facial components could establish the coherence of facial actions with actual internal states. Therefore, this study aimed to accumulate evidence for the correspondence between spontaneous facial components and emotional experiences. We reinvestigated data from previous research which secretly recorded spontaneous facial expressions of Japanese participants as they watched film clips designed to evoke four different target emotions: surprise, amusement, disgust, and sadness. The participants rated their emotional experiences via a self-reported questionnaire of 16 emotions. These spontaneous facial expressions were coded using the Facial Action Coding System, the gold standard for classifying visible facial movements. We corroborated each facial action that was present in the emotional experiences by applying stepwise regression models. The results found that spontaneous facial components occurred in ways that cohere to their evolutionary functions based on the rating values of emotional experiences (e.g., the inner brow raiser might be involved in the evaluation of novelty). This study provided new empirical evidence for the correspondence between each spontaneous facial component and first-person internal states of emotion as reported by the expresser. PMID:28522979
Combining Mixture Components for Clustering*
Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël
2010-01-01
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302
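The first stage of the proposed method, choosing the total number of Gaussian mixture components K by BIC, looks like this with scikit-learn on simulated data (the subsequent entropy-based hierarchical merging of components into clusters is not shown):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# two well-separated Gaussian clusters in two dimensions
data = np.vstack([rng.normal(0.0, 1.0, (150, 2)),
                  rng.normal(6.0, 1.0, (150, 2))])

# fit mixtures with K = 1..5 components and keep the K minimizing BIC
fits = [GaussianMixture(n_components=k, random_state=0).fit(data)
        for k in range(1, 6)]
best_k = 1 + int(np.argmin([f.bic(data) for f in fits]))
assert best_k == 2
```

When a cluster is non-Gaussian, BIC would select more components than there are clusters, which is exactly the situation the paper's entropy-based combining step is designed to repair.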
Plant, soil, and shadow reflectance components of row crops
NASA Technical Reports Server (NTRS)
Richardson, A. J.; Wiegand, C. L.; Gausman, H. W.; Cuellar, J. A.; Gerbermann, A. H.
1975-01-01
Data from the first Earth Resource Technology Satellite (LANDSAT-1) multispectral scanner (MSS) were used to develop three plant canopy models (Kubelka-Munk (K-M), regression, and combined K-M and regression models) for extracting plant, soil, and shadow reflectance components of cropped fields. The combined model gave the best correlation between MSS data and ground truth, by accounting for essentially all of the reflectance of plants, soil, and shadow between crop rows. The principles presented can be used to better forecast crop yield and to estimate acreage.
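In linear-mixing form, decomposing a composite field reflectance into plant, soil, and shadow fractions is a small constrained least-squares problem. A sketch with hypothetical endmember spectra follows; the band values are invented for illustration and the non-negative least-squares solver stands in for the paper's K-M/regression machinery:

```python
import numpy as np
from scipy.optimize import nnls

# hypothetical endmember reflectances in 4 MSS-like bands
# (rows: plant, soil, shadow)
endmembers = np.array([[0.05, 0.08, 0.45, 0.50],
                       [0.20, 0.25, 0.30, 0.35],
                       [0.02, 0.03, 0.04, 0.05]])

true_fracs = np.array([0.4, 0.35, 0.25])
pixel = true_fracs @ endmembers            # composite field reflectance

fracs, _ = nnls(endmembers.T, pixel)       # non-negative least-squares unmixing
assert np.allclose(fracs, true_fracs, atol=1e-6)
```

With real scanner data the fractions would be noisy and the shadow endmember scene-dependent, which is why the combined K-M and regression model performed best in the study.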
NASA Astrophysics Data System (ADS)
Rodrigues, João Fabrício Mota; Coelho, Marco Túlio Pacheco; Ribeiro, Bruno R.
2018-04-01
Species distribution models (SDM) have been broadly used in ecology to address theoretical and practical problems. Currently, there are two main approaches to generate SDMs: (i) correlative models, which are based on species occurrences and environmental predictor layers, and (ii) process-based models, which are constructed from species' functional traits and physiological tolerances. The distributions estimated by each approach are based on different components of the species niche. Predictions of correlative models approach species' realized niches, while predictions of process-based models are more akin to the species' fundamental niche. Here, we integrated the predictions of fundamental and realized distributions of the freshwater turtle Trachemys dorbigni. The fundamental distribution was estimated using data on T. dorbigni's egg incubation temperature, and the realized distribution was estimated using species occurrence records. Both types of distributions were estimated using the same regression approaches (logistic regression and support vector machines), considering both macroclimatic and microclimatic temperatures. The realized distribution of T. dorbigni was generally nested in its fundamental distribution, reinforcing theoretical assumptions that the species' realized niche is a subset of its fundamental niche. Both modelling algorithms produced similar results, but microtemperature generated better results than macrotemperature for the incubation model. Finally, our results reinforce the conclusion that species' realized distributions are constrained by factors other than thermal tolerances alone.
Lin, Meihua; Li, Haoli; Zhao, Xiaolei; Qin, Jiheng
2013-01-01
Genome-wide analysis of gene-gene interactions has been recognized as a powerful avenue to identify the missing genetic components that cannot be detected by current single-point association analysis. Recently, several model-free methods (e.g. the commonly used information-based metrics and several logistic regression-based metrics) were developed for detecting non-linear dependence between genetic loci, but they are potentially at risk of inflated false-positive error, in particular when the main effects at one or both loci are salient. In this study, we proposed two conditional entropy-based metrics to address this limitation. Extensive simulations demonstrated that the two proposed metrics, provided the disease is rare, could consistently maintain a correct false-positive rate. In scenarios for a common disease, our proposed metrics achieved better or comparable control of false-positive error, compared to four previously proposed model-free metrics. In terms of power, our methods outperformed several competing metrics in a range of common disease models. Furthermore, in real data analyses, both metrics succeeded in detecting interactions and were competitive with the originally reported results or the logistic regression approaches. In conclusion, the proposed conditional entropy-based metrics are promising alternatives to current model-based approaches for detecting genuine epistatic effects. PMID:24339984
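A conditional-entropy screen of this kind contrasts how much a phenotype's uncertainty drops given one locus versus the pair. The sketch below uses a generic conditional entropy, not necessarily the authors' exact test statistics, on a pure-epistasis XOR model where neither locus has a marginal effect:

```python
import numpy as np

def cond_entropy(y, *loci):
    """H(y | loci) in bits, from empirical joint and marginal frequencies."""
    _, jc = np.unique(np.column_stack(loci + (y,)), axis=0, return_counts=True)
    _, mc = np.unique(np.column_stack(loci), axis=0, return_counts=True)
    pj, pm = jc / jc.sum(), mc / mc.sum()
    return -(pj * np.log2(pj)).sum() + (pm * np.log2(pm)).sum()

rng = np.random.default_rng(5)
a = rng.integers(0, 2, 4000)                # genotype at locus A
b = rng.integers(0, 2, 4000)                # genotype at locus B
y = a ^ b                                   # pure epistasis: XOR phenotype

assert cond_entropy(y, a) > 0.99            # locus A alone tells us nothing
assert cond_entropy(y, a, b) < 0.01         # the pair determines y completely
```

The drop from one bit to zero when conditioning on both loci is the signature of a genuine interaction, while the single-locus conditional entropies guard against main effects masquerading as epistasis.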
Recent Improvements in Estimating Convective and Stratiform Rainfall in Amazonia
NASA Technical Reports Server (NTRS)
Negri, Andrew J.
1999-01-01
In this paper we present results from the application of a satellite infrared (IR) technique for estimating rainfall over northern South America. Our main objectives are to examine the diurnal variability of rainfall and to investigate the relative contributions from the convective and stratiform components. We apply the technique of Anagnostou et al (1999). In simple functional form, the estimated rain area A(sub rain) may be expressed as: A(sub rain) = f(A(sub mode), T(sub mode)), where T(sub mode) is the mode temperature of a cloud defined by 253 K, and A(sub mode) is the area encompassed by T(sub mode). The technique was trained by a regression between coincident microwave estimates from the Goddard Profiling (GPROF) algorithm (Kummerow et al, 1996) applied to SSM/I data and GOES IR (11 microns) observations. The apportionment of the rainfall into convective and stratiform components is based on the microwave technique described by Anagnostou and Kummerow (1997). The convective area from this technique was regressed against an IR structure parameter (the Convective Index) defined by Anagnostou et al (1999). Finally, rain rates are assigned within A(sub mode) in proportion to (253 - temperature), with different rates for the convective and stratiform components.
Fang, Ling; Gu, Caiyun; Liu, Xinyu; Xie, Jiabin; Hou, Zhiguo; Tian, Meng; Yin, Jia; Li, Aizhu; Li, Yubo
2017-01-01
Primary dysmenorrhea (PD) is a common gynecological disorder which, while not life-threatening, severely affects the quality of life of women. Most patients with PD suffer ovarian hormone imbalances caused by uterine contraction, which results in dysmenorrhea. PD patients may also suffer from increases in estrogen levels caused by increased levels of prostaglandin synthesis and release during luteal regression and early menstruation. Although PD pathogenesis has been previously reported on, these studies only examined the menstrual period and neglected the importance of the luteal regression stage. Therefore, the present study used urine metabolomics to examine changes in endogenous substances and detect urine biomarkers for PD during luteal regression. Ultra performance liquid chromatography coupled with quadrupole-time-of-flight mass spectrometry was used to create metabolomic profiles for 36 patients with PD and 27 healthy controls. Principal component analysis and partial least squares discriminate analysis were used to investigate the metabolic alterations associated with PD. Ten biomarkers for PD were identified, including ornithine, dihydrocortisol, histidine, citrulline, sphinganine, phytosphingosine, progesterone, 17-hydroxyprogesterone, androstenedione, and 15-keto-prostaglandin F2α. The specificity and sensitivity of these biomarkers was assessed based on the area under the curve of receiver operator characteristic curves, which can be used to distinguish patients with PD from healthy controls. These results provide novel targets for the treatment of PD. PMID:28098892
NASA Technical Reports Server (NTRS)
York, P.; Labell, R. W.
1980-01-01
An aircraft wing weight estimating method based on a component buildup technique is described. A simplified analytically derived beam model, modified by a regression analysis, is used to estimate the wing box weight, utilizing a data base of 50 actual airplane wing weights. Factors representing materials and methods of construction were derived and incorporated into the basic wing box equations. Weight penalties to the wing box for fuel, engines, landing gear, stores and fold or pivot are also included. Methods for estimating the weight of additional items (secondary structure, control surfaces) have the option of using details available at the design stage (i.e., wing box area, flap area) or default values based on actual aircraft from the data base.
Xian, George Z.; Homer, Collin G.; Rigge, Matthew B.; Shi, Hua; Meyer, Debbie
2015-01-01
Accurate and consistent estimates of shrubland ecosystem components are crucial to a better understanding of ecosystem conditions in arid and semiarid lands. An innovative approach was developed by integrating multiple sources of information to quantify shrubland components as continuous field products within the National Land Cover Database (NLCD). The approach consists of several procedures including field sample collections, high-resolution mapping of shrubland components using WorldView-2 imagery and regression tree models, Landsat 8 radiometric balancing and phenological mosaicking, medium resolution estimates of shrubland components following different climate zones using Landsat 8 phenological mosaics and regression tree models, and product validation. Fractional covers of nine shrubland components were estimated: annual herbaceous, bare ground, big sagebrush, herbaceous, litter, sagebrush, shrub, sagebrush height, and shrub height. Our study area included the footprint of six Landsat 8 scenes in the northwestern United States. Results show that most components correlate significantly with validation data, have small normalized root-mean-square errors, and correspond well with expected ecological gradients. While some uncertainties remain with height estimates, the model formulated in this study provides a cross-validated, unbiased, and cost-effective approach to quantify shrubland components at a regional scale and advances knowledge of horizontal and vertical variability of these components.
Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.
ERIC Educational Resources Information Center
Olson, Jeffery E.
Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
Azevedo, C F; Nascimento, M; Silva, F F; Resende, M D V; Lopes, P S; Guimarães, S E F; Glória, L S
2015-10-09
A significant contribution of molecular genetics is the direct use of DNA information to identify genetically superior individuals. With this approach, genome-wide selection (GWS) can be used for this purpose. GWS consists of analyzing a large number of single nucleotide polymorphism markers widely distributed in the genome; however, because the number of markers is much larger than the number of genotyped individuals, and such markers are highly correlated, special statistical methods are widely required. Among these methods, independent component regression, principal component regression, partial least squares, and partial principal components stand out. Thus, the aim of this study was to propose an application of the methods of dimensionality reduction to GWS of carcass traits in an F2 (Piau x commercial line) pig population. The results show similarities between the principal component and independent component methods, which provided the most accurate genomic breeding estimates for most carcass traits in pigs.
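A minimal numpy sketch of principal component regression in the markers ≫ individuals setting the abstract describes, using simulated marker data (not the Piau population):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 300                       # far more markers than individuals, as in GWS
X = rng.standard_normal((n, p))      # simulated marker covariates
beta = np.zeros(p)
beta[:5] = 1.0                       # five markers with true effects
y = X @ beta + 0.1 * rng.standard_normal(n)

def pcr_fit(X, y, k):
    """Principal component regression: regress y on the top-k PCs of X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                           # component scores
    gamma = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
    coef = Vt[:k].T @ gamma                             # back to marker effects
    return coef, y.mean(), X.mean(axis=0)

coef, y0, x0 = pcr_fit(X, y, k=30)
pred = (X - x0) @ coef + y0
r = np.corrcoef(pred, y)[0, 1]
```

Reducing p = 300 correlated (or here, simulated) markers to k = 30 orthogonal components makes the regression well-posed despite n < p; independent component regression differs only in how the components are extracted.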
An Excel Solver Exercise to Introduce Nonlinear Regression
ERIC Educational Resources Information Center
Pinder, Jonathan P.
2013-01-01
Business students taking business analytics courses that have significant predictive modeling components, such as marketing research, data mining, forecasting, and advanced financial modeling, are introduced to nonlinear regression using application software that is a "black box" to the students. Thus, although correct models are…
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.
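The idea of combining a parametric mean with a nonparametric Gaussian-process component can be illustrated with a minimal two-stage sketch. The squared-exponential kernel, the noise level, and the simulated dose-response data are assumptions for illustration; the authors estimate the mixed-effects and GP parts jointly, which this sketch does not do:

```python
import numpy as np

def sqexp(t1, t2, ell=1.0, sf=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    d = t1[:, None] - t2[None, :]
    return sf ** 2 * np.exp(-0.5 * (d / ell) ** 2)

def fit_predict(t, x, y, t_new, x_new, noise=0.1):
    """Parametric mean (linear in covariate x) + GP residual over time t."""
    H = np.column_stack([np.ones_like(t), x])        # fixed-effects design
    beta = np.linalg.lstsq(H, y, rcond=None)[0]      # crude stage-1 mean fit
    K = sqexp(t, t) + noise ** 2 * np.eye(len(t))
    alpha = np.linalg.solve(K, y - H @ beta)         # stage-2 GP on residuals
    mean_new = np.column_stack([np.ones_like(t_new), x_new]) @ beta
    return mean_new + sqexp(t_new, t) @ alpha

t = np.linspace(0.0, 5.0, 30)                        # e.g. observation times
x = np.sin(t)                                        # a toy covariate
y = 2.0 + 0.5 * x + 0.2 * np.sin(3.0 * t)            # response with nonlinearity
pred = fit_predict(t, x, y, t, x)
```

The parametric part carries the explanatory covariate effect; the GP term absorbs the smooth nonlinearity the linear mean cannot, mirroring the division of labor described in the abstract.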
NASA Astrophysics Data System (ADS)
Petrova, Desislava; Koopman, Siem Jan; Ballester, Joan; Rodó, Xavier
2017-02-01
El Niño (EN) is a dominant feature of climate variability on inter-annual time scales, driving changes in the climate throughout the globe and having widespread natural and socio-economic consequences. Its forecast is therefore an important task, and predictions are issued on a regular basis by a wide array of prediction schemes and climate centres around the world. This study explores a novel method for EN forecasting. The advantageous statistical technique of unobserved-components time series modeling, also known as structural time series modeling, has not previously been applied to this problem. We have therefore developed such a model, in which the statistical analysis, including parameter estimation and forecasting, is based on state space methods and includes the celebrated Kalman filter. The distinguishing feature of this dynamic model is the decomposition of a time series into a range of stochastically time-varying components such as level (or trend), seasonal, cycles of different frequencies, irregular, and regression effects incorporated as explanatory covariates. These components are modeled separately and ultimately combined in a single forecasting scheme. Customary statistical models for EN prediction essentially use SST and wind stress in the equatorial Pacific. In addition to these, we introduce a new domain of regression variables accounting for the state of the subsurface ocean temperature in the western and central equatorial Pacific, motivated by our analysis, as well as by recent and classical research showing that subsurface processes and heat accumulation there are fundamental for the genesis of EN. An important feature of the scheme is that different regression predictors are used at different lead months, thus capturing the dynamical evolution of the system and rendering more efficient forecasts. The new model has been tested with the prediction of all warm events that occurred in the period 1996-2015.
Retrospective forecasts of these events were made for long lead times of at least two and a half years. The present study thus demonstrates that the theoretical limit of ENSO prediction lies much further out than the commonly accepted "Spring Barrier". The high correspondence between the forecasts and observations indicates that the proposed model outperforms all current operational statistical models and behaves comparably to the best dynamical models used for EN prediction. The novel way in which the modeling scheme has been structured could thus also be used to improve other statistical and dynamical modeling systems.
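The unobserved-components idea can be illustrated with the simplest such model, a stochastic level filtered by the Kalman recursions. The noise variances and simulated series are assumptions for illustration, not the authors' multi-component EN model:

```python
import numpy as np

def local_level_filter(y, sigma_eps=1.0, sigma_eta=0.1):
    """Kalman filter for the local-level unobserved-components model:
       y_t = mu_t + eps_t,   mu_{t+1} = mu_t + eta_t."""
    n = len(y)
    mu = np.empty(n)
    m, P = 0.0, 1e7                          # diffuse initialisation
    for t in range(n):
        v = y[t] - m                         # one-step prediction error
        F = P + sigma_eps ** 2               # prediction-error variance
        K = P / F                            # Kalman gain
        m = m + K * v                        # filtered level estimate
        mu[t] = m
        P = P * (1.0 - K) + sigma_eta ** 2   # next-step state variance
    return mu

rng = np.random.default_rng(1)
level = 5.0 + np.cumsum(0.1 * rng.standard_normal(200))  # true random-walk level
y = level + rng.standard_normal(200)                     # noisy observations
mu_hat = local_level_filter(y)
```

The full structural model adds seasonal, cyclical, and regression components to the state vector, but the same filter recursions carry the estimation and forecasting.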
Plöchl, Michael; Ossandón, José P.; König, Peter
2012-01-01
Eye movements introduce large artifacts to electroencephalographic recordings (EEG) and thus render data analysis difficult or even impossible. Trials contaminated by eye movement and blink artifacts have to be discarded, hence in standard EEG paradigms subjects are required to fixate on the screen. To overcome this restriction, several correction methods, including regression and blind source separation, have been proposed, yet no automated standard procedure has been established. By simultaneously recording eye movements and 64-channel EEG during a guided eye movement paradigm, we investigate and review the properties of eye movement artifacts, including corneo-retinal dipole changes, saccadic spike potentials and eyelid artifacts, and study their interrelations during different types of eye and eyelid movements. In concordance with earlier studies our results confirm that these artifacts arise from different independent sources and that depending on electrode site, gaze direction, and choice of reference these sources contribute differently to the measured signal. We assess the respective implications for artifact correction methods and therefore compare the performance of two prominent approaches, namely linear regression and independent component analysis (ICA). We show and discuss that due to the independence of eye artifact sources, regression-based correction methods inevitably over- or under-correct individual artifact components, while ICA is in principle suited to address such mixtures of different types of artifacts. Finally, we propose an algorithm which uses eye tracker information to objectively identify eye-artifact-related ICA components (ICs) in an automated manner. In the data presented here, the algorithm performed very similarly to human experts when the experts were given both the topographies of the ICs and their respective activations across a large number of trials.
Moreover, it performed more reliably, and was almost twice as effective as human experts, when the decision had to be based on IC topographies alone. Furthermore, a receiver operating characteristic (ROC) analysis demonstrated an optimal balance of false positives and false negatives, with an area under the curve (AUC) of more than 0.99. Removing the automatically detected ICs from the data resulted in removal or substantial suppression of ocular artifacts, including microsaccadic spike potentials, while the relevant neural signal remained unaffected. In conclusion, the present work aims at a better understanding of individual eye movement artifacts, their interrelations, and the respective implications for eye artifact correction. Additionally, the proposed ICA procedure provides a tool for optimized detection and correction of eye movement-related artifact components. PMID:23087632
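The regression-based correction that the study compares against ICA can be sketched as follows, using simulated EEG/EOG mixtures rather than recorded data:

```python
import numpy as np

def regress_out_eog(eeg, eog):
    """Remove ocular artifact from each EEG channel by least-squares
    regression on the recorded EOG channels (the classic regression
    correction the abstract contrasts with ICA)."""
    # Propagation factors: how strongly each EOG channel leaks into each EEG channel
    B = np.linalg.lstsq(eog.T, eeg.T, rcond=None)[0]
    return eeg - B.T @ eog

rng = np.random.default_rng(2)
n = 1000
eog = rng.standard_normal((2, n))               # two simulated EOG channels
neural = 0.5 * rng.standard_normal((4, n))      # simulated "brain" signal
mix = rng.standard_normal((4, 2))               # artifact propagation matrix
eeg = neural + mix @ eog
clean = regress_out_eog(eeg, eog)
```

In this toy case a single mixing matrix generates the artifact, so regression recovers the neural signal well; the paper's point is that real recordings mix several independent ocular sources, which a single regression on EOG channels over- or under-corrects.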
NASA Astrophysics Data System (ADS)
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that combines parametric and nonparametric regression. There are various approach techniques in nonparametric regression; one of these is the spline. Central Java is one of the most densely populated provinces in Indonesia. Population density in this province can be modeled by semiparametric regression because it contains both parametric and nonparametric components. The purpose of this paper is therefore to determine the factors that influence population density in Central Java using the semiparametric spline regression model. The results show that the factors influencing population density in Central Java are the number of active Family Planning (FP) participants and the district minimum wage.
Method of determining glass durability
Jantzen, C.M.; Pickett, J.B.; Brown, K.G.; Edwards, T.B.
1998-12-08
A process is described for determining one or more leachate concentrations of one or more components of a glass composition in an aqueous solution of the glass composition by identifying the components of the glass composition, including associated oxides, determining a preliminary glass dissolution estimator, {Delta}G{sub p}, based upon the free energies of hydration for the component reactant species, determining an accelerated glass dissolution function, {Delta}G{sub a}, based upon the free energy associated with weak acid dissociation, {Delta}G{sub a}{sup WA}, and accelerated matrix dissolution at high pH, {Delta}G{sub a}{sup SB} associated with solution strong base formation, and determining a final hydration free energy, {Delta}G{sub f}. This final hydration free energy is then used to determine leachate concentrations for elements of interest using a regression analysis and the formula log{sub 10}(N C{sub i}(g/L))=a{sub i} + b{sub i}{Delta}G{sub f}. The present invention also includes a method to determine whether a particular glass to be produced will be homogeneous or phase separated. The present invention is also directed to methods of monitoring and controlling processes for making glass using these determinations to modify the feedstock materials until a desired glass durability and homogeneity is obtained. 4 figs.
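The final regression step of the patent can be sketched as fitting the stated relation log10(N C_i) = a_i + b_i ΔG_f by ordinary least squares. The calibration numbers below are made up for illustration and are not values from the patent:

```python
import numpy as np

# Illustrative (made-up) calibration data: final hydration free energy
# Delta-G_f versus measured log10 leachate concentration for one element.
dG_f = np.array([-12.0, -10.5, -9.0, -7.5, -6.0])
log_NC = np.array([-1.8, -1.3, -0.9, -0.4, 0.1])

# Fit log10(N C_i) = a_i + b_i * Delta-G_f by ordinary least squares.
b_i, a_i = np.polyfit(dG_f, log_NC, 1)

def leachate_conc(dG):
    """Predicted leachate concentration N*C_i (g/L) at a given Delta-G_f."""
    return 10.0 ** (a_i + b_i * dG)
```

Once a_i and b_i are calibrated for each element of interest, a candidate glass composition's ΔG_f directly predicts its leachate concentrations, which is how the patent uses the regression to screen feedstocks for durability.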
A Demonstration of Regression False Positive Selection in Data Mining
ERIC Educational Resources Information Center
Pinder, Jonathan P.
2014-01-01
Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…
Genetic parameters of legendre polynomials for first parity lactation curves.
Pool, M H; Janss, L L; Meuwissen, T H
2000-11-01
Variance components of the covariance function coefficients in a random regression test-day model were estimated by Legendre polynomials up to a fifth order for first-parity records of Dutch dairy cows using Gibbs sampling. Two Legendre polynomials of equal order were used to model the random part of the lactation curve, one for the genetic component and one for permanent environment. Test-day records from cows registered between 1990 to 1996 and collected by regular milk recording were available. For the data set, 23,700 complete lactations were selected from 475 herds sired by 262 sires. Because the application of a random regression model is limited by computing capacity, we investigated the minimum order needed to fit the variance structure in the data sufficiently. Predictions of genetic and permanent environmental variance structures were compared with bivariate estimates on 30-d intervals. A third-order or higher polynomial modeled the shape of variance curves over DIM with sufficient accuracy for the genetic and permanent environment part. Also, the genetic correlation structure was fitted with sufficient accuracy by a third-order polynomial, but, for the permanent environmental component, a fourth order was needed. Because equal orders are suggested in the literature, a fourth-order Legendre polynomial is recommended in this study. However, a rank of three for the genetic covariance matrix and of four for permanent environment allows a simpler covariance function with a reduced number of parameters based on the eigenvalues and eigenvectors.
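The Legendre covariates and the variance curve they imply, var(t) = φ(t)′ K φ(t), can be sketched as follows. Raw (unnormalized) Legendre polynomials and an illustrative diagonal coefficient covariance matrix are used for simplicity; test-day models typically use normalized polynomials and a full covariance matrix:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_basis(dim, order):
    """Legendre covariates Phi for days in milk (DIM), rescaled to [-1, 1]
    as in random regression test-day models (raw polynomials for illustration)."""
    x = 2.0 * (dim - dim.min()) / (dim.max() - dim.min()) - 1.0
    # Column j holds P_j(x): pass a unit coefficient vector to legval.
    return np.column_stack([legendre.legval(x, np.eye(order + 1)[j])
                            for j in range(order + 1)])

def variance_curve(Phi, K):
    """Variance over the trajectory implied by coefficient covariance K:
       var(t) = phi(t)' K phi(t)."""
    return np.einsum('ij,jk,ik->i', Phi, K, Phi)

dim = np.arange(5.0, 306.0, 10.0)        # test days 5, 15, ..., 305
Phi = legendre_basis(dim, order=3)       # third order, as found sufficient
K = np.diag([1.0, 0.3, 0.1, 0.05])       # illustrative coefficient (co)variances
v = variance_curve(Phi, K)
```

The order of the polynomial sets the rank of the implied covariance function, which is why the study can recommend a fourth-order fit while using rank three for the genetic part.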
Effectiveness of a worksite mindfulness-based multi-component intervention on lifestyle behaviors
2014-01-01
Introduction Overweight and obesity are associated with an increased risk of morbidity. Mindfulness training could be an effective strategy to optimize lifestyle behaviors related to body weight gain. The aim of this study was to evaluate the effectiveness of a worksite mindfulness-based multi-component intervention on vigorous physical activity in leisure time, sedentary behavior at work, fruit intake and determinants of these behaviors. The control group received information on existing lifestyle behavior-related facilities that were already available at the worksite. Methods In a randomized controlled trial design (n = 257), 129 workers received a mindfulness training, followed by e-coaching, lunch walking routes and fruit. Outcome measures were assessed at baseline and after 6 and 12 months using questionnaires. Physical activity was also measured using accelerometers. Effects were analyzed using linear mixed effect models according to the intention-to-treat principle. Linear regression models (complete case analyses) were used as sensitivity analyses. Results There were no significant differences in lifestyle behaviors and determinants of these behaviors between the intervention and control group after 6 or 12 months. The sensitivity analyses showed effect modification for gender in sedentary behavior at work at 6-month follow-up, although the main analyses did not. Conclusions This study did not show an effect of a worksite mindfulness-based multi-component intervention on lifestyle behaviors and behavioral determinants after 6 and 12 months. The effectiveness of a worksite mindfulness-based multi-component intervention as a health promotion intervention for all workers could not be established. PMID:24467802
Comparison of multi-subject ICA methods for analysis of fMRI data
Erhardt, Erik Barry; Rachakonda, Srinivas; Bedrick, Edward; Allen, Elena; Adali, Tülay; Calhoun, Vince D.
2010-01-01
Spatial independent component analysis (ICA) applied to functional magnetic resonance imaging (fMRI) data identifies functionally connected networks by estimating spatially independent patterns from their linearly mixed fMRI signals. Several multi-subject ICA approaches estimating subject-specific time courses (TCs) and spatial maps (SMs) have been developed; however, there has not yet been a full comparison of the implications of their use. Here, we provide extensive comparisons of four multi-subject ICA approaches in combination with data reduction methods for simulated and fMRI task data. For multi-subject ICA, the data first undergo reduction at the subject and group levels using principal component analysis (PCA). Comparisons of subject-specific, spatial concatenation, and group data mean subject-level reduction strategies using PCA and probabilistic PCA (PPCA) show that computationally intensive PPCA is equivalent to PCA, and that subject-specific and group data mean subject-level PCA are preferred because of well-estimated TCs and SMs. Second, aggregate independent components are estimated using either noise free ICA or probabilistic ICA (PICA). Third, subject-specific SMs and TCs are estimated using back-reconstruction. We compare several direct group ICA (GICA) back-reconstruction approaches (GICA1-GICA3) and an indirect back-reconstruction approach, spatio-temporal regression (STR, or dual regression). Results show that the earlier group ICA (GICA1) approximates STR; however, STR has contradictory assumptions and may show mixed-component artifacts in estimated SMs. Our evidence-based recommendation is to use GICA3, introduced here, with subject-specific PCA and noise-free ICA, providing the most robust and accurate estimated SMs and TCs in addition to offering an intuitive interpretation. PMID:21162045
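The two regression stages of spatio-temporal (dual) regression can be sketched as follows, with simulated data and the group maps taken as known:

```python
import numpy as np

def dual_regression(data, group_maps):
    """Spatio-temporal (dual) regression for one subject.

    data: (time, voxels) subject data; group_maps: (components, voxels).
    Stage 1 regresses the group SMs onto the data to get subject TCs;
    stage 2 regresses those TCs back onto the data to get subject SMs."""
    tcs = np.linalg.lstsq(group_maps.T, data.T, rcond=None)[0].T  # (time, comp)
    sms = np.linalg.lstsq(tcs, data, rcond=None)[0]               # (comp, voxels)
    return tcs, sms

rng = np.random.default_rng(3)
true_tcs = rng.standard_normal((100, 3))          # simulated time courses
true_sms = rng.standard_normal((3, 500))          # simulated spatial maps
data = true_tcs @ true_sms + 0.05 * rng.standard_normal((100, 500))
tcs, sms = dual_regression(data, true_sms)
```

In this idealized case the group maps equal the subject's true maps, so both stages recover the truth; the paper's caveat is that real group maps differ from subject maps, which is where STR's contradictory assumptions can leak signal between components.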
Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data
Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark
2010-01-01
Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development, however this approach suffers from several disadvantages including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski’s F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski’s F-test for rank estimation of in vivo training sets allowed for noise to be more accurately removed. Malinowski’s F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets while dopamine prediction was more sensitive to noise. PMID:20527815
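One common form of Malinowski's F-statistic for rank estimation can be sketched as below; the pooling convention varies across the literature, and in practice each F_n is compared with a critical value from an F(1, s - n) table rather than the fixed thresholds used here for illustration:

```python
import numpy as np

def malinowski_f(eigvals, r, c):
    """Malinowski F-statistics for each candidate factor of an r x c data
    matrix (eigvals: eigenvalues of the covariance/Gram matrix, descending).
    F_n tests eigenvalue n against the pooled smaller eigenvalues."""
    s = len(eigvals)
    F = np.empty(s - 1)
    for n in range(1, s):                              # 1-indexed factor number
        tail = np.arange(n + 1, s + 1)
        ratio = eigvals[n - 1] / eigvals[n:].sum()
        pool = ((r - tail + 1) * (c - tail + 1)).sum()
        F[n - 1] = ratio * pool / ((r - n + 1) * (c - n + 1))
    return F

rng = np.random.default_rng(4)
scores = rng.standard_normal((50, 2))
loads = rng.standard_normal((2, 30))
X = scores @ loads + 0.01 * rng.standard_normal((50, 30))  # rank-2 + noise
eigvals = np.linalg.svd(X, compute_uv=False) ** 2
F = malinowski_f(eigvals, *X.shape)
```

For this rank-2 matrix the statistic for the second factor is large while the third drops to noise level, which is the behavior the paper exploits to separate signal components from noise in training sets.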
Cook, Nicola A; Kim, Jin Un; Pasha, Yasmin; Crossey, Mary Me; Schembri, Adrian J; Harel, Brian T; Kimhofer, Torben; Taylor-Robinson, Simon D
2017-01-01
Psychometric testing is used to identify patients with cirrhosis who have developed hepatic encephalopathy (HE). Most batteries consist of a series of paper-and-pencil tests, which are cumbersome for most clinicians. A modern, easy-to-use, computer-based battery would be a helpful clinical tool, given that in its minimal form, HE has an impact on both patients' quality of life and the ability to drive and operate machinery (with societal consequences). We compared the Cogstate™ computer battery testing with the Psychometric Hepatic Encephalopathy Score (PHES) tests, with a view to simplify the diagnosis. This was a prospective study of 27 patients with histologically proven cirrhosis. An analysis of psychometric testing was performed using accuracy of task performance and speed of completion as primary variables to create a correlation matrix. A stepwise linear regression analysis was performed with backward elimination, using analysis of variance. Strong correlations were found between the international shopping list, international shopping list delayed recall of Cogstate and the PHES digit symbol test. The Shopping List Tasks were the only tasks that consistently had P values of <0.05 in the linear regression analysis. Subtests of the Cogstate battery correlated very strongly with the digit symbol component of PHES in discriminating severity of HE. These findings would indicate that components of the current PHES battery with the international shopping list tasks of Cogstate would be discriminant and have the potential to be used easily in clinical practice.
Using Dual Regression to Investigate Network Shape and Amplitude in Functional Connectivity Analyses
Nickerson, Lisa D.; Smith, Stephen M.; Öngür, Döst; Beckmann, Christian F.
2017-01-01
Independent Component Analysis (ICA) is one of the most popular techniques for the analysis of resting state FMRI data because it has several advantageous properties when compared with other techniques. Most notably, in contrast to a conventional seed-based correlation analysis, it is model-free and multivariate, thus switching the focus from evaluating the functional connectivity of single brain regions identified a priori to evaluating brain connectivity in terms of all brain resting state networks (RSNs) that simultaneously engage in oscillatory activity. Furthermore, typical seed-based analysis characterizes RSNs in terms of spatially distributed patterns of correlation (typically by means of simple Pearson's coefficients) and thereby confounds together amplitude information of oscillatory activity and noise. ICA and other regression techniques, on the other hand, retain magnitude information and therefore can be sensitive to both changes in the spatially distributed nature of correlations (differences in the spatial pattern or “shape”) as well as the amplitude of the network activity. Furthermore, motion can mimic amplitude effects so it is crucial to use a technique that retains such information to ensure that connectivity differences are accurately localized. In this work, we investigate the dual regression approach that is frequently applied with group ICA to assess group differences in resting state functional connectivity of brain networks. We show how ignoring amplitude effects and how excessive motion corrupts connectivity maps and results in spurious connectivity differences. We also show how to implement the dual regression to retain amplitude information and how to use dual regression outputs to identify potential motion effects. 
Two key findings are that using a technique that retains magnitude information, e.g., dual regression, and using strict motion criteria are crucial for controlling both network amplitude and motion-related amplitude effects, respectively, in resting state connectivity analyses. We illustrate these concepts using realistic simulated resting state FMRI data and in vivo data acquired in healthy subjects and patients with bipolar disorder and schizophrenia. PMID:28348512
Structured functional additive regression in reproducing kernel Hilbert spaces.
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2014-06-01
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.
Introduction to uses and interpretation of principal component analyses in forest biology.
J. G. Isebrands; Thomas R. Crow
1975-01-01
The application of principal component analysis for interpretation of multivariate data sets is reviewed with emphasis on (1) reduction of the number of variables, (2) ordination of variables, and (3) applications in conjunction with multiple regression.
A hybrid PCA-CART-MARS-based prognostic approach of the remaining useful life for aircraft engines.
Sánchez Lasheras, Fernando; García Nieto, Paulino José; de Cos Juez, Francisco Javier; Mayo Bayón, Ricardo; González Suárez, Victor Manuel
2015-03-23
Prognostics is an engineering discipline that predicts the future health of a system. In this research work, a data-driven approach for prognostics is proposed. Indeed, the present paper describes a data-driven hybrid model for the successful prediction of the remaining useful life of aircraft engines. The approach combines the multivariate adaptive regression splines (MARS) technique with the principal component analysis (PCA), dendrograms and classification and regression trees (CARTs). Elements extracted from sensor signals are used to train this hybrid model, representing different levels of health for aircraft engines. In this way, this hybrid algorithm is used to predict the trends of these elements. Based on this fitting, one can determine the future health state of a system and estimate its remaining useful life (RUL) with accuracy. To evaluate the proposed approach, a test was carried out using aircraft engine signals collected from physical sensors (temperature, pressure, speed, fuel flow, etc.). Simulation results show that the PCA-CART-MARS-based approach can forecast faults long before they occur and can predict the RUL. The proposed hybrid model presents as its main advantage the fact that it does not require information about the previous operation states of the input variables of the engine. The performance of this model was compared with those obtained by other benchmark models (multivariate linear regression and artificial neural networks) also applied in recent years for the modeling of remaining useful life. Therefore, the PCA-CART-MARS-based approach is very promising in the field of prognostics of the RUL for aircraft engines.
Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...
2015-12-04
Effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase, multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field-observed values. Four sensitivity analysis (SA) approaches are investigated: analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.
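Of the four SA approaches, standardized regression coefficients (SRC) are the simplest to sketch. A toy version, assuming synthetic parameter samples and a made-up linear response; nothing here reproduces the Community Land Model or its parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

# 500 samples of 4 hypothetical hydrologic parameters; the response
# (e.g. simulated runoff) depends strongly on p0 and weakly on p2.
P = rng.uniform(0, 1, size=(500, 4))
runoff = 5.0 * P[:, 0] + 0.5 * P[:, 2] + 0.2 * rng.normal(size=500)

# Standardized regression coefficients: fit OLS on z-scored inputs and
# output; |SRC| then ranks parameter importance on a common scale.
Z = (P - P.mean(axis=0)) / P.std(axis=0)
z_y = (runoff - runoff.mean()) / runoff.std()
src, *_ = np.linalg.lstsq(Z, z_y, rcond=None)
ranking = np.argsort(-np.abs(src))
print(ranking[0])  # index of the most influential parameter
```

The same sampled design could be reused for the variance-based approaches, which is why the convergence with sample count matters in practice.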
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
2016-08-01
The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis (PCA) and Cluster Analysis (CA)) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extraction in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high-frequency instrumental acquisition rates, better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrate (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods.
Mass spectra produced through ionization from the liquid state at atmospheric pressure of bulk complex mixtures resulting from extracted materials of natural origin provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA and CA.
NASA Astrophysics Data System (ADS)
McCulley, Jonathan M.
This research investigates the application of additive manufacturing techniques for fabricating hybrid rocket fuel grains composed of porous acrylonitrile-butadiene-styrene (ABS) impregnated with paraffin wax. The digitally manufactured ABS substrate provides mechanical support for the paraffin fuel material and serves as an additional fuel component. The embedded paraffin provides an enhanced fuel regression rate while having no detrimental effect on the thermodynamic burn properties of the fuel grain. Multiple fuel grains with various ABS-to-paraffin mass ratios were fabricated and burned with nitrous oxide. Analytical predictions for end-to-end motor performance and fuel regression are compared against static test results. Baseline fuel grain regression calculations use an enthalpy balance energy analysis with the material and thermodynamic properties based on the mean paraffin/ABS mass fractions within the fuel grain. In support of these analytical comparisons, a novel method for propagating the fuel port burn surface was developed. In this modeling approach, the fuel cross-section grid is modeled as an image, with white pixels representing the fuel and black pixels representing empty or burned grid cells.
Zhang, Ni; Liu, Xu; Jin, Xiaoduo; Li, Chen; Wu, Xuan; Yang, Shuqin; Ning, Jifeng; Yanne, Paul
2017-12-15
Phenolics contents in wine grapes are key indicators for assessing ripeness. Near-infrared hyperspectral images during ripening have been explored to achieve an effective method for predicting phenolics contents. Principal component regression (PCR), partial least squares regression (PLSR) and support vector regression (SVR) models were built, respectively. The results show that SVR behaves globally better than PLSR and PCR, except in predicting tannins content of seeds. For the best prediction results, the squared correlation coefficient and root mean square error reached 0.8960 and 0.1069 g/L (+)-catechin equivalents (CE), respectively, for tannins in skins, 0.9065 and 0.1776 (g/L CE) for total iron-reactive phenolics (TIRP) in skins, 0.8789 and 0.1442 (g/L M3G) for anthocyanins in skins, 0.9243 and 0.2401 (g/L CE) for tannins in seeds, and 0.8790 and 0.5190 (g/L CE) for TIRP in seeds. Our results indicated that NIR hyperspectral imaging has good prospects for evaluation of phenolics in wine grapes.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD) to prevent delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods include sparse multinomial logistic regression; rotation forest ensemble with support vector machines and principal components analysis; artificial neural networks; and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as classifier and Haar wavelets as projection filter, is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted at 95% and 99% confidence levels and the results are established with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
State-space decoding of primary afferent neuron firing rates
NASA Astrophysics Data System (ADS)
Wagenaar, J. B.; Ventura, V.; Weber, D. J.
2011-02-01
Kinematic state feedback is important for neuroprostheses to generate stable and adaptive movements of an extremity. State information, represented in the firing rates of populations of primary afferent (PA) neurons, can be recorded at the level of the dorsal root ganglia (DRG). Previous work in cats showed the feasibility of using DRG recordings to predict the kinematic state of the hind limb using reverse regression. Although accurate decoding results were attained, reverse regression does not make efficient use of the information embedded in the firing rates of the neural population. In this paper, we present decoding results based on state-space modeling, and show that it is a more principled and more efficient method for decoding the firing rates in an ensemble of PA neurons. In particular, we show that we can extract confounded information from neurons that respond to multiple kinematic parameters, and that including velocity components in the firing rate models significantly increases the accuracy of the decoded trajectory. We show that, on average, state-space decoding is twice as efficient as reverse regression for decoding joint and endpoint kinematics.
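A toy state-space decoder along these lines, assuming a 1-D kinematic state with position and velocity components, linear "tuning" of synthetic firing rates, and a standard linear Kalman filter; all dimensions and noise levels are invented, and nothing here reproduces the DRG recordings.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hidden state: 1-D limb position (random walk) plus velocity. "Firing
# rates" of 20 neurons are noisy linear functions of position AND
# velocity, i.e. the confounded tuning the state-space model exploits.
T = 200
vel = rng.normal(scale=0.1, size=T)
pos = np.cumsum(vel)
state = np.column_stack([pos, vel])
H = rng.normal(size=(20, 2))                      # observation (tuning) matrix
rates = state @ H.T + 0.5 * rng.normal(size=(T, 20))

# Linear Kalman filter with an assumed constant-velocity state model.
A = np.array([[1.0, 1.0], [0.0, 1.0]])            # state transition
Q = 0.01 * np.eye(2)                              # process noise
R = 0.25 * np.eye(20)                             # observation noise
x, P = np.zeros(2), np.eye(2)
decoded = []
for z in rates:
    x, P = A @ x, A @ P @ A.T + Q                 # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ (z - H @ x)                       # update with firing rates
    P = (np.eye(2) - K @ H) @ P
    decoded.append(x[0])

err = float(np.sqrt(np.mean((np.array(decoded) - pos) ** 2)))
print(round(err, 3))
```

Because the filter pools all 20 channels through the gain matrix, it uses the population jointly rather than channel by channel, which is the efficiency argument made against reverse regression.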
New methodology for modeling annual-aircraft emissions at airports
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodmansey, B.G.; Patterson, J.G.
An as-accurate-as-possible estimation of total aircraft emissions is an essential component of any environmental-impact assessment done for proposed expansions at major airports. To determine the amount of emissions generated by aircraft using present models, it is necessary to know the emission characteristics of all engines on all planes using the airport. However, the published database does not cover all engine types, and therefore a new methodology is needed to assist in estimating annual emissions from aircraft at airports. Linear regression equations relating quantity of emissions to aircraft weight using a known fleet mix are developed in this paper. Total annual emissions for CO, NOx, NMHC, SOx, CO2, and N2O are tabulated for Toronto's international airport for 1990. The regression equations are statistically significant for all emissions except for NMHC from large jets and NOx and NMHC for piston-engine aircraft. This regression model is a relatively simple, fast, and inexpensive method of obtaining an annual emission inventory for an airport.
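The core of the methodology (a linear fit of emissions against aircraft weight, scaled up by movements of a known fleet mix) can be sketched as follows; the weights, emission values, and movement counts are entirely made up and carry no relation to the Toronto inventory.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical fleet: CO emissions per landing/takeoff cycle (kg) grow
# roughly linearly with maximum takeoff weight (tonnes).
weight = rng.uniform(20, 400, size=60)
co = 0.08 * weight + 3.0 + rng.normal(scale=1.5, size=60)

# Fit CO = a * weight + b; with the fit, an airport-level inventory no
# longer needs per-engine emission data for every aircraft type.
a, b = np.polyfit(weight, co, 1)

# Scale by annual movements of an assumed fleet mix.
movements = {"narrow-body": (70, 90_000), "wide-body": (250, 30_000)}
total_kg = sum(n * (a * w + b) for w, n in movements.values())
print(round(a, 3), round(total_kg / 1e6, 2), "kt CO/yr")
```

Separate fits per aircraft class (jets vs piston-engine) would mirror the paper's finding that the relation is not significant for every pollutant in every class.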
“Smooth” Semiparametric Regression Analysis for Arbitrarily Censored Time-to-Event Data
Zhang, Min; Davidian, Marie
2008-01-01
A general framework for regression analysis of time-to-event data subject to arbitrary patterns of censoring is proposed. The approach is relevant when the analyst is willing to assume that distributions governing model components that are ordinarily left unspecified in popular semiparametric regression models, such as the baseline hazard function in the proportional hazards model, have densities satisfying mild "smoothness" conditions. Densities are approximated by a truncated series expansion that, for fixed degree of truncation, results in a "parametric" representation, which makes likelihood-based inference coupled with adaptive choice of the degree of truncation, and hence flexibility of the model, computationally and conceptually straightforward with data subject to any pattern of censoring. The formulation allows popular models, such as the proportional hazards, proportional odds, and accelerated failure time models, to be placed in a common framework; provides a principled basis for choosing among them; and renders useful extensions of the models straightforward. The utility and performance of the methods are demonstrated via simulations and by application to data from time-to-event studies. PMID:17970813
Dinç, Erdal; Ozdemir, Abdil
2005-01-01
A multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures of enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of the multivariate chromatographic calibration technique is based on linear regression equations constructed from the relationship between concentration and peak area at a five-wavelength set. The algorithm of this calibration model, which has simple mathematical content, was briefly described. The approach is a powerful mathematical tool for optimum chromatographic multivariate calibration and for eliminating fluctuations arising from instrumental and experimental conditions. The multivariate chromatographic calibration involves reduction of multivariate linear regression functions to a univariate data set. Validation of the model was carried out by analyzing various synthetic binary mixtures and by using the standard addition technique. The developed calibration technique was applied to the analysis of real pharmaceutical tablets containing EA and HCT. The obtained results were compared with those obtained by a classical HPLC method; the proposed multivariate chromatographic calibration gave better results than classical HPLC.
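A minimal numerical sketch of the idea of reducing a five-wavelength calibration to univariate estimates, assuming simulated peak areas and NumPy; the slopes, standard concentrations, and noise level are invented, and only a single analyte is shown.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated HPLC peak areas for one analyte at a five-wavelength set:
# area = slope(wavelength) * concentration + intercept + noise.
conc = np.array([2.0, 4.0, 6.0, 8.0, 10.0])        # calibration standards
slopes = np.array([1.2, 1.0, 0.8, 0.6, 0.4])       # one slope per wavelength
areas = np.outer(conc, slopes) + 0.5 + 0.02 * rng.normal(size=(5, 5))

# One linear regression per wavelength (np.polyfit fits all five columns
# at once), then reduce the five univariate estimates of an unknown to a
# single averaged concentration.
coef = np.polyfit(conc, areas, 1)                  # shape (2, 5): slopes, intercepts
unknown_areas = 5.0 * slopes + 0.5                 # unknown with true conc = 5.0
est = (unknown_areas - coef[1]) / coef[0]
print(round(float(est.mean()), 2))
```

Averaging over wavelengths is what damps instrument-level fluctuations relative to a single-wavelength calibration.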
Prenatal exposure to traffic-related air pollution and risk of early childhood cancers.
Ghosh, Jo Kay C; Heck, Julia E; Cockburn, Myles; Su, Jason; Jerrett, Michael; Ritz, Beate
2013-10-15
Exposure to air pollution during pregnancy has been linked to the risk of childhood cancer, but the evidence remains inconclusive. In the present study, we used land use regression modeling to estimate prenatal exposures to traffic exhaust and evaluate the associations with cancer risk in very young children. Participants in the Air Pollution and Childhood Cancers Study who were 5 years of age or younger and diagnosed with cancer between 1988 and 2008 had their records linked to California birth certificates, and controls were selected from birth certificates. Land use regression-based estimates of exposures to nitric oxide, nitrogen dioxide, and nitrogen oxides were assigned based on birthplace residence and temporally adjusted using routine monitoring station data to evaluate air pollution exposures during specific pregnancy periods. Logistic regression models were adjusted for maternal age, race/ethnicity, educational level, parity, insurance type, and Census-based socioeconomic status, as well as child's sex and birth year. The odds of acute lymphoblastic leukemia increased by 9%, 23%, and 8% for each 25-ppb increase in average nitric oxide, nitrogen dioxide, and nitrogen oxides levels, respectively, over the entire pregnancy. Second- and third-trimester exposures increased the odds of bilateral retinoblastoma. No associations were found for annual average exposures without temporal components or for any other cancer type. These results lend support to a link between prenatal exposure to traffic exhaust and the risk of acute lymphoblastic leukemia and bilateral retinoblastoma.
Hedeker, D; Flay, B R; Petraitis, J
1996-02-01
Methods are proposed and described for estimating the degree to which relations among variables vary at the individual level. As an example of the methods, M. Fishbein and I. Ajzen's (1975; I. Ajzen & M. Fishbein, 1980) theory of reasoned action is examined, which posits first that an individual's behavioral intentions are a function of 2 components: the individual's attitudes toward the behavior and the subjective norms as perceived by the individual. A second component of their theory is that individuals may weight these 2 components differently in assessing their behavioral intentions. This article illustrates the use of empirical Bayes methods based on a random-effects regression model to estimate these individual influences, estimating an individual's weighting of both of these components (attitudes toward the behavior and subjective norms) in relation to their behavioral intentions. This method can be used when an individual's behavioral intentions, subjective norms, and attitudes toward the behavior are all repeatedly measured. In this case, the empirical Bayes estimates are derived as a function of the data from the individual, strengthened by the overall sample data.
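A crude stand-in for the empirical Bayes step, shrinking per-individual OLS estimates of the two weights (attitude and subjective norm) toward the group mean; the data are synthetic, and the sampling variance is assumed fixed, whereas the full random-effects regression model would estimate it.

```python
import numpy as np

rng = np.random.default_rng(7)

# 30 individuals, each measured 5 times: intention = w_att*attitude +
# w_norm*norm + noise, with individual-specific weights drawn around
# group means (1.0 for attitudes, 0.5 for norms).
n_ind, n_rep = 30, 5
w_att = rng.normal(1.0, 0.3, size=n_ind)
w_norm = rng.normal(0.5, 0.3, size=n_ind)
att = rng.normal(size=(n_ind, n_rep))
norm = rng.normal(size=(n_ind, n_rep))
intent = (w_att[:, None] * att + w_norm[:, None] * norm
          + 0.3 * rng.normal(size=(n_ind, n_rep)))

# Per-individual OLS estimates of the two weights.
ols = np.empty((n_ind, 2))
for i in range(n_ind):
    X = np.column_stack([att[i], norm[i]])
    ols[i], *_ = np.linalg.lstsq(X, intent[i], rcond=None)

# Empirical-Bayes shrinkage toward the group mean; the 0.1 sampling
# variance is an assumption for this sketch.
grand = ols.mean(axis=0)
between_var = ols.var(axis=0)
shrink = between_var / (between_var + 0.1)
eb = grand + shrink * (ols - grand)
print(np.round(grand, 2))
```

Shrinkage pulls noisy individual estimates toward the population value, which is why the empirical Bayes estimates are "strengthened by the overall sample data."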
NASA Astrophysics Data System (ADS)
Jamshidieini, Bahman; Fazaee, Reza
2016-05-01
Distribution network components connect machines and other loads to electrical sources. If the resistance or current of any component is more than the specified range, its temperature may exceed the operational limit, which can cause major problems. Therefore, these defects should be found and eliminated according to their severity. Although infra-red cameras have been used for inspection of electrical components, maintenance prioritization of distribution cubicles is mostly based on personal perception, and a lack of training data prevents engineers from developing image-processing methods. New research on the spatial control chart encouraged us to use statistical approaches instead of pattern recognition for the image processing. In the present study, a new scanning pattern which can tolerate heavy autocorrelation among adjacent pixels within an infra-red image was developed, and for the first time a combination of kernel smoothing, spatial control charts and local robust regression was used for finding defects within heterogeneous infra-red images of old distribution cubicles. This method does not need training data, and this advantage is crucially important when training data are not available.
The Impact of State Legislation and Model Policies on Bullying in Schools.
Terry, Amanda
2018-04-01
The purpose of this study was to determine the impact of the coverage of state legislation and the expansiveness ratings of state model policies on the state-level prevalence of bullying in schools. The state-level prevalence of bullying in schools was based on cross-sectional data from the 2013 High School Youth Risk Behavior Survey. Multiple regression was conducted to determine whether the coverage of state legislation and the expansiveness rating of a state model policy affected the state-level prevalence of bullying in schools. The purpose and definition category of components in state legislation and the expansiveness rating of a state model policy were statistically significant predictors of the state-level prevalence of bullying in schools. The other 3 categories of components in state legislation (District Policy Development and Review, District Policy Components, and Additional Components) were not statistically significant predictors in the model. Extensive coverage in the purpose and definition category of components in state legislation and a high expansiveness rating of a state model policy may be important in efforts to reduce bullying in schools. Improving these areas may reduce the state-level prevalence of bullying in schools.
Zhou, Yan; Cao, Hui
2013-01-01
We propose an augmented classical least squares (ACLS) calibration method for quantitative Raman spectral analysis against component information loss. The Raman spectral signals with low analyte concentration correlations were selected and used as substitutes for unknown quantitative component information during the CLS calibration procedure. The number of selected signals was determined using the leave-one-out root-mean-square error of cross-validation (RMSECV) curve. An ACLS model was built based on the augmented concentration matrix and the reference spectral signal matrix. The proposed method was compared with partial least squares (PLS) and principal component regression (PCR) using one example: a data set recorded from an experiment of analyte concentration determination using Raman spectroscopy. A 2-fold cross-validation with a Venetian blinds strategy was exploited to evaluate the predictive power of the proposed method. One-way analysis of variance (ANOVA) was used to assess the difference in predictive power between the proposed method and existing methods. Results indicated that the proposed method is effective at increasing the robust predictive power of the traditional CLS model against component information loss, and its predictive power is comparable to that of PLS or PCR.
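A sketch of the plain CLS step that the ACLS method augments, assuming synthetic pure-component "spectra" and NumPy; the augmentation with low-correlation spectral signals is omitted, and all matrices are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)

# Classical least squares: spectrum = C @ K, where C holds the analyte
# concentrations and K the pure-component reference spectra.
K = np.abs(rng.normal(size=(2, 40)))               # 2 components, 40 channels
C_train = rng.uniform(0.1, 1.0, size=(15, 2))
spectra = C_train @ K + 0.005 * rng.normal(size=(15, 40))

# Calibration: estimate K from the known training concentrations.
K_hat, *_ = np.linalg.lstsq(C_train, spectra, rcond=None)

# Prediction: recover the concentrations of a new mixture from its
# measured spectrum by least squares against K_hat.
c_true = np.array([0.3, 0.7])
new_spectrum = c_true @ K + 0.005 * rng.normal(size=40)
c_hat, *_ = np.linalg.lstsq(K_hat.T, new_spectrum, rcond=None)
print(np.round(c_hat, 2))
```

CLS breaks down when a contributing component is missing from C at calibration time; appending surrogate columns to C is the augmentation the abstract describes.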
Shan, Si-Ming; Luo, Jian-Guang; Huang, Fang; Kong, Ling-Yi
2014-02-01
Panax ginseng C.A. Meyer has been known as a valuable traditional Chinese medicine for thousands of years. Ginsenosides, the main active constituents, exhibit a prominent immunoregulation effect. The present study first describes a holistic method based on chemical characteristics and lymphocyte proliferative capacity to systematically evaluate the quality of P. ginseng in thirty samples from different seasons during 2-6 years. The HPLC fingerprints were evaluated using principal component analysis (PCA) and hierarchical clustering analysis (HCA). The spectrum-efficacy model between HPLC fingerprints and T-lymphocyte proliferative activities was investigated by principal component regression (PCR) and partial least squares (PLS). The results indicated that the growth of the ginsenosides could be grouped into three periods and that from August of the fifth year, P. ginseng showed significant lymphocyte proliferative capacity. Close correlation existed in the spectrum-efficacy relationship, and ginsenosides Rb1, Ro, Rc, Rb2 and Re were the main components contributing to the lymphocyte proliferative capacity. This comprehensive strategy, providing reliable and adequate scientific evidence, could be applied to other TCMs to improve their quality control.
Cao, Hui; Yan, Xingyu; Li, Yaojiang; Wang, Yanxia; Zhou, Yan; Yang, Sanchun
2014-01-01
Quantitative analysis of the flue gas of a natural gas-fired generator is significant for energy conservation and emission reduction. The traditional partial least squares method may not deal with nonlinear problems effectively. In this paper, a nonlinear partial least squares method with extended input based on a radial basis function neural network (RBFNN) is used for component prediction of flue gas. In the proposed method, the original independent input matrix is the input of the RBFNN, and the outputs of the hidden layer nodes of the RBFNN are the extension terms of the original independent input matrix. Then, partial least squares regression is performed on the extended input matrix and the output matrix to establish the component prediction model of the flue gas. A near-infrared spectral dataset of flue gas from natural gas combustion is used to evaluate the effectiveness of the proposed method compared with PLS. The experimental results show that the root-mean-square errors of the prediction values of the proposed method for methane, carbon monoxide, and carbon dioxide are reduced by 4.74%, 21.76%, and 5.32%, respectively, compared to those of PLS. Hence, the proposed method has higher predictive capability and better robustness.
Anastácio, Ana; Carvalho, Isabel Saraiva de
2015-08-01
A beverage benchtop prototype related to oxidative stress protection was developed based on sweet potato peel phenolics. Formula components were sweet potato peel (Ipomoea batatas L.) aqueous extract (SPPE), sweet potato leaves water extract (SPLE) and honey solution (HonS). According to linear squares regression (LSR) models, SPLE presented a higher additive effect on total phenolic content (TPC), FRAP and DPPH than the other components. None of the antagonistic interactions was significant. The optimum formula obtained by artificial neural network (ANN) analysis was 50.0% of SPPE, 21.5% of SPLE and 28.5% of HonS. Predicted responses of TPC, FRAP, DPPH and soluble solids were 309 mg GAE/L, 476 mg TE/L, 1098 mg TE/L and 12.3 °Brix, respectively. Optimization with LSR models was similar to ANN. The beverage prototype positioned next to commercial vegetable and fruit beverages, so it has interesting potential for the health and wellness market.
Chen, Hao; Xie, Xiaoyun; Shu, Wanneng; Xiong, Naixue
2016-10-15
With the rapid growth of wireless sensor applications, the user interfaces and configurations of smart homes have become so complicated and inflexible that users usually have to spend a great amount of time studying them and adapting to their expected operation. In order to improve user experience, a weighted hybrid recommender system based on a Kalman Filter model is proposed to predict what users might want to do next, especially when users are located in a smart home with an enhanced living environment. Specifically, a weight hybridization method was introduced, which combines contextual collaborative filtering and contextual content-based recommendations. This method inherits the advantages of the optimum regression and the stability features of the proposed adaptive Kalman Filter model, and it can predict and revise the weight of each system component dynamically. Experimental results show that the hybrid recommender system can optimize the distribution of weights of each component, and achieve more reasonable recall and precision rates.
Khazaei, Salman; Rezaeian, Shahab; Khazaei, Somayeh; Mansori, Kamyar; Sanjari Moghaddam, Ali; Ayubi, Erfan
2016-01-01
Geographic disparity in colorectal cancer (CRC) incidence and mortality according to the human development index (HDI) might be expected. This study aimed to quantify the association of the HDI and its components with CRC incidence and mortality. In this ecological study, CRC incidence and mortality were obtained from GLOBOCAN, the global cancer project, for 172 countries. Data on the 2013 HDI were extracted for 169 countries from the World Bank report. Linear regression was constructed to measure the effects of the HDI and its components on CRC incidence and mortality. A positive trend between increasing HDI of countries and age-standardized rates per 100,000 of CRC incidence and mortality was observed. Among HDI components, education showed the strongest association with CRC incidence and mortality, with regression coefficients (95% confidence intervals) of 2.8 (2.4, 3.2) and 0.9 (0.8, 1), respectively. The HDI and its components were positively related to CRC incidence and mortality and can be considered as targets for prevention and treatment interventions or for tracking geographic disparities.
Liu, Hui-lin; Wan, Xia; Yang, Gong-huan
2013-02-01
To explore the relationship between the strength of tobacco control and the effectiveness of creating smoke-free hospitals, and to summarize the main factors that affect the program of creating smoke-free hospitals. A total of 210 hospitals from 7 provinces/municipalities directly under the central government were enrolled in this study using a stratified random sampling method. Principal component analysis and regression analysis were conducted to analyze the strength of tobacco control and the effectiveness of creating smoke-free hospitals. Two principal components were extracted in the strength of tobacco control index, which respectively reflected the tobacco control policies and efforts, and the willingness and leadership of hospital managers regarding tobacco control. The regression analysis indicated that only the first principal component was significantly correlated with progression in creating smoke-free hospitals (P<0.001), i.e., hospitals with higher scores on the first principal component had better achievements in smoke-free environment creation. Tobacco control policies and efforts are critical in creating smoke-free hospitals. The principal component analysis provides a comprehensive and objective tool for evaluating the creation of smoke-free hospitals.
Haplotype-Based Association Analysis via Variance-Components Score Test
Tzeng, Jung-Ying ; Zhang, Daowen
2007-01-01
Haplotypes provide a more informative format of polymorphisms for genetic association analysis than do individual single-nucleotide polymorphisms. However, the practical efficacy of haplotype-based association analysis is challenged by a trade-off between the benefits of modeling abundant variation and the cost of the extra degrees of freedom. To reduce the degrees of freedom, several strategies have been considered in the literature. They include (1) clustering evolutionarily close haplotypes, (2) modeling the level of haplotype sharing, and (3) smoothing haplotype effects by introducing a correlation structure for haplotype effects and studying the variance components (VC) for association. Although the first two strategies enjoy a fair extent of power gain, empirical evidence showed that VC methods may exhibit only similar or less power than the standard haplotype regression method, even in cases of many haplotypes. In this study, we report possible reasons that cause the underpowered phenomenon and show how the power of the VC strategy can be improved. We construct a score test based on the restricted maximum likelihood or the marginal likelihood function of the VC and identify its nontypical limiting distribution. Through simulation, we demonstrate the validity of the test and investigate the power performance of the VC approach and that of the standard haplotype regression approach. With suitable choices for the correlation structure, the proposed method can be directly applied to unphased genotypic data. Our method is applicable to a wide-ranging class of models and is computationally efficient and easy to implement. The broad coverage and the fast and easy implementation of this method make the VC strategy an effective tool for haplotype analysis, even in modern genomewide association studies. PMID:17924336
Comparative study of outcome measures and analysis methods for traumatic brain injury trials.
Alali, Aziz S; Vavrek, Darcy; Barber, Jason; Dikmen, Sureyya; Nathens, Avery B; Temkin, Nancy R
2015-04-15
Batteries of functional and cognitive measures have been proposed as alternatives to the Extended Glasgow Outcome Scale (GOSE) as the primary outcome for traumatic brain injury (TBI) trials. We evaluated several approaches to analyzing GOSE and a battery of four functional and cognitive measures. Using data from a randomized trial, we created a "super" dataset of 16,550 subjects from patients with complete data (n=331) and then simulated multiple treatment effects across multiple outcome measures. Patients were sampled with replacement (bootstrapping) to generate 10,000 samples for each treatment effect (n=400 patients/group). The percentage of samples where the null hypothesis was rejected estimates the power. All analytic techniques had appropriate rates of type I error (≤5%). Accounting for baseline prognosis either by using sliding dichotomy for GOSE or using regression-based methods substantially increased the power over the corresponding analysis without accounting for prognosis. Analyzing GOSE using multivariate proportional odds regression or analyzing the four-outcome battery with regression-based adjustments had the highest power, assuming equal treatment effect across all components. Analyzing GOSE using a fixed dichotomy provided the lowest power for both unadjusted and regression-adjusted analyses. We assumed an equal treatment effect for all measures. This may not be true in an actual clinical trial. Accounting for baseline prognosis is critical to attaining high power in Phase III TBI trials. The choice of primary outcome for future trials should be guided by power, the domain of brain function that an intervention is likely to impact, and the feasibility of collecting outcome data.
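The power-estimation procedure described above (resample patients with replacement from a fixed pool, then count the fraction of samples in which the null hypothesis is rejected) can be sketched as follows. This is a simplified stand-in using a two-sample z-test on synthetic outcomes, not the GOSE or four-measure-battery analyses from the trial; pool sizes and effect size are assumptions.

```python
import numpy as np
from math import sqrt

def bootstrap_power(control, treated, n_per_group=400, n_boot=2000, rng=None):
    """Estimate power: resample patients with replacement, run a
    two-sample z-test on group means, return the rejection fraction."""
    rng = rng or np.random.default_rng(0)
    z_crit = 1.959963984540054  # two-sided 5% critical value
    rejections = 0
    for _ in range(n_boot):
        c = rng.choice(control, size=n_per_group, replace=True)
        t = rng.choice(treated, size=n_per_group, replace=True)
        se = sqrt(c.var(ddof=1) / n_per_group + t.var(ddof=1) / n_per_group)
        if abs(t.mean() - c.mean()) / se > z_crit:
            rejections += 1
    return rejections / n_boot

# Hypothetical "super" pools (the paper used n=331 complete cases)
# with a simulated treatment effect of 0.5 SD.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=331)
treated = rng.normal(0.5, 1.0, size=331)
power = bootstrap_power(control, treated, rng=rng)
```

Under the null (identical pools), the same routine returns roughly the type I error rate instead of power.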
Zhou, Qingping; Jiang, Haiyan; Wang, Jianzhou; Zhou, Jianling
2014-10-15
Exposure to high concentrations of fine particulate matter (PM₂.₅) can cause serious health problems because PM₂.₅ contains microscopic solid or liquid droplets that are sufficiently small to be ingested deep into human lungs. Thus, daily prediction of PM₂.₅ levels is particularly important for regulatory plans that inform the public and restrict social activities in advance when harmful episodes are foreseen. A hybrid EEMD-GRNN (ensemble empirical mode decomposition-general regression neural network) model based on data preprocessing and analysis is proposed in this paper for one-day-ahead prediction of PM₂.₅ concentrations. The EEMD part is utilized to decompose the original PM₂.₅ data into several intrinsic mode functions (IMFs), while the GRNN part is used for the prediction of each IMF. The hybrid EEMD-GRNN model is trained using input variables obtained from a principal component regression (PCR) model to remove redundancy. These input variables accurately and succinctly reflect the relationships between PM₂.₅ and both air quality and meteorological data. The model is trained with data from January 1 to November 1, 2013, and is validated with data from November 2 to November 21, 2013, in Xi'an, China. The experimental results show that the developed hybrid EEMD-GRNN model outperforms a single GRNN model without EEMD, a multiple linear regression (MLR) model, a PCR model, and a traditional autoregressive integrated moving average (ARIMA) model. The hybrid model, with fast and accurate results, can be used to develop rapid air quality warning systems. Copyright © 2014 Elsevier B.V. All rights reserved.
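The GRNN half of the hybrid is essentially a Nadaraya-Watson kernel-weighted average (Specht's general regression neural network), with a single smoothing parameter sigma. A minimal sketch, using a synthetic 1-D signal in place of one EEMD intrinsic mode function (the EEMD decomposition itself, typically done with a dedicated library, is omitted):

```python
import numpy as np

def grnn_predict(X_train, y_train, X_new, sigma=0.5):
    """General regression neural network: predictions are Gaussian
    kernel-weighted averages of the training targets."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

# Hypothetical smooth component standing in for one IMF of a PM2.5 series.
x = np.linspace(0, 2 * np.pi, 60)[:, None]
y = np.sin(x[:, 0])
pred = grnn_predict(x, y, x, sigma=0.3)  # in-sample smoothed reconstruction
```

In the paper's scheme, one such predictor per IMF would be trained on the PCR-screened inputs and the per-IMF forecasts summed to give the PM₂.₅ forecast.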
NASA Astrophysics Data System (ADS)
Ilie, Iulia; Dittrich, Peter; Carvalhais, Nuno; Jung, Martin; Heinemeyer, Andreas; Migliavacca, Mirco; Morison, James I. L.; Sippel, Sebastian; Subke, Jens-Arne; Wilkinson, Matthew; Mahecha, Miguel D.
2017-09-01
Accurate model representation of land-atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The strongly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data, by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations, we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models are often better in prediction than some established respiration models. Based on their structures, we find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We noticed that the GEP models are only partly portable across respiration components, with the identification of a general terrestrial respiration model possibly prevented by equifinality issues. Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.
Naval Research Logistics Quarterly, Volume 28, Number 3
1981-09-01
denotes component-wise maximum. f has antitone (isotone) differences on C x D if for c1 < c2 and d1 < d2, NAVAL RESEARCH LOGISTICS QUARTERLY VOL. 28... or negative correlations and linear or nonlinear regressions. Given are the moments to order two and, for special cases, the regression function and... data sets. We designate this bnb distribution as G - B - N(a, 0, v). The distribution admits only positive correlation and linear regressions
NASA Astrophysics Data System (ADS)
Prabhu, M.; Unnikrishnan, K.
2018-04-01
In the present work, we analyzed the daytime vertical E × B drift velocities obtained from Jicamarca Unattended Long-term Ionosphere Atmosphere (JULIA) radar and the ΔH component of the geomagnetic field, measured as the difference between the magnitudes of the horizontal (H) components at two magnetometers deployed at two different locations, Jicamarca and Piura in Peru, for 22 geomagnetically disturbed events in which either an SC occurred or Dstmax < -50 nT during the period 2006-2011. The ΔH component of the geomagnetic field is measured as the difference in the magnitudes of the horizontal H component between a magnetometer placed directly on the magnetic equator and one displaced 6-9° away. It provides a direct measure of the daytime electrojet current, driven by the eastward electric field, which in turn gives the magnitude of the vertical E × B drift velocity in the F region. A positive correlation exists between peak values of the daytime vertical E × B drift velocity and the peak value of ΔH for the three consecutive days of the events. It was observed that 45% of the events have daytime vertical E × B drift velocity peaks in the magnitude ranges 10-20 m/s and 20-30 m/s, and 20% have peak ΔH in the magnitude ranges 50-60 nT and 80-90 nT. It was also observed that the times of occurrence of the peak values of both the vertical E × B drift velocity and ΔH have a maximum (40%) probability in the same time range, 11:00-13:00 LT. We also investigated the correlation between the E × B drift velocity and the Dst index, and between ΔH and the Dst index; a strong positive correlation is found in both cases. Three regression techniques were considered: linear, polynomial (order 2), and polynomial (order 3). The regression parameters in all three cases were calculated using the least squares method (LSM), using the daytime vertical E × B drift velocity and ΔH.
A formula was developed that expresses the relationship between the daytime vertical E × B drift velocity and ΔH for the disturbed periods. The E × B drift velocity was then evaluated using the formulae obtained from the three regression analyses and validated for the disturbed periods of 3 selected events. The E × B drift velocities estimated by the three regression analyses are in fairly good agreement with the JULIA radar observed values under different seasons and solar activity conditions. Root mean square (RMS) errors calculated for each case suggest that the polynomial (order 3) regression analysis provides the best agreement with the observations among the three.
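The paper's three regression cases (linear, order-2, and order-3 polynomial fits by least squares) can be sketched with NumPy's `polyfit`/`polyval`. The drift/ΔH pairs below are synthetic, with a quadratic true relation assumed purely for illustration, not the study's fitted formula:

```python
import numpy as np

# Hypothetical paired observations: peak ΔH (nT) and peak daytime
# vertical E x B drift (m/s); the true relation here is quadratic.
rng = np.random.default_rng(2)
dH = rng.uniform(30, 100, size=40)
drift = 0.002 * dH**2 + 0.1 * dH + 2.0 + rng.normal(scale=0.5, size=40)

# Least-squares fits of increasing order, as in the paper's three cases.
fits = {order: np.polyfit(dH, drift, order) for order in (1, 2, 3)}

def rms_error(coeffs):
    """In-sample RMS error of a polynomial fit."""
    return np.sqrt(np.mean((np.polyval(coeffs, dH) - drift) ** 2))

rms = {order: rms_error(c) for order, c in fits.items()}
```

Comparing the RMS errors across orders mirrors the paper's conclusion step, where the order-3 fit gave the best in-sample agreement.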
Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D
2017-10-26
To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage, with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from the 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network (MLNN) models, which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross-validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason-for-visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated an AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free-text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.
The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
A method for operative quantitative interpretation of multispectral images of biological tissues
NASA Astrophysics Data System (ADS)
Lisenko, S. A.; Kugeiko, M. M.
2013-10-01
A method for operative retrieval of spatial distributions of biophysical parameters of a biological tissue by using a multispectral image of it has been developed. The method is based on multiple regressions between linearly independent components of the diffuse reflection spectrum of the tissue and unknown parameters. Possibilities of the method are illustrated by an example of determining biophysical parameters of the skin (concentrations of melanin, hemoglobin and bilirubin, blood oxygenation, and scattering coefficient of the tissue). Examples of quantitative interpretation of the experimental data are presented.
An Alternative Way to Model Population Ability Distributions in Large-Scale Educational Surveys
ERIC Educational Resources Information Center
Wetzel, Eunike; Xu, Xueli; von Davier, Matthias
2015-01-01
In large-scale educational surveys, a latent regression model is used to compensate for the shortage of cognitive information. Conventionally, the covariates in the latent regression model are principal components extracted from background data. This operational method has several important disadvantages, such as the handling of missing data and…
Modeling health survey data with excessive zero and K responses.
Lin, Ting Hsiang; Tsai, Min-Hsiao
2013-04-30
Zero-inflated Poisson regression is a popular tool used to analyze data with excessive zeros. Although much work has already been performed to fit zero-inflated data, most models heavily depend on special features of the individual data. To be specific, this means that there is a sizable group of respondents who endorse the same answers making the data have peaks. In this paper, we propose a new model with the flexibility to model excessive counts other than zero, and the model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts, including zeros, K (where K is a positive integer) and all other values. The Poisson regression component models the counts that are assumed to follow a Poisson distribution. Two examples are provided to illustrate our models when the data have counts containing many ones and sixes. As a result, the zero-inflated and K-inflated models exhibit a better fit than the zero-inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.
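The mixture described above, excess mass at zero and at a second count K on top of a Poisson component, has a simple probability mass function. A minimal sketch follows; the parameter values are arbitrary illustrations, not fitted values from the paper, and the covariate-dependent multinomial-logistic and Poisson-regression parts are omitted.

```python
from math import exp, lgamma, log

def zk_inflated_pmf(y, lam, pi0, piK, K):
    """PMF of the zero-and-K-inflated Poisson mixture: with probability
    pi0 an excess zero, with probability piK an excess K, otherwise a
    Poisson(lam) count (which may itself land on 0 or K)."""
    pois = exp(-lam + y * log(lam) - lgamma(y + 1))  # Poisson pmf at y
    p = (1 - pi0 - piK) * pois
    if y == 0:
        p += pi0
    if y == K:
        p += piK
    return p

# Illustrative parameters: 10% excess zeros, 15% excess sixes, lam = 2.
p0 = zk_inflated_pmf(0, 2.0, 0.1, 0.15, 6)
p6 = zk_inflated_pmf(6, 2.0, 0.1, 0.15, 6)
```

Note that the inflated pmf at 0 and at K exceeds the plain Poisson pmf there, which is exactly the "peaks" feature the model is built to capture.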
The effect of clouds on the earth's radiation budget
NASA Technical Reports Server (NTRS)
Ziskin, Daniel; Strobel, Darrell F.
1991-01-01
The radiative fluxes from the Earth Radiation Budget Experiment (ERBE) and the cloud properties from the International Satellite Cloud Climatology Project (ISCCP) over Indonesia for the months of June and July of 1985 and 1986 were analyzed to determine the cloud sensitivity coefficients. The method involved a linear least squares regression between co-incident flux and cloud coverage measurements. The calculated slope is identified as the cloud sensitivity. It was found that the correlations between the total cloud fraction and radiation parameters were modest. However, correlations between cloud fraction and IR flux were improved by separating clouds by height. Likewise, correlations between the visible flux and cloud fractions were improved by distinguishing clouds based on optical depth. Calculating correlations between the net fluxes and either height or optical depth segregated cloud fractions were somewhat improved. When clouds were classified in terms of their height and optical depth, correlations among all the radiation components were improved. Mean cloud sensitivities based on the regression of radiative fluxes against height and optical depth separated cloud types are presented. Results are compared to a one-dimensional radiation model with a simple cloud parameterization scheme.
Imaging genetics approach to predict progression of Parkinson's diseases.
Mansu Kim; Seong-Jin Son; Hyunjin Park
2017-07-01
Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches to better investigate various diseased conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e., MDS-UPDRS). Our model yielded high correlation (r = 0.697, p < 0.001) and low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging (from 123I-ioflupane SPECT) predictors of the regression model were computed using an independent component analysis approach. Genetic features were computed using an imaging genetics approach based on identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus has the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.
Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian
2015-01-01
In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize boosting-type method to handle the high dimensionality as well as model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213
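Component-wise boosting in the regression setting can be sketched as follows: at each step, fit every single predictor to the current residual, keep the one that reduces the residual sum of squares most, and update its coefficient by a shrunken step. This is a univariate L2-boosting toy under a high-dimension, low-sample-size design, not the authors' multivariate method, and the data are synthetic.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting: greedy coordinate selection against
    the residual with shrinkage factor nu."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_steps):
        # univariate least-squares slope for each predictor
        slopes = (X * resid[:, None]).sum(axis=0) / (X ** 2).sum(axis=0)
        # residual sum of squares if each predictor were used alone
        rss = ((resid[:, None] - X * slopes[None, :]) ** 2).sum(axis=0)
        j = np.argmin(rss)            # best-fitting component
        beta[j] += nu * slopes[j]     # shrunken update
        resid = resid - nu * slopes[j] * X[:, j]
    return intercept, beta

# Hypothetical HDLSS data: 30 samples, 100 predictors, two truly active.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 100))
y = 3.0 * X[:, 4] - 2.0 * X[:, 17] + rng.normal(scale=0.1, size=30)
b0, beta = componentwise_l2_boost(X, y)
```

The greedy selection plus shrinkage yields a sparse coefficient vector concentrated on the active predictors, which is the behavior the multivariate version exploits for genomic data.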
Factors Controlling Sediment Load in The Central Anatolia Region of Turkey: Ankara River Basin.
Duru, Umit; Wohl, Ellen; Ahmadi, Mehdi
2017-05-01
Better understanding of the factors controlling sediment load at a catchment scale can facilitate estimation of soil erosion and sediment transport rates. The research summarized here enhances understanding of correlations between potential control variables and suspended sediment loads. The Soil and Water Assessment Tool was used to simulate flow and sediment in the Ankara River basin. Multivariable regression analysis and principal component analysis were then performed between sediment load and controlling variables. The physical variables were either directly derived from a Digital Elevation Model or from field maps, or computed using established equations. Mean observed sediment rate is 6697 ton/year and mean sediment yield is 21 ton/y/km² at the gage. The Soil and Water Assessment Tool satisfactorily simulated observed sediment load, with Nash-Sutcliffe efficiency, relative error, and coefficient of determination (R²) values of 0.81, -1.55, and 0.93, respectively, in the catchment. Therefore, parameter values from the physically based model were applied to the multivariable regression analysis as well as the principal component analysis. The results indicate that stream flow, drainage area, and channel width explain most of the variability in sediment load among the catchments. The results imply that efficient siltation management practices in the catchment should focus on stream flow, drainage area, and channel width.
Structured functional additive regression in reproducing kernel Hilbert spaces
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2013-01-01
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of a data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for structure estimation in the context of reproducing kernel Hilbert spaces. The proposed approach takes advantage of functional principal components, which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application. PMID:25013362
Mahdavi, A; Nikmanesh, E; AghaeI, M; Kamran, F; Zahra Tavakoli, Z; Khaki Seddigh, F
2015-01-01
Nurses are the most significant part of human resources in a sanitary and health system. Job satisfaction results in the enhancement of organizational productivity, employee commitment to the organization, and the assurance of his/her physical and mental health. The present research was conducted with the aim of predicting the level of job satisfaction based on hardiness and its components among nurses with tension headache. The research method was correlational. The population consisted of all the nurses with tension headache who referred to the relevant specialists in Tehran. The sample consisted of 50 individuals who were chosen by using the convenience sampling method and were assessed by using the research tools of the "Job Satisfaction Test" of Davis, Lofkvist and Weiss and the "Personal Views Survey" of Kobasa. The data analysis was carried out by using the Pearson correlation coefficient and regression analysis. The research findings demonstrated that the correlation coefficient between hardiness and job satisfaction was 0.506, and this coefficient was significant at the 0.01 level. Moreover, among the components of hardiness, the sense of commitment and challenge were the stronger predictors of job satisfaction in nurses with tension headache, and about 16% of the variance in job satisfaction could be explained by these two components.
Current Pressure Transducer Application of Model-based Prognostics Using Steady State Conditions
NASA Technical Reports Server (NTRS)
Teubert, Christopher; Daigle, Matthew J.
2014-01-01
Prognostics is the process of predicting a system's future states, health degradation/wear, and remaining useful life (RUL). This information plays an important role in preventing failure, reducing downtime, scheduling maintenance, and improving system utility. Prognostics relies heavily on wear estimation. In some components, the sensors used to estimate wear may not be fast enough to capture brief transient states that are indicative of wear. For this reason it is beneficial to be capable of detecting and estimating the extent of component wear using steady-state measurements. This paper details a method for estimating component wear using steady-state measurements, describes how this is used to predict future states, and presents a case study of a current/pressure (I/P) Transducer. I/P Transducer nominal and off-nominal behaviors are characterized using a physics-based model, and validated against expected and observed component behavior. This model is used to map observed steady-state responses to corresponding fault parameter values in the form of a lookup table. This method was chosen because of its fast, efficient nature, and its ability to be applied to both linear and non-linear systems. Using measurements of the steady state output, and the lookup table, wear is estimated. A regression is used to estimate the wear propagation parameter and characterize the damage progression function, which are used to predict future states and the remaining useful life of the system.
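The lookup-table step the paper describes (mapping candidate fault-parameter values to predicted steady-state outputs, then inverting by nearest match against an observation) can be sketched generically. The transducer response below is a made-up monotone curve, not the paper's physics-based I/P model, and the wear grid is arbitrary.

```python
def build_lookup(model, wear_values):
    """Map each candidate fault-parameter (wear) value to the model's
    predicted steady-state output."""
    return {w: model(w) for w in wear_values}

def estimate_wear(lookup, observed_output):
    """Invert the table: return the wear value whose predicted
    steady-state output is closest to the observation."""
    return min(lookup, key=lambda w: abs(lookup[w] - observed_output))

# Hypothetical steady-state response: output falls off monotonically
# as friction-related wear grows (illustrative curve only).
model = lambda w: 100.0 / (1.0 + 5.0 * w)
lookup = build_lookup(model, [round(0.01 * i, 2) for i in range(101)])
wear = estimate_wear(lookup, observed_output=80.0)
```

A table inversion like this is fast and works for nonlinear response models, which matches the paper's stated motivation for the approach; the subsequent regression on successive wear estimates would then supply the wear-propagation parameter for RUL prediction.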
Production of Selected Key Ductile Iron Castings Used in Large-Scale Windmills
NASA Astrophysics Data System (ADS)
Pan, Yung-Ning; Lin, Hsuan-Te; Lin, Chi-Chia; Chang, Re-Mo
Both the optimal alloy designs and the microstructures that conform to the mechanical-property requirements of selected key components used in large-scale windmills have been established in this study. The target specifications in this study are EN-GJS-350-22U-LT, EN-GJS-400-18U-LT and EN-GJS-700-2U. In order to meet the impact requirement of spec. EN-GJS-350-22U-LT, the Si content should be kept below 1.97%, and the maximum pearlite content should not exceed 7.8%. For specification EN-GJS-400-18U-LT, a Si content below 2.15% and a pearlite content below 12.5% were registered. On the other hand, the optimal alloy designs that can comply with specification EN-GJS-700-2U include 0.25%Mn+0.6%Cu+0.05%Sn, 0.25%Mn+0.8%Cu+0.01%Sn and 0.45%Mn+0.6%Cu+0.01%Sn. Furthermore, based upon the experimental results, multiple regression analyses have been performed to correlate the mechanical properties with chemical compositions and microstructures. The derived regression equations can be used to attain the optimal alloy design for castings with target specifications. By employing these regression equations, the mechanical properties can also be predicted based upon the chemical compositions and microstructures of the cast irons.
NASA Astrophysics Data System (ADS)
Zhai, Liang; Li, Shuang; Zou, Bin; Sang, Huiyong; Fang, Xin; Xu, Shan
2018-05-01
Considering the spatially non-stationary contributions of environmental variables to PM2.5 variations, the geographically weighted regression (GWR) modeling method has been widely used to estimate PM2.5 concentrations. However, most of the GWR models in studies reported so far were established based on predictors screened through pretreatment correlation analysis, and this process might omit factors that really drive PM2.5 variations. This study therefore developed a best subsets regression (BSR) enhanced principal component analysis-GWR (PCA-GWR) modeling approach to estimate PM2.5 concentration by fully considering all the potential variables' contributions simultaneously. A performance comparison experiment between PCA-GWR and regular GWR was conducted in the Beijing-Tianjin-Hebei (BTH) region over a one-year period. Results indicated that PCA-GWR modeling outperforms regular GWR modeling, with obviously higher model-fitting and cross-validation based adjusted R2 and lower RMSE. Meanwhile, the distribution map of PM2.5 concentration from PCA-GWR modeling also clearly depicts more spatial variation details in contrast to the one from regular GWR modeling. It can be concluded that BSR-enhanced PCA-GWR modeling could be a reliable way for effective air pollution concentration estimation in the future by involving all the potential predictor variables' contributions to PM2.5 variations.
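The core GWR computation, a weighted least-squares fit at each target location with distance-decay weights, can be sketched as below. The PM2.5/predictor data are synthetic with an assumed spatially varying slope; the BSR screening and PCA enhancement from the paper are omitted, and the single predictor is a hypothetical stand-in.

```python
import numpy as np

def gwr_local_fit(coords, X, y, coord0, bandwidth=1.0):
    """One local fit of a geographically weighted regression: weighted
    least squares at location coord0 with Gaussian distance-decay weights."""
    d2 = ((coords - coord0) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    Xd = np.column_stack([np.ones(len(y)), X])
    W = np.diag(w)
    # normal equations of the weighted least-squares problem
    return np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ y)

# Hypothetical surface: the predictor's slope grows with longitude,
# i.e. the relationship is spatially non-stationary.
rng = np.random.default_rng(5)
coords = rng.uniform(0, 10, size=(200, 2))
pred = rng.normal(size=200)
slope_true = 1.0 + 0.3 * coords[:, 0]
pm25 = 20.0 + slope_true * pred + rng.normal(scale=0.2, size=200)

beta_west = gwr_local_fit(coords, pred[:, None], pm25, np.array([1.0, 5.0]))
beta_east = gwr_local_fit(coords, pred[:, None], pm25, np.array([9.0, 5.0]))
```

The local slope differing between the western and eastern fit points is exactly the non-stationarity that motivates GWR over a single global regression.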
Wherry, Susan A.; Wood, Tamara M.
2018-04-27
A whole lake eutrophication (WLE) model approach for phosphorus and cyanobacterial biomass in Upper Klamath Lake, south-central Oregon, is presented here. The model is a successor to a previous model developed to inform a Total Maximum Daily Load (TMDL) for phosphorus in the lake, but is based on net primary production (NPP), which can be calculated from dissolved oxygen, rather than scaling up a small-scale description of cyanobacterial growth and respiration rates. This phase 3 WLE model is a refinement of the proof-of-concept developed in phase 2, which was the first attempt to use NPP to simulate cyanobacteria in the TMDL model. The calibration of the calculated NPP WLE model was successful, with performance metrics indicating a good fit to calibration data, and the calculated NPP WLE model was able to simulate mid-season bloom decreases, a feature that previous models could not reproduce.In order to use the model to simulate future scenarios based on phosphorus load reduction, a multivariate regression model was created to simulate NPP as a function of the model state variables (phosphorus and chlorophyll a) and measured meteorological and temperature model inputs. The NPP time series was split into a low- and high-frequency component using wavelet analysis, and regression models were fit to the components separately, with moderate success.The regression models for NPP were incorporated in the WLE model, referred to as the “scenario” WLE (SWLE), and the fit statistics for phosphorus during the calibration period were mostly unchanged. The fit statistics for chlorophyll a, however, were degraded. 
These statistics are still an improvement over prior models and indicate that the SWLE is appropriate for long-term predictions even though it misses some of the seasonal variation in chlorophyll a. The complete whole-lake SWLE model, with multivariate regression to predict NPP, was used to make long-term simulations of the response to 10-, 20-, and 40-percent reductions in tributary nutrient loads. The long-term mean water column concentration of total phosphorus was reduced by 9, 18, and 36 percent, respectively, in response to these load reductions. The long-term water column chlorophyll a concentration was reduced by 4, 13, and 44 percent, respectively. The adjustment to a new equilibrium between the water column and sediments occurred over about 30 years.
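The low-/high-frequency split of the NPP series can be illustrated with a moving-average filter standing in for the paper's wavelet decomposition (a deliberate simplification; the series below is synthetic):

```python
import numpy as np

def split_low_high(series, window):
    """Moving-average split into low- and high-frequency components.
    A crude stand-in for the wavelet decomposition used in the study."""
    kernel = np.ones(window) / window
    low = np.convolve(series, kernel, mode="same")
    return low, series - low

# Synthetic NPP-like series: slow seasonal signal plus a fast bloom term.
t = np.linspace(0, 4 * np.pi, 200)
npp = np.sin(t) + 0.3 * np.sin(12 * t)
low, high = split_low_high(npp, window=25)
```

Separate regressions would then be fit to `low` and `high` and the fitted components summed back together.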
Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.
Bobb, Jennifer F; Valeri, Linda; Claus Henn, Birgit; Christiani, David C; Wright, Robert O; Mazumdar, Maitreyi; Godleski, John J; Coull, Brent A
2015-07-01
Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications.
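At its core, the kernel-machine idea regresses the outcome on a flexible function of the mixture through a kernel matrix. A minimal kernel ridge regression with an RBF kernel, a simplified non-Bayesian analogue of BKMR (data and hyperparameters are illustrative, without the hierarchical variable selection):

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Gaussian (RBF) kernel matrix between two sets of exposure vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def kernel_ridge_fit(X, y, lam, length_scale):
    K = rbf_kernel(X, X, length_scale)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def kernel_ridge_predict(X_train, alpha, X_new, length_scale):
    return rbf_kernel(X_new, X_train, length_scale) @ alpha

# Synthetic mixture: three "pollutants" with a nonlinear joint effect.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(80, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.05, size=80)
alpha = kernel_ridge_fit(X, y, lam=1e-2, length_scale=0.8)
fitted = kernel_ridge_predict(X, alpha, X, length_scale=0.8)
r2 = 1 - np.var(y - fitted) / np.var(y)
```

The kernel lets the fit capture nonlinearities and interactions (here `sin(2*x0)` and `x1*x2`) that a two-way interaction model would miss.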
Sparse modeling of spatial environmental variables associated with asthma
Chang, Timothy S.; Gangnon, Ronald E.; Page, C. David; Buckingham, William R.; Tandias, Aman; Cowan, Kelly J.; Tomasallo, Carrie D.; Arndt, Brian G.; Hanrahan, Lawrence P.; Guilbert, Theresa W.
2014-01-01
Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin’s Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5–50 years over a three-year period. Each patient’s home address was geocoded to one of 3,456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin’s geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors.
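The SASEA pipeline, dimension reduction followed by a logistic stage, can be caricatured with ordinary PCA plus plain logistic regression (no sparsity penalty or spatial spline here; all data are synthetic):

```python
import numpy as np

def pca_scores(X, k):
    """Project centred data onto the first k principal components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fit_logistic(Z, y, lr=0.5, steps=3000):
    """Plain gradient-descent logistic regression (intercept in column 0)."""
    Zb = np.column_stack([np.ones(len(y)), Z])
    w = np.zeros(Zb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(Zb @ w, -30, 30)))
        w -= lr * Zb.T @ (p - y) / len(y)
    return w

# Synthetic block-group variables: three columns share a latent factor,
# seven are noise; diagnosis is driven by the latent factor.
rng = np.random.default_rng(2)
shared = rng.normal(size=(300, 1))
X = np.column_stack([shared + 0.1 * rng.normal(size=(300, 3)),
                     rng.normal(size=(300, 7))])
y = (shared[:, 0] > 0).astype(float)
Z = pca_scores(X, 2)
w = fit_logistic(Z, y)
p = 1.0 / (1.0 + np.exp(-np.clip(np.column_stack([np.ones(300), Z]) @ w, -30, 30)))
accuracy = ((p > 0.5) == (y == 1)).mean()
```

Sparse PCA would additionally zero out loadings of irrelevant variables, which is what lets SASEA name interpretable components such as "dog ownership".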
NASA Astrophysics Data System (ADS)
Bektasli, Behzat
Graphs have a broad use in science classrooms, especially in physics. In physics, kinematics is probably the topic for which graphs are most widely used. The participants in this study were from two different grade-12 physics classrooms, advanced placement and calculus-based physics. The main purpose of this study was to search for the relationships between student spatial ability, logical thinking, mathematical achievement, and kinematics graphs interpretation skills. The Purdue Spatial Visualization Test, the Middle Grades Integrated Process Skills Test (MIPT), and the Test of Understanding Graphs in Kinematics (TUG-K) were used for quantitative data collection. Classroom observations were made to acquire ideas about classroom environment and instructional techniques. Factor analysis, simple linear correlation, multiple linear regression, and descriptive statistics were used to analyze the quantitative data. Each instrument has two principal components. The selection and calculation of the slope and of the area were the two principal components of TUG-K. MIPT was composed of a component based upon processing text and a second component based upon processing symbolic information. The Purdue Spatial Visualization Test was composed of a component based upon one-step processing and a second component based upon two-step processing of information. Student ability to determine the slope in a kinematics graph was significantly correlated with spatial ability, logical thinking, and mathematics aptitude and achievement. However, student ability to determine the area in a kinematics graph was only significantly correlated with student pre-calculus semester 2 grades. Male students performed significantly better than female students on the slope items of TUG-K. Also, male students performed significantly better than female students on the PSAT mathematics assessment and spatial ability. 
This study found that students differ in spatial ability, logical thinking, and mathematics aptitude and achievement. These differences were related to student learning of kinematics and need to be considered when kinematics is taught. It might be easier for students to understand kinematics graphs if curriculum developers included more activities related to spatial ability and logical thinking.
A regression-kriging model for estimation of rainfall in the Laohahe basin
NASA Astrophysics Data System (ADS)
Wang, Hong; Ren, Li L.; Liu, Gao H.
2009-10-01
This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors: latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges in the Laohahe basin in northeast China during 1986-2005. Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis, with three principal components (PCs) extracted. The rainfall data were then fitted using step-wise regression, and the residuals were interpolated using simple kriging (SK). The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and the PCs into account. Finally, the rainfall prediction based on RK was compared with that from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). Because correlated topographic factors are taken into account, RK improves the efficiency of the predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no nearby stations and where topography has a major influence on rainfall.
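The two-stage RK logic, a regression trend plus interpolated residuals, can be sketched as follows, with inverse-distance weighting standing in for the simple-kriging step (synthetic gauges and a single "altitude" predictor; everything here is illustrative):

```python
import numpy as np

def regression_kriging_lite(coords, X, y, coords_new, X_new, power=2.0):
    """Trend by OLS on predictors; residuals interpolated by inverse-distance
    weighting, a simple stand-in for the kriging step."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    out = []
    for c, xrow in zip(coords_new, np.atleast_2d(X_new)):
        d = np.linalg.norm(coords - c, axis=1)
        w = 1.0 / np.maximum(d, 1e-9) ** power
        out.append(np.concatenate([[1.0], xrow]) @ beta + (w @ resid) / w.sum())
    return np.array(out), beta

# Synthetic rainfall: linear trend in altitude plus a smooth spatial residual.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 5, size=(60, 2))
altitude = rng.uniform(0, 1, size=60)
rain = 2.0 + 3.0 * altitude + 0.5 * np.sin(coords[:, 0])
train, test = slice(0, 50), slice(50, 60)
rk_pred, beta = regression_kriging_lite(coords[train], altitude[train, None],
                                        rain[train], coords[test],
                                        altitude[test, None])
ols_pred = beta[0] + beta[1] * altitude[test]
rk_rmse = np.sqrt(np.mean((rk_pred - rain[test]) ** 2))
ols_rmse = np.sqrt(np.mean((ols_pred - rain[test]) ** 2))
```

Interpolating the residuals recovers the spatially structured part of rainfall that the trend regression leaves behind, which is exactly why RK beats regression alone in the study.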
GIS-based spatial regression and prediction of water quality in river networks: A case study in Iowa
Yang, X.; Jin, W.
2010-01-01
Nonpoint source pollution is the leading cause of water-quality problems in the United States. One important component of nonpoint source pollution control is an understanding of what watershed-scale conditions influence ambient water quality, and how. This paper investigated the use of spatial regression to evaluate the impacts of watershed characteristics on stream NO3+NO2-N concentration in the Cedar River Watershed, Iowa. An Arc Hydro geodatabase was constructed to organize various datasets on the watershed. Spatial regression models were developed to evaluate the impacts of watershed characteristics on stream NO3+NO2-N concentration and to predict NO3+NO2-N concentration at unmonitored locations. Unlike the traditional ordinary least squares (OLS) method, the spatial regression method incorporates the potential spatial correlation among the observations in its coefficient estimation. Study results show that NO3+NO2-N observations in the Cedar River Watershed are spatially correlated, and that by ignoring the spatial correlation, the OLS method tends to over-estimate the impacts of watershed characteristics on stream NO3+NO2-N concentration. In conjunction with kriging, the spatial regression method not only makes better stream NO3+NO2-N concentration predictions than the OLS method but also gives estimates of the uncertainty of the predictions, which provides useful information for optimizing the design of a stream monitoring network. It is a promising tool for better managing and controlling nonpoint source pollution.
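The key difference from OLS is that spatial regression weights observations by the inverse of a spatial error covariance. A generalized-least-squares sketch with an assumed exponential covariance (all values synthetic, not the watershed's):

```python
import numpy as np

def gls_fit(X, y, cov):
    """Generalized least squares with a known error covariance matrix."""
    Xd = np.column_stack([np.ones(len(y)), X])
    Ci = np.linalg.inv(cov)
    return np.linalg.solve(Xd.T @ Ci @ Xd, Xd.T @ Ci @ y)

# Synthetic monitoring sites with spatially correlated errors.
rng = np.random.default_rng(4)
coords = rng.uniform(0, 10, size=(80, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
cov = 0.09 * np.exp(-d / 2.0)                   # exponential spatial covariance
e = np.linalg.cholesky(cov + 1e-9 * np.eye(80)) @ rng.normal(size=80)
x = rng.normal(size=80)                         # a watershed characteristic
y = 1.0 + 2.0 * x + e                           # stream-concentration analogue
beta_gls = gls_fit(x[:, None], y, cov)
```

With correlated errors, OLS coefficient standard errors are understated, which is the mechanism behind the over-estimation the paper reports; GLS corrects for it when the covariance is known or estimated.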
Estimated long-term outdoor air pollution concentrations in a cohort study
NASA Astrophysics Data System (ADS)
Beelen, Rob; Hoek, Gerard; Fischer, Paul; Brandt, Piet A. van den; Brunekreef, Bert
Several recent studies have associated long-term exposure to air pollution with increased mortality. An ongoing cohort study, the Netherlands Cohort Study on Diet and Cancer (NLCS), was used to study the association between long-term exposure to traffic-related air pollution and mortality. Following a previous exposure assessment study in the NLCS, we improved the exposure assessment methods. Long-term exposure to nitrogen dioxide (NO2), nitrogen oxide (NO), black smoke (BS), and sulphur dioxide (SO2) was estimated. Exposure at each home address (N=21,868) was considered as a function of a regional, an urban and a local component. The regional component was estimated using inverse distance weighted interpolation of measurement data from regional background sites in a national monitoring network. Regression models with urban concentrations as dependent variables, and number of inhabitants in different buffers and land use variables, derived with a Geographic Information System (GIS), as predictor variables were used to estimate the urban component. The local component was assessed using a GIS and a digital road network with linked traffic intensities. Traffic intensity on the nearest road and on the nearest major road, and the sum of traffic intensity in a buffer of 100 m around each home address, were assessed. Further, a quantitative estimate of the local component was made. The regression models for the urban component explained 67%, 46%, 49% and 35% of the variance of NO2, NO, BS, and SO2 concentrations, respectively. Overall regression models incorporating the regional, urban and local components explained 84%, 44%, 59% and 56% of the variability in concentrations of NO2, NO, BS and SO2, respectively. We were able to develop an exposure assessment model using GIS methods and traffic intensities that explained a large part of the variation in outdoor air pollution concentrations.
NASA Astrophysics Data System (ADS)
Li, Lingqi; Gottschalk, Lars; Krasovskaia, Irina; Xiong, Lihua
2018-01-01
Reconstruction of missing runoff data is important for resolving the tension between the common occurrence of gaps and the need for complete time series in reliable hydrological research. The conventional empirical orthogonal functions (EOF) approach has been documented to be useful for interpolating hydrological series based upon spatiotemporal decomposition of runoff variation patterns, without additional measurements (e.g., precipitation, land cover). This study develops a new EOF-based approach (abbreviated as CEOF) that conditions the EOF expansion on the oscillations at the outlet (or any other reference station) of a target basin and creates a set of residual series by removing the dependence on this reference series, in order to redefine the amplitude functions (components). This development allows a transparent hydrological interpretation of the dimensionless components and thereby strengthens their capacity to explain various runoff regimes in a basin. The two approaches are demonstrated in an application to discharge observations from the Ganjiang basin, China. Two alternatives for determining amplitude functions, based on centred and standardised series respectively, are tested. The convergence of the reconstruction at different sites as a function of the number of components, and its relation to the characteristics of the site, are analysed. Results indicate that the CEOF approach offers an efficient way to restore runoff records with only one to four components; it performs better in large nested basins than at headwater sites and often outperforms the EOF approach when using standardised series, especially in improving infilling accuracy for low flows. Comparisons against other interpolation methods (i.e., nearest neighbour, linear regression, inverse distance weighting) further confirm the advantage of the EOF-based approaches in avoiding spatial and temporal inconsistencies in estimated series.
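The EOF machinery itself is a truncated singular value decomposition of the (time × station) runoff matrix. A minimal sketch on a synthetic two-mode runoff field (the CEOF conditioning step is not shown):

```python
import numpy as np

def eof_reconstruct(series, k):
    """Reconstruct a (time x station) matrix from its first k EOF modes."""
    mean = series.mean(axis=0)
    A = series - mean
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k] + mean

# Synthetic runoff at 8 stations: a seasonal mode plus a slow trend mode.
rng = np.random.default_rng(5)
t = np.arange(120)
mode1 = np.sin(2 * np.pi * t / 12)              # seasonal runoff signal
mode2 = t / 120.0                               # slow trend
loads1 = rng.uniform(0.5, 1.5, size=8)
loads2 = rng.uniform(-0.5, 0.5, size=8)
runoff = (np.outer(mode1, loads1) + np.outer(mode2, loads2)
          + 0.01 * rng.normal(size=(120, 8)))
err1 = np.abs(eof_reconstruct(runoff, 1) - runoff).mean()
err2 = np.abs(eof_reconstruct(runoff, 2) - runoff).mean()
```

Two components suffice here because the field was built from two modes, mirroring the paper's finding that one to four components restore the records.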
Metabolic syndrome in adolescents: definition based on regression of IDF adult cut-off points.
Benmohammed, K; Valensi, P; Balkau, B; Lezzar, A
2016-12-01
The objective of this study was to derive a sex- and age-specific definition of the metabolic syndrome (MetS) and its abnormalities for adolescents. This is a cross-sectional study. A total of 1100 adolescent students, aged 12-18 y, were randomly selected from schools and classrooms in the city of Constantine, Algeria; all had anthropometric measurements taken, and 989 had blood tests. Gender-specific growth curves for components of the MetS were derived, using the LMS (lambda-mu-sigma) method, and the percentiles corresponding to the thresholds of the MetS components proposed for adults by the International Diabetes Federation (IDF) were identified. The prevalence of the MetS using this new definition was 4.3% for boys and 3.7% for girls (P = 0.64). Overall, a high waist circumference was the most frequent of the syndrome components, but the frequency was much higher in girls than that in boys, 33.6% and 6.9%, respectively. In contrast, a high systolic blood pressure was seen in 26.8% of the boys and only 11.4% of the girls. The prevalence of the MetS was higher among adolescents with a body mass index (BMI) ≥95th percentile of the study population, 28.8%, against 9.8% in adolescents with a BMI between the 95th and 85th percentile and 1.8% in those with a BMI <85th percentile (P < 0.0001). MetS during adolescence requires more studies to establish a consensus definition. For clinical practice, we propose a simplified definition for boys and girls based on regression of IDF adult cut-off points. This definition should be tested in further studies with other adolescent populations.
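Mapping an adult cut-off onto adolescent growth curves uses the LMS z-score transform. A sketch with hypothetical LMS parameters (the L, M, S values below are invented, not the study's fitted values):

```python
import math

def lms_z(x, L, M, S):
    """Z-score under Cole's LMS method: Z = ((x/M)**L - 1) / (L*S)."""
    if abs(L) < 1e-9:
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)

def z_to_percentile(z):
    """Standard-normal CDF: the growth-curve percentile for a z-score."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical LMS parameters for waist circumference in one age/sex stratum;
# the IDF adult cut-off of 80 cm then maps to a cohort percentile.
z_cut = lms_z(80.0, L=1.0, M=70.0, S=0.1)
pct_cut = z_to_percentile(z_cut)
```

In the study this percentile lookup, done per sex and age, is what converts the fixed IDF adult thresholds into age-appropriate adolescent cut-offs.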
Increased prevalence of metabolic syndrome in non-obese asian Indian-an urban-rural comparison.
Mahadik, S R; Deo, S S; Mehtalia, S D
2007-06-01
In the present study, we evaluated the association of insulin resistance (IR) with different components of the metabolic syndrome (MS) in an Asian Indian population and performed a comparative study between urban and rural populations of India. A total of 267 urban men and women aged 25-70 years participated in this study. Results were compared with rural data from a previously published study. Fasting serum insulin, uric acid, and lipid profile were measured along with fasting and 2-hour plasma glucose. The association of MS and IR was studied using univariate regression analysis. Prevalence of MS was significantly higher in the urban population than in the rural population (35.2% vs 20.6%, χ2 = 23.2, p < 0.001). Calculated insulin resistance (HOMA-IR) was common in the MS group of both populations. The prevalence of IR was high and almost the same in both populations (42%). The prevalence of abdominal obesity and hypertriglyceridemia was significantly higher in the urban population than in the rural population. In linear regression analysis, IR was significantly correlated with different components of MS in both populations. The significant finding of the present study was that the rural population, though non-obese, exhibited a high prevalence of MS and IR. IR correlated with components of MS not only in the urban but also in the rural population. To reduce the incidence of Type 2 Diabetes (T2DM) and cardiovascular disease (CVD) in our populations, early identification of populations at risk based on prevalence of MS and IR will be of prime importance.
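The HOMA-IR index referenced above is a one-line formula (Matthews et al.); units matter, with insulin in µU/mL and glucose in mmol/L:

```python
def homa_ir(fasting_insulin_uU_ml, fasting_glucose_mmol_l):
    """HOMA-IR insulin-resistance index: insulin [uU/mL] x glucose [mmol/L] / 22.5."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5

# Example: fasting insulin 10 uU/mL, fasting glucose 5.0 mmol/L.
index = homa_ir(10.0, 5.0)
```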
Sun, Guanghao; Shinba, Toshikazu; Kirimoto, Tetsuo; Matsui, Takemi
2016-01-01
Heart rate variability (HRV) has been intensively studied as a promising biological marker of major depressive disorder (MDD). Our previous study confirmed that autonomic activity and reactivity in depression, revealed by HRV during rest and mental task (MT) conditions, can be used as diagnostic measures and in clinical evaluation. In this study, logistic regression analysis (LRA) was utilized for the classification and prediction of MDD based on HRV data obtained in an MT paradigm. Power spectral analysis of HRV on R-R intervals before, during, and after an MT (random number generation) was performed in 44 drug-naïve patients with MDD and 47 healthy control subjects at the Department of Psychiatry, Shizuoka Saiseikai General Hospital. Logit scores of LRA determined by HRV indices and heart rates discriminated patients with MDD from healthy subjects. The high frequency (HF) component of HRV and the ratio of the low frequency (LF) component to the HF component (LF/HF) correspond to parasympathetic activity and sympathovagal balance, respectively. The LRA achieved a sensitivity and specificity of 80.0 and 79.0%, respectively, at an optimum cutoff logit score (0.28). Misclassifications occurred only when the logit score was close to the cutoff score. Logit scores also correlated significantly with subjective self-rating depression scale scores (p < 0.05). HRV indices recorded during an MT may be an objective tool for screening patients with MDD in psychiatric practice. The proposed method appears promising not only for objective and rapid MDD screening but also for evaluation of its severity.
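Screening then reduces to computing a logit score and comparing it with the 0.28 cut-off. The weights below are hypothetical placeholders, not the paper's fitted LRA coefficients:

```python
import math

def logit_score(weights, features):
    """Linear predictor of a fitted logistic model; weights[0] is the intercept."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))

def classify(score, cutoff=0.28):
    """Apply the paper's optimum cut-off logit score of 0.28."""
    return "MDD" if score >= cutoff else "control"

def probability(score):
    """Logistic transform of the logit score."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights: intercept, LF/HF coefficient, HF coefficient.
w = [-1.0, 0.8, -0.6]
high_risk = logit_score(w, [2.5, 0.5])   # elevated LF/HF, reduced HF
low_risk = logit_score(w, [1.0, 1.5])
```

The paper's observation that misclassifications cluster near the cut-off corresponds to probabilities near 0.5, where the logistic transform is least decisive.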
Web document ranking via active learning and kernel principal component analysis
NASA Astrophysics Data System (ADS)
Cai, Fei; Chen, Honghui; Shu, Zhen
2015-09-01
Web document ranking arises in many information retrieval (IR) applications, such as search engines, recommendation systems and online advertising. A challenging issue is how to select representative query-document pairs and informative features for learning, and how to explore new ranking models that produce an acceptable ranking list of candidate documents for each query. In this study, we propose an active sampling (AS) plus kernel principal component analysis (KPCA) based ranking model, viz. AS-KPCA Regression, to study document ranking for a retrieval system, i.e. how to choose representative query-document pairs and features for learning. More precisely, we gradually fill the training set by AS with those documents that would incur the highest expected DCG loss if unselected. Then, KPCA is performed by projecting the selected query-document pairs onto p principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and suppress the impact of noise simultaneously. To the best of our knowledge, we are the first to perform document ranking via dimension reduction along two dimensions simultaneously, namely, the number of documents and the number of features. Our experiments demonstrate that the performance of our approach is better than that of the baseline methods on the public LETOR 4.0 datasets. Our approach achieves an improvement of nearly 20% over RankBoost and other baselines in terms of the MAP metric, with smaller improvements in P@K and NDCG@K. Moreover, our approach is particularly suitable for document ranking on noisy datasets in practice.
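The KPCA stage can be sketched directly: build an RBF kernel over the query-document features, double-centre it, eigendecompose, and regress on the leading components (synthetic features; the active-sampling step is omitted):

```python
import numpy as np

def kpca_scores(X, k, length_scale=1.0):
    """Project onto the top-k kernel principal components (RBF kernel)."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * length_scale ** 2))
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                              # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Synthetic query-document feature vectors and a toy relevance signal.
rng = np.random.default_rng(6)
X = rng.normal(size=(60, 5))
scores = kpca_scores(X, 3)
rel = scores[:, 0] * 0.5 + rng.normal(scale=0.02, size=60)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(60), scores]), rel, rcond=None)
```

Regressing relevance on a few kernel components rather than all raw features is what cuts the computational overhead and filters feature noise.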
NASA Astrophysics Data System (ADS)
Mangla, Rohit; Kumar, Shashi; Nandy, Subrata
2016-05-01
SAR and LiDAR remote sensing have already shown the potential of active sensors for forest parameter retrieval. A SAR sensor in fully polarimetric mode can retrieve the scattering properties of different components of forest structure, and LiDAR can measure structural information with very high accuracy. This study focused on retrieval of forest aboveground biomass (AGB) using Terrestrial Laser Scanner (TLS) based point clouds and the scattering properties of forest vegetation obtained from decomposition modelling of RISAT-1 fully polarimetric SAR data. TLS data were acquired for 14 plots of the Timli forest range, Uttarakhand, India. The forest area is dominated by Sal trees, and random sampling with a plot size of 0.1 ha (31.62 m × 31.62 m) was adopted for TLS and field data collection. RISAT-1 data were processed to retrieve SAR-based variables, and 3D imaging from TLS point clouds was used to retrieve LiDAR-based variables. Surface scattering, double-bounce scattering, volume scattering, helix and wire scattering were the SAR-based variables retrieved from polarimetric decomposition. Tree heights and stem diameters were the LiDAR-based variables, retrieved from single-tree vertical heights and least-squares circle fits, respectively. All the variables obtained for the forest plots were used as input to a machine-learning-based random forest regression model developed in this study for forest AGB estimation. The modelled forest AGB showed reliable accuracy (RMSE = 27.68 t/ha), and a good coefficient of determination (0.63) was obtained through linear regression between modelled AGB and field-estimated AGB. Sensitivity analysis showed that the model was most sensitive to the major contributing variables (stem diameter and volume scattering), which were measured by two different remote sensing techniques. This study strongly recommends the integration of SAR and LiDAR data for forest AGB estimation.
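A random forest regression of AGB on SAR- and TLS-derived variables can be sketched with scikit-learn (assuming that library is available; the data below are synthetic, with stem diameter and volume scattering built in as the dominant drivers):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic plot variables: TLS-derived stem diameter, SAR-derived scattering.
rng = np.random.default_rng(7)
n = 200
stem_diameter = rng.uniform(10, 60, n)          # cm, from TLS point clouds
volume_scatter = rng.uniform(0, 1, n)           # from SAR decomposition
surface_scatter = rng.uniform(0, 1, n)          # little effect, by design
agb = 0.05 * stem_diameter ** 2 + 40.0 * volume_scatter + rng.normal(0, 5, n)

X = np.column_stack([stem_diameter, volume_scatter, surface_scatter])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:150], agb[:150])
pred = model.predict(X[150:])
r2 = 1 - np.sum((agb[150:] - pred) ** 2) / np.sum((agb[150:] - agb[150:].mean()) ** 2)
importances = model.feature_importances_
```

The `feature_importances_` vector plays the role of the paper's sensitivity analysis: it ranks which SAR and TLS variables drive the AGB predictions.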
Sumiyoshi, Chika; Harvey, Philip D; Takaki, Manabu; Okahisa, Yuko; Sato, Taku; Sora, Ichiro; Nuechterlein, Keith H; Subotnik, Kenneth L; Sumiyoshi, Tomiki
2015-09-01
Functional outcomes in individuals with schizophrenia suggest recovery of cognitive, everyday, and social functioning. Specifically, improvement of work status is considered most important for independent living and self-efficacy. The main purposes of the present study were 1) to identify which outcome factors predict occupational functioning, quantified as work hours, and 2) to provide cut-offs on the scales for those factors to attain better work status. Forty-five Japanese patients with schizophrenia and 111 healthy controls entered the study. Cognition, capacity for everyday activities, and social functioning were assessed by the Japanese versions of the MATRICS Cognitive Consensus Battery (MCCB), the UCSD Performance-based Skills Assessment-Brief (UPSA-B), and the Social Functioning Scale Individuals' version modified for the MATRICS-PASS (Modified SFS for PASS), respectively. Potential factors for work outcome were estimated by multiple linear regression analyses (predicting work hours directly) and a multiple logistic regression analysis (predicting dichotomized work status based on work hours). ROC curve analyses were performed to determine cut-off points for differentiating between better and poorer work status. The results showed that a cognitive component, comprising visual/verbal learning and emotional management, and a social functioning component, comprising independent living and vocational functioning, were potential factors for predicting work hours/status. Cut-off points obtained in the ROC analyses indicated that achieving 60-70% on the measures of those factors was associated with maintaining better work status. Our findings suggest that improvement in specific aspects of cognitive and social functioning is important for work outcome in patients with schizophrenia.
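Choosing a cut-off from an ROC analysis is often done by maximising Youden's J; a small self-contained version on toy scores and labels (not the study's data):

```python
import numpy as np

def best_cutoff(scores, labels):
    """Pick the score cut-off maximising Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        sens = (pred & (labels == 1)).mean() / (labels == 1).mean()
        spec = (~pred & (labels == 0)).mean() / (labels == 0).mean()
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy composite scores and work-status labels (1 = better work status).
scores = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
labels = np.array([0, 0, 0, 1, 1, 1])
cutoff, youden = best_cutoff(scores, labels)
```

On this perfectly separable toy data the method finds the threshold between the two groups; with real overlapping distributions it trades sensitivity against specificity.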
Cook, Nicola A; Kim, Jin Un; Pasha, Yasmin; Crossey, Mary ME; Schembri, Adrian J; Harel, Brian T; Kimhofer, Torben; Taylor-Robinson, Simon D
2017-01-01
Background Psychometric testing is used to identify patients with cirrhosis who have developed hepatic encephalopathy (HE). Most batteries consist of a series of paper-and-pencil tests, which are cumbersome for most clinicians. A modern, easy-to-use, computer-based battery would be a helpful clinical tool, given that even in its minimal form, HE has an impact on both patients’ quality of life and their ability to drive and operate machinery (with societal consequences). Aim We compared the Cogstate™ computer battery with the Psychometric Hepatic Encephalopathy Score (PHES) tests, with a view to simplifying the diagnosis. Methods This was a prospective study of 27 patients with histologically proven cirrhosis. An analysis of psychometric testing was performed using accuracy of task performance and speed of completion as primary variables to create a correlation matrix. A stepwise linear regression analysis was performed with backward elimination, using analysis of variance. Results Strong correlations were found between the international shopping list and international shopping list delayed recall tasks of Cogstate and the PHES digit symbol test. The shopping list tasks were the only tasks that consistently had P values of <0.05 in the linear regression analysis. Conclusion Subtests of the Cogstate battery correlated very strongly with the digit symbol component of PHES in discriminating severity of HE. These findings indicate that components of the current PHES battery, together with the international shopping list tasks of Cogstate, would be discriminant and have the potential to be used easily in clinical practice.
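Stepwise backward elimination keeps dropping the least significant predictor until all survivors pass a threshold. A compact sketch using a normal approximation for the p-values (synthetic data; the paper's ANOVA-based criterion differs in detail):

```python
import numpy as np
from math import erf, sqrt

def ols_p_values(X, y):
    """Two-sided normal-approximation p-values for OLS coefficients."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - Xd.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    z = np.abs(beta / se)
    return np.array([2.0 * (1.0 - 0.5 * (1.0 + erf(zi / sqrt(2.0)))) for zi in z])

def backward_eliminate(X, y, threshold=0.05):
    """Drop the least significant predictor until all remaining ones pass."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_p_values(X[:, keep], y)[1:]     # skip the intercept
        worst = int(np.argmax(p))
        if p[worst] <= threshold:
            break
        keep.pop(worst)
    return keep

# Synthetic example: three candidate subtests, only the first truly matters.
rng = np.random.default_rng(8)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(size=200)
keep = backward_eliminate(X, y)
```

This mirrors how the shopping-list tasks survived elimination in the study: predictors whose p-values stay below the threshold are retained, the rest are pruned one at a time.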
Estimates of Refrigerator Loads in Public Housing Based on Metered Consumption Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, JD; Pratt, RG
1998-09-11
The New York Power Authority (NYPA), the New York City Housing Authority (NYCHA), and the U.S. Departments of Housing and Urban Development (HUD) and Energy (DOE) have joined in a project to replace refrigerators in New York City public housing with new, highly energy-efficient models. This project laid the groundwork for the Consortium for Energy Efficiency (CEE) and DOE to enable housing authorities throughout the United States to bulk-purchase energy-efficient appliances. DOE helped develop and plan the program through the ENERGY STAR® Partnerships program conducted by its Pacific Northwest National Laboratory (PNNL). PNNL was subsequently asked to conduct the savings evaluations for 1996 and 1997. PNNL designed the metering protocol and occupant survey, supplied and calibrated the metering equipment, and managed and analyzed the data. The 1996 metering study of refrigerator energy usage in New York City public housing (Pratt and Miller 1997) established the need and justification for a regression-model-based approach to an energy savings estimate. The need originated in logistical difficulties associated with sampling the population and performing a stratified analysis. Commonly, refrigerators with high representation in the population were missed in the sampling schedule, leaving significant holes in the sample and difficulties for the stratified analysis. The justification was found in the fact that strata (distinct groups of identical refrigerators) were not statistically distinct in terms of their label ratio (ratio of metered consumption to label rating). This finding suggested a general regression model could be used to represent the consumption of all refrigerators in the population. In 1996 a simple two-coefficient regression model, a function of only the refrigerator label rating, was developed and used to represent the existing population of refrigerators.
A key concept used in the 1997 study grew from findings in a small number of apartments metered in 1996 with a detailed protocol. Fifteen-minute time-series data of ambient and compartment temperatures and refrigerator power were analyzed and demonstrated the potential for reducing power records into three components. This motivated the development of an analysis process to divide the metered consumption into baseline load, occupant-associated load, and defrosting load. The baseline load is the consumption that would occur if the refrigerator were on but had no occupant usage load (no door-opening events) and the defrosting mechanism was disabled. The motivation behind this component reduction process was the hope that components could be more effectively modeled than the total. We reasoned that the components would lead to a better (more general and more significant) understanding of the relationships between consumption, the characteristics of the refrigerator, and its operating environment.
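The 1996 two-coefficient model is a straight line from label rating to metered consumption, and the label ratio mentioned above is the simple quotient of the two. All numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical label ratings (kWh/yr) and metered annual consumption.
label = np.array([450.0, 500.0, 550.0, 600.0, 650.0, 700.0])
metered = np.array([520.0, 560.0, 640.0, 660.0, 730.0, 790.0])

# Two-coefficient model as in the 1996 study: metered = a + b * label_rating.
b, a = np.polyfit(label, metered, 1)
predicted = a + b * label
label_ratio = metered / label                   # metered use relative to rating
```

Because the strata were not statistically distinct in label ratio, a single fitted line like this can stand in for the whole population rather than one model per refrigerator stratum.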
NASA Technical Reports Server (NTRS)
Kalton, G.
1983-01-01
A number of surveys were conducted to study the relationship between the level of aircraft or traffic noise exposure experienced by people living in a particular area and their annoyance with it. These surveys generally employ a clustered sample design, which affects the precision of the survey estimates. Regression analysis of annoyance on noise measures and other variables is often an important component of the survey analysis. Formulae are presented for estimating the standard errors of regression coefficients and of ratios of regression coefficients that are applicable with a two- or three-stage clustered sample design. Using a simple cost function, the formulae also determine the optimum allocation of the sample across the stages of the sample design for the estimation of a regression coefficient.
2011-01-01
Background Hemorrhagic fever with renal syndrome (HFRS) is an important infectious disease caused by different species of hantaviruses. As a rodent-borne disease with a seasonal distribution, external environmental factors including climate factors may play a significant role in its transmission. The city of Shenyang is one of the most seriously endemic areas for HFRS. Here, we characterized the dynamic temporal trend of HFRS, and identified climate-related risk factors and their roles in HFRS transmission in Shenyang, China. Methods The annual and monthly cumulative numbers of HFRS cases from 2004 to 2009 were calculated and plotted to show the annual and seasonal fluctuations in Shenyang. Cross-correlation and autocorrelation analyses were performed to detect the lagged effect of climate factors on HFRS transmission and the autocorrelation of monthly HFRS cases. Principal component analysis was performed on climate data from 2004 to 2009 to extract principal components of the climate factors and reduce collinearity. The extracted principal components and autocorrelation terms of monthly HFRS cases were entered into a multiple regression model, termed the principal components regression (PCR) model, to quantify the relationship between climate factors, autocorrelation terms, and transmission of HFRS. The PCR model was compared to a general multiple regression model conducted only with climate factors as independent variables. Results A distinctly declining temporal trend of annual HFRS incidence was identified. HFRS cases were reported every month, and the two peak periods occurred in spring (March to May) and winter (November to January), during which nearly 75% of the HFRS cases were reported. Three principal components were extracted with a cumulative contribution rate of 86.06%. Component 1 represented MinRH0, MT1, RH1, and MWV1; component 2 represented RH2, MaxT3, and MAP3; and component 3 represented MaxT2, MAP2, and MWV2.
The PCR model was composed of three principal components and two autocorrelation terms. The association between HFRS epidemics and climate factors was better explained in the PCR model (F = 446.452, P < 0.001, adjusted R2 = 0.75) than in the general multiple regression model (F = 223.670, P < 0.001, adjusted R2 = 0.51). Conclusion The temporal distribution of HFRS in Shenyang varied in different years with a distinctly declining trend. The monthly trends of HFRS were significantly associated with local temperature, relative humidity, precipitation, air pressure, and wind velocity of the different previous months. The model developed in this study will make HFRS surveillance simpler and the control of HFRS more targeted in Shenyang. PMID:22133347
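The principal components regression technique used above can be sketched in a few lines. This is a minimal illustration on synthetic collinear predictors, not the study's climate data; the function name `pcr_fit` and all numbers are invented for the example.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal components regression (a sketch): project the standardized
    predictors onto their leading principal components, then fit ordinary
    least squares on the scores. Because the scores are mutually orthogonal,
    the collinearity among the original predictors is eliminated."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize predictors
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    W = Vt[:n_components].T                          # principal directions
    Z = Xs @ W                                       # orthogonal PC scores
    design = np.column_stack([np.ones(len(y)), Z])   # intercept + scores
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return W, beta

# Synthetic data: two nearly identical (collinear) predictors plus one more
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t + 0.01 * rng.normal(size=(200, 1)),
               t + 0.01 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
y = 2.0 * t.ravel() + 0.1 * rng.normal(size=200)

W, beta = pcr_fit(X, y, n_components=2)
scores = ((X - X.mean(0)) / X.std(0)) @ W
print(np.isclose(np.corrcoef(scores.T)[0, 1], 0.0))  # True: scores are uncorrelated
```

Discarding the trailing components is what trades a little bias for a large reduction in variance when the original predictors are collinear.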
Construction of mathematical model for measuring material concentration by colorimetric method
NASA Astrophysics Data System (ADS)
Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua
2018-06-01
This paper uses multiple linear regression to analyze the data of Problem C of the 2017 mathematical modeling contest. We first established regression models for the concentrations of five substances, but only the model for the concentration of urea in milk passed the significance test. The regression model established from the second data set passed the significance test but suffered from serious multicollinearity, so we improved the model using principal component analysis. The improved model is used to control the system so that material concentrations can be measured by a direct colorimetric method.
Lee, I-Te; Chiu, Yen-Feng; Hwu, Chii-Min; He, Chih-Tsueng; Chiang, Fu-Tien; Lin, Yu-Chun; Assimes, Themistocles; Curb, J David; Sheu, Wayne H-H
2012-04-26
Metabolic abnormalities have a cumulative effect on the development of diabetes, but only central obesity has been defined as the essential criterion of metabolic syndrome (MetS) by the International Diabetes Federation. We hypothesized that central obesity contributes to a higher risk of new-onset diabetes than other metabolic abnormalities in hypertensive families. Non-diabetic Chinese subjects were enrolled and MetS components were assessed to establish baseline data in a hypertensive family-based cohort study. Based on medical records and oral glucose tolerance tests (OGTT), the cumulative incidence of diabetes was analyzed in this five-year study by Cox regression models. The contribution of central obesity to the development of new-onset diabetes was assessed in subjects with the same number of positive MetS components. Among the total of 595 subjects who completed the assessment, 125 (21.0%) developed diabetes. Incidence of diabetes increased in direct proportion to the number of positive MetS components (P < 0.001). Although subjects with central obesity had a higher incidence of diabetes than those without (55.7 vs. 30.0 events/1000 person-years, P < 0.001), the difference became non-significant after adjusting for the number of positive MetS components (hazard ratio = 0.72, 95% CI: 0.45-1.13). Furthermore, in all participants with three positive MetS components, there was no difference in the incidence of diabetes between subjects with and without central obesity (hazard ratio = 1.04, 95% CI: 0.50-2.16). In Chinese hypertensive families, the incidence of diabetes in subjects without central obesity was similar to that in subjects with central obesity when they had the same number of positive MetS components. We suggest that central obesity is important, but not an essential component of the metabolic syndrome for predicting new-onset diabetes. (NCT00260910, ClinicalTrials.gov).
NASA Astrophysics Data System (ADS)
Kim, Young-Pil; Hong, Mi-Young; Shon, Hyun Kyong; Chegal, Won; Cho, Hyun Mo; Moon, Dae Won; Kim, Hak-Sung; Lee, Tae Geol
2008-12-01
Interaction between streptavidin and biotin on poly(amidoamine) (PAMAM) dendrimer-activated surfaces and on self-assembled monolayers (SAMs) was quantitatively studied by using time-of-flight secondary ion mass spectrometry (ToF-SIMS). The surface protein density was systematically varied as a function of protein concentration and independently quantified using the ellipsometry technique. Principal component analysis (PCA) and principal component regression (PCR) were used to identify a correlation between the intensities of the secondary ion peaks and the surface protein densities. The ToF-SIMS and ellipsometry results showed a good linear correlation with surface protein density. Our study shows that surface protein densities are higher on dendrimer-activated surfaces than on SAM surfaces due to the spherical property of the dendrimer, and that these surface protein densities can be easily quantified with high sensitivity in a label-free manner by ToF-SIMS.
González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma
2017-01-01
This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. In order to do so, regression models were created through decision trees and the importance of soil components was assessed. Used variables were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, and the proportion of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead. PMID:28072849
Ho, Hsing-Hao; Li, Ya-Hui; Lee, Jih-Chin; Wang, Chih-Wei; Yu, Yi-Lin; Hueng, Dueng-Yuan; Ma, Hsin-I; Hsu, Hsian-He; Juan, Chun-Jung
2018-01-01
We estimated the volume of vestibular schwannomas by an ice cream cone formula using thin-sliced magnetic resonance images (MRI) and compared the estimation accuracy among different estimating formulas and between different models. The study was approved by a local institutional review board. A total of 100 patients with vestibular schwannomas examined by MRI between January 2011 and November 2015 were enrolled retrospectively. Informed consent was waived. Volumes of vestibular schwannomas were estimated by cuboidal, ellipsoidal, and spherical formulas based on a one-component model, and by cuboidal, ellipsoidal, Linskey's, and ice cream cone formulas based on a two-component model. The estimated volumes were compared to the volumes measured by planimetry. Intraobserver reproducibility and interobserver agreement were tested. Estimation error, including absolute percentage error (APE) and percentage error (PE), was calculated. Statistical analysis included intraclass correlation coefficient (ICC), linear regression analysis, one-way analysis of variance, and paired t-tests, with P < 0.05 considered statistically significant. Overall tumor size was 4.80 ± 6.8 mL (mean ± standard deviation). All ICCs were no less than 0.992, suggestive of high intraobserver reproducibility and high interobserver agreement. Cuboidal formulas significantly overestimated the tumor volume by a factor of 1.9 to 2.4 (P ≤ 0.001). The one-component ellipsoidal and spherical formulas overestimated the tumor volume with an APE of 20.3% and 29.2%, respectively. The two-component ice cream cone method, ellipsoidal formula, and Linskey's formula significantly reduced the APE to 11.0%, 10.1%, and 12.5%, respectively (all P < 0.001). The ice cream cone method and the other two-component formulas, including the ellipsoidal and Linskey's formulas, allow for more accurate estimation of vestibular schwannoma volume than all one-component formulas.
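For readers unfamiliar with the one-component formulas being compared, a brief sketch of the cuboidal and ellipsoidal volume estimates and the APE metric follows. The diameters and reference volume are illustrative values, not measurements from the study, and the two-component (ice cream cone) formula is not reproduced here.

```python
import math

def cuboidal_volume(a, b, c):
    """One-component cuboidal estimate: the product of three orthogonal diameters."""
    return a * b * c

def ellipsoidal_volume(a, b, c):
    """One-component ellipsoidal estimate from three orthogonal diameters:
    V = (pi / 6) * a * b * c."""
    return math.pi / 6.0 * a * b * c

def absolute_percentage_error(estimate, reference):
    """APE of an estimate against a reference (e.g., planimetric) volume."""
    return abs(estimate - reference) / reference * 100.0

# Hypothetical tumor measuring 2.0 x 1.5 x 1.0 cm with an assumed
# planimetric reference volume of 1.45 mL (illustrative numbers only)
cub = cuboidal_volume(2.0, 1.5, 1.0)       # 3.0 mL
ell = ellipsoidal_volume(2.0, 1.5, 1.0)    # ~1.57 mL
print(round(cub / ell, 2))                 # 1.91: the fixed 6/pi overestimation
print(round(absolute_percentage_error(ell, 1.45), 1))
```

The cuboidal estimate always exceeds the ellipsoidal one by exactly 6/π ≈ 1.91, which matches the lower end of the overestimation factor reported above.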
Fidalgo, Angel M; Tenenbaum, Harriet R; Aznar, Ana
2018-01-01
This article examines whether there are gender differences in understanding the emotions evaluated by the Test of Emotion Comprehension (TEC). The TEC provides a global index of emotion comprehension in children 3-11 years of age, which is the sum of the nine components that constitute emotion comprehension: (1) recognition of facial expressions, (2) understanding of external causes of emotions, (3) understanding of desire-based emotions, (4) understanding of belief-based emotions, (5) understanding of the influence of a reminder on present emotional states, (6) understanding of the possibility to regulate emotional states, (7) understanding of the possibility of hiding emotional states, (8) understanding of mixed emotions, and (9) understanding of moral emotions. We used the answers to the TEC given by 172 English girls and 181 boys from 3 to 8 years of age. First, the nine components into which the TEC is subdivided were analysed for differential item functioning (DIF), taking gender as the grouping variable. To evaluate DIF, the Mantel-Haenszel method and logistic regression analysis were used applying the Educational Testing Service DIF classification criteria. The results show that the TEC did not display gender DIF. Second, when absence of DIF had been corroborated, it was analysed for differences between boys and girls in the total TEC score and its components controlling for age. Our data are compatible with the hypothesis of independence between gender and level of comprehension in 8 of the 9 components of the TEC. Several hypotheses are discussed that could explain the differences found between boys and girls in the belief component. Given that the Belief component is basically a false belief task, the differences found seem to support findings in the literature indicating that girls perform better on this task.
Effects of metals within ambient air particulate matter (PM) on human health.
Chen, Lung Chi; Lippmann, Morton
2009-01-01
We review literature providing insights on health-related effects caused by inhalation of ambient air particulate matter (PM) containing metals, emphasizing effects associated with in vivo exposures at or near contemporary atmospheric concentrations. Inhalation of much higher concentrations, and high-level exposures via intratracheal (IT) instillation that inform mechanistic processes, are also reviewed. The most informative studies of effects at realistic exposure levels, in terms of identifying influential individual PM components or source-related mixtures, have been based on (1) human and laboratory animal exposures to concentrated ambient particles (CAPs), and (2) human population studies for which both health-related effects were observed and PM composition data were available for multipollutant regression analyses or source apportionment. Such studies have implicated residual oil fly ash (ROFA) as the most toxic source-related mixture, and Ni and V, which are characteristic tracers of ROFA, as particularly influential components in terms of acute cardiac function changes and excess short-term mortality. There is evidence that other metals within ambient air PM, such as Pb and Zn, also affect human health. Most evidence now available is based on the use of ambient air PM components concentration data, rather than actual exposures, to determine significant associations and/or effects coefficients. Therefore, considerable uncertainties about causality are associated with exposure misclassification and measurement errors. As more PM speciation data and more refined modeling techniques become available, and as more CAPs studies involving PM component analyses are performed, the roles of specific metals and other components within PM will become clearer.
The Socioeconomic Factors and the Indigenous Component of Tuberculosis in Amazonas
2016-01-01
Despite the availability of tuberculosis prevention and control services throughout Amazonas, high rates of morbidity and mortality from tuberculosis remain in the region. Knowledge of the social determinants of tuberculosis in Amazonas is important for the establishment of public policies and the planning of effective preventive and control measures for the disease. To analyze the relationship of the spatial distribution of the incidence of tuberculosis in municipalities and regions of Amazonas to the socioeconomic factors and indigenous tuberculosis component, from 2007 to 2013. An ecological study was conducted based on secondary data from the epidemiological surveillance of tuberculosis. A linear regression model was used to analyze the relationship of the annual incidence of tuberculosis to the socioeconomic factors, performance indicators of health services, and indigenous tuberculosis component. The distribution of the incidence of tuberculosis in the municipalities of Amazonas was positively associated with the Gini index and the population attributable fraction of tuberculosis in the indigenous peoples, but negatively associated with the proportion of the poor and the unemployment rate. The spatial distribution of tuberculosis in the different regions of Amazonas was heterogeneous and closely related with the socioeconomic factors and indigenous component of tuberculosis. PMID:27362428
Torija, Antonio J; Ruiz, Diego P
2015-02-01
The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
Delwiche, Stephen R; Reeves, James B
2010-01-01
In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an over-reliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R(2)) rather than a term based on residual error.
The graphical method has application to the evaluation of other preprocess functions and various types of spectroscopy data.
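The Savitzky-Golay pretreatments discussed above are widely available; a short sketch using SciPy's implementation on a simulated spectrum shows why derivatives are attractive as baseline-removal pretreatments. The band positions, widths, and baseline below are invented for illustration, not taken from the wheat-meal data.

```python
import numpy as np
from scipy.signal import savgol_filter

# Simulated "spectrum": two overlapping Gaussian bands on a sloped baseline
x = np.linspace(0.0, 1.0, 200)
spectrum = (np.exp(-((x - 0.40) / 0.05) ** 2)
            + 0.6 * np.exp(-((x - 0.55) / 0.05) ** 2)
            + 0.5 * x)                                  # sloped baseline

# Quadratic-polynomial Savitzky-Golay filters with an 11-point window,
# in the spirit of the 5-25 point quadratic convolutions in the study
smoothed = savgol_filter(spectrum, window_length=11, polyorder=2)
first_d = savgol_filter(spectrum, window_length=11, polyorder=2, deriv=1)
second_d = savgol_filter(spectrum, window_length=11, polyorder=2, deriv=2)

# A first derivative removes any constant offset; a second derivative also
# removes a baseline that is linear in the sample index.
print(np.allclose(savgol_filter(spectrum + 5.0, 11, 2, deriv=1), first_d))  # True
```

The same convolution that removes the baseline also amplifies high-frequency noise, which is one reason the study cautions against over-reliance on derivative pretreatments with weak analyte signals.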
Zhang, Yuji; Li, Xiaoju; Mao, Lu; Zhang, Mei; Li, Ke; Zheng, Yinxia; Cui, Wangfei; Yin, Hongpo; He, Yanli; Jing, Mingxia
2018-01-01
The analysis of factors affecting nonadherence to antihypertensive medications is important in the control of blood pressure among patients with hypertension. The purpose of this study was to assess the relationship between these factors and medication adherence in Xinjiang community-managed patients with hypertension, based on principal component analysis. A total of 1,916 community-managed patients with hypertension, selected randomly through multi-stage sampling, participated in the survey. Self-designed questionnaires were used to classify the participants as either adherent or nonadherent to their medication regimen. A principal component analysis was used in order to eliminate the correlation between factors. Factors related to nonadherence were analyzed by using a χ²-test and a binary logistic regression model. This study extracted nine common factors, with a cumulative variance contribution rate of 63.6%. Further analysis revealed that the following variables were significantly related to nonadherence: severity of disease, community management, diabetes, and taking traditional medications. Community management plays an important role in improving patients' medication-taking behavior. Regular medication regimen instruction and better community management services at the community level have the potential to reduce nonadherence. Mild hypertensive patients should be monitored by community health care providers.
Miller, Jane E; Nugent, Colleen N; Russell, Louise B
2015-01-01
Objectives To examine which components of medical homes affect time families spend arranging/coordinating health care for their children with special health care needs (CSHCNs) and providing health care at home. Data Sources 2009–2010 National Survey of Children with Special Health Care Needs (NS-CSHCN), a population-based survey of 40,242 CSHCNs. Study Design NS-CSHCN is a cross-sectional, observational study. We used generalized ordered logistic regression, testing for nonproportional odds in the associations between each of five medical home components and time burden, controlling for insurance, child health, and sociodemographics. Data Collection/Extraction Methods Medical home components were collected using Child and Adolescent Health Measurement Initiative definitions. Principal Findings Family-centered care, care coordination, and obtaining needed referrals were associated with 15–32 percent lower odds of time burdens arranging/coordinating and 16–19 percent lower odds providing health care. All five components together were associated with lower odds of time burdens, with greater reductions for higher burdens providing care. Conclusions Three of the five medical home components were associated with lower family time burdens arranging/coordinating and providing health care for children with chronic conditions. If the 55 percent of CSHCNs lacking medical homes had one, the share of families with time burdens arranging care could be reduced by 13 percent. PMID:25100200
NASA Astrophysics Data System (ADS)
Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan
2018-02-01
The knowledge of soil temperature at different depths is important for the agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth under different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and the periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and atmospheric pressure (reduced to sea level), collected from five synoptic stations (Kerman, Ahvaz, Tabriz, Saghez, and Rasht) located respectively in hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite good at the surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers, especially 100 cm depth. Moreover, both models performed better in the humid climate condition than in the arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance, especially in the case of SVR.
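A minimal SVR-versus-MLR comparison in the spirit of the study can be set up with scikit-learn. The data below are synthetic (a seasonal soil-temperature-like signal); the feature choices, constants, and the sin/cos "periodicity component" encoding are assumptions for illustration, not the study's configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Two years of synthetic daily data: soil temperature as a lagged, damped
# version of a seasonal air-temperature signal plus noise (illustrative only)
rng = np.random.default_rng(1)
n = 730
doy = np.arange(n) % 365                                 # day of year
air_t = 15 + 12 * np.sin(2 * np.pi * doy / 365) + rng.normal(0, 2, n)
soil_t = (14 + 8 * np.sin(2 * np.pi * (doy - 30) / 365)
          + 0.1 * air_t + rng.normal(0, 0.5, n))

# Periodicity component: encode day of year as sin/cos features
X = np.column_stack([air_t,
                     np.sin(2 * np.pi * doy / 365),
                     np.cos(2 * np.pi * doy / 365)])

mlr = LinearRegression().fit(X, soil_t)
svr = make_pipeline(StandardScaler(), SVR(C=10.0)).fit(X, soil_t)
print(round(mlr.score(X, soil_t), 2), round(svr.score(X, soil_t), 2))
```

On strongly nonlinear depth-temperature relationships, the kernel-based SVR can capture structure a linear model cannot, which is consistent with the study's finding that SVR gains the most at 100 cm depth.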
Enhanced ID Pit Sizing Using Multivariate Regression Algorithm
NASA Astrophysics Data System (ADS)
Krzywosz, Kenji
2007-03-01
EPRI is funding a program to enhance and improve the reliability of inside diameter (ID) pit sizing for balance-of-plant heat exchangers, such as condensers and component cooling water heat exchangers. More traditional approaches to ID pit sizing involve the use of frequency-specific amplitudes or phase angles. The enhanced multivariate regression algorithm for ID pit depth sizing incorporates three simultaneous input parameters: frequency, amplitude, and phase angle. A set of calibration data sets consisting of machined pits of various rounded and elongated shapes and depths was acquired in the frequency range of 100 kHz to 1 MHz for stainless steel tubing having a nominal wall thickness of 0.028 inch. To add noise to the acquired data set, each test sample was rotated and test data acquired at the 3, 6, 9, and 12 o'clock positions. The ID pit depths were estimated using second-order and fourth-order regression functions by relying on normalized amplitude and phase angle information from multiple frequencies. Due to the unique damage morphology associated with microbiologically influenced ID pits, it was necessary to modify the elongated calibration standard-based algorithms by relying on the algorithm developed solely from the destructive sectioning results. This paper presents the use of a transformed multivariate regression algorithm to estimate ID pit depths and compares the results with the traditional univariate phase angle analysis. Both estimates were then compared with the destructive sectioning results.
Li, Jieyue; Xiong, Liang; Schneider, Jeff; Murphy, Robert F
2012-06-15
Knowledge of the subcellular location of a protein is crucial for understanding its functions. The subcellular pattern of a protein is typically represented as the set of cellular components in which it is located, and an important task is to determine this set from microscope images. In this article, we address this classification problem using confocal immunofluorescence images from the Human Protein Atlas (HPA) project. The HPA contains images of cells stained for many proteins; each is also stained for three reference components, but there are many other components that are invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. This region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. In the experiments, we show that the proposed models help improve the classification accuracies on synthetic data and real cellular images. The best overall accuracy we report in this article for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, the dependencies learned are consistent with prior knowledge of cell organization. 
http://murphylab.web.cmu.edu/software/.
Orthogonal decomposition of left ventricular remodeling in myocardial infarction
Zhang, Xingyu; Medrano-Gracia, Pau; Ambale-Venkatesh, Bharath; Bluemke, David A.; Cowan, Brett R; Finn, J. Paul; Kadish, Alan H.; Lee, Daniel C.; Lima, Joao A. C.; Young, Alistair A.; Suinesiaputra, Avan
2017-01-01
Left ventricular size and shape are important for quantifying cardiac remodeling in response to cardiovascular disease. Geometric remodeling indices have been shown to have prognostic value in predicting adverse events in the clinical literature, but these often describe interrelated shape changes. We developed a novel method for deriving orthogonal remodeling components directly from any (moderately independent) set of clinical remodeling indices. Results: Six clinical remodeling indices (end-diastolic volume index, sphericity, relative wall thickness, ejection fraction, apical conicity, and longitudinal shortening) were evaluated using cardiac magnetic resonance images of 300 patients with myocardial infarction, and 1991 asymptomatic subjects, obtained from the Cardiac Atlas Project. Partial least squares (PLS) regression of left ventricular shape models resulted in remodeling components that were optimally associated with each remodeling index. A Gram–Schmidt orthogonalization process, by which remodeling components were successively removed from the shape space in the order of shape variance explained, resulted in a set of orthonormal remodeling components. Remodeling scores could then be calculated that quantify the amount of each remodeling component present in each case. A one-factor PLS regression led to more decoupling between scores from the different remodeling components across the entire cohort, and zero correlation between clinical indices and subsequent scores. Conclusions: The PLS orthogonal remodeling components had similar power to describe differences between myocardial infarction patients and asymptomatic subjects as principal component analysis, but were better associated with well-understood clinical indices of cardiac remodeling. The data and analyses are available from www.cardiacatlas.org. PMID:28327972
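The Gram–Schmidt step, by which each successive remodeling component has the directions of the earlier components removed, can be sketched in a few lines. The vectors below are toy stand-ins for component directions in a low-dimensional "shape space"; the function name and all numbers are illustrative, not the study's data.

```python
import numpy as np

def gram_schmidt(components):
    """Successively orthonormalize a set of (moderately independent) component
    vectors: each new component has the directions of all earlier components
    projected out, then is normalized to unit length."""
    basis = []
    for v in components:
        w = v - sum(np.dot(v, b) * b for b in basis)  # remove earlier directions
        norm = np.linalg.norm(w)
        if norm > 1e-10:                              # skip near-dependent vectors
            basis.append(w / norm)
    return np.array(basis)

# Three correlated "remodeling direction" vectors in a toy 4-D shape space
comps = [np.array([1.0, 1.0, 0.0, 0.0]),
         np.array([1.0, 0.0, 1.0, 0.0]),
         np.array([0.0, 1.0, 1.0, 1.0])]
Q = gram_schmidt(comps)
# Remodeling scores for a shape vector s are then simple projections: Q @ s
print(np.allclose(Q @ Q.T, np.eye(3)))  # True: the components are orthonormal
```

Because the resulting components are orthonormal, a score on one component carries no information about the others, which is the decoupling property the study exploits.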
Ohseto, Hisashi; Ishikuro, Mami; Kikuya, Masahiro; Obara, Taku; Igarashi, Yuko; Takahashi, Satomi; Kikuchi, Daisuke; Shigihara, Michiko; Yamanaka, Chizuru; Miyashita, Masako; Mizuno, Satoshi; Nagai, Masato; Matsubara, Hiroko; Sato, Yuki; Metoki, Hirohito; Tachibana, Hirofumi; Maeda-Yamamoto, Mari; Kuriyama, Shinichi
2018-04-01
Metabolic syndrome and the presence of metabolic syndrome components are risk factors for cardiovascular disease (CVD). However, the association between personality traits and metabolic syndrome remains controversial, and few studies have been conducted in East Asian populations. We measured personality traits using the Japanese version of the Eysenck Personality Questionnaire (Revised Short Form) and five metabolic syndrome components (elevated waist circumference, elevated triglycerides, reduced high-density lipoprotein cholesterol, elevated blood pressure, and elevated fasting glucose) in 1322 participants aged 51.1 ± 12.7 years from Kakegawa city, Japan. The metabolic syndrome score (MS score) was defined as the number of metabolic syndrome components present, and metabolic syndrome as an MS score of 3 or higher. We performed multiple logistic regression analyses to examine the relationship between personality traits and metabolic syndrome components, and multiple regression analyses to examine the relationship between personality traits and MS scores, adjusted for age, sex, education, income, smoking status, alcohol use, and family history of CVD and diabetes mellitus. We also examined the relationship between personality traits and the presence of metabolic syndrome by multiple logistic regression analyses. "Extraversion" scores were higher in those with metabolic syndrome components (elevated waist circumference: P=0.001; elevated triglycerides: P=0.01; elevated blood pressure: P=0.004; elevated fasting glucose: P=0.002). "Extraversion" was associated with the MS score (coefficient=0.12, P=0.0003). No personality trait was significantly associated with the presence of metabolic syndrome. Higher "extraversion" scores were related to higher MS scores, but no personality trait was significantly associated with the presence of metabolic syndrome. Copyright © 2018 Elsevier Inc. All rights reserved.
Meaning profiles of dwellings, pathways, and metaphors in design: implications for education
NASA Astrophysics Data System (ADS)
Casakin, Hernan; Kreitler, Shulamith
2017-11-01
The study deals with the roles and interrelations of the meaning-based assessments of dwellings, pathways and metaphors in design performance. It is grounded in the Meaning Theory [Kreitler, S., and H. Kreitler. 1990. The Cognitive Foundations of Personality Traits. New York: Plenum], which enables identifying the cognitive contents and processes underlying cognitive performance in different domains, thus rendering them more accessible to educational training. The objectives were to identify the components of the meaning profiles of dwellings, pathways, and metaphors as perceived by design students; to analyse their interrelations; and to examine which of the identified components of these constructs serve as best predictors of design performance aided by the use of metaphors. Participants were administered a design task and questionnaires about the Dimensional Profiles of Dwellings, Pathways, and Metaphors, based on the meaning system. Factors based on the factor analyses of the responses to the three questionnaires were used in regression analyses as predictors of the performance score in a design task. The following three factors of the dimensional meaning profiles of metaphors were significant predictors of design performance: sensory, functional, and structural evaluations. Implications for design education are discussed, primarily concerning the important role of metaphor in design problem-solving.
USDA-ARS?s Scientific Manuscript database
Improvement of cold tolerance of winter wheat (Triticum aestivum L.) through breeding methods has been problematic. A better understanding of how individual wheat cultivars respond to components of the freezing process may provide new information that can be used to develop more cold tolerance culti...
Which Components of Working Memory Are Important in the Writing Process?
ERIC Educational Resources Information Center
Vanderberg, Robert; Swanson, H. Lee
2007-01-01
This study investigated the relationship between components of working memory (WM) and the macrostructure (e.g., planning, writing, and revision) and microstructure (e.g., grammar, punctuation) of writing. A battery of WM and writing measures was administered to 160 high school students. Overall, hierarchical regression analyses showed that the…
Age-Dependent and Age-Independent Measures of Locus of Control.
ERIC Educational Resources Information Center
Sherman, Lawrence W.; Hofmann, Richard
Using a longitudinal data set obtained from 169 pre-adolescent children between the ages of 8 and 13 years, this study statistically divided locus of control into two independent components. The first component was noted as "age-dependent" (AD) and was determined by predicted values generated by regressing children's ages onto their…
Resting-State Functional Connectivity Predicts Cognitive Impairment Related to Alzheimer's Disease.
Lin, Qi; Rosenberg, Monica D; Yoo, Kwangsun; Hsu, Tiffany W; O'Connell, Thomas P; Chun, Marvin M
2018-01-01
Resting-state functional connectivity (rs-FC) is a promising neuromarker for cognitive decline in the aging population, based on its ability to reveal functional differences associated with cognitive impairment across individuals, and because rs-fMRI may be less taxing for participants than task-based fMRI or neuropsychological tests. Here, we employ an approach that uses rs-FC to predict Alzheimer's Disease Assessment Scale (11 items; ADAS11) scores, which measure overall cognitive functioning, in novel individuals. We applied this technique, connectome-based predictive modeling, to a heterogeneous sample of 59 subjects from the Alzheimer's Disease Neuroimaging Initiative, including normal aging, mild cognitive impairment, and AD subjects. First, we built linear regression models to predict ADAS11 scores from rs-FC measured with Pearson's r correlation. The positive-network model, tested with leave-one-out cross-validation (LOOCV), significantly predicted individual differences in cognitive function from rs-FC. In a second analysis, we considered other functional connectivity features, accordance and discordance, which disentangle the correlation and anticorrelation components of activity timecourses between brain areas. Using partial least squares regression and LOOCV, we again built models that successfully predicted ADAS11 scores in novel individuals. Our study provides promising evidence that rs-FC can reveal cognitive impairment in the aging population, although more development is needed for clinical application.
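The prediction step described above can be sketched with synthetic data: leave-one-out cross-validation of a linear model on connectivity features. The correlation-based feature selection threshold and all numbers below are illustrative assumptions, not the authors' exact pipeline.

```python
# LOOCV prediction of a cognitive score from connectivity-like features.
# Data are synthetic stand-ins for edge-wise rs-FC values.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
n_subjects, n_edges = 59, 200
X = rng.normal(size=(n_subjects, n_edges))                 # "rs-FC" features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n_subjects)  # ADAS-like score

preds = np.empty(n_subjects)
for train, test in LeaveOneOut().split(X):
    # keep edges positively correlated with the outcome in the training fold
    r = np.array([np.corrcoef(X[train, j], y[train])[0, 1] for j in range(n_edges)])
    sel = r > 0.2
    model = LinearRegression().fit(X[train][:, sel], y[train])
    preds[test] = model.predict(X[test][:, sel])

print(round(float(np.corrcoef(preds, y)[0, 1]), 2))  # predicted-vs-observed correlation
```

Feature selection is redone inside each fold so the held-out subject never influences which edges enter the model.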
Tremblay, Marlène; Crim, Stacy M; Cole, Dana J; Hoekstra, Robert M; Henao, Olga L; Döpfer, Dörte
2017-10-01
The Foodborne Diseases Active Surveillance Network (FoodNet) is currently using a negative binomial (NB) regression model to estimate temporal changes in the incidence of Campylobacter infection. FoodNet active surveillance in 483 counties collected data on 40,212 Campylobacter cases between years 2004 and 2011. We explored models that disaggregated these data to allow us to account for demographic, geographic, and seasonal factors when examining changes in incidence of Campylobacter infection. We hypothesized that modeling structural zeros and including demographic variables would increase the fit of FoodNet's Campylobacter incidence regression models. Five different models were compared: NB without demographic covariates, NB with demographic covariates, hurdle NB with covariates in the count component only, hurdle NB with covariates in both zero and count components, and zero-inflated NB with covariates in the count component only. Of the models evaluated, the nonzero-augmented NB model with demographic variables provided the best fit. Results suggest that even though zero inflation was not present at this level, individualizing the level of aggregation and using different model structures and predictors per site might be required to correctly distinguish between structural and observational zeros and account for risk factors that vary geographically.
Giesen, E B W; Ding, M; Dalstra, M; van Eijden, T M G J
2003-09-01
As several morphological parameters of cancellous bone express more or less the same architectural measure, we applied principal components analysis to group these measures and correlated these to the mechanical properties. Cylindrical specimens (n = 24) were obtained in different orientations from embalmed mandibular condyles; the angle of the first principal direction and the axis of the specimen, expressing the orientation of the trabeculae, ranged from 10 degrees to 87 degrees. Morphological parameters were determined by a method based on Archimedes' principle and by micro-CT scanning, and the mechanical properties were obtained by mechanical testing. The principal components analysis was used to obtain a set of independent components to describe the morphology. This set was entered into linear regression analyses for explaining the variance in mechanical properties. The principal components analysis revealed four components: amount of bone, number of trabeculae, trabecular orientation, and miscellaneous. They accounted for about 90% of the variance in the morphological variables. The component loadings indicated that a higher amount of bone was primarily associated with more plate-like trabeculae, and not with more or thicker trabeculae. The trabecular orientation was most determinative (about 50%) in explaining stiffness, strength, and failure energy. The amount of bone was second most determinative and increased the explained variance to about 72%. These results suggest that trabecular orientation and amount of bone are important in explaining the anisotropic mechanical properties of the cancellous bone of the mandibular condyle.
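The analysis pattern in this entry (principal components analysis to group correlated morphological measures, then linear regression of a mechanical property on the component scores) can be sketched as follows; the specimens, latent factors, and noise levels are simulated assumptions.

```python
# PCA on correlated morphology variables, then regression on component scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 24                                    # specimens
bone = rng.normal(size=n)                 # latent "amount of bone" factor
orient = rng.normal(size=n)               # latent "trabecular orientation" factor
# observed morphology: several noisy views of the two latent factors
X = np.column_stack([bone + 0.1 * rng.normal(size=n) for _ in range(3)] +
                    [orient + 0.1 * rng.normal(size=n) for _ in range(3)])
stiffness = 2.0 * orient + 1.0 * bone + 0.2 * rng.normal(size=n)

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)                 # independent components of morphology
reg = LinearRegression().fit(scores, stiffness)
print(round(reg.score(scores, stiffness), 2))  # variance in stiffness explained (R^2)
```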
Evaluation of driver fatigue on two channels of EEG data.
Li, Wei; He, Qi-chang; Fan, Xiu-min; Fei, Zhi-min
2012-01-11
Electroencephalogram (EEG) data is an effective indicator for evaluating driver fatigue. In the current paper, 16 channels of EEG data are collected and transformed into three bands (θ, α, and β). First, 12 types of energy parameters are computed from the EEG data. Then, Grey Relational Analysis (GRA) is introduced to identify the optimal indicator of driver fatigue, after which the number of significant electrodes is reduced using Kernel Principal Component Analysis (KPCA). Finally, the evaluation model for driver fatigue is established with a regression equation based on the EEG data from two significant electrodes (Fp1 and O1). The experimental results verify that the model is effective in evaluating driver fatigue. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Discrimination of serum Raman spectroscopy between normal and colorectal cancer
NASA Astrophysics Data System (ADS)
Li, Xiaozhou; Yang, Tianyue; Yu, Ting; Li, Siqi
2011-07-01
Raman spectroscopy of tissues has been widely studied for the diagnosis of various cancers, but biofluids have seldom been used as the analyte because of their low analyte concentrations. Here, Raman spectra of serum from 30 normal subjects, 46 colon cancer patients, and 44 rectal cancer patients were measured and analyzed. Information from the Raman peaks (intensity and width) and from the fluorescence background (baseline function coefficients) was selected as parameters for statistical analysis. Principal component regression (PCR) and partial least squares regression (PLSR) were applied separately to the selected parameters to assess their performance. PCR performed better than PLSR on our spectral data. Linear discriminant analysis (LDA) was then applied to the principal components (PCs) produced by the two regression methods, and diagnostic accuracies of 88% and 83% were obtained. The conclusion is that the selected features retain the information of the original spectra well, and that Raman spectroscopy of serum has potential for the diagnosis of colorectal cancer.
Del Brutto, Oscar H; Mera, Robertino M; Zambrano, Mauricio
2016-04-01
Studies investigating a possible correlation between metabolic syndrome and cognitive decline have been inconsistent. To determine whether metabolic syndrome or each of its components correlate with cognitive performance in community-dwelling older adults in rural Ecuador. Stroke-free Atahualpa residents aged ≥60 years were identified during a door-to-door survey. Metabolic syndrome was defined according to the International Diabetes Federation criteria. Cognition was evaluated by the use of the Montreal Cognitive Assessment (MoCA). Multivariate logistic regression models estimated the association between metabolic syndrome and each of its components with cognitive performance. A total of 212 persons (mean age: 69.2 ± 7.2 years, 64 % women) were enrolled. Of these, 120 (57 %) had metabolic syndrome. Mean scores in the MoCA were 18.2 ± 4.6 for persons with and 19 ± 4.7 for those without metabolic syndrome. In fully adjusted logistic models, MoCA scores were not associated with metabolic syndrome (p = 0.101). After testing individual components of metabolic syndrome with the MoCA score, we found that only hypertriglyceridemia was independently associated with the MoCA score (p = 0.009). This population-based study showed a poor correlation of metabolic syndrome with cognitive performance after adjusting for relevant confounders. Of the individual components of metabolic syndrome, only hypertriglyceridemia correlated with worse cognitive performance.
Climatic change projections for winter streamflow in Guadalquivir river
NASA Astrophysics Data System (ADS)
Jesús Esteban Parra, María; Hidalgo Muñoz, José Manuel; García-Valdecasas-Ojeda, Matilde; Raquel Gámiz Fortis, Sonia; Castro Díez, Yolanda
2015-04-01
In this work we obtained climate change projections for winter streamflow of the Guadalquivir River in the period 2071-2100 using the Principal Component Regression (PCR) method. The streamflow database used was provided by the Center for Studies and Experimentation of Public Works (CEDEX). Series from gauging stations and reservoirs with less than 10% missing data (filled by regression with well-correlated neighboring stations) were considered. The homogeneity of these series was evaluated through the Pettitt test, and the degree of human alteration by the Common Area Index. The application of these criteria led to the selection of 13 streamflow time series homogeneously distributed over the basin, covering the period 1952-2011. For these streamflow data, winter seasonal values were obtained by averaging the monthly values from January to March. The PCR method was applied using the Principal Components of the mean winter (December to February) sea level pressure (SLP) anomalies as predictors of streamflow, to develop a statistically downscaled model. The SLP database is the NCEP reanalysis covering the North Atlantic region, and the calibration and validation periods used for fitting and evaluating the ability of the model are 1952-1992 and 1993-2011, respectively. In general, using four Principal Components, the regression models are able to explain up to 70% of the variance of the streamflow data. Finally, the statistical model obtained for the observational data was applied to SLP data for the period 2071-2100, using the outputs of different GCMs of the CMIP5 under the RCP8.5 scenario. The results for the end of the century show no significant change or a moderate decrease in the streamflow of this river for most GCMs in winter, but for some of them the decrease is very strong. Keywords: Statistical downscaling, streamflow, Guadalquivir River, climate change.
ACKNOWLEDGEMENTS This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).
Carabaño, M J; Díaz, C; Ugarte, C; Serrano, M
2007-02-01
Artificial insemination centers routinely collect records of the quantity and quality of semen of bulls throughout the animals' productive period. The goal of this paper was to explore the use of random regression models with orthogonal polynomials to analyze repeated measures of semen production of Spanish Holstein bulls. A total of 8,773 records of volume of first ejaculate (VFE) collected between 12 and 30 mo of age from 213 Spanish Holstein bulls was analyzed under alternative random regression models. Legendre polynomial functions of increasing order (0 to 6) were fitted to the average trajectory, additive genetic, and permanent environmental effects. Age at collection and days in production were used as time variables. Heterogeneous and homogeneous residual variances were alternatively assumed. Analyses were carried out within a Bayesian framework. The logarithm of the marginal density and the cross-validation predictive ability of the data were used as model comparison criteria. Based on both criteria, models with age at collection as the time variable and heterogeneous residuals are recommended to analyze changes of VFE over time. Both criteria indicated that fitting random curves for genetic and permanent environmental components as well as for the average trajectory improved the quality of models. Furthermore, models with a higher-order polynomial for the permanent environmental (5 to 6) than for the genetic components (4 to 5) and the average trajectory (2 to 3) tended to perform best. High-order polynomials were needed to accommodate the highly oscillating nature of the phenotypic values. Heritability and repeatability estimates, disregarding the extremes of the studied period, ranged from 0.15 to 0.35 and from 0.20 to 0.50, respectively, indicating that selection for VFE may be effective at any stage. Small differences among models were observed.
Apart from the extremes, estimated correlations between ages decreased steadily from 0.9 and 0.4 for measures 1 mo apart to 0.4 and 0.2 for most distant measures for additive genetic and phenotypic components, respectively. Further investigation to account for environmental factors that may be responsible for the oscillating observations of VFE is needed.
Nakatochi, Masahiro; Ushida, Yasunori; Yasuda, Yoshinari; Yoshida, Yasuko; Kawai, Shun; Kato, Ryuji; Nakashima, Toru; Iwata, Masamitsu; Kuwatsuka, Yachiyo; Ando, Masahiko; Hamajima, Nobuyuki; Kondo, Takaaki; Oda, Hiroaki; Hayashi, Mutsuharu; Kato, Sawako; Yamaguchi, Makoto; Maruyama, Shoichi; Matsuo, Seiichi; Honda, Hiroyuki
2015-01-01
Although many single nucleotide polymorphisms (SNPs) have been identified to be associated with metabolic syndrome (MetS), there was only a slight improvement in the ability to predict future MetS by the simply addition of SNPs to clinical risk markers. To improve the ability to predict future MetS, combinational effects, such as SNP—SNP interaction, SNP—environment interaction, and SNP—clinical parameter (SNP × CP) interaction should be also considered. We performed a case-control study to explore novel SNP × CP interactions as risk markers for MetS based on health check-up data of Japanese male employees. We selected 99 SNPs that were previously reported to be associated with MetS and components of MetS; subsequently, we genotyped these SNPs from 360 cases and 1983 control subjects. First, we performed logistic regression analyses to assess the association of each SNP with MetS. Of these SNPs, five SNPs were significantly associated with MetS (P < 0.05): LRP2 rs2544390, rs1800592 between UCP1 and TBC1D9, APOA5 rs662799, VWF rs7965413, and rs1411766 between MYO16 and IRS2. Furthermore, we performed multiple logistic regression analyses, including an SNP term, a CP term, and an SNP × CP interaction term for each CP and SNP that was significantly associated with MetS. We identified a novel SNP × CP interaction between rs7965413 and platelet count that was significantly associated with MetS [SNP term: odds ratio (OR) = 0.78, P = 0.004; SNP × CP interaction term: OR = 1.33, P = 0.001]. This association of the SNP × CP interaction with MetS remained nominally significant in multiple logistic regression analysis after adjustment for either the number of MetS components or MetS components excluding obesity. Our results reveal new insight into platelet count as a risk marker for MetS. PMID:25646961
Mathews, Catherine; Eggers, Sander M; Townsend, Loraine; Aarø, Leif E; de Vries, Petrus J; Mason-Jones, Amanda J; De Koker, Petra; McClinton Appollis, Tracy; Mtshizana, Yolisa; Koech, Joy; Wubs, Annegreet; De Vries, Hein
2016-09-01
Young South Africans, especially women, are at high risk of HIV. We evaluated the effects of PREPARE, a multi-component, school-based HIV prevention intervention to delay sexual debut, increase condom use, and decrease intimate partner violence (IPV) among young adolescents. We conducted a cluster RCT among Grade 8 students in 42 high schools. The intervention comprised education sessions, a school health service, and a school sexual violence prevention programme. Participants completed questionnaires at baseline, 6 and 12 months. Regression was undertaken to provide ORs or coefficients adjusted for clustering. Of 6244 sampled adolescents, 55.3% participated. At 12 months there were no differences between intervention and control arms in sexual risk behaviours. Participants in the intervention arm were less likely to report IPV victimisation (35.1 vs. 40.9%; OR 0.77, 95% CI 0.61-0.99; t(40) = 2.14), suggesting the intervention shaped intimate partnerships into safer ones, potentially lowering the risk of HIV.
Ribonucleoprotein components in liver cell nuclei as visualized by cryoultramicrotomy
1975-01-01
The interphase nucleus of the normal rat hepatocyte has been studied in ultrathin frozen sections after glutaraldehyde fixation and the modification of various staining procedures known to be specific for DNA structures (Moyne's thallium stain, Gautier's osmium-ammine) or preferential for RNP carriers and basic proteins (regressive stains based on the use of EDTA or citrate, negatively charged colloidal iron). The results are comparable to those obtained after classical dehydration and embedding. Particular attention has been paid to the nucleolus and extranucleolar RNP components, such as perichromatin fibrils and granules, as well as interchromatin granules. A striking observation was the uneven size and the strongly increased number of perichromatin granules, and the appearance of a contiguous interchromatin net, containing nucleoproteins. Cryoultramicrotomy without embedding appears to be very useful for the exploration of the nucleus in thick sections which remain sufficiently transparent even with the usual accelerating voltages. PMID:51852
A study of fuzzy logic ensemble system performance on face recognition problem
NASA Astrophysics Data System (ADS)
Polyakova, A.; Lipinskiy, L.
2017-02-01
Some problems are difficult to solve using a single intelligent information technology (IIT). An ensemble of data mining (DM) techniques is a set of models, each able to solve the problem on its own, whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since the approach draws on the diversity of its components. A new method for designing IIT ensembles is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: an artificial neural network, a support vector machine, and decision trees. These algorithms and their ensemble were tested on face recognition problems. Principal component analysis (PCA) is used for feature selection.
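The base learners named in the entry (neural network, SVM, decision trees) can be combined in a few lines with scikit-learn; hard majority voting stands in here for the paper's fuzzy-logic combination rule, which is not reproduced, and the data are a synthetic classification task rather than face images.

```python
# Majority-vote ensemble of a neural network, an SVM, and a decision tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier([
    ("ann", MLPClassifier(max_iter=2000, random_state=0)),
    ("svm", SVC(random_state=0)),
    ("tree", DecisionTreeClassifier(random_state=0)),
])                                   # default voting="hard": majority vote
ensemble.fit(Xtr, ytr)
print(round(ensemble.score(Xte, yte), 2))  # held-out accuracy
```

Replacing the hard vote with a trainable (e.g. fuzzy) combiner is exactly where methods like the one in this paper differ from plain voting.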
Social networks and health-related quality of life: a population based study among older adults.
Gallegos-Carrillo, Katia; Mudgal, Jyoti; Sánchez-García, Sergio; Wagner, Fernando A; Gallo, Joseph J; Salmerón, Jorge; García-Peña, Carmen
2009-01-01
To examine the relationship between components of social networks and health-related quality of life (HRQL) in older adults with and without depressive symptoms. Comparative cross-sectional study with data from the cohort study 'Integral Study of Depression', carried out in Mexico City during 2004. The sample was selected through a multi-stage probability design. HRQL was measured with the SF-36. Geriatric Depression Scale (GDS) and the Short Anxiety Screening Test (SAST) determined depressive symptoms and anxiety. T-test and multiple linear regressions were conducted. Older adults with depressive symptoms had the lowest scores in all HRQL scales. A larger network of close relatives and friends was associated with better HRQL on several scales. Living alone did not significantly affect HRQL level, in either the study or comparison group. A positive association between some components of social networks and good HRQL exists even in older adults with depressive symptoms.
[Theory, method and application of method R on estimation of (co)variance components].
Liu, Wen-Zhong
2004-07-01
The theory, method, and application of Method R for the estimation of (co)variance components are reviewed to support its appropriate use. Estimation requires R values, which are regressions of random effects predicted from the complete dataset on random effects predicted from random subsets of the same data. By using a multivariate iteration algorithm based on a transformation matrix, combined with the preconditioned conjugate gradient method to solve the mixed model equations, the computational efficiency of Method R is much improved. Method R is computationally inexpensive, and the sampling errors and approximate credible intervals of estimates can be obtained. Disadvantages of Method R include a larger sampling variance than other methods for the same data, and biased estimates in small datasets. As an alternative method, Method R can be used on larger datasets. It is necessary to study its theoretical properties and broaden its application range further.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-01-05
SandiaMCR was developed to identify pure components and their concentrations from spectral data. This software efficiently implements multivariate curve resolution alternating least squares (MCR-ALS), principal component analysis (PCA), and singular value decomposition (SVD). Version 3.37 also includes the PARAFAC-ALS and Tucker-1 (for trilinear analysis) algorithms. The alternating least squares methods can be used to determine the composition without, or with incomplete, prior information on the constituents and their concentrations. The software allows the specification of numerous preprocessing, initialization, data selection, and compression options for the efficient processing of large data sets. These options include the definition of equality and non-negativity constraints to realistically restrict the solution set, various normalization or weighting options based on the statistics of the data, several initialization choices, and data compression. The software has been designed to provide a practicing spectroscopist the tools required to routinely analyze data in a reasonable time and without requiring expert intervention.
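A toy from-scratch sketch of the MCR-ALS idea this entry describes: alternating least squares with non-negativity enforced by clipping, resolving a simulated two-component mixture into spectra and concentrations. This is illustrative only, not the SandiaMCR implementation, and the band shapes and noise level are assumptions.

```python
# Alternating least squares with non-negativity for D ≈ C @ S.
import numpy as np

rng = np.random.default_rng(3)
wav = np.linspace(0, 1, 80)
S_true = np.vstack([np.exp(-((wav - 0.3) / 0.05) ** 2),   # pure spectrum 1
                    np.exp(-((wav - 0.7) / 0.08) ** 2)])  # pure spectrum 2
C_true = rng.uniform(size=(40, 2))                        # concentrations
D = C_true @ S_true + 0.01 * rng.normal(size=(40, 80))    # measured mixtures

C = rng.uniform(size=(40, 2))                             # random initialization
for _ in range(200):
    # solve for spectra given concentrations, then clip to enforce S >= 0
    S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0, None)
    # solve for concentrations given spectra, then clip to enforce C >= 0
    C = np.clip(np.linalg.lstsq(S.T, D.T, rcond=None)[0].T, 0, None)

resid = float(np.linalg.norm(D - C @ S) / np.linalg.norm(D))
print(round(resid, 3))  # relative reconstruction error
```

Production codes replace the clipping with proper constrained least squares and add the normalization and equality constraints the entry mentions.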
NASA Astrophysics Data System (ADS)
Vokhmyanin, M. V.; Ponyavin, D. I.
2016-12-01
The interplanetary magnetic field (IMF) By component affects the configuration of field-aligned currents (FACs) whose geomagnetic response is observed from high to low latitudes. The ground magnetic perturbations induced by FACs are opposite on the dawnside and duskside and depend upon the IMF By polarity. Based on multilinear regression analysis, we show that this effect is present at the midlatitude observatories Niemegk and Arti in the X and Y components of the geomagnetic field. This allows us to infer the IMF sector structure from the old geomagnetic records made at Ekaterinburg and Potsdam since 1850 and 1890, respectively. Geomagnetic data from various stations provide proxies of the IMF polarity which coincide for most of the nineteenth and twentieth centuries. This supports their reliability and makes them suitable for studying the large-scale IMF sector structure in the past.
Fluoroscopic tumor tracking for image-guided lung cancer radiotherapy
NASA Astrophysics Data System (ADS)
Lin, Tong; Cerviño, Laura I.; Tang, Xiaoli; Vasconcelos, Nuno; Jiang, Steve B.
2009-02-01
Accurate lung tumor tracking in real time is a keystone to image-guided radiotherapy of lung cancers. Existing lung tumor tracking approaches can be roughly grouped into three categories: (1) deriving tumor position from external surrogates; (2) tracking implanted fiducial markers fluoroscopically or electromagnetically; (3) fluoroscopically tracking lung tumor without implanted fiducial markers. The first approach suffers from insufficient accuracy, while the second may not be widely accepted due to the risk of pneumothorax. Previous studies in fluoroscopic markerless tracking are mainly based on template matching methods, which may fail when the tumor boundary is unclear in fluoroscopic images. In this paper we propose a novel markerless tumor tracking algorithm, which employs the correlation between the tumor position and surrogate anatomic features in the image. The positions of the surrogate features are not directly tracked; instead, we use principal component analysis of regions of interest containing them to obtain parametric representations of their motion patterns. Then, the tumor position can be predicted from the parametric representations of surrogates through regression. Four regression methods were tested in this study: linear and two-degree polynomial regression, artificial neural network (ANN) and support vector machine (SVM). The experimental results based on fluoroscopic sequences of ten lung cancer patients demonstrate a mean tracking error of 2.1 pixels and a maximum error at a 95% confidence level of 4.6 pixels (pixel size is about 0.5 mm) for the proposed tracking algorithm.
Online measurement of urea concentration in spent dialysate during hemodialysis.
Olesberg, Jonathon T; Arnold, Mark A; Flanigan, Michael J
2004-01-01
We describe online optical measurements of urea in the effluent dialysate line during regular hemodialysis treatment of several patients. Monitoring urea removal can provide valuable information about dialysis efficiency. Spectral measurements were performed with a Fourier-transform infrared spectrometer equipped with a flow-through cell. Spectra were recorded across the 5000-4000 cm(-1) (2.0-2.5 microm) wavelength range at 1-min intervals. Savitzky-Golay filtering was used to remove baseline variations attributable to the temperature dependence of the water absorption spectrum. Urea concentrations were extracted from the filtered spectra by use of partial least-squares regression and the net analyte signal of urea. Urea concentrations predicted by partial least-squares regression matched concentrations obtained from standard chemical assays with a root mean square error of 0.30 mmol/L (0.84 mg/dL urea nitrogen) over an observed concentration range of 0-11 mmol/L. The root mean square error obtained with the net analyte signal of urea was 0.43 mmol/L with a calibration based only on a set of pure-component spectra. The error decreased to 0.23 mmol/L when a slope and offset correction were used. Urea concentrations can be continuously monitored during hemodialysis by near-infrared spectroscopy. Calibrations based on the net analyte signal of urea are particularly appealing because they do not require a training step, as do statistical multivariate calibration procedures such as partial least-squares regression.
NASA Astrophysics Data System (ADS)
Li, D.; Nanseki, T.; Chomei, Y.; Yokota, S.
2017-07-01
Rice, a staple crop in Japan, is at risk of decreasing production, and its yield depends highly on soil fertility. This study aimed to investigate determinants of rice yield from the perspectives of fertilizer nitrogen and soil chemical properties. The data were sampled in 2014 and 2015 from 92 peat soil paddy fields on a large-scale farm located in the Kanto Region of Japan. The rice variety used was Koshihikari, the most widely planted in Japan. Regression analysis indicated that fertilizer nitrogen significantly affected the yield, with a significant sustained effect into the subsequent year. Twelve soil chemical properties, including pH, cation exchange capacity, content of pyridine base elements, phosphoric acid, and silicic acid, were estimated. In addition to silicic acid, magnesia, in the forms of its exchangeable content, saturation, and ratios to potassium and lime, positively affected the yield, while phosphoric acid negatively affected the yield. We assessed the soil chemical properties using a soil quality index and principal component analysis. Positive effects were identified for both approaches, with the former performing better in explaining the rice yield. For the soil quality index, the individual standardized soil properties and margins for improvement were indicated for each paddy field. Finally, multivariate regression on the principal components identified the most significant properties.
Risk prediction for myocardial infarction via generalized functional regression models.
Ieva, Francesca; Paganoni, Anna M
2016-08-01
In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to the 118 Dispatch Center of Milan (118 is the Italian toll-free emergency number) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Thus, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the principal components decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Branch Block). The performance of this classification method is evaluated and compared with other methods proposed in the literature. Finally, the robustness of the procedure is checked via leave-j-out techniques. © The Author(s) 2013.
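The dimension-reduction-plus-GLM step can be sketched as follows; ordinary PCA on sampled curves stands in for the multivariate functional PCA, and the "ECG-like" data are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Synthetic curves: 200 patients x 150 time points, with one shape component
# whose amplitude differs between diseased (1) and healthy (0) groups.
t = np.linspace(0, 1, 150)
y = (rng.random(200) < 0.5).astype(int)
amp = np.where(y == 1, 2.0, 0.5) + 0.3 * rng.normal(size=200)
curves = amp[:, None] * np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(200, 150))

# Dimensional reduction: PC scores of the curves stand in for the paper's
# multivariate functional principal component analysis.
scores = PCA(n_components=5).fit_transform(curves)

# Generalized linear model: logistic regression of disease status on the scores.
clf = LogisticRegression(max_iter=1000).fit(scores[:150], y[:150])
acc = clf.score(scores[150:], y[150:])
print(f"held-out accuracy: {acc:.2f}")
```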
NASA Astrophysics Data System (ADS)
Chen, Yi-Ying; Chu, Chia-Ren; Li, Ming-Hsu
2012-10-01
In this paper we present a semi-parametric multivariate gap-filling model for tower-based measurements of latent heat flux (LE). Two statistical techniques, principal component analysis (PCA) and a nonlinear interpolation approach, were integrated into this LE gap-filling model. The PCA was first used to resolve the multicollinearity among various environmental variables, including radiation, soil moisture deficit, leaf area index, wind speed, etc. Two nonlinear interpolation methods, multiple regression (MRS) and K-nearest neighbors (KNN), were examined with randomly selected flux gaps in both clear-sky and nighttime/cloudy data. Experimental results indicated that the KNN interpolation approach provides consistent LE estimates, while MRS overestimates LE during nighttime/cloudy periods. Rather than using empirical regression parameters, the KNN approach resolves the nonlinear relationship between the gap-filled LE flux and the principal components with adaptive K values under different atmospheric states. The developed LE gap-filling model (PCA with KNN) achieves an RMSE of 2.4 W m⁻² (~0.09 mm day⁻¹) at a weekly time scale when 40% artificial flux gaps are added to the original dataset. Annual evapotranspiration at this study site was estimated at 736 mm (1803 MJ) and 728 mm (1785 MJ) for 2008 and 2009, respectively.
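A minimal sketch of the PCA-plus-KNN gap-filling idea with synthetic, collinear drivers (the variable names, scales, and nonlinear response below are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)

# Two latent weather states drive six collinear observed variables
# (radiation, soil moisture deficit, wind speed, ...).
latent = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 6))
X = latent @ A + 0.1 * rng.normal(size=(500, 6))

# LE responds nonlinearly to the latent drivers; 40% of values become gaps.
le = 100 * np.tanh(latent[:, 0]) + 20 * latent[:, 1] ** 2 + 5 * rng.normal(size=500)
gaps = rng.random(500) < 0.4

# PCA resolves the multicollinearity among the drivers before interpolation.
pcs = PCA(n_components=2).fit_transform(X)

# KNN interpolation: fill each gap from its nearest neighbours in PC space.
knn = KNeighborsRegressor(n_neighbors=5).fit(pcs[~gaps], le[~gaps])
filled = knn.predict(pcs[gaps])
rmse = np.sqrt(np.mean((filled - le[gaps]) ** 2))
print(f"gap-filling RMSE: {rmse:.1f} W m^-2")
```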
Safety climate and mindful safety practices in the oil and gas industry.
Dahl, Øyvind; Kongsvik, Trond
2018-02-01
The existence of a positive association between safety climate and the safety behavior of sharp-end workers in high-risk organizations is supported by a considerable body of research. Previous research has primarily analyzed two components of safety behavior, namely safety compliance and safety participation. The present study extends previous research by looking into the relationship between safety climate and another component of safety behavior, namely mindful safety practices. Mindful safety practices are defined as the ability to be aware of critical factors in the environment and to act appropriately when dangers arise. Regression analysis was used to examine whether mindful safety practices are, like compliance and participation, promoted by a positive safety climate, in a questionnaire-based study of 5712 sharp-end workers in the oil and gas industry. The analysis revealed that a positive safety climate promotes mindful safety practices. The regression model accounted for roughly 31% of the variance in mindful safety practices. The most important safety climate factor was safety leadership. The findings clearly demonstrate that mindful safety practices are highly context-dependent, hence, manageable and susceptible to change. In order to improve safety climate in a direction which is favorable for mindful safety practices, the results demonstrate that it is important to give the fundamental features of safety climate high priority and in particular that of safety leadership. Copyright © 2017 National Safety Council and Elsevier Ltd. All rights reserved.
Park, Gwansik; Forman, Jason; Kim, Taewung; Panzer, Matthew B; Crandall, Jeff R
2018-02-28
The goal of this study was to explore a framework for developing injury risk functions (IRFs) in a bottom-up approach based on responses of parametrically variable finite element (FE) models representing exemplar populations. First, a parametric femur modeling tool was developed and validated using a subject-specific (SS)-FE modeling approach. Second, principal component analysis and regression were used to identify parametric geometric descriptors of the human femur and the distribution of those factors for 3 target occupant sizes (5th, 50th, and 95th percentile males). Third, distributions of material parameters of cortical bone were obtained from the literature for 3 target occupant ages (25, 50, and 75 years) using regression analysis. A Monte Carlo method was then implemented to generate populations of FE models of the femur for target occupants, using the parametric femur modeling tool. Simulations were conducted with each of these models under 3-point dynamic bending. Finally, model-based IRFs were developed using logistic regression analysis, based on the moment at fracture observed in the FE simulation. In total, 100 femur FE models incorporating the variation in the population of interest were generated, and 500,000 moments at fracture were observed (applying 5,000 ultimate strains to each of the 100 synthesized femur FE models) for each set of target occupant characteristics. Using the proposed framework, the model-based IRFs for 3 target male occupant sizes (5th, 50th, and 95th percentiles) and ages (25, 50, and 75 years) were developed. The model-based IRF was located in the 95% confidence interval of the test-based IRF for the range of 15 to 70% injury risks. The 95% confidence interval of the developed IRF was almost in line with the mean curve due to a large number of data points.
The framework proposed in this study would be beneficial for developing the IRFs in a bottom-up manner, whose range of variabilities is informed by the population-based FE model responses. Specifically, this method mitigates the uncertainties in applying empirical scaling and may improve IRF fidelity when a limited number of experimental specimens are available.
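The final IRF-fitting step can be sketched as a logistic regression of binary fracture outcomes on applied moment; the moment distributions and units below are invented placeholders, not the study's FE outputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Synthetic stand-in for the Monte Carlo FE output: each simulated femur has a
# fracture capacity; fracture occurs when the applied moment exceeds it.
applied = rng.uniform(100, 600, size=5000)       # applied moment (placeholder Nm)
capacity = rng.normal(350, 60, size=5000)        # fracture moment per model
fractured = (applied > capacity).astype(int)

# Injury risk function: logistic regression of fracture on applied moment.
irf = LogisticRegression(max_iter=1000).fit(applied[:, None], fractured)
moment_50 = -irf.intercept_[0] / irf.coef_[0, 0]  # moment at 50% injury risk
risk = irf.predict_proba(np.array([[300.0], [moment_50], [450.0]]))[:, 1]
print(f"50% risk moment ~ {moment_50:.0f}")
```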
Misspecification of Cox regression models with composite endpoints
Wu, Longyang; Cook, Richard J
2012-01-01
Researchers routinely adopt composite endpoints in multicenter randomized trials designed to evaluate the effect of experimental interventions in cardiovascular disease, diabetes, and cancer. Despite their widespread use, relatively little attention has been paid to the statistical properties of estimators of treatment effect based on composite endpoints. We consider this here in the context of multivariate models for time to event data in which copula functions link marginal distributions with a proportional hazards structure. We then examine the asymptotic and empirical properties of the estimator of treatment effect arising from a Cox regression model for the time to the first event. We point out that even when the treatment effect is the same for the component events, the limiting value of the estimator based on the composite endpoint is usually inconsistent for this common value. We find that in this context the limiting value is determined by the degree of association between the events, the stochastic ordering of events, and the censoring distribution. Within the framework adopted, marginal methods for the analysis of multivariate failure time data yield consistent estimators of treatment effect and are therefore preferred. We illustrate the methods by application to a recent asthma study. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22736519
Web-based tools for modelling and analysis of multivariate data: California ozone pollution activity
Dinov, Ivo D.; Christou, Nicolas
2014-01-01
This article presents a hands-on web-based activity motivated by the relation between human health and ozone pollution in California. This case study is based on multivariate data collected monthly at 20 locations in California between 1980 and 2006. Several strategies and tools for data interrogation and exploratory data analysis, model fitting and statistical inference on these data are presented. All components of this case study (data, tools, activity) are freely available online at: http://wiki.stat.ucla.edu/socr/index.php/SOCR_MotionCharts_CAOzoneData. Several types of exploratory (motion charts, box-and-whisker plots, spider charts) and quantitative (inference, regression, analysis of variance (ANOVA)) data analyses tools are demonstrated. Two specific human health related questions (temporal and geographic effects of ozone pollution) are discussed as motivational challenges. PMID:24465054
Predictive validity of the tobacco marketing receptivity index among non-smoking youth.
Braun, Sandra; Abad-Vivero, Erika Nayeli; Mejía, Raúl; Barrientos, Inti; Sargent, James D; Thrasher, James F
2018-05-01
In a previous cross-sectional study of early adolescents, we developed a marketing receptivity index (MRI) that integrates point-of-sale (PoS) marketing exposures, brand recall, and ownership of branded merchandise. The MRI had independent, positive associations with smoking susceptibility among never smokers and with current smoking behavior. The current longitudinal study assessed the MRI's predictive validity among adolescents who have never smoked cigarettes. Methods: Data come from a longitudinal, school-based survey of 33 secondary schools in Argentina. Students who had never smoked at baseline were followed up approximately 17 months later (n=1700). Questions assessed: PoS marketing exposure by querying frequency of going to stores where tobacco is commonly sold; cued recall of brand names for 3 cigarette packages from dominant brands but with the brand name removed; and ownership of branded merchandise. A four-level MRI was derived: 1. low PoS marketing exposure only; 2. high PoS exposure or recall of 1 brand; 3. recall of 2 or more brands; and 4. ownership of branded merchandise. Logistic regression models regressed smoking initiation by the follow-up survey on the MRI, each of its components, and students' willingness to try a brand, adjusting for sociodemographics, social network smoking, and sensation seeking. The MRI had an independent positive association with smoking initiation. When analyzed separately, each MRI component was associated with outcomes except branded merchandise ownership. The MRI and its components were associated with smoking initiation, except for branded merchandise ownership, which may better predict smoking progression than initiation. The MRI appears valid and useful for future studies. Copyright © 2018 Elsevier Ltd. All rights reserved.
Marzi, Ilaria; D'Amico, Massimo; Biagiotti, Tiziana; Giunti, Serena; Carbone, Maria Vittoria; Fredducci, David; Wanke, Enzo; Olivotto, Massimo
2007-03-15
We worked out an experimental protocol able to purge the stem cell compartment of the SH-SY5Y neuroblastoma clone. This protocol was based on the prolonged treatment of the wild-type cell population with either hypoxia or the antiblastic etoposide. Cell fate was monitored by immunocytochemical and electrophysiologic (patch-clamp) techniques. Both treatments produced the progressive disappearance of neuronal type (N) cells (which constitute the bulk of the tumor), leaving space for a special category of epithelial-like substrate-adherent cells (S(0)). The latter represent a minimal cell component of the untreated population and are endowed with immunocytochemical markers (p75, c-kit, and CD133) and the electrophysiologic "nude" profile, typical of the neural crest stem cells. S(0) cells displayed a highly clonogenic potency and a substantial plasticity, generating both the N component and an alternative subpopulation terminally committed to the fibromuscular lineage. Unlike the N component, this lineage was highly insensitive to the apoptotic activity of hypoxia and etoposide and developed only when the neuronal option was abolished. Under these conditions, the fibromuscular progeny of S(0) expanded and progressed up to the exhaustion of the staminal compartment and to the extinction of the tumor. When combined, hypoxia and etoposide cooperated in abolishing the N cell generation and promoting the conversion of the tumor described. This synergy might mirror a natural condition in the ischemic areas occurring in cancer. These results have relevant implications for the understanding of the documented tendency of neuroblastomas to regress from a malignant to a benign phenotype, either spontaneously or on antiblastic treatment.
Patterson, Debra; Resko, Stella
2015-01-01
Participant attrition is a major concern for online continuing education health care courses. The current study sought to understand what factors predicted health care professionals completing the online component of a sexual assault forensic examiner (SAFE) blended learning training program (12-week online course and 2-day in-person clinical skills workshop). The study used a Web-based survey to examine participant characteristics, motivation, and external barriers that may influence training completion. Hierarchical logistic regression was utilized to examine the predictors of training completion, while the Cox proportional hazards (Cox PH) regression model helped determine the factors associated with the timing of participant attrition. Results show that 79.3% of the enrolled professionals completed the online component. The study also found that clinicians who work in rural communities and those who were interested in a 2-day clinical skills workshop were more likely to complete the online course. In terms of when attrition occurred, we found that participants who were motivated by the 2-day clinical workshop, those who worked in a rural community, and participants interested in the training program because of its online nature were more likely to complete more of the online course. Blending an online course with a brief in-person clinical component may serve as a motivator for completing an online course because it provides the opportunity to develop clinical skills while receiving immediate feedback. Participant attrition appears to be less of a concern for rural clinicians because this modality can reduce their barriers to accessing continuing education. © 2015 The Alliance for Continuing Education in the Health Professions, the Society for Academic Continuing Medical Education, and the Council on Continuing Medical Education, Association for Hospital Medical Education.
A data fusion-based drought index
NASA Astrophysics Data System (ADS)
Azmi, Mohammad; Rüdiger, Christoph; Walker, Jeffrey P.
2016-03-01
Drought and water stress monitoring plays an important role in the management of water resources, especially during periods of extreme climate conditions. Here, a data fusion-based drought index (DFDI) has been developed and analyzed for three different locations of varying land use and climate regimes in Australia. The proposed index comprehensively considers all types of drought through a selection of indices and proxies associated with each drought type. In deriving the proposed index, weekly data from three different data sources (OzFlux Network, Asia-Pacific Water Monitor, and MODIS-Terra satellite) were employed to first derive commonly used individual standardized drought indices (SDIs), which were then grouped using an advanced clustering method. Next, three different multivariate methods (principal component analysis, factor analysis, and independent component analysis) were utilized to aggregate the SDIs located within each group. For the two clusters in which the grouped SDIs best reflected the water availability and vegetation conditions, the variables were aggregated by averaging the standardized first principal components of the different multivariate methods. Then, considering those two aggregated indices as well as the classifications of months (dry/wet months and active/non-active months), the proposed DFDI was developed. Finally, the symbolic regression method was used to derive mathematical equations for the proposed DFDI. The results presented here show that the proposed index reveals aspects of water stress monitoring that previous indices could not, by simultaneously considering both hydrometeorological and ecological concepts to define the real water stress of the study areas.
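The aggregation step, standardizing grouped SDIs and taking the standardized first principal component, can be sketched in plain NumPy (the weekly indices below are synthetic, and only the PCA variant of the three multivariate methods is shown):

```python
import numpy as np

rng = np.random.default_rng(5)

# Weekly standardized drought indices (SDIs) within one cluster: four indices
# tracking the same underlying water-availability signal, plus noise.
signal = np.cumsum(rng.normal(size=260))             # ~5 years of weekly data
sdis = signal[:, None] * rng.uniform(0.5, 1.5, size=4) + rng.normal(size=(260, 4))

# Standardize, then aggregate the cluster with its first principal component.
z = (sdis - sdis.mean(axis=0)) / sdis.std(axis=0)
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = z @ eigvecs[:, -1]                             # largest eigenvalue is last
pc1 = (pc1 - pc1.mean()) / pc1.std()                 # standardized aggregate index

# The aggregate should track the shared signal closely (up to sign).
corr = abs(np.corrcoef(pc1, signal)[0, 1])
print(f"|corr(aggregate, signal)| = {corr:.2f}")
```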
2011-01-01
Background We investigate whether the changing environment caused by rapid economic growth yielded differential effects for successive Taiwanese generations on 8 components of metabolic syndrome (MetS): body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting plasma glucose (FPG), triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL) and uric acid (UA). Methods To assess the impact of age, birth year and year of examination on MetS components, we used partial least squares regression to analyze data collected by Mei-Jaw clinics in Taiwan in 1996 and 2006. Confounders, such as the number of years in formal education, alcohol intake, smoking history, and betel-nut chewing, were adjusted for. Results As the age of individuals increased, the values of components generally increased except for UA. Men born after 1970 had lower FPG, lower BMI, lower DBP, lower TG, lower LDL and greater HDL; women born after 1970 had lower BMI, lower DBP, lower TG, lower LDL and greater HDL and UA. The trend in MetS component levels against year of birth shows a pattern similar to that of economic growth in Taiwan. Conclusions We found cohort effects in some MetS components, suggesting associations between the changing environment and health outcomes in later life. This ecological association is worthy of further investigation. PMID:21619595
Baldi, F; Alencar, M M; Albuquerque, L G
2010-12-01
The objective of this work was to estimate covariance functions using random regression models on B-spline functions of animal age, for weights from birth to adult age in Canchim cattle. Data comprised 49,011 records on 2435 females. The model of analysis included fixed effects of contemporary groups, age of dam as a quadratic covariable and the population mean trend taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were modelled through a step function with four classes. The direct and maternal additive genetic effects, and animal and maternal permanent environmental effects, were included as random effects in the model. A total of seventeen analyses, considering linear, quadratic and cubic B-spline functions and up to seven knots, were carried out. B-spline functions of the same order were considered for all random effects. Random regression models on B-spline functions were compared with a random regression model on Legendre polynomials and with a multitrait model. Results from the different models of analysis were compared using the REML form of the Akaike information criterion and Schwarz's Bayesian information criterion. In addition, the variance components and genetic parameters estimated for each random regression model were also used as criteria to choose the most adequate model to describe the covariance structure of the data. A model fitting quadratic B-splines, with four knots or three segments for the direct additive genetic effect and animal permanent environmental effect and two knots for the maternal additive genetic effect and maternal permanent environmental effect, was the most adequate to describe the covariance structure of the data. Random regression models using B-spline functions as base functions fitted the data better than Legendre polynomials, especially at mature ages, but a higher number of parameters needs to be estimated with B-spline functions. © 2010 Blackwell Verlag GmbH.
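A fixed-effects-only sketch of regression on a quadratic B-spline basis of age can be written with SciPy; the knot placement and growth curve below are illustrative, and the real animal model's random genetic and permanent environmental regressions on the same basis are omitted:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(6)

# Quadratic B-spline design matrix on age; interior knots play the role of
# the paper's segment boundaries.
def bspline_design(age, interior_knots, degree=2):
    lo, hi = age.min(), age.max()
    t = np.r_[[lo] * (degree + 1), interior_knots, [hi] * (degree + 1)]
    n_basis = len(t) - degree - 1
    # Evaluating an identity coefficient matrix yields all basis functions.
    return BSpline(t, np.eye(n_basis), degree)(age)

age = np.linspace(0, 8, 400)                  # birth to adult age (years)
weight = 450 / (1 + np.exp(-(age - 2.0))) + 5 * rng.normal(size=400)

X = bspline_design(age, interior_knots=[2.0, 4.0])   # two interior knots
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)
fit = X @ coef
rmse = np.sqrt(np.mean((fit - weight) ** 2))
print(f"basis functions: {X.shape[1]}, fit RMSE: {rmse:.1f}")
```

The basis functions sum to one at every age (partition of unity), which is one reason B-spline bases behave better than high-order polynomials at the extremes of the age range.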
Multivariate Analysis of Seismic Field Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, M. Kathleen
1999-06-01
This report includes the details of the model building procedure and prediction of seismic field data. Principal Components Regression, a multivariate analysis technique, was used to model seismic data collected as two pieces of equipment were cycled on and off. Models built that included only the two pieces of equipment of interest had trouble predicting data containing signals not included in the model. Evidence for poor predictions came from the prediction curves as well as spectral F-ratio plots. Once the extraneous signals were included in the model, predictions improved dramatically. While Principal Components Regression performed well for the present data sets, the present data analysis suggests further work will be needed to develop more robust modeling methods as the data become more complex.
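Principal Components Regression itself is compactly expressed as a PCA-then-regression pipeline; the "equipment signature" data below are synthetic stand-ins for the seismic records, not the report's data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)

# Synthetic records: two equipment spectral signatures mixed into 50-channel
# records; the target is the on/off state of equipment A.
sig_a, sig_b = rng.normal(size=(2, 50))
state_a = rng.integers(0, 2, size=200).astype(float)
state_b = rng.integers(0, 2, size=200).astype(float)
records = (np.outer(state_a, sig_a) + np.outer(state_b, sig_b)
           + 0.05 * rng.normal(size=(200, 50)))

# Principal Components Regression: project onto leading PCs, then regress.
pcr = make_pipeline(PCA(n_components=2), LinearRegression())
pcr.fit(records[:150], state_a[:150])
pred = pcr.predict(records[150:])
rmse = np.sqrt(np.mean((pred - state_a[150:]) ** 2))
print(f"prediction RMSE: {rmse:.3f}")
```

As in the report, a signal source absent from the training records would fall outside the retained PC subspace and degrade prediction until it is included in the model.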
Local Prediction Models on Mid-Atlantic Ridge MORB by Principal Component Regression
NASA Astrophysics Data System (ADS)
Ling, X.; Snow, J. E.; Chin, W.
2017-12-01
The isotopic compositions of the daughter isotopes of long-lived radioactive systems (Sr, Nd, Hf and Pb) can be used to map the scale and history of mantle heterogeneities beneath mid-ocean ridges. Our goal is to relate the multidimensional structure in the existing isotopic dataset to an underlying physical reality of mantle sources. The numerical technique of Principal Component Analysis is useful for reducing the linear dependence of the data to a minimal set of orthogonal eigenvectors encapsulating the information contained (cf. Agranier et al. 2005). The dataset used for this study covers almost all the MORBs along the mid-Atlantic Ridge (MAR), from 54°S to 77°N and 8.8°W to -46.7°W, replicating the published dataset of Agranier et al. (2005) plus 53 basalt samples dredged and analyzed since then (data from PetDB). The principal components PC1 and PC2 account for 61.56% and 29.21%, respectively, of the total isotope-ratio variability. Samples with compositions similar to HIMU, EM and DM were identified to better understand the PCs. PC1 and PC2 account for HIMU and EM, whereas PC2 has limited control over the DM source. PC3 is more strongly controlled by the depleted mantle source than PC2. This means that all three principal components are significantly related to the established mantle sources. We also tested the relationship between mantle heterogeneity and sample locality. The K-means clustering algorithm is a type of unsupervised learning that finds groups in the data based on feature similarity. The PC factor scores of each sample were clustered into three groups. Clusters one and three alternate along the northern and southern MAR. Cluster two appears from 45.18°N to 0.79°N and -27.9°W to -30.40°W, alternating with cluster one. The ridge has been preliminarily divided into 16 sections considering both the clusters and the ridge segments. The principal component regression models each section based on 6 isotope ratios and the PCs.
The prediction residual is about 1-2 km. This means that the five combined isotopes are a strong predictor of geographic location along the ridge, a slightly surprising result. PCR is a robust and powerful method for both visualizing and manipulating the multidimensional representation of isotope data.
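The clustering step, k-means on PC factor scores of standardized isotope ratios, can be sketched with synthetic end-member mixtures (the end-member compositions and sample counts are invented, not the MAR data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)

# Synthetic MORB-like data: three mantle-source end-members scattered into
# six isotope-ratio columns, one row per dredge sample.
centers = rng.normal(size=(3, 6))
labels_true = rng.integers(0, 3, size=150)
ratios = centers[labels_true] + 0.1 * rng.normal(size=(150, 6))

# PC factor scores of the standardized ratios, then k-means with k=3,
# mirroring the paper's clustering of samples along the ridge.
scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(ratios))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
sizes = np.bincount(km.labels_)

# Agreement between recovered clusters and the true sources.
ari = adjusted_rand_score(labels_true, km.labels_)
print(f"cluster sizes: {sizes}, ARI: {ari:.2f}")
```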
Deng, Yang; Tu, Huakang; Pierzynski, Jeanne A; Miller, Ethan D; Gu, Xiangjun; Huang, Maosheng; Chang, David W; Ye, Yuanqing; Hildebrandt, Michelle A T; Klein, Alison P; Zhao, Ren; Lippman, Scott M; Wu, Xifeng
2018-03-01
Quality of life (QOL) is impaired in pancreatic cancer patients. Our aim was to investigate the determinants and prognostic value of QOL after diagnosis in a hospital-based cohort of racially/ethnically diverse patients with pancreatic ductal adenocarcinoma (PDAC). QOL was prospectively assessed using the Short Form-12 in 2478 PDAC patients. The Physical Component Summary (PCS) and Mental Component Summary (MCS) were categorised into tertiles based on their distribution. Ordered logistic regression was adopted to compare the risk of having lower PCS and MCS by patient sociodemographic and clinical characteristics. The association of PCS and MCS with mortality was assessed by Cox regression. Compared with non-Hispanic whites, Hispanics were at significantly higher risk of having lower PCS (odds ratio [95% CI], 1.69 [1.26-2.26]; P < 0.001) and lower MCS (1.66 [1.24-2.23]; P < 0.001). Patients diagnosed with stage III (1.80 [1.10-2.94]; P = 0.02) and stage IV (2.32 [1.50-3.59]; P < 0.001) PDAC were more likely to have lower PCS than stage I patients. Other determinants of QOL included sex, age, drinking, smoking, education level, comorbidities and time since diagnosis. The low tertile of PCS (hazard ratio [95% CI], 1.94 [1.72-2.18]; P < 0.001) and MCS (1.42 [1.26-1.59]; P < 0.001) were each related to poor prognosis. Similar results were found for non-Hispanic whites as compared with African-Americans/Hispanics/others. QOL after diagnosis is a significant prognostic indicator for patients with PDAC. Multiple factors determine QOL, suggesting possible means of intervention to improve QOL and outcomes of PDAC patients. Copyright © 2017. Published by Elsevier Ltd.
Balconi, Michela; Pagani, Silvia
2014-06-22
The perception and interpretation of social hierarchies are a key part of our social life. In the present research we considered the activation of cortical areas, mainly the prefrontal cortex, related to social ranking perception in conjunction with some personality components (BAS - Behavioral Activation System - and BIS - Behavioral Inhibition System). In two experiments we manipulated the perceived superior/inferior status during a competitive cognitive task. Indeed, we created an explicit and strongly reinforced social hierarchy based on incidental rating in an attentional task. Specifically, a peer group comparison was undertaken and improved (Experiment 1) or decreased (Experiment 2) performance was artificially manipulated by the experimenter. For each experiment two groups were compared, based on a BAS and BIS dichotomy. Alpha band modulation in prefrontal cortex, behavioral measures (performance: error rate, ER; response times, RTs), and self-perceived ranking were considered. Repeated measures ANOVAs and regression analyses showed in Experiment 1 a significant improved cognitive performance (decreased ER and RTs) and higher self-perceived ranking in high-BAS participants. Moreover, their prefrontal activity was increased within the left side (alpha band decreasing). Conversely, in Experiment 2 a significant decreased cognitive performance (increased ER and RTs) and lower self-perceived ranking was observed in higher-BIS participants. Their right prefrontal activity was increased in comparison with higher-BAS participants. The regression analyses confirmed the significant predictive role of alpha band modulation with respect to subjects' performance and self-perception of social ranking, differently for the BAS/BIS components. The present results suggest that social status perception is directly modulated by cortical activity and personality correlates. Copyright © 2014 Elsevier Inc. All rights reserved.
Rationale for hedging initiatives: Empirical evidence from the energy industry
NASA Astrophysics Data System (ADS)
Dhanarajata, Srirajata
Theory offers different rationales for hedging including (i) financial distress and bankruptcy cost, (ii) capacity to capture attractive investment opportunities, (iii) information asymmetry, (iv) economy of scale, (v) substitution for hedging, (vi) managerial risk aversion, and (vii) convexity of tax schedule. The purpose of this dissertation is to empirically test the explanatory power of the first five theoretical rationales on hedging done by oil and gas exploration and production (E&P) companies. The level of hedging is measured by the percentage of production effectively hedged, calculated based on the concept of delta and delta-gamma hedging. I employ Tobit regression, principal components, and panel data analysis on dependent and raw independent variables. Tobit regression is applied because the dependent variable used in the analysis is non-negative. Principal component analysis helps to reduce the dimension of the explanatory variables, while panel data analysis pools the combined time-series and cross-sectional data. Based on the empirical results, leverage level is consistently found to be a significant factor in hedging activities, either due to an attempt to avoid financial distress by the firm, or an attempt to control agency cost by debtholders, or both. The effects of capital expenditures and discretionary cash flows are both indeterminate, possibly due to a mismatch in the timing of realized cash flow items and hedging decisions. Firm size is found to be positively related to hedging, supporting the economy-of-scale hypothesis introduced in past literature, as well as the argument that large firms are usually more sophisticated and should be more willing and comfortable using hedging instruments than smaller firms.
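Since the hedged percentage is non-negative with a mass of firms at zero, the Tobit likelihood can be maximized directly; a minimal sketch with synthetic firm data (the covariates, coefficients, and scales are illustrative assumptions, not the dissertation's data):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(9)

# Synthetic firms: a latent desired hedge ratio rises with leverage and size;
# the observed ratio is left-censored at zero.
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0, 0.5])
latent = X @ beta_true + rng.normal(scale=0.8, size=n)
y = np.maximum(latent, 0.0)

def tobit_nll(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    mu = X @ beta
    censored = y <= 0
    ll = np.where(
        censored,
        norm.logcdf(-mu / sigma),                   # P(latent <= 0)
        norm.logpdf((y - mu) / sigma) - log_sigma,  # density of observed y
    )
    return -ll.sum()

res = minimize(tobit_nll, x0=np.zeros(4), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
print("beta:", np.round(beta_hat, 2), "sigma:", round(float(sigma_hat), 2))
```

Parameterizing the scale as `log_sigma` keeps the optimization unconstrained; with the data generated above, the estimates recover `beta_true` and the true scale of 0.8 to within sampling error.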
Wu, Xia; Zhu, Jian-Cheng; Zhang, Yu; Li, Wei-Min; Rong, Xiang-Lu; Feng, Yi-Fan
2016-08-25
The potential impact of lipid research has been increasingly realized in both disease treatment and prevention. An effective metabolomics approach based on ultra-performance liquid chromatography/quadrupole time-of-flight mass spectrometry (UPLC/Q-TOF-MS), along with multivariate statistical analysis, was applied to investigate the dynamic change of plasma phospholipid composition in early type 2 diabetic rats after treatment with Huang-Qi-San, an ancient prescription of Chinese medicine. The exported UPLC/Q-TOF-MS data of plasma samples were subjected to SIMCA-P and processed with the bioMark, mixOmics, and Rcmdr packages in R. Clear score plots of the plasma sample groups, including the normal control group (NC), model group (MC), positive medicine control group (Flu), and Huang-Qi-San group (HQS), were achieved by principal component analysis (PCA), partial least-squares discriminant analysis (PLS-DA), and orthogonal partial least-squares discriminant analysis (OPLS-DA). Biomarkers were screened out using Student's t-test, principal component regression (PCR), partial least-squares regression (PLS), and the variable importance method (variable influence on projection, VIP). Structures of metabolites were identified and metabolic pathways were deduced by correlation coefficient. The relationship between compounds was explained by the correlation coefficient diagram, and the metabolic differences between similar compounds were illustrated. Based on the KEGG database, the biological significance of the identified biomarkers was described. The correlation coefficient was first applied to identify the structures and deduce the metabolic pathways of phospholipid metabolites, and the study provides a new methodological cue for further understanding the molecular mechanisms by which Huang-Qi-San regulates metabolites in treating early type 2 diabetes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Kawaguchi, Hiroyuki; Hashimoto, Hideki; Matsuda, Shinya
2012-09-22
The casemix-based payment system has been adopted in many countries, although it often needs complementary adjustment to take account of each hospital's unique production structure, such as teaching and research duties and non-profit motives. It has been challenging to numerically evaluate the impact of such structural heterogeneity on production separately from production inefficiency. The current study adopted stochastic frontier analysis and proposed a method to assess unique components of hospital production structures using a fixed-effect variable. There were two stages of analysis in this study. In the first stage, we estimated the efficiency score from the hospital production function using a true fixed-effect model (TFEM) in stochastic frontier analysis. The use of a TFEM allowed us to differentiate the unobserved heterogeneity of individual hospitals as hospital-specific fixed effects. In the second stage, we regressed the obtained fixed-effect variable on structural components of the hospitals to test whether the variable was explicitly related to the characteristics and local disadvantages of the hospitals. In the first analysis, the estimated efficiency score was approximately 0.6. The mean value of the fixed-effect estimator was 0.784, the standard deviation was 0.137, and the range was 0.437 to 1.212. The second-stage regression confirmed that the value of the fixed effect was significantly correlated with advanced technology and local conditions of the sample hospitals. The obtained fixed-effect estimator may reflect hospitals' unique structures of production, after accounting for production inefficiency. The values of fixed-effect estimators can be used as evaluation tools to improve fairness in the reimbursement system for various functions of hospitals based on casemix classification.
Wavelet-Domain Regression and Predictive Inference in Psychiatric Neuroimaging
Reiss, Philip T.; Huo, Lan; Zhao, Yihong; Kelly, Clare; Ogden, R. Todd
2016-01-01
An increasingly important goal of psychiatry is the use of brain imaging data to develop predictive models. Here we present two contributions to statistical methodology for this purpose. First, we propose and compare a set of wavelet-domain procedures for fitting generalized linear models with scalar responses and image predictors: sparse variants of principal component regression and of partial least squares, and the elastic net. Second, we consider assessing the contribution of image predictors over and above available scalar predictors, in particular via permutation tests and an extension of the idea of confounding to the case of functional or image predictors. Using the proposed methods, we assess whether maps of a spontaneous brain activity measure, derived from functional magnetic resonance imaging, can meaningfully predict presence or absence of attention deficit/hyperactivity disorder (ADHD). Our results shed light on the role of confounding in the surprising outcome of the recent ADHD-200 Global Competition, which challenged researchers to develop algorithms for automated image-based diagnosis of the disorder. PMID:27330652
Ibrahim, George M; Morgan, Benjamin R; Macdonald, R Loch
2014-03-01
Predictors of outcome after aneurysmal subarachnoid hemorrhage have been determined previously through hypothesis-driven methods that often exclude putative covariates and require a priori knowledge of potential confounders. Here, we apply a data-driven approach, principal component analysis, to identify baseline patient phenotypes that may predict neurological outcomes. Principal component analysis was performed on 120 subjects enrolled in a prospective randomized trial of clazosentan for the prevention of angiographic vasospasm. Correlation matrices were created using a combination of Pearson, polyserial, and polychoric regressions among 46 variables. Scores of significant components (with eigenvalues>1) were included in multivariate logistic regression models with incidence of severe angiographic vasospasm, delayed ischemic neurological deficit, and long-term outcome as outcomes of interest. Sixteen significant principal components accounting for 74.6% of the variance were identified. A single component dominated by the patients' initial hemodynamic status, World Federation of Neurosurgical Societies score, neurological injury, and initial neutrophil/leukocyte counts was significantly associated with poor outcome. Two additional components were associated with angiographic vasospasm, of which one was also associated with delayed ischemic neurological deficit. The first was dominated by the aneurysm-securing procedure, subarachnoid clot clearance, and intracerebral hemorrhage, whereas the second had high contributions from markers of anemia and albumin levels. Principal component analysis, a data-driven approach, identified patient phenotypes that are associated with worse neurological outcomes. Such data reduction methods may provide a better approximation of unique patient phenotypes and may inform clinical care as well as patient recruitment into clinical trials. http://www.clinicaltrials.gov. Unique identifier: NCT00111085.
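The two-stage approach described above, extracting principal components with eigenvalue > 1 and entering their scores into a multivariate logistic regression, can be sketched as follows. The data here are simulated, not the clazosentan trial's; the dimensions (120 subjects, 46 variables) merely echo the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical baseline data: 120 subjects x 46 clinical variables,
# with a binary outcome tied to one of the underlying variables
X = rng.normal(size=(120, 46))
y = (X[:, 0] + rng.normal(size=120) > 0).astype(int)

# Standardize, then retain components with eigenvalue > 1 (Kaiser criterion)
Xs = StandardScaler().fit_transform(X)
pca = PCA().fit(Xs)
keep = pca.explained_variance_ > 1.0
scores = pca.transform(Xs)[:, keep]

# Component scores enter a multivariate logistic regression for the outcome
clf = LogisticRegression(max_iter=1000).fit(scores, y)
acc = clf.score(scores, y)
```

Because component scores are mutually uncorrelated, the logistic regression avoids the collinearity that raw clinical variables would introduce.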
Orthogonal decomposition of left ventricular remodeling in myocardial infarction.
Zhang, Xingyu; Medrano-Gracia, Pau; Ambale-Venkatesh, Bharath; Bluemke, David A; Cowan, Brett R; Finn, J Paul; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Young, Alistair A; Suinesiaputra, Avan
2017-03-01
Left ventricular size and shape are important for quantifying cardiac remodeling in response to cardiovascular disease. Geometric remodeling indices have been shown to have prognostic value in predicting adverse events in the clinical literature, but these often describe interrelated shape changes. We developed a novel method for deriving orthogonal remodeling components directly from any (moderately independent) set of clinical remodeling indices. Six clinical remodeling indices (end-diastolic volume index, sphericity, relative wall thickness, ejection fraction, apical conicity, and longitudinal shortening) were evaluated using cardiac magnetic resonance images of 300 patients with myocardial infarction, and 1991 asymptomatic subjects, obtained from the Cardiac Atlas Project. Partial least squares (PLS) regression of left ventricular shape models resulted in remodeling components that were optimally associated with each remodeling index. A Gram-Schmidt orthogonalization process, by which remodeling components were successively removed from the shape space in the order of shape variance explained, resulted in a set of orthonormal remodeling components. Remodeling scores could then be calculated that quantify the amount of each remodeling component present in each case. A one-factor PLS regression led to more decoupling between scores from the different remodeling components across the entire cohort, and zero correlation between clinical indices and subsequent scores. The PLS orthogonal remodeling components had similar power to describe differences between myocardial infarction patients and asymptomatic subjects as principal component analysis, but were better associated with well-understood clinical indices of cardiac remodeling. The data and analyses are available from www.cardiacatlas.org. © The Author 2017. Published by Oxford University Press.
Karanjekar, Richa V; Bhatt, Arpita; Altouqui, Said; Jangikhatoonabad, Neda; Durai, Vennila; Sattler, Melanie L; Hossain, M D Sahadat; Chen, Victoria
2015-12-01
Accurately estimating landfill methane emissions is important for quantifying a landfill's greenhouse gas emissions and power generation potential. Current models, including LandGEM and IPCC, often greatly simplify treatment of factors like rainfall and ambient temperature, which can substantially impact gas production. The newly developed Capturing Landfill Emissions for Energy Needs (CLEEN) model aims to improve landfill methane generation estimates while still requiring inputs that are fairly easy to obtain: waste composition, annual rainfall, and ambient temperature. To develop the model, methane generation was measured from 27 laboratory-scale landfill reactors with varying waste compositions (ranging from 0% to 100%), average rainfall rates of 2, 6, and 12 mm/day, and temperatures of 20, 30, and 37°C, according to a statistical experimental design. Refuse components considered were the major biodegradable wastes (food, paper, yard/wood, and textile) as well as inert inorganic waste. Based on the data collected, a multiple linear regression equation (R² = 0.75) was developed to predict first-order methane generation rate constants k as functions of waste composition, annual rainfall, and temperature. Because laboratory methane generation rates exceed field rates, a second scale-up regression equation for k was developed using actual gas-recovery data from 11 conventionally operated landfills in high-income countries. The CLEEN model was developed by incorporating both regression equations into a first-order-decay model for estimating methane generation rates from landfills. CLEEN model values were compared to actual field data from 6 US landfills and to estimates from LandGEM and IPCC. For 4 of the 6 cases, CLEEN model estimates were the closest to the actual values. Copyright © 2015 Elsevier Ltd. All rights reserved.
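The first-order-decay core that CLEEN shares with LandGEM-style models can be sketched as below. The waste masses and parameter values are illustrative only; per the abstract, CLEEN's contribution is predicting k from waste composition, rainfall, and temperature via regression rather than treating it as a fixed constant.

```python
import math

def first_order_methane(masses, l0, k, year):
    """First-order-decay methane generation (illustrative sketch).

    masses: waste mass accepted in each year i (Mg)
    l0:     methane generation potential (m^3 CH4 per Mg waste)
    k:      first-order rate constant (1/yr), a constant here but
            regression-predicted in CLEEN
    year:   year (0-indexed) for which generation is estimated
    Returns methane generation in m^3 CH4/yr.
    """
    q = 0.0
    for i, m in enumerate(masses):
        age = year - i            # years since this waste was placed
        if age >= 0:
            q += k * l0 * m * math.exp(-k * age)
    return q
```

In year 0 a single deposit of 1000 Mg with L0 = 100 m³/Mg and k = 0.05/yr generates k·L0·m = 5000 m³/yr, and generation decays exponentially thereafter.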
Nelms, David L.; Messinger, Terence; McCoy, Kurt J.
2015-07-14
As part of the U.S. Geological Survey’s Groundwater Resources Program study of the Appalachian Plateaus aquifers, annual and average estimates of water-budget components based on hydrograph separation and precipitation data from the parameter-elevation regressions on independent slopes model (PRISM) were determined at 849 continuous-record streamflow-gaging stations from Mississippi to New York, covering the period 1900 to 2011. Only complete calendar years (January to December) of streamflow record at each gage were used to determine estimates of base flow, the part of streamflow attributed to groundwater discharge; such estimates can serve as a proxy for annual recharge. For each year, estimates of annual base flow, runoff, and base-flow index were determined using computer programs—PART, HYSEP, and BFI—that automate the separation procedures. These streamflow-hydrograph analysis methods are provided with version 1.0 of the U.S. Geological Survey Groundwater Toolbox, a new program that provides graphing, mapping, and analysis capabilities in a Windows environment. Annual values of precipitation were estimated by averaging the PRISM cell values intercepted by basin boundaries as previously defined in the GAGES–II dataset. Estimates of annual evapotranspiration were then calculated as the difference between precipitation and streamflow.
Modeling pollen time series using seasonal-trend decomposition procedure based on LOESS smoothing
NASA Astrophysics Data System (ADS)
Rojo, Jesús; Rivero, Rosario; Romero-Morte, Jorge; Fernández-González, Federico; Pérez-Badia, Rosa
2017-02-01
Analysis of airborne pollen concentrations provides valuable information on plant phenology and is thus a useful tool in agriculture—for predicting harvests in crops such as the olive and for deciding when to apply phytosanitary treatments—as well as in medicine and the environmental sciences. Variations in airborne pollen concentrations, moreover, are indicators of changing plant life cycles. By modeling pollen time series, we can not only identify the variables influencing pollen levels but also predict future pollen concentrations. In this study, airborne pollen time series were modeled using the seasonal-trend decomposition procedure based on LOcally wEighted Scatterplot Smoothing (LOESS), known as STL. The data series—daily Poaceae pollen concentrations over the period 2006-2014—was broken up into seasonal and residual (stochastic) components. The seasonal component was compared with data on Poaceae flowering phenology obtained by field sampling. Residuals were fitted to a model generated from daily temperature and rainfall values, and daily pollen concentrations, using partial least squares regression (PLSR). This method was then applied to predict daily pollen concentrations for 2014 (independent validation data) using results for the seasonal component of the time series and estimates of the residual component for the period 2006-2013. Correlation between predicted and observed values was r = 0.79 (correlation coefficient) for the pre-peak period (i.e., the period prior to the peak pollen concentration) and r = 0.63 for the post-peak period. Separate analysis of each of the components of the pollen data series enables the sources of variability to be identified more accurately than by analysis of the original non-decomposed data series, and for this reason, this procedure has proved to be a suitable technique for analyzing the main environmental factors influencing airborne pollen concentrations.
Schlairet, Maura C; Schlairet, Timothy James; Sauls, Denise H; Bellflowers, Lois
2015-03-01
Establishing the impact of the high-fidelity simulation environment on student performance, as well as identifying factors that could predict learning, would refine simulation outcome expectations among educators. The purpose of this quasi-experimental pilot study was to explore the impact of simulation on emotion and cognitive load among beginning nursing students. Forty baccalaureate nursing students participated in teaching simulations, rated their emotional state and cognitive load, and completed evaluation simulations. Two principal components of emotion were identified, representing the pleasant activation and pleasant deactivation components of affect. The mean rating of cognitive load following simulation was high. Linear regression identified slight but statistically nonsignificant positive associations between principal components of emotion and cognitive load. Logistic regression identified a negative but statistically nonsignificant effect of cognitive load on assessment performance. Among lower-ability students, a more pronounced effect of cognitive load on assessment performance was observed; this also was statistically nonsignificant. Copyright 2015, SLACK Incorporated.
Kong, Jessica; Giridharagopal, Rajiv; Harrison, Jeffrey S; Ginger, David S
2018-05-31
Correlating nanoscale chemical specificity with operational physics is a long-standing goal of functional scanning probe microscopy (SPM). We employ a data-analytic approach combining multiple microscopy modes, pairing compositional information from infrared vibrational excitation maps acquired via photoinduced force microscopy (PiFM) with electrical information from conductive atomic force microscopy. We study a model polymer blend comprising insulating poly(methyl methacrylate) (PMMA) and semiconducting poly(3-hexylthiophene) (P3HT). We show that PiFM spectra differ from FTIR spectra but can still be used to identify local composition. We use principal component analysis to extract statistically significant principal components and principal component regression to predict local current and identify local polymer composition. In doing so, we observe evidence of semiconducting P3HT within PMMA aggregates. These methods are generalizable to correlated SPM data and provide a meaningful technique for extracting complex compositional information that is impossible to measure with any one technique.
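Principal component regression of the kind described, projecting high-dimensional spectra onto leading components and regressing a scalar response on the scores, can be sketched with scikit-learn. The "spectra" and "current" below are simulated with a low-rank structure and are not PiFM data; the dimensions are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
# Hypothetical per-pixel spectra (200 bands) driven by 3 latent "components",
# plus a measured scalar response (e.g., local current) tied to the same latents
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 200))
spectra = latent @ loadings + 0.1 * rng.normal(size=(500, 200))
current = latent @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=500)

# Principal component regression: PCA projection, then ordinary least squares
pcr = make_pipeline(PCA(n_components=10), LinearRegression()).fit(spectra, current)
r2 = pcr.score(spectra, current)
```

Because the response depends only on the latent factors that dominate the spectral variance, a handful of principal components suffices and the regression avoids fitting 200 collinear bands directly.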
Diez-Martin, J; Moreno-Ortega, M; Bagney, A; Rodriguez-Jimenez, R; Padilla-Torres, D; Sanchez-Morla, E M; Santos, J L; Palomo, T; Jimenez-Arriero, M A
2014-01-01
To assess insight in a large sample of patients with schizophrenia and to study its relationship with set shifting as an executive function. The insight of a sample of 161 clinically stable, community-dwelling patients with schizophrenia was evaluated by means of the Scale to Assess Unawareness of Mental Disorder (SUMD). Set shifting was measured using the Trail-Making Test time required to complete part B minus the time required to complete part A (TMT B-A). Linear regression analyses were performed to investigate the relationships of TMT B-A with different dimensions of general insight. Regression analyses revealed a significant association between TMT B-A and two of the SUMD general components: 'awareness of mental disorder' and 'awareness of the efficacy of treatment'. The 'awareness of social consequences' component was not significantly associated with set shifting. Our results show a significant relation between set shifting and insight, but not in the same manner for the different components of the SUMD general score. Copyright © 2013 S. Karger AG, Basel.
Gong, Inna Y; Goodman, Shaun G; Brieger, David; Gale, Chris P; Chew, Derek P; Welsh, Robert C; Huynh, Thao; DeYoung, J Paul; Baer, Carolyn; Gyenes, Gabor T; Udell, Jacob A; Fox, Keith A A; Yan, Andrew T
2017-10-01
Although there are sex differences in management and outcome of acute coronary syndromes (ACS), sex is not a component of Global Registry of Acute Coronary Events (GRACE) risk score (RS) for in-hospital mortality prediction. We sought to determine the prognostic utility of GRACE RS in men and women, and whether its predictive accuracy would be augmented through sex-based modification of its components. Canadian men and women enrolled in GRACE and Canadian Registry of Acute Coronary Events were stratified as ST-segment elevation myocardial infarction (STEMI) or non-ST-segment elevation ACS (NSTE-ACS). GRACE RS was calculated as per original model. Discrimination and calibration were evaluated using the c-statistic and Hosmer-Lemeshow goodness-of-fit test, respectively. Multivariable logistic regression was undertaken to assess potential interactions of sex with GRACE RS components. For the overall cohort (n=14,422), unadjusted in-hospital mortality rate was higher in women than men (4.5% vs. 3.0%, p<0.001). Overall, GRACE RS c-statistic and goodness-of-fit test p-value were 0.85 (95% CI 0.83-0.87) and 0.11, respectively. While the RS had excellent discrimination for all subgroups (c-statistics >0.80), discrimination was lower for women compared to men with STEMI [0.80 (0.75-0.84) vs. 0.86 (0.82-0.89), respectively, p<0.05]. The goodness-of-fit test showed good calibration for women (p=0.86), but suboptimal for men (p=0.031). No significant interaction was evident between sex and RS components (all p>0.25). The GRACE RS is a valid predictor of in-hospital mortality for both men and women with ACS. The lack of interaction between sex and RS components suggests that sex-based modification is not required. Copyright © 2017 Elsevier B.V. All rights reserved.
Lacoste Jeanson, Alizé; Dupej, Ján; Villa, Chiara; Brůžek, Jaroslav
2017-01-01
Estimating volumes and masses of total body components is important for the study and treatment monitoring of nutrition and nutrition-related disorders, cancer, joint replacement, energy-expenditure and exercise physiology. While several equations have been offered for estimating total body components from MRI slices, no reliable and tested method exists for CT scans. For the first time, body composition data was derived from 41 high-resolution whole-body CT scans. From these data, we defined equations for estimating volumes and masses of total body AT and LT from corresponding tissue areas measured in selected CT scan slices. We present a new semi-automatic approach to defining the density cutoff between adipose tissue (AT) and lean tissue (LT) in such material. An intra-class correlation coefficient (ICC) was used to validate the method. The equations for estimating the whole-body composition volume and mass from areas measured in selected slices were modeled with ordinary least squares (OLS) linear regressions and support vector machine regression (SVMR). The best predictive equation for total body AT volume was based on the AT area of a single slice located between the 4th and 5th lumbar vertebrae (L4-L5) and produced lower prediction errors (|PE| = 1.86 liters, %PE = 8.77) than previous equations also based on CT scans. The LT area of the mid-thigh provided the lowest prediction errors (|PE| = 2.52 liters, %PE = 7.08) for estimating whole-body LT volume. We also present equations to predict total body AT and LT masses from a slice located at L4-L5 that resulted in reduced error compared with the previously published equations based on CT scans. The multislice SVMR predictor gave the theoretical upper limit for prediction precision of volumes and cross-validated the results.
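The single-slice prediction idea, an OLS line (and, for comparison, a support vector regressor) mapping one slice's tissue area to whole-body volume, can be sketched as follows. The data are synthetic: the 0.06 L/cm² slope, the noise level, and the sample of 41 are illustrative assumptions, not the study's fitted equation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(2)
# Hypothetical adipose-tissue area at L4-L5 (cm^2) vs whole-body AT volume (L)
area = rng.uniform(100.0, 500.0, 41)
volume = 0.06 * area + rng.normal(0.0, 1.5, 41)

# OLS regression on the single-slice area
ols = LinearRegression().fit(area[:, None], volume)
# Support vector machine regression, as used in the study for cross-validation
svmr = SVR(kernel="linear", C=10.0).fit(area[:, None], volume)

# Mean absolute prediction error (|PE|) of each model, in liters
pe_ols = np.abs(ols.predict(area[:, None]) - volume).mean()
pe_svm = np.abs(svmr.predict(area[:, None]) - volume).mean()
```

With a genuinely linear area-volume relation, both models reach errors near the noise floor; the study's multislice SVMR plays the analogous role of bounding the achievable precision.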
Spratlen, Miranda J; Grau-Perez, Maria; Best, Lyle G; Yracheta, Joseph; Lazo, Mariana; Vaidya, Dhananjay; Balakrishnan, Poojitha; Gamble, Mary V; Francesconi, Kevin A; Goessler, Walter; Cole, Shelley A; Umans, Jason G; Howard, Barbara V; Navas-Acien, Ana
2018-03-15
Inorganic arsenic exposure is ubiquitous, and both exposure and inter-individual differences in its metabolism have been associated with cardiometabolic risk. The association of arsenic exposure and arsenic metabolism with metabolic syndrome and its individual components, however, is relatively unknown. We used Poisson regression with robust variance to evaluate the association of baseline arsenic exposure (urine arsenic levels) and metabolism (relative percentage of arsenic species over their sum) with incident metabolic syndrome and its individual components (elevated waist circumference, elevated triglycerides, reduced HDL, hypertension, and elevated fasting plasma glucose) in 1,047 participants from the Strong Heart Family Study, a prospective family-based cohort in American Indian communities (baseline visits in 1998-1999 and 2001-2003, follow-up visits in 2001-2003 and 2006-2009). Thirty-two percent of participants developed metabolic syndrome over follow-up. An IQR increase in arsenic exposure was associated with a 1.19-fold (95% CI: 1.01, 1.41) greater risk of elevated fasting plasma glucose but not with other individual components or overall metabolic syndrome. Arsenic metabolism, specifically lower MMA% and higher DMA%, was associated with higher risk of overall metabolic syndrome and elevated waist circumference, but not with any other component. These findings support contrasting and independent associations of arsenic exposure and arsenic metabolism with metabolic outcomes, which may contribute to overall diabetes risk.
Abnormal dynamics of language in schizophrenia.
Stephane, Massoud; Kuskowski, Michael; Gundel, Jeanette
2014-05-30
Language can be conceptualized as a dynamic system that includes multiple interactive levels (sub-lexical, lexical, sentence, and discourse) and components (phonology, semantics, and syntax). In schizophrenia, abnormalities are observed at all language elements (levels and components), but the dynamics between these elements remain unclear. We hypothesize that the dynamics between language elements in schizophrenia are abnormal and explore how they are altered. We first investigated language elements with comparable procedures in patients and healthy controls. Second, using measures of reaction time, we performed multiple linear regression analyses to evaluate the inter-relationships among language elements and the effect of group on these relationships. Patients significantly differed from controls with respect to sub-lexical/lexical, lexical/sentence, and sentence/discourse regression coefficients. The intercepts of the regression slopes increased in the same order (from lower to higher levels) in patients but not in controls. Regression coefficients between syntax and both sentence-level and discourse-level semantics did not differentiate patients from controls. This study indicates that the dynamics between language elements are abnormal in schizophrenia. In patients, top-down flow of linguistic information might be reduced, and the relationship between phonology and semantics, but not between syntax and semantics, appears to be altered. Published by Elsevier Ireland Ltd.
Acoustic-articulatory mapping in vowels by locally weighted regression
McGowan, Richard S.; Berger, Michael A.
2009-01-01
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829–836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented. PMID:19813812
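Cleveland's locally weighted regression can be written in a few lines. This is a bare-bones one-predictor, one-query-point sketch for illustration, not the study's full mapping procedure (which pairs loess with principal components of the articulatory and acoustic variables).

```python
import numpy as np

def loess_predict(x_train, y_train, x0, frac=0.5):
    """Locally weighted linear regression (loess) evaluated at one point x0.

    Tricube weights are placed on the nearest `frac` fraction of the data,
    then a weighted least-squares line is fit and evaluated at x0, so the
    slope can vary locally across the domain (Cleveland, 1979).
    """
    n = len(x_train)
    k = max(2, int(frac * n))              # local neighborhood size
    d = np.abs(x_train - x0)
    idx = np.argsort(d)[:k]                # k nearest training points
    w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3   # tricube kernel weights
    X = np.column_stack([np.ones(k), x_train[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train[idx])
    return beta[0] + beta[1] * x0
```

On a curved function such as a sine wave, the local fit tracks the peaks and troughs that a single global regression line would flatten, which is the property the study exploits for smooth articulatory-acoustic maps.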
Seasonal forecasting of high wind speeds over Western Europe
NASA Astrophysics Data System (ADS)
Palutikof, J. P.; Holt, T.
2003-04-01
As financial losses associated with extreme weather events escalate, there is interest from end users in the forestry and insurance industries, for example, in the development of seasonal forecasting models with a long lead time. This study uses exceedances of the 90th, 95th, and 99th percentiles of daily maximum wind speed over the period 1958 to present to derive predictands of winter wind extremes. The source data are the 6-hourly NCEP Reanalysis gridded surface wind fields. Predictor variables include principal components of Atlantic sea surface temperature and several indices of climate variability, including the NAO and SOI. Lead times of up to a year are considered, in monthly increments. Three regression techniques are evaluated: multiple linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLS). PCR and PLS proved considerably superior to MLR, with much lower standard errors. PLS was chosen to formulate the predictive model since it offers more flexibility in experimental design and gave slightly better results than PCR. The results indicate that winter windiness can be predicted with considerable skill one year ahead for much of coastal Europe, but that this skill deteriorates rapidly in the hinterland. The experiment succeeded in highlighting PLS as a very useful method for developing more precise forecasting models, and in identifying areas of high predictability.
Multiplicative Multitask Feature Learning
Wang, Xin; Bi, Jinbo; Yu, Shipeng; Sun, Jiangwen; Song, Minghu
2016-01-01
We investigate a general framework of multiplicative multitask feature learning which decomposes individual task’s model parameters into a multiplication of two components. One of the components is used across all tasks and the other component is task-specific. Several previous methods can be proved to be special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived for the across-task component as related to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effects of different regularizers. Study of this framework motivates new multitask learning algorithms. We propose two new learning formulations by varying the parameters in the proposed framework. An efficient blockwise coordinate descent algorithm is developed suitable for solving the entire family of formulations with rigorous convergence analysis. Simulation studies have identified the statistical properties of data that would be in favor of the new formulations. Extensive empirical studies on various classification and regression benchmark data sets have revealed the relative advantages of the two new formulations by comparing with the state of the art, which provides instructive insights into the feature learning problem with multiple tasks. PMID:28428735
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
ERIC Educational Resources Information Center
Hansmann, Ralf
2009-01-01
A university Environmental Sciences curriculum is described against the background of requirements for environmental problem solving for sustainability and then analyzed using data from regular surveys of graduates (N = 373). Three types of multiple regression models examine links between qualifications and curriculum components in order to derive…
NASA Astrophysics Data System (ADS)
Venedikov, A. P.; Arnoso, J.; Cai, W.; Vieira, R.; Tan, S.; Velez, E. J.
2006-01-01
A 12-year series (1992-2004) of strain measurements recorded in the Geodynamics Laboratory of Lanzarote is investigated. Through a tidal analysis the non-tidal component of the data is separated in order to use it for studying signals useful for monitoring of the volcanic activity on the island. This component contains various perturbations of meteorological and oceanic origin, which should be eliminated in order to make the useful signals discernible. The paper is devoted to the estimation and elimination of the effect of the air temperature inside the station, which strongly dominates the strainmeter data. For solving this task, a regression model is applied, which includes a linear relation with the temperature and time-dependent polynomials. The regression includes nonlinearly a set of parameters, which are estimated by a properly applied Bayesian approach. The results obtained are: the regression coefficient of the strain data on temperature, equal to (-367.4 ± 0.8) × 10⁻⁹ °C⁻¹; the curve of the non-tidal component reduced by the effect of the temperature; and a polynomial approximation of the reduced curve. The technique used here can be helpful to investigators in the domain of earthquake and volcano monitoring. However, the fundamental and extremely difficult problem of what kind of signals in the reduced curves might be useful in this field is not considered here.
Walker, Mary Ellen; Anonson, June; Szafron, Michael
2015-01-01
The relationship between political environment and health services accessibility (HSA) has not been the focus of any specific studies. The purpose of this study was to address this gap in the literature by examining the relationship between political environment and HSA. The relationships that HSA indicators (physicians, nurses, and hospital beds per 10,000 people) have with political environment were analyzed with multiple least-squares regression using the components of democracy (electoral processes and pluralism, functioning of government, political participation, political culture, and civil liberties), represented by the 2011 Economist Intelligence Unit Democracy Index (EIUDI) sub-scores. While controlling for a country's geographic location and level of democracy, we found that two components of a nation's political environment, functioning of government and political participation, and their interaction had significant relationships with the three HSA indicators. These study findings are of significance to health professionals because they examine the political contexts in which citizens access health services, they come from research that is the first of its kind, and they help explain the effect political environment has on health. © The Author 2014. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved.
NASA Astrophysics Data System (ADS)
Talebpour, Zahra; Tavallaie, Roya; Ahmadi, Seyyed Hamid; Abdollahpour, Assem
2010-09-01
In this study, a new method for the simultaneous determination of penicillin G salts in a pharmaceutical mixture via FT-IR spectroscopy combined with chemometrics was investigated. The mixture of penicillin G salts is a complex system due to the similar analytical characteristics of its components. Partial least squares (PLS) and radial basis function-partial least squares (RBF-PLS) were used to develop the linear and nonlinear relations between spectra and components, respectively. The orthogonal signal correction (OSC) preprocessing method was used to correct for unwanted variation, such as spectral overlapping and scattering effects. In order to compare the influence of OSC on the PLS and RBF-PLS models, the optimal linear (PLS) and nonlinear (RBF-PLS) models based on conventional and OSC-preprocessed spectra were established and compared. The obtained results demonstrated that OSC clearly enhanced the performance of both the RBF-PLS and PLS calibration models. Also, in the case of some nonlinear relation between spectra and components, the OSC-RBF-PLS model gave more satisfactory results than the OSC-PLS model, which indicated that OSC was helpful in removing extrinsic deviations from linearity without eliminating nonlinear information related to the components. The chemometric models were tested on an external dataset and finally applied to the analysis of a commercialized injection product of penicillin G salts.
Reference Models for Structural Technology Assessment and Weight Estimation
NASA Technical Reports Server (NTRS)
Cerro, Jeff; Martinovic, Zoran; Eldred, Lloyd
2005-01-01
Previously the Exploration Concepts Branch of NASA Langley Research Center developed techniques for automating the preliminary design level of launch vehicle airframe structural analysis for purposes of enhancing historical regression based mass estimating relationships. This past work was useful and greatly reduced design time; however, its application area was very narrow in terms of the variety of structural and vehicle general arrangement alternatives it could handle. Implementation of the analysis approach presented herein also incorporates some newly developed computer programs. Loft is a program developed to create analysis meshes and simultaneously define structural element design regions. A simple component-defining ASCII file is read by Loft to begin the design process. HSLoad is a Visual Basic implementation of the HyperSizer Application Programming Interface, which automates the structural element design process. Details of these two programs and their use are explained in this paper. A feature which falls naturally out of the above analysis paradigm is the concept of "reference models". The flexibility of the FEA-based JAVA processing procedures and associated process control classes, coupled with the general utility of Loft and HSLoad, makes it possible to create generic program template files for analysis of components ranging from something as simple as a stiffened flat panel, to curved panels, fuselage and cryogenic tank components, flight control surfaces, and wings, through full air and space vehicle general arrangements.
Variance Component Selection With Applications to Microbiome Taxonomic Data.
Zhai, Jing; Kim, Juhyun; Knox, Kenneth S; Twigg, Homer L; Zhou, Hua; Zhou, Jin J
2018-01-01
High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One method is to test the association of a specific taxon with phenotypes in a linear mixed effect model, which incorporates phylogenetic information among bacterial communities. Another type of approach considers all taxa in a joint model and achieves selection via a penalization method, which ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different levels as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Then taxonomic selection is achieved by the lasso (least absolute shrinkage and selection operator) penalty on variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracies. Simulation studies demonstrate the superiority of our method versus existing methods, for example, the group lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV) infected patients. We implement our method using the high-performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.
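The lasso penalty that drives the selection above reduces, in the simplest regression setting, to coordinate-wise soft thresholding. The sketch below is a plain coordinate-descent lasso on fixed effects, shown only to illustrate the shrinkage-to-zero mechanism; it is not the paper's variance-component version, and all names and defaults are illustrative.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso: minimise 0.5*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]      # partial residual without b[j]
            rho = X[:, j] @ r
            # soft-thresholding: coefficients with |rho| <= lam are set to 0
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b
```

Coefficients whose marginal contribution falls below the penalty are zeroed exactly, which is what "selection" means here; in the paper the same operator acts on variance components rather than regression coefficients.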
Sieve estimation of Cox models with latent structures.
Cao, Yongxiu; Huang, Jian; Liu, Yanyan; Zhao, Xingqiu
2016-12-01
This article considers sieve estimation in the Cox model with an unknown regression structure based on right-censored data. We propose a semiparametric pursuit method to simultaneously identify and estimate linear and nonparametric covariate effects based on B-spline expansions through a penalized group selection method with concave penalties. We show that the estimators of the linear effects and the nonparametric component are consistent. Furthermore, we establish the asymptotic normality of the estimator of the linear effects. To compute the proposed estimators, we develop a modified blockwise majorization descent algorithm that is efficient and easy to implement. Simulation studies demonstrate that the proposed method performs well in finite sample situations. We also use the primary biliary cirrhosis data to illustrate its application. © 2016, The International Biometric Society.
A kinetic energy model of two-vehicle crash injury severity.
Sobhani, Amir; Young, William; Logan, David; Bahrololoom, Sareh
2011-05-01
An important part of any model of vehicle crashes is the development of a procedure to estimate crash injury severity. After reviewing existing models of crash severity, this paper outlines the development of a modelling approach aimed at measuring the injury severity of people in two-vehicle road crashes. This model can be incorporated into a discrete event traffic simulation model, using simulation model outputs as its input. The model can then serve as an integral part of a simulation model estimating the crash potential of components of the traffic system. The model is developed using Newtonian Mechanics and Generalised Linear Regression. The factors contributing to the speed change (ΔV(s)) of a subject vehicle are identified using the law of conservation of momentum. A Log-Gamma regression model is fitted to measure speed change (ΔV(s)) of the subject vehicle based on the identified crash characteristics. The kinetic energy applied to the subject vehicle is calculated by the model, which in turn uses a Log-Gamma Regression Model to estimate the Injury Severity Score of the crash from the calculated kinetic energy, crash impact type, presence of airbag and/or seat belt and occupant age. Copyright © 2010 Elsevier Ltd. All rights reserved.
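The momentum step of the model above can be sketched from first principles. Assuming a perfectly plastic collision in which both vehicles share a common post-impact velocity, a textbook simplification and not necessarily the paper's exact formulation, the speed change of the subject vehicle and the associated kinetic energy are:

```python
def delta_v(m1, v1, m2, v2):
    """Speed change of vehicle 1 in a perfectly plastic two-vehicle
    collision, from conservation of momentum (common final velocity)."""
    v_common = (m1 * v1 + m2 * v2) / (m1 + m2)
    return abs(v_common - v1)

def kinetic_energy(m, dv):
    """Kinetic energy associated with the speed change."""
    return 0.5 * m * dv ** 2
```

In the paper this kinematic quantity is then fed, together with crash characteristics, into Log-Gamma regression models for ΔV and the Injury Severity Score.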
Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina
2013-01-01
Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crime data, it is common that the data consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied to crime rate forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
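The hybrid idea, a linear time-series component plus a nonlinear learner fitted to its residuals, can be sketched as follows. The snippet fits only the linear autoregressive part by least squares and returns the residual series that would be handed to the nonlinear stage (SVR in the paper); the AR order, and the omission of the MA and particle-swarm stages, are simplifications.

```python
import numpy as np

def ar_fit(y, p=2):
    """Least-squares AR(p) fit: the linear component of a hybrid model.

    Returns the fitted coefficients (intercept first) and the residual
    series, which carries whatever nonlinear structure remains.
    """
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(X)), X])        # add intercept
    coef = np.linalg.lstsq(X, y[p:], rcond=None)[0]
    resid = y[p:] - X @ coef
    return coef, resid
```

In the hybrid scheme the final forecast is the AR prediction plus the nonlinear model's prediction of the residual, so each component handles the part of the signal it models best.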
Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I
2012-07-27
A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependence on alcohol sales against the health of the public.
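The latent change in a baseline rate can be illustrated in a much simpler, non-Bayesian form than the partially collapsed sampler above: profile the Poisson log-likelihood over a single split point in a piecewise-constant rate. The function and its exhaustive search are illustrative assumptions, not the paper's method.

```python
import numpy as np

def best_changepoint(counts):
    """Single change-point in a piecewise-constant Poisson rate:
    maximise the profile log-likelihood over the split location."""
    n = len(counts)
    best, best_ll = None, -np.inf
    for k in range(1, n):
        lam1, lam2 = counts[:k].mean(), counts[k:].mean()
        ll = 0.0
        for lam, seg in ((lam1, counts[:k]), (lam2, counts[k:])):
            if lam > 0:
                # Poisson log-likelihood up to a constant: sum(c*log(lam) - lam)
                ll += (seg * np.log(lam) - lam).sum()
            # lam == 0 implies all counts in seg are 0 and contributes 0
        if ll > best_ll:
            best, best_ll = k, ll
    return best
```

The paper's model additionally lets the number of change points vary and estimates covariate effects jointly, which is what makes the varying-dimensional MCMC machinery necessary.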
Gianola, Daniel; Fariello, Maria I.; Naya, Hugo; Schön, Chris-Carolin
2016-01-01
Standard genome-wide association studies (GWAS) scan for relationships between each of p molecular markers and a continuously distributed target trait. Typically, a marker-based matrix of genomic similarities among individuals (G) is constructed, to account more properly for the covariance structure in the linear regression model used. We show that the generalized least-squares estimator of the regression of phenotype on one or on m markers is invariant with respect to whether or not the marker(s) tested is(are) used for building G, provided variance components are unaffected by exclusion of such marker(s) from G. The result is arrived at by using a matrix expression such that one can find many inverses of genomic relationship, or of phenotypic covariance matrices, stemming from removing markers tested as fixed, but carrying out a single inversion. When eigenvectors of the genomic relationship matrix are used as regressors with fixed regression coefficients, e.g., to account for population stratification, their removal from G does matter. Removal of eigenvectors from G can have a noticeable effect on estimates of genomic and residual variances, so caution is needed. Concepts were illustrated using genomic data on 599 wheat inbred lines, with grain yield as target trait, and on close to 200 Arabidopsis thaliana accessions. PMID:27520956
Wang, Wei; Griswold, Michael E
2016-11-30
The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. Marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for the clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects and then use the calculated difference to assess the overall exposure effect. The maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration of the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.
Reconstruction of magnetic configurations in W7-X using artificial neural networks
NASA Astrophysics Data System (ADS)
Böckenhoff, Daniel; Blatzheim, Marko; Hölbe, Hauke; Niemann, Holger; Pisano, Fabio; Labahn, Roger; Pedersen, Thomas Sunn; The W7-X Team
2018-05-01
It is demonstrated that artificial neural networks can be used to accurately and efficiently predict details of the magnetic topology at the plasma edge of the Wendelstein 7-X stellarator, based on simulated as well as measured heat load patterns onto plasma-facing components observed with infrared cameras. The connection between heat load patterns and the magnetic topology is a challenging regression problem, but one that suits artificial neural networks well. The use of a neural network makes it feasible to analyze and control the plasma exhaust in real-time, an important goal for Wendelstein 7-X, and for magnetic confinement fusion research in general.
Crop area estimation based on remotely-sensed data with an accurate but costly subsample
NASA Technical Reports Server (NTRS)
Gunst, R. F.
1985-01-01
Research activities conducted under the auspices of National Aeronautics and Space Administration Cooperative Agreement NCC 9-9 are discussed. During this contract period research efforts were concentrated in two primary areas. The first area is an investigation of the use of measurement error models as alternatives to least squares regression estimators of crop production or timber biomass. The second primary area of investigation is the estimation of the mixing proportion of two-component mixture models. This report lists publications, technical reports, submitted manuscripts, and oral presentations generated by these research efforts. Possible areas of future research are mentioned.
An expert system for diagnostics and estimation of steam turbine components condition
NASA Astrophysics Data System (ADS)
Murmansky, B. E.; Aronson, K. E.; Brodov, Yu. M.
2017-11-01
The report describes an expert system of probability type for diagnostics and state estimation of steam turbine technological subsystem components. The expert system is based on Bayes' theorem and makes it possible to troubleshoot equipment components using expert experience when there is a lack of baseline information on the indicators of turbine operation. Within a unified approach the expert system solves the problems of diagnosing the flow steam path of the turbine, bearings, thermal expansion system, regulatory system, condensing unit, and the systems of regenerative feed-water and hot-water heating. The knowledge base of the expert system for turbine unit rotors and bearings contains a description of 34 defects and of 104 related diagnostic features that cause a change in its vibration state. The knowledge base for the condensing unit contains 12 hypotheses and 15 pieces of evidence (indications); procedures are also defined for estimating 20 state parameters. Similar knowledge bases containing the diagnostic features and fault hypotheses are formulated for the other technological subsystems of the turbine unit. With the necessary initial information available, a number of problems can be solved within the expert system for the various technological subsystems of a steam turbine unit: for the steam flow path, the correlation and regression analysis of the multifactor relationship between variations in vibration parameters and the regime parameters; for the thermal expansion system, the evaluation of the force acting on the longitudinal keys depending on the temperature state of the turbine cylinder; for the condensing unit, the evaluation of the separate effects of heat-exchange-surface contamination and of the presence of air in the condenser steam space on condenser thermal efficiency, as well as the evaluation of terms for condenser cleaning and for tube system replacement, and so forth.
When there is a lack of initial information, the expert system makes it possible to formulate a diagnosis by calculating the probability of fault hypotheses, given the degree of the expert's confidence in the estimation of turbine component operation parameters.
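The Bayes' theorem step at the heart of such a diagnostic expert system is a one-liner. The sketch below updates the probability of a fault hypothesis from a single observed diagnostic feature; the names and the single-evidence restriction are illustrative, not the system's actual interface.

```python
def bayes_update(prior, p_evidence_given_fault, p_evidence):
    """One Bayes' theorem step: posterior probability of a fault
    hypothesis after observing one diagnostic feature (evidence)."""
    return prior * p_evidence_given_fault / p_evidence
```

In a working system this update would be chained over many features, with the likelihoods `p_evidence_given_fault` supplied by expert judgment, which is how such a system compensates for missing baseline measurements.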
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
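The fixed and random regressions on days in milk above use Legendre polynomials on a rescaled interval. A minimal sketch of building that covariate basis follows; the rescaling of 5-365 DIM to [-1, 1] reflects common test-day-model practice, and the helper name is an assumption.

```python
import numpy as np

def legendre_basis(dim, order=4, lo=5, hi=365):
    """Legendre polynomial covariates (P0..P_order) for days in milk,
    with DIM rescaled from [lo, hi] to the polynomials' [-1, 1] domain."""
    x = 2 * (np.asarray(dim, dtype=float) - lo) / (hi - lo) - 1
    return np.polynomial.legendre.legvander(x, order)
```

Each animal's lactation curve is then a linear combination of these columns; the spline alternatives in the abstract simply swap this basis for piecewise-linear functions anchored at 4-6 knots.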
Postmolar gestational trophoblastic neoplasia: beyond the traditional risk factors.
Bakhtiyari, Mahmood; Mirzamoradi, Masoumeh; Kimyaiee, Parichehr; Aghaie, Abbas; Mansournia, Mohammd Ali; Ashrafi-Vand, Sepideh; Sarfjoo, Fatemeh Sadat
2015-09-01
To investigate the slope of linear regression of postevacuation serum hCG as an independent risk factor for postmolar gestational trophoblastic neoplasia (GTN). Multicenter retrospective cohort study. Academic referral health care centers. All subjects with confirmed hydatidiform mole and at least four measurements of β-hCG titer. None. Type and magnitude of the relationship between the slope of linear regression of β-hCG as a new risk factor and GTN, using Bayesian logistic regression with penalized log-likelihood estimation. Among the high-risk and low-risk molar pregnancy cases, 11 (18.6%) and 19 (13.3%) had GTN, respectively. No significant relationship was found between the components of a high-risk pregnancy and GTN. The β-hCG return slope was higher in the spontaneous cure group; however, the initial level of this hormone at the first measurement was higher in the GTN group than in the spontaneous recovery group. The average time for diagnosing GTN in the high-risk molar pregnancy group was 2 weeks less than that of the low-risk molar pregnancy group. In addition to the slope of linear regression of β-hCG (odds ratio [OR], 12.74; confidence interval [CI], 5.42-29.2), abortion history (OR, 2.53; 95% CI, 1.27-5.04) and large uterine height for gestational age (OR, 1.26; CI, 1.04-1.54) had the maximum effects on GTN outcome. The slope of linear regression of β-hCG was introduced as an independent risk factor, which could be used for clinical decision making based on records of β-hCG titer and subsequent prevention programs. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
Prediction by regression and intrarange data scatter in surface-process studies
Toy, T.J.; Osterkamp, W.R.; Renard, K.G.
1993-01-01
Modeling is a major component of contemporary earth science, and regression analysis occupies a central position in the parameterization, calibration, and validation of geomorphic and hydrologic models. Although this methodology can be used in many ways, we are primarily concerned with the prediction of values for one variable from another variable. Examination of the literature reveals considerable inconsistency in the presentation of the results of regression analysis and the occurrence of patterns in the scatter of data points about the regression line. Both circumstances confound utilization and evaluation of the models. Statisticians are well aware of various problems associated with the use of regression analysis and offer improved practices; often, however, their guidelines are not followed. After a review of the aforementioned circumstances and until standard criteria for model evaluation become established, we recommend, as a minimum, inclusion of scatter diagrams, the standard error of the estimate, and sample size in reporting the results of regression analyses for most surface-process studies. © 1993 Springer-Verlag.
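The recommended minimum reporting, the standard error of the estimate and the sample size alongside the fit, can be computed as follows. This is a generic simple-linear-regression sketch, not tied to any particular surface-process dataset.

```python
import numpy as np

def regression_report(x, y):
    """OLS fit plus the reporting items recommended above:
    standard error of the estimate and sample size."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    see = np.sqrt((resid ** 2).sum() / (n - 2))  # standard error of the estimate
    return {"intercept": beta[0], "slope": beta[1], "see": see, "n": n}
```

Reporting `see` and `n` together lets a reader judge both the typical prediction error and how much data supports it, which is exactly the evaluation the abstract argues is so often missing.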
Mountain torrents: Quantifying vulnerability and assessing uncertainties
Totschnig, Reinhold; Fuchs, Sven
2013-01-01
Vulnerability assessment for elements at risk is an important component in the framework of risk assessment. The vulnerability of buildings affected by torrent processes can be quantified by vulnerability functions that express a mathematical relationship between the degree of loss of individual elements at risk and the intensity of the impacting process. Based on data from the Austrian Alps, we extended a vulnerability curve for residential buildings affected by fluvial sediment transport processes to other torrent processes and other building types. With respect to this goal to merge different data based on different processes and building types, several statistical tests were conducted. The calculation of vulnerability functions was based on a nonlinear regression approach applying cumulative distribution functions. The results suggest that there is no need to distinguish between different sediment-laden torrent processes when assessing vulnerability of residential buildings towards torrent processes. The final vulnerability functions were further validated with data from the Italian Alps and different vulnerability functions presented in the literature. This comparison showed the wider applicability of the derived vulnerability functions. The uncertainty inherent to regression functions was quantified by the calculation of confidence bands. The derived vulnerability functions may be applied within the framework of risk management for mountain hazards within the European Alps. The method is transferable to other mountain regions if the input data needed are available. PMID:27087696
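Fitting a cumulative distribution function as a vulnerability curve can be sketched with a Weibull CDF and a crude grid search standing in for a proper nonlinear least-squares routine. Both the parametric family and the grid-search fitter are illustrative assumptions, not necessarily the authors' choices.

```python
import numpy as np

def weibull_cdf(intensity, scale, shape):
    """Candidate vulnerability function: degree of loss (0..1) as a
    Weibull CDF of process intensity."""
    return 1 - np.exp(-(intensity / scale) ** shape)

def fit_vulnerability(intensity, loss, scales, shapes):
    """Least-squares fit over a parameter grid (stand-in for a proper
    nonlinear regression solver); returns the best (scale, shape)."""
    best, best_sse = None, np.inf
    for s in scales:
        for k in shapes:
            sse = ((loss - weibull_cdf(intensity, s, k)) ** 2).sum()
            if sse < best_sse:
                best, best_sse = (s, k), sse
    return best
```

A bounded, monotone CDF is a natural functional form here because the degree of loss is itself constrained to [0, 1] and should not decrease with process intensity.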
Integrated Low-Rank-Based Discriminative Feature Learning for Recognition.
Zhou, Pan; Lin, Zhouchen; Zhang, Chao
2016-05-01
Feature learning plays a central role in pattern recognition. In recent years, many representation-based feature learning methods have been proposed and have achieved great success in many applications. However, these methods perform feature learning and subsequent classification in two separate steps, which may not be optimal for recognition tasks. In this paper, we present a supervised low-rank-based approach for learning discriminative features. By integrating latent low-rank representation (LatLRR) with a ridge regression-based classifier, our approach combines feature learning with classification, so that the regulated classification error is minimized. In this way, the extracted features are more discriminative for the recognition tasks. Our approach benefits from a recent discovery on the closed-form solutions to noiseless LatLRR. When there is noise, a robust Principal Component Analysis (PCA)-based denoising step can be added as preprocessing. When the scale of a problem is large, we utilize a fast randomized algorithm to speed up the computation of robust PCA. Extensive experimental results demonstrate the effectiveness and robustness of our method.
Chuang, Yung-Chung Matt; Shiu, Yi-Shiang
2016-01-01
Tea is an important but vulnerable economic crop in East Asia, highly impacted by climate change. This study attempts to interpret tea land use/land cover (LULC) using very high resolution WorldView-2 imagery of central Taiwan with both pixel and object-based approaches. A total of 80 variables derived from each WorldView-2 band with pan-sharpening, standardization, principal components and gray level co-occurrence matrix (GLCM) texture indices transformation, were set as the input variables. For pixel-based image analysis (PBIA), 34 variables were selected, including seven principal components, 21 GLCM texture indices and six original WorldView-2 bands. Results showed that support vector machine (SVM) had the highest tea crop classification accuracy (OA = 84.70% and KIA = 0.690), followed by random forest (RF), maximum likelihood algorithm (ML), and logistic regression analysis (LR). However, the ML classifier achieved the highest classification accuracy (OA = 96.04% and KIA = 0.887) in object-based image analysis (OBIA) using only six variables. The contribution of this study is to create a new framework for accurately identifying tea crops in a subtropical region with real-time high-resolution WorldView-2 imagery without field survey, which could further aid agriculture land management and a sustainable agricultural product supply. PMID:27128915
Estimating Driving Performance Based on EEG Spectrum Analysis
NASA Astrophysics Data System (ADS)
Lin, Chin-Teng; Wu, Ruei-Cheng; Jung, Tzyy-Ping; Liang, Sheng-Fu; Huang, Teng-Yi
2005-12-01
The growing number of traffic accidents in recent years has become a serious concern to society. Accidents caused by drivers' drowsiness behind the steering wheel have a high fatality rate because of the marked decline in perception, recognition, and vehicle-control abilities while sleepy. Preventing such accidents is highly desirable but requires techniques for continuously detecting, estimating, and predicting the level of alertness of drivers and delivering effective feedback to maintain their maximum performance. This paper proposes an EEG-based drowsiness estimation system that combines electroencephalogram (EEG) log subband power spectra, correlation analysis, principal component analysis, and linear regression models to indirectly estimate a driver's drowsiness level in a virtual-reality-based driving simulator. Our results demonstrate that it is feasible to accurately and quantitatively estimate driving performance, expressed as the deviation between the center of the vehicle and the center of the cruising lane, in a realistic driving simulator.
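The estimation chain described (log subband power, then PCA, then a linear model) can be sketched with synthetic stand-in data; the signals below are simulated, not EEG, and the dimensions are arbitrary:

```python
import numpy as np

# PCA followed by linear regression, a sketch of the estimation chain;
# "power" stands in for log subband power features, "drowsiness" for the
# driving-performance proxy. All data are synthetic.
rng = np.random.default_rng(6)
n, p = 200, 30
latent = rng.normal(size=(n, 2))                          # two underlying factors
loadings = rng.normal(size=(2, p))
power = latent @ loadings + 0.1 * rng.normal(size=(n, p))
drowsiness = 1.5 * latent[:, 0] - 0.5 * latent[:, 1] + 0.05 * rng.normal(size=n)

# PCA via SVD of the centered feature matrix; keep the top two components
Xc = power - power.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T

# Linear regression of the performance proxy on the PC scores
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, drowsiness, rcond=None)
pred = A @ coef
r2 = 1 - np.sum((drowsiness - pred) ** 2) / np.sum((drowsiness - drowsiness.mean()) ** 2)
print(f"R^2 = {r2:.3f}")
```

Because the synthetic features are driven by two latent factors, the top two principal components recover that subspace and the regression explains most of the variance; in the real system the PCA step serves the same role of compressing correlated subband powers before regression.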
2014-01-01
A brief overview is provided of cosinor-based techniques for the analysis of time series in chronobiology. Conceived as a regression problem, the method is applicable to non-equidistant data, a major advantage. Another dividend is the feasibility of deriving confidence intervals for parameters of rhythmic components of known periods, readily drawn from the least squares procedure, stressing the importance of prior (external) information. Originally developed for the analysis of short and sparse data series, the extended cosinor has been further developed for the analysis of long time series, focusing both on rhythm detection and parameter estimation. Attention is given to the assumptions underlying the use of the cosinor and ways to determine whether they are satisfied. In particular, ways of dealing with non-stationary data are presented. Examples illustrate the use of the different cosinor-based methods, extending their application from the study of circadian rhythms to the mapping of broad time structures (chronomes). PMID:24725531
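The single-component cosinor reduces to linear least squares once the known period is fixed, which is why it handles non-equidistant data directly. A sketch with simulated measurements (the period, parameter values, and noise level are all hypothetical):

```python
import numpy as np

# Single-component cosinor for a known period (24 h here), with
# irregular sampling times, a key advantage of the method.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 72.0, 40))       # non-equidistant times over 3 days
period = 24.0
omega = 2.0 * np.pi / period
y = 10.0 + 3.0 * np.cos(omega * t + 1.0) + rng.normal(0.0, 0.3, t.size)

# Linearize: y = M + b1*cos(wt) + b2*sin(wt), with b1 = A*cos(phi), b2 = -A*sin(phi)
X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
mesor, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
amplitude = np.hypot(b1, b2)
acrophase = np.arctan2(-b2, b1)

print(f"MESOR={mesor:.2f}, amplitude={amplitude:.2f}, acrophase={acrophase:.2f} rad")
```

Because the model is linear in b1 and b2, confidence intervals for the MESOR, amplitude, and acrophase follow from standard least-squares theory, which is the property the overview highlights.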
Mirmohseni, A; Abdollahi, H; Rostamizadeh, K
2007-02-28
A net analyte signal (NAS)-based method called HLA/GO was applied for the selective determination of binary mixtures of ethanol and water with a quartz crystal nanobalance (QCN) sensor. A full factorial design was applied for the formation of the calibration and prediction sets in the concentration ranges 5.5-22.2 microg mL(-1) for ethanol and 7.01-28.07 microg mL(-1) for water. An optimal time range was selected by a procedure based on the calculation of the NAS regression plot in each considered time window for each test sample. A moving-window strategy was used to search for the region with maximum linearity of the NAS regression plot (minimum error indicator) and minimum PRESS value. On the basis of the results obtained, the differences in the adsorption profiles in the time range between 1 and 600 s were used to determine mixtures of both compounds by the HLA/GO method. The calculation of the net analyte signal using the HLA/GO method allows determination of several figures of merit, such as selectivity, sensitivity, analytical sensitivity and limit of detection, for each component. To check the ability of the proposed method to select linear regions of the adsorption profile, a test for detecting non-linear regions of adsorption profile data in the presence of methanol was also described. The results showed that the method was successfully applied for the determination of ethanol and water.
Estimating the Biodegradability of Treated Sewage Samples Using Synchronous Fluorescence Spectra
Lai, Tien M.; Shin, Jae-Ki; Hur, Jin
2011-01-01
Synchronous fluorescence spectra (SFS) and the first derivative spectra of the influent versus the effluent wastewater samples were compared and the use of fluorescence indices is suggested as a means to estimate the biodegradability of the effluent wastewater. Three distinct peaks were identified from the SFS of the effluent wastewater samples. Protein-like fluorescence (PLF) was reduced, whereas fulvic and/or humic-like fluorescence (HLF) were enhanced, suggesting that the two fluorescence characteristics may represent biodegradable and refractory components, respectively. Five fluorescence indices were selected for the biodegradability estimation based on the spectral features changing from the influent to the effluent. Among the selected indices, the relative distribution of PLF to the total fluorescence area of SFS (Index II) exhibited the highest correlation coefficient with total organic carbon (TOC)-based biodegradability, which was even higher than those obtained with the traditional oxygen demand-based parameters. A multiple regression analysis using Index II and the area ratio of PLF to HLF (Index III) demonstrated the enhancement of the correlations from 0.558 to 0.711 for TOC-based biodegradability. The multiple regression equations finally obtained were 0.148 × Index II − 4.964 × Index III − 0.001 and 0.046 × Index II − 1.128 × Index III + 0.026. The fluorescence indices proposed here are expected to be utilized for successful development of real-time monitoring using a simple fluorescence sensing device for the biodegradability of treated sewage. PMID:22164023
Feng, Zhaozhong; Calatayud, Vicent; Zhu, Jianguo; Kobayashi, Kazuhiko
2018-04-01
Five winter wheat cultivars were exposed to ambient (A-O3) and elevated (E-O3, 1.5 × ambient) O3 in a fully open-air fumigation system in China. Ozone exposure- and flux-based response relationships were established for seven physiological variables related to photosynthesis. The goodness of fit (R2) of the regressions increased when second-order instead of first-order regressions were used, suggesting that the effects of O3 were more pronounced towards the last developmental stages of the wheat. The most robust indicators were those related to CO2 assimilation, Rubisco activity and RuBP regeneration capacity (Asat, Jmax and Vcmax), and chlorophyll content (Chl). Flux-based metrics (PODy, phytotoxic O3 dose over a threshold of y nmol O3 m-2 s-1) predicted the responses to O3 slightly better than exposure metrics (AOTX, accumulated O3 exposure over an hourly threshold of X ppb) for most of the variables. The best performance was observed for the metrics POD1 (Asat, Jmax and Vcmax) and POD3 (Chl). For this crop, the proposed response functions could be used for O3 risk assessment based on physiological effects, and also to include the influence of O3 on yield or other variables in models with a photosynthetic component. Copyright © 2017 Elsevier B.V. All rights reserved.
Prodinger, Birgit; Cieza, Alarcos; Oberhauser, Cornelia; Bickenbach, Jerome; Üstün, Tevfik Bedirhan; Chatterji, Somnath; Stucki, Gerold
2016-06-01
To develop a comprehensive set of the International Classification of Functioning, Disability and Health (ICF) categories as a minimal standard for reporting and assessing functioning and disability in clinical populations along the continuum of care. The specific aims were to specify the domains of functioning recommended for an ICF Rehabilitation Set and to identify a minimal set of environmental factors (EFs) to be used alongside the ICF Rehabilitation Set when describing disability across individuals and populations with various health conditions. Secondary analysis of existing data sets using regression methods (Random Forests and Group Lasso regression) and expert consultations. Along the continuum of care, including acute, early postacute, and long-term and community rehabilitation settings. Persons (N=9863) with various health conditions participated in primary studies. The number of respondents for whom the dependent variable data were available and used in this analysis was 9264. Not applicable. For the regression analyses, self-reported general health was used as the dependent variable. The ICF categories from the functioning component and the EF component were used as independent variables for the development of the ICF Rehabilitation Set and the minimal set of EFs, respectively. Thirty ICF categories, to be complemented with 12 EFs, were identified as relevant to the proposed sets. The ICF Rehabilitation Set consists of 9 ICF categories from the component body functions and 21 from the component activities and participation. The minimal set of EFs contains 12 categories spanning all chapters of the EF component of the ICF. The identified sets serve as minimal generic sets of aspects of functioning in clinical populations for reporting data within and across health conditions, time, clinical settings including rehabilitation, and countries.
These sets present a reference framework for harmonizing existing information on disability across general and clinical populations. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Functional mixture regression.
Yao, Fang; Fu, Yuejiao; Lee, Thomas C M
2011-04-01
In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach, named functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR, along with simulation experiments illustrating its empirical properties, are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.
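After projection onto the leading eigenfunctions, the remaining estimation problem is a classical mixture of regressions, which is typically fitted by EM. A compact sketch on one synthetic score (the score, group structure, and initial values are illustrative assumptions, not the paper's data or software):

```python
import numpy as np

# EM for a two-component mixture of linear regressions on a scalar score,
# the kind of problem FMR reduces to after functional PCA. Synthetic data.
rng = np.random.default_rng(7)
n = 400
s = rng.normal(0.0, 1.0, n)                       # stand-in for an FPC score
group = rng.random(n) < 0.5
y = np.where(group, 1.0 + 2.0 * s, -1.0 - 1.0 * s) + rng.normal(0.0, 0.3, n)

X = np.column_stack([np.ones(n), s])
beta = np.array([[0.5, 1.0], [-0.5, -0.5]])       # rough initial fits
sigma2 = np.array([1.0, 1.0])
mix = np.array([0.5, 0.5])

for _ in range(200):
    # E-step: posterior responsibilities under Gaussian errors
    dens = np.empty((n, 2))
    for k in range(2):
        r = y - X @ beta[k]
        dens[:, k] = mix[k] * np.exp(-0.5 * r**2 / sigma2[k]) / np.sqrt(2 * np.pi * sigma2[k])
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted least squares per component
    for k in range(2):
        w = resp[:, k]
        WX = X * w[:, None]
        beta[k] = np.linalg.solve(X.T @ WX, X.T @ (w * y))
        sigma2[k] = np.sum(w * (y - X @ beta[k]) ** 2) / w.sum()
    mix = resp.mean(axis=0)

print("component coefficients:", beta)
```

With well-separated components the EM iterations recover one regression line per latent group, which is the behavior FMR exploits when subjects fall into distinct response regimes.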
Song, Xiao-Dong; Zhang, Gan-Lin; Liu, Feng; Li, De-Cheng; Zhao, Yu-Guo
2016-11-01
The influence of anthropogenic activities and natural processes introduces high uncertainty into the spatial variation modeling of soil available zinc (AZn) in plain river network regions. Four datasets with different sampling densities were split over the Qiaocheng district of Bozhou City, China. The differences in AZn concentrations among soil types were analyzed by principal component analysis (PCA). Since stationarity was not indicated and the effective ranges of the four datasets were larger than the sampling extent (about 400 m), two investigation tools, namely the F3 test and the stationarity index (SI), were employed to test for local non-stationarity. The geographically weighted regression (GWR) technique was performed to describe the spatial heterogeneity of AZn concentrations under the non-stationarity assumption. GWR based on grouped soil type information (GWRG for short) was proposed to benefit the local modeling of soil AZn within each soil-landscape unit. For reference, the multiple linear regression (MLR) model, a global regression technique, was also employed and incorporated the same predictors as the GWR models. Validation results based on 100 realizations demonstrated that GWRG outperformed MLR and can produce similar or better accuracy than the GWR approach. Moreover, GWRG can generate better soil maps than GWR for limited soil data. Two-sample t tests of the produced soil maps also confirmed significantly different means. Variogram analysis of the model residuals exhibited weak spatial correlation, rejecting the use of hybrid kriging techniques. As a heuristic statistical method, GWRG was beneficial in this study and is potentially applicable to other soil properties.
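At its core, GWR is a weighted least-squares fit repeated at each location, with weights from a distance kernel. A minimal sketch with synthetic coordinates and a spatially drifting coefficient (not the soil data; the bandwidth is an arbitrary choice here) showing how the estimated slope varies over space:

```python
import numpy as np

def gwr_coefficients(coords, x, y, target, bandwidth):
    """Local weighted least squares at one target location (Gaussian kernel)."""
    Xd = np.column_stack([np.ones(len(x)), x])
    d2 = np.sum((coords - target) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))      # Gaussian distance weights
    WX = Xd * w[:, None]
    return np.linalg.solve(Xd.T @ WX, Xd.T @ (w * y))

rng = np.random.default_rng(2)
n = 300
coords = rng.uniform(0.0, 10.0, (n, 2))
x = rng.normal(0.0, 1.0, n)
slope = 1.0 + 0.3 * coords[:, 0]                  # coefficient drifts west -> east
y = 2.0 + slope * x + rng.normal(0.0, 0.1, n)

beta_west = gwr_coefficients(coords, x, y, np.array([1.0, 5.0]), 1.5)
beta_east = gwr_coefficients(coords, x, y, np.array([9.0, 5.0]), 1.5)
print("local slopes:", beta_west[1], beta_east[1])
```

A global MLR would return one averaged slope for the whole area; the two local fits recover clearly different slopes, which is exactly the non-stationarity GWR (and GWRG, with soil-type grouping added) is designed to capture.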
Popa, Laurentiu S.; Hewitt, Angela L.; Ebner, Timothy J.
2012-01-01
The cerebellum has been implicated in processing motor errors required for online control of movement and motor learning. The dominant view is that Purkinje cell complex spike discharge signals motor errors. This study investigated whether errors are encoded in the simple spike discharge of Purkinje cells in monkeys trained to manually track a pseudo-randomly moving target. Four task error signals were evaluated based on cursor movement relative to target movement. Linear regression analyses based on firing residuals ensured that the modulation with a specific error parameter was independent of the other error parameters and kinematics. The results demonstrate that simple spike firing in lobules IV–VI is significantly correlated with position, distance and directional errors. Independent of the error signals, the same Purkinje cells encode kinematics. The strongest error modulation occurs at feedback timing. However, in 72% of cells at least one of the R2 temporal profiles resulting from regressing firing with individual errors exhibits two peak R2 values. For these bimodal profiles, the first peak is at a negative τ (lead) and the second at a positive τ (lag), implying that Purkinje cells encode both prediction of and feedback about an error. For the majority of the bimodal profiles, the signs of the regression coefficients or preferred directions reverse at the times of the peaks. The sign reversal results in opposing simple spike modulation for the predictive and feedback components. Dual error representations may provide the signals needed to generate sensory prediction errors used to update a forward internal model. PMID:23115173
NASA Astrophysics Data System (ADS)
Zimmerling, Clemens; Dörr, Dominik; Henning, Frank; Kärger, Luise
2018-05-01
Due to their high mechanical performance, continuous fibre reinforced plastics (CoFRP) become increasingly important for load bearing structures. In many cases, manufacturing CoFRPs comprises a forming process of textiles. To predict and optimise the forming behaviour of a component, numerical simulations are applied. However, for maximum part quality, the geometry and the process parameters must be matched to one another, which in turn requires numerous numerically expensive optimisation iterations. In both textile and metal forming, a lot of research has focused on determining optimum process parameters whilst regarding the geometry as invariable. In this work, a meta-model-based approach on component level is proposed that provides a rapid estimation of the formability for variable geometries based on pre-sampled, physics-based draping data. Initially, a geometry recognition algorithm scans the geometry and extracts a set of doubly-curved regions with relevant geometry parameters. If the relevant parameter space is not part of an underlying database, additional samples are drawn via Finite-Element draping simulations according to a suitable design table for computer experiments. Time-saving parallel runs of the physical simulations accelerate the data acquisition. Ultimately, a Gaussian regression meta-model is built from the database. The method is demonstrated on a box-shaped generic structure. The predicted results are in good agreement with physics-based draping simulations. Since evaluations of the established meta-model are numerically inexpensive, any further design exploration (e.g. robustness analysis or design optimisation) can be performed in short time. It is expected that the proposed method also offers great potential for future applications along virtual process chains: for each process step along the chain, a meta-model can be set up to predict the impact of design variations on manufacturability and part performance.
Thus, the method is considered to facilitate a lean and economic part and process design under consideration of manufacturing effects.
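The meta-model step can be illustrated with a small kernel-based interpolant, the posterior mean of a Gaussian process under a noise-free assumption. The geometry parameter, the response function, and the length scale below are placeholders for the pre-sampled draping data, not the paper's actual surrogate:

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel between two sets of points."""
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * d2 / length_scale**2)

# Pre-sampled "draping simulation" results: geometry parameter -> response
X_train = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * X_train[:, 0])   # hypothetical formability score

K = rbf_kernel(X_train, X_train, 0.2) + 1e-8 * np.eye(len(X_train))
alpha = np.linalg.solve(K, y_train)             # precompute K^{-1} y once

def gp_mean(x_new):
    """GP posterior mean for noise-free training data: k(x*, X) @ K^{-1} y."""
    return rbf_kernel(np.atleast_2d(np.asarray(x_new, float)), X_train, 0.2) @ alpha

print(gp_mean([[0.25]])[0])
```

Once the expensive simulations are done, each evaluation of `gp_mean` is a cheap matrix-vector product, which is what makes design exploration on top of the meta-model fast.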
A use of regression analysis in acoustical diagnostics of gear drives
NASA Technical Reports Server (NTRS)
Balitskiy, F. Y.; Genkin, M. D.; Ivanova, M. A.; Kobrinskiy, A. A.; Sokolova, A. G.
1973-01-01
A study is presented of components of the vibration spectrum, namely the filtered first and second harmonics of the tooth frequency, which permits information to be obtained on the physical characteristics of the vibration excitation process and allows comparison of models of the gearing. Regression analysis of the two random processes showed a strong dependence of the second harmonic on the first, and independence of the first from the second. The nature of the change in the regression line with change in loading moment suggests a variable phase shift between the first and second harmonics.
Quantile regression in the presence of monotone missingness with sensitivity analysis
Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.
2016-01-01
In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis which is an essential component in inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008
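Quantile regression at a single quantile can be sketched by minimizing the pinball (check) loss. This toy example uses complete synthetic data with no missingness, so it illustrates only the loss being optimized, not the authors' pattern-mixture machinery:

```python
import numpy as np
from scipy.optimize import minimize

def pinball_loss(beta, X, y, tau):
    """Check (pinball) loss whose minimizer is the tau-th conditional quantile."""
    r = y - X @ beta
    return np.sum(np.where(r >= 0, tau * r, (tau - 1.0) * r))

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)    # symmetric noise: median = mean line
X = np.column_stack([np.ones(n), x])

# Start from the OLS fit, then refine with a derivative-free method,
# since the pinball loss is convex but not differentiable at zero residuals
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
res = minimize(pinball_loss, beta0, args=(X, y, 0.5), method="Nelder-Mead")
print("median-regression coefficients:", res.x)
```

Setting tau to, say, 0.9 instead of 0.5 would estimate the conditional 90th percentile; modeling several quantiles jointly under monotone missingness is what the paper's constrained pattern-mixture formulation adds on top of this basic loss.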
Forecasting daily meteorological time series using ARIMA and regression models
NASA Astrophysics Data System (ADS)
Murat, Małgorzata; Malinowska, Iwona; Gos, Magdalena; Krzyszczak, Jaromir
2018-04-01
The daily air temperature and precipitation time series recorded between January 1, 1980 and December 31, 2010 at four European sites (Jokioinen, Dikopshof, Lleida and Lublin) from different climatic zones were modeled and forecasted. In our forecasting we used the Box-Jenkins and Holt-Winters seasonal autoregressive integrated moving-average methods, the autoregressive integrated moving-average with external regressors in the form of Fourier terms, and time series regression including trend and seasonality components, implemented with R software. It was demonstrated that the obtained models are able to capture the dynamics of the time series data and to produce sensible forecasts.
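The time series regression variant with trend and seasonality components amounts to ordinary least squares on a Fourier design matrix. A sketch with a synthetic daily temperature series standing in for the station records (all parameter values are hypothetical):

```python
import numpy as np

# Time series regression: intercept + linear trend + one annual Fourier pair,
# fitted by least squares on a synthetic daily temperature series.
rng = np.random.default_rng(4)
n_days = 3 * 365
t = np.arange(n_days)
temp = 8.0 + 0.001 * t + 10.0 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0.0, 2.0, n_days)

def design(tt):
    """Design matrix with trend and annual Fourier terms."""
    return np.column_stack([np.ones(len(tt)), tt,
                            np.sin(2 * np.pi * tt / 365.25),
                            np.cos(2 * np.pi * tt / 365.25)])

coef, *_ = np.linalg.lstsq(design(t), temp, rcond=None)

# Forecast the next 30 days by extrapolating the fitted structure
t_future = np.arange(n_days, n_days + 30)
forecast = design(t_future) @ coef
print("first forecast day:", forecast[0])
```

Adding more Fourier pairs sharpens the seasonal shape, and in the ARIMA-with-Fourier-regressors approach the same harmonic terms enter as external regressors while the ARIMA part models the remaining autocorrelation.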
Gautam, Rajesh K.; Kapoor, Anup K.; Kshatriya, G. K.
2009-01-01
The present investigation of fertility and mortality differentials among the Kinnaura of the Himalayan highland is based on data collected from 160 post-menopausal women belonging to the middle- and high-altitude regions of Kinnaur district of Himachal Pradesh (Indian Himalayas). Selection potential based on differential fertility and mortality was computed for middle- and high-altitude women. Irrespective of the methodology, the total index of selection was found to be higher among middle-altitude women (0.386) than among high-altitude women (0.370), whereas for the total population it is estimated to be 0.384. The Kinnaura of the Himalayan highland show a moderate index of total selection, and the relative contribution of the mortality component (Im) to the index of total selection is higher than that of the corresponding fertility component (If). The analysis of embryonic and post-natal mortality components shows that the post-natal mortality components are higher than the embryonic mortality components among highlanders and need special intervention and health care. The present findings are compared with other Indian tribes as well as non-tribal populations of the Himalayan region and other parts of the country. This comparison reveals that the index among the Kinnaura is more moderate than in other population groups; among Himalayan populations, the highest was reported for the Galong (It = 1.07) of Arunachal, whereas the lowest was reported for the Ahom (It = 0.218) of Manipur. The correlation and regression analysis between the total index of selection (It) and the fertility (If) and mortality (Im) components for pooled data from populations of the Indian Himalayan states shows that If and Im account for 21.6 and 29.1% of the variability, respectively, in Crow's total index of selection (It), along with a strong association significant at the 1% level, indicating that mortality plays a greater role than fertility in natural selection among populations of the Indian Himalayas. PMID:21088718
Ho, Hsing-Hao; Li, Ya-Hui; Lee, Jih-Chin; Wang, Chih-Wei; Yu, Yi-Lin; Hueng, Dueng-Yuan; Hsu, Hsian-He
2018-01-01
Purpose: We estimated the volume of vestibular schwannomas by an ice cream cone formula using thin-sliced magnetic resonance images (MRI) and compared the estimation accuracy among different estimating formulas and between different models. Methods: The study was approved by a local institutional review board. A total of 100 patients with vestibular schwannomas examined by MRI between January 2011 and November 2015 were enrolled retrospectively. Informed consent was waived. Volumes of vestibular schwannomas were estimated by cuboidal, ellipsoidal, and spherical formulas based on a one-component model, and cuboidal, ellipsoidal, Linskey's, and ice cream cone formulas based on a two-component model. The estimated volumes were compared to the volumes measured by planimetry. Intraobserver reproducibility and interobserver agreement were tested. Estimation error, including absolute percentage error (APE) and percentage error (PE), was calculated. Statistical analysis included intraclass correlation coefficient (ICC), linear regression analysis, one-way analysis of variance, and paired t-tests, with P < 0.05 considered statistically significant. Results: Overall tumor size was 4.80 ± 6.8 mL (mean ± standard deviation). All ICCs were no less than 0.992, suggestive of high intraobserver reproducibility and high interobserver agreement. Cuboidal formulas significantly overestimated the tumor volume by a factor of 1.9 to 2.4 (P ≤ 0.001). The one-component ellipsoidal and spherical formulas overestimated the tumor volume with an APE of 20.3% and 29.2%, respectively. The two-component ice cream cone method, and the ellipsoidal and Linskey's formulas, significantly reduced the APE to 11.0%, 10.1%, and 12.5%, respectively (all P < 0.001). Conclusion: The ice cream cone method and other two-component formulas, including the ellipsoidal and Linskey's formulas, allow for estimation of vestibular schwannoma volume more accurately than all one-component formulas. PMID:29438424
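The one-component formulas named in the abstract are standard; a small sketch (with hypothetical axis lengths) makes the reported roughly 1.9× cuboidal overestimation explicit, since the cuboidal-to-ellipsoidal ratio is fixed at 6/π ≈ 1.91. The paper's two-component ice cream cone formula is not reproduced in the abstract, so it is omitted here:

```python
import math

def cuboidal(a, b, c):
    """One-component cuboidal formula: V = a*b*c (three axis lengths)."""
    return a * b * c

def ellipsoidal(a, b, c):
    """One-component ellipsoidal formula: V = (pi/6)*a*b*c."""
    return math.pi / 6.0 * a * b * c

def spherical(d):
    """Spherical formula from a single mean diameter: V = (pi/6)*d**3."""
    return math.pi / 6.0 * d ** 3

# Hypothetical tumour axes (cm); values are illustrative only
a, b, c = 2.4, 1.8, 1.6
print(cuboidal(a, b, c) / ellipsoidal(a, b, c))  # always 6/pi, about 1.91
```

That fixed 6/π ratio sits at the low end of the 1.9 to 2.4 overestimation range reported against planimetry, consistent with real tumours being somewhat smaller than a perfect ellipsoid of the same axes.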
A Developmental Model of Cross-Cultural Competence at the Tactical Level
2010-11-01
components of 3C and describe how 3C develops in Soldiers. Five components of 3C were identified: Cultural Maturity, Cognitive Flexibility, Cultural...a result of the data analysis: Cultural Maturity, Cognitive Flexibility, Cultural Knowledge, Cultural Acuity, and Interpersonal Skills. These five...create regressions in the 3C development process. In short, KSAAs mature interdependently and simultaneously. Thus, development and transitions across
Puchtel, I.S.; Walker, R.J.; James, O.B.; Kring, D.A.
2008-01-01
To characterize the compositions of materials accreted to the Earth-Moon system between about 4.5 and 3.8 Ga, we have determined Os isotopic compositions and some highly siderophile element (HSE: Re, Os, Ir, Ru, Pt, and Pd) abundances in 48 subsamples of six lunar breccias. These are: Apollo 17 poikilitic melt breccias 72395 and 76215; Apollo 17 aphanitic melt breccias 73215 and 73255; Apollo 14 polymict breccia 14321; and lunar meteorite NWA482, a crystallized impact melt. Plots of Ir versus other HSE define excellent linear correlations, indicating that all data sets likely represent dominantly two-component mixtures of a low-HSE target, presumably endogenous component, and a high-HSE, presumably exogenous component. Linear regressions of these trends yield intercepts that are statistically indistinguishable from zero for all HSE, except for Ru and Pd in two samples. The slopes of the linear regressions are insensitive to target rock contributions of Ru and Pd of the magnitude observed; thus, the trendline slopes approximate the elemental ratios present in the impactor components contributed to these rocks. The 187Os/188Os and regression-derived elemental ratios for the Apollo 17 aphanitic melt breccias and the lunar meteorite indicate that the impactor components in these samples have close affinities to chondritic meteorites. The HSE in the Apollo 17 aphanitic melt breccias, however, might partially or entirely reflect the HSE characteristics of HSE-rich granulitic breccia clasts that were incorporated in the impact melt at the time of its creation. In this case, the HSE characteristics of these rocks may reflect those of an impactor that predated the impact event that led to the creation of the melt breccias. The impactor components in the Apollo 17 poikilitic melt breccias and in the Apollo 14 breccia have higher 187Os/188Os, Pt/Ir, and Ru/Ir and lower Os/Ir than most chondrites. 
These compositions suggest that the impactors they represent were chemically distinct from known chondrite types, and possibly represent a type of primitive material not currently delivered to Earth as meteorites. © 2008 Elsevier Ltd.
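The regression logic in the study, a two-component mixture producing linear element-vs-element trends whose slope approximates the impactor's elemental ratio when the intercept is near zero, can be sketched with simple synthetic numbers (the concentrations and the 1.05 ratio below are hypothetical, not the measured lunar data):

```python
import numpy as np
from scipy.stats import linregress

# Two-component mixing: each subsample is endogenous + exogenous material,
# so an Os-vs-Ir scatter is linear, with slope ~ the impactor Os/Ir ratio
# and an intercept near the small endogenous contribution.
rng = np.random.default_rng(8)
f = rng.uniform(0.0, 1.0, 20)                      # fraction of impactor HSE
ir = 0.01 + 5.00 * f + rng.normal(0.0, 0.05, 20)   # toy ppb concentrations
os = 0.011 + 5.25 * f + rng.normal(0.0, 0.05, 20)  # assumed impactor Os/Ir = 1.05

fit = linregress(ir, os)
print("Os/Ir slope:", fit.slope, "intercept:", fit.intercept)
```

A statistically zero intercept is what justifies reading the slope as the impactor component's elemental ratio, which is how the paper compares the regression-derived ratios against chondrite compositions.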
Meijster, Tim; Burstyn, Igor; Van Wendel De Joode, Berna; Posthumus, Maarten A; Kromhout, Hans
2004-08-01
The goal of this study was to monitor emission of chemicals at a factory where plastics products were fabricated by a new robotic (impregnated tape winding) production process. Stationary and personal air measurements were taken to determine which chemicals were released and at what concentrations. Principal component analyses (PCA) and linear regression were used to determine the emission sources of different chemicals found in the air samples. We showed that complex mixtures of chemicals were released, but most concentrations were below Dutch exposure limits. Based on the results of the principal component analyses, the chemicals found were divided into three groups. The first group consisted of short chain aliphatic hydrocarbons (C2-C6). The second group included larger hydrocarbons (C9-C11) and some cyclic hydrocarbons. The third group contained all aromatic and two aliphatic hydrocarbons. Regression analyses showed that emission of the first group of chemicals was associated with cleaning activities and the use of epoxy resins. The second and third group showed strong association with the type of tape used in the new tape winding process. High levels of CO and HCN (above exposure limits) were measured on one occasion when a different brand of impregnated polypropylene sulphide tape was used in the tape winding process. Plans exist to drastically increase production with the new tape winding process. This will cause exposure levels to rise and therefore further control measures should be installed to reduce release of these chemicals.
NASA Astrophysics Data System (ADS)
Storm, Emma; Weniger, Christoph; Calore, Francesca
2017-08-01
We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳10⁵) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |l| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as a basis for future studies of diffuse emission in and outside the Galactic disk.
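A minimal sketch of penalized Poisson likelihood regression in the spirit of SkyFACT: a fixed model template is modulated pixel-by-pixel by nuisance parameters, and a quadratic penalty keeps the modulation small. The template, counts, and penalty strength below are illustrative assumptions, not the SkyFACT implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup: 200 "pixels" with a predicted-counts template and observed counts
# drawn from a mildly mis-modeled Poisson process.
rng = np.random.default_rng(2)
template = np.full(200, 10.0)                 # predicted counts per pixel
counts = rng.poisson(template * np.exp(0.1 * rng.normal(size=200)))
tau = 10.0                                    # assumed regularization strength

def objective(theta):
    mu = template * np.exp(theta)             # modulated model prediction
    nll = np.sum(mu - counts * np.log(mu))    # Poisson negative log-likelihood
    return nll + tau * np.sum(theta**2)       # penalty on the modulations

def gradient(theta):
    mu = template * np.exp(theta)
    return (mu - counts) + 2.0 * tau * theta

# Convex problem in theta, solved with L-BFGS-B as in the abstract
res = minimize(objective, np.zeros(200), jac=gradient, method="L-BFGS-B")
```

The real method uses MEM-motivated regularizers and far larger parameter vectors; this only shows why a quasi-Newton method with analytic gradients scales to many nuisance parameters.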
Seo, Chang-Seob; Kim, Seong-Sil; Ha, Hyekyung
2013-01-01
This study was designed to perform simultaneous determination of three reference compounds in Syzygium aromaticum (SA), namely gallic acid, ellagic acid, and eugenol, and to investigate the chemical antagonistic effect of combining Curcuma aromatica (CA) with SA, based on chromatographic analysis. The LODs and LOQs were 0.01–0.11 μg/mL and 0.03–0.36 μg/mL, respectively. The intraday and interday precisions had RSD values <3.0%, and the recovery was in the range of 92.19–103.24%, with RSD values <3.0%. Repeatability and stability were 0.38–0.73% and 0.49–2.24%, respectively. Comparing the contents of the reference compounds and relative peaks between SA and SA combined with CA (SAC), the amounts of gallic acid and eugenol were increased while that of ellagic acid was decreased in SAC, and most of the peak areas in SA were reduced in SAC. Regression analysis of the relative peak areas between SA and SAC showed r² values >0.87, indicating a linear relationship between SA and SAC. These results demonstrate that the components contained in CA could affect the extraction of components of SA, mainly in a decreasing manner. The antagonistic effect of CA on SA was verified by chemical analysis. PMID:23878761
Impact of Dental Disorders and its Influence on Self Esteem Levels among Adolescents.
Kaur, Puneet; Singh, Simarpreet; Mathur, Anmol; Makkar, Diljot Kaur; Aggarwal, Vikram Pal; Batra, Manu; Sharma, Anshika; Goyal, Nikita
2017-04-01
Self-esteem is largely a psychological concept; therefore, even common dental disorders like dental trauma, tooth loss and untreated carious lesions may affect self-esteem and thus influence quality of life. This study aims to assess the impact of dental disorders on the self-esteem levels of adolescents. The present cross-sectional study was conducted among adolescents aged 10 to 17 years. In order to obtain a representative sample, a multistage sampling technique was used and the sample was selected based on Probability Proportional to Enrolment size (PPE). Oral health assessment was carried out using the WHO type III examination, and self-esteem was estimated using the Rosenberg Self Esteem Scale (RSES) score. Descriptive and inferential analysis of the data was done using IBM SPSS software. Logistic and linear regression analyses were executed to test the individual association of different independent clinical variables with self-esteem. A total sample of 1140 adolescents with a mean age of 14.95 ± 2.08 years and a mean RSES score of 27.09 ± 3.12 was considered. Stepwise multiple linear regression analysis was applied, and the best predictors in relation to RSES in descending order were the Dental Health Component (DHC), Aesthetic Component (AC), dental decay {(aesthetic zone), (masticatory zone)}, tooth loss {(aesthetic zone), (masticatory zone)} and anterior tooth fracture. It was found that various dental disorders like malocclusion, anterior traumatic tooth, tooth loss and untreated decay have a profound impact on the aesthetics and psychosocial behaviour of adolescents, thus affecting their self-esteem.
[Geographical distribution of left ventricular Tei index based on principal component analysis].
Xu, Jinhui; Ge, Miao; He, Jinwei; Xue, Ranyin; Yang, Shaofang; Jiang, Jilin
2014-11-01
To provide a scientific standard of the left ventricular Tei index for healthy people from various regions of China, and to lay a reliable foundation for the evaluation of left ventricular diastolic and systolic function. Correlation and principal component analysis were used to explore the left ventricular Tei index, based on data from 3 562 samples from 50 regions of China obtained by literature retrieval. The nine geographical factors were longitude (X₁), latitude (X₂), altitude (X₃), annual sunshine hours (X₄), annual average temperature (X₅), annual average relative humidity (X₆), annual precipitation (X₇), annual temperature range (X₈) and annual average wind speed (X₉). ArcGIS software was applied to calculate the spatial distribution regularities of the left ventricular Tei index. There is a significant correlation between healthy people's left ventricular Tei index and geographical factors; the correlation coefficients were -0.107 (r₁), -0.301 (r₂), -0.029 (r₃), -0.277 (r₄), -0.256 (r₅), -0.289 (r₆), -0.320 (r₇), -0.310 (r₈) and -0.117 (r₉), respectively. A linear equation between the Tei index and the geographical factors was obtained by regression analysis based on the three extracted principal components. The geographical distribution tendency chart for healthy people's left ventricular Tei index was fitted out by ArcGIS spatial interpolation analysis. The geographical distribution of the left ventricular Tei index in China follows a certain pattern: the reference value in the North is higher than that in the South, while the value in the East is higher than that in the West.
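The principal-component-regression strategy used above (regress the response on a few components extracted from many correlated geographical factors) can be sketched as follows; the nine predictors and the "Tei index" response are synthetic, not the study's data.

```python
import numpy as np

# Synthetic setup: nine correlated geographical predictors generated from
# three latent factors, and a response that is linear in the predictors.
rng = np.random.default_rng(3)
n = 120
latent = rng.normal(size=(n, 3))            # three underlying factors (assumed)
W = rng.normal(size=(3, 9))
X = latent @ W                               # nine correlated predictors
y = X @ rng.normal(size=9) + 0.35            # synthetic "Tei index"

# Principal component regression: standardize, keep 3 PCs, regress on scores
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt[:3].T                       # component scores
beta, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
y_hat = scores @ beta + y.mean()
```

Because the synthetic predictors truly have rank three, three components reproduce the response; with real data one would choose the number of components from the explained variance.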
Suskind, Anne M; Clemens, J Quentin; Kaufman, Samuel R; Stoffel, John T; Oldendorf, Ann; Malaeb, Bahaa S; Jandron, Teresa; Cameron, Anne P
2015-03-01
To determine predictors of physical and emotional discomfort associated with urodynamic testing in men and women both with and without neurologic conditions. An anonymous questionnaire-based study was completed by patients immediately after undergoing fluoroscopic urodynamic testing. Participants were asked questions pertaining to their perceptions of physical and emotional discomfort related to the study, their urologic and general health history, and demographics. Logistic regression was performed to determine predictors of physical and emotional discomfort. A total of 314 patients completed the questionnaire representing a response rate of 60%. Half of the respondents (50.7%) felt that the examination was neither physically nor emotionally uncomfortable, whereas 29.0% and 12.4% of respondents felt that the physical and emotional components of the examination were most uncomfortable, respectively. Placement of the urethral catheter was the most commonly reported component of physical discomfort (42.9%), whereas anxiety (27.7%) was the most commonly reported component of emotional discomfort. Presence of a neurologic problem (odds ratio, 0.273; 95% confidence interval, 0.121-0.617) and older age (odds ratio, 0.585; 95% confidence interval, 0.405-0.847) were factors associated with less physical discomfort. There were no significant predictors of emotional discomfort based on our model. Urodynamic studies were well tolerated regardless of gender. Presence of a neurologic condition and older age were predictors of less physical discomfort. These findings are useful in counseling patients regarding what to expect when having urodynamic procedures. Copyright © 2015 Elsevier Inc. All rights reserved.
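The logistic-regression-with-odds-ratios analysis reported above can be sketched from first principles; the data are simulated, with effect directions (neurologic status and older age reduce the odds of discomfort) chosen only to mirror the reported pattern.

```python
import numpy as np
from scipy.special import expit

# Simulated cohort: binary neurologic status, standardized age, and a binary
# discomfort outcome generated from assumed (not measured) coefficients.
rng = np.random.default_rng(4)
n = 2000
neuro = rng.integers(0, 2, n).astype(float)
age = rng.normal(size=n)
X = np.column_stack([np.ones(n), neuro, age])
true_beta = np.array([0.3, -1.3, -0.6])
y = (rng.random(n) < expit(X @ true_beta)).astype(float)

# Logistic regression fit by Newton/IRLS iterations
beta = np.zeros(3)
for _ in range(25):
    mu = expit(X @ beta)                      # fitted probabilities
    Wd = mu * (1 - mu)                        # IRLS weights
    H = X.T @ (X * Wd[:, None])               # negative Hessian
    beta = beta + np.linalg.solve(H, X.T @ (y - mu))

odds_ratios = np.exp(beta[1:])                # OR for neurologic status and age
```

Confidence intervals like those in the abstract would come from the inverse of `H` at convergence (Wald intervals); they are omitted here for brevity.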
Stimulation artifact correction method for estimation of early cortico-cortical evoked potentials.
Trebaul, Lena; Rudrauf, David; Job, Anne-Sophie; Mălîia, Mihai Dragos; Popa, Irina; Barborica, Andrei; Minotti, Lorella; Mîndruţă, Ioana; Kahane, Philippe; David, Olivier
2016-05-01
Effective connectivity can be explored using direct electrical stimulations in patients suffering from drug-resistant focal epilepsies and investigated with intracranial electrodes. Responses to brief electrical pulses mimic the physiological propagation of signals and manifest as cortico-cortical evoked potentials (CCEP). The first CCEP component is believed to reflect direct connectivity with the stimulated region but the stimulation artifact, a sharp deflection occurring during a few milliseconds, frequently contaminates it. In order to recover the characteristics of early CCEP responses, we developed an artifact correction method based on electrical modeling of the electrode-tissue interface. The biophysically motivated artifact templates are then regressed out of the recorded data as in any classical template-matching removal artifact methods. Our approach is able to make the distinction between the physiological responses time-locked to the stimulation pulses and the non-physiological component. We tested the correction on simulated CCEP data in order to quantify its efficiency for different stimulation and recording parameters. We demonstrated the efficiency of the new correction method on simulations of single trial recordings for early responses contaminated with the stimulation artifact. The results highlight the importance of sampling frequency for an accurate analysis of CCEP. We then applied the approach to experimental data. The model-based template removal was compared to a correction based on the subtraction of the averaged artifact. This new correction method of stimulation artifact will enable investigators to better analyze early CCEP components and infer direct effective connectivity in future CCEP studies. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
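The core idea of regressing an artifact template out of a recorded trace can be shown in a simplified form; the template shape, amplitude, and response waveform below are invented, and a single least-squares scale stands in for the paper's biophysically modeled templates.

```python
import numpy as np

# Simplified single-trial model: recorded = response + scale * artifact template.
t = np.arange(500)
template = np.zeros(500)
template[:8] = np.array([0.0, 4.0, -3.0, 1.5, -0.7, 0.3, -0.1, 0.0])  # brief artifact

# A CCEP-like response peaking after the artifact window (assumed shape)
response = np.where(t >= 8, np.exp(-(t - 30.0)**2 / 200.0), 0.0)
recorded = response + 3.5 * template                                   # contaminated trace

# Least-squares estimate of the artifact scale, then regress the template out
scale = (template @ recorded) / (template @ template)
corrected = recorded - scale * template
```

In the paper the template comes from an electrical model of the electrode-tissue interface rather than being known a priori, but the removal step is the same template regression shown here.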
Ocké, Marga C
2013-05-01
This paper aims to describe different approaches for studying the overall diet with advantages and limitations. Studies of the overall diet have emerged because the relationship between dietary intake and health is very complex with all kinds of interactions. These cannot be captured well by studying single dietary components. Three main approaches to study the overall diet can be distinguished. The first method is researcher-defined scores or indices of diet quality. These are usually based on guidelines for a healthy diet or on diets known to be healthy. The second approach, using principal component or cluster analysis, is driven by the underlying dietary data. In principal component analysis, scales are derived based on the underlying relationships between food groups, whereas in cluster analysis, subgroups of the population are created with people that cluster together based on their dietary intake. A third approach includes methods that are driven by a combination of biological pathways and the underlying dietary data. Reduced rank regression defines linear combinations of food intakes that maximally explain nutrient intakes or intermediate markers of disease. Decision tree analysis identifies subgroups of a population whose members share dietary characteristics that influence (intermediate markers of) disease. It is concluded that all approaches have advantages and limitations and essentially answer different questions. The third approach is still more in an exploration phase, but seems to have great potential with complementary value. More insight into the utility of conducting studies on the overall diet can be gained if more attention is given to methodological issues.
Dean, Jamie A; Wong, Kee H; Gay, Hiram; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Oh, Jung Hun; Apte, Aditya; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Deasy, Joseph O; Nutting, Christopher M; Gulliford, Sarah L
2016-11-15
Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue-sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares-logistic regression [FPLS-LR] and functional principal component-logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate-response associations, assessed using bootstrapping. The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/-0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/-0.96, 0.79/-0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. 
Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Morita, Yuko; Sasai-Sakuma, Taeko; Asaoka, Shoichi; Inoue, Yuichi
2015-10-15
This study investigated the prevalence and risk factors of insufficient sleep syndrome (ISS), and factors associated with daytime dysfunction in the disorder in Japanese young adults. In this cross-sectional study, a web-based questionnaire survey was used to assess demographic variables, sleep habits and quality, depressive symptoms, and health-related quality of life (HRQOL) in 2,276 participants aged 20-25. Eleven percent of participants were classified as having ISS. Multiple logistic regression analysis revealed that the presence of ISS was significantly associated with social status (student or full-time employee). The participants with ISS had significantly higher depression scores and lower mental component summary scores than healthy sleepers. In the participants with ISS, a delayed sleep-wake schedule was extracted as a factor associated with worse mental component summary. Results indicate a relatively high proportion of Japanese young adults suffer from ISS, and that the condition is associated with a social status of student or full-time employee. Moreover, a delayed sleep-wake schedule may lead to further deterioration of mental HRQOL in ISS-affected persons. © 2015 American Academy of Sleep Medicine.
Hanke, Alexander T; Tsintavi, Eleni; Ramirez Vazquez, Maria Del Pilar; van der Wielen, Luuk A M; Verhaert, Peter D E M; Eppink, Michel H M; van de Sandt, Emile J A X; Ottens, Marcel
2016-09-01
Knowledge-based development of chromatographic separation processes requires efficient techniques to determine the physicochemical properties of the product and the impurities to be removed. These characterization techniques are usually divided into approaches that determine molecular properties, such as charge, hydrophobicity and size, or molecular interactions with auxiliary materials, commonly in the form of adsorption isotherms. In this study we demonstrate the application of a three-dimensional liquid chromatography approach to a clarified cell homogenate containing a therapeutic enzyme. Each separation dimension determines a molecular property relevant to the chromatographic behavior of each component. Matching of the peaks across the different separation dimensions and against a high-resolution reference chromatogram allows the determined parameters to be assigned to pseudo-components, revealing the most promising technique for the removal of each impurity. More detailed process design using mechanistic models requires isotherm parameters. For this purpose, the second dimension consists of multiple linear gradient separations on columns in a high-throughput screening compatible format, which allow regression of isotherm parameters with an average standard error of 8%. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:1283-1291, 2016. © 2016 American Institute of Chemical Engineers.
Ozdemir, Durmus; Dinc, Erdal
2004-07-01
Simultaneous determination of binary mixtures of pyridoxine hydrochloride and thiamine hydrochloride in a vitamin combination was demonstrated using UV-visible spectrophotometry with classical least squares (CLS) and three newly developed genetic algorithm (GA) based multivariate calibration methods. The three genetic multivariate calibration methods are Genetic Classical Least Squares (GCLS), Genetic Inverse Least Squares (GILS) and Genetic Regression (GR). The sample data set contains the UV-visible spectra of 30 synthetic mixtures (8 to 40 μg/mL) of these vitamins and 10 tablets containing 250 mg of each vitamin. The spectra cover the range from 200 to 330 nm in 0.1 nm intervals. Several calibration models were built with the four methods for the two components. Overall, the standard error of calibration (SEC) and the standard error of prediction (SEP) for the synthetic data were in the range of <0.01 to 0.43 μg/mL for all four methods. The SEP values for the tablets were in the range of 2.91 to 11.51 mg/tablet. A comparison of the wavelengths selected by the genetic algorithm for each component using the GR method was also included.
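The CLS step above rests on Beer-Lambert additivity: a mixture spectrum is a linear combination of pure-component spectra, so concentrations follow from least squares. The Gaussian "spectra" below are stand-ins, not real pyridoxine or thiamine absorbances.

```python
import numpy as np

# Assumed pure-component "spectra" over the 200-330 nm range used in the study
wavelengths = np.linspace(200.0, 330.0, 651)
pure_pyridoxine = np.exp(-(wavelengths - 290.0)**2 / 200.0)
pure_thiamine = np.exp(-(wavelengths - 245.0)**2 / 300.0)
K = np.column_stack([pure_pyridoxine, pure_thiamine])   # pure spectra matrix

true_conc = np.array([12.0, 25.0])                      # e.g. microgram/mL (assumed)
mixture = K @ true_conc                                 # Beer-Lambert additivity

# CLS: recover concentrations from the overdetermined linear system
conc, *_ = np.linalg.lstsq(K, mixture, rcond=None)
```

The genetic-algorithm variants in the abstract differ mainly in how wavelengths or regression terms are selected before a least-squares fit like this one.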
Mirman, Daniel; Zhang, Yongsheng; Wang, Ze; Coslett, H. Branch; Schwartz, Myrna F.
2015-01-01
Theories about the architecture of language processing differ with regard to whether verbal and nonverbal comprehension share a functional and neural substrate and how meaning extraction in comprehension relates to the ability to use meaning to drive verbal production. We (re-)evaluate data from 17 cognitive-linguistic performance measures of 99 participants with chronic aphasia using factor analysis to establish functional components and support vector regression-based lesion-symptom mapping to determine the neural correlates of deficits on these functional components. The results are highly consistent with our previous findings: production of semantic errors is behaviorally and neuroanatomically distinct from verbal and nonverbal comprehension. Semantic errors were most strongly associated with left ATL damage whereas deficits on tests of verbal and non-verbal semantic recognition were most strongly associated with damage to deep white matter underlying the frontal lobe at the confluence of multiple tracts, including the inferior fronto-occipital fasciculus, the uncinate fasciculus, and the anterior thalamic radiations. These results suggest that traditional views based on grey matter hub(s) for semantic processing are incomplete and that the role of white matter in semantic cognition has been underappreciated. PMID:25681739
Component analysis and initial validity of the exercise fear avoidance scale.
Wingo, Brooks C; Baskin, Monica; Ard, Jamy D; Evans, Retta; Roy, Jane; Vogtle, Laura; Grimley, Diane; Snyder, Scott
2013-01-01
To develop the Exercise Fear Avoidance Scale (EFAS) to measure fear of exercise-induced discomfort. We conducted principal component analysis to determine component structure and Cronbach's alpha to assess internal consistency of the EFAS. Relationships between EFAS scores, BMI, physical activity, and pain were analyzed using multivariate regression. The best fit was a 3-component structure: weight-specific fears, cardiorespiratory fears, and musculoskeletal fears. Cronbach's alpha for the EFAS was α=.86. EFAS scores significantly predicted BMI, physical activity, and PDI scores. Psychometric properties of this scale suggest it may be useful for tailoring exercise prescriptions to address fear of exercise-related discomfort.
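The internal-consistency check reported above uses Cronbach's alpha, the ratio of shared item covariance to total score variance. The 5-item response matrix below is invented for illustration, not EFAS data.

```python
import numpy as np

def cronbach_alpha(items):
    # items: (n_respondents, k_items) response matrix
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Simulated respondents: five items driven by one latent "fear" trait
rng = np.random.default_rng(5)
trait = rng.normal(size=200)
responses = trait[:, None] + 0.5 * rng.normal(size=(200, 5))
alpha = cronbach_alpha(responses)
```

Items that all measure the same construct push alpha toward 1; the reported α = .86 indicates strong internal consistency by this measure.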
Importance of spatial autocorrelation in modeling bird distributions at a continental scale
Bahn, V.; O'Connor, R.J.; Krohn, W.B.
2006-01-01
Spatial autocorrelation in species' distributions has been recognized as inflating the probability of a type I error in hypothesis tests, causing biases in variable selection, and violating the assumption of independence of error terms in models such as correlation or regression. However, it remains unclear whether these problems occur at all spatial resolutions and extents, and under which conditions spatially explicit modeling techniques are superior. Our goal was to determine whether spatial models were superior at large extents and across many different species. In addition, we investigated the importance of purely spatial effects in distribution patterns relative to the variation that could be explained through environmental conditions. We studied distribution patterns of 108 bird species in the conterminous United States using ten years of data from the Breeding Bird Survey. We compared the performance of spatially explicit regression models with non-spatial regression models using Akaike's information criterion. In addition, we partitioned the variance in species distributions into an environmental, a pure spatial and a shared component. The spatially explicit conditional autoregressive regression models strongly outperformed the ordinary least squares regression models. In addition, partialling out the spatial component underlying the species' distributions showed that an average of 17% of the explained variation could be attributed to purely spatial effects independent of the spatial autocorrelation induced by the underlying environmental variables. We concluded that location in the range and neighborhood play an important role in the distribution of species. Spatially explicit models are expected to yield better predictions especially for mobile species such as birds, even in coarse-grained models with a large extent. © Ecography.
Miller, Matthew P.; Johnson, Henry M.; Susong, David D.; Wolock, David M.
2015-01-01
Understanding how watershed characteristics and climate influence the baseflow component of stream discharge is a topic of interest to both the scientific and water management communities. Therefore, the development of baseflow estimation methods is a topic of active research. Previous studies have demonstrated that graphical hydrograph separation (GHS) and conductivity mass balance (CMB) methods can be applied to stream discharge data to estimate daily baseflow. While CMB is generally considered to be a more objective approach than GHS, its application across broad spatial scales is limited by a lack of high frequency specific conductance (SC) data. We propose a new method that uses discrete SC data, which are widely available, to estimate baseflow at a daily time step using the CMB method. The proposed approach involves the development of regression models that relate discrete SC concentrations to stream discharge and time. Regression-derived CMB baseflow estimates were more similar to baseflow estimates obtained using a CMB approach with measured high frequency SC data than were the GHS baseflow estimates at twelve snowmelt dominated streams and rivers. There was a near perfect fit between the regression-derived and measured CMB baseflow estimates at sites where the regression models were able to accurately predict daily SC concentrations. We propose that the regression-derived approach could be applied to estimate baseflow at large numbers of sites, thereby enabling future investigations of watershed and climatic characteristics that influence the baseflow component of stream discharge across large spatial scales.
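The conductivity mass balance (CMB) separation at the heart of the method above has a simple closed form: baseflow is the fraction of discharge implied by where the day's specific conductance (SC) falls between the runoff and baseflow end-members. The end-member values below are assumptions for illustration.

```python
import numpy as np

def cmb_baseflow(q, sc, sc_runoff=50.0, sc_baseflow=400.0):
    # q: daily discharge; sc: daily specific conductance (same length).
    # End-member SC values are site-specific; these defaults are invented.
    frac = (sc - sc_runoff) / (sc_baseflow - sc_runoff)
    return q * np.clip(frac, 0.0, 1.0)       # baseflow cannot exceed discharge

q = np.array([10.0, 80.0, 30.0])             # e.g. m^3/s over a snowmelt pulse
sc = np.array([400.0, 120.0, 260.0])         # high SC -> groundwater-dominated day
baseflow = cmb_baseflow(q, sc)
```

The regression-derived variant in the abstract supplies the daily `sc` series by predicting SC from discharge and time where only discrete SC samples exist; the mass-balance step itself is unchanged.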
Paschalidou, Anastasia K; Karakitsios, Spyridon; Kleanthous, Savvas; Kassomenos, Pavlos A
2011-02-01
In the present work, two types of artificial neural network (NN) models using the multilayer perceptron (MLP) and the radial basis function (RBF) techniques, as well as a model based on principal component regression analysis (PCRA), are employed to forecast hourly PM₁₀ concentrations in four urban areas (Larnaca, Limassol, Nicosia and Paphos) in Cyprus. The model development is based on a variety of meteorological and pollutant parameters corresponding to the 2-year period between July 2006 and June 2008, and the model evaluation is achieved through the use of a series of well-established evaluation instruments and methodologies. The evaluation reveals that the MLP NN models display the best forecasting performance with R² values ranging between 0.65 and 0.76, whereas the RBF NNs and the PCRA models reveal a rather weak performance with R² values between 0.37-0.43 and 0.33-0.38, respectively. The derived MLP models are also used to forecast Saharan dust episodes with remarkable success (probability of detection ranging between 0.68 and 0.71). On the whole, the analysis shows that the models introduced here could provide local authorities with reliable and precise predictions and alarms about air quality if used on an operational basis.
Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients
NASA Astrophysics Data System (ADS)
Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.
2017-12-01
The multivariate image analysis (MIA) descriptors used in quantitative structure-activity relationships are direct representations of chemical structures, as they are simply numerical decodifications of the pixels forming the 2D chemical images. These molecular descriptors (MDs) have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components, e.g., Partial Least Squares (PLS), have generally been used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity, has not been straightforward. This work describes 2D-contour maps based on the PLS regression coefficients as a means of assessing the relevance of single MIA predictors to the response variable, thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.
Escarela, Gabriel; Ruiz-de-Chavez, Juan; Castillo-Morales, Alberto
2016-08-01
Competing risks arise in medical research when subjects are exposed to various types or causes of death. Data from large cohort studies usually exhibit subsets of regressors that are missing for some study subjects. Furthermore, such studies often give rise to censored data. In this article, a carefully formulated likelihood-based technique for the regression analysis of right-censored competing risks data when two of the covariates are discrete and partially missing is developed. The approach envisaged here comprises two models: one describes the covariate effects on both long-term incidence and conditional latencies for each cause of death, whilst the other deals with the observation process by which the covariates are missing. The former is formulated with a well-established mixture model and the latter is characterised by copula-based bivariate probability functions for both the missing covariates and the missing data mechanism. The resulting formulation lends itself to the empirical assessment of non-ignorability by performing sensitivity analyses using models with and without a non-ignorable component. The methods are illustrated on a 20-year follow-up involving a prostate cancer cohort from the National Cancer Institute's Surveillance, Epidemiology, and End Results program. © The Author(s) 2013.
Yang, Y-M; Lee, J; Kim, Y-I; Cho, B-H; Park, S-B
2014-08-01
This study aimed to determine the viability of using axial cervical vertebrae (ACV) as biological indicators of skeletal maturation and to build models that estimate ossification level with improved explanatory power over models based only on chronological age. The study population comprised 74 female and 47 male patients with available hand-wrist radiographs and cone-beam computed tomography images. Generalized Procrustes analysis was used to analyze the shape, size, and form of the ACV regions of interest. The variabilities of these factors were analyzed by principal component analysis. Skeletal maturation was then estimated using a multiple regression model. Separate models were developed for male and female participants. For the female estimation model, the adjusted R² explained 84.8% of the variability of the Sempé maturation level (SML), representing a 7.9% increase in SML explanatory power over that using chronological age alone (76.9%). For the male estimation model, the adjusted R² was over 90%, representing a 1.7% increase relative to the reference model. The simplest possible ACV morphometric information provided a statistically significant explanation of the portion of skeletal-maturation variability not dependent on chronological age. These results verify that ACV is a strong biological indicator of ossification status. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
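The Procrustes step used above removes position, scale, and rotation from landmark configurations before shape variability is analyzed. SciPy provides a pairwise (not the full generalized) Procrustes superimposition, which suffices to illustrate the idea; the vertebra "landmarks" below are invented coordinates.

```python
import numpy as np
from scipy.spatial import procrustes

# Invented 2D landmark configuration for one "vertebra"
landmarks = np.array([[0.0, 0.0], [2.0, 0.2], [2.5, 1.5], [1.0, 2.0], [-0.3, 1.2]])

# A second configuration: the same shape, translated, scaled, and rotated
angle = 0.7
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
landmarks2 = 1.8 * landmarks @ R.T + np.array([5.0, -3.0])

# Procrustes superimposition: disparity near zero means identical shape
mtx1, mtx2, disparity = procrustes(landmarks, landmarks2)
```

After superimposition, the aligned coordinates (`mtx1`, `mtx2` across many patients) would feed the principal component analysis and regression described in the abstract.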
Predicting summer monsoon of Bhutan based on SST and teleconnection indices
NASA Astrophysics Data System (ADS)
Dorji, Singay; Herath, Srikantha; Mishra, Binaya Kumar; Chophel, Ugyen
2018-02-01
The paper uses a statistical method of predicting the summer monsoon over Bhutan from ocean-atmospheric circulation variables: sea surface temperature (SST), mean sea-level pressure (MSLP), and selected teleconnection indices. The predictors are selected based on correlation analysis. They are the SST and MSLP of the Bay of Bengal and the Arabian Sea, the MSLP of Bangladesh and northeast India, and the Northern Hemisphere teleconnections of the East Atlantic Pattern (EA), West Pacific Pattern (WP), Pacific/North American Pattern, and East Atlantic/West Russia Pattern (EA/WR). The rainfall station data are grouped into two regions with principal components analysis and Ward's hierarchical clustering algorithm. A support vector regression model is proposed to predict the monsoon. The model shows improved skill over traditional linear regression and was able to predict the summer monsoon for the test data from 2011 to 2015 with a total monthly root mean squared error of 112 mm for region A and 33 mm for region B. The model could also forecast the 2016 monsoon for Bhutan within the South Asia Monsoon Outlook of the World Meteorological Organization (WMO). The economy's reliance on agriculture and hydropower makes summer monsoon prediction highly valuable information for farmers and various other sectors, and the proposed method is suitable for operational forecasting.
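The prediction setup described above can be sketched with scikit-learn's support vector regression. The predictor columns below stand in for SST, MSLP, and a teleconnection index, and all values are synthetic placeholders, not the paper's dataset.

```python
# Hedged sketch: support vector regression for monthly monsoon rainfall.
# The three predictors and the rainfall response are synthetic stand-ins.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))                     # 120 months of predictors
y = 300 + 80 * X[:, 0] - 50 * X[:, 1] + 20 * rng.normal(size=120)  # rainfall (mm)

X_train, X_test = X[:60], X[60:]
y_train, y_test = y[:60], y[60:]

# Feature scaling matters for SVR; an RBF kernel allows a nonlinear response
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=5.0))
model.fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
```

Fitting an ordinary linear regression on the same split would give the natural baseline for the skill comparison the authors report.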
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watersheds.
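A BRT model of this kind can be sketched with scikit-learn's gradient boosting. The predictor names below (urban cover, hydrologic soil group D, stream order) are hypothetical stand-ins for the study's variables, and the nutrient response is synthetic.

```python
# Minimal sketch of a boosted regression tree relating landscape predictors
# to a nutrient concentration; predictors and response are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 200
urban_pct = rng.uniform(0, 60, n)          # % urban land cover
soil_d_pct = rng.uniform(0, 40, n)         # % Hydrologic Soil Group D
stream_order = rng.integers(1, 6, n)       # position within the watershed
X = np.column_stack([urban_pct, soil_d_pct, stream_order])
# Synthetic NO2-NO3 response with a nonlinear urban effect and no
# stream-order effect, so the BRT should rank urban cover highest
y = 0.5 + 0.002 * urban_pct**1.5 + 0.01 * soil_d_pct + rng.normal(0, 0.1, n)

brt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
brt.fit(X, y)

# Relative influence of each predictor, as BRT studies typically report
importance = dict(zip(["urban_pct", "soil_d_pct", "stream_order"],
                      brt.feature_importances_))
```

The `feature_importances_` vector plays the role of the "relative influence" rankings that BRT papers use to identify the dominant predictors.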
NASA Astrophysics Data System (ADS)
Duan, Libin; Xiao, Ning-cong; Li, Guangyao; Cheng, Aiguo; Chen, Tao
2017-07-01
Tailor-rolled blank thin-walled (TRB-TH) structures have become important vehicle components owing to their advantages of light weight and crashworthiness. The purpose of this article is to provide an efficient lightweight design for improving the energy-absorbing capability of TRB-TH structures under dynamic loading. A finite element (FE) model for TRB-TH structures is established and validated by performing a dynamic axial crash test. Different material properties for individual parts with different thicknesses are considered in the FE model. Then, a multi-objective crashworthiness design of the TRB-TH structure is constructed based on the ɛ-support vector regression (ɛ-SVR) technique and non-dominated sorting genetic algorithm-II. The key parameters (C, ɛ and σ) are optimized to further improve the predictive accuracy of ɛ-SVR under limited sample points. Finally, the technique for order preference by similarity to the ideal solution method is used to rank the solutions in Pareto-optimal frontiers and find the best compromise optima. The results demonstrate that the light weight and crashworthiness performance of the optimized TRB-TH structures are superior to their uniform thickness counterparts. The proposed approach provides useful guidance for designing TRB-TH energy absorbers for vehicle bodies.
NASA Astrophysics Data System (ADS)
Reinhardt, Katja; Samimi, Cyrus
2018-01-01
While climatological data of high spatial resolution are largely available in most developed countries, the network of climatological stations in many other regions of the world still shows large gaps. Especially for those regions, interpolation methods are important tools to fill these gaps and to improve the database indispensable for climatological research. Over recent years, new hybrid methods of machine learning and geostatistics have been developed which provide innovative prospects in spatial predictive modelling. This study focuses on evaluating the performance of 12 different interpolation methods for the wind components u and v in a mountainous region of Central Asia, with a special focus on applying new hybrid methods to the spatial interpolation of wind data. This study is the first to evaluate and compare the performance of several of these hybrid methods. The overall aim is to determine whether an optimal interpolation method exists which can equally be applied to all pressure levels, or whether different interpolation methods have to be used for different pressure levels. Deterministic (inverse distance weighting) and geostatistical (ordinary kriging) interpolation methods were explored, which take into account only the initial values of u and v. In addition, more complex methods (generalized additive models, support vector machines and neural networks, as single methods and as hybrid methods, as well as regression-kriging) that consider additional variables were applied. The analysis of the error indices revealed that regression-kriging provided the most accurate interpolation results for both wind components and all pressure heights.
At 200 and 500 hPa, regression-kriging is followed by the different kinds of neural networks and support vector machines and for 850 hPa it is followed by the different types of support vector machine and ordinary kriging. Overall, explanatory variables improve the interpolation results.
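Of the methods compared, the deterministic baseline, inverse distance weighting, is simple enough to sketch in a few lines of NumPy. The station coordinates and u-wind values below are invented for illustration.

```python
# Inverse distance weighting (IDW): each target value is a mean of the
# station values, weighted by inverse distance to a chosen power.
import numpy as np

def idw(stations, values, targets, power=2.0, eps=1e-12):
    """Interpolate station values at target points by IDW."""
    d = np.linalg.norm(targets[:, None, :] - stations[None, :, :], axis=2)
    w = 1.0 / (d + eps) ** power          # eps avoids division by zero
    return (w @ values) / w.sum(axis=1)

stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
u_wind = np.array([2.0, 4.0, 6.0])        # hypothetical u-component values
targets = np.array([[0.5, 0.5]])
print(idw(stations, u_wind, targets))     # prints [4.] (equidistant point)
```

Regression-kriging, the study's best performer, would instead fit a regression on the explanatory variables and then krige the residuals; IDW here only illustrates the distance-based end of the spectrum.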
Wang, Jinxu; Tong, Xin; Li, Peibo; Liu, Menghua; Peng, Wei; Cao, Hui; Su, Weiwei
2014-08-08
Shenqi Fuzheng Injection (SFI) is an injectable traditional Chinese herbal formula comprised of two Chinese herbs, Radix codonopsis and Radix astragali, which were commonly used to improve immune functions against chronic diseases in an integrative and holistic way in China and other East Asian countries for thousands of years. This present study was designed to explore the bioactive components on immuno-enhancement effects in SFI using the relevance analysis between chemical fingerprints and biological effects in vivo. According to a four-factor, nine-level uniform design, SFI samples were prepared with different proportions of the four portions separated from SFI via high speed counter current chromatography (HSCCC). SFI samples were assessed with high performance liquid chromatography (HPLC) for 23 identified components. For the immunosuppressed murine experiments, biological effects in vivo were evaluated on spleen index (E1), peripheral white blood cell counts (E2), bone marrow cell counts (E3), splenic lymphocyte proliferation (E4), splenic natural killer cell activity (E5), peritoneal macrophage phagocytosis (E6) and the amount of interleukin-2 (E7). Based on the hypothesis that biological effects in vivo varied with differences in components, multivariate relevance analysis, including gray relational analysis (GRA), multi-linear regression analysis (MLRA) and principal component analysis (PCA), were performed to evaluate the contribution of each identified component. 
The results indicated that the bioactive components of SFI on immuno-enhancement activities were calycosin-7-O-β-d-glucopyranoside (P9), isomucronulatol-7,2'-di-O-glucoside (P11), biochanin-7-glucoside (P12), 9,10-dimethoxypterocarpan-3-O-xylosylglucoside (P15) and astragaloside IV (P20), which might have positive effects on spleen index (E1), splenic lymphocyte proliferation (E4), splenic natural killer cell activity (E5), peritoneal macrophage phagocytosis (E6) and the amount of interleukin-2 (E7), while 5-hydroxymethyl-furaldehyde (P5) and lobetyolin (P13) might have negative effects on E1, E4, E5, E6 and E7. Finally, the bioactive HPLC fingerprint of SFI based on its bioactive components on immuno-enhancement effects was established for quality control of SFI. In summary, this study provided a perspective to explore the bioactive components in a traditional Chinese herbal formula with a series of HPLC and animal experiments, which would be helpful to improve quality control and inspire further clinical studies of traditional Chinese medicines. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Chandra X-ray Center Science Data Systems Regression Testing of CIAO
NASA Astrophysics Data System (ADS)
Lee, N. P.; Karovska, M.; Galle, E. C.; Bonaventura, N. R.
2011-07-01
The Chandra Interactive Analysis of Observations (CIAO) is a software system developed for the analysis of Chandra X-ray Observatory observations. An important component of a successful CIAO release is the repeated testing of the tools across various platforms to ensure consistent and scientifically valid results. We describe the procedures of the scientific regression testing of CIAO and the enhancements made to the testing system to increase the efficiency of run time and result validation.
NASA Astrophysics Data System (ADS)
Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza
2018-03-01
In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either (ECME) algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
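The reweighting step at the heart of such an estimator can be illustrated in NumPy. This toy fixes the degrees of freedom, uses a crude scale estimate, and omits the AR(p) noise model entirely, so it is a sketch of the t-based IRLS idea rather than the authors' full ECME algorithm.

```python
# Robust regression via iteratively reweighted least squares (IRLS):
# observations with large residuals are down-weighted according to
# Student's t-distribution weights (nu + 1) / (nu + r^2 / s^2).
import numpy as np

def irls_t(X, y, nu=4.0, n_iter=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s2 = np.mean(r**2)                        # crude scale estimate
        w = (nu + 1) / (nu + r**2 / s2)           # t-distribution weights
        Xw = X * w[:, None]                       # weighted design matrix
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)
X = np.column_stack([np.ones_like(t), t])         # intercept + trend
y = 1.0 + 2.0 * t + rng.normal(0, 0.05, 100)
y[::10] += 3.0                                    # inject outliers
beta = irls_t(X, y)                               # close to (1, 2) despite them
```

Ordinary least squares on the same data would be pulled toward the outliers; the down-weighting keeps the trend estimate near its true value.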
Shen, Minxue; Tan, Hongzhuan; Zhou, Shujin; Retnakaran, Ravi; Smith, Graeme N.; Davidge, Sandra T.; Trasler, Jacquetta; Walker, Mark C.; Wen, Shi Wu
2016-01-01
Background It has been reported that higher folate intake from food and supplementation is associated with decreased blood pressure (BP). The association between serum folate concentration and BP has been examined in few studies. We aim to examine the association between serum folate and BP levels in a cohort of young Chinese women. Methods We used the baseline data from a pre-conception cohort of women of childbearing age in Liuyang, China, for this study. Demographic data were collected by structured interview. Serum folate concentration was measured by immunoassay, and homocysteine, blood glucose, triglyceride and total cholesterol were measured through standardized clinical procedures. Multiple linear regression and principal component regression models were applied in the analysis. Results A total of 1,532 healthy normotensive non-pregnant women were included in the final analysis. The mean concentration of serum folate was 7.5 ± 5.4 nmol/L and 55% of the women presented with folate deficiency (< 6.8 nmol/L). Multiple linear regression and principal component regression showed that serum folate levels were inversely associated with systolic and diastolic BP, after adjusting for demographic, anthropometric, and biochemical factors. Conclusions Serum folate is inversely associated with BP in non-pregnant women of childbearing age with high prevalence of folate deficiency. PMID:27182603
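Principal component regression, one of the two models applied above, can be sketched as PCA followed by least squares on the leading component score. The covariates and effect sizes below are synthetic illustrations, not the cohort's data.

```python
# Sketch of principal component regression (PCR): correlated covariates
# are summarised by their leading principal component before regressing.
import numpy as np

rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=n)                      # shared latent factor
# Three strongly correlated covariates (stand-ins for biochemical measures)
X = np.column_stack([z + 0.1 * rng.normal(size=n) for _ in range(3)])
y = 120 - 0.8 * z + rng.normal(0, 2, n)     # synthetic systolic BP

Xc = X - X.mean(axis=0)                     # center before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # PCA via SVD
explained = S**2 / (S**2).sum()             # variance share per component
pc1 = Xc @ Vt[0]                            # leading component scores

A = np.column_stack([np.ones(n), pc1])      # regress BP on PC1
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because the three covariates share almost all of their variation, the first component carries nearly all the information, which is exactly the situation in which PCR is preferred over ordinary multiple regression.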
Understanding software faults and their role in software reliability modeling
NASA Technical Reports Server (NTRS)
Munson, John C.
1994-01-01
This study is a direct result of an on-going project to model the reliability of a large real-time control avionics system. In previous modeling efforts with this system, hardware reliability models were applied in modeling the reliability behavior of this system. In an attempt to enhance the performance of the adapted reliability models, certain software attributes were introduced in these models to control for differences between programs and also sequential executions of the same program. As the basic nature of the software attributes that affect software reliability become better understood in the modeling process, this information begins to have important implications on the software development process. A significant problem arises when raw attribute measures are to be used in statistical models as predictors, for example, of measures of software quality. This is because many of the metrics are highly correlated. Consider the two attributes: lines of code, LOC, and number of program statements, Stmts. In this case, it is quite obvious that a program with a high value of LOC probably will also have a relatively high value of Stmts. In the case of low level languages, such as assembly language programs, there might be a one-to-one relationship between the statement count and the lines of code. When there is a complete absence of linear relationship among the metrics, they are said to be orthogonal or uncorrelated. Usually the lack of orthogonality is not serious enough to affect a statistical analysis. However, for the purposes of some statistical analysis such as multiple regression, the software metrics are so strongly interrelated that the regression results may be ambiguous and possibly even misleading. Typically, it is difficult to estimate the unique effects of individual software metrics in the regression equation. 
The estimated values of the coefficients are very sensitive to slight changes in the data and to the addition or deletion of variables in the regression equation. Since most of the existing metrics have common elements and are linear combinations of these common elements, it seems reasonable to investigate the structure of the underlying common factors or components that make up the raw metrics. The technique we have chosen to use to explore this structure is a procedure called principal components analysis. Principal components analysis is a decomposition technique that may be used to detect and analyze collinearity in software metrics. When confronted with a large number of metrics measuring a single construct, it may be desirable to represent the set by some smaller number of variables that convey all, or most, of the information in the original set. Principal components are linear transformations of a set of random variables that summarize the information contained in the variables. The transformations are chosen so that the first component accounts for the maximal amount of variation of the measures of any possible linear transform; the second component accounts for the maximal amount of residual variation; and so on. The principal components are constructed so that they represent transformed scores on dimensions that are orthogonal. Through the use of principal components analysis, it is possible to have a set of highly related software attributes mapped into a small number of uncorrelated attribute domains. This definitively solves the problem of multi-collinearity in subsequent regression analysis. There are many software metrics in the literature, but principal component analysis reveals that there are few distinct sources of variation, i.e. dimensions, in this set of metrics. 
It would appear perfectly reasonable to characterize the measurable attributes of a program with a simple function of a small number of orthogonal metrics each of which represents a distinct software attribute domain.
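The LOC/statement-count example above can be made concrete: two nearly collinear metrics collapse onto a single principal component, which is exactly the dimensionality reduction the passage describes. The metric values are invented.

```python
# Toy demonstration of collinearity among software metrics: statement
# count is nearly a linear function of LOC, so PCA finds one dominant
# "program size" dimension.
import numpy as np

rng = np.random.default_rng(4)
loc = rng.integers(50, 2000, size=60).astype(float)   # lines of code
stmts = 0.8 * loc + rng.normal(0, 20, size=60)        # ~one-to-one with LOC

M = np.column_stack([loc, stmts])
Mc = M - M.mean(axis=0)
cov = np.cov(Mc, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]               # descending order
explained = eigvals / eigvals.sum()
# explained[0] is close to 1: one orthogonal dimension captures nearly
# all the variation, so the pair can be replaced by a single component
```

Using that single component as a regression predictor avoids the unstable, ambiguous coefficients that the raw correlated metrics would produce.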
Gui, Wei; Dombrow, Matthew; Marcus, Inna; Stowe, Meredith H; Tessier-Sherman, Baylah; Yang, Elizabeth; Huang, John J
2015-04-01
To compare vision-related (VR-QOL) and health-related quality of life (HR-QOL) in patients with noninfectious uveitis treated with systemic anti-inflammatory therapy versus nonsystemic therapy. A prospective, cross-sectional study design was employed. VR-QOL and HR-QOL were assessed by the 25-Item Visual Function Questionnaire (VFQ-25) and the Short Form 12-Item Health Survey (SF-12), respectively. Multivariate regression analysis was performed to assess the VR-QOL and HR-QOL based on treatment. Among the 80 patients, the median age was 51 years with 28 males (35%). The adjusted effect of treatment modality on VR-QOL or HR-QOL showed no statistically significant difference in all subscores of VFQ-25 or physical component score (PCS) and mental component score (MCS) of SF-12. Systemic therapy did not compromise VR-QOL or HR-QOL compared to nonsystemic therapy. Systemic therapy can be effectively used to control serious cases of noninfectious uveitis without significant relative adverse impact on quality of life.
Kontosic, I; Vukelić, M; Pancić, M; Kunisek, J
1994-12-01
Physical work load was estimated in a female conveyor-belt worker in a bottling plant. Estimation was based on continuous measurement and on calculation of average heart rate values in three-minute and one-hour periods and during the total measuring period. The thermal component of the heart rate was calculated by means of the corrected effective temperature, for the one-hour periods. The average heart rate at rest was also determined. The work component of the heart rate was calculated by subtraction of the resting heart rate and the heart rate measured at 50 W, using a regression equation. The average estimated gross energy expenditure during the work was 9.6 ± 1.3 kJ/min, corresponding to the category of light industrial work. The average estimated oxygen uptake was 0.42 ± 0.06 L/min. The average performed mechanical work was 12.2 ± 4.2 W, i.e. the energy expenditure was 8.3 ± 1.5%.
2011-01-01
Background Evidence about a possible causal relationship between non-specific physical symptoms (NSPS) and exposure to electromagnetic fields (EMF) emitted by sources such as mobile phone base stations (BS) and powerlines is insufficient. So far little epidemiological research has been published on the contribution of psychological components to the occurrence of EMF-related NSPS. The primary objective of the current study is to explore the relative importance of actual and perceived proximity to base stations and psychological components as determinants of NSPS, adjusting for demographic, residency and area characteristics. Methods Analysis was performed on data obtained in a cross-sectional study on environment and health in 2006 in the Netherlands. In the current study, 3611 adult respondents (response rate: 37%) in twenty-two Dutch residential areas completed a questionnaire. Self-reported instruments included a symptom checklist and assessment of environmental and psychological characteristics. The computation of the distance between household addresses and location of base stations and powerlines was based on geo-coding. Multilevel regression models were used to test the hypotheses regarding the determinants related to the occurrence of NSPS. Results After adjustment for demographic and residential characteristics, analyses yielded a number of statistically significant associations: Increased report of NSPS was predominantly predicted by higher levels of self-reported environmental sensitivity; perceived proximity to base stations and powerlines, lower perceived control and increased avoidance (coping) behavior were also associated with NSPS. A trend towards a moderator effect of perceived environmental sensitivity on the relation between perceived proximity to BS and NSPS was verified (p = 0.055). There was no significant association between symptom occurrence and actual distance to BS or powerlines.
Conclusions Perceived proximity to BS, psychological components and socio-demographic characteristics are associated with the report of symptomatology. Actual distance to the EMF source did not show up as determinant of NSPS. PMID:21631930
Shrinkage regression-based methods for microarray missing value imputation.
Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng
2013-01-01
Missing values commonly occur in microarray data, which usually contain more than 5% missing values, with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
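The pipeline described above can be sketched in NumPy: select the genes most correlated with the target gene, fit least squares on the columns where the target is observed, shrink the coefficients, and predict the missing entry. The fixed shrinkage factor used here is a hypothetical simplification, not the paper's shrinkage estimator.

```python
# Hedged sketch of regression-based imputation with coefficient shrinkage.
# `shrink` is a fixed constant for illustration only.
import numpy as np

def impute_one(data, target_row, missing_col, k=3, shrink=0.9):
    """Estimate data[target_row, missing_col] from the k most similar genes."""
    observed = [c for c in range(data.shape[1]) if c != missing_col]
    y = data[target_row, observed]
    others = np.delete(data, target_row, axis=0)
    # Select the k genes most correlated with the target (Pearson)
    cors = np.array([abs(np.corrcoef(y, g[observed])[0, 1]) for g in others])
    top = others[np.argsort(cors)[-k:]]
    # Least squares on the observed columns, then shrink the slopes
    A = np.column_stack([np.ones(len(observed)), top[:, observed].T])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    beta[1:] *= shrink
    return beta[0] + top[:, missing_col] @ beta[1:]

rng = np.random.default_rng(5)
base = rng.normal(size=20)                       # shared expression profile
data = np.vstack([base + 0.1 * rng.normal(size=20) for _ in range(6)])
estimate = impute_one(data, target_row=0, missing_col=7)
true_value = data[0, 7]                          # held-out entry for checking
```

Because the rows share one profile, the regression recovers the held-out entry closely; shrinking the slopes trades a little bias for stability when the selected genes are nearly collinear.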
Comparative evaluation of urban storm water quality models
NASA Astrophysics Data System (ADS)
Vaze, J.; Chiew, Francis H. S.
2003-10-01
The estimation of urban storm water pollutant loads is required for the development of mitigation and management strategies to minimize impacts to receiving environments. Event pollutant loads are typically estimated using either regression equations or "process-based" water quality models. The relative merit of using regression models compared to process-based models is not clear. A modeling study is carried out here to evaluate the comparative ability of the regression equations and process-based water quality models to estimate event diffuse pollutant loads from impervious surfaces. The results indicate that, once calibrated, both the regression equations and the process-based model can estimate event pollutant loads satisfactorily. In fact, the loads estimated using the regression equation as a function of rainfall intensity and runoff rate are better than the loads estimated using the process-based model. Therefore, if only estimates of event loads are required, regression models should be used because they are simpler and require less data compared to process-based models.
NASA Technical Reports Server (NTRS)
Gao, Bo-Cai; Goetz, Alexander F. H.
1992-01-01
Over the last decade, technological advances in airborne imaging spectrometers, having spectral resolution comparable with laboratory spectrometers, have made it possible to estimate biochemical constituents of vegetation canopies. Wessman estimated lignin concentration from data acquired with NASA's Airborne Imaging Spectrometer (AIS) over Blackhawk Island in Wisconsin. A stepwise linear regression technique was used to determine the single spectral channel or channels in the AIS data that best correlated with measured lignin contents using chemical methods. The regression technique does not take advantage of the spectral shape of the lignin reflectance feature as a diagnostic tool nor the increased discrimination among other leaf components with overlapping spectral features. A nonlinear least squares spectral matching technique was recently reported for deriving both the equivalent water thicknesses of surface vegetation and the amounts of water vapor in the atmosphere from contiguous spectra measured with the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). The same technique was applied to a laboratory reflectance spectrum of fresh, green leaves. The result demonstrates that the fresh leaf spectrum in the 1.0-2.5 microns region consists of spectral components of dry leaves and the spectral component of liquid water. A linear least squares spectral matching technique for retrieving equivalent water thickness and biochemical components of green vegetation is described.
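The linear least squares spectral matching step can be sketched with NumPy: a measured fresh-leaf spectrum is modelled as a linear combination of component spectra, and the abundances fall out of a least squares solve. The endmember spectra below are synthetic stand-ins for measured dry-leaf and liquid-water references.

```python
# Sketch of linear spectral matching: solve for component abundances
# that best reproduce a measured spectrum in the least squares sense.
import numpy as np

wavelengths = np.linspace(1.0, 2.5, 50)           # microns (assumed grid)
# Hypothetical endmember spectra, not laboratory measurements
dry_leaf = 0.4 + 0.1 * np.sin(3 * wavelengths)
water = np.exp(-2 * (wavelengths - 1.9) ** 2)

# Synthetic "measured" fresh-leaf spectrum: a known mixture plus noise
rng = np.random.default_rng(6)
measured = 0.7 * dry_leaf + 0.25 * water + rng.normal(0, 0.002, 50)

# Solve for the component abundances by linear least squares
E = np.column_stack([dry_leaf, water])
coeffs, *_ = np.linalg.lstsq(E, measured, rcond=None)
```

Unlike the stepwise single-channel regression criticized in the passage, this fit uses the full spectral shape of each component, which is what lets overlapping features (water versus dry-leaf biochemistry) be separated.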
Use of Continuous Integration Tools for Application Performance Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vergara Larrea, Veronica G; Joubert, Wayne; Fuson, Christopher B
High performance computing systems are becoming increasingly complex, both in node architecture and in the multiple layers of software stack required to compile and run applications. As a consequence, the likelihood is increasing for application performance regressions to occur as a result of routine upgrades of system software components which interact in complex ways. The purpose of this study is to evaluate the effectiveness of continuous integration tools for application performance monitoring on HPC systems. In addition, this paper also describes a prototype system for application performance monitoring based on Jenkins, a Java-based continuous integration tool. The monitoring system described leverages several features in Jenkins to track application performance results over time. Preliminary results and lessons learned from monitoring applications on Cray systems at the Oak Ridge Leadership Computing Facility are presented.
NASA Astrophysics Data System (ADS)
Lucifredi, A.; Mazzieri, C.; Rossi, M.
2000-05-01
Since the operational conditions of a hydroelectric unit can vary within a wide range, the monitoring system must be able to distinguish between the variations of the monitored variable caused by variations of the operating conditions and those due to arising and progressing failures and misoperations. The paper aims to identify the best technique to be adopted for the monitoring system. Three different methods have been implemented and compared. Two of them use statistical techniques: the first, linear multiple regression, expresses the monitored variable as a linear function of the process parameters (independent variables), while the second, the dynamic kriging technique, is a modified multiple linear regression representing the monitored variable as a linear combination of the process variables in such a way as to minimize the variance of the estimation error. The third is based on neural networks. Tests have shown that the monitoring system based on the kriging technique is not affected by some problems common to the other two models: the requirement of a large amount of tuning data (both for training the neural network and for defining the optimum plane for the multiple regression), not only in the start-up phase but also after a trivial maintenance operation involving the substitution of machinery components with a direct impact on the observed variable; and the need for different models to describe satisfactorily the different operating ranges of the plant. The monitoring system based on the kriging statistical technique overcomes these difficulties: it does not require a large amount of data to be tuned and is immediately operational (given two points, a third can be estimated immediately); in addition, the model follows the system without adapting itself to it.
The results of the experimentation performed seem to indicate that a model based on a neural network or on linear multiple regression is not optimal, and that a different approach is necessary to reduce the amount of work during the learning phase by using, when available, all the information stored during the initial phase of the plant to build the reference baseline, processing the raw information where appropriate. A mixed approach using the kriging statistical technique together with neural network techniques could optimise the result.
Katano, Sayuri; Nakamura, Yasuyuki; Nakamura, Aki; Murakami, Yoshitaka; Tanaka, Taichiro; Nakagawa, Hideaki; Takebayashi, Toru; Yamato, Hiroshi; Okayama, Akira; Miura, Katsuyuki; Okamura, Tomonori; Ueshima, Hirotsugu
2010-06-30
To examine the relation between lifestyle and the number of metabolic syndrome (MetS) diagnostic components in a general population, and to find a means of preventing the development of MetS components. We examined baseline data from 3,365 participants (2,714 men and 651 women) aged 19 to 69 years who underwent a physical examination, lifestyle survey, and blood chemical examination. The physical activity of each participant was classified according to the International Physical Activity Questionnaire (IPAQ). We defined four components for MetS in this study as follows: 1) high BP: systolic BP ≥ 130 mmHg or diastolic BP ≥ 85 mmHg, or the use of antihypertensive drugs; 2) dyslipidemia: high-density lipoprotein-cholesterol concentration < 40 mg/dL, triglycerides concentration ≥ 150 mg/dL, or on medication for dyslipidemia; 3) impaired glucose tolerance: fasting blood sugar level ≥ 110 mg/dL (or ≥ 140 mg/dL if less than 8 hours after meals), or on medication for diabetes mellitus; 4) obesity: body mass index ≥ 25 kg/m². Those who had 0 to 4 MetS diagnostic components accounted for 1,726, 949, 484, 190, and 16 participants, respectively, in the Poisson distribution. Poisson regression analysis revealed that independent factors contributing to the number of MetS diagnostic components were being male (regression coefficient b=0.600, p < 0.01), age (b=0.027, p < 0.01), IPAQ class (b=-0.272, p=0.03), and alcohol consumption (b=0.020, p=0.01). The contribution of current smoking was not statistically significant (b=-0.067, p=0.76). Moderate physical activity was inversely associated with the number of MetS diagnostic components, whereas smoking was not associated.
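A Poisson regression of a count outcome on covariates, as used above, can be sketched by Newton-Raphson (Fisher scoring) in NumPy. The predictors, coefficients, and sample below are synthetic, not the survey data.

```python
# Poisson regression (log link) fit by Fisher scoring; counts are
# simulated from known coefficients so the fit can be checked.
import numpy as np

def poisson_fit(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                 # mean under the log link
        grad = X.T @ (y - mu)                 # score vector
        hess = X.T @ (X * mu[:, None])        # Fisher information
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(7)
n = 2000
male = rng.integers(0, 2, n).astype(float)    # hypothetical sex indicator
age = rng.uniform(20, 70, n)
X = np.column_stack([np.ones(n), male, (age - 40) / 10])
true_beta = np.array([-0.3, 0.6, 0.25])       # intercept, male, age effects
y = rng.poisson(np.exp(X @ true_beta))        # simulated component counts
beta_hat = poisson_fit(X, y)                  # recovers true_beta closely
```

The fitted coefficients are on the log scale, so a coefficient like the study's b=0.600 for males corresponds to an exp(0.600) ≈ 1.8-fold higher expected component count, other factors held fixed.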
Jankowski, Konrad S
2016-05-15
The study aimed to elucidate previously observed associations between morningness-eveningness and depressive symptomatology in university students. Relations between components of depressive symptomatology and morningness-eveningness were analysed. Nine hundred and seventy-four university students completed Polish versions of the Centre for Epidemiological Studies - Depression scale (CES-D; Polish translation appended to this paper) and the Composite Scale of Morningness. Principal component analysis (PCA) was used to test the structure of depressive symptoms. Pearson and partial correlations (with age and sex controlled), along with regression analyses with morning affect (MA) and circadian preference as predictors, were used. PCA revealed three components of depressive symptoms: depressed/somatic affect, positive affect, interpersonal relations. Greater MA was related to less depressive symptoms in three components. Morning circadian preference was related to less depressive symptoms in depressed/somatic and positive affects and unrelated to interpersonal relations. Both morningness-eveningness components exhibited stronger links with depressed/somatic and positive affects than with interpersonal relations. Three CES-D components exhibited stronger links with MA than with circadian preference. In regression analyses only MA was statistically significant for positive affect and better interpersonal relations, whereas more depressed/somatic affect was predicted by lower MA and morning circadian preference (relationship reversed compared to correlations). Self-report assessment. There are three groups of depressive symptoms in Polish university students. Associations of MA with depressed/somatic and positive affects are primarily responsible for the observed links between morningness-eveningness and depressive symptoms in university students. People with evening circadian preference whose MA is not lowered have less depressed/somatic affect. 
Copyright © 2016 Elsevier B.V. All rights reserved.
A questionnaire-wide association study of personality and mortality: the Vietnam Experience Study.
Weiss, Alexander; Gale, Catharine R; Batty, G David; Deary, Ian J
2013-06-01
We examined the association between the Minnesota Multiphasic Personality Inventory (MMPI) and all-cause mortality in 4462 middle-aged Vietnam-era veterans. We split the study population into half-samples. In each half, we used proportional hazards (Cox) regression to test the 550 MMPI items' associations with mortality over 15 years. In all participants, we subjected items significant (p<.01) in both halves to principal-components analysis (PCA). We used Cox regression to test whether these components predicted mortality when controlling for other predictors (demographics, cognitive ability, health behaviors, and mental/physical health). Eighty-nine items were associated with mortality in both half-samples. PCA revealed Neuroticism/Negative Affectivity, Somatic Complaints, Psychotic/Paranoid, and Antisocial components, and a higher-order component, Personal Disturbance. Individually, Neuroticism/Negative Affectivity (HR=1.55; 95% CI=1.39, 1.72), Somatic Complaints (HR=1.66; 95% CI=1.52, 1.80), Psychotic/Paranoid (HR=1.44; 95% CI=1.32, 1.57), Antisocial (HR=1.79; 95% CI=1.59, 2.01), and Personal Disturbance (HR=1.74; 95% CI=1.58, 1.91) were associated with risk. Including covariates attenuated these associations (by 28.4 to 54.5%), though they remained significant. After entering Personal Disturbance into models with each component, Neuroticism/Negative Affectivity and Somatic Complaints remained significant, although Neuroticism/Negative Affectivity's association was now protective (HR=0.73; 95% CI=0.58, 0.92). When the four components were entered together, with or without covariates, Somatic Complaints and Antisocial were significant risk factors. Somatic Complaints and Personal Disturbance are associated with increased mortality risk. The other components' effects varied as a function of the variables in the model. Copyright © 2013 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Gloria, R. Y.; Sudarmin, S.; Wiyanto; Indriyanti, D. R.
2018-03-01
Habits of mind are intelligent thinking dispositions that every individual needs to have, and forming them as expected requires deliberate effort. A behavior can be formed by continuous practice; therefore, students' habits of mind can also be formed and trained. One effort that can be used to encourage the formation of habits of mind is a formative assessment strategy with the stages of UbD (Understanding by Design), and a study is needed to prove it. This study aims to determine the contribution of formative assessment to the habits of mind of prospective teachers. The method used is a quantitative method with a quasi-experimental design. To determine the effectiveness of formative assessment with UbD stages on the formation of habits of mind, a correlation test and regression analysis were conducted on the formative assessment questionnaire, which consists of three components (feedback, peer assessment, and self-assessment), and on habits of mind. The results show that of the three components of formative assessment, only the feedback component shows no correlation with students' habits of mind (r = 0.323), whereas the peer assessment (r = 0.732) and self-assessment (r = 0.625) components both indicate a correlation. In the regression test, the components of the formative assessment together contributed to habits of mind at 57.1%. It can be concluded that formative assessment with UbD stages is effective and contributes to forming students' habits of mind; the components that contributed the most are peer assessment and self-assessment. The greatest contribution goes to the Thinking interdependently category.
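The two analyses reported — Pearson correlations of each component with the outcome, and a multiple regression whose R² gives the components' joint contribution — can be illustrated on synthetic scores. All variables below are hypothetical stand-ins, not the study's questionnaire data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scores: three formative-assessment components and a
# habits-of-mind outcome (invented relationships, for illustration).
n = 120
feedback = rng.normal(size=n)
peer = rng.normal(size=n)
self_a = 0.6 * peer + 0.8 * rng.normal(size=n)
habits = 0.7 * peer + 0.5 * self_a + 0.6 * rng.normal(size=n)

# Pearson correlation of each component with the outcome.
for name, x in [("feedback", feedback), ("peer", peer), ("self", self_a)]:
    r = np.corrcoef(x, habits)[0, 1]
    print(f"{name}: r = {r:.2f}")

# Multiple regression: R^2 is the proportion of outcome variance the
# three components explain jointly (the study reports 57.1%).
X = np.column_stack([np.ones(n), feedback, peer, self_a])
beta, *_ = np.linalg.lstsq(X, habits, rcond=None)
resid = habits - X @ beta
r2 = 1 - resid.var() / habits.var()
print(f"R^2 = {r2:.2f}")
```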
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Kunkun, E-mail: ktg@illinois.edu; Inria Bordeaux – Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence; Congedo, Pietro M.
The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
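The third level of adaptivity — stepwise regression that retains only the most influential basis terms — can be illustrated with a plain greedy forward-selection sketch on a toy least-squares problem. The candidate columns below stand in for PDD basis polynomials; this is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy surrogate-modelling setup: 30 candidate basis columns, of which
# only columns 3, 7 and 12 carry signal (hypothetical coefficients).
n, p = 100, 30
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.5 * X[:, 12] + 0.1 * rng.normal(size=n)

def forward_stepwise(X, y, max_terms=5):
    """Greedy forward selection: at each step add the column that most
    reduces the residual sum of squares of the least-squares fit."""
    selected = []
    for _ in range(max_terms):
        rss = []
        for j in range(X.shape[1]):
            if j in selected:
                rss.append(np.inf)
                continue
            A = np.column_stack([np.ones(len(y)), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss.append(((y - A @ beta) ** 2).sum())
        selected.append(int(np.argmin(rss)))
    return sorted(selected)

sel = forward_stepwise(X, y, max_terms=3)
print(sel)
```

Because the retained model stays small at every step, each least-squares solve involves only a handful of columns — the property the abstract exploits to keep the adaptive procedure cheap.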
Health behaviours, body weight and self-esteem among grade five students in Canada.
Wu, Xiuyun; Kirk, Sara F L; Ohinmaa, Arto; Veugelers, Paul
2016-01-01
This study sought to identify the principal components of self-esteem and the health behavioural determinants of these components among grade five students. We analysed data from a population-based survey among 4918 grade five students, who are primarily 10 and 11 years of age, and their parents in the Canadian province of Nova Scotia. The survey comprised the Harvard Youth and Adolescent Questionnaire, parental reporting of students' physical activity (PA) and time spent watching television or using computer/video games. Students' heights and weights were objectively measured. We applied principal component analysis (PCA) to derive the components of self-esteem, and multilevel, multivariable logistic regression to quantify associations of diet quality, PA, sedentary behaviour and body weight with these components of self-esteem. PCA identified four components for self-esteem: self-perception, externalizing problems, internalizing problems, and social-perception. Influences of health behaviours and body weight on self-esteem varied across the components. Better diet quality was associated with higher self-perception and fewer externalizing problems. Less PA and more use of computer/video games were related to lower self-perception and social-perception. Excessive TV watching was associated with more internalizing problems. Students classified as obese were more likely to report low self- and social-perception, and to experience fewer externalizing problems relative to students classified as normal weight. This study demonstrates independent influences of diet quality, physical activity, sedentary behaviour and body weight on four aspects of self-esteem among children. These findings suggest that school programs and health promotion strategies that target health behaviours may benefit self-esteem in childhood, and mental health and quality of life later in life.
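The final modelling step — logistic regression linking behaviours to a binary self-esteem component, summarized as odds ratios — can be sketched with a plain Newton-Raphson (IRLS) fit on simulated data. The multilevel structure is ignored here, and all variable names and effect sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data (not the survey): two behaviour scores and a
# binary indicator for "low self-perception".
n = 500
diet = rng.normal(size=n)
activity = rng.normal(size=n)
logit = -0.5 - 0.8 * diet - 0.4 * activity
y = rng.random(n) < 1 / (1 + np.exp(-logit))

def logistic_fit(X, y, iters=25):
    """Logistic regression by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (X * W[:, None])              # Fisher information
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

beta = logistic_fit(np.column_stack([diet, activity]), y.astype(float))
odds_ratios = np.exp(beta[1:])
print(np.round(odds_ratios, 2))  # ORs below 1: higher score lowers the odds
```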
A Survey of UML Based Regression Testing
NASA Astrophysics Data System (ADS)
Fahad, Muhammad; Nadeem, Aamer
Regression testing is the process of ensuring software quality by analyzing whether changed parts behave as intended and unchanged parts are not affected by the modifications. Since it is a costly process, many techniques have been proposed in the research literature to help testers build a regression test suite from an existing test suite at minimum cost. In this paper, we discuss the advantages and drawbacks of using UML diagrams for regression testing and show how UML models help in identifying changes for effective regression test selection. We survey the existing UML-based regression testing techniques and provide an analysis matrix to give a quick insight into the prominent features of the literature. We also discuss the open research issues that remain to be addressed for UML-based regression testing, such as managing and reducing the size of the regression test suite and prioritizing test cases under tight schedules and limited resources.
Integrated Central-Autonomic Multifractal Complexity in the Heart Rate Variability of Healthy Humans
Lin, D. C.; Sharif, A.
2012-01-01
Purpose of Study: The aim of this study was to characterize the central-autonomic interaction underlying the multifractality in heart rate variability (HRV) of healthy humans. Materials and Methods: Eleven young healthy subjects participated in two separate ~40 min experimental sessions, one in the supine (SUP) position and one in the head-up-tilt (HUT), upright (UPR) body position. Surface scalp electroencephalography (EEG) and electrocardiogram (ECG) were collected and fractal correlation of brain and heart rate data was analyzed based on the idea of relative multifractality. The fractal correlation was further examined with the EEG, HRV spectral measures using linear regression of two variables and principal component analysis (PCA) to find clues for the physiological processing underlying the central influence in fractal HRV. Results: We report evidence of a central-autonomic fractal correlation (CAFC) where the HRV multifractal complexity varies significantly with the fractal correlation between the heart rate and brain data (P = 0.003). The linear regression shows significant correlation between CAFC measure and EEG Beta band spectral component (P = 0.01 for SUP and P = 0.002 for UPR positions). There is significant correlation between CAFC measure and HRV LF component in the SUP position (P = 0.04), whereas the correlation with the HRV HF component approaches significance (P = 0.07). The correlation between CAFC measure and HRV spectral measures in the UPR position is weak. The PCA results confirm these findings and further imply multiple physiological processes underlying CAFC, highlighting the importance of the EEG Alpha, Beta band, and the HRV LF, HF spectral measures in the supine position. Discussion and Conclusion: The findings of this work can be summarized into three points: (i) Similar fractal characteristics exist in the brain and heart rate fluctuation and the change toward stronger fractal correlation implies the change toward more complex HRV multifractality.
(ii) CAFC likely arises from multiple physiological mechanisms, with its central elements mainly derived from the EEG Alpha and Beta band dynamics. (iii) The CAFC in the SUP and UPR positions is qualitatively different, with a more predominant central influence on the fractal HRV in the UPR position. PMID:22403548
Triyana, Margaret; Shankar, Anuraj H
2017-10-22
To analyse the effectiveness of a household conditional cash transfer programme (CCT) on antenatal care (ANC) coverage reported by women and ANC quality reported by midwives. The CCT was piloted as a cluster randomised control trial in 2007. Intent-to-treat parameters were estimated using linear regression and logistic regression. Secondary analysis of the longitudinal CCT impact evaluation survey, conducted in 2007 and 2009. This included 6869 pregnancies and 1407 midwives in 180 control subdistricts and 180 treated subdistricts in Indonesia. ANC component coverage index, a composite measure of each ANC service component as self-reported by women, and ANC provider quality index, a composite measure of ANC service provided as self-reported by midwives. Each index was created by principal component analysis (PCA). Specific ANC component items were also assessed. The CCT was associated with improved ANC component coverage index by 0.07 SD (95% CI 0.002 to 0.141). Women were more likely to receive the following assessments: weight (OR 1.56 (95% CI 1.25 to 1.95)), height (OR 1.41 (95% CI 1.247 to 1.947)), blood pressure (OR 1.36 (95% CI 1.045 to 1.761)), fundal height measurements (OR 1.65 (95% CI 1.372 to 1.992)), fetal heart beat monitoring (OR 1.29 (95% CI 1.006 to 1.653)), external pelvic examination (OR 1.28 (95% CI 1.086 to 1.505)), iron-folic acid pills (OR 1.42 (95% CI 1.081 to 1.859)) and information on pregnancy complications (OR 2.09 (95% CI 1.724 to 2.551)). On the supply side, the CCT had no significant effect on the ANC provider quality index based on reports from midwives. The CCT programme improved ANC coverage for women, but midwives did not improve ANC quality. The results suggest that enhanced ANC utilisation may not be sufficient to improve health outcomes, and steps to improve ANC quality are essential for programme impact. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. 
No commercial use is permitted unless otherwise expressly granted.
Nelemans, Stefanie A; van Assche, Evelien; Bijttebier, Patricia; Colpin, Hilde; van Leeuwen, Karla; Verschueren, Karine; Claes, Stephan; van den Noortgate, Wim; Goossens, Luc
2018-04-26
Guided by a developmental psychopathology framework, research has increasingly focused on the interplay of genetics and environment as a predictor of different forms of psychopathology, including social anxiety. In these efforts, the polygenic nature of complex phenotypes such as social anxiety is increasingly recognized, but studies applying polygenic approaches are still scarce. In this study, we applied Principal Covariates Regression as a novel approach to creating polygenic components for the oxytocin system, which has recently been put forward as particularly relevant to social anxiety. Participants were 978 adolescents (49.4% girls; mean age at Time 1 = 13.8 years). Across 3 years, questionnaires were used to assess adolescent social anxiety symptoms and multi-informant reports of parental psychological control and autonomy support. All adolescents were genotyped for 223 oxytocin single nucleotide polymorphisms (SNPs) in 14 genes. Using Principal Covariates Regression, these SNPs could be reduced to five polygenic components. Four components reflected the underlying linkage disequilibrium and ancestry structure, whereas the fifth component, which consisted of small contributions of many SNPs across multiple genes, was strongly positively associated with adolescent social anxiety symptoms, pointing to an index of genetic risk. Moreover, significant interactions were found between this polygenic component and the environmental variables of interest. Specifically, adolescents who scored high on this polygenic component and experienced less adequate parenting (i.e., high psychological control or low autonomy support) showed the highest levels of social anxiety. Implications of these findings are discussed in the context of individual-by-environment models.
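Principal Covariates Regression extracts components that jointly summarize the predictors and predict the outcome. A much-simplified, single-outcome variant (not the authors' implementation) takes the leading eigenvectors of a convex blend of a PCA term and a prediction term, controlled by a tuning weight alpha; the SNP-like matrix below is simulated:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy SNP-like matrix (hypothetical): 300 subjects, 50 allele counts
# coded 0/1/2, with a sparse genetic signal on a continuous phenotype.
n, p = 300, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
w_true = np.zeros(p)
w_true[::10] = 0.5                       # every 10th marker carries signal
y = X @ w_true + rng.normal(size=n)

Xc = X - X.mean(0)
yc = y - y.mean()

def pcovr_components(Xc, yc, alpha=0.5, k=2):
    """Weight vectors maximizing a blend of explained-X variance (PCA
    part) and squared covariance with y (prediction part). A simplified
    sketch in the spirit of Principal Covariates Regression."""
    Gx = Xc.T @ Xc / np.trace(Xc.T @ Xc)     # normalized PCA term
    gy = Xc.T @ yc
    Gy = np.outer(gy, gy) / (gy @ gy)        # normalized prediction term
    vals, vecs = np.linalg.eigh(alpha * Gx + (1 - alpha) * Gy)
    return vecs[:, np.argsort(vals)[::-1][:k]]

W = pcovr_components(Xc, yc, alpha=0.5, k=2)
T = Xc @ W                                   # component scores
b, *_ = np.linalg.lstsq(T, yc, rcond=None)   # regress outcome on scores
r2 = 1 - ((yc - T @ b) ** 2).sum() / ((yc ** 2).sum())
print(round(float(r2), 2))
```

With alpha near 1 the components approach plain principal components (capturing structure such as linkage disequilibrium); lowering alpha pulls them toward outcome prediction, mirroring the trade-off the abstract describes.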
Gorban, A N; Mirkes, E M; Zinovyev, A
2016-12-01
Most machine learning approaches stem from applying the principle of minimizing mean squared distance, which rests on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals exhibit many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning have exploited properties of non-quadratic error functionals based on the L1 norm, or even sub-linear potentials corresponding to quasinorms Lp (0 < p < 1).
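The robustness argument for sub-quadratic error functionals can be made concrete with the simplest possible case: for a location fit, the L2 minimizer is the mean and the L1 minimizer is the median, and a single contaminating point moves only the former:

```python
import numpy as np

# The L2 (mean-squared) minimizer of a location fit is the mean; the
# L1 minimizer is the median. One contaminating outlier illustrates
# why sub-quadratic error functionals are more robust.
data = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 100.0])  # one outlier

mean = data.mean()        # argmin of sum (x - c)^2
median = np.median(data)  # argmin of sum |x - c|

print(round(float(mean), 2), round(float(median), 3))  # → 17.5 1.025
```

The mean is dragged far from the bulk of the data by the single outlier, while the median barely moves; potentials flatter than L1 suppress outliers even more strongly.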
Jarosova, Darja; Gurkova, Elena; Ziakova, Katarina; Nedvedova, Daniela; Palese, Alvisa; Godeas, Gloria; Chan, Sally Wai-Chi; Song, Mi Sook; Lee, Jongwon; Cordeiro, Raul; Babiarczyk, Beata; Fras, Malgorzata
2017-03-01
There is a considerable amount of empirical evidence to indicate a positive association between an employee's subjective well-being and workplace performance and job satisfaction. Compared with nursing research, there is a relative lack of consistent scientific evidence concerning midwives' subjective well-being and its determinants related to domains of job satisfaction. The purpose of the study was to examine the association between the domains of job satisfaction and components of subjective well-being in hospital midwives. This cross-sectional descriptive study involved 1190 hospital midwives from 7 countries. Job satisfaction was measured by the McCloskey/Mueller Satisfaction Scale. Subjective well-being was conceptualized in the study by the 2 components (the affective and the cognitive component). The affective component of subjective well-being (ie, emotional well-being) was assessed by the Positive and the Negative Affect Scale. The cognitive component of subjective well-being (ie, life satisfaction) was measured by the Personal Well-Being Index. Pearson correlations and multiple regression analyses were used to determine associations between variables. Findings from correlation and regression analyses indicated an overall weak association between the domains of job satisfaction and components of subjective well-being. Satisfaction with extrinsic rewards, coworkers, and interaction opportunities accounted for only 13% of variance in the cognitive component (life satisfaction). The affective component (emotional well-being) was weakly associated with satisfaction with control and responsibility. The low amount of variance suggests that neither component of subjective well-being is influenced by the domains of job satisfaction. Further studies should focus on identifying other predictors of subjective well-being among midwives. 
A better understanding of how specific job facets are related to the subjective well-being of midwives might assist employers in the design of counseling and intervention programs for subjective well-being of midwives in the workplace and workplace performance. © 2016 by the American College of Nurse-Midwives.
Iurin, A G
2010-01-01
Non-metastatic clear-cell renal cancer: dependence of the tumour stage on clinico-anatomic and morphologic factors; prognostic value of macro- and karyometric characteristics. Sankt Peterburg Pathology Bureau, Sankt Peterburg. Multivariate regression analysis showed that the pT1a-3bN0M0 stages of non-metastatic clear-cell renal cancer correlate significantly not only with tumor size and invasion into the fatty tissue and/or renal vein, but also with invasion into the renal capsule and with the mean maximum diameter and mean nuclear area of tumor cells. Clear-cell renal cancer stage showed no correlation with tumor proliferative activity, p53 gene mutation, expression of the tumor-suppressor gene PTEN, the fraction of the tumour's clear-cell component, or such clinical characteristics as patients' sex, age, and body mass index. Given the statistically significant differences between patients' survival rates, the regression equations developed in this work may be used to predict disease outcome.
Valérie Passo Tsamo, Claudine; Andre, Christelle M; Ritter, Christian; Tomekpe, Kodjo; Ngoh Newilah, Gérard; Rogez, Hervé; Larondelle, Yvan
2014-08-27
This study aimed at understanding the contribution of the fruit physicochemical parameters to Musa sp. diversity and plantain ripening stages. A discriminant analysis was first performed on a collection of 35 Musa sp. cultivars, organized in six groups based on the consumption mode (dessert or cooking banana) and the genomic constitution. A principal component analysis reinforced by a logistic regression on plantain cultivars was proposed as an analytical approach to describe the plantain ripening stages. The results of the discriminant analysis showed that edible fraction, peel pH, pulp water content, and pulp total phenolics were among the most contributing attributes for the discrimination of the cultivar groups. With mean values ranging from 65.4 to 247.3 mg of gallic acid equivalents/100 g of fresh weight, the pulp total phenolics strongly differed between interspecific and monospecific cultivars within dessert and nonplantain cooking bananas. The results of the logistic regression revealed that the best models according to fitting parameters involved more than one physicochemical attribute. Interestingly, pulp and peel total phenolic contents contributed in the building up of these models.
Kernel analysis of partial least squares (PLS) regression models.
Shinzawa, Hideyuki; Ritthiruangdej, Pitiporn; Ozaki, Yukihiro
2011-05-01
An analytical technique based on kernel matrix representation is demonstrated to provide further chemically meaningful insight into partial least squares (PLS) regression models. The kernel matrix condenses essential information about scores derived from PLS or principal component analysis (PCA). Thus, it becomes possible to establish the proper interpretation of the scores. A PLS model for the total nitrogen (TN) content in multiple Thai fish sauces is built with a set of near-infrared (NIR) transmittance spectra of the fish sauce samples. The kernel analysis of the scores effectively reveals that the variation of the spectral feature induced by the change in protein content is substantially associated with the total water content and the protein hydration. Kernel analysis is also carried out on a set of time-dependent infrared (IR) spectra representing transient evaporation of ethanol from a binary mixture solution of ethanol and oleic acid. A PLS model to predict the elapsed time is built with the IR spectra and the kernel matrix is derived from the scores. The detailed analysis of the kernel matrix provides penetrating insight into the interaction between the ethanol and the oleic acid.
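The central object here — a kernel matrix built from PLS or PCA scores — is simply the matrix of sample-sample inner products of the scores, K = TTᵀ. A minimal sketch with mock low-rank "spectra" (synthetic data, not the NIR measurements) shows that the score-based kernel reproduces the full-data kernel when the retained components capture the underlying chemistry:

```python
import numpy as np

rng = np.random.default_rng(5)

# Mock spectra: 40 samples x 200 wavelengths generated from two
# underlying chemical factors plus a little noise (hypothetical).
n, m = 40, 200
C = rng.random((n, 2))                    # "concentrations"
S = rng.random((2, m))                    # pure-component "spectra"
A = C @ S + 0.01 * rng.normal(size=(n, m))

# Scores from PCA (SVD of the centered spectra).
Ac = A - A.mean(0)
U, s, Vt = np.linalg.svd(Ac, full_matrices=False)
T = U[:, :2] * s[:2]                      # first two score vectors

# The kernel matrix K = T T' condenses score information into
# sample-sample inner products; for rank-2 data it closely matches
# the kernel of the full centered data.
K = T @ T.T
K_full = Ac @ Ac.T
rel = float(np.abs(K - K_full).max() / np.abs(K_full).max())
print(round(rel, 4))
```

Inspecting K row by row shows which samples are similar in score space — the kind of structure the kernel analysis above uses to interpret the PLS model chemically.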
Urban change analysis and future growth of Istanbul.
Akın, Anıl; Sunar, Filiz; Berberoğlu, Süha
2015-08-01
This study analyzes urban change within Istanbul and assesses the city's future growth potential for the year 2040 using appropriate modeling approaches. Urban growth is a major driving force of land-use change, and the spatial and temporal components of urbanization can be identified through accurate spatial modeling. In this context, widely used urban modeling approaches, the Markov chain and logistic regression based on cellular automata (CA), were used to simulate urban growth within Istanbul. The distance from each pixel to the urban and road classes, elevation, and slope, together with municipality and land-use maps (as an excluded layer), were identified as factors. Calibration data were obtained from remotely sensed data recorded in 1972, 1986, and 2013. Validation was performed by overlaying the simulated and actual 2013 urban maps, and a kappa index of agreement was derived. The results indicate that urban expansion will mainly affect forest areas during 2013-2040. Urban expansion was predicted as 429 and 327 km² with the Markov chain and logistic regression models, respectively.
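The Markov-chain half of such a model amounts to estimating a class-transition probability matrix from two map dates and applying it repeatedly to the current land-use shares. A toy projection follows; the transition probabilities and class shares below are invented for illustration, not the Istanbul estimates:

```python
import numpy as np

# Hypothetical land-use transition matrix (rows: from-class,
# columns: to-class), as would be estimated from two map dates.
# Classes: urban, forest, other. Each row sums to 1.
P = np.array([
    [0.98, 0.00, 0.02],   # urban almost always stays urban
    [0.10, 0.85, 0.05],   # forest converts mostly to urban
    [0.08, 0.02, 0.90],
])

shares = np.array([0.30, 0.45, 0.25])   # current land-use shares
steps = 3                                # three steps, e.g. 2013 -> 2040
for _ in range(steps):
    shares = shares @ P                  # one Markov step per period

print(np.round(shares, 3))               # urban grows at forest's expense
```

In a full CA-Markov model these aggregate transition probabilities are then allocated spatially, with suitability layers (distance to urban/roads, elevation, slope) deciding which pixels actually convert.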
Jung, Taejin; Youn, Hyunsook; McClung, Steven
2007-02-01
The main purposes of this study are to identify individuals' motives and interpersonal self-presentation strategies in constructing personal homepages in the Korean weblog format (e.g., "Cyworld mini-homepage"). The study also attempts to find the motives that predict the activities of posting and maintaining a homepage, and to compare the self-presentation strategies used on the Web with those commonly used in interpersonal situations. Using a principal component factor analysis, four salient self-presentation strategy factors and five interpretable mini-homepage hosting motive factors were identified. An accompanying multiple regression analysis shows that entertainment and personal income factors are the major predictors of homepage maintenance expenditures and frequency of updating.
Dmitriev, Egor V; Khomenko, Georges; Chami, Malik; Sokolov, Anton A; Churilova, Tatyana Y; Korotaev, Gennady K
2009-03-01
The absorption of sunlight by oceanic constituents significantly contributes to the spectral distribution of the water-leaving radiance. Here it is shown that current parameterizations of absorption coefficients do not apply to the optically complex waters of the Crimea Peninsula. Based on in situ measurements, parameterizations of phytoplankton, nonalgal, and total particulate absorption coefficients are proposed. Their performance is evaluated using a log-log regression combined with a low-pass filter and the nonlinear least-squares method. Statistical significance of the estimated parameters is verified using the bootstrap method. The parameterizations are relevant for chlorophyll a concentrations ranging from 0.45 to 2 mg/m³.
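The estimation strategy — a log-log regression for power-law parameterizations of the form a(chl) = A·chlᴮ, with bootstrap assessment of the estimates — can be sketched on synthetic data drawn in the paper's chlorophyll range (the coefficients below are invented, not the Crimea parameterizations):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic absorption data following a power law with lognormal noise.
n = 80
chl = rng.uniform(0.45, 2.0, n)                       # mg/m^3
a_obs = 0.05 * chl ** 0.7 * np.exp(0.1 * rng.normal(size=n))

def loglog_fit(x, y):
    """Least-squares fit of log y = log A + B log x."""
    B, logA = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(logA), B

A_hat, B_hat = loglog_fit(chl, a_obs)

# Bootstrap: refit on resampled (chl, a) pairs to gauge uncertainty.
boot_B = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    boot_B.append(loglog_fit(chl[idx], a_obs[idx])[1])
lo, hi = np.percentile(boot_B, [2.5, 97.5])
print(f"B = {B_hat:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The percentile bootstrap sidesteps distributional assumptions about the residuals, which is why it suits the significance checks described in the abstract.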
2012-09-01
[Tabulated counts by marital status preceding this passage could not be recovered from the source layout.] A logistic regression model was used to predict the probability of eligibility for the survey (known eligibility vs. unknown eligibility). A second logistic regression model was used to predict the probability of response among eligible sample members (complete response vs. non-response). CHAID (Chi-squared Automatic Interaction Detection)
Kim, Su Kang; Hong, Seung-Hee; Chung, Joo-Ho; Cho, Kyu Bong
2017-01-01
Background The relationship between alcohol consumption and metabolic syndrome (MetS) remains controversial. This study investigated the relationship between alcohol consumption and MetS components and prevalence. Material/Methods We analyzed 10,037 subjects (3076 MetS and 6961 non-MetS) in a community-based cohort. MetS was defined according to the ATP III Guidelines. Subjects were divided according to amount of alcohol consumption: non-drinker, very light (0.1–5.0 g/day), light (5.1–15.0 g/day), moderate (15.1–30.0 g/day), and heavy drinker (>30 g/day). Multiple logistic regression models were used to estimate odds ratios (ORs) and confidence intervals (CIs). The analyses were performed in men and women separately. SPSS statistical software was used for the analyses. Results The prevalence of MetS in both males and females was associated with alcohol drinking status (p<0.0001). Alcohol consumption of 0.1–5.0 g/day was significantly associated with a lower prevalence of MetS in both sexes compared to non-drinkers. Alcohol consumption of >30.0 g/day did not show a significant association with the prevalence of MetS, but was associated with glucose and HDL cholesterol among the components of MetS. Conclusions Our results indicate that alcohol drinking of 0.1–5.0 g/day was associated with a decreased prevalence of MetS and of its components, including triglyceride and HDL cholesterol. PMID:28465500
Córdoba-Torrecilla, S; Aparicio, V A; Soriano-Maldonado, A; Estévez-López, F; Segura-Jiménez, V; Álvarez-Gallardo, I; Femia, P; Delgado-Fernández, M
2016-04-01
To assess the independent associations of individual physical fitness components with anxiety in women with fibromyalgia and to test which physical fitness component shows the greatest association. This population-based cross-sectional study included 439 women with fibromyalgia (age 52.2 ± 8.0 years). Anxiety symptoms were measured with the State Trait Anxiety Inventory (STAI) and the anxiety item of the Revised Fibromyalgia Impact Questionnaire (FIQR). Physical fitness was assessed through the Senior Fitness Test battery and handgrip strength test. Overall, lower physical fitness was associated with higher anxiety levels (all, p < 0.05). The coefficients of the optimal regression model (stepwise selection method) between anxiety symptoms and physical fitness components, adjusted for age, body fat percentage and anxiolytics intake, showed that the back scratch test (b = -0.18), the chair sit-and-reach test (b = -0.12; p = 0.027) and the 6-min walk test (b = -0.02; p = 0.024) were independently and inversely associated with STAI. The back scratch test and the arm-curl test were associated with FIQR-anxiety (b = -0.05; p < 0.001 and b = -0.07; p = 0.021, respectively). Physical fitness was inversely and consistently associated with anxiety in women with fibromyalgia, regardless of the fitness component evaluated. In particular, upper-body flexibility was an independent indicator of anxiety levels, followed by cardiorespiratory fitness and muscular strength.
A single determinant dominates the rate of yeast protein evolution.
Drummond, D Allan; Raval, Alpan; Wilke, Claus O
2006-02-01
A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.
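Principal component regression — regressing the response on the principal components of the predictors, which the authors use to avoid the misleading results that ordinary multivariate regression gives on collinear, noisy data — can be sketched on simulated data mimicking the "single dominant determinant" scenario (seven noisy copies of one latent variable; all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)

# Collinear predictors (stand-ins for expression level, abundance,
# CAI, ...): all seven are noisy readouts of one latent variable.
n, p = 150, 7
latent = rng.normal(size=n)
X = latent[:, None] + 0.4 * rng.normal(size=(n, p))
y = -1.0 * latent + 0.3 * rng.normal(size=n)   # evolutionary-rate proxy

# Principal component regression: regress y on the leading PC of X.
Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 1                                  # one dominant component
T = U[:, :k] * s[:k]                   # PC scores
g, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
beta_pcr = Vt[:k].T @ g                # map back to predictor space

# Variance in y explained by the single dominant component.
r2 = 1 - ((y - y.mean() - T @ g) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(round(float(r2), 2))
```

Because the seven predictors share one latent axis, the first component soaks up nearly all the predictable variance — the situation the abstract describes, where partial correlations among the raw predictors would be unstable but the component regression is not.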
Sun, Yanqing; Sun, Liuquan; Zhou, Jie
2013-07-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modeling such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
Keithley, Richard B; Wightman, R Mark
2011-06-07
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
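The abstract highlights Cook's distance for flagging outliers in calibration training sets. A minimal numpy sketch of that diagnostic is shown below; it is a generic ordinary-least-squares version on synthetic data, not the authors' voltammetry software.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation in an ordinary least-squares fit.
    Large values flag training-set points with outsized influence."""
    X1 = np.column_stack([np.ones(len(y)), X])
    hat = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T   # hat (projection) matrix
    h = np.diag(hat)                             # leverages
    resid = y - hat @ y
    p = X1.shape[1]
    mse = resid @ resid / (len(y) - p)
    return resid**2 / (p * mse) * h / (1 - h)**2

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2 * x + rng.normal(scale=0.5, size=50)
y[10] += 15.0                                    # inject one gross outlier
d = cooks_distance(x[:, None], y)
```

A common rule of thumb flags points with distance above 4/n; here the injected outlier stands out by orders of magnitude over the clean points.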
Pradervand, Sylvain; Maurya, Mano R; Subramaniam, Shankar
2006-01-01
Background Release of immuno-regulatory cytokines and chemokines during inflammatory response is mediated by a complex signaling network. Multiple stimuli produce different signals that generate different cytokine responses. Current knowledge does not provide a complete picture of these signaling pathways. However, using specific markers of signaling pathways, such as signaling proteins, it is possible to develop a 'coarse-grained network' map that can help understand common regulatory modules for various cytokine responses and help differentiate between the causes of their release. Results Using a systematic profiling of signaling responses and cytokine release in RAW 264.7 macrophages made available by the Alliance for Cellular Signaling, an analysis strategy is presented that integrates principal component regression and exhaustive search-based model reduction to identify required signaling factors necessary and sufficient to predict the release of seven cytokines (G-CSF, IL-1α, IL-6, IL-10, MIP-1α, RANTES, and TNFα) in response to selected ligands. This study provides a model-based quantitative estimate of cytokine release and identifies ten signaling components involved in cytokine production. The models identified capture many of the known signaling pathways involved in cytokine release and predict potentially important novel signaling components, like p38 MAPK for G-CSF release, IFNγ- and IL-4-specific pathways for IL-1α release, and an M-CSF-specific pathway for TNFα release. Conclusion Using an integrative approach, we have identified the pathways responsible for the differential regulation of cytokine release in RAW 264.7 macrophages. Our results demonstrate the power of using heterogeneous cellular data to qualitatively and quantitatively map intermediate cellular phenotypes. PMID:16507166
NASA Astrophysics Data System (ADS)
Chan, H. M.; van der Velden, B. H. M.; E Loo, C.; Gilhuijs, K. G. A.
2017-08-01
We present a radiomics model to discriminate between patients at low risk and those at high risk of treatment failure at long-term follow-up based on eigentumors: principal components computed from volumes encompassing tumors in washin and washout images of pre-treatment dynamic contrast-enhanced (DCE-) MR images. Eigentumors were computed from the images of 563 patients from the MARGINS study. Subsequently, a least absolute shrinkage selection operator (LASSO) selected candidates from the components that contained 90% of the variance of the data. The model for prediction of survival after treatment (median follow-up time 86 months) was based on logistic regression. Receiver operating characteristic (ROC) analysis was applied and area-under-the-curve (AUC) values were computed as measures of training and cross-validated performances. The discriminating potential of the model was confirmed using Kaplan-Meier survival curves and log-rank tests. From the 322 principal components that explained 90% of the variance of the data, the LASSO selected 28 components. The ROC curves of the model yielded AUC values of 0.88, 0.77 and 0.73, for the training, leave-one-out cross-validated and bootstrapped performances, respectively. The bootstrapped Kaplan-Meier survival curves confirmed significant separation for all tumors (P < 0.0001). Survival analysis on immunohistochemical subgroups shows significant separation for the estrogen-receptor subtype tumors (P < 0.0001) and the triple-negative subtype tumors (P = 0.0039), but not for tumors of the HER2 subtype (P = 0.41). The results of this retrospective study show the potential of early-stage pre-treatment eigentumors for use in prediction of treatment failure of breast cancer.
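The eigentumor pipeline above (PCA retaining 90% of variance, followed by a sparse logistic model evaluated by ROC AUC) can be sketched in numpy. This is a simplified stand-in under stated assumptions: synthetic features replace the DCE-MRI volumes, and plain gradient-descent logistic regression replaces the LASSO selection step.

```python
import numpy as np

def pca_components(X, var_frac=0.90):
    """Project centered data onto the principal components ('eigentumors')
    that retain var_frac of the total variance."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(frac, var_frac)) + 1
    return Xc @ Vt[:k].T

def fit_logistic(Z, y, lr=0.1, steps=2000):
    """Plain logistic regression by gradient descent (a stand-in for the
    LASSO-selected logistic model in the abstract)."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    w = np.zeros(Z1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z1 @ w))
        w -= lr * Z1.T @ (p - y) / len(y)
    return w

def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n1, n0 = y.sum(), len(y) - y.sum()
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 40))      # 120 'patients', 40 synthetic image features
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(float)
Z = pca_components(X)
w = fit_logistic(Z, y)
a = auc(np.column_stack([np.ones(len(y)), Z]) @ w, y)
```

On this toy data the in-sample AUC is well above chance; in practice, as in the study, cross-validated or bootstrapped AUC is the honest performance measure.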
NASA Astrophysics Data System (ADS)
Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.
2018-03-01
The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with multiple linear regression drift model (RK + MLR), and regression kriging with principal component regression drift model (RK + PCR)—were examined. The results of the performed study were compiled into an algorithm of choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes irrational. In that case, the multiple linear regression should be used instead.
Reference-Free Removal of EEG-fMRI Ballistocardiogram Artifacts with Harmonic Regression
Krishnaswamy, Pavitra; Bonmassar, Giorgio; Poulsen, Catherine; Pierce, Eric T; Purdon, Patrick L.; Brown, Emery N.
2016-01-01
Combining electroencephalogram (EEG) recording and functional magnetic resonance imaging (fMRI) offers the potential for imaging brain activity with high spatial and temporal resolution. This potential remains limited by the significant ballistocardiogram (BCG) artifacts induced in the EEG by cardiac pulsation-related head movement within the magnetic field. We model the BCG artifact using a harmonic basis, pose the artifact removal problem as a local harmonic regression analysis, and develop an efficient maximum likelihood algorithm to estimate and remove BCG artifacts. Our analysis paradigm accounts for time-frequency overlap between the BCG artifacts and neurophysiologic EEG signals, and tracks the spatiotemporal variations in both the artifact and the signal. We evaluate performance on: simulated oscillatory and evoked responses constructed with realistic artifacts; actual anesthesia-induced oscillatory recordings; and actual visual evoked potential recordings. In each case, the local harmonic regression analysis effectively removes the BCG artifacts, and recovers the neurophysiologic EEG signals. We further show that our algorithm outperforms commonly used reference-based and component analysis techniques, particularly in low SNR conditions, the presence of significant time-frequency overlap between the artifact and the signal, and/or large spatiotemporal variations in the BCG. Because our algorithm does not require reference signals and has low computational complexity, it offers a practical tool for removing BCG artifacts from EEG data recorded in combination with fMRI. PMID:26151100
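The core of harmonic regression, fitting sinusoids at the cardiac fundamental and its harmonics by least squares and subtracting them, can be sketched in a few lines of numpy. This is a stationary, single-channel simplification of the local, spatiotemporally adaptive algorithm the abstract describes; the signal parameters are invented for illustration.

```python
import numpy as np

def remove_harmonics(x, fs, f0, n_harmonics=3):
    """Harmonic regression: least-squares fit of sines/cosines at the
    cardiac fundamental f0 and its harmonics, returning the residual signal."""
    t = np.arange(len(x)) / fs
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * k * f0 * t), np.sin(2 * np.pi * k * f0 * t)]
    B = np.column_stack(cols)                 # harmonic design matrix
    coef, *_ = np.linalg.lstsq(B, x, rcond=None)
    return x - B @ coef                       # cleaned signal

fs, f0 = 250.0, 1.2                   # sampling rate (Hz); ~72 bpm heart rate
t = np.arange(0, 4, 1 / fs)
eeg = 0.5 * np.sin(2 * np.pi * 10 * t)        # 10 Hz alpha-band 'EEG'
bcg = 3.0 * np.sin(2 * np.pi * f0 * t) \
    + 1.5 * np.sin(2 * np.pi * 2 * f0 * t)    # BCG artifact at f0 harmonics
clean = remove_harmonics(eeg + bcg, fs, f0)
err = np.std(clean - eeg)                     # residual artifact after removal
```

Because the 10 Hz signal is nearly orthogonal to the low-frequency harmonic basis over this window, the artifact is removed while the neurophysiologic component is essentially untouched.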
Kandala, Sridhar; Nolan, Dan; Laumann, Timothy O.; Power, Jonathan D.; Adeyemo, Babatunde; Harms, Michael P.; Petersen, Steven E.; Barch, Deanna M.
2016-01-01
Abstract Like all resting-state functional connectivity data, the data from the Human Connectome Project (HCP) are adversely affected by structured noise artifacts arising from head motion and physiological processes. Functional connectivity estimates (Pearson's correlation coefficients) were inflated for high-motion time points and for high-motion participants. This inflation occurred across the brain, suggesting the presence of globally distributed artifacts. The degree of inflation was further increased for connections between nearby regions compared with distant regions, suggesting the presence of distance-dependent spatially specific artifacts. We evaluated several denoising methods: censoring high-motion time points, motion regression, the FMRIB independent component analysis-based X-noiseifier (FIX), and mean grayordinate time series regression (MGTR; as a proxy for global signal regression). The results suggest that FIX denoising reduced both types of artifacts, but left substantial global artifacts behind. MGTR significantly reduced global artifacts, but left substantial spatially specific artifacts behind. Censoring high-motion time points resulted in a small reduction of distance-dependent and global artifacts, eliminating neither type. All denoising strategies left differences between high- and low-motion participants, but only MGTR substantially reduced those differences. Ultimately, functional connectivity estimates from HCP data showed spatially specific and globally distributed artifacts, and the most effective approach to address both types of motion-correlated artifacts was a combination of FIX and MGTR. PMID:27571276
Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la
2013-03-30
Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters by LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean square error of cross validation (RMSCV) as objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It was based on the use of: (i) principal component analysis (PCA) performed on an analyte-specific spectral region for identifying the most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results found by using this strategy were compared to those provided by PLSR using the same spectral intervals as for LW-PLSR. Prediction errors found by both classical and modified LW-PLSR improved on those obtained by PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix, as in the case of human serum samples. Copyright © 2013 Elsevier B.V. All rights reserved.
Gianola, Daniel; Fariello, Maria I; Naya, Hugo; Schön, Chris-Carolin
2016-10-13
Standard genome-wide association studies (GWAS) scan for relationships between each of p molecular markers and a continuously distributed target trait. Typically, a marker-based matrix of genomic similarities among individuals (G) is constructed, to account more properly for the covariance structure in the linear regression model used. We show that the generalized least-squares estimator of the regression of phenotype on one or on m markers is invariant with respect to whether or not the marker(s) tested is(are) used for building G, provided variance components are unaffected by exclusion of such marker(s) from G. The result is arrived at by using a matrix expression such that one can find many inverses of genomic relationship, or of phenotypic covariance matrices, stemming from removing markers tested as fixed, but carrying out a single inversion. When eigenvectors of the genomic relationship matrix are used as regressors with fixed regression coefficients, e.g., to account for population stratification, their removal from G does matter. Removal of eigenvectors from G can have a noticeable effect on estimates of genomic and residual variances, so caution is needed. Concepts were illustrated using genomic data on 599 wheat inbred lines, with grain yield as target trait, and on close to 200 Arabidopsis thaliana accessions. Copyright © 2016 Gianola et al.
Gordillo Altamirano, Fernando; Fierro Torres, María José; Cevallos Salas, Nelson; Cervantes Vélez, María Cristina
To identify the main factors determining the health related quality of life (HRQL) in patients with cancer-related neuropathic pain in a tertiary care hospital. A cross-sectional analytical study was performed on a sample of 237 patients meeting criteria for cancer-related neuropathic pain. Clinical and demographic variables were recorded including cancer type, stage, time since diagnosis, pain intensity, physical functionality with the Palliative Performance Scale (PPS), and anxiety and depression with the Hospital Anxiety and Depression Scale (HADS). Their respective correlation coefficients (r) with HRQL assessed with the SF-36v2 Questionnaire were then calculated. Linear regression equations were then constructed with the variables that showed an r≥.5 with the HRQL. The HRQL scores of the sample were 39.3±9.1 (Physical Component) and 45.5±13.8 (Mental Component). Anxiety and depression strongly correlated with the mental component (r=-.641 and r=-.741, respectively) while PPS score correlated with the physical component (r=.617). The linear regression model that better explained the variance of the mental component was designed combining the Anxiety and Depression variables (R=77.3%; P<.001). The strong influence of psychiatric comorbidity on the HRQL of patients with cancer-related neuropathic pain makes an integral management plan essential for these patients, one that includes interventions for its timely diagnosis and treatment. Copyright © 2016 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.
Jørgensen, Sanne Ellegård; Jørgensen, Thea Suldrup; Aarestrup, Anne Kristine; Due, Pernille; Krølner, Rikke
2016-10-26
Based on the assumption of parental influence on adolescent behavior, multicomponent school-based dietary interventions often include a parental component. The effect of this intervention component is seldom reported and the evidence is inconsistent. We conducted a systematic process evaluation of the parental component and examined whether the level of parental involvement in a large multi-component intervention, the Boost study, was associated with adolescents' fruit and vegetable (FV) intake at follow-up. The Boost study targeted FV intake among 1,175 Danish 7th graders (≈13-year-olds) in the school year 2010/11. The study included a school component: free FV in class and curricular activities; a local community component: fact sheets for sports- and youth clubs; and a parental component: presentation of Boost at a parent-school meeting, 6 newsletters to parents, 3 guided student-parent curricular activities, and a student-parent Boost event. Students whose parent replied to the follow-up survey (n = 347). Questionnaire data from students, parents and teachers at 20 intervention schools. Process evaluation measures: dose delivered, dose received, appreciation and level of parental involvement. Parental involvement was trichotomized into: low/no (0-2 points), medium (3 points) and high (4-6 points). The association between level of parental involvement and self-reported FV intake (24-h recall) was analyzed using multilevel regression analyses. The Boost study was presented at a parent-school meeting at all intervention schools. The dose delivered was low to moderate for the three other parental elements. Most parents appreciated the intervention and talked with their child about Boost (83.5 %). High, medium and low parental involvement was found among 30.5 %, 29.6 % and 39.4 % of the students, respectively. Parental involvement was highest among women. More men agreed that the parental newsletters provided new information.
Students with a medium and high level of parental involvement ate 47.5 and 95.2 g more FV per day compared to students with low level/no parental involvement (p = 0.02). Students with a high level of parental involvement ate significantly more FV at follow-up compared to students with a low level/no parental involvement. Parental involvement in interventions may improve adolescents' FV intake if challenges of implementation can be overcome. ISRCTN11666034 . Registered 06/01/2012. Retrospectively registered.
General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies
Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong
2013-01-01
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variant tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
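The key idea, aggregating per-study score statistics and between-variant covariance matrices instead of pooling genotypes, can be sketched for a burden test in numpy. This is a simplified illustration under stated assumptions: a mean-only null model for a quantitative trait and synthetic genotypes, not the paper's full framework with covariates and heterogeneity models.

```python
import numpy as np

def study_summaries(G, y):
    """Per-study summary statistics for rare-variant meta-analysis:
    score vector U and between-variant covariance-type matrix V,
    computed under a simple mean-only null model."""
    r = y - y.mean()                 # null-model residuals
    U = G.T @ r                      # per-variant score statistics
    Gc = G - G.mean(axis=0)
    V = r.var() * (Gc.T @ Gc)        # LD-type relationship statistics
    return U, V

def meta_burden(summaries, w):
    """Combine score statistics across studies into one burden test."""
    U = sum(U_k for U_k, _ in summaries)
    V = sum(V_k for _, V_k in summaries)
    return float((w @ U)**2 / (w @ V @ w))   # ~ chi-square(1) under the null

rng = np.random.default_rng(3)
summaries = []
for _ in range(3):                           # three studies, 5 rare variants
    G = rng.binomial(1, 0.05, size=(500, 5)).astype(float)
    y = G.sum(axis=1) * 0.8 + rng.normal(size=500)
    summaries.append(study_summaries(G, y))
stat = meta_burden(summaries, w=np.ones(5))
```

Only `(U, V)` pairs cross study boundaries, which is why such methods match the power of joint analysis without sharing individual-level data; with the strong simulated effect, the combined statistic far exceeds the 3.84 chi-square(1) significance threshold.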
Serum metabolomics differentiating pancreatic cancer from new-onset diabetes
He, Xiangyi; Zhong, Jie; Wang, Shuwei; Zhou, Yufen; Wang, Lei; Zhang, Yongping; Yuan, Yaozong
2017-01-01
To establish a screening strategy for pancreatic cancer (PC) based on new-onset diabetic mellitus (NO-DM), serum metabolomics analysis and a search for the metabolic pathways associated with PC related DM were performed. Serum samples from patients with NO-DM (n = 30) and patients with pancreatic cancer and NO-DM were examined by liquid chromatography-mass spectrometry. Data were analyzed using principal components analysis (PCA) and orthogonal projection to latent structures (OPLS) of the most significant metabolites. The diagnostic model was constructed using logistic regression analysis. Metabolic pathways were analyzed using the web-based tool MetPA. PC patients with NO-DM were older and had a lower BMI and shorter duration of DM than patients with NO-DM alone. The metabolomic profiles of patients with PC and NO-DM were significantly different from those of patients with NO-DM in the PCA and OPLS models. Sixty two differential metabolites were identified by the OPLS model. The logistic regression model using a panel of two metabolites including N-Succinyl-L-diaminopimelic acid and PE (18:2) had high sensitivity (93.3%) and specificity (93.1%) for PC. The top three metabolic pathways associated with PC related DM were valine, leucine and isoleucine biosynthesis and degradation, primary bile acid biosynthesis, and sphingolipid metabolism. In conclusion, screening for PC based on NO-DM using serum metabolomics in combination with clinical characteristics and CA19-9 is a potentially useful strategy. Several metabolic pathways differed between PC related DM and type 2 DM. PMID:28418859
Merchak, Noelle; Silvestre, Virginie; Loquet, Denis; Rizk, Toufic; Akoka, Serge; Bejjani, Joseph
2017-01-01
Triacylglycerols, which are quasi-universal components of food matrices, consist of complex mixtures of molecules. Their site-specific 13C content, their fatty acid profile, and their position on the glycerol moiety may significantly vary with the geographical, botanical, or animal origin of the sample. Such variables are valuable tracers for food authentication issues. The main objective of this work was to develop a new method based on a rapid and precise 13C-NMR spectroscopy (using a polarization transfer technique) coupled with multivariate linear regression analyses in order to quantify the whole set of individual fatty acids within triacylglycerols. In this respect, olive oil samples were analyzed by means of both adiabatic 13C-INEPT sequence and gas chromatography (GC). For each fatty acid within the studied matrix and for squalene as well, a multivariate prediction model was constructed using the deconvoluted peak areas of 13C-INEPT spectra as predictors, and the data obtained by GC as response variables. This 13C-NMR-based strategy, tested on olive oil, could serve as an alternative to the gas chromatographic quantification of individual fatty acids in other matrices, while providing additional compositional and isotopic information. Graphical abstract A strategy based on the multivariate linear regression of variables obtained by a rapid 13C-NMR technique was developed for the quantification of individual fatty acids within triacylglycerol matrices. The conceived strategy was tested on olive oil.
The influence of climate variables on dengue in Singapore.
Pinto, Edna; Coelho, Micheline; Oliver, Leuda; Massad, Eduardo
2011-12-01
In this work we correlated dengue cases with climatic variables for the city of Singapore. This was done through a Poisson Regression Model (PRM) that considers dengue cases as the dependent variable and the climatic variables (rainfall, maximum and minimum temperature and relative humidity) as independent variables. We also used Principal Components Analysis (PCA) to choose the variables that influence the increase of the number of dengue cases in Singapore, where PC₁ (Principal component 1) is represented by temperature and rainfall and PC₂ (Principal component 2) is represented by relative humidity. We calculated the probability of occurrence of new cases of dengue and the relative risk of occurrence of dengue cases influenced by climatic variables. The months from July to September showed the highest probabilities of the occurrence of new cases of the disease throughout the year. This was based on an analysis of time series of maximum and minimum temperature. An interesting result was that for every 2-10°C of variation of the maximum temperature, there was an average increase of 22.2-184.6% in the number of dengue cases. For the minimum temperature, we observed that for the same variation, there was an average increase of 26.1-230.3% in the number of the dengue cases from April to August. The precipitation and the relative humidity, after analysis of correlation, were discarded in the use of the Poisson Regression Model because they did not present good correlation with the dengue cases. Additionally, the relative risk of the occurrence of the cases of the disease under the influence of the variation of temperature was 1.2-2.8 for maximum temperature and 1.3-3.3 for minimum temperature. Therefore, the variable temperature (maximum and minimum) was the best predictor for the increased number of dengue cases in Singapore.
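A Poisson regression of case counts on a climate covariate, as used above, can be fitted with iteratively reweighted least squares in plain numpy. The data here are synthetic (a made-up temperature-rate relationship), so this is a sketch of the model class, not a reanalysis of the Singapore data.

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Poisson regression (log link) fitted by iteratively reweighted
    least squares; returns coefficients including the intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    beta[0] = np.log(y.mean() + 1e-9)     # start at the intercept-only fit
    for _ in range(n_iter):
        eta = X1 @ beta
        mu = np.exp(eta)                  # fitted Poisson means
        z = eta + (y - mu) / mu           # working response
        WX = X1 * mu[:, None]             # Poisson working weights W = mu
        beta = np.linalg.solve(X1.T @ WX, WX.T @ z)
    return beta

rng = np.random.default_rng(4)
tmax = rng.uniform(28, 34, 300)           # monthly max temperature (°C), synthetic
lam = np.exp(-3.0 + 0.15 * tmax)          # true rate rises with temperature
cases = rng.poisson(lam)
beta = poisson_irls(tmax[:, None], cases)
```

The exponentiated slope, exp(beta[1]), is the multiplicative change in expected case count per °C, which is how percentage increases like those quoted in the abstract are derived from such a model.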
Michelsen, Brigitte; Kristianslund, Eirik Klami; Sexton, Joseph; Hammer, Hilde Berner; Fagerli, Karen Minde; Lie, Elisabeth; Wierød, Ada; Kalstad, Synøve; Rødevand, Erik; Krøll, Frode; Haugeberg, Glenn; Kvien, Tore K
2017-11-01
To investigate the predictive value of baseline depression/anxiety on the likelihood of achieving joint remission in rheumatoid arthritis (RA) and psoriatic arthritis (PsA) as well as the associations between baseline depression/anxiety and the components of the remission criteria at follow-up. We included 1326 patients with RA and 728 patients with PsA from the prospective observational NOR-DMARD study starting first-time tumour necrosis factor inhibitors or methotrexate. The predictive value of depression/anxiety on remission was explored in prespecified logistic regression models and the associations between baseline depression/anxiety and the components of the remission criteria in prespecified multiple linear regression models. Baseline depression/anxiety according to EuroQoL-5D-3L, Short Form-36 (SF-36) Mental Health subscale ≤56 and SF-36 Mental Component Summary ≤38 negatively predicted 28-joint Disease Activity Score <2.6, Simplified Disease Activity Index ≤3.3, Clinical Disease Activity Index ≤2.8, ACR/EULAR Boolean and Disease Activity Index for Psoriatic Arthritis ≤4 remission after 3 and 6 months treatment in RA (p≤0.008) and partly in PsA (p from 0.001 to 0.73). Baseline depression/anxiety was associated with increased patient's and evaluator's global assessment, tender joint count and joint pain in RA at follow-up, but not with swollen joint count and acute phase reactants. Depression and anxiety may reduce likelihood of joint remission based on composite scores in RA and PsA and should be taken into account in individual patients when making a shared decision on a treatment target. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Automated Algorithms for Quantum-Level Accuracy in Atomistic Simulations: LDRD Final Report.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, Aidan Patrick; Schultz, Peter Andrew; Crozier, Paul
2014-09-01
This report summarizes the result of LDRD project 12-0395, titled "Automated Algorithms for Quantum-level Accuracy in Atomistic Simulations." During the course of this LDRD, we have developed an interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected onto a basis of hyperspherical harmonics in four dimensions. The SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. Global optimization methods in the DAKOTA software package are used to seek out good choices of hyperparameters that define the overall structure of the SNAP potential. FitSnap.py, a Python-based software package interfacing to both LAMMPS and DAKOTA, is used to formulate the linear regression problem, solve it, and analyze the accuracy of the resultant SNAP potential. We describe a SNAP potential for tantalum that accurately reproduces a variety of solid and liquid properties. Most significantly, in contrast to existing tantalum potentials, SNAP correctly predicts the Peierls barrier for screw dislocation motion. We also present results from SNAP potentials generated for indium phosphide (InP) and silica (SiO2).
We describe efficient algorithms for calculating SNAP forces and energies in molecular dynamics simulations using massively parallel computers and advanced processor architectures. Finally, we briefly describe the MSM method for efficient calculation of electrostatic interactions on massively parallel computers.
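The fitting step named in the report, weighted least-squares linear regression of QM reference values against descriptors, can be sketched in numpy. Random features stand in for the actual bispectrum components, and the weight vector mimics giving different importance to, e.g., energy versus force rows; this is not the FitSnap.py code.

```python
import numpy as np

def fit_weighted_lstsq(A, b, w):
    """Weighted least-squares fit of linear coefficients, as in the SNAP
    fitting step: rows of A hold descriptors (e.g. bispectrum components)
    per training configuration, b the QM reference values, w row weights."""
    sw = np.sqrt(w)
    # Scale rows by sqrt(weights) so ordinary lstsq solves the weighted problem
    coef, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return coef

rng = np.random.default_rng(5)
A = rng.normal(size=(400, 20))          # 400 configurations x 20 descriptors
true = rng.normal(size=20)
b = A @ true + 0.01 * rng.normal(size=400)   # 'QM' values with small noise
w = rng.uniform(0.5, 2.0, size=400)     # e.g. weight energies vs. forces differently
coef = fit_weighted_lstsq(A, b, w)
err = np.max(np.abs(coef - true))
```

Because the model is linear in its coefficients, the fit is a single convex solve, which is what makes the robust, automated refitting to large QM data sets described above practical.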
Zhou, Xiang
2017-12-01
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, the restricted maximum likelihood estimation method (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
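The Haseman-Elston regression that MQS generalizes has a compact method-of-moments form: regress off-diagonal phenotype cross-products on the corresponding genomic relatedness entries; the slope estimates SNP heritability. The numpy sketch below uses simulated genotypes and a standardized phenotype, and is an illustration of plain HE, not of the MQS estimator itself.

```python
import numpy as np

def he_regression(K, y):
    """Haseman-Elston regression estimate of SNP heritability: slope of
    off-diagonal phenotype cross-products y_i*y_j on kinship entries K_ij."""
    yc = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)        # off-diagonal pairs only
    prod = np.outer(yc, yc)[iu]
    k = K[iu]
    kc = k - k.mean()
    return float(kc @ (prod - prod.mean()) / (kc @ kc))

rng = np.random.default_rng(6)
n, p, h2 = 600, 1000, 0.5                    # samples, SNPs, true heritability
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)     # standardized genotypes
K = G @ G.T / p                              # genomic relatedness matrix
beta = rng.normal(scale=np.sqrt(h2 / p), size=p)
y = G @ beta + rng.normal(scale=np.sqrt(1 - h2), size=n)
est = he_regression(K, y)
```

Like MQS, this moment estimator needs no likelihood iteration; its sampling noise at this sample size is why the estimate lands near, rather than exactly at, the simulated heritability of 0.5.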
NASA Astrophysics Data System (ADS)
Freeman, Mary Pyott
ABSTRACT An Analysis of Tree Mortality Using High Resolution Remotely-Sensed Data for Mixed-Conifer Forests in San Diego County by Mary Pyott Freeman The montane mixed-conifer forests of San Diego County are currently experiencing extensive tree mortality, defined as dieback in which whole stands are affected. This mortality is likely the result of complex interactions among many variables, such as altered fire regimes, climatic conditions such as drought, forest pathogens, and past management strategies. Conifer tree mortality and its spatial pattern and change over time were examined in three components. In component 1, two remote sensing approaches were compared for their effectiveness in delineating dead trees: a spatial contextual approach and an object-based image analysis (OBIA) approach, utilizing various dates and spatial resolutions of airborne image data. For each approach, transforms and masking techniques were explored and found to improve classifications, and an object-based assessment approach was tested. In component 2, dead-tree maps produced by the most effective techniques from component 1 were used in point-pattern and vector analyses to further understand spatio-temporal changes in tree mortality for the years 1997, 2000, 2002, and 2005 in three study areas: Palomar, Volcan, and Laguna mountains. Plot-based fieldwork was conducted to further assess mortality patterns. Results indicate that conifer mortality was significantly clustered, increased substantially between 2002 and 2005, and was non-random with respect to tree species and diameter classes. In component 3, multiple environmental variables were used to develop generalized linear models (GLM; logistic regression) and decision tree classifiers, revealing the importance of climate and topographic factors, such as precipitation and elevation, for predicting areas at high risk of tree mortality.
The results from this study highlight the importance of multi-scale spatial as well as temporal analyses, in order to understand mixed-conifer forest structure, dynamics, and processes of decline, which can lead to more sustainable management of forests with continued natural and anthropogenic disturbance.
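A GLM of the kind used in component 3 can be sketched as a logistic regression fitted by Newton-Raphson. The predictors, effect sizes, and sample size below are hypothetical, chosen only to mirror the precipitation and elevation factors named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
precip = rng.normal(500.0, 100.0, n)   # hypothetical annual precipitation (mm)
elev = rng.normal(1500.0, 200.0, n)    # hypothetical elevation (m)
X = np.column_stack([np.ones(n),
                     (precip - precip.mean()) / precip.std(),
                     (elev - elev.mean()) / elev.std()])

# synthetic ground truth: lower precipitation raises mortality risk
logit = -0.5 - 1.2 * X[:, 1] + 0.4 * X[:, 2]
dead = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

beta = np.zeros(3)
for _ in range(25):                    # Newton-Raphson for the logistic MLE
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    grad = X.T @ (dead - p)            # score
    H = (X * W[:, None]).T @ X         # observed information
    beta += np.linalg.solve(H, grad)
```

The fitted coefficients can then be mapped back onto the landscape to score each pixel's mortality risk, which is the role the GLM plays in the study.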
Low-flow, base-flow, and mean-flow regression equations for Pennsylvania streams
Stuckey, Marla H.
2006-01-01
Low-flow, base-flow, and mean-flow characteristics are an important part of assessing water resources in a watershed. These streamflow characteristics can be used by watershed planners and regulators to determine water availability, water-use allocations, assimilative capacities of streams, and aquatic-habitat needs. Streamflow characteristics are commonly predicted by use of regression equations when a nearby streamflow-gaging station is not available. Regression equations for predicting low-flow, base-flow, and mean-flow characteristics for Pennsylvania streams were developed from data collected at 293 continuous- and partial-record streamflow-gaging stations with flow unaffected by upstream regulation, diversion, or mining. Continuous-record stations used in the regression analysis had nine years or more of data, and partial-record stations had seven or more measurements collected during base-flow conditions. The state was divided into five low-flow regions, and regional regression equations were developed for the 7-day, 10-year; 7-day, 2-year; 30-day, 10-year; 30-day, 2-year; and 90-day, 10-year low flows using generalized least-squares regression. Statewide regression equations were developed for the 10-year, 25-year, and 50-year base flows using generalized least-squares regression, and for the harmonic mean and mean annual flow using weighted least-squares regression. Basin characteristics found to be significant explanatory variables at the 95-percent confidence level for one or more regression equations were drainage area, basin slope, thickness of soil, stream density, mean annual precipitation, mean elevation, and the percentage of glaciation, carbonate bedrock, forested area, and urban area within a basin. Standard errors of prediction ranged from 33 to 66 percent for the n-day, T-year low flows; 21 to 23 percent for the base flows; and 12 to 38 percent for the mean annual flow and harmonic mean.
The regression equations are not valid in watersheds with upstream regulation, diversions, or mining activities. Watersheds with karst features need close examination as to the applicability of the regression-equation results.
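Regional regression equations of this kind are typically fit in log-linear (power-law) form. The sketch below uses invented basin data and weights stations by record length, one plausible weighting choice rather than the study's actual scheme; the coefficient values are synthetic, not Pennsylvania results.

```python
import numpy as np

# hypothetical basins: drainage area (mi^2), mean annual precipitation (in.)
area = np.array([10.0, 25.0, 60.0, 120.0, 300.0, 15.0, 85.0, 200.0])
precip = np.array([38.0, 42.0, 40.0, 45.0, 44.0, 39.0, 43.0, 46.0])
years = np.array([30.0, 25.0, 20.0, 15.0, 10.0, 28.0, 18.0, 12.0])  # record lengths

# synthetic 7-day, 10-year low flow following an assumed power-law relation
q7_10 = 0.02 * area**1.1 * (precip / 40.0)**3

# weighted least squares on the log-linear form, weighting by record length
X = np.column_stack([np.ones_like(area), np.log(area), np.log(precip)])
sw = np.sqrt(years)
b, *_ = np.linalg.lstsq(X * sw[:, None], np.log(q7_10) * sw, rcond=None)
```

Because the synthetic data follow the power law exactly, the fit recovers the exponents (1.1 for area, 3 for precipitation); with real gage data the same machinery yields the coefficients and prediction errors reported in the abstract.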
Self-consistent asset pricing models
NASA Astrophysics Data System (ADS)
Malevergne, Y.; Sornette, D.
2007-08-01
We discuss the foundations of factor or regression models in the light of the self-consistency condition that the market portfolio (and more generally the risk factors) is (are) constituted of the assets whose returns it is (they are) supposed to explain. As already reported in several articles, self-consistency implies correlations between the return disturbances. As a consequence, the alphas and betas of the factor model are unobservable. Self-consistency leads to renormalized betas with zero effective alphas, which are observable with standard OLS regressions. When the conditions derived from internal consistency are not met, the model is necessarily incomplete, which means that some sources of risk cannot be replicated (or hedged) by a portfolio of stocks traded on the market, even for infinite economies. Analytical derivations and numerical simulations show that, for arbitrary choices of the proxy that differ from the true market portfolio, a modified linear regression holds between an asset i's return and the proxy's return, with a non-zero intercept αi. Self-consistency also introduces “orthogonality” and “normality” conditions linking the betas, alphas (as well as the residuals) and the weights of the proxy portfolio. Two diagnostics based on these orthogonality and normality conditions are implemented on a basket of 323 assets that were components of the S&P500 in the period from January 1990 to February 2005. These two diagnostics show interesting departures from dynamical self-consistency starting about two years before the end of the Internet bubble. Assuming that the CAPM holds with the self-consistency condition, the OLS method automatically obeys the resulting orthogonality and normality conditions and therefore provides a simple way to self-consistently assess the parameters of the model by using proxy portfolios made only of the assets that are used in the CAPM regressions.
Finally, the factor decomposition with the self-consistency condition derives a risk-factor decomposition in the multi-factor case which is identical to the principal component analysis (PCA), thus providing a direct link between model-driven and data-driven constructions of risk factors. This correspondence shows that PCA will therefore suffer from the same limitations as the CAPM and its multi-factor generalization, namely lack of out-of-sample explanatory power and predictability. In the multi-period context, the self-consistency conditions force the betas to be time-dependent with specific constraints.
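The "normality" and "orthogonality" conditions are easy to verify numerically when the proxy is built from the regressed assets themselves: the proxy-weighted betas sum to one and the proxy-weighted alphas vanish identically. A sketch with synthetic returns; the weights and return parameters are arbitrary, not market data.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_assets = 2000, 5
R = 0.0005 + 0.02 * rng.standard_normal((T, n_assets))   # synthetic asset returns
w = np.array([0.30, 0.25, 0.20, 0.15, 0.10])             # proxy portfolio weights
Rm = R @ w                                               # proxy built from the same assets

# per-asset OLS of returns on the proxy; rows of coef are [alpha; beta]
X = np.column_stack([np.ones(T), Rm])
coef = np.linalg.lstsq(X, R, rcond=None)[0]
alphas, betas = coef[0], coef[1]

normality = w @ betas        # equals 1 under self-consistency
orthogonality = w @ alphas   # equals 0 under self-consistency
```

The identities hold by linearity of OLS: regressing the portfolio Rm = R @ w on itself gives alpha 0 and beta 1 exactly, whatever the return-generating process.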
NASA Astrophysics Data System (ADS)
Visser, H.; Molenaar, J.
1995-05-01
The detection of trends in climatological data has become central to the discussion on climate change due to the enhanced greenhouse effect. To prove detection, a method is needed (i) to make inferences on significant rises or declines in trends, (ii) to take into account natural variability in climate series, and (iii) to compare output from GCMs with the trends in observed climate data. To meet these requirements, flexible mathematical tools are needed. A structural time series model is proposed with which a stochastic trend, a deterministic trend, and regression coefficients can be estimated simultaneously. The stochastic trend component is described using the class of ARIMA models. The regression component is assumed to be linear; however, the regression coefficients corresponding to the explanatory variables may be allowed to vary over time in order to validate this assumption. The mathematical technique used to estimate this trend-regression model is the Kalman filter. The main features of the filter are discussed. Examples of trend estimation are given using annual mean temperatures at a single station in the Netherlands (1706-1990) and annual mean temperatures at Northern Hemisphere land stations (1851-1990). The inclusion of explanatory variables is shown by regressing the latter temperature series on four variables: Southern Oscillation index (SOI), volcanic dust index (VDI), sunspot numbers (SSN), and a simulated temperature signal induced by increasing greenhouse gases (GHG). In all analyses, the influence of SSN on global temperatures is found to be negligible. The correlations between temperatures and SOI and VDI appear to be negative. For SOI, this correlation is significant, but for VDI it is not, probably because of a lack of volcanic eruptions during the sample period. The relation between temperatures and GHG is positive, which is in agreement with the hypothesis of a warming climate because of increasing levels of greenhouse gases.
The prediction performance of the model is rather poor, and possible explanations are discussed.
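The filtering idea can be sketched with the simplest structural model, a scalar random walk plus observation noise, estimated by the Kalman filter. Unlike the paper, where the variances are estimated, the noise variances here are assumed known, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
true_trend = np.cumsum(0.05 * rng.standard_normal(T))   # slowly wandering trend
y = true_trend + 0.5 * rng.standard_normal(T)           # noisy observed series

q, r = 0.05**2, 0.5**2     # state and observation noise variances (assumed known)
m, P = 0.0, 1.0            # initial state mean and variance
trend = np.empty(T)
for t in range(T):
    P = P + q                        # predict: random-walk state
    K = P / (P + r)                  # Kalman gain
    m = m + K * (y[t] - m)           # update with the new observation
    P = (1.0 - K) * P
    trend[t] = m
```

The filtered trend tracks the hidden signal far more closely than the raw observations do; adding regression terms (SOI, VDI, SSN, GHG) amounts to augmenting the state vector with their coefficients.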
NASA Astrophysics Data System (ADS)
Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi
2016-03-01
Quantitative structure-activity relationship (QSAR) models were used to study a series of curcumin-related compounds with inhibitory effects on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. The sphere-exclusion method was used to split the data set into training and test sets. Multiple linear regression, principal component regression, and partial least squares were used as the regression methods; to investigate the effect of feature selection, stepwise selection, genetic algorithms, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise selection (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43; Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise selection (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) was the best method. The QSAR study reveals descriptors that play a crucial role in the inhibitory properties of curcumin-like compounds: 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors with the greatest effect. To design and optimize novel, efficient curcumin-related compounds, it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur into the chemical structure (reducing the contribution of the T_C_C_1 descriptor) and to increase the contributions of the 6ChainCount and T_O_O_7 descriptors. The models can aid the design of novel curcumin-related compounds for use in the treatment of prostate, pancreas, and colon cancers.
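Principal component regression, one of the methods compared above, can be sketched on synthetic descriptor data: project the centered descriptor matrix onto its leading principal components and regress the activity on those scores. The dimensions and signal structure below are invented, not the study's descriptors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 40, 10, 3        # compounds, descriptors, retained components
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(n)   # synthetic activity

Xc = X - X.mean(axis=0)                      # center descriptors
yc = y - y.mean()
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T                       # projections onto the first k PCs
gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
beta_pcr = Vt[:k].T @ gamma                  # coefficients back in descriptor space
r2 = 1.0 - np.sum((yc - scores @ gamma)**2) / np.sum(yc**2)
```

PCR trades some fit (the retained components need not align with the activity) for stability when descriptors are many and collinear, which is the usual situation in QSAR work.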
Partitioning sources of variation in vertebrate species richness
Boone, R.B.; Krohn, W.B.
2000-01-01
Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km2) of 275 species using literature and expert review. Ranges were combined into species richness maps and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared the richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression, with methods that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and by using both tree and linear regression, we explored the degree of variation in species richness for each vertebrate group that could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (each with 82-84% deviance explained). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%), and birds (57%).
Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.
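The partitioning of explained variation between correlated predictor sets reduces to R2 arithmetic: the shared component of two sets A and B is R2(A) + R2(B) - R2(A+B), and the unique component of A is R2(A+B) - R2(B). A sketch with synthetic data; the variable names and effect sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
climate = rng.standard_normal(n)
terrain = 0.6 * climate + 0.8 * rng.standard_normal(n)   # correlated with climate
richness = 2.0 * climate + terrain + rng.standard_normal(n)

def r2(y, *cols):
    """R-squared of an OLS fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    yc = y - y.mean()
    return 1.0 - resid @ resid / (yc @ yc)

r2_c = r2(richness, climate)
r2_t = r2(richness, terrain)
r2_both = r2(richness, climate, terrain)
shared = r2_c + r2_t - r2_both            # variation attributable to either set
unique_climate = r2_both - r2_t           # variation only climate explains
```

The same decomposition applies with deviance in place of R2 for tree regression, which is how the study compares the two model families on a common footing.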
NASA Astrophysics Data System (ADS)
Yin, Jianhua; Xia, Yang
2014-12-01
Fourier transform infrared imaging (FTIRI) combined with principal component regression (PCR) analysis was used to determine the reduction of proteoglycan (PG) in articular cartilage after transection of the anterior cruciate ligament (ACL). Canine knee cartilage sections were harvested from the meniscus-covered and meniscus-uncovered medial tibial locations of the control joints, the ACL joints at three time points after the surgery, and their contralateral joints. PG loss in the ACL cartilage increased with the time elapsed after surgery. PG loss in the contralateral knees was less than that in the ACL knees, and PG loss in the meniscus-covered cartilage was less than that in the meniscus-uncovered tissue in both ACL and contralateral knees. Quantitative mapping of PG loss could monitor disease progression and repair processes in arthritis.
Homogenization Issues in the Combustion of Heterogeneous Solid Propellants
NASA Technical Reports Server (NTRS)
Chen, M.; Buckmaster, J.; Jackson, T. L.; Massa, L.
2002-01-01
We examine random packs of discs or spheres, models for ammonium-perchlorate-in-binder propellants, and discuss their average properties. An analytical strategy is described for calculating the mean or effective heat conduction coefficient in terms of the heat conduction coefficients of the individual components, and the results are verified by comparison with those of direct numerical simulations (dns) for both 2-D (disc) and 3-D (sphere) packs across which a temperature difference is applied. Similarly, when the surface regression speed of each component is related to the surface temperature via a simple Arrhenius law, an analytical strategy is developed for calculating an effective Arrhenius law for the combination, and these results are verified using dns in which a uniform heat flux is applied to the pack surface, causing it to regress. These results are needed for homogenization strategies necessary for fully integrated 2-D or 3-D simulations of heterogeneous propellant combustion.
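For the effective-conduction part, a classical reference point is the pair of Wiener (series/parallel) bounds, which bracket the effective conductivity of any two-component mixture. The volume fractions and conductivities below are illustrative placeholders, not propellant data.

```python
import numpy as np

# illustrative two-component mixture: volume fractions and conductivities
phi = np.array([0.7, 0.3])          # particle and binder volume fractions (hypothetical)
k = np.array([0.4, 0.2])            # conductivities, W/(m K) (hypothetical)

k_parallel = phi @ k                    # arithmetic mean: Wiener upper bound
k_series = 1.0 / (phi @ (1.0 / k))      # harmonic mean: Wiener lower bound
```

Any homogenized conductivity for the pack, including values extracted from direct numerical simulation, must fall between these two bounds, which makes them a quick sanity check on an effective-coefficient calculation.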
Eaton, Jennifer L; Mohr, David C; Hodgson, Michael J; McPhaul, Kathleen M
2018-02-01
To describe the development and validation of the work-related well-being (WRWB) index. Principal components analysis was performed using Federal Employee Viewpoint Survey (FEVS) data (N = 392,752) to extract variables representing worker well-being constructs, and confirmatory factor analysis was performed to verify the factor structure. To validate the WRWB index, we used multiple regression analysis to examine relationships with burnout-associated outcomes. Principal components analysis identified three positive psychology constructs: "Work Positivity", "Co-worker Relationships", and "Work Mastery". An 11-item index explaining 63.5% of the variance was achieved. The structural equation model provided a very good fit to the data. Higher WRWB scores were positively associated with all three employee experience measures examined in regression models. The new WRWB index shows promise as a valid and widely accessible instrument to assess worker well-being.
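The variance-explained figure reported for such an index comes from the eigenvalues of the item correlation matrix. A sketch of that computation on synthetic survey items driven by two latent constructs; the items, loadings, and sample size are invented, not FEVS data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)   # two latent constructs
items = np.column_stack([f + 0.3 * rng.standard_normal(n)
                         for f in (f1, f1, f1, f2, f2, f2)])  # six survey items
Z = (items - items.mean(axis=0)) / items.std(axis=0)          # standardize items

# eigenvalues of the item correlation matrix, largest first
evals = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
var_explained = evals[:2].sum() / evals.sum()   # share captured by two components
```

With items loading cleanly on two constructs, the first two eigenvalues dominate and the two-component solution accounts for most of the total variance, mirroring the 63.5% figure in spirit.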
A kinetic model of municipal sludge degradation during non-catalytic wet oxidation.
Prince-Pike, Arrian; Wilson, David I; Baroutian, Saeid; Andrews, John; Gapes, Daniel J
2015-12-15
Wet oxidation is a successful process for the treatment of municipal sludge. In addition, the resulting effluent from wet oxidation is a useful carbon source for subsequent biological nutrient removal processes in wastewater treatment. Owing to limitations with current kinetic models, this study produced a kinetic model which predicts the concentrations of key intermediate components during wet oxidation. The model was regressed from lab-scale experiments and then subsequently validated using data from a wet oxidation pilot plant. The model was shown to be accurate in predicting the concentrations of each component, and produced good results when applied to a plant 500 times larger in size. A statistical study was undertaken to investigate the validity of the regressed model parameters. Finally the usefulness of the model was demonstrated by suggesting optimum operating conditions such that volatile fatty acids were maximised.
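A common building block for such kinetic models is a first-order consecutive-reaction scheme (solids to intermediates to end products), whose analytic solution also gives the operating time that maximizes the intermediate pool. The rate constants and scales below are invented, not the fitted values from this study.

```python
import numpy as np

# hypothetical first-order rate constants (1/min): solids -> intermediates -> products
k1, k2 = 0.15, 0.05
A0 = 100.0                          # initial degradable load (arbitrary units)
t = np.linspace(0.0, 60.0, 121)     # reaction time (min)

A = A0 * np.exp(-k1 * t)                                        # solids
B = A0 * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))   # intermediates (e.g. VFA)
C = A0 - A - B                                                  # end products by mass balance

t_peak = t[np.argmax(B)]   # operating time that maximises the intermediate
```

The analytic optimum is ln(k1/k2)/(k1 - k2), about 11 minutes for these constants; choosing the hold time near this peak is the same kind of optimization the study performs to maximize volatile fatty acids.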
NASA Astrophysics Data System (ADS)
Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.
2017-01-01
One of the most notable features of sign-based statistical procedures is the opportunity to build an exact test for simple hypothesis testing of the parameters in a regression model. In this article, we extend the sign-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypotheses not only about the regression parameters but also about the noise parameters.
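The exactness of a sign-based test is easiest to see in the simplest linear case: under the null hypothesis the residual signs are i.i.d. fair coin flips, so the test statistic is exactly Binomial(n, 1/2) whatever the (continuous, median-zero) noise distribution. The sketch below treats a no-intercept simple regression, not the multi-quantile model of the article.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(6)
n = 30
x = rng.standard_normal(n)
y = 1.5 * x + rng.standard_normal(n)   # true slope 1.5, median-zero noise

def sign_test_pvalue(beta0):
    # Under H0: slope == beta0, each residual sign matches sign(x_i) with
    # probability 1/2, so the match count is exactly Binomial(n, 1/2).
    s = int(np.sum(np.sign(y - beta0 * x) == np.sign(x)))
    tail = sum(comb(n, j) for j in range(min(s, n - s) + 1)) / 2**n
    return min(1.0, 2.0 * tail)        # exact two-sided p-value

p_at_true = sign_test_pvalue(1.5)      # H0 true: p-value unremarkable
p_at_zero = sign_test_pvalue(0.0)      # H0 false: the exact test rejects
```

No asymptotic approximation enters anywhere, which is the appeal the abstract highlights; the article's contribution is carrying this exactness over to nonlinear multi-quantile models with dependent noise.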