Sample records for joinpoint regression program

  1. Silent changes of tuberculosis in Iran (2005-2015): A joinpoint regression analysis.

    PubMed

    Marvi, Abolfazl; Asadi-Aliabadi, Mehran; Darabi, Mehdi; Rostami-Maskopaee, Fereshteh; Siamian, Hasan; Abedi, Ghasem

    2017-01-01

    Tuberculosis (TB) poses a severe risk to public health throughout the world but disproportionately burdens low-income nations. The aim of this study was to analyze silent changes of TB in Iran (2005-2015) using joinpoint regression analysis. This trend study covered all patients (n = 70) registered at the disease control center of Joibar (a coastal city and tourism destination in Northern Iran, recognized as an independent town since 1998) during 2005-2015. The characteristics of patients were imported into SPSS 19, and variation in the incidence rate of the different forms of pulmonary TB (PTB) (PTB+ or PTB-) and extra-PTB (EPTB) per year was analyzed. Variation in the incidence rate of TB was analyzed for male and female groups and for different age groups (0-14, 15-24, 25-34, 35-44, 45-54, 55-64, and above 65 years), variation in the trend of the disease was compared among groups over the study years, and variation in the incidence rate of TB was analyzed with the Joinpoint Regression Software. In total, 70 TB cases were recorded during 2005-2015. The mean age of patients was 42.31 ± 21.26 years and the median age was 40 years. About 71.4% of patients had PTB (55.7% with PTB+ and 15.7% with PTB-) and the rest (28.4%) had EPTB. With regard to classification of cases, 97.1% were new cases, 1.45% were relapsed cases, and 1.45% were imported cases. In addition, a history of hospitalization due to TB was observed in 44.3%. Despite recent development of the governmental health-care system in Iran and proper access to it, identification of TB cases is possible only through passive surveillance. Hence, developing programs for sensitization of the covered population is essential.

  2. Trend Analysis of Cancer Mortality and Incidence in Panama, Using Joinpoint Regression Analysis.

    PubMed

    Politis, Michael; Higuera, Gladys; Chang, Lissette Raquel; Gomez, Beatriz; Bares, Juan; Motta, Jorge

    2015-06-01

    Cancer is one of the leading causes of death worldwide and its incidence is expected to increase in the future. In Panama, cancer is also one of the leading causes of death. In 1964, a nationwide cancer registry was started; it was restructured and improved in 2012. The aim of this study is to use Joinpoint regression analysis to study trends in the incidence and mortality of cancer in Panama over the last decade. Cancer mortality was estimated from the Panamanian National Institute of Census and Statistics Registry for the period 2001 to 2011. Cancer incidence was estimated from the Panamanian National Cancer Registry for the period 2000 to 2009. The Joinpoint Regression Analysis program, version 4.0.4, was used to calculate trends in age-adjusted incidence and mortality rates for selected cancers. Overall, the trend of age-adjusted cancer mortality in Panama has declined over the last 10 years (-1.12% per year). The cancers for which there was a significant increase in the trend of mortality were female breast cancer and ovarian cancer, while the highest increases in incidence were shown for breast cancer, liver cancer, and prostate cancer. A significant decrease in the trend of mortality was evidenced for prostate cancer, lung and bronchus cancer, and cervical cancer; with respect to incidence, only oral and pharynx cancer in both sexes had a significant decrease. Some cancers showed no significant trends in incidence or mortality. This study reveals contrasting trends in cancer incidence and mortality in Panama in the last decade. Although Panama is considered an upper middle income nation, this study demonstrates that some cancer mortality trends, like the ones seen in cervical and lung cancer, behave similarly to the ones seen in high income countries. In contrast, other types, like breast cancer, follow a pattern seen in countries undergoing a transition to a developed economy with its associated lifestyle, nutrition, and body weight
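
    Several records in this list report trends as an annual percent change (APC) estimated by joinpoint (piecewise log-linear) regression. As a rough illustration of the underlying arithmetic only, using made-up numbers rather than data from any study above, a single segment's APC can be recovered from the slope of a log-linear fit:

```python
import math

# Illustrative synthetic data (not from any record above): a rate that
# declines by exactly 2% per year, observed over 2000-2010.
years = list(range(2000, 2011))
rates = [50.0 * 0.98 ** (y - 2000) for y in years]

# Within one joinpoint segment, the model is ln(rate) = a + b * year.
# Ordinary least squares slope: b = cov(year, ln rate) / var(year).
log_rates = [math.log(r) for r in rates]
mean_x = sum(years) / len(years)
mean_y = sum(log_rates) / len(log_rates)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates)) / \
    sum((x - mean_x) ** 2 for x in years)

# Annual percent change for the segment: APC = 100 * (exp(b) - 1).
apc = 100.0 * (math.exp(b) - 1.0)
print(round(apc, 1))  # -2.0
```

    The Joinpoint software additionally searches for the change points between such segments and tests each slope against zero; the sketch above covers only the within-segment APC arithmetic.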

  3. Selecting the Final Model — Joinpoint Help System 4.4.0.0

    Cancer.gov

    Why doesn't the joinpoint program give me the best possible fit? I can see other models with more joinpoints that would fit better. Exactly how does the program decide which tests to perform and which joinpoint model is the final model?

  4. Analysis of geographical disparities in temporal trends of health outcomes using space-time joinpoint regression

    NASA Astrophysics Data System (ADS)

    Goovaerts, Pierre

    2013-06-01

    Analyzing temporal trends in health outcomes can provide a more comprehensive picture of the burden of a disease like cancer and generate new insights about the impact of various interventions. In the United States such an analysis is increasingly conducted using joinpoint regression outside a spatial framework, which overlooks the existence of significant variation among U.S. counties and states with regard to the incidence of cancer. This paper presents several innovative ways to account for space in joinpoint regression: (1) prior filtering of noise in the data by binomial kriging and use of the kriging variance as a measure of reliability in weighted least-square regression, (2) detection of significant boundaries between adjacent counties based on tests of parallelism of time trends and confidence intervals of the annual percent change of rates, and (3) creation of spatially compact groups of counties with similar temporal trends through the application of hierarchical cluster analysis to the results of boundary analysis. The approach is illustrated using time series of proportions of prostate cancer late-stage cases diagnosed yearly in every county of Florida since the 1980s. The annual percent change (APC) in late-stage diagnosis and the onset years for significant declines vary greatly across Florida. Most counties with non-significant average APC are located in the north-western part of Florida, known as the Panhandle, which is more rural than other parts of Florida. The number of significant boundaries peaked in the early 1990s when the prostate-specific antigen (PSA) test became widely available, a temporal trend that suggests the existence of geographical disparities in the implementation and/or impact of the new screening procedure, in particular as it became available.

  5. Mortality trends among Japanese dialysis patients, 1988-2013: a joinpoint regression analysis.

    PubMed

    Wakasugi, Minako; Kazama, Junichiro James; Narita, Ichiei

    2016-09-01

    Evaluation of mortality trends in dialysis patients is important for improving their prognoses. The present study aimed to examine temporal trends in deaths (all-cause, cardiovascular, noncardiovascular and the five leading causes) among Japanese dialysis patients. Mortality data were extracted from the Japanese Society of Dialysis Therapy registry. Age-standardized mortality rates were calculated by direct standardization against the 2013 dialysis population. The average annual percentage of change (APC) and the corresponding 95% confidence interval (CI) were computed for trends using joinpoint regression analysis. A total of 469 324 deaths occurred, of which 25.9% were from cardiac failure, 17.5% from infectious disease, 10.2% from cerebrovascular disorders, 8.6% from malignant tumors and 5.6% from cardiac infarction. The joinpoint trend for all-cause mortality decreased significantly, by -3.7% (95% CI -4.2 to -3.2) per year from 1988 through 2000, then decreased more gradually, by -1.4% (95% CI -1.7 to -1.2) per year during 2000-13. The improved mortality rates were mainly due to decreased deaths from cardiovascular disease, with mortality rates due to noncardiovascular disease outnumbering those of cardiovascular disease in the last decade. Among the top five causes of death, cardiac failure has shown a marked decrease in mortality rate. However, the rates due to infectious disease have remained stable during the study period [APC 0.1 (95% CI -0.2-0.3)]. Significant progress has been made, particularly with regard to the decrease in age-standardized mortality rates. The risk of cardiovascular death has decreased, while the risk of death from infection has remained unchanged for 25 years. © The Author 2016. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
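
    Records 5 and others report age-standardized mortality rates calculated by direct standardization against a fixed standard population. A minimal sketch of that calculation with made-up numbers (three hypothetical age bands, not values from any record above):

```python
# Direct standardization: weight each age-specific rate by the standard
# population's share of that age group. All numbers are illustrative.
age_specific_rates = [5.0, 20.0, 80.0]          # deaths per 100,000 per age band
standard_population = [60_000, 30_000, 10_000]  # standard population counts

total = sum(standard_population)
weights = [p / total for p in standard_population]

# Age-standardized rate = sum over age bands of rate_i * weight_i.
asr = sum(r * w for r, w in zip(age_specific_rates, weights))
print(asr)  # 17.0
```

    Because the weights come from a fixed standard (here hypothetical; record 5 uses the 2013 dialysis population, record 8 the World Standard Population), rates from different years or populations become comparable despite differing age structures.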

  6. Hysterectomy trends in Australia, 2000-2001 to 2013-2014: joinpoint regression analysis.

    PubMed

    Wilson, Louise F; Pandeya, Nirmala; Mishra, Gita D

    2017-10-01

    Hysterectomy is a common gynecological procedure, particularly in middle and high income countries. The aim of this paper was to describe and examine hysterectomy trends in Australia from 2000-2001 to 2013-2014. For women aged 25 years and over, data on the number of hysterectomies performed in Australia annually were sourced from the National Hospital and Morbidity Database. Age-specific and age-standardized hysterectomy rates per 10 000 women were estimated with adjustment for hysterectomy prevalence in the population. Using joinpoint regression analysis, we estimated the average annual percentage change over the whole study period (2000-2014) and the annual percentage change for each identified trend line segment. A total of 431 162 hysterectomy procedures were performed between 2000-2001 and 2013-2014; an annual average of 30 797 procedures (for women aged 25+ years). The age-standardized hysterectomy rate, adjusted for underlying hysterectomy prevalence, decreased significantly over the whole study period [average annual percentage change -2.8%; 95% confidence interval (CI) -3.5%, -2.2%]. The trend was not linear with one joinpoint detected in 2008-2009. Between 2000-2001 and 2008-2009 there was a significant decrease in incidence (annual percentage change -4.4%; 95% CI -5.2%, -3.7%); from 2008-2009 to 2013-2014 the decrease was minimal and not significantly different from zero (annual percentage change -0.1%; 95% CI -1.7%, 1.5%). A similar change in trend was seen in all age groups. Hysterectomy rates in Australian women aged 25 years and over have declined in the first decade of the 21st century. However, in the last 5 years, rates appear to have stabilized. © 2017 Nordic Federation of Societies of Obstetrics and Gynecology.

  7. Scientific Productivity on Research in Ethical Issues over the Past Half Century: A JoinPoint Regression Analysis.

    PubMed

    Long, Nguyen Phuoc; Huy, Nguyen Tien; Trang, Nguyen Thi Huyen; Luan, Nguyen Thien; Anh, Nguyen Hoang; Nghi, Tran Diem; Hieu, Mai Van; Hirayama, Kenji; Karbwang, Juntra

    2014-09-01

    Ethics is one of the main pillars in the development of science. We performed a JoinPoint regression analysis to analyze the trends of ethical issue research over the past half century. The question is whether ethical issues are neglected despite their importance in modern research. PubMed electronic library was used to retrieve publications of all fields and ethical issues. JoinPoint regression analysis was used to identify the significant time trends of publications of all fields and ethical issues, as well as the proportion of publications on ethical issues to all fields over the past half century. Annual percent changes (APC) were computed with their 95% confidence intervals, and a p-value < 0.05 was considered statistically significant. We found that publications of ethical issues increased during the period of 1965-1996 but slightly fell in recent years (from 1996 to 2013). When comparing the absolute number of ethics related articles (APEI) to all publications of all fields (APAF) on PubMed, the results showed that the proportion of APEI to APAF statistically increased during the periods of 1965-1974, 1974-1986, and 1986-1993, with APCs of 11.0, 2.1, and 8.8, respectively. However, the trend has gradually dropped since 1993 and shown a marked decrease from 2002 to 2013 with an annual percent change of -7.4%. Scientific productivity in ethical issues research on over the past half century rapidly increased during the first 30-year period but has recently been in decline. Since ethics is an important aspect of scientific research, we suggest that greater attention is needed in order to emphasize the role of ethics in modern research.

  8. Prostate cancer mortality in Serbia, 1991-2010: a joinpoint regression analysis.

    PubMed

    Ilic, Milena; Ilic, Irena

    2016-06-01

    The aim of this descriptive epidemiological study was to analyze the mortality trend of prostate cancer in Serbia (excluding Kosovo and Metohia) from 1991 to 2010. The age-standardized prostate cancer mortality rates (per 100 000) were calculated by direct standardization, using the World Standard Population. The average annual percentage of change (AAPC) and the corresponding 95% confidence interval (CI) were computed for the trend using joinpoint regression analysis. A significantly increased trend in prostate cancer mortality was recorded in Serbia continuously from 1991 to 2010 (AAPC = +2.2, 95% CI = 1.6-2.9). Mortality rates for prostate cancer showed a significant upward trend in all men aged 50 and over: the AAPC (95% CI) was +1.9% (0.1-3.8) in those aged 50-59 years, +1.7% (0.9-2.6) in those aged 60-69 years, +2.0% (1.2-2.9) in those aged 70-79 years and +3.5% (2.4-4.6) in those aged 80 years and over. According to the comparability test, prostate cancer mortality trends in the majority of age groups were parallel (final selected model failed to reject parallelism, P > 0.05). The increasing prostate cancer mortality trend implies the need for more effective measures of prevention, screening and early diagnosis, as well as prostate cancer treatment in Serbia. © The Author 2015. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
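
    The average annual percentage of change (AAPC) used in this and several other records summarizes an entire joinpoint fit by weighting each segment's log-linear slope by the number of years the segment spans. A sketch of that arithmetic with illustrative slopes and segment lengths (not values from any study listed here):

```python
import math

# Hypothetical joinpoint fit: one joinpoint splitting the series into two
# segments, +3%/yr for 8 years followed by -1%/yr for 12 years.
segment_slopes = [math.log(1.03), math.log(0.99)]  # log-linear slopes b_i
segment_years = [8, 12]                            # years covered by each segment

# AAPC = 100 * (exp(sum_i w_i * b_i) - 1), with w_i the fraction of the
# study period that segment i covers.
total_years = sum(segment_years)
weighted_slope = sum(b * n for b, n in zip(segment_slopes, segment_years)) / total_years
aapc = 100.0 * (math.exp(weighted_slope) - 1.0)
print(round(aapc, 2))  # 0.58
```

    With no joinpoints the AAPC reduces to the single segment's APC, which is why records with one continuous trend report only one figure.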

  9. Diabetes mortality in Serbia, 1991-2015 (a nationwide study): A joinpoint regression analysis.

    PubMed

    Ilic, Milena; Ilic, Irena

    2017-02-01

    The aim of this study was to analyze the mortality trends of diabetes mellitus in Serbia (excluding the Autonomous Province of Kosovo and Metohia). A population-based cross-sectional study analyzing diabetes mortality in Serbia in the period 1991-2015 was carried out based on official data. The age-standardized mortality rates (per 100,000) were calculated by direct standardization, using the European Standard Population. The average annual percentage of change (AAPC) and the corresponding 95% confidence interval (CI) were computed using joinpoint regression analysis. More than 63,000 diabetes deaths (about 27,000 in men and 36,000 in women) occurred in Serbia from 1991 to 2015. Death rates from diabetes were almost equal in men and in women (about 24.0 per 100,000), which places Serbia among the countries with the highest diabetes mortality rates in Europe. Since 1991, mortality from diabetes has increased significantly in men, by +1.2% per year (95% CI 0.7-1.7), but non-significantly in women, by +0.2% per year (95% CI -0.4 to 0.7). Increasing trends in type 1 diabetes mortality rates were significant in both genders in Serbia. Trends in mortality from type 2 diabetes showed a significant decrease in both genders since 2010. Given that diabetes mortality trends showed different patterns during the studied period, our results imply that further observation of the trend is needed. Copyright © 2016 Primary Care Diabetes Europe. Published by Elsevier Ltd. All rights reserved.

  10. Temporal trends in motor vehicle fatalities in the United States, 1968 to 2010 - a joinpoint regression analysis.

    PubMed

    Bandi, Priti; Silver, Diana; Mijanovich, Tod; Macinko, James

    2015-12-01

    In the past 40 years, a variety of factors might have impacted motor vehicle (MV) fatality trends in the US, including public health policies, engineering innovations, trauma care improvements, etc. These factors varied in their timing across states/localities, and many were targeted at particular population subgroups. In order to identify and quantify differential rates of change over time and differences in trend patterns between population subgroups, this study employed a novel analytic method to assess temporal trends in MV fatalities between 1968 and 2010, by age group and sex. Cause-specific MV fatality data from traffic injuries between 1968 and 2010, based on death certificates filed in the 50 states, and DC were obtained from Centers for Disease Control and Prevention Wide-ranging Online Data for Epidemiologic Research (CDC WONDER). Long-term (1968 to 2010) and short-term (log-linear piecewise segments) trends in fatality rates were compared for males and females overall and in four separate age groups using joinpoint regression. MV fatalities declined on average by 2.4% per year in males and 2.2% per year in females between 1968 and 2010, with significant declines observed in all age groups and in both sexes. In males overall and those 25 to 64 years, sharp declines between 1968 and mid-to-late 1990s were followed by a stalling until the mid-2000s, but rates in females experienced a long-term steady decline of a lesser magnitude than males during this time. Trends in those aged <1 to 14 years and 15 to 24 years were mostly steady over time, but males had a larger decline than females in the latter age group between 1968 and the mid-2000s. In ages 65+, short-term trends were similar between sexes. Despite significant long-term declines in MV fatalities, the application of Joinpoint Regression found that progress in young adult and middle-aged adult males stalled in recent decades and rates in males declined relatively more than in females in certain age

  11. Geographical, temporal and racial disparities in late-stage prostate cancer incidence across Florida: A multiscale joinpoint regression analysis

    PubMed Central

    2011-01-01

    Background Although prostate cancer-related incidence and mortality have declined recently, striking racial/ethnic differences persist in the United States. Visualizing and modelling temporal trends of prostate cancer late-stage incidence, and how they vary according to geographic locations and race, should help explaining such disparities. Joinpoint regression is increasingly used to identify the timing and extent of changes in time series of health outcomes. Yet, most analyses of temporal trends are aspatial and conducted at the national level or for a single cancer registry. Methods Time series (1981-2007) of annual proportions of prostate cancer late-stage cases were analyzed for non-Hispanic Whites and non-Hispanic Blacks in each county of Florida. Noise in the data was first filtered by binomial kriging and results were modelled using joinpoint regression. A similar analysis was also conducted at the state level and for groups of metropolitan and non-metropolitan counties. Significant racial differences were detected using tests of parallelism and coincidence of time trends. A new disparity statistic was introduced to measure spatial and temporal changes in the frequency of racial disparities. Results State-level percentage of late-stage diagnosis decreased 50% since 1981; a decline that accelerated in the 90's when Prostate Specific Antigen (PSA) screening was introduced. Analysis at the metropolitan and non-metropolitan levels revealed that the frequency of late-stage diagnosis increased recently in urban areas, and this trend was significant for white males. The annual rate of decrease in late-stage diagnosis and the onset years for significant declines varied greatly among counties and racial groups. Most counties with non-significant average annual percent change (AAPC) were located in the Florida Panhandle for white males, whereas they clustered in South-eastern Florida for black males. The new disparity statistic indicated that the spatial extent of

  12. Geographical, temporal and racial disparities in late-stage prostate cancer incidence across Florida: a multiscale joinpoint regression analysis.

    PubMed

    Goovaerts, Pierre; Xiao, Hong

    2011-12-05

    Although prostate cancer-related incidence and mortality have declined recently, striking racial/ethnic differences persist in the United States. Visualizing and modelling temporal trends of prostate cancer late-stage incidence, and how they vary according to geographic locations and race, should help explaining such disparities. Joinpoint regression is increasingly used to identify the timing and extent of changes in time series of health outcomes. Yet, most analyses of temporal trends are aspatial and conducted at the national level or for a single cancer registry. Time series (1981-2007) of annual proportions of prostate cancer late-stage cases were analyzed for non-Hispanic Whites and non-Hispanic Blacks in each county of Florida. Noise in the data was first filtered by binomial kriging and results were modelled using joinpoint regression. A similar analysis was also conducted at the state level and for groups of metropolitan and non-metropolitan counties. Significant racial differences were detected using tests of parallelism and coincidence of time trends. A new disparity statistic was introduced to measure spatial and temporal changes in the frequency of racial disparities. State-level percentage of late-stage diagnosis decreased 50% since 1981; a decline that accelerated in the 90's when Prostate Specific Antigen (PSA) screening was introduced. Analysis at the metropolitan and non-metropolitan levels revealed that the frequency of late-stage diagnosis increased recently in urban areas, and this trend was significant for white males. The annual rate of decrease in late-stage diagnosis and the onset years for significant declines varied greatly among counties and racial groups. Most counties with non-significant average annual percent change (AAPC) were located in the Florida Panhandle for white males, whereas they clustered in South-eastern Florida for black males. The new disparity statistic indicated that the spatial extent of racial disparities reached a

  13. Trends in Unintentional Fall-Related Traumatic Brain Injury Death Rates in Older Adults in the United States, 1980-2010: A Joinpoint Analysis.

    PubMed

    Sung, Kuan-Chin; Liang, Fu-Wen; Cheng, Tain-Junn; Lu, Tsung-Hsueh; Kawachi, Ichiro

    2015-07-15

    The unintentional fall-related traumatic brain injury (TBI) death rate is high among older adults in the United States, but little is known regarding trends in these death rates. We sought to examine unintentional fall-related TBI death rates by age and sex in older adults from 1980 through 2010 in the United States. We used multiple-cause mortality data from 1980 through 2010 (31 years of data) to identify fall-related TBI deaths. Using a joinpoint regression program, we determined the joinpoints (years at which trends change significantly) and annual percentage changes (APCs) in mortality trends. The fall-related TBI death rates (deaths per 100,000 population) in older adults ages 65-74, 75-84, and 85 years and above were 2.7, 9.2, and 21.5 for females and 8.5, 18.2, and 40.8 for males, respectively, in 1980. The rates were about the same in 1992, yet increased markedly to 5.9, 23.4, and 68.9 for females and 11.6, 41.2, and 112.4 for males, respectively, in 2010. For all males 65 years of age and above, we found the first joinpoint in 1992, when the APC for 1980 through 1992, -0.8%, changed to 6.2% for 1992-2005. The second joinpoint occurred in 2005, when the APC decreased to 3.7% for 2005-2010. For all females 65 years of age and above, the first joinpoint was in 1993, when the APC for 1980 through 1993, -0.2%, changed to 7.6% from 1993 to 2005. The second joinpoint occurred in 2005, when the APC decreased to 3.8% for 2005-2010. This descriptive epidemiological study suggests increasing fall-related TBI death rates from 1992 to 2005 and then a slowdown of the increasing trends between 2005 and 2010. Continued monitoring of fall-related TBI death rate trends is needed to determine the burden of this public health problem among older adults in the United States.

  14. Consecutive Non-Significant Segments — Joinpoint Help System 4.4.0.0

    Cancer.gov

    Sometimes, the APC for one segment is significantly different from zero, but when an extra joinpoint within the segment is determined by the Joinpoint software, neither of the APCs for the two resulting consecutive segments is significant. Why?

  15. Malignant Lymphatic and Hematopoietic Neoplasms Mortality in Serbia, 1991–2010: A Joinpoint Regression Analysis

    PubMed Central

    Ilic, Milena; Ilic, Irena

    2014-01-01

    Background Limited data on mortality from malignant lymphatic and hematopoietic neoplasms have been published for Serbia. Methods The study covered the population of Serbia during the 1991–2010 period. Mortality trends were assessed using joinpoint regression analysis. Results The trend in overall death rates from malignant lymphoid and haematopoietic neoplasms first decreased significantly, by −2.16% per year from 1991 through 1998, and then increased significantly, by +2.20% per year for the 1998–2010 period. The growth over the entire period averaged +0.8% per year (95% CI 0.3 to 1.3). Mortality was higher among males than among females in all age groups. According to the comparability test, mortality trends from malignant lymphoid and haematopoietic neoplasms in men and women were parallel (final selected model failed to reject parallelism, P = 0.232). Among the younger Serbian population (0–44 years old), trends significantly declined in males over the entire period, while in females 15–44 years of age mortality rates significantly declined only from 2003 onwards. Mortality trends significantly increased in the elderly of both genders (by +1.7% in males and +1.5% in females in the 60–69 age group, and +3.8% in males and +3.6% in females in the 70+ age group). According to the comparability test, the mortality trend for Hodgkin's lymphoma differed significantly from the mortality trends for all other types of malignant lymphoid and haematopoietic neoplasms (P<0.05). Conclusion The unfavourable mortality trend in Serbia requires targeted interventions for risk factor control, early diagnosis and modern therapy. PMID:25333862

  16. Colorectal cancer mortality trends in Serbia during 1991-2010: an age-period-cohort analysis and a joinpoint regression analysis.

    PubMed

    Ilic, Milena; Ilic, Irena

    2016-06-22

    For both men and women worldwide, colorectal cancer is among the leading causes of cancer-related death. This study aimed to assess the mortality trends of colorectal cancer in Serbia between 1991 and 2010, prior to the introduction of population-based screening. Joinpoint regression analysis was used to estimate the average annual percent change (AAPC) with the corresponding 95% confidence interval (CI). Furthermore, age-period-cohort analysis was performed to examine the effects of birth cohort and calendar period on the observed temporal trends. We observed a significantly increased trend in colorectal cancer mortality in Serbia during the study period (AAPC = 1.6%, 95% CI 1.3%-1.8%). Colorectal cancer showed an increased mortality trend in both men (AAPC = 2.0%, 95% CI 1.7%-2.2%) and women (AAPC = 1.0%, 95% CI 0.6%-1.4%). The temporal trend of colorectal cancer mortality was significantly affected by birth cohort (P < 0.05), whereas the study period did not significantly affect the trend (P = 0.072). Colorectal cancer mortality increased for the first several birth cohorts in Serbia (from 1916 to 1955), followed by a downward inflection for people born after the 1960s. According to the comparability test, overall mortality trends for colon cancer and for rectal and anal cancer were not parallel (the final selected model rejected parallelism, P < 0.05). We found that colorectal cancer mortality in Serbia increased considerably over the past two decades. Mortality increased particularly in men, but the trends differed according to age group and subsite. In Serbia, interventions to reduce the colorectal cancer burden, especially the implementation of a national screening program, as well as treatment improvements and measures to encourage the adoption of a healthy lifestyle, are needed.

  17. Long-term trends of suicide by choice of method in Norway: a joinpoint regression analysis of data from 1969 to 2012.

    PubMed

    Puzo, Quirino; Qin, Ping; Mehlum, Lars

    2016-03-11

    Suicide mortality and the rates by specific methods in a population may change over time in response to concurrent changes in relevant factors in society. This study aimed to identify significant changing points in method-specific suicide mortality from 1969 to 2012 in Norway. Data on suicide mortality by specific methods and by sex and age were retrieved from the Norwegian Cause-of-Death Register. Long-term trends in age-standardized rates of suicide mortality were analyzed using joinpoint regression analysis. The most frequently used suicide method in the total population was hanging, followed by poisoning and firearms. Men chose suicide by firearms more often than women, whereas poisoning and drowning were more frequently used by women. The joinpoint analysis revealed that the overall trend of suicide mortality changed significantly twice over the period 1969 to 2012 for both sexes. The male age-standardized suicide rate increased by 3.1% per year until 1989, and decreased by 1.2% per year between 1994 and 2012. Among females the long-term suicide rate increased by 4.0% per year until 1988, decreased by 5.5% through 1995, and then stabilized. Both sexes experienced an upward trend for suicide by hanging during the 44-year observation period, with a particularly significant increase in 15-24 year old males. The most distinct change among men was seen for firearms after 1988, with a significant decrease through 2012 of around 5% per year. For women, significant reductions since 1985-88 were observed for suicide by drowning and poisoning. The present study demonstrates different time trends for different suicide methods, with significant reductions in suicide by firearms, drowning and poisoning after the peak in the suicide rate in the late 1980s. Suicide by means of hanging continuously increased, but did not fully compensate for the reduced use of other methods. This lends some support to the effectiveness of method-specific suicide preventive measures

  18. Research Trends in Evidence-Based Medicine: A Joinpoint Regression Analysis of More than 50 Years of Publication Data

    PubMed Central

    Hung, Bui The; Long, Nguyen Phuoc; Hung, Le Phi; Luan, Nguyen Thien; Anh, Nguyen Hoang; Nghi, Tran Diem; Van Hieu, Mai; Trang, Nguyen Thi Huyen; Rafidinarivo, Herizo Fabien; Anh, Nguyen Ky; Hawkes, David; Huy, Nguyen Tien; Hirayama, Kenji

    2015-01-01

    Background Evidence-based medicine (EBM) has developed as the dominant paradigm of assessment of evidence that is used in clinical practice. Since its development, EBM has been applied to integrate the best available research into diagnosis and treatment with the purpose of improving patient care. In the EBM era, a hierarchy of evidence has been proposed, including various types of research methods, such as meta-analysis (MA), systematic review (SRV), randomized controlled trial (RCT), case report (CR), practice guideline (PGL), and so on. Although there are numerous studies examining the impact and importance of specific cases of EBM in clinical practice, there is a lack of research quantitatively measuring publication trends in the growth and development of EBM. Therefore, a bibliometric analysis was conducted to determine the scientific productivity of EBM research over decades. Methods The NCBI PubMed database was used to search, retrieve and classify publications according to research method and year of publication. Joinpoint regression analysis was undertaken to analyze trends in research productivity and the prevalence of individual research methods. Findings The analysis indicates that MA and SRV, which are classified as the highest ranking of evidence in EBM, accounted for a relatively small but auspicious number of publications. For most research methods, the annual percent change (APC) indicates a consistent increase in publication frequency. MA, SRV and RCT show the highest rate of publication growth in the past twenty years. Only controlled clinical trials (CCT) show a non-significant reduction in publications over the past ten years. Conclusions Higher quality research methods, such as MA, SRV and RCT, are showing continuous publication growth, which suggests an acknowledgement of the value of these methods. This study provides the first quantitative assessment of research method publication trends in EBM. PMID:25849641

  19. Trends in Lung Cancer Incidence in Delhi, India 1988-2012: Age-Period-Cohort and Joinpoint Analyses

    PubMed

    Malhotra, Rajeev Kumar; Manoharan, Nalliah; Nair, Omana; Deo, Suryanarayana; Rath, Goura Kishor

    2018-06-25

    Introduction: Lung cancer (LC) has been one of the most commonly diagnosed cancers worldwide, both in terms of new cases and mortality. Exponential growth of economic and industrial activities in recent decades in the Delhi urban area may have increased the incidence of LC. The primary objective of this study was to evaluate the time trend according to gender. Method: LC incidence data over 25 years were obtained from the population-based urban Delhi cancer registry. Joinpoint regression analysis was applied to evaluate the time trend of age-standardized incidence rates. The age-period-cohort (APC) model was employed using a Poisson distribution with a log link function and the intrinsic estimator (IE) method. Results: During the 25 years, 13,489 male and 3,259 female LC cases were registered, accounting for 9.78% of male and 2.53% of female total cancer cases. Joinpoint regression analysis revealed that LC incidence in males continued to increase during the entire period, with a sharp acceleration observed starting from 2009. In females the LC incidence rate remained at a plateau during 1988-2002 and increased thereafter. The cumulative risks for 1988-2012 were 1.79% in males and 0.45% in females. The full APC (IE) model showed the best fit for an age-period-cohort effect on LC incidence, with incidence increasing significantly with age and peaking at 70-74 years in males and 65-69 years in females. A rising period effect was observed after adjusting for age and cohort effects in both genders, and a declining cohort effect was identified after controlling for age and period effects. Conclusion: The incidence of LC in urban Delhi showed an increasing trend from 1988-2012. Known measures such as environmental conservation, tobacco control, physical activity awareness and medical security should be implemented more vigorously over the long term in our population.

  20. Jump Model / Comparability Ratio Model — Joinpoint Help System 4.4.0.0

    Cancer.gov

    The Jump Model / Comparability Ratio Model in the Joinpoint software provides a direct estimation of trend data (e.g. cancer rates) where there is a systematic scale change, which causes a “jump” in the rates, but is assumed not to affect the underlying trend.
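    Behind both the standard model and the jump model is a piecewise-linear (joinpoint) fit. A toy sketch of a one-joinpoint fit by grid search over the break year (synthetic data; the real software selects the number of joinpoints with permutation or BIC-based tests, and in the jump model additionally estimates the scale change at a known year):

```python
import numpy as np

# One-joinpoint piecewise-linear fit by grid search over the break year.
# Toy illustration of the joinpoint idea; data are synthetic with a true
# slope change at 2008 plus a little noise.
x = np.arange(2000, 2016, dtype=float)
y = np.where(x < 2008, 1.0 + 0.5 * (x - 2000), 5.0 - 0.3 * (x - 2008))
y = y + np.random.default_rng(0).normal(0, 0.05, x.size)

best = None
for tau in x[2:-2]:                          # interior candidate joinpoints
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - tau, 0.0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    if best is None or sse < best[0]:
        best = (sse, tau, beta)

sse, tau, beta = best
print(int(tau))                              # 2008 for this synthetic series
```

    The hinge term `max(x - tau, 0)` is what makes the two segments meet at the joinpoint; a jump model would add a separate step indicator at the scale-change year.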

  1. Ischaemic heart disease mortality in Serbia, 1991-2013; a joinpoint analysis

    PubMed Central

    Ilic, Milena; Ilic, Irena

    2017-01-01

    Background & objectives: Ischaemic heart disease (IHD) has been one of the leading causes of mortality in the world. In many European countries the mortality rates due to IHD have been rising rapidly. This study aimed to assess the IHD mortality trend in Serbia. Methods: A population-based cross-sectional study analyzing IHD mortality in Serbia in the period 1991-2013 was carried out based on official data. The age-standardized rates (ASRs, per 100,000) were calculated using the direct method, according to the European standard population. Joinpoint analysis was used to estimate the average annual percentage change (AAPC) with the corresponding 95 per cent confidence interval (CI). Results: More than 253,000 people (143,420 men and 110,276 women) died due to IHD in Serbia during the observed period, and most of them (over 160,000 people) were patients with myocardial infarction (MI). The average annual ASR for IHD was 113.6/100,000. There was no overall significant trend for mortality due to IHD (AAPC=+0.1%, 95% CI: −0.8 to 1.0), but there was one joinpoint: the trend significantly increased by 2.3 per cent per year from 1991 to 2006 and then significantly decreased by 6.4 per cent per year from 2006 onwards. Significantly decreased mortality trends for MI in both genders were observed: according to the comparability test, mortality trends in men and women were parallel (the final selected model failed to reject parallelism, P=0.0567). Interpretation & conclusions: No significant trend for mortality due to IHD was observed in Serbia during the study period. The substantial decline of mortality from IHD seen in most developed countries during the past decades was not observed in Serbia. Further efforts are required to reduce mortality from IHD in the Serbian population. PMID:29664033
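    The AAPC quoted in records like this one is defined (in the Joinpoint software) as a length-weighted average of the segment slopes on the log scale. A small sketch under that definition; the two segment lengths and per-year percent changes below are illustrative, not the paper's estimates:

```python
import numpy as np

# AAPC over a period with one joinpoint: weight each segment's log-scale
# slope by the number of years it spans, then transform back. Segment
# values here are made up for illustration.
segments = [
    (10, np.log(1.03)),    # 10 years at +3% per year (log-scale slope)
    (5,  np.log(0.98)),    #  5 years at -2% per year
]
weights = np.array([w for w, _ in segments], dtype=float)
slopes = np.array([b for _, b in segments])

aapc = 100.0 * (np.exp(np.sum(weights * slopes) / weights.sum()) - 1.0)
print(round(aapc, 2))      # ~1.31% per year overall
```

    This is why a period can have a near-zero AAPC even when its individual segments rise and fall sharply, as in the Serbian IHD trend above.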

  2. Pancreatic cancer mortality in Serbia from 1991-2010 – a joinpoint analysis

    PubMed Central

    Ilić, Milena; Vlajinac, Hristina; Marinković, Jelena; Kocev, Nikola

    2013-01-01

    Aim To analyze the trends of pancreatic cancer mortality in Serbia. Methods The study covered the population of Serbia in the period 1991 to 2010. Mortality trends were assessed by joinpoint regression analysis by age and sex. Results Age-standardized mortality rates ranged from 5.93 to 8.57 per 100 000 in men and from 3.51 to 5.79 per 100 000 in women. Pancreatic cancer mortality in all age groups was higher among men than among women. It increased continuously from 1991, by 1.6% (95% confidence interval [CI] 1.1 to 2.0) yearly in men and by 2.2% (95% CI 1.7 to 2.7) yearly in women. Changes in mortality were not significant in younger age groups for either sex. In older men (≥55 years), mortality was increasing, although in the age groups 70-74 and 80-84 the increase was not significant. In men aged 65-69 years, the increase in mortality was significant only in the period 2004 to 2010. In women aged ≥50 years, mortality significantly increased from 1991 onward. In women aged 75-79 years, a non-significant decrease in the period 1991 to 2000 was followed by a significant increase from 2000 to 2010. Conclusion Serbia is one of the countries with the highest pancreatic cancer mortality in the world, with an increasing mortality trend in both sexes and in most age groups. PMID:23986278

  3. RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,

    DTIC Science & Technology

    This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variables, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)

  4. Ridge: a computer program for calculating ridge regression estimates

    Treesearch

    Donald E. Hilt; Donald W. Seegrist

    1977-01-01

    Least-squares coefficients for multiple-regression models may be unstable when the independent variables are highly correlated. Ridge regression is a biased estimation procedure that produces stable estimates of the coefficients. Ridge regression is discussed, and a computer program for calculating the ridge coefficients is presented.
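    The ridge coefficients such a program computes have a simple closed form, beta(k) = (X'X + kI)⁻¹X'y, usually applied to standardized predictors. A brief numpy sketch with deliberately collinear synthetic data:

```python
import numpy as np

# Ridge estimates beta(k) = (X'X + k I)^{-1} X'y on standardized predictors.
# x2 is nearly collinear with x1, so OLS (k = 0) coefficients are unstable;
# increasing k shrinks them toward a stable shared value. Data are synthetic.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)    # almost a copy of x1
y = 2 * x1 + 3 * x2 + rng.normal(scale=0.1, size=50)

X = np.column_stack([x1, x2])
X = (X - X.mean(0)) / X.std(0)               # standardize
yc = y - y.mean()

for k in (0.0, 0.1, 1.0):
    beta = np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ yc)
    print(k, np.round(beta, 2))
```

    At k = 0 the two coefficients can be wildly different in sign and size; by k = 1 they settle near a common value whose sum still captures the joint effect.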

  5. Two SPSS programs for interpreting multiple regression results.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo

    2010-02-01

    When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental.

  6. SCI model structure determination program (OSR) user's guide. [optimal subset regression

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The computer program OSR (Optimal Subset Regression), which estimates models for rotorcraft body and rotor force and moment coefficients, is described. The technique used is based on the subset regression algorithm. Given time histories of aerodynamic coefficients, aerodynamic variables, and control inputs, the program computes correlations between various time histories. The model structure determination is based on these correlations. Inputs and outputs of the program are given.

  7. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with an optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transform selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing: robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  8. MULGRES: a computer program for stepwise multiple regression analysis

    Treesearch

    A. Jeff Martin

    1971-01-01

    MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.
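    Stepwise deletion of the kind MULGRES implements can be sketched in a few lines: fit, drop the predictor with the weakest t statistic, refit, repeat. A numpy-only illustration on synthetic data (the cutoff |t| ≥ 2 is an assumption standing in for a proper significance-to-remove test):

```python
import numpy as np

# Backward stepwise deletion: repeatedly drop the predictor with the
# smallest |t| until every remaining |t| clears a cutoff. Synthetic data:
# columns 0 and 2 carry signal, columns 1 and 3 are pure noise.
rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 4))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)

cols = list(range(X.shape[1]))
while True:
    Xd = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    s2 = resid @ resid / (n - Xd.shape[1])              # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    t = np.abs(beta[1:] / se[1:])                       # skip the intercept
    if t.min() >= 2.0 or len(cols) == 1:
        break
    cols.pop(int(np.argmin(t)))                         # delete weakest one

print(sorted(cols))        # retained predictors
```

    On this data the signal columns survive and the noise columns are deleted, which is exactly the "search for most significant variables" the abstract describes.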

  9. Leukemia in Iran: Epidemiology and Morphology Trends.

    PubMed

    Koohi, Fatemeh; Salehiniya, Hamid; Shamlou, Reza; Eslami, Soheyla; Ghojogh, Ziyaeddin Mahery; Kor, Yones; Rafiemanesh, Hosein

    2015-01-01

    Leukemia accounts for 8% of total cancer cases and involves all age groups, with different prevalence and incidence rates in Iran and the entire world, and causes a significant death toll and heavy expenses for diagnosis and treatment processes. This study was done to evaluate the epidemiology and morphology of blood cancer during 2003-2008. This cross-sectional study was carried out based on re-analysis of the Cancer Registry Center report of the Health Deputy in Iran during a 6-year period (2003-2008). Statistical analysis of incidence time trends and morphology change percentages was performed with joinpoint regression analysis using the software Joinpoint Regression Program. During the studied years a total of 18,353 hematopoietic and reticuloendothelial system cancers were recorded. The chi-square test showed a significant difference between sex and morphological types of blood cancer (P-value<0.001). Joinpoint analysis showed a significant increasing trend in the adjusted standard incidence rate (ASIR) for both sexes (P-value<0.05). Annual percent changes (APC) for women and men were 18.7 and 19.9, respectively. The most common morphological blood cancers were ALL, ALM, MM and CLL, which accounted for 60% of total hematopoietic system cancers. Joinpoint analysis showed a significant decreasing trend for ALM in both sexes (P-value<0.05). Hematopoietic system cancers in Iran demonstrate an increasing trend in incidence rate and a decreasing trend for ALL, ALM and CLL morphology.

  10. An improved multiple linear regression and data analysis computer program package

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  11. [A SAS macro program for batch processing of univariate Cox regression analysis for large databases].

    PubMed

    Yang, Rendong; Xiong, Jie; Peng, Yangqin; Peng, Xiaoning; Zeng, Xiaomin

    2015-02-01

    To realize batch processing of univariate Cox regression analysis for large databases using a SAS macro program. We wrote a SAS macro program, which can filter, integrate, and export P values to Excel, using SAS 9.2. The program was used to screen survival-correlated RNA molecules in ovarian cancer. The SAS macro program could complete the batch processing of univariate Cox regression analysis and the selection and export of the results. The SAS macro program has potential applications in reducing the workload of statistical analysis and providing a basis for batch processing of univariate Cox regression analysis.

  12. Fitting program for linear regressions according to Mahon (1996)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trappitsch, Reto G.

    2018-01-09

    This program takes the user's input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared with the commonly used York fit, this method has the correct prescription for measurement-error propagation. This software should facilitate the proper fitting of measurements with a simple interface.
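    Mahon's prescription propagates uncertainties in both coordinates (and their correlation). As a much-simplified stand-in, a weighted straight-line fit with uncertainties in y only shows the general shape of such a fitting routine (synthetic data; this is not the Mahon or York algorithm itself):

```python
import numpy as np

# Weighted least-squares line fit with y-uncertainties only, weights 1/sy^2.
# A simplified sketch: Mahon (1996) additionally handles x-errors and
# error correlations. Data below are synthetic, roughly y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sy = np.array([0.1, 0.1, 0.2, 0.1, 0.3])

w = 1.0 / sy**2
xbar = np.sum(w * x) / np.sum(w)
ybar = np.sum(w * y) / np.sum(w)
slope = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)
intercept = ybar - slope * xbar
slope_err = np.sqrt(1.0 / np.sum(w * (x - xbar) ** 2))   # propagated slope error

print(round(slope, 2), round(intercept, 2))
```

    In the full Mahon/York treatment the weights themselves depend on the slope (because x-errors project onto y), so the solution is iterated rather than closed-form as here.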

  13. Factors Affecting Regression-Discontinuity.

    ERIC Educational Resources Information Center

    Schumacker, Randall E.

    The regression-discontinuity approach to evaluating educational programs is reviewed, and regression-discontinuity post-program mean differences under various conditions are discussed. The regression-discontinuity design is used to determine whether post-program differences exist between an experimental program and a control group. The difference…

  14. Epidemiologic programs for computers and calculators. A microcomputer program for multiple logistic regression by unconditional and conditional maximum likelihood methods.

    PubMed

    Campos-Filho, N; Franco, E L

    1989-02-01

    A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
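    The unconditional arm of such a program is Newton-Raphson maximization of the ordinary logistic log-likelihood; the conditional arm for matched sets maximizes a different (conditional) likelihood and is not shown here. A minimal numpy sketch of the unconditional fit on synthetic data:

```python
import numpy as np

# Unconditional ML logistic regression via Newton-Raphson (IRLS).
# Synthetic data with true beta = (-0.5, 1.2); exp(beta[1]) is the
# odds ratio per unit of the covariate.
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.2])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(size=n) < p).astype(float)

beta = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - mu)                        # score vector
    hess = X.T @ (X * (mu * (1 - mu))[:, None])  # observed information
    step = np.linalg.solve(hess, grad)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:
        break

print(np.round(beta, 2))        # near (-0.5, 1.2)
odds_ratio = np.exp(beta[1])
```

    Conditional ML for 1:m matched sets replaces this likelihood with a product over sets of within-set probabilities, which is why a program offering both methods saves the analyst from switching tools.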

  15. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    NASA Astrophysics Data System (ADS)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large, well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate the parameters of multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the Maximum Likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.

  16. A mathematical programming method for formulating a fuzzy regression model based on distance criterion.

    PubMed

    Chen, Liang-Hsuan; Hsueh, Chan-Ching

    2007-06-01

    Fuzzy regression models are useful to investigate the relationship between explanatory and response variables with fuzzy observations. Different from previous studies, this correspondence proposes a mathematical programming method to construct a fuzzy regression model based on a distance criterion. The objective of the mathematical programming is to minimize the sum of distances between the estimated and observed responses on the X axis, such that the fuzzy regression model constructed has the minimal total estimation error in distance. Only several alpha-cuts of fuzzy observations are needed as inputs to the mathematical programming model; therefore, the applications are not restricted to triangular fuzzy numbers. Three examples, adopted in the previous studies, and a larger example, modified from the crisp case, are used to illustrate the performance of the proposed approach. The results indicate that the proposed model has better performance than those in the previous studies based on either distance criterion or Kim and Bishu's criterion. In addition, the efficiency and effectiveness for solving the larger example by the proposed model are also satisfactory.
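    How a fuzzy regression reduces to a mathematical program can be illustrated with the classic Tanaka-style possibilistic formulation. This is not the authors' distance-criterion model, but it shares the mechanics the abstract describes: linear constraints on fuzzy-coefficient centers and spreads, solved as an LP (scipy; synthetic crisp data):

```python
import numpy as np
from scipy.optimize import linprog

# Tanaka-style possibilistic regression as a linear program (illustrative
# only; NOT the paper's distance-criterion model). Coefficients are
# symmetric triangular fuzzy numbers with centers c_j (free) and spreads
# s_j >= 0; each observation must lie inside the predicted interval at
# level h. Objective: minimize total spread.
h = 0.5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
A = np.column_stack([np.ones_like(x), x])    # design matrix (here |A| = A)
m, k = A.shape

# variables z = [c_0, c_1, s_0, s_1]
obj = np.concatenate([np.zeros(k), np.abs(A).sum(axis=0)])
A_ub = np.vstack([
    np.hstack([-A, -(1 - h) * np.abs(A)]),   # A_i'c + (1-h)|A_i|'s >= y_i
    np.hstack([A, -(1 - h) * np.abs(A)]),    # A_i'c - (1-h)|A_i|'s <= y_i
])
b_ub = np.concatenate([-y, y])
bounds = [(None, None)] * k + [(0, None)] * k

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
c, s = res.x[:k], res.x[k:]
print(np.round(c, 2), np.round(s, 2))        # centers and spreads
```

    The paper's contribution is to replace this coverage-style objective with a distance criterion between estimated and observed fuzzy responses, but the resulting problem is still a mathematical program of this general shape.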

  17. Regression Verification Using Impact Summaries

    NASA Technical Reports Server (NTRS)

    Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana

    2013-01-01

    Regression verification techniques are used to prove equivalence of syntactically similar programs. Checking equivalence of large programs, however, can be computationally expensive. Existing regression verification techniques rely on abstraction and decomposition techniques to reduce the computational effort of checking equivalence of the entire program. These techniques are sound but not complete. In this work, we propose a novel approach to improve scalability of regression verification by classifying the program behaviors generated during symbolic execution as either impacted or unimpacted. Our technique uses a combination of static analysis and symbolic execution to generate summaries of impacted program behaviors. The impact summaries are then checked for equivalence using an off-the-shelf decision procedure. We prove that our approach is both sound and complete for sequential programs, with respect to the depth bound of symbolic execution. Our evaluation on a set of sequential C artifacts shows that reducing the size of the summaries can help reduce the cost of software equivalence checking. Various reduction, abstraction, and compositional techniques have been developed to help scale software verification techniques to industrial-sized systems. Although such techniques have greatly increased the size and complexity of systems that can be checked, analysis of large software systems remains costly. Regression analysis techniques, e.g., regression testing [16], regression model checking [22], and regression verification [19], restrict the scope of the analysis by leveraging the differences between program versions. These techniques are based on the idea that if code is checked early in development, then subsequent versions can be checked against a prior (checked) version, leveraging the results of the previous analysis to reduce analysis cost of the current version. Regression verification addresses the problem of proving equivalence of closely related programs.

  18. Trends in gastrointestinal cancer incidence in Iran, 2001-2010: a joinpoint analysis

    PubMed Central

    Motlagh, Ali; Karimi Jaberi, Maryam

    2016-01-01

    OBJECTIVES The main purpose of this study was to evaluate changes in the time trends of stomach, colorectal, and esophageal cancer during the past decade in Iran. METHODS Cancer incidence data for the years 2001 to 2010 were obtained from the cancer registration of the Ministry of Health. All incidence rates were directly age-standardized to the world standard population. In order to identify significant changes in time trends, we performed a joinpoint analysis. The annual percent change (APC) for each segment of the trends was then calculated. RESULTS The incidence of stomach cancer increased from 4.18 and 2.41 per 100,000 population in men and women, respectively, in 2001 to 17.06 (APC, 16.7%) and 8.85 (APC, 16.2%) per 100,000 population in 2010 for men and women, respectively. The corresponding values for colorectal cancer were 2.12 and 2.00 per 100,000 population for men and women, respectively, in 2001 and 11.28 (APC, 20.0%) and 10.33 (APC, 20.0%) per 100,000 in 2010. For esophageal cancer, the corresponding increase was from 3.25 and 2.10 per 100,000 population in 2001 to 5.57 (APC, 12.0%) and 5.62 (APC, 11.2%) per 100,000 population among men and women, respectively. The incidence increased most rapidly for stomach cancer in men and women aged 80 years and older (APC, 23.7% for men; APC, 18.6% for women), for colorectal cancer in men aged 60 to 69 years (APC, 24.2%) and in women aged 50 to 59 years (APC, 25.1%), and for esophageal cancer in men and women aged 80 years and older (APC, 17.5% for men; APC, 15.3% for women) over the period of the study. CONCLUSIONS The incidence of gastrointestinal cancer significantly increased during the past decade. Therefore, monitoring the trends of cancer incidence can assist efforts for cancer prevention and control. PMID:27923268

  19. Use of genetic programming, logistic regression, and artificial neural nets to predict readmission after coronary artery bypass surgery.

    PubMed

    Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A

    2013-08-01

    As many as 14% of patients undergoing coronary artery bypass surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study is to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction population, then validated on the Validation population. Areas under the receiver operator characteristic curves (AU ROC) were used to compare the models. Two hundred and two patients (7.6%) in the 2,644-patient Construction group and 216 (8.0%) of the 2,711-patient Validation group were re-admitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy (AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate, with AU ROC = 0.597 ± .001 in the Construction group. The predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) was still trivially but statistically non-significantly better than that of logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
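    The AU ROC figures used to compare these models can be computed from raw scores with the rank-sum (Mann-Whitney) identity: AUC = P(score of a readmitted patient > score of a non-readmitted one). A numpy sketch on synthetic scores (the class sizes and score distributions are made up):

```python
import numpy as np

# AUC via the rank-sum identity. Synthetic scores: the "readmitted" group
# is shifted upward, giving a theoretical AUC of about 0.71.
rng = np.random.default_rng(4)
neg = rng.normal(0.0, 1.0, size=1000)     # not readmitted
pos = rng.normal(0.8, 1.0, size=100)      # readmitted

scores = np.concatenate([neg, pos])
labels = np.concatenate([np.zeros(neg.size), np.ones(pos.size)])
order = scores.argsort()
ranks = np.empty_like(order, dtype=float)
ranks[order] = np.arange(1, scores.size + 1)   # ties ignored in this sketch

auc = (ranks[labels == 1].sum() - pos.size * (pos.size + 1) / 2) \
      / (pos.size * neg.size)
print(round(auc, 3))
```

    The same identity underlies nonparametric comparisons of two AU ROC values, such as those reported between the genetic-programming and logistic models.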

  20. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  1. SPSS and SAS programs for comparing Pearson correlations and OLS regression coefficients.

    PubMed

    Weaver, Bruce; Wuensch, Karl L

    2013-09-01

    Several procedures that use summary data to test hypotheses about Pearson correlations and ordinary least squares regression coefficients have been described in various books and articles. To our knowledge, however, no single resource describes all of the most common tests. Furthermore, many of these tests have not yet been implemented in popular statistical software packages such as SPSS and SAS. In this article, we describe all of the most common tests and provide SPSS and SAS programs to perform them. When they are applicable, our code also computes 100 × (1 - α)% confidence intervals corresponding to the tests. For testing hypotheses about independent regression coefficients, we demonstrate one method that uses summary data and another that uses raw data (i.e., Potthoff analysis). When the raw data are available, the latter method is preferred, because use of summary data entails some loss of precision due to rounding.
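    One of the summary-data tests such programs cover, comparing two independent Pearson correlations, needs only Fisher's r-to-z transform. A stdlib-only sketch (the r and n values below are made up):

```python
import math

# Compare two independent Pearson correlations via Fisher's r-to-z:
# z_i = atanh(r_i), SE = sqrt(1/(n1-3) + 1/(n2-3)), two-sided normal test.
def compare_correlations(r1, n1, r2, n2):
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    # two-sided p from the standard normal CDF (via erf)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

z, p = compare_correlations(0.55, 100, 0.30, 120)
print(round(z, 2), round(p, 4))   # z about 2.25, p about 0.02
```

    This is exactly the kind of test that works from summary data alone; the Potthoff analysis mentioned in the abstract instead needs the raw data.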

  2. Study of colorectal mortality in the Andalusian population.

    PubMed

    Cayuela, A; Rodríguez-Domínguez, S; Garzón-Benavides, M; Pizarro-Moreno, A; Giráldez-Gallego, A; Cordero-Fernández, C

    2011-06-01

    To provide up-to-date information and to analyze recent changes in colorectal cancer mortality trends in Andalusia during the period 1980-2008 using joinpoint regression models. Age- and sex-specific colorectal cancer deaths were taken from the official vital statistics published by the Instituto de Estadística de Andalucía for the years 1980 to 2008. We computed age-specific rates for each 5-year age group and calendar year, and age-standardized mortality rates per 100,000 men and women. Joinpoint regression analysis was used for trend analysis of the standardized rates and to identify the years when a significant change in the linear slope of the temporal trend occurred; the best-fitting points (the "joinpoints") are chosen where the rate significantly changes. Mortality from colorectal cancer in Andalusia during the period studied increased from 277 deaths in 1980 to 1,227 in 2008 in men, and from 333 to 805 deaths in women. Adjusted overall colorectal cancer mortality rates increased from 7.7 to 17.0 deaths per 100,000 person-years in men and from 6.6 to 9.0 per 100,000 person-years in women. Changes in mortality did not evolve similarly for men and women. Age-specific CRC mortality rates are lower in women than in men, which implies that women reach comparable levels of colorectal cancer mortality at higher ages than men. Sex differences in colorectal cancer mortality have widened in the last decade in Andalusia. In spite of the decreasing trends in age-adjusted mortality rates in women, incidence rates and the absolute numbers of deaths are still increasing, largely because of the aging of the population. Consequently, colorectal cancer still has a large impact on health care services, and this impact will continue to increase for many more years.

  3. Predicting county-level cancer incidence rates and counts in the United States

    PubMed Central

    Yu, Binbing

    2018-01-01

    Many countries, including the United States, publish predicted numbers of cancer incidence and death in current and future years for the whole country. These predictions provide important information on the cancer burden for cancer control planners, policymakers and the general public. Based on evidence from several empirical studies, the joinpoint (segmented-line linear regression) model has been adopted by the American Cancer Society to estimate the number of new cancer cases in the United States and in individual states since 2007. Recently, cancer incidence in smaller geographic regions such as counties and FIPS code regions has become of increasing interest to local policymakers. The natural extension is to directly apply the joinpoint model to county-level cancer incidence data. The direct application has several drawbacks, and its performance has not been evaluated. To address these concerns, we developed a spatial random-effects joinpoint model for county-level cancer incidence data. The proposed model was used to predict both cancer incidence rates and counts at the county level. The standard joinpoint model and the proposed method were compared through a validation study. The proposed method outperformed the standard joinpoint model for almost all cancer sites, especially for moderate or rare cancer sites and for counties with small population sizes. As an application, we predicted county-level prostate cancer incidence rates and counts for the year 2011 in Connecticut. PMID:23670947

  4. Impact of fecal immunochemical test-based screening programs on proximal and distal colorectal cancer surgery rates: A natural multiple-baseline experiment.

    PubMed

    Fedeli, Ugo; Zorzi, Manuel; Urso, Emanuele D L; Gennaro, Nicola; Dei Tos, Angelo P; Saugo, Mario

    2015-11-15

    Colorectal cancer (CRC) screening programs based on the fecal immunochemical test (FIT) were found to reduce overall CRC surgery rates, but to the authors' knowledge data by subsite are lacking. The objective of the current study was to assess the impact of FIT-based screening on proximal and distal CRC surgical resection rates. The Veneto region in Italy can be subdivided into 3 areas with staggered introduction of FIT-based screening programs: early (2002-2004), intermediate (2005-2007), and late (2008-2009) areas. Time series of proximal and distal CRC surgery were investigated in the 3 populations between 2001 and 2012 by Joinpoint regression analysis and segmented Poisson regression models. The impact of screening was similar in the study populations. Rates of distal CRC surgical resection were stable before screening, increased at the time of screening implementation (rate ratio [RR], 1.25; 95% confidence interval [95% CI], 1.14-1.37), and thereafter declined by 10% annually (RR, 0.90; 95% CI, 0.88-0.92). Rates of proximal CRC surgical resection increased by 4% annually before screening (RR, 1.04; 95% CI, 1.03-1.05) but, after a peak at the time of screening initiation, the trend was reversed. The percentage represented by proximal CRC surgery rose from 28% in 2001 to 41% in 2012. In this natural multiple-baseline experiment, consistent findings across each time series demonstrated that FIT-based screening programs have an impact both on proximal and distal CRC surgery rates. However, underlying preexisting epidemiological trends are leading to a rapidly increasing percentage of proximal CRC. © 2015 American Cancer Society.
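    A segmented Poisson model with a level shift at the intervention plus a post-intervention slope change, of the kind this study fits, can be sketched with hand-rolled IRLS. All numbers below are synthetic, including the made-up 2007 screening start year; a real analysis would use a GLM routine with proper standard errors:

```python
import numpy as np

# Segmented Poisson regression with a "jump" (level shift) and slope change
# at a known intervention year, fitted by Newton/IRLS with a population
# offset. Synthetic yearly counts; exp(beta) gives rate ratios.
rng = np.random.default_rng(5)
years = np.arange(2001, 2013, dtype=float)
t = years - years[0]
post = (years >= 2007).astype(float)      # hypothetical screening start
pop = np.full(years.size, 1e6)

# true model: +3%/yr trend, x1.4 jump at start, then -10%/yr slope change
eta = np.log(2e-4) + 0.03 * t + np.log(1.4) * post - 0.10 * post * (t - 6)
y = rng.poisson(pop * np.exp(eta))

X = np.column_stack([np.ones_like(t), t, post, post * (t - 6)])
offset = np.log(pop)
beta = np.zeros(X.shape[1])
beta[0] = np.log(y.mean() / pop[0])       # sensible starting value
for _ in range(50):
    mu = np.exp(X @ beta + offset)
    step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-9:
        break

print(np.round(np.exp(beta[1:]), 3))      # RRs: pre-trend, jump, slope change
```

    The jump coefficient plays the role of the screening-implementation peak (RR 1.25 for distal CRC in the study), while the hinge term captures the subsequent annual decline.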

  5. Evaluation of Visual Field Progression in Glaucoma: Quasar Regression Program and Event Analysis.

    PubMed

    Díaz-Alemán, Valentín T; González-Hernández, Marta; Perera-Sanz, Daniel; Armas-Domínguez, Karintia

    2016-01-01

    To determine the sensitivity, specificity and agreement between the Quasar program, glaucoma progression analysis (GPA II) event analysis and expert opinion in the detection of glaucomatous progression. The Quasar program is based on linear regression analysis of both mean defect (MD) and pattern standard deviation (PSD). Each series of visual fields was evaluated by three methods: Quasar, GPA II and four experts. The sensitivity, specificity and agreement (kappa) for each method were calculated, using expert opinion as the reference standard. The study included 439 SITA Standard visual fields of 56 eyes of 42 patients, with a mean of 7.8 ± 0.8 visual fields per eye. When suspected cases of progression were considered stable, the sensitivity and specificity of Quasar, GPA II and the experts were 86.6% and 70.7%, 26.6% and 95.1%, and 86.6% and 92.6%, respectively. When suspected cases of progression were considered as progressing, the sensitivity and specificity of Quasar, GPA II and the experts were 79.1% and 81.2%, 45.8% and 90.6%, and 85.4% and 90.6%, respectively. The agreement between Quasar and GPA II when suspected cases were considered stable or progressing was 0.03 and 0.28, respectively. The agreement between Quasar and the experts when suspected cases were considered stable or progressing was 0.472 and 0.507. The agreement between GPA II and the experts when suspected cases were considered stable or progressing was 0.262 and 0.342. The combination of MD and PSD regression analysis in the Quasar program showed better agreement with the experts and higher sensitivity than GPA II.
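    The kappa agreement statistic reported above can be computed as follows. This is a minimal sketch with hypothetical binary ratings (1 = progression, 0 = stable), not the study's data.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for agreement between two binary raters:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p1a = sum(a) / n                                # rater A's rate of 1s
    p1b = sum(b) / n                                # rater B's rate of 1s
    pe = p1a * p1b + (1 - p1a) * (1 - p1b)          # agreement by chance
    return (po - pe) / (1 - pe)

# hypothetical ratings for ten visual-field series
quasar = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
expert = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(quasar, expert), 2))  # → 0.6
```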

  6. Multiple linear regression analysis

    NASA Technical Reports Server (NTRS)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
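    The stepwise procedure described can be loosely sketched in Python rather than the original FORTRAN IV. This greedy forward-selection version on hypothetical data omits the significance testing the program performs, keeping only the core idea of growing the model one variable at a time.

```python
import numpy as np

def forward_stepwise(X, y, max_vars=None):
    """Greedy forward selection: at each step add the predictor that
    most reduces the residual sum of squares (a simplified analogue
    of the stepwise procedure; F-tests for significance omitted)."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    max_vars = max_vars or p
    while remaining and len(selected) < max_vars:
        rss_best, j_best = None, None
        for j in remaining:
            cols = selected + [j]
            A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(((A @ beta - y) ** 2).sum())
            if rss_best is None or rss < rss_best:
                rss_best, j_best = rss, j
        selected.append(j_best)
        remaining.remove(j_best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# only columns 2 and 4 actually drive y
y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=200)
order = forward_stepwise(X, y, max_vars=2)
print(sorted(order))  # → [2, 4]
```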

  7. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques

    PubMed Central

    Jones, Kelly W.; Lewis, David J.

    2015-01-01

    Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented—from protected areas to payments for ecosystem services (PES)—to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing ‘matching’ to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods—an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators—due to the presence of unobservable bias—that lead to differences in conclusions about effectiveness. The Ecuador case

  8. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques.

    PubMed

    Jones, Kelly W; Lewis, David J

    2015-01-01

    Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented--from protected areas to payments for ecosystem services (PES)--to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing 'matching' to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods--an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators--due to the presence of unobservable bias--that lead to differences in conclusions about effectiveness. 
The Ecuador case illustrates that
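    The fixed effects panel estimator the authors contrast with cross-sectional regression can be sketched as a within transformation: demeaning within each unit removes any time-invariant unit-level confounder, observed or not. A hypothetical two-period example (names and numbers invented) where treatment is selected on an unobserved unit effect and the true effect is 5:

```python
import numpy as np

def within_estimator(unit, y, d):
    """Unit fixed effects via the within transformation: demean the
    outcome y and treatment d inside each unit, then regress demeaned
    y on demeaned d (no intercept needed after demeaning)."""
    y_t, d_t = y.astype(float).copy(), d.astype(float).copy()
    for u in np.unique(unit):
        m = unit == u
        y_t[m] -= y_t[m].mean()
        d_t[m] -= d_t[m].mean()
    return float((d_t @ y_t) / (d_t @ d_t))

rng = np.random.default_rng(1)
n = 500
alpha = rng.normal(size=n)             # unobserved unit effect
treated = (alpha > 0).astype(float)    # selection on the unobservable
unit = np.repeat(np.arange(n), 2)      # two periods per unit
period = np.tile([0.0, 1.0], n)
d = np.repeat(treated, 2) * period     # treated units switch on in period 1
y = np.repeat(alpha, 2) * 3.0 + 5.0 * d + rng.normal(scale=0.5, size=2 * n)
est = within_estimator(unit, y, d)
print(round(est, 1))
```

A naive cross-sectional comparison of treated and control units here would be badly biased by alpha; the within estimator is not.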

  9. Increasing Rates of Brain Tumours in the Swedish National Inpatient Register and the Causes of Death Register

    PubMed Central

    Hardell, Lennart; Carlberg, Michael

    2015-01-01

    Radiofrequency emissions in the frequency range 30 kHz–300 GHz were evaluated as Group 2B, i.e., “possibly” carcinogenic to humans, by the International Agency for Research on Cancer (IARC) at WHO in May 2011. The Swedish Cancer Register has not shown increasing incidence of brain tumours in recent years and has been used to dismiss epidemiological evidence of a risk. In this study we used the Swedish National Inpatient Register (IPR) and Causes of Death Register (CDR) to further study the incidence, comparing with the Cancer Register data for the time period 1998–2013 using joinpoint regression analysis. In the IPR we found a joinpoint in 2007, with an Annual Percentage Change (APC) of +4.25% (95% CI +1.98, +6.57%) during 2007–2013 for tumours of unknown type in the brain or CNS. In the CDR, joinpoint regression found one joinpoint in 2008, with an APC during 2008–2013 of +22.60% (95% CI +9.68, +37.03%). These tumour diagnoses would be based on clinical examination, mainly CT and/or MRI, but without histopathology or cytology. No statistically significant increasing incidence was found in the Swedish Cancer Register during these years. We postulate that a large part of brain tumours of unknown type are never reported to the Cancer Register. Furthermore, the frequency of diagnosis based on autopsy has declined substantially due to a general decline of autopsies in Sweden, adding further to the missing cases. We conclude that the Swedish Cancer Register cannot reliably be used to dismiss results of epidemiological studies on the use of wireless phones and brain tumour risk. PMID:25854296
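    The Annual Percentage Change figures above come from log-linear fits within joinpoint segments: if log(rate) has slope b per year, the APC is 100·(exp(b) − 1). A minimal sketch on synthetic rates growing 4.25% per year:

```python
import numpy as np

def annual_percent_change(years, rates):
    """APC within one joinpoint segment: fit log(rate) = b0 + b1*year
    by least squares; APC = 100 * (exp(b1) - 1)."""
    b1, b0 = np.polyfit(years, np.log(rates), 1)
    return 100.0 * (np.exp(b1) - 1.0)

years = np.arange(2007, 2014, dtype=float)
rates = 10.0 * 1.0425 ** (years - 2007)   # exactly 4.25% growth per year
apc = annual_percent_change(years, rates)
print(round(apc, 2))  # → 4.25
```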

  10. Increasing rates of brain tumours in the Swedish national inpatient register and the causes of death register.

    PubMed

    Hardell, Lennart; Carlberg, Michael

    2015-04-03

    Radiofrequency emissions in the frequency range 30 kHz-300 GHz were evaluated as Group 2B, i.e., "possibly" carcinogenic to humans, by the International Agency for Research on Cancer (IARC) at WHO in May 2011. The Swedish Cancer Register has not shown increasing incidence of brain tumours in recent years and has been used to dismiss epidemiological evidence of a risk. In this study we used the Swedish National Inpatient Register (IPR) and Causes of Death Register (CDR) to further study the incidence, comparing with the Cancer Register data for the time period 1998-2013 using joinpoint regression analysis. In the IPR we found a joinpoint in 2007, with an Annual Percentage Change (APC) of +4.25% (95% CI +1.98, +6.57%) during 2007-2013 for tumours of unknown type in the brain or CNS. In the CDR, joinpoint regression found one joinpoint in 2008, with an APC during 2008-2013 of +22.60% (95% CI +9.68, +37.03%). These tumour diagnoses would be based on clinical examination, mainly CT and/or MRI, but without histopathology or cytology. No statistically significant increasing incidence was found in the Swedish Cancer Register during these years. We postulate that a large part of brain tumours of unknown type are never reported to the Cancer Register. Furthermore, the frequency of diagnosis based on autopsy has declined substantially due to a general decline of autopsies in Sweden, adding further to the missing cases. We conclude that the Swedish Cancer Register cannot reliably be used to dismiss results of epidemiological studies on the use of wireless phones and brain tumour risk.

  11. WALLY 1 ...A large, principal components regression program with varimax rotation of the factor weight matrix

    Treesearch

    James R. Wallis

    1965-01-01

    Written in Fortran IV and MAP, this computer program can handle up to 120 variables, and retain 40 principal components. It can perform simultaneous regression of up to 40 criterion variables upon the varimax rotated factor weight matrix. The columns and rows of all output matrices are labeled by six-character alphanumeric names. Data input can be from punch cards or...

  12. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  13. The TUNEL assay suggests mandibular regression by programmed cell death during presoldier differentiation in the nasute termite Nasutitermes takasagoensis

    NASA Astrophysics Data System (ADS)

    Toga, Kouhei; Yoda, Shinichi; Maekawa, Kiyoto

    2011-09-01

    Termite soldiers are the most specialized caste of social insects in terms of their morphology and function. Soldier development requires an increased juvenile hormone (JH) titer and two molts via a presoldier stage. These molts are accompanied by dramatic morphological changes, including the exaggeration and regression of certain organs. Soldiers of the most apical termitid subfamily Nasutitermitinae possess not only a horn-like frontal tube, called the nasus, for the projection of defensive chemicals from the frontal gland reservoir but also regressed mandibles. Although candidate genes regulating soldier mandibular growth were reported in a relatively basal termite species, the regulatory mechanisms of mandibular regression remain unknown. To clarify these mechanisms, we performed morphological and histological examinations of the mandibles during soldier differentiation in Nasutitermes takasagoensis. Mandibular size decreased dramatically during soldier differentiation, and mandibular regression occurred just prior to the presoldier molt. Spotted TUNEL signals were observed in regressing mandibles of presoldiers, suggesting that the regression involved programmed cell death. Because soldiers of N. takasagoensis possess exaggerated organs (nasus and frontal gland), the present results suggest that JH-dependent regressive mechanisms exist in the mandibles without interfering with the formation of the exaggerated organs.

  14. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    USGS Publications Warehouse

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. 
The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
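    The median-of-pairwise-slopes fit described above (often called the Theil-Sen estimator) can be sketched briefly. This Python toy version mirrors the KTRLine definitions (slope = median of all pairwise slopes; intercept chosen so the line runs through the medians of the input data), not the Visual Basic program itself:

```python
import numpy as np

def kendall_theil_line(x, y):
    """Kendall-Theil robust line: slope is the median of all pairwise
    slopes; the intercept puts the line through (median x, median y)."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)
              if x[j] != x[i]]
    slope = float(np.median(slopes))
    intercept = float(np.median(y) - slope * np.median(x))
    return slope, intercept

# one wild outlier barely moves the median-based fit
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[9] = 100.0  # outlier
slope, intercept = kendall_theil_line(x, y)
print(round(slope, 2), round(intercept, 2))  # → 2.0 1.0
```

An ordinary least-squares fit on the same data would be pulled far from slope 2 by the single outlier, which is exactly the resistance property the abstract cites.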

  15. Childhood Cancer Incidence Trends in Association With US Folic Acid Fortification (1986–2008)

    PubMed Central

    Johnson, Kimberly J.; Ross, Julie A.

    2012-01-01

    OBJECTIVE: Epidemiologic evidence indicates that prenatal vitamin supplementation reduces risk for some childhood cancers; however, a systematic evaluation of population-based childhood cancer incidence trends after fortification of enriched grain products with folic acid in the United States in 1996–1998 has not been previously reported. Here we describe temporal trends in childhood cancer incidence in association with US folic acid fortification. METHODS: Using Surveillance, Epidemiology, and End Results program data (1986–2008), we calculated incidence rate ratios and 95% confidence intervals to compare pre- and postfortification cancer incidence rates in children aged 0 to 4 years. Incidence trends were also evaluated by using joinpoint and loess regression models. RESULTS: From 1986 through 2008, 8829 children aged 0 to 4 years were diagnosed with malignancies, including 3790 and 3299 in utero during the pre- and postfortification periods, respectively. Pre- and postfortification incidence rates were similar for all cancers combined and for most specific cancer types. Rates of Wilms tumor (WT), primitive neuroectodermal tumors (PNETs), and ependymomas were significantly lower postfortification. Joinpoint regression models detected increasing WT incidence from 1986 through 1997 followed by a sizable decline from 1997 through 2008, and increasing PNET incidence from 1986 through 1993 followed by a sharp decrease from 1993 through 2008. Loess curves indicated similar patterns. CONCLUSIONS: These results provide support for a decrease in WT and possibly PNET incidence, but not other childhood cancers, after US folic acid fortification. PMID:22614769

  16. Childhood cancer incidence trends in association with US folic acid fortification (1986-2008).

    PubMed

    Linabery, Amy M; Johnson, Kimberly J; Ross, Julie A

    2012-06-01

    Epidemiologic evidence indicates that prenatal vitamin supplementation reduces risk for some childhood cancers; however, a systematic evaluation of population-based childhood cancer incidence trends after fortification of enriched grain products with folic acid in the United States in 1996-1998 has not been previously reported. Here we describe temporal trends in childhood cancer incidence in association with US folic acid fortification. Using Surveillance, Epidemiology, and End Results program data (1986-2008), we calculated incidence rate ratios and 95% confidence intervals to compare pre- and postfortification cancer incidence rates in children aged 0 to 4 years. Incidence trends were also evaluated by using joinpoint and loess regression models. From 1986 through 2008, 8829 children aged 0 to 4 years were diagnosed with malignancies, including 3790 and 3299 in utero during the pre- and postfortification periods, respectively. Pre- and postfortification incidence rates were similar for all cancers combined and for most specific cancer types. Rates of Wilms tumor (WT), primitive neuroectodermal tumors (PNETs), and ependymomas were significantly lower postfortification. Joinpoint regression models detected increasing WT incidence from 1986 through 1997 followed by a sizable decline from 1997 through 2008, and increasing PNET incidence from 1986 through 1993 followed by a sharp decrease from 1993 through 2008. Loess curves indicated similar patterns. These results provide support for a decrease in WT and possibly PNET incidence, but not other childhood cancers, after US folic acid fortification.
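    The pre- versus postfortification comparison above rests on incidence rate ratios with 95% confidence intervals. A minimal sketch with hypothetical case counts and person-years, using the usual normal approximation on the log scale:

```python
import math

def incidence_rate_ratio(cases_post, py_post, cases_pre, py_pre):
    """Incidence rate ratio with a 95% CI computed on the log scale:
    SE(log IRR) ≈ sqrt(1/cases_post + 1/cases_pre) for Poisson counts."""
    irr = (cases_post / py_post) / (cases_pre / py_pre)
    se = math.sqrt(1.0 / cases_post + 1.0 / cases_pre)
    lo = irr * math.exp(-1.96 * se)
    hi = irr * math.exp(1.96 * se)
    return irr, lo, hi

# hypothetical: 80 vs 100 cases over equal person-time
irr, lo, hi = incidence_rate_ratio(80, 1_000_000, 100, 1_000_000)
print(round(irr, 2), round(lo, 2), round(hi, 2))
```

Here the interval crosses 1, so a 20% lower postfortification rate of this size would not on its own be statistically significant.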

  17. Automating approximate Bayesian computation by local linear regression.

    PubMed

    Thornton, Kevin R

    2009-07-07

    In several biological contexts, parameter inference often relies on computationally intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone and fully documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, ABCreg simplifies implementing ABC based on local-linear regression.
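    The local linear-regression flavor of ABC that ABCreg automates can be sketched in a few lines. This toy Python version (a rejection step followed by a regression adjustment in the style the abstract describes) assumes a trivial model where the summary statistic equals the parameter plus noise; it is an illustration of the technique, not ABCreg's code:

```python
import numpy as np

rng = np.random.default_rng(2)

# toy model: summary s = theta + noise; observed summary s_obs
s_obs = 3.0
theta = rng.uniform(0.0, 10.0, size=100_000)          # prior draws
s = theta + rng.normal(scale=1.0, size=theta.size)    # simulated summaries

# rejection step: keep the 1% of simulations closest to the data
d = np.abs(s - s_obs)
keep = d <= np.quantile(d, 0.01)
th, ss = theta[keep], s[keep]

# local linear-regression adjustment: regress theta on s among the
# accepted draws, then project each accepted theta onto s = s_obs
b1, b0 = np.polyfit(ss, th, 1)
th_adj = th - b1 * (ss - s_obs)

m = float(np.mean(th_adj))
print(round(m, 1))  # posterior mean estimate, close to 3
```

The adjusted draws th_adj approximate the posterior at the observed summary even though the accepted simulations only came close to it.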

  18. Ca analysis: An Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis☆

    PubMed Central

    Greensmith, David J.

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow. PMID:24125908

  19. Increasing thyroid cancer incidence in Lithuania in 1978-2003.

    PubMed

    Smailyte, Giedre; Miseikyte-Kaubriene, Edita; Kurtinaitis, Juozas

    2006-12-11

    The aim of this paper is to analyze changes in thyroid cancer incidence trends in Lithuania during the period 1978-2003 using joinpoint regression models, with special attention to the period 1993-2003. The study was based on all cases of thyroid cancer reported to the Lithuanian Cancer Registry between 1978 and 2003. Age group-specific rates and standardized rates were calculated for each gender, using the direct method (world standard population). The joinpoint regression model was used to provide estimated annual percentage change and to detect points in time where significant changes in the trends occur. During the study period the age-standardized incidence rates increased in males from 0.7 to 2.5 cases per 100,000 and in females from 1.5 to 11.4 per 100,000. Annual percentage changes during this period in the age-standardized rates were 4.6% and 7.1% for males and females, respectively. Joinpoint analysis showed two time periods with joinpoint in the year 2000. A change in the trend occurred in which a significant increase changed to a dramatic increase in thyroid cancer incidence rates. Papillary carcinoma and stage I thyroid cancer increases over this period were mainly responsible for the pattern of changes in trend in recent years. A moderate increase in thyroid cancer incidence has been observed in Lithuania between the years 1978 and 2000. An accelerated increase in thyroid cancer incidence rates took place in the period 2000-2003. It seems that the increase in thyroid cancer incidence can be attributed mainly to the changes in the management of non palpable thyroid nodules with growing applications of ultrasound-guided fine needle aspiration biopsy in clinical practice.

  20. Cost-effectiveness analysis of the diarrhea alleviation through zinc and oral rehydration therapy (DAZT) program in rural Gujarat India: an application of the net-benefit regression framework.

    PubMed

    Shillcutt, Samuel D; LeFevre, Amnesty E; Fischer-Walker, Christa L; Taneja, Sunita; Black, Robert E; Mazumder, Sarmila

    2017-01-01

    This study evaluates the cost-effectiveness of the DAZT program for scaling up treatment of acute child diarrhea in Gujarat, India, using a net-benefit regression framework. Costs were calculated from societal and caregivers' perspectives, and effectiveness was assessed in terms of coverage of zinc and of both zinc and Oral Rehydration Salt. Regression models were tested as simple linear regression, with a specified set of covariates, and with a specified set of covariates plus interaction terms; linear regression with endogenous treatment effects was used as the reference case. The DAZT program was cost-effective with over 95% certainty above $5.50 and $7.50 per appropriately treated child in the unadjusted and adjusted models, respectively, with specifications including interaction terms being cost-effective with 85-97% certainty. Findings from this study should be combined with other evidence when considering decisions to scale up programs such as the DAZT program to promote the use of ORS and zinc to treat child diarrhea.
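    The net-benefit regression framework converts cost-effectiveness into an ordinary regression problem: form each subject's net benefit NB_i = λ·effect_i − cost_i at a willingness-to-pay λ, then regress it on a treatment indicator; the coefficient estimates the incremental net benefit. A hypothetical sketch (names and numbers invented, simple OLS rather than the study's endogenous-treatment model):

```python
import numpy as np

def net_benefit_coeff(effect, cost, treated, wtp):
    """Net-benefit regression: NB_i = wtp*effect_i - cost_i regressed
    on a treatment indicator; the slope is the incremental net benefit
    at willingness-to-pay 'wtp'."""
    nb = wtp * effect - cost
    X = np.column_stack([np.ones_like(nb), treated.astype(float)])
    beta, *_ = np.linalg.lstsq(X, nb, rcond=None)
    return float(beta[1])

rng = np.random.default_rng(3)
n = 20_000
treated = rng.uniform(size=n) < 0.5
effect = 0.2 * treated + rng.normal(0.5, 0.1, n)   # coverage gain of 0.2
cost = 1.0 * treated + rng.normal(5.0, 0.5, n)     # program costs $1 more
# at wtp = $7.5 per unit of coverage, true INB = 7.5*0.2 - 1.0 = +0.5
inb = net_benefit_coeff(effect, cost, treated, wtp=7.5)
print(round(inb, 1))
```

Repeating this over a grid of λ values and recording how often the coefficient is positive yields the cost-effectiveness-by-certainty curves the abstract reports.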

  1. A Practical Guide to Regression Discontinuity

    ERIC Educational Resources Information Center

    Jacob, Robin; Zhu, Pei; Somers, Marie-Andrée; Bloom, Howard

    2012-01-01

    Regression discontinuity (RD) analysis is a rigorous nonexperimental approach that can be used to estimate program impacts in situations in which candidates are selected for treatment based on whether their value for a numeric rating exceeds a designated threshold or cut-point. Over the last two decades, the regression discontinuity approach has…
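    A sharp regression discontinuity estimate of the kind described can be sketched as two local linear fits meeting at the cutoff, with the treatment effect read off as the gap between their predictions there. A simplified Python illustration on synthetic ratings with a true effect of 4 (bandwidth choice and robust inference omitted):

```python
import numpy as np

def rd_estimate(rating, outcome, cutoff, bandwidth):
    """Sharp RD sketch: fit separate linear regressions of the outcome
    on the rating within a bandwidth on each side of the cutoff, and
    take the jump between their predictions at the cutoff."""
    left = (rating < cutoff) & (rating >= cutoff - bandwidth)
    right = (rating >= cutoff) & (rating <= cutoff + bandwidth)
    bL = np.polyfit(rating[left], outcome[left], 1)
    bR = np.polyfit(rating[right], outcome[right], 1)
    return float(np.polyval(bR, cutoff) - np.polyval(bL, cutoff))

rng = np.random.default_rng(4)
rating = rng.uniform(0.0, 100.0, size=20_000)
treat = rating >= 50.0   # candidates selected when rating exceeds 50
outcome = 0.1 * rating + 4.0 * treat + rng.normal(scale=1.0, size=rating.size)
est = rd_estimate(rating, outcome, 50.0, 10.0)
print(round(est, 1))
```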

  2. Ca analysis: an Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis.

    PubMed

    Greensmith, David J

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow. Copyright © 2013 The Author. Published by Elsevier Ireland Ltd. All rights reserved.

  3. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  4. Using Multiple and Logistic Regression to Estimate the Median WillCost and Probability of Cost and Schedule Overrun for Program Managers

    DTIC Science & Technology

    2017-03-23

    Approved for public release; distribution unlimited. Air Force Institute of Technology thesis by Ryan C. Trudelle, available from the AFIT repository at https://scholar.afit.edu. Recommended citation: Trudelle, Ryan C., "Using Multiple and Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and Schedule Overrun for Program Managers."

  5. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis

    PubMed Central

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Background: Unwanted pregnancy, i.e., a pregnancy not intended by at least one of the parents, has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655

  6. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

    PubMed

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Unwanted pregnancy, i.e., a pregnancy not intended by at least one of the parents, has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
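    The area under the ROC curve used above to compare the three models equals the Mann-Whitney probability that a randomly chosen positive case outscores a randomly chosen negative case. A minimal sketch with hypothetical scores and labels, not the study's data:

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive case scores
    higher, counting ties as one half."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# hypothetical predicted probabilities and true outcomes
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
auc = auc_mann_whitney(scores, labels)
print(auc)  # → 0.8125
```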

  7. Method for nonlinear exponential regression analysis

    NASA Technical Reports Server (NTRS)

    Junkin, B. G.

    1972-01-01

    Two computer programs, developed according to two general types of exponential model, for conducting nonlinear exponential regression analysis are described. A least-squares procedure is used in which the nonlinear problem is linearized by expansion in a Taylor series. The programs are written in FORTRAN 5 for the Univac 1108 computer.
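    The Taylor-series linearization described above is the classical Gauss-Newton procedure. A sketch in Python rather than FORTRAN, for an exponential model of the form y = a·exp(b·x) (the data, starting values, and step-halving safeguard are illustrative additions, not details of the original programs):

```python
import math

def fit_exponential(xs, ys, a, b, iters=100):
    # Fit y = a*exp(b*x) by least squares: linearize about the current
    # estimate with a first-order Taylor expansion (Gauss-Newton) and
    # halve the step whenever it fails to reduce the sum of squares.
    def sse(a, b):
        return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))
    for _ in range(iters):
        r  = [y - a * math.exp(b * x) for x, y in zip(xs, ys)]
        ja = [math.exp(b * x) for x in xs]            # df/da
        jb = [a * x * math.exp(b * x) for x in xs]    # df/db
        # Normal equations (J'J) d = J'r for the 2x2 system.
        saa = sum(v * v for v in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ra  = sum(u * v for u, v in zip(ja, r))
        rb  = sum(u * v for u, v in zip(jb, r))
        det = saa * sbb - sab * sab
        da  = (sbb * ra - sab * rb) / det
        db  = (saa * rb - sab * ra) / det
        step = 1.0
        while step > 1e-8 and sse(a + step * da, b + step * db) > sse(a, b):
            step *= 0.5
        a, b = a + step * da, b + step * db
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noise-free synthetic data
a, b = fit_exponential(xs, ys, a=1.0, b=0.1)
```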

  8. Applied Multiple Linear Regression: A General Research Strategy

    ERIC Educational Resources Information Center

    Smith, Brandon B.

    1969-01-01

    Illustrates some of the basic concepts and procedures for using regression analysis in experimental design, analysis of variance, analysis of covariance, and curvilinear regression. Applications to evaluation of instruction and vocational education programs are illustrated. (GR)

  9. The Norwegian Healthier Goats program--modeling lactation curves using a multilevel cubic spline regression model.

    PubMed

    Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S

    2014-07-01

    In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield. Copyright © 2014

  10. Time Trend Analysis of Cancer Incidence in Caspian Sea, 2004 - 2009: A Population-based Cancer Registries Study (northern Iran).

    PubMed

    Salehiniya, Hamid; Ghobadi Dashdebi, Sakineh; Rafiemanesh, Hosein; Mohammadian-Hafshejani, Abdollah; Enayatrad, Mostafa

    2016-01-01

    Cancer is a major public health problem in the world. In Iran, especially after the transition to a dynamic, urban society, the pattern of cancer has changed significantly. An important change occurred in the incidence of cancer at the southern shores of the Caspian Sea, comprising Gilan, Mazandaran and Golestan provinces. This study was designed to investigate the epidemiology and changes in the trend of cancer incidence in the geographic region of the Caspian Sea (North of Iran). Data were collected from the Cancer Registry Center reports of the Iranian health deputy. Trends in incidence were analyzed by joinpoint regression analysis. During the study period (2004-2009), 33,807 cases of cancer were recorded in the three provinces of Gilan, Mazandaran and Golestan. Joinpoint analysis indicated a significant increase in age-standardized incidence rates (ASR), with average annual percentage changes (AAPC) of 10.3, 8.5 and 5.2 in Gilan, Mazandaran and Golestan, respectively. The most common cancers in these provinces were cancers of the stomach, breast, skin, colorectum and bladder, respectively. The incidence of cancer tends to be increasing in the North of Iran. These findings warrant epidemiologic studies, which are helpful in planning preventive programs and in recognizing risk factors.
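    The annual percentage change reported by joinpoint software comes from log-linear fits within each segment: log(rate) = b0 + b1·year, with APC = 100·(exp(b1) − 1). A minimal single-segment sketch (the rates below are synthetic, not the registry data):

```python
import math

def annual_percent_change(years, rates):
    # APC from a log-linear fit log(rate) = b0 + b1*year:
    # the slope b1 is the ordinary least-squares estimate, and
    # APC = 100 * (exp(b1) - 1) converts it to a yearly percent change.
    n = len(years)
    logs = [math.log(r) for r in rates]
    xbar = sum(years) / n
    ybar = sum(logs) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(years, logs)) / \
         sum((x - xbar) ** 2 for x in years)
    return 100.0 * (math.exp(b1) - 1.0)

years = list(range(2004, 2010))
rates = [10.0 * 1.05 ** (y - 2004) for y in years]  # constant 5% annual growth
apc = annual_percent_change(years, rates)
```

Joinpoint software additionally searches for the break years at which separate segments (each with its own APC) fit significantly better than a single line.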

  11. The microcomputer scientific software series 2: general linear model--regression.

    Treesearch

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  12. Introductory Linear Regression Programs in Undergraduate Chemistry.

    ERIC Educational Resources Information Center

    Gale, Robert J.

    1982-01-01

    Presented are simple programs in BASIC and FORTRAN to apply the method of least squares. They calculate gradients and intercepts and express errors as standard deviations. An introduction of undergraduate students to such programs in a chemistry class is reviewed, and issues instructors should be aware of are noted. (MP)
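    A program of the kind described, fitting a least-squares line and expressing the errors of the gradient and intercept as standard deviations, can be sketched in a few lines of Python (the data are illustrative):

```python
import math

def linfit(xs, ys):
    # Least-squares line y = a + b*x with standard errors of a and b,
    # as an introductory chemistry program would report them.
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return a, b, se_a, se_b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # e.g. a calibration series
a, b, se_a, se_b = linfit(xs, ys)
```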

  13. Using regression equations built from summary data in the psychological assessment of the individual case: extension to multiple regression.

    PubMed

    Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J

    2012-12-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.
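    For the two-predictor case, the construction from summary statistics alone can be sketched as follows: standardized weights come from inverting the 2x2 predictor correlation matrix, and are then rescaled to raw-score units. The function name and example values are hypothetical, not taken from Crawford and Garthwaite's program:

```python
def regression_from_summary(r12, r1y, r2y, means, sds):
    # Two-predictor regression built from summary statistics only:
    # the predictor intercorrelation r12, the validities r1y and r2y,
    # and the means and standard deviations of X1, X2 and Y.
    det = 1.0 - r12 ** 2                 # determinant of the 2x2 R_xx
    beta1 = (r1y - r12 * r2y) / det      # standardized weights
    beta2 = (r2y - r12 * r1y) / det
    m1, m2, my = means
    s1, s2, sy = sds
    b1 = beta1 * sy / s1                 # rescale to raw-score units
    b2 = beta2 * sy / s2
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical summary data: uncorrelated predictors for a clean check.
b0, b1, b2 = regression_from_summary(0.0, 0.6, 0.3,
                                     means=(10.0, 20.0, 50.0),
                                     sds=(2.0, 4.0, 10.0))
```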

  14. A simple approach to lifetime learning in genetic programming-based symbolic regression.

    PubMed

    Azad, Raja Muhammad Atif; Ryan, Conor

    2014-01-01

    Genetic programming (GP) coarsely models natural evolution to evolve computer programs. Unlike in nature, where individuals can often improve their fitness through lifetime experience, the fitness of GP individuals generally does not change during their lifetime, and there is usually no opportunity to pass on acquired knowledge. This paper introduces the Chameleon system to address this discrepancy and augment GP with lifetime learning by adding a simple local search that operates by tuning the internal nodes of individuals. Although not the first attempt to combine local search with GP, its simplicity means that it is easy to understand and cheap to implement. A simple cache is added which leverages the local search to reduce the tuning cost to a small fraction of the expected cost, and we provide a theoretical upper limit on the maximum tuning expense given the average tree size of the population and show that this limit grows very conservatively as the average tree size of the population increases. We show that Chameleon uses available genetic material more efficiently by exploring more actively than with standard GP, and demonstrate that not only does Chameleon outperform standard GP (on both training and test data) over a number of symbolic regression type problems, it does so by producing smaller individuals and it works harmoniously with two other well-known extensions to GP, namely, linear scaling and a diversity-promoting tournament selection method.

  15. Spontaneous regression of neuroblastoma.

    PubMed

    Brodeur, Garrett M

    2018-05-01

    Neuroblastomas are characterized by heterogeneous clinical behavior, from spontaneous regression or differentiation into a benign ganglioneuroma, to relentless progression despite aggressive, multimodality therapy. Indeed, neuroblastoma is unique among human cancers in terms of its propensity to undergo spontaneous regression. The strongest evidence for this comes from the mass screening studies conducted in Japan, North America and Europe and it is most evident in infants with stage 4S disease. This propensity is associated with a pattern of genomic change characterized by whole chromosome gains rather than segmental chromosome changes but the mechanism(s) underlying spontaneous regression are currently a matter of speculation. There is evidence to support several possible mechanisms of spontaneous regression in neuroblastomas: (1) neurotrophin deprivation, (2) loss of telomerase activity, (3) humoral or cellular immunity and (4) alterations in epigenetic regulation and possibly other mechanisms. It is likely that a better understanding of the mechanisms of spontaneous regression will help to identify targeted therapeutic approaches for these tumors. The most easily targeted mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A (TrkA) pathway. Pan-Trk inhibitors are currently in clinical trials and so Trk inhibition might be used as the first line of therapy in infants with biologically favorable tumors that require treatment. Alternative approaches consist of breaking immune tolerance to tumor antigens but approaches to telomere shortening or epigenetic regulation are not easily druggable. The different mechanisms of spontaneous neuroblastoma regression are reviewed here, along with possible therapeutic approaches.

  16. Trends in incidence of lung cancer in Croatia from 2001 to 2013: gender and regional differences

    PubMed Central

    Siroglavić, Katarina-Josipa; Polić Vižintin, Marina; Tripković, Ingrid; Šekerija, Mario; Kukulj, Suzana

    2017-01-01

    Aim To provide an overview of the lung cancer incidence trends in the City of Zagreb (Zagreb), Split-Dalmatia County (SDC), and Croatia in the period from 2001 to 2013. Method Incidence data were obtained from the Croatian National Cancer Registry. For calculating incidence rates per 100 000 population, we used population estimates for the period 2001-2013 from the Croatian Bureau of Statistics. Age-standardized rates of lung cancer incidence were calculated by the direct standardization method using the European Standard Population. To describe incidence trends, we used joinpoint regression analysis. Results Joinpoint analysis showed a statistically significant decrease in lung cancer incidence in men in all regions, with an annual percentage change (APC) of -2.2% for Croatia, -1.9% for Zagreb, and -2.0% for SDC. In women, joinpoint analysis showed a statistically significant increase in the incidence for Croatia, with APC of 1.4%, a statistically significant increase of 1.0% for Zagreb, and no significant change in trend for SDC. In both genders combined, joinpoint analysis showed a significant decrease in age-standardized incidence rates of lung cancer, with APC of -1.3% for Croatia, -1.1% for Zagreb, and -1.6% for SDC. Conclusion There was an increase in female lung cancer incidence rates and a decrease in male lung cancer incidence rates in Croatia in the 2001-2013 period, with similar patterns observed in all the investigated regions. These results highlight the importance of smoking prevention and cessation policies, especially among women and young people. PMID:29094814

  17. Availability and capacity of substance abuse programs in correctional settings: A classification and regression tree analysis.

    PubMed

    Taxman, Faye S; Kitsantas, Panagiota

    2009-08-01

    The purpose of this study was to investigate the structural and organizational factors that contribute to the availability and increased capacity of substance abuse treatment programs in correctional settings. We used classification and regression tree statistical procedures to identify how multi-level data can explain the variability in availability and capacity of substance abuse treatment programs in jails and probation/parole offices. The data for this study combined the National Criminal Justice Treatment Practices (NCJTP) Survey and the 2000 Census. The NCJTP survey was a nationally representative sample of correctional administrators for jails and probation/parole agencies. The sample included 295 substance abuse treatment programs that were classified according to the intensity of their services: high, medium, and low. The independent variables included jurisdictional-level structural variables, attributes of the correctional administrators, and program and service delivery characteristics of the correctional agency. The two most important variables in predicting the availability of all three types of services were stronger working relationships with other organizations and the adoption of a standardized substance abuse screening tool by correctional agencies. For high- and medium-intensity programs, capacity increased when an organizational learning strategy was used by administrators and the organization used a substance abuse screening tool. Implications for advancing treatment practices in correctional settings are discussed, including further work to test theories on how to better understand access to intensive treatment services. This study presents the first phase of understanding capacity-related issues regarding treatment programs offered in correctional settings.

  18. Mechanisms of neuroblastoma regression

    PubMed Central

    Brodeur, Garrett M.; Bagatell, Rochelle

    2014-01-01

    Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179

  19. Differentiating regressed melanoma from regressed lichenoid keratosis.

    PubMed

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% of regressed melanomas, as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  20. MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.

    PubMed

    Hedeker, D; Gibbons, R D

    1996-05-01

    MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.
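    The AR(1) error structure that MIXREG allows implies that the covariance between two errors decays geometrically with their separation in time. A small sketch of that covariance matrix (illustrative only, not MIXREG's internal representation):

```python
def ar1_cov(n, rho, sigma2):
    # Covariance matrix of stationary AR(1) errors:
    #   cov(e_i, e_j) = sigma2 * rho**|i - j|,
    # so the correlation between errors decays geometrically with lag.
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]

cov = ar1_cov(4, 0.5, 2.0)  # four timepoints, rho = 0.5, variance = 2
```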

  1. Regression Simulation Model. Appendix X. Users Manual,

    DTIC Science & Technology

    1981-03-01

    change as the prediction equations become refined. Whereas no notice will be provided when the changes are made, the programs will be modified such that... APPENDIX X: REGRESSION SIMULATION MODEL, USERS MANUAL, submitted to The Great River... regression analysis and to establish a prediction equation (model). The prediction equation contains the partial regression coefficients (B-weights) which

  2. A non-linear regression analysis program for describing electrophysiological data with multiple functions using Microsoft Excel.

    PubMed

    Brown, Angus M

    2006-04-01

    The objective of the present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet program Microsoft Excel. SOLVER minimizes the sum of the squared differences between the data to be fit and the function(s) describing the data, using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, which usually requires specialized, expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.
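    The quantity SOLVER is asked to minimize is the sum of squared residuals between the data and a sum of Gaussian components. A sketch of that objective in Python (the component parameters below are illustrative, not the optic-nerve data):

```python
import math

def sum_of_gaussians(x, params):
    # params is a flat list of (amplitude, centre, width) triples.
    total = 0.0
    for a, mu, w in zip(params[0::3], params[1::3], params[2::3]):
        total += a * math.exp(-((x - mu) ** 2) / (2.0 * w ** 2))
    return total

def sse(params, xs, ys):
    # The objective SOLVER iteratively minimizes over the parameter cells:
    # the sum of squared residuals between data and model.
    return sum((y - sum_of_gaussians(x, params)) ** 2 for x, y in zip(xs, ys))

xs = [i * 0.1 for i in range(100)]
true = [1.0, 3.0, 0.5, 0.6, 7.0, 1.0]     # two hypothetical components
ys = [sum_of_gaussians(x, true) for x in xs]
```

At the true parameters the objective is zero for this noise-free example; any perturbation increases it, which is what the iterative minimizer exploits.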

  3. A primer for biomedical scientists on how to execute model II linear regression analysis.

    PubMed

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
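    For reference, the OLP (geometric mean) line mentioned in point 4 has a closed form that a calculator or spreadsheet can evaluate: the slope is the ratio of the standard deviations of y and x, signed by their correlation, and the line passes through the means. A sketch with illustrative data:

```python
import math

def olp_fit(xs, ys):
    # Ordinary least products (geometric mean / reduced major axis) line:
    # slope = sign(covariance) * sqrt(Syy / Sxx), intercept through the means.
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return ybar - slope * xbar, slope

xs = [1.0, 2.0, 3.0, 4.0]   # both variables measured with error
ys = [2.0, 4.1, 5.9, 8.0]
intercept, slope = olp_fit(xs, ys)
```

As the abstract notes, the point estimates are easy to obtain this way; the associated 95% confidence intervals are not, which is why bootstrapping (as in smatr) is recommended.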

  4. TWSVR: Regression via Twin Support Vector Machine.

    PubMed

    Khemchandani, Reshma; Goyal, Keshav; Chandra, Suresh

    2016-02-01

    Taking motivation from the Twin Support Vector Machine (TWSVM) formulation, Peng (2010) attempted to propose Twin Support Vector Regression (TSVR), where the regressor is obtained by solving a pair of quadratic programming problems (QPPs). In this paper we argue that the TSVR formulation is not in the true spirit of TWSVM. Further, taking motivation from Bi and Bennett (2003), we propose an alternative approach to find a formulation for Twin Support Vector Regression (TWSVR) which is in the true spirit of TWSVM. We show that our proposed TWSVR can be derived from TWSVM for an appropriately constructed classification problem. To check the efficacy of our proposed TWSVR we compare its performance with TSVR and classical Support Vector Regression (SVR) on various regression datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Bringing a Perspective from Outside the Field: A Commentary on Davis et al.'s (2010) Use of a Modified Regression Discontinuity Design to Evaluate a Gifted Program

    ERIC Educational Resources Information Center

    Adelson, Jill L.; Kelcey, Benjamin

    2016-01-01

    In this commentary of "Evaluating the Gifted Program of an Urban School District Using a Modified Regression Discontinuity Design" by Davis, Engberg, Epple, Sieg, and Zimmer, we examine the background of the study, critique the methods used, and discuss the results and implications. The study used a fuzzy regression discontinuity design…

  6. Analysis of cerebrovascular disease mortality trends in Andalusia (1980-2014).

    PubMed

    Cayuela, A; Cayuela, L; Rodríguez-Domínguez, S; González, A; Moniche, F

    2017-03-15

    In recent decades, mortality rates for cerebrovascular diseases (CVD) have decreased significantly in many countries. This study analyses recent tendencies in CVD mortality rates in Andalusia (1980-2014) to identify any changes in previously observed sex and age trends. CVD mortality and population data were obtained from Spain's National Statistics Institute database. We calculated age-specific and age-standardised mortality rates using the direct method (European standard population). Joinpoint regression analysis was used to estimate the annual percentage change in rates and identify significant changes in mortality trends. We also estimated rate ratios between Andalusia and Spain. Standardised rates for both males and females showed 3 periods in the joinpoint regression analysis: an initial period of significant decline (1980-1997), a period of rate stabilisation (1997-2003), and another period of significant decline (2003-2014). Between 1997 and 2003, age-standardised rates stabilised in Andalusia but continued to decrease in Spain as a whole. This widened the gap between CVD mortality rates in Andalusia and Spain for both sexes and most age groups. Copyright © 2017 The Author(s). Published by Elsevier España, S.L.U. All rights reserved.

  7. Retro-regression--another important multivariate regression improvement.

    PubMed

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  8. Modified Regression Correlation Coefficient for Poisson Regression Model

    NASA Astrophysics Data System (ADS)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators of the predictive power of the generalized linear model (GLM), which are widely used but often subject to some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
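    The traditional regression correlation coefficient described above is the Pearson correlation between the observed counts Y and the fitted conditional means E(Y|X). A sketch of that computation (the counts and fitted values below are hypothetical, standing in for output of a fitted Poisson GLM):

```python
import math

def regression_corr(ys, mus):
    # Regression correlation coefficient: the Pearson correlation between
    # the observed counts Y and the fitted conditional means E(Y|X).
    n = len(ys)
    ybar = sum(ys) / n
    mbar = sum(mus) / n
    num = sum((y - ybar) * (m - mbar) for y, m in zip(ys, mus))
    den = math.sqrt(sum((y - ybar) ** 2 for y in ys) *
                    sum((m - mbar) ** 2 for m in mus))
    return num / den

ys  = [0, 1, 1, 2, 4, 6]                 # hypothetical observed counts
mus = [0.5, 0.9, 1.4, 2.2, 3.6, 5.8]     # hypothetical Poisson GLM fits
r = regression_corr(ys, mus)
```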

  9. Confirming the timing of phase-based costing in oncology studies: a case example in advanced melanoma.

    PubMed

    Atkins, Michael; Coutinho, Anna D; Nunna, Sasikiran; Gupte-Singh, Komal; Eaddy, Michael

    2018-02-01

    The utilization of healthcare services and costs among patients with cancer is often estimated by the phase of care: initial, interim, or terminal. Although their durations are often set arbitrarily, we sought to establish data-driven phases of care using joinpoint regression in an advanced melanoma population as a case example. A retrospective claims database study was conducted to assess the costs of advanced melanoma from distant metastasis diagnosis to death during January 2010-September 2014. Joinpoint regression analysis was applied to identify the best-fitting points, where statistically significant changes in the trend of average monthly costs occurred. To identify the initial phase, average monthly costs were modeled from metastasis diagnosis to death; and were modeled backward from death to metastasis diagnosis for the terminal phase. Points of monthly cost trend inflection denoted ending and starting points. The months between represented the interim phase. A total of 1,671 patients with advanced melanoma who died met the eligibility criteria. Initial phase was identified as the 5-month period starting with diagnosis of metastasis, after which there was a sharp, significant decline in monthly cost trend (monthly percent change [MPC] = -13.0%; 95% CI = -16.9% to -8.8%). Terminal phase was defined as the 5-month period before death (MPC = -14.0%; 95% CI = -17.6% to -10.2%). The claims-based algorithm may under-estimate patients due to misclassifications, and may over-estimate terminal phase costs because hospital and emergency visits were used as a death proxy. Also, recently approved therapies were not included, which may under-estimate advanced melanoma costs. In this advanced melanoma population, optimal duration of the initial and terminal phases of care was 5 months immediately after diagnosis of metastasis and before death, respectively. Joinpoint regression can be used to provide data-supported phase of cancer care durations, but

  10. TI-59 Programs for Multiple Regression.

    DTIC Science & Technology

    1980-05-01

    general linear hypothesis model of full rank [Graybill, 1961] can be written as Y = Xβ + ε, ε ~ N(0, σ²I), with dimensions Y (n×1), X (n×k), β (k×1) and ε (n×1), where Y is the vector of n... a "reduced model" solution, and confidence intervals for linear functions of the coefficients can be obtained using (X'X) and σ², based on the t... PROGRAM DESCRIPTION: for the general linear hypothesis model Y = Xβ + ε, calculates

  11. The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Sinharay, Sandip

    2010-01-01

    Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…

  12. Using Genetic Programming with Prior Formula Knowledge to Solve Symbolic Regression Problem.

    PubMed

    Lu, Qiang; Ren, Jun; Wang, Zhiguang

    2016-01-01

    A researcher can quickly infer mathematical expressions for functions by using professional knowledge (called prior knowledge), but the results may be biased and restricted to the researcher's field because of the limits of that knowledge. In contrast, the Genetic Programming (GP) method can discover fitted mathematical expressions from a huge search space by running evolutionary algorithms, and its results can generalize across fields of knowledge. However, because GP has to search a huge space, it is slow to find results. Therefore, in this paper, a framework connecting Prior Formula Knowledge and GP (PFK-GP) is proposed to reduce the GP search space. The PFK is built on a Deep Belief Network (DBN), which identifies candidate formulas that are consistent with the features of the experimental data. By using these candidate formulas as the seed of a randomly generated population, PFK-GP finds the right formulas quickly by exploring the search space of data features. We compared PFK-GP with Pareto GP on eight benchmark regression problems. The experimental results confirm that PFK-GP reduces the search space and yields a significant improvement in the quality of symbolic regression (SR).
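    GP explores a space of expression trees. Stripped of populations, crossover, and mutation, the core idea (score candidate trees against data, keep the best) can be illustrated with a deterministic exhaustive search over tiny trees; this is only a stand-in sketch for the stochastic search that real GP systems, including DBN-seeded populations of the PFK-GP kind, perform:

```python
# Exhaustive search over small expression trees: a deterministic stand-in
# for the stochastic population search that genetic programming performs.
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def evaluate(tree, x):
    """Evaluate a tree: a terminal ('x' or '1') or (op, left, right)."""
    if tree == 'x':
        return x
    if tree == '1':
        return 1.0
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def candidates(depth):
    """All expression trees up to the given depth."""
    if depth == 0:
        return ['x', '1']
    subs = candidates(depth - 1)
    trees = list(subs)
    for op in OPS:
        for left in subs:
            for right in subs:
                trees.append((op, left, right))
    return trees

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x + x for x in xs]  # hidden target: x^2 + x

def mse(tree):
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))

best = min(candidates(2), key=mse)
best_mse = mse(best)
```

    Depth 2 already contains an exact representation of the target (for example (x*x)+x), so the search attains zero error; seeding such a search with prior-knowledge candidate formulas amounts to starting it at promising trees instead of enumerating or sampling blindly.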

  13. Variability and trends in the consumption of drugs for treating attention-deficit/hyperactivity disorder in Castile-La Mancha, Spain (1992-2015).

    PubMed

    Criado-Álvarez, J J; González González, J; Romo Barrientos, C; Mohedano Moriano, A; Montero Rubio, J C; Pérez Veiga, J P

    2016-09-16

    Attention-deficit/hyperactivity disorder (ADHD) is one of the most common behavioural disorders of childhood; its prevalence in Spain is estimated at 5-9%. Available treatments for this condition include methylphenidate, atomoxetine, and lisdexamfetamine, whose consumption increases each year. The prevalence of ADHD was estimated by calculating the defined daily dose per 1,000 population per day of each drug and the total doses (therapeutic group N06BA) between 1992 and 2015 in each of the provinces of Castile-La Mancha (Spain). Trends, joinpoints, and annual percentages of change were analysed using joinpoint regression models. The minimum prevalence of ADHD in the population of Castile-La Mancha aged 5 to 19 was estimated at 13.22 cases per 1,000 population per day; prevalence varied across provinces (p<.05). Overall consumption increased from 1992 to 2015, with an annual percentage of change of 10.3% and several joinpoints (2000, 2009, and 2012). Methylphenidate represents 89.6% of total drug consumption, followed by lisdexamfetamine at 8%. Analysing drug consumption enables us to estimate the distribution of ADHD patients in Castile-La Mancha. Our data show an increase in the consumption of these drugs as well as differences in drug consumption between provinces, which reflect differences in ADHD management in clinical practice. Copyright © 2016. Publicado por Elsevier España, S.L.U.
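    The annual percent change reported by joinpoint models comes from a log-linear fit within each segment: if log(rate) grows linearly with slope b per year, each year multiplies the rate by exp(b). A minimal sketch with made-up DDD figures (not the study's data):

```python
import numpy as np

def annual_percent_change(years, rates):
    """APC from a log-linear fit: rate ~ exp(b0 + b1*year), so each
    year multiplies the rate by exp(b1)."""
    slope = np.polyfit(years, np.log(rates), 1)[0]
    return 100.0 * (np.exp(slope) - 1.0)

# Made-up DDD/1,000 inhabitants/day series growing 10.3% per year.
years = np.arange(1992, 2016)
rates = 0.5 * 1.103 ** (years - 1992)
apc = annual_percent_change(years, rates)
```

    The fit recovers the simulated 10.3% yearly growth; real joinpoint software estimates a separate slope (and APC) for each segment between breakpoints.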

  14. Testing and referral patterns in the years surrounding the US Preventive Services Task Force recommendation against prostate-specific antigen screening.

    PubMed

    Hutchinson, Ryan; Akhtar, Abdulhadi; Haridas, Justin; Bhat, Deepa; Roehrborn, Claus; Lotan, Yair

    2016-12-15

    Since the US Preventive Services Task Force (USPSTF) recommended against prostate-specific antigen (PSA) screening, there have been conflicting reports regarding the impact on the behavior of providers. This study analyzed real-world data on PSA ordering and referral practices in the years surrounding the recommendation. A whole-institution sample of entered PSA orders and urology referrals was obtained from the electronic medical record. The study was performed at a tertiary referral center with a catchment in the southern United States. PSA examinations were defined as screening when they were ordered by providers with appointments in internal medicine, family medicine, or general internal medicine. Linear and quadratic regression analyses were performed, and joinpoint regression was used to assess for trend inflection points. Between January 2010 and July 2015, there were 275,784 unique ambulatory visits for men. There were 63,722 raw PSA orders, and 54,684 were evaluable. Primary care providers ordered 17,315 PSA tests and 858 urology referrals. The number of PSA tests per ambulatory visit, the number of referrals per ambulatory visit, the age at the time of the urology referral, and the proportion of PSA tests performed outside the recommended age range did not significantly change. The PSA value at the time of referral increased significantly (P = .022). Joinpoint analysis revealed no joinpoints in the analysis of total PSA orders, screening PSA tests, or examinations per 100 visits. In the years surrounding the USPSTF recommendation, PSA behavior did not change significantly. Patients were referred at progressively higher average PSA levels. The implications for prostate cancer outcomes from these trends warrant further research into provider variables associated with actual PSA utilization. Cancer 2016;122:3785-3793. © 2016 American Cancer Society.

  15. Significant social events and increasing use of life-sustaining treatment: trend analysis using extracorporeal membrane oxygenation as an example.

    PubMed

    Chen, Yen-Yuan; Chen, Likwang; Huang, Tien-Shang; Ko, Wen-Je; Chu, Tzong-Shinn; Ni, Yen-Hsuan; Chang, Shan-Chwen

    2014-03-04

    Most studies have examined the outcomes of patients supported by extracorporeal membrane oxygenation as a life-sustaining treatment. It is unclear whether significant social events are associated with the use of life-sustaining treatment. This study aimed to compare the trend of extracorporeal membrane oxygenation use in Taiwan with that in the rest of the world, and to examine the influence of significant social events on the trend of extracorporeal membrane oxygenation use in Taiwan. Data on Taiwan's extracorporeal membrane oxygenation use from 2000 to 2009 were collected from the National Health Insurance Research Dataset. The number of worldwide extracorporeal membrane oxygenation cases was mainly estimated using the Extracorporeal Life Support Registry Report International Summary, July 2012. The trend of Taiwan's crude annual incidence rate of extracorporeal membrane oxygenation use was compared with that of the rest of the world. Each trend of extracorporeal membrane oxygenation use was examined using joinpoint regression. The measurement was the crude annual incidence rate of extracorporeal membrane oxygenation use. Each of Taiwan's crude annual incidence rates was much higher than the worldwide rate in the same year. Both Taiwan's and the worldwide crude annual incidence rates have increased significantly since 2000. Joinpoint regression selected the model of Taiwan's trend with one joinpoint in 2006 as the best-fitting model, implying that the significant social events of 2006 were significantly associated with the change in the trend of extracorporeal membrane oxygenation use after 2006. In addition, significant social events highlighted by the media are more strongly associated with the increase in extracorporeal membrane oxygenation use than full coverage by National Health Insurance. 
Significant social events, such as a well-known person's successful extracorporeal membrane oxygenation use highlighted by the mass media, are associated with the use of

  16. Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree

    PubMed Central

    de los Campos, Gustavo; Naya, Hugo; Gianola, Daniel; Crossa, José; Legarra, Andrés; Manfredi, Eduardo; Weigel, Kent; Cotes, José Miguel

    2009-01-01

    The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available. PMID:19293140
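    The Bayesian LASSO used here places double-exponential priors on marker effects; its frequentist cousin, the ordinary LASSO, applies the same marker-specific shrinkage through an L1 penalty and can be sketched with cyclic coordinate descent. The genotype matrix below is synthetic, not the wheat or mouse data:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent:
    minimize (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Synthetic marker matrix: 200 individuals, 50 markers, 3 true effects.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.standard_normal(n)
beta_hat = lasso_cd(X, y, lam=0.1)
```

    The penalty zeroes out the noise markers and shrinks the three true effects slightly toward zero; the Bayesian version replaces the fixed penalty with a prior and yields full posterior distributions, and the article's model additionally folds in pedigree and non-marker covariates.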

  17. Engine With Regression and Neural Network Approximators Designed

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.

    2001-01-01

    At the NASA Glenn Research Center, the NASA engine performance program (NEPP, ref. 1) and the design optimization testbed COMETBOARDS (ref. 2) with regression and neural network analysis-approximators have been coupled to obtain a preliminary engine design methodology. The solution to a high-bypass-ratio subsonic waverotor-topped turbofan engine, which is shown in the preceding figure, was obtained by the simulation depicted in the following figure. This engine is made of 16 components mounted on two shafts with 21 flow stations. The engine is designed for a flight envelope with 47 operating points. The design optimization utilized both neural network and regression approximations, along with the cascade strategy (ref. 3). The cascade used three algorithms in sequence: the method of feasible directions, the sequence of unconstrained minimizations technique, and sequential quadratic programming. The normalized optimum thrusts obtained by the three methods are shown in the following figure: the cascade algorithm with regression approximation is represented by a triangle, a circle is shown for the neural network solution, and a solid line indicates original NEPP results. The solutions obtained from both approximate methods lie within one standard deviation of the benchmark solution for each operating point. The simulation improved the maximum thrust by 5 percent. The performance of the linear regression and neural network methods as alternate engine analyzers was found to be satisfactory for the analysis and operation optimization of air-breathing propulsion engines (ref. 4).

  18. Simple linear and multivariate regression models.

    PubMed

    Rodríguez del Águila, M M; Benítez-Parejo, N

    2011-01-01

    In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.

  19. Stolon regression

    PubMed Central

    Cherry Vogt, Kimberly S

    2008-01-01

    Many colonial organisms encrust surfaces with feeding and reproductive polyps connected by vascular stolons. Such colonies often show a dichotomy between runner-like forms, with widely spaced polyps and long stolon connections, and sheet-like forms, with closely spaced polyps and short stolon connections. Generative processes, such as rates of polyp initiation relative to rates of stolon elongation, are typically thought to underlie this dichotomy. Regressive processes, such as tissue regression and cell death, may also be relevant. In this context, we have recently characterized the process of stolon regression in a colonial cnidarian, Podocoryna carnea. Stolon regression occurs naturally in these colonies. To characterize this process in detail, high levels of stolon regression were induced in experimental colonies by treatment with reactive oxygen and reactive nitrogen species (ROS and RNS). Either treatment results in stolon regression and is accompanied by high levels of endogenous ROS and RNS as well as morphological indications of cell death in the regressing stolon. The initiating step in regression appears to be a perturbation of normal colony-wide gastrovascular flow. This suggests more general connections between stolon regression and a wide variety of environmental effects. Here we summarize our results and further discuss such connections. PMID:19704785

  20. Subsonic Aircraft With Regression and Neural-Network Approximators Designed

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.

    2004-01-01

    At the NASA Glenn Research Center, NASA Langley Research Center's Flight Optimization System (FLOPS) and the design optimization testbed COMETBOARDS with regression and neural-network-analysis approximators have been coupled to obtain a preliminary aircraft design methodology. For a subsonic aircraft, the optimal design, that is the airframe-engine combination, is obtained by the simulation. The aircraft is powered by two high-bypass-ratio engines with a nominal thrust of about 35,000 lbf. It is to carry 150 passengers at a cruise speed of Mach 0.8 over a range of 3000 n mi and to operate on a 6000-ft runway. The aircraft design utilized a neural network and a regression-approximations-based analysis tool, along with a multioptimizer cascade algorithm that uses sequential linear programming, sequential quadratic programming, the method of feasible directions, and then sequential quadratic programming again. Optimal aircraft weight versus the number of design iterations is shown. The central processing unit (CPU) time to solution is given. It is shown that the regression-method-based analyzer exhibited a smoother convergence pattern than the FLOPS code. The optimum weight obtained by the approximation technique and the FLOPS code differed by 1.3 percent. Prediction by the approximation technique exhibited no error for the aircraft wing area and turbine entry temperature, whereas it was within 2 percent for most other parameters. Cascade strategy was required by FLOPS as well as the approximators. The regression method had a tendency to hug the data points, whereas the neural network exhibited a propensity to follow a mean path. The performance of the neural network and regression methods was considered adequate. It was at about the same level for small, standard, and large models with redundancy ratios (defined as the number of input-output pairs to the number of unknown coefficients) of 14, 28, and 57, respectively. In an SGI octane workstation (Silicon Graphics

  1. Air Leakage of US Homes: Regression Analysis and Improvements from Retrofit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chan, Wanyu R.; Joh, Jeffrey; Sherman, Max H.

    2012-08-01

    LBNL Residential Diagnostics Database (ResDB) contains blower door measurements and other diagnostic test results of homes in the United States. Of these, approximately 134,000 single-family detached homes have sufficient information for the analysis of air leakage in relation to a number of housing characteristics. We performed regression analysis to consider the correlation between normalized leakage and a number of explanatory variables: IECC climate zone, floor area, height, year built, foundation type, duct location, and other characteristics. The regression model explains 68% of the observed variability in normalized leakage. ResDB also contains the before and after retrofit air leakage measurements of approximately 23,000 homes that participated in weatherization assistance programs (WAPs) or residential energy efficiency programs. The two types of programs achieve rather similar reductions in normalized leakage: 30% for WAPs and 20% for other energy programs.

  2. Do clinical and translational science graduate students understand linear regression? Development and early validation of the REGRESS quiz.

    PubMed

    Enders, Felicity

    2013-12-01

    Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 responded postcourse, and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Postcourse students had similar results to practicing statisticians (mean (SD) of 20.1 (3.5); P = 0.21). Students also showed significant improvement pre/postcourse in each of six domain areas (P < 0.001). The REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising, with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.

  3. Regression: A Bibliography.

    ERIC Educational Resources Information Center

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  4. Estimation of soil cation exchange capacity using Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS)

    NASA Astrophysics Data System (ADS)

    Emamgolizadeh, S.; Bateni, S. M.; Shahsavani, D.; Ashrafi, T.; Ghorbani, H.

    2015-10-01

    The soil cation exchange capacity (CEC) is one of the main soil chemical properties, which is required in various fields such as environmental and agricultural engineering as well as soil science. In situ measurement of CEC is time consuming and costly. Hence, numerous studies have used traditional regression-based techniques to estimate CEC from more easily measurable soil parameters (e.g., soil texture, organic matter (OM), and pH). However, these models may not be able to adequately capture the complex and highly nonlinear relationship between CEC and its influential soil variables. In this study, Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS) were employed to estimate CEC from more readily measurable soil physical and chemical variables (e.g., OM, clay, and pH) by developing functional relations. The GEP- and MARS-based functional relations were tested at two field sites in Iran. Results showed that GEP and MARS can provide reliable estimates of CEC. Also, it was found that the MARS model (with root-mean-square-error (RMSE) of 0.318 Cmol+ kg-1 and correlation coefficient (R2) of 0.864) generated slightly better results than the GEP model (with RMSE of 0.270 Cmol+ kg-1 and R2 of 0.807). The performance of the GEP and MARS models was compared with two existing approaches, namely artificial neural network (ANN) and multiple linear regression (MLR). The comparison indicated that MARS and GEP outperformed the MLR model, but they did not perform as well as ANN. Finally, a sensitivity analysis was conducted to determine the most and the least influential variables affecting CEC. It was found that OM and pH have the most and least significant effect on CEC, respectively.

  5. Advanced statistics: linear regression, part I: simple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
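    The method of least squares described above has a closed form in the single-predictor case: the slope is the covariance of x and y divided by the variance of x, and the intercept places the line through the means. A small sketch with made-up clinical-style data:

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit y = b0 + b1*x:
    b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up clinical-style data: systolic blood pressure (mmHg) vs. age.
ages = [30, 40, 50, 60, 70]
sbp = [120, 126, 131, 138, 142]
b0, b1 = simple_linear_regression(ages, sbp)
```

    Here the fitted line is sbp = 103.4 + 0.56 * age, i.e., roughly 0.56 mmHg per year of age in this toy dataset.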

  6. Prediction of dynamical systems by symbolic regression

    NASA Astrophysics Data System (ADS)

    Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.

    2016-07-01

    We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
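    The fast-function-extraction style of symbolic regression is a generalized linear regression over a library of candidate features. For the harmonic-oscillator example, regressing the acceleration on such a library recovers the governing term. A minimal sketch, with an analytic trajectory and an assumed feature set chosen for illustration:

```python
import numpy as np

# Analytic trajectory of a harmonic oscillator x'' = -omega^2 * x.
omega = 2.0
t = np.linspace(0.0, 5.0, 500)
x = np.cos(omega * t)
v = -omega * np.sin(omega * t)       # x'
a = -omega ** 2 * np.cos(omega * t)  # x''

# Assumed candidate-feature library; the regression should select only x.
library = np.column_stack([x, v, x * v, x ** 2, v ** 2])
coeffs, *_ = np.linalg.lstsq(library, a, rcond=None)
# coeffs recovers the governing term a = -omega^2 * x = -4 x
```

    With measured rather than analytic data, the derivatives would be estimated numerically and a sparsity-promoting fit would typically replace plain least squares.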

  7. A study of home deaths in Japan from 1951 to 2002

    PubMed Central

    Yang, Limin; Sakamoto, Naoko; Marui, Eiji

    2006-01-01

    Background Several surveys in Japan have indicated that most terminally ill Japanese patients would prefer to die at home or in a homelike setting. However, there is a great disparity between this stated preference and the reality, since most Japanese die in hospital. We report here national changes in home deaths in Japan over the last 5 decades. Using prefecture data, we also examined the factors in the medical service associated with home death in Japan. Methods Published data on place of death was obtained from the vital statistics compiled by the Ministry of Health, Labor and Welfare of Japan. We analyzed trends of home deaths from 1951 to 2002, and describe the changes in the proportion of home deaths by region, sex, age, and cause of death. Joinpoint regression analysis was used for trend analysis. Logistic regression analysis was performed to identify secular trends in home deaths, and the impact of age, sex, year of deaths and cause of deaths on home death. We also examined the association between home death and medical service factors by multiple regression analysis, using home death rate by prefectures in 2002 as a dependent variable. Results A significant decrease in the percentage of patients dying at home was observed in the results of joinpoint regression analysis. Older patients and males were more likely to die at home. Patients who died from cancer were less likely to die at home. The results of multiple regression analysis indicated that home death was related to the number of beds in hospital, ratio of daily occupied beds in general hospital, the number of families in which the elderly were living alone, and dwelling rooms. Conclusion The pattern of the place of death has not only been determined by social and demographic characteristics of the decedent, but also associated with the medical service in the community. PMID:16524485

  8. Recent changes in the trends of teen birth rates, 1981-2006.

    PubMed

    Wingo, Phyllis A; Smith, Ruben A; Tevendale, Heather D; Ferré, Cynthia

    2011-03-01

    To explore trends in teen birth rates by selected demographics. We used birth certificate data and joinpoint regression to examine trends in teen birth rates by age (10-14, 15-17, and 18-19 years) and race during 1981-2006 and by age and Hispanic origin during 1990-2006. Joinpoint analysis describes changing trends over successive segments of time and uses annual percentage change (APC) to express the amount of increase or decrease within each segment. For teens younger than 18 years, the decline in birth rates began in 1994 and ended in 2003 (APC: -8.03% per year for ages 10-14 years; APC: -5.63% per year for ages 15-17 years). The downward trend for 18- and 19-year-old teens began earlier (1991) and ended 1 year later (2004) (APC: -2.37% per year). For each study population, the trend was approximately level during the most recent time segment, except for continuing declines for 18- and 19-year-old white and Asian/Pacific Islander teens. The only increasing trend in the most recent time segment was for 18- and 19-year-old Hispanic teens. During these declines, the age distribution of teens who gave birth shifted to slightly older ages, and the percentage whose current birth was at least their second birth decreased. Teen birth rates were generally level during 2003/2004-2006 after the long-term declines. Rates increased among older Hispanic teens. These results indicate a need for renewed attention to effective teen pregnancy prevention programs in specific populations. Copyright © 2011. Published by Elsevier Inc.

  9. Evaluating differential effects using regression interactions and regression mixture models

    PubMed Central

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing results to those obtained using an interaction term in linear regression. The research questions each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described, and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and to increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
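    The interaction-term baseline against which regression mixtures are compared is ordinary regression with a product term: when the moderator is observed, the differential effect appears as the coefficient on x·z. (In a regression mixture, class membership would instead be latent.) A simulated sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)                # predictor
z = rng.integers(0, 2, n).astype(float)   # observed moderator (group)
# Differential effect: the slope of x is 0.5 in group 0, 1.3 in group 1.
y = 1.0 + 0.5 * x + 0.3 * z + 0.8 * x * z + 0.1 * rng.standard_normal(n)

# Design matrix with a product (interaction) term: y ~ 1 + x + z + x:z
X = np.column_stack([np.ones(n), x, z, x * z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_x, b_z, b_xz = coef
```

    The fitted b_xz recovers the simulated 0.8 difference in slopes; a regression mixture would instead estimate class-specific regressions without being told which observations belong to which class.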

  10. [Trends of cancer mortality rates in children and adolescents by level of marginalization in Mexico (1990-2009)].

    PubMed

    Escamilla-Santiago, Ricardo Antonio; Narro-Robles, José; Fajardo-Gutiérrez, Arturo; Rascón-Pacheco, Ramón Alberto; López-Cervantes, Malaquías

    2012-01-01

    To determine childhood and adolescent cancer mortality by the level of marginalization in Mexico. We used 1990-2009 death certificates, estimating age-standardized rates. We calculated the Average Annual Percent Change (AAPC) using the Joinpoint Regression program available from the National Cancer Institute to assess trends. Cancer mortality rates increased. AAPCs were 0.87% for male and 0.96% for female children, and for adolescents, 1.22% for males and 0.63% for females. The most common neoplasms in children were leukemia, central nervous system tumors, and lymphomas; in adolescents, leukemia, bone and joint tumors, and lymphomas. The increase in cancer mortality corresponded to the high and very high marginalization areas of each state. The increase in highly marginalized areas may be partly explained by well-documented local registration of deaths. Further studies focusing on survival are required in order to better assess the effectiveness of cancer detection and medical treatment in our country.

  11. CatReg Software for Categorical Regression Analysis (May 2016)

    EPA Science Inventory

    CatReg 3.0 is a Microsoft Windows enhanced version of the Agency’s categorical regression analysis (CatReg) program. CatReg complements EPA’s existing Benchmark Dose Software (BMDS) by greatly enhancing a risk assessor’s ability to determine whether data from separate toxicologic...

  12. Nationwide summary of US Geological Survey regional regression equations for estimating magnitude and frequency of floods for ungaged sites, 1993

    USGS Publications Warehouse

    Jennings, M.E.; Thomas, W.O.; Riggs, H.C.

    1994-01-01

    For many years, the U.S. Geological Survey (USGS) has been involved in the development of regional regression equations for estimating flood magnitude and frequency at ungaged sites. These regression equations are used to transfer flood characteristics from gaged to ungaged sites through the use of watershed and climatic characteristics as explanatory or predictor variables. Generally these equations have been developed on a statewide or metropolitan area basis as part of cooperative study programs with specific State Departments of Transportation or specific cities. The USGS, in cooperation with the Federal Highway Administration and the Federal Emergency Management Agency, has compiled all the current (as of September 1993) statewide and metropolitan area regression equations into a micro-computer program titled the National Flood Frequency Program. This program includes regression equations for estimating flood-peak discharges and techniques for estimating a typical flood hydrograph for a given recurrence interval peak discharge for unregulated rural and urban watersheds. These techniques should be useful to engineers and hydrologists for planning and design applications. This report summarizes the statewide regression equations for rural watersheds in each State, summarizes the applicable metropolitan area or statewide regression equations for urban watersheds, describes the National Flood Frequency Program for making these computations, and provides much of the reference information on the extrapolation variables needed to run the program.

  13. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    PubMed

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
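
    The kind of calculation such a program automates can be sketched for the bivariate correlation case. The following is a minimal illustration using the textbook Fisher z approximation, not G*Power's exact routines; the function name and the example values (rho = 0.3, n = 84) are illustrative only:

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(rho, n, alpha=0.05):
    """Approximate power of the two-sided test of H0: rho = 0,
    using the Fisher z transform (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    ncp = atanh(rho) * sqrt(n - 3)   # noncentrality on the Fisher z scale
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)

# Classic textbook case: detecting rho = 0.3 with n = 84 gives power near 0.80.
power = correlation_power(rho=0.3, n=84)
```

    Increasing n increases power, which is exactly the trade-off a power-analysis program lets the user explore before data collection.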

  14. Factor regression for interpreting genotype-environment interaction in bread-wheat trials.

    PubMed

    Baril, C P

    1992-05-01

    The French INRA wheat (Triticum aestivum L. em Thell.) breeding program is based on multilocation trials to produce high-yielding, adapted lines for a wide range of environments. Differential genotypic responses to variable environmental conditions limit the accuracy of yield estimations. Factor regression was used to partition the genotype-environment (GE) interaction into four biologically interpretable terms. Yield data were analyzed from 34 wheat genotypes grown in four environments using 12 auxiliary agronomic traits as genotypic and environmental covariates. Most of the GE interaction (91%) was explained by the combination of only three traits: 1,000-kernel weight, lodging susceptibility, and spike length. Because these traits are easily measured in breeding programs, the factor regression model can provide a convenient and useful method for predicting yield.

  15. Gifted Identification and the Role of Gifted Education: A Commentary on "Evaluating the Gifted Program of an Urban School District Using a Modified Regression Discontinuity Design"

    ERIC Educational Resources Information Center

    Steenbergen-Hu, Saiying; Olszewski-Kubilius, Paula

    2016-01-01

    The article by Davis, Engberg, Epple, Sieg, and Zimmer (2010) represents one of the recent research efforts from economists in evaluating the impact of gifted programs. It can serve as a worked example of the implementation of the regression discontinuity (RD) design method in gifted education research. In this commentary, we first illustrate the…

  16. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
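
    One of the concepts stressed above, multicollinearity, is commonly quantified with the variance inflation factor (VIF). A rough two-predictor sketch follows; the data values are hypothetical and the helper names are my own:

```python
from statistics import mean

def simple_fit(x, y):
    """Least-squares intercept and slope for y regressed on x."""
    xbar, ybar = mean(x), mean(y)
    b = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / \
        sum((a - xbar) ** 2 for a in x)
    return ybar - b * xbar, b

def vif(x1, x2):
    """Variance inflation factor for x1 in a model that also contains x2:
    VIF = 1 / (1 - R^2), where R^2 comes from regressing x1 on x2."""
    a, b = simple_fit(x2, x1)
    fitted = [a + b * v for v in x2]
    x1bar = mean(x1)
    ss_res = sum((u - f) ** 2 for u, f in zip(x1, fitted))
    ss_tot = sum((u - x1bar) ** 2 for u in x1)
    return ss_tot / ss_res

x1 = [1.0, 2.0, 3.0, 4.0]
vif_collinear = vif(x1, [2.1, 3.9, 6.0, 8.1])    # x2 nearly proportional to x1
vif_orthogonal = vif(x1, [-1.0, 1.0, 1.0, -1.0])  # x2 uncorrelated with x1
```

    A VIF near 1 indicates an independent predictor; large values flag the kind of collinearity that makes multivariate coefficients unstable and clinically misleading.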

  17. Mortality from cystic fibrosis in Europe: 1994-2010.

    PubMed

    Quintana-Gallego, Esther; Ruiz-Ramos, Miguel; Delgado-Pecellin, Isabel; Calero, Carmen; Soriano, Joan B; Lopez-Campos, Jose Luis

    2016-02-01

    To date, available mortality trends due to cystic fibrosis (CF) have been limited to the analysis of certain countries in different parts of the world showing that mortality trends have been constantly decreasing. However, no studies have examined Europe as a whole. The present study aims to analyze CF mortality trends by gender within the European Union (EU) and to quantify potential years of life lost (PYLL). Deaths from the 27 EU countries were obtained from the statistical office of the EU from the years 1994-2010. Crude and age-standardized mortality rates (ASR) were estimated for women and men using the standard European population, expressed in deaths per 1,000,000 persons. The PYLL from ages 0 up to 30 years were estimated. Trends were studied by a joinpoint regression analysis. During the study period, 5,130 deaths (2,443 in males and 2,687 in females) were identified. Females had a slightly higher mortality rate than males, with a downward trend observed for both genders. In males, the ASR changed from 1.34 in 1994 to 1.03 in 2010. In females, the ASR changed from 1.42 in 1994 to 0.92 in 2010. The mean age at death and PYLL increased for both genders. The joinpoint analysis did not identify any significant joinpoint for either gender for ASR or PYLL. Our data suggest a continued downward trend of CF mortality throughout the EU, with differences by country and gender. © 2015 Wiley Periodicals, Inc.
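
    The joinpoint idea behind such trend analyses can be sketched as a grid search for a single breakpoint in a piecewise-linear fit. This is a minimal illustration on synthetic rates; production tools such as the NCI Joinpoint software additionally use permutation tests to decide how many joinpoints are statistically significant:

```python
def fit_ls(X, y):
    """Solve the normal equations X'X b = X'y by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            for c in range(i, p + 1):
                A[r][c] -= f * A[i][c]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

def one_joinpoint(t, y):
    """Grid-search a single joinpoint k in y = b0 + b1*t + b2*max(0, t - k)."""
    best_sse, best_k, best_b = float("inf"), None, None
    for cand in t[1:-1]:
        X = [[1.0, ti, max(0.0, ti - cand)] for ti in t]
        coef = fit_ls(X, y)
        sse = sum((yi - (coef[0] + coef[1] * ti + coef[2] * max(0.0, ti - cand))) ** 2
                  for ti, yi in zip(t, y))
        if sse < best_sse:
            best_sse, best_k, best_b = sse, cand, coef
    return best_k, best_b

# Synthetic rates: declining by 1/year until year 10, then rising by 0.5/year.
years = list(range(20))
rates = [20.0 - ti if ti <= 10 else 10.0 + 0.5 * (ti - 10) for ti in years]
k, b = one_joinpoint(years, rates)
```

    For these data the search recovers the break at year 10, with slope -1 before and -1 + 1.5 = 0.5 after it.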

  18. Asthma disease management: regression to the mean or better?

    PubMed

    Tinkelman, David; Wilson, Steve

    2004-12-01

    To assess the effectiveness of disease management as an adjunct to treatment for chronic illnesses, such as asthma, and to evaluate whether the statistical phenomenon of regression to the mean is responsible for many of the benefits commonly attributed to disease management. This study evaluated an asthma disease management intervention in a Colorado population covered by Medicaid. The outcomes are presented with the intervention group serving as its own control (baseline and postintervention measurements) and are compared with a matched control group during the same periods. In the intervention group, 388 asthmatics entered and 258 completed the 6-month program; 446 subjects participated in the control group. Facilities charges were compared for both groups during the baseline and program periods. Both groups were well matched demographically and for costs at baseline. Using the intervention group as its own control revealed a 49.1% savings. The control group savings were 30.7%. Therefore, the net savings were 18.4% (P < .001) for the intervention group vs controls. Although the demonstrated savings were less using a control group to correct for regression to the mean, they were statistically significant and clinically relevant. When using a control group to control for the statistical effects of regression to the mean, a disease management intervention for asthma in a population covered by Medicaid is effective in reducing healthcare costs.
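
    The phenomenon the authors control for can be demonstrated with a small simulation using entirely synthetic numbers (not the study's data): selecting subjects on a noisy baseline measure guarantees an apparent "saving" at follow-up even with no intervention at all.

```python
import random

random.seed(42)

# Observed cost = stable per-subject cost + independent period-specific noise.
true_cost = [random.gauss(1000, 300) for _ in range(5000)]
baseline = [t + random.gauss(0, 400) for t in true_cost]
follow_up = [t + random.gauss(0, 400) for t in true_cost]

# "Enroll" the 10% of subjects with the highest baseline costs; do nothing else.
cutoff = sorted(baseline)[-500]
enrolled = [i for i, bl in enumerate(baseline) if bl >= cutoff]

mean_base = sum(baseline[i] for i in enrolled) / len(enrolled)
mean_follow = sum(follow_up[i] for i in enrolled) / len(enrolled)
apparent_saving_pct = 100 * (mean_base - mean_follow) / mean_base
```

    The enrolled group's mean cost falls substantially at follow-up purely because extreme baseline values regress toward the mean, which is why a matched control group is needed to isolate the true intervention effect.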

  19. Functional mixture regression.

    PubMed

    Yao, Fang; Fu, Yuejiao; Lee, Thomas C M

    2011-04-01

    In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.

  20. Canadian firearms legislation and effects on homicide 1974 to 2008.

    PubMed

    Langmann, Caillin

    2012-08-01

    Canada has implemented legislation covering all firearms since 1977 and presents a model to examine incremental firearms control. The effect of legislation on homicide by firearm and the subcategory, spousal homicide, is controversial and has not been well studied to date. Legislative effects on homicide and spousal homicide were analyzed using data obtained from Statistics Canada from 1974 to 2008. Three statistical methods were applied to search for any associated effects of firearms legislation: interrupted time series regression, ARIMA, and joinpoint analysis. No significant beneficial associations between firearms legislation and homicide or spousal homicide rates were found after the passage of three Acts by the Canadian Parliament--Bill C-51 (1977), C-17 (1991), and C-68 (1995)--nor after the implementation of licensing in 2001 and the registration of rifles and shotguns in 2003. After the passage of C-68, a decrease in the rate of the decline of homicide by firearm was found by interrupted regression. Joinpoint analysis also found an increasing trend in the homicide-by-firearm rate after the enactment of the licensing portion of C-68. Other factors found to be associated with homicide rates were median age, unemployment, immigration rates, percentage of population in the low-income bracket, the Gini index of income inequality, population per police officer, and incarceration rate. This study failed to demonstrate a beneficial association between legislation and firearm homicide rates between 1974 and 2008.

  1. The Prekindergarten Age-Cutoff Regression-Discontinuity Design: Methodological Issues and Implications for Application

    ERIC Educational Resources Information Center

    Lipsey, Mark W.; Weiland, Christina; Yoshikawa, Hirokazu; Wilson, Sandra Jo; Hofer, Kerry G.

    2015-01-01

    Much of the currently available evidence on the causal effects of public prekindergarten programs on school readiness outcomes comes from studies that use a regression-discontinuity design (RDD) with the age cutoff to enter a program in a given year as the basis for assignment to treatment and control conditions. Because the RDD has high internal…

  2. [Regression on order statistics and its application in estimating nondetects for food exposure assessment].

    PubMed

    Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang

    2009-01-01

    To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment, ROS was applied to a cadmium residue data set from global food contaminant monitoring; the mean residue was estimated using SAS programming and compared with the results from substitution methods. The results show that the ROS method clearly outperforms substitution methods, being both robust and convenient for subsequent analysis. Regression on order statistics is worth adopting, but more effort should be devoted to the details of applying the method.
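
    A bare-bones version of ROS under a lognormal assumption can be sketched as follows. The residue values are hypothetical, and this is only an illustration of the mechanics; production analyses should rely on vetted implementations (e.g., the NADA package in R):

```python
from math import exp, log
from statistics import NormalDist, mean

def ros_impute(detects, n_nondetects):
    """Regression on order statistics: regress log(detected values) on
    normal quantiles of plotting positions, then impute the censored
    (below-detection-limit) observations from the fitted line."""
    n = len(detects) + n_nondetects
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    xs = quantiles[n_nondetects:]            # detects occupy the upper ranks
    ys = sorted(log(d) for d in detects)
    xbar, ybar = mean(xs), mean(ys)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    intercept = ybar - slope * xbar
    return [exp(intercept + slope * q) for q in quantiles[:n_nondetects]]

detects = [0.8, 1.1, 1.5, 2.3, 3.9]          # hypothetical residue values
imputed = ros_impute(detects, n_nondetects=3)
ros_mean = mean(imputed + detects)
```

    Unlike substituting a constant (such as half the detection limit) for every nondetect, ROS spreads the imputed values out along the fitted distribution, which is what makes the resulting mean estimate more robust.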

  3. [Cancer mortality trends in Mexico, 1980-2011].

    PubMed

    Torres-Sánchez, Luisa E; Rojas-Martínez, Rosalba; Escamilla-Núñez, Consuelo; de la Vara-Salazar, Elvia; Lazcano-Ponce, Eduardo

    2014-01-01

    To evaluate trends in cancer mortality in Mexico between 1980 and 2011. Using the direct method with the World Population 2010 as the standard, mortality rates adjusted for age and sex were calculated for all cancers and for the 15 most frequent sites. Trends in mortality rates and the annual percentage change for each type of cancer were estimated by a joinpoint regression model. As a result of reductions in mortality from cancers of the lung (-3.2% in men and -1.8% in women), stomach (-2.1% in men and -2.4% in women), and cervix (-4.7%), a significant (~1% per year) decline in overall cancer mortality was observed from 2004 onward, at all ages and in the 35-64-year age group of both sexes. Other cancers, such as breast and ovarian cancer in women and prostate cancer in men, showed a steady increase. Some of the reductions in cancer mortality may be partially attributed to the effectiveness of prevention programs. However, adequate population-based cancer registries are needed to assess the real impact of these programs, as well as to design and evaluate innovative interventions for more cost-effective prevention policies.
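
    The annual percentage change (APC) reported by such joinpoint models comes from a log-linear fit within each trend segment: log(rate) = a + b*year, with APC = 100*(e^b - 1). A minimal sketch on synthetic rates that decline by a steady 3% per year:

```python
from math import exp, log
from statistics import mean

def annual_percent_change(years, rates):
    """APC from a log-linear least-squares fit of log(rate) on year."""
    ys = [log(r) for r in rates]
    xbar, ybar = mean(years), mean(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(years, ys)) / \
        sum((x - xbar) ** 2 for x in years)
    return 100 * (exp(b) - 1)

years = list(range(2000, 2012))
rates = [50 * 0.97 ** (y - 2000) for y in years]   # steady 3% annual decline
apc = annual_percent_change(years, rates)          # recovers -3.0
```

    A joinpoint model simply stitches several such segments together, reporting a separate APC for each.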

  4. Parametric regression model for survival data: Weibull regression model as an example

    PubMed Central

    2016-01-01

    The Weibull regression model is one of the most popular parametric regression models, in that it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature compared with the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with the R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and the event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variables. The eha package provides an alternative way to fit the Weibull regression model, and its check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way to develop a model. Visualizing the fitted Weibull regression model provides another way to report findings. PMID:28149846
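
    The coefficient conversions that such packages automate follow from the Weibull model's dual accelerated-failure-time (AFT) and proportional-hazards parameterizations. Assuming the usual AFT parameterization (an assumption of this sketch, with a hypothetical coefficient value), the arithmetic is:

```python
from math import exp

def weibull_convert(beta, shape):
    """Convert a Weibull AFT coefficient to a hazard ratio (HR) and an
    event time ratio (ETR), assuming the standard AFT parameterization:
    HR = exp(-shape * beta), ETR = exp(beta)."""
    return exp(-shape * beta), exp(beta)

# Hypothetical coefficient for a covariate that lengthens survival times.
hr, etr = weibull_convert(beta=0.5, shape=2.0)
```

    A positive AFT coefficient (longer event times, ETR > 1) maps to a hazard ratio below 1, which is the clinically familiar scale.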

  5. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  6. Creep-Rupture Data Analysis - Engineering Application of Regression Techniques. Ph.D. Thesis - North Carolina State Univ.

    NASA Technical Reports Server (NTRS)

    Rummler, D. R.

    1976-01-01

    The results are presented of investigations to apply regression techniques to the development of methodology for creep-rupture data analysis. Regression analysis techniques are applied to the explicit description of the creep behavior of materials for space shuttle thermal protection systems. A regression analysis technique is compared with five parametric methods for analyzing three simulated and twenty real data sets, and a computer program for the evaluation of creep-rupture data is presented.

  7. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    ERIC Educational Resources Information Center

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem-free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…

  8. Whole-genome regression and prediction methods applied to plant and animal breeding.

    PubMed

    de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L

    2013-02-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
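
    One workhorse for such large-p, small-n regressions is ridge regression (an L2-penalized fit closely related to genomic BLUP, in which the penalty is set from variance components). A toy sketch with more "markers" than observations, using invented numbers:

```python
def ridge(X, y, lam):
    """Ridge regression: solve (X'X + lam*I) b = X'y by Gaussian elimination.
    With p > n, X'X alone is singular, but the penalized system is not."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            for c in range(i, p + 1):
                A[r][c] -= f * A[i][c]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

# p = 5 hypothetical markers, n = 4 phenotype records.
X = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0], [1, 1, 0, 1, 0], [0, 0, 1, 1, 1]]
y = [2.0, 1.0, 3.0, 1.5]
b_small = ridge(X, y, 0.1)     # light shrinkage
b_big = ridge(X, y, 100.0)     # heavy shrinkage toward zero
```

    Increasing the penalty shrinks all marker effects toward zero, trading bias for the variance reduction that makes prediction possible when markers vastly outnumber records.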

  9. NCCS Regression Test Harness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tharrington, Arnold N.

    2015-09-09

    The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.

  10. Advantages of the net benefit regression framework for economic evaluations of interventions in the workplace: a case study of the cost-effectiveness of a collaborative mental health care program for people receiving short-term disability benefits for psychiatric disorders.

    PubMed

    Hoch, Jeffrey S; Dewa, Carolyn S

    2014-04-01

    Economic evaluations commonly accompany trials of new treatments or interventions; however, regression methods and their corresponding advantages for the analysis of cost-effectiveness data are not well known. To illustrate regression-based economic evaluation, we present a case study investigating the cost-effectiveness of a collaborative mental health care program for people receiving short-term disability benefits for psychiatric disorders. We implement net benefit regression to illustrate its strengths and limitations. Net benefit regression offers a simple option for cost-effectiveness analyses of person-level data. By placing economic evaluation in a regression framework, regression-based techniques can facilitate the analysis and provide simple solutions to commonly encountered challenges. Economic evaluations of person-level data (eg, from a clinical trial) should use net benefit regression to facilitate analysis and enhance results.

  11. MIXOR: a computer program for mixed-effects ordinal regression analysis.

    PubMed

    Hedeker, D; Gibbons, R D

    1996-03-01

    MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.

  12. Understanding Poisson regression.

    PubMed

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
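
    The core of the method can be sketched as a one-predictor Poisson regression fitted by Fisher scoring (iteratively reweighted least squares) on synthetic data; overdispersion checks and negative binomial fits are left to full statistical libraries:

```python
from math import exp

def poisson_irls(x, y, iters=50):
    """Fit log(mu) = b0 + b1*x by Fisher scoring (IRLS) for Poisson data."""
    b0 = b1 = 0.0
    for _ in range(iters):
        mu = [exp(b0 + b1 * xi) for xi in x]
        s0 = sum(yi - mi for yi, mi in zip(y, mu))                 # score, b0
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))   # score, b1
        i00 = sum(mu)                                # Fisher information terms
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det            # 2x2 Newton update
        b1 += (-i01 * s0 + i00 * s1) / det
    return b0, b1

# Synthetic counts whose means follow log(mu) = 0.5 + 0.3*x exactly.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ys = [exp(0.5 + 0.3 * xi) for xi in xs]
b0, b1 = poisson_irls(xs, ys)
```

    Exponentiating a fitted coefficient gives a rate ratio, the natural effect measure for count outcomes.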

  13. Prediction of Cancer Incidence and Mortality in Korea, 2018.

    PubMed

    Jung, Kyu-Won; Won, Young-Joo; Kong, Hyun-Joo; Lee, Eun Sook

    2018-04-01

    This study aimed to report on cancer incidence and mortality for the year 2018 to estimate Korea's current cancer burden. Cancer incidence data from 1999 to 2015 were obtained from the Korea National Cancer Incidence Database, and cancer mortality data from 1993 to 2016 were acquired from Statistics Korea. Cancer incidence and mortality were projected by fitting a linear regression model to observed age-specific cancer rates against observed years and then multiplying the projected age-specific rates by the age-specific population. The Joinpoint regression model was used to determine the year in which the linear trend changed significantly; only data from the most recent trend segment were used. A total of 204,909 new cancer cases and 82,155 cancer deaths are expected to occur in Korea in 2018. The most common cancer sites were lung, followed by stomach, colorectal, breast, and liver. These five cancers represent half of the overall burden of cancer in Korea. For mortality, the most common sites were lung, followed by liver, colorectal, stomach, and pancreas. The incidence rates of all cancers combined in Korea are estimated to decrease gradually, mainly owing to a decrease in thyroid cancer. These up-to-date estimates of the cancer burden in Korea could be an important resource for planning and evaluating cancer-control programs.
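
    The projection step described above (extrapolate a linear trend in each age-specific rate, then multiply by the projected population) can be sketched with hypothetical numbers:

```python
from statistics import mean

def project_rate(years, rates, target_year):
    """Extrapolate an age-specific rate with a least-squares linear trend."""
    xbar, ybar = mean(years), mean(rates)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(years, rates)) / \
        sum((x - xbar) ** 2 for x in years)
    return (ybar - b * xbar) + b * target_year

# Hypothetical age-specific incidence per 100,000, rising 2 points per year.
obs_years = list(range(1999, 2016))
obs_rates = [110 + 2 * (y - 1999) for y in obs_years]
rate_2018 = project_rate(obs_years, obs_rates, 2018)
expected_cases = rate_2018 * 3_200_000 / 100_000   # hypothetical population
```

    Restricting the fit to the most recent joinpoint segment, as the study does, keeps an old trend from biasing the extrapolation.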

  14. Cervical Cancer Incidence in Young U.S. Females After Human Papillomavirus Vaccine Introduction.

    PubMed

    Guo, Fangjian; Cofie, Leslie E; Berenson, Abbey B

    2018-05-30

    Since 2006, the human papillomavirus vaccine has been recommended for young females in the U.S. This study aimed to compare cervical cancer incidence among young women before and after the human papillomavirus vaccine was introduced. This cross-sectional study used data from the National Program for Cancer Registries and Surveillance, Epidemiology, and End Results Incidence-U.S. Cancer Statistics 2001-2014 database for U.S. females aged 15-34 years. This study compared the 4-year average annual incidence of invasive cervical cancer in the 4 years before the human papillomavirus vaccine was introduced (2003-2006) and the 4 most recent years in the vaccine era (2011-2014). Joinpoint regression models of cervical cancer incidence from 2001 to 2014 were fitted to identify the discrete joints (years) that represent statistically significant changes in the direction of the trend after the introduction of human papillomavirus vaccination in 2006. Data, collected in 2001-2014, were released and analyzed in 2017. The 4-year average annual incidence rates for cervical cancer in 2011-2014 were 29% lower than those in 2003-2006 (6.0 vs 8.4 per 1,000,000 people, rate ratio=0.71, 95% CI=0.64, 0.80) among females aged 15-24 years, and 13.0% lower among females aged 25-34 years. Joinpoint analyses of cervical cancer incidence among females aged 15-24 years revealed a significant joint in 2009 for both squamous cell carcinoma and non-squamous cell carcinoma. Among females aged 25-34 years, there was no significant decrease in cervical cancer incidence after 2006. A significant decrease in the incidence of cervical cancer among young females after the introduction of the human papillomavirus vaccine may indicate early effects of human papillomavirus vaccination. Copyright © 2018 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  15. Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

    PubMed Central

    de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228

  16. Multiple regression technique for Pth degree polynomials with and without linear cross products

    NASA Technical Reports Server (NTRS)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated so that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products; these programs evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered, and the standard deviation, the F statistic, the maximum absolute percent error, and the mean absolute percent error are evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique, showing the output formats and typical plots comparing computer results to each set of input data.
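
    The two cases the report treats (with and without linear cross products) can be sketched by generating the polynomial design matrix either way and solving the normal equations. The data below are hypothetical, chosen so the surface has a genuine interaction term:

```python
from itertools import combinations_with_replacement

def design_row(x, degree, cross=True):
    """Polynomial terms of x up to `degree`; cross=False keeps pure powers only."""
    row = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            if not cross and len(set(combo)) > 1:
                continue                    # skip cross products such as x1*x2
            term = 1.0
            for i in combo:
                term *= x[i]
            row.append(term)
    return row

def fit_ls(X, y):
    """Solve the normal equations X'X b = X'y by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            for c in range(i, p + 1):
                A[r][c] -= f * A[i][c]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

# Surface with an interaction: y = 2 + 3*x1 - x2 + 0.5*x1*x2, sampled on a grid.
pts = [(float(a), float(c)) for a in range(3) for c in range(3)]
ys = [2 + 3 * a - c + 0.5 * a * c for a, c in pts]
X = [design_row(pt, 2, cross=True) for pt in pts]
b = fit_ls(X, ys)                          # terms: 1, x1, x2, x1^2, x1*x2, x2^2
pred = sum(bc * t for bc, t in zip(b, design_row((1.5, 2.0), 2, cross=True)))
```

    Fitting with cross=False would force the interaction coefficient out of the model, which is exactly the comparison the two programs in the report support.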

  17. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  18. REGRES: A FORTRAN-77 program to calculate nonparametric and ``structural'' parametric solutions to bivariate regression equations

    NASA Astrophysics Data System (ADS)

    Rock, N. M. S.; Duffy, T. R.

    REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e., neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits and Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x), including: (1) major axis (λ = 1); (2) reduced major axis (λ = variance of y/variance of x); (3) Y on X (λ = infinity); or (4) X on Y (λ = 0) solutions. Pearson linear correlation coefficients are also output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.
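
    The nonparametric slope described above (the median of all pairwise slopes, i.e., the Theil-Sen estimator) can be sketched in a few lines, here on hypothetical data containing one gross outlier:

```python
from statistics import median

def theil_sen(x, y):
    """Slope = median of all pairwise slopes; intercept = median of y - slope*x.
    Robust to outliers that would wreck an ordinary least-squares fit."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x))
              if x[j] != x[i]]
    b = median(slopes)
    a = median([yi - b * xi for xi, yi in zip(x, y)])
    return a, b

xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.1, 4.0, 6.2, 8.1, 9.9, 12.0, 40.0]   # last point is a gross outlier
intercept, slope = theil_sen(xs, ys)
```

    Despite the outlier, the estimated slope stays near the true value of 2, which is the robustness property that motivates nonparametric regression in isochron work.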

  19. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
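
    The link between the correlation coefficient and the regression slope mentioned above can be shown in a few lines; the measurement values are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    xbar, ybar = mean(x), mean(y)
    num = sum((a - xbar) * (c - ybar) for a, c in zip(x, y))
    den = sqrt(sum((a - xbar) ** 2 for a in x) * sum((c - ybar) ** 2 for c in y))
    return num / den

def ls_slope(x, y):
    """Least-squares slope of y regressed on x."""
    xbar, ybar = mean(x), mean(y)
    return sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / \
        sum((a - xbar) ** 2 for a in x)

xs = [2.0, 4.0, 5.0, 7.0, 9.0]      # e.g., hypothetical incubation times
ys = [1.1, 2.0, 2.7, 3.2, 4.4]      # e.g., hypothetical log colony counts
r = pearson_r(xs, ys)
slope = ls_slope(xs, ys)
# Identity linking the two: slope = r * (s_y / s_x); r**2 is the fraction
# of the variance of y explained by the regression (the ANOVA connection).
slope_from_r = r * stdev(ys) / stdev(xs)
```

    This identity is the stepping stone to the multivariate extensions the chapter points toward.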

  20. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider the situation of estimating a Cox regression in which some covariates are subject to missingness, and there exists additional information (including observed event time, censoring indicator, and fully observed covariates) that may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to nonparametrically impute covariate values for the missing observation. Upon completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of the Cox regression, and with the predictive mean matching (PMM) imputation method. We show that all approaches can reduce bias due to a non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either of the two working models and to mis-specification of their link functions. In contrast, the PMM method is sensitive to mis-specification of the covariates included in imputation, and the AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.

  1. Recent lung cancer mortality trends in Europe: effect of national smoke-free legislation strengthening.

    PubMed

    López-Campos, Jose L; Ruiz-Ramos, Miguel; Fernandez, Esteve; Soriano, Joan B

    2018-07-01

    The impact of smoke-free legislation within European Union (EU) countries on lung cancer mortality has not been evaluated to date. We aimed to determine lung cancer mortality trends in the EU-27 by sex, age, and calendar year for the period 1994-2012, and to relate them to changes in tobacco legislation at the national level. Deaths reported to Eurostat by each European country were analyzed, focusing on ICD-10 codes C33 and C34 for the years 1994 to 2012. Age-standardized mortality rates (ASR) were estimated separately for women and men in the EU-27 overall and within each country for each of the years studied, and the significance of changing trends was estimated by joinpoint regression analysis, exploring lag times after the initiation of smoke-free legislation in every country, if any. From 1994 to 2012, there were 4 681 877 deaths from lung cancer in Europe (3 491 607 in men and 1 190 180 in women). A nearly linear decrease in lung cancer mortality rates in men was observed from 1994 to 2012, mirrored in women by an upward trend, narrowing the sex gap from 5.1 in 1994 to 2.8 in 2012. Joinpoint regression analysis identified a number of trend changes over time, but they appear unrelated to the implementation of smoke-free legislation. A few years after the introduction of smoke-free legislation across Europe, lung cancer mortality trends have not changed.

  2. Oral cavity cancer trends over the past 25 years in Hong Kong: a multidirectional statistical analysis.

    PubMed

    Ushida, Keisuke; McGrath, Colman P; Lo, Edward C M; Zwahlen, Roger A

    2015-07-24

    Even though oral cavity cancer (OCC; ICD-10 codes C01, C02, C03, C04, C05, and C06) ranks eleventh among the world's most common cancers, accounting for approximately 2% of all cancers, a trend analysis of OCC in Hong Kong is lacking. Hong Kong has experienced rapid economic growth with socio-cultural and environmental change after the Second World War. This, together with the data collected in the cancer registry, provides interesting ground for an epidemiological study of the influence of socio-cultural and environmental factors on OCC etiology. A multidirectional statistical analysis of the OCC trends over the past 25 years was performed using the databases of the Hong Kong Cancer Registry. Age-period-cohort (APC) modeling was applied to determine age, period, and cohort effects on OCC development. Joinpoint regression analysis was used to find secular trend changes in both age-standardized and age-specific incidence rates. The APC model detected that OCC development in men was mainly dominated by the age effect, whereas in women an increasing linear period effect together with an age effect became evident. The joinpoint regression analysis showed a general downward trend of age-standardized incidence rates of OCC for men during the entire investigated period, whereas women demonstrated a significant upward trend from 2001 onwards. The results suggest that OCC incidence in Hong Kong appears to be associated with cumulative risk behaviors of the population, despite considerable socio-cultural and environmental changes after the Second World War.

  3. Breast Cancer Trend in Iran from 2000 to 2009 and Prediction till 2020 using a Trend Analysis Method.

    PubMed

    Zahmatkesh, Bibihajar; Keramat, Afsaneh; Alavi, Nasrinossadat; Khosravi, Ahmad; Kousha, Ahmad; Motlagh, Ali Ghanbari; Darman, Mahboobeh; Partovipour, Elham; Chaman, Reza

    2016-01-01

    Breast cancer is the most common cancer in women worldwide, with a rising incidence rate in most countries. Considering the increase in life expectancy and change in lifestyle of Iranian women, this study investigated the age-adjusted trend of breast cancer incidence during 2000-2009 and predicted its incidence to 2020. The 1997 and 2006 census results were used for the projection of female population by age through the cohort-component method over the studied years. Data from the Iranian cancer registration system were used to calculate the annual incidence rate of breast cancer. The age-adjusted incidence rate was then calculated using the WHO standard population distribution. The five-year age-specific incidence rates were also obtained for each year, and future incidence was determined using the trend analysis method. Annual percentage change (APC) was calculated through the joinpoint regression method. The bias-adjusted incidence rate of breast cancer increased from 16.7 per 100,000 women in 2000 to 33.6 per 100,000 women in 2009. The incidence of breast cancer had a growing trend in almost all age groups above 30 years over the studied years. In this period, the age groups of 45-65 years had the highest incidence. Investigation into the joinpoint curve showed that the curve had a steep slope with an APC of 23.4% before the first joinpoint, but became milder after this. From 2005 to 2009, the APC was calculated as 2.7%, through which the incidence of breast cancer in 2020 was predicted as 63.0 per 100,000 women. The age-adjusted incidence rate of breast cancer continues to increase in Iranian women. It is predicted that this trend will continue until 2020. Therefore, it seems necessary to prioritize the prevention, control, and care of breast cancer in Iran.
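The annual percentage change reported within each joinpoint segment comes from a log-linear fit of rate on calendar year. A minimal sketch of that calculation (hypothetical function name, synthetic rates; real joinpoint software additionally searches for the change points between segments):

```python
import math

def annual_percent_change(years, rates):
    """APC (%) from an OLS fit of log(rate) on calendar year:
    APC = 100 * (exp(slope) - 1)."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    mx, my = sum(years) / n, sum(logs) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(years, logs))
             / sum((x - mx) ** 2 for x in years))
    return 100.0 * (math.exp(slope) - 1.0)

# Synthetic rates growing 2.7% per year, as in the 2005-2009 segment
years = list(range(2005, 2010))
rates = [30.0 * 1.027 ** (y - 2005) for y in years]
print(round(annual_percent_change(years, rates), 1))  # → 2.7
```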

  4. Site-specific estimation of peak-streamflow frequency using generalized least-squares regression for natural basins in Texas

    USGS Publications Warehouse

    Asquith, William H.; Slade, R.M.

    1999-01-01

    The U.S. Geological Survey, in cooperation with the Texas Department of Transportation, has developed a computer program to estimate peak-streamflow frequency for ungaged sites in natural basins in Texas. Peak-streamflow frequency refers to the peak streamflows for recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Peak-streamflow frequency estimates are needed by planners, managers, and design engineers for flood-plain management; for objective assessment of flood risk; for cost-effective design of roads and bridges; and for the design of culverts, dams, levees, and other flood-control structures. The program estimates peak-streamflow frequency using a site-specific approach and multivariate generalized least-squares linear regression. A site-specific approach differs from a traditional regional regression approach by developing unique equations to estimate peak-streamflow frequency specifically for the ungaged site. The stations included in the regression are selected using an informal cluster analysis that compares the basin characteristics of the ungaged site to the basin characteristics of all the stations in the database. The program provides several choices for selecting the stations. Selecting the stations using cluster analysis ensures that the stations included in the regression will have the most pertinent information about flooding characteristics of the ungaged site and therefore provide the basis for potentially improved peak-streamflow frequency estimation. An evaluation of the site-specific approach in estimating peak-streamflow frequency for gaged sites indicates that the site-specific approach is at least as accurate as a traditional regional regression approach.

  5. Financial Aid and First-Year Collegiate GPA: A Regression Discontinuity Approach

    ERIC Educational Resources Information Center

    Curs, Bradley R.; Harper, Casandra E.

    2012-01-01

    Using a regression discontinuity design, we investigate whether a merit-based financial aid program has a causal effect on the first-year grade point average of first-time out-of-state freshmen at the University of Oregon. Our results indicate that merit-based financial aid has a positive and significant effect on first-year collegiate grade point…

  6. Support Vector Machine algorithm for regression and classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Chenggang; Zavaljevski, Nela

    2001-08-01

    The software is an implementation of the Support Vector Machine (SVM) algorithm that was invented and developed by Vladimir Vapnik and his co-workers at AT&T Bell Laboratories. The specific implementation reported here is an Active Set method for solving the quadratic optimization problem that forms the major part of any SVM program. The implementation is tuned to the specific constraints generated in SVM learning and is therefore more efficient than general-purpose quadratic optimization programs. A decomposition method has been implemented in the software that enables processing of large data sets; the size of the learning data is limited virtually only by the capacity of the computer's physical memory. The software is flexible and extensible. Two upper bounds are implemented to regulate SVM learning for classification, which allow users to adjust the false positive and false negative rates. The software can be used either as a standalone, general-purpose SVM regression or classification program, or be embedded into a larger software system.

  7. Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure.

    PubMed

    Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C

    2018-06-29

    A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.

  8. Orthogonal Regression: A Teaching Perspective

    ERIC Educational Resources Information Center

    Carr, James R.

    2012-01-01

    A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
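For two variables, the major-axis slope described above has a closed form in the centered sums of squares. A small sketch (hypothetical function name, synthetic points):

```python
import math

def major_axis_slope(xs, ys):
    """Slope of the major axis: the line minimizing the sum of squared
    orthogonal (perpendicular) distances from the data points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Points lying exactly on y = 2x recover slope 2.0
print(major_axis_slope([0, 1, 2, 3], [0, 2, 4, 6]))  # → 2.0
```

Reduced major axis regression instead uses sign(sxy) * sqrt(syy / sxx), the geometric mean of the y-on-x OLS slope and the inverse of the x-on-y OLS slope.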

  9. Marginalized zero-inflated negative binomial regression with application to dental caries

    PubMed Central

    Preisser, John S.; Das, Kalyan; Long, D. Leann; Divaris, Kimon

    2015-01-01

    The zero-inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non-susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well-suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero-inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the finite sample performance of MZINB is compared to marginalized zero-inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied in the evaluation of a school-based fluoride mouthrinse program on dental caries in 677 children. PMID:26568034

  10. Determination of riverbank erosion probability using Locally Weighted Logistic Regression

    NASA Astrophysics Data System (ADS)

    Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos

    2015-04-01

    erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.

  11. Logic regression and its extensions.

    PubMed

    Schwender, Holger; Ruczinski, Ingo

    2010-01-01

    Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
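The core search, finding Boolean combinations of binary predictors that best explain the outcome, can be illustrated with a brute-force toy over predictor pairs. This is only a sketch: actual logic regression searches over full logic trees with simulated annealing inside a generalized linear model, and the names here are hypothetical.

```python
from itertools import combinations

def best_logic_pair(X, y):
    """Score every AND/OR combination of two binary predictors by
    misclassification count and return the best (toy logic-regression search)."""
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        for name, op in (("AND", lambda a, b: a and b),
                         ("OR", lambda a, b: a or b)):
            pred = [int(op(row[i], row[j])) for row in X]
            errors = sum(p != t for p, t in zip(pred, y))
            if best is None or errors < best[0]:
                best = (errors, f"x{i} {name} x{j}")
    return best

# The outcome is exactly x0 AND x1; x2 is noise
X = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
y = [0, 0, 0, 1]
print(best_logic_pair(X, y))  # → (0, 'x0 AND x1')
```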

  12. An operational GLS model for hydrologic regression

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1989-01-01

    Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. ?? 1989.

  13. Organic matter variations in transgressive and regressive shales

    USGS Publications Warehouse

    Pasley, M.A.; Gregory, W.A.; Hart, G.F.

    1991-01-01

    Organic matter in the Upper Cretaceous Mancos Shale adjacent to the Tocito Sandstone in the San Juan Basin of New Mexico was characterized using organic petrology and organic geochemistry. Differences in the organic matter found in these regressive and transgressive offshore marine sediments have been documented and assessed within a sequence stratigraphic framework. The regressive Lower Mancos Shale below the Tocito Sandstone contains abundant well preserved phytoclasts and correspondingly low hydrogen indices. Total organic carbon values for the regressive shale are low. Sediments from the transgressive systems tract (Tocito Sandstone and overlying Upper Mancos Shale) contain less terrestrially derived organic matter, more amorphous non-structured protistoclasts, higher hydrogen indices and more total organic carbon. Advanced stages of degradation are characteristic of the phytoclasts found in the transgressive shale. Amorphous material in the transgressive shale fluoresces strongly while that found in the regressive shale is typically non-fluorescent. Data from pyrolysis-gas chromatography confirm these observations. These differences are apparently related to the contrasting depositional styles that were active on the shelf during regression and subsequent transgression. It is suggested that data from organic petrology and organic geochemistry provide greater resolution in sedimentologic and stratigraphic interpretations, particularly when working with basinward, fine-grained sediments. Petroleum source potential for the regressive Lower Mancos Shale below the Tocito Sandstone is poor. Based on abundant fluorescent amorphous material, high hydrogen indices, and high total organic carbon, the transgressive Upper Mancos Shale above the Tocito Sandstone possesses excellent source potential. 
This suggests that appreciable source potential can be found in offshore, fine-grained sediments of the transgressive systems tract below the condensed section and associated

  14. Population-wide folic acid fortification and preterm birth: testing the folate depletion hypothesis.

    PubMed

    Naimi, Ashley I; Auger, Nathalie

    2015-04-01

    We assess whether population-wide folic acid fortification policies were followed by a reduction of preterm and early-term birth rates in Québec among women with short and optimal interpregnancy intervals. We extracted birth certificate data for 1.3 million births between 1981 and 2010 to compute age-adjusted preterm and early-term birth rates stratified by short and optimal interpregnancy intervals. We used Joinpoint regression to detect changes in the preterm and early-term birth rates and to assess whether these changes coincide with the implementation of population-wide folic acid fortification. A change in the preterm birth rate occurred in 2000 among women with short (95% confidence interval [CI] = 1994, 2005) and optimal (95% CI = 1995, 2008) interpregnancy intervals. Changes in early-term birth rates did not coincide with the implementation of folic acid fortification. Our results do not indicate a link between folic acid fortification and early-term birth but suggest an improvement in preterm birth rates after implementation of a nationwide folic acid fortification program.
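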

  15. Characteristics and effectiveness of diabetes self-management educational programs targeted to racial/ethnic minority groups: a systematic review, meta-analysis and meta-regression.

    PubMed

    Ricci-Cabello, Ignacio; Ruiz-Pérez, Isabel; Rojas-García, Antonio; Pastor, Guadalupe; Rodríguez-Barranco, Miguel; Gonçalves, Daniela C

    2014-07-19

    It is not clear to what extent educational programs aimed at promoting diabetes self-management in ethnic minority groups are effective. The aim of this work was to systematically review the effectiveness of educational programs to promote the self-management of racial/ethnic minority groups with type 2 diabetes, and to identify programs' characteristics associated with greater success. We undertook a systematic literature review. Specific searches were designed and implemented for Medline, EMBASE, CINAHL, ISI Web of Knowledge, Scirus, Current Contents and nine additional sources (from inception to October 2012). We included experimental and quasi-experimental studies assessing the impact of educational programs targeted to racial/ethnic minority groups with type 2 diabetes. We only included interventions conducted in countries members of the OECD. Two reviewers independently screened citations. Structured forms were used to extract information on intervention characteristics, effectiveness, and cost-effectiveness. When possible, we conducted random-effects meta-analyses using standardized mean differences to obtain aggregate estimates of effect size with 95% confidence intervals. Two reviewers independently extracted all the information and critically appraised the studies. We identified thirty-seven studies reporting on thirty-nine educational programs. Most of them were conducted in the US, with African American or Latino participants. Most programs obtained some benefits over standard care in improving diabetes knowledge, self-management behaviors and clinical outcomes. A meta-analysis of 20 randomized controlled trials (3,094 patients) indicated that the programs produced a reduction in glycated hemoglobin of -0.31% (95% CI -0.48% to -0.14%). Diabetes knowledge and self-management measures were too heterogeneous to pool. 
Meta-regressions showed larger reduction in glycated hemoglobin in individual and face to face delivered interventions, as well as in those

  16. Assessing the Generalizability of Estimates of Causal Effects from Regression Discontinuity Designs

    ERIC Educational Resources Information Center

    Bloom, Howard S.; Porter, Kristin E.

    2012-01-01

    In recent years, the regression discontinuity design (RDD) has gained widespread recognition as a quasi-experimental method that when used correctly, can produce internally valid estimates of causal effects of a treatment, a program or an intervention (hereafter referred to as treatment effects). In an RDD study, subjects or groups of subjects…

  17. Variable Selection for Regression Models of Percentile Flows

    NASA Astrophysics Data System (ADS)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high
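Of the selection methods compared, correlation analysis is the simplest to sketch: rank candidate basin characteristics by their absolute Pearson correlation with the percentile flow. A toy version follows (hypothetical column names and synthetic values, not the study's data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_predictors(candidates, y):
    """Rank candidate predictor columns by |r| with the response, best first."""
    return sorted(candidates, key=lambda name: -abs(pearson_r(candidates[name], y)))

flows = [1.0, 2.0, 3.0, 4.0]                  # synthetic percentile flows
candidates = {"geology_index": [1, 2, 3, 4],  # perfectly correlated
              "mean_elev": [3, 1, 4, 1]}      # weakly correlated
print(rank_predictors(candidates, flows))  # → ['geology_index', 'mean_elev']
```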

  18. Regression in autistic spectrum disorders.

    PubMed

    Stefanatos, Gerry A

    2008-12-01

    A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously acquired skills. This may involve a loss of speech or social responsivity, but often entails both. This paper critically reviews the phenomenon of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed, and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.

  19. Linear regression in astronomy. II

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  20. Retargeted Least Squares Regression Algorithm.

    PubMed

    Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin

    2015-09-01

    This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to learn the regression targets directly from the data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large-margin constraint for the requirement of correct classification of each data point. Compared with traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, so there is no need to train multiple independent two-class (binary) machines. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure comprising regression and retargeting as substeps. Experimental evaluation over a range of databases demonstrates the validity of our method.

  1. Regression-assisted deconvolution.

    PubMed

    McIntyre, Julie; Stefanski, Leonard A

    2011-06-30

    We present a semi-parametric deconvolution estimator for the density function of a random variable X that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length. Copyright © 2011 John Wiley & Sons, Ltd.

  2. Bayesian isotonic density regression

    PubMed Central

    Wang, Lianming; Dunson, David B.

    2011-01-01

    Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259

  3. Regression dilution bias: tools for correction methods and sample size calculation.

    PubMed

    Berglund, Lars

    2012-08-01

    Random errors in the measurement of a risk factor introduce downward bias in the estimated association with a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies, with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is continuous. We also describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model, assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. We also supply programs for estimating the number of individuals needed in the reliability study and for choosing its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies; this may cause important effects of risk factors with large measurement errors to be neglected.
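The slope correction involved is often written as beta = b_obs / lambda, with the reliability ratio lambda estimated from duplicate measurements of the risk factor in the reliability study. A minimal sketch of the simple-linear-regression case only (hypothetical function names, not the article's software):

```python
from statistics import mean, pvariance

def reliability_ratio(first, second):
    """lambda = 1 - error variance / observed variance, with the error
    variance estimated from duplicate measurements of the risk factor."""
    err_var = mean((a - b) ** 2 for a, b in zip(first, second)) / 2.0
    obs_var = pvariance(list(first) + list(second))
    return 1.0 - err_var / obs_var

def corrected_slope(b_obs, lam):
    """Undo regression dilution: the corrected slope is b_obs / lambda."""
    return b_obs / lam

# Identical duplicates imply no measurement error, so lambda = 1
print(reliability_ratio([1, 2, 3, 4], [1, 2, 3, 4]))  # → 1.0
# An observed slope of 0.8 with lambda = 0.5 corrects to 1.6
print(corrected_slope(0.8, 0.5))  # → 1.6
```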

  4. Progressive and Regressive Aspects of Information Technology in Society: A Third Sector Perspective

    ERIC Educational Resources Information Center

    Miller, Kandace R.

    2009-01-01

    This dissertation explores the impact of information technology on progressive and regressive values in society from the perspective of one international foundation and four of its technology-related programs. Through a critical interpretive approach employing an instrumental multiple-case method, a framework to help explain the influence of…

  5. Efficiency Analysis: Enhancing the Statistical and Evaluative Power of the Regression-Discontinuity Design.

    ERIC Educational Resources Information Center

    Madhere, Serge

    An analytic procedure, efficiency analysis, is proposed for improving the utility of quantitative program evaluation for decision making. The three features of the procedure are explained: (1) for statistical control, it adopts and extends the regression-discontinuity design; (2) for statistical inferences, it de-emphasizes hypothesis testing in…

  6. Fungible weights in logistic regression.

    PubMed

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  7. Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk

    PubMed Central

    Czarnota, Jenna; Gennings, Chris; Wheeler, David C

    2015-01-01

    In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
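    As a rough sketch of the index construction described above (not the authors' estimation procedure, which fits the weights by constrained optimization), the following synthetic example scores each chemical into quartiles, forms a weighted sum with assumed weights, and regresses a simulated outcome on the resulting index. All data and weights here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 5
X = rng.lognormal(size=(n, p))             # hypothetical chemical concentrations

# Score each chemical into quartiles 0-3
cuts = np.quantile(X, [0.25, 0.5, 0.75], axis=0)
q = np.column_stack([np.searchsorted(cuts[:, j], X[:, j]) for j in range(p)])

# Non-negative weights summing to 1 (in WQS these are estimated, not fixed)
w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
wqs_index = q @ w                          # body-burden index per subject

# The index then enters an ordinary regression on the health outcome
y = 0.4 * wqs_index + rng.normal(0.0, 1.0, n)
slope = np.polyfit(wqs_index, y, 1)[0]
```

    The fitted slope on the index recovers the simulated association, while the weights indicate which chemicals drive it.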

  8. Assessment of weighted quantile sum regression for modeling chemical mixtures and cancer risk.

    PubMed

    Czarnota, Jenna; Gennings, Chris; Wheeler, David C

    2015-01-01

    In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome.

  9. Enhanced ID Pit Sizing Using Multivariate Regression Algorithm

    NASA Astrophysics Data System (ADS)

    Krzywosz, Kenji

    2007-03-01

    EPRI is funding a program to enhance and improve the reliability of inside diameter (ID) pit sizing for balance-of-plant heat exchangers, such as condensers and component cooling water heat exchangers. More traditional approaches to ID pit sizing involve the use of frequency-specific amplitudes or phase angles. The enhanced multivariate regression algorithm for ID pit depth sizing incorporates three simultaneous input parameters of frequency, amplitude, and phase angle. A set of calibration data sets consisting of machined pits of various rounded and elongated shapes and depths was acquired in the frequency range of 100 kHz to 1 MHz for stainless steel tubing having a nominal wall thickness of 0.028 inch. To add noise to the acquired data set, each test sample was rotated and test data acquired at the 3, 6, 9, and 12 o'clock positions. The ID pit depths were estimated using second-order and fourth-order regression functions by relying on normalized amplitude and phase angle information from multiple frequencies. Due to the unique damage morphology associated with the microbiologically-influenced ID pits, it was necessary to modify the elongated calibration standard-based algorithms by relying on the algorithm developed solely from the destructive sectioning results. This paper presents the use of a transformed multivariate regression algorithm to estimate ID pit depths and compare the results with the traditional univariate phase angle analysis. Both estimates were then compared with the destructive sectioning results.

  10. Increasing incidence of thyroid cancer in the Nordic countries with main focus on Swedish data.

    PubMed

    Carlberg, Michael; Hedendahl, Lena; Ahonen, Mikko; Koppel, Tarmo; Hardell, Lennart

    2016-07-07

    Radiofrequency radiation in the frequency range 30 kHz-300 GHz was evaluated to be Group 2B, i.e. 'possibly' carcinogenic to humans, by the International Agency for Research on Cancer (IARC) at WHO in May 2011. Among the evaluated devices were mobile and cordless phones, since they emit radiofrequency electromagnetic fields (RF-EMF). In addition to the brain, another organ, the thyroid gland, also receives high exposure. The incidence of thyroid cancer is increasing in many countries, especially the papillary type, which is the most radiosensitive. We used the Swedish Cancer Register to study the incidence of thyroid cancer during 1970-2013 using joinpoint regression analysis. In women, the incidence increased statistically significantly during the whole study period; average annual percentage change (AAPC) +1.19 % (95 % confidence interval (CI) +0.56, +1.83 %). Two joinpoints were detected, 1979 and 2001, with a high increase of the incidence during the last period 2001-2013 with an annual percentage change (APC) of +5.34 % (95 % CI +3.93, +6.77 %). AAPC for all men during 1970-2013 was +0.77 % (95 % CI -0.03, +1.58 %). One joinpoint was detected in 2005 with a statistically significant increase in incidence during 2005-2013; APC +7.56 % (95 % CI +3.34, +11.96 %). Based on NORDCAN data, there was a statistically significant increase in the incidence of thyroid cancer in the Nordic countries during the same time period. In both women and men a joinpoint was detected in 2006. The incidence increased during 2006-2013 in women; APC +6.16 % (95 % CI +3.94, +8.42 %) and in men; APC +6.84 % (95 % CI +3.69, +10.08 %), thus showing similar results as the Swedish Cancer Register. Analyses based on data from the Cancer Register showed that the increasing trend in Sweden was mainly caused by thyroid cancer of the papillary type. We postulate that the whole increase cannot be attributed to better diagnostic procedures. Increasing exposure to ionizing

  11. Principal component regression analysis with SPSS.

    PubMed

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces the indices used in diagnosing multicollinearity, the basic principle of principal component regression, and a method for determining the 'best' equation. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0, including all calculating processes of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance of multicollinearity. Carrying out principal component regression in SPSS makes the analysis simpler, faster, and accurate.
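    The mechanics of principal component regression are independent of SPSS and can be sketched directly. The following synthetic example (hypothetical data) regresses the response on the leading principal component of two collinear predictors, discards the near-degenerate component, and maps the coefficient back to the standardized predictors.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Two strongly collinear predictors
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

# Standardize, then take principal components of the predictors
Z = (X - X.mean(0)) / X.std(0)
eigval, eigvec = np.linalg.eigh(np.cov(Z.T))
order = np.argsort(eigval)[::-1]
eigvec = eigvec[:, order]

# Regress y on the first component only, discarding the near-zero one
pc1 = Z @ eigvec[:, 0]
gamma = np.polyfit(pc1, y - y.mean(), 1)[0]

# Map back to coefficients on the standardized predictors
beta_std = eigvec[:, 0] * gamma
```

    Ordinary least squares on the raw collinear columns would give unstable, inflated coefficients; the principal component route yields stable coefficients near 1 for both standardized predictors.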

  12. Logistic regression applied to natural hazards: rare event logistic regression with replications

    NASA Astrophysics Data System (ADS)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, termed rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.

  13. Trends in esophageal cancer mortality in China during 1987-2009: age, period and birth cohort analyzes.

    PubMed

    Guo, Pi; Li, Ke

    2012-04-01

    Esophageal cancer is one of the most commonly diagnosed malignant tumors in China. The aim of this study was to provide representative and comprehensive information about the long-term mortality trends of this disease in China between 1987 and 2009, using joinpoint regression and generalized additive models (GAMs). Age-standardized mortality rates (ASMR), overall and truncated (35-64 years), were calculated using the direct calculation method, and joinpoint regression was performed to obtain the estimated annual percentage changes (EAPC). GAMs were fitted to study the effects of age, period and birth cohort on mortality trends. ASMR exhibited a marked overall decline for rural females (EAPC=-2.3 95%CI: -3.3, -1.2), urban males (EAPC=-1.8 95%CI: -2.6, -1.0) and urban females (EAPC=-3.7 95%CI: -4.9, -2.4), but the small decline observed for rural males was not statistically significant (EAPC=-0.9 95%CI: -2.0, 0.3). The declines in ASMR were more noticeable for urban residents in recent years. Among all the residents, the age effect showed a progressively increasing trend, whereas the cohort effect declined steadily after the year corresponding to the maximum risk value. The period effect seemed to remain substantially unchanged throughout the years. Although variations in mortality rates were observed according to sex and area, overall decreasing trends in esophageal cancer mortality were found in most Chinese people, aside from rural males. The findings could correspond to the changes in age- and cohort-related factors in the population. Further study is required to understand these potential factors. Copyright © 2011 Elsevier Ltd. All rights reserved.
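    On a single segment, the (estimated) annual percentage change reported by joinpoint analyses such as this one comes from a log-linear fit of rate against calendar year. A minimal sketch with synthetic rates (all numbers hypothetical, chosen to decline 2.3% per year):

```python
import numpy as np

# Synthetic age-standardized mortality rates declining 2.3% per year,
# mimicking one joinpoint segment
years = np.arange(1987, 2010)
rates = 20.0 * 0.977 ** (years - 1987)

# On one segment, joinpoint software fits log(rate) = a + b*year and
# reports the annual percent change as 100 * (exp(b) - 1)
b = np.polyfit(years, np.log(rates), 1)[0]
apc = 100.0 * (np.exp(b) - 1.0)
```

    For this exactly log-linear series the recovered APC is -2.3%, matching the built-in decline.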

  14. The impact of OSHA recordkeeping regulation changes on occupational injury and illness trends in the US: a time-series analysis.

    PubMed

    Friedman, Lee S; Forst, Linda

    2007-07-01

    The Survey of Occupational Injuries and Illnesses (SOII), based on Occupational Safety and Health Administration (OSHA) logs, indicates that the number of occupational injuries and illnesses in the US has steadily declined by 35.8% between 1992-2003. However, major changes to the OSHA recordkeeping standard occurred in 1995 and 2001. The authors assessed the relation between changes in OSHA recordkeeping regulations and the trend in occupational injuries and illnesses. SOII data available from the Bureau of Labor Statistics for years 1992-2003 were collected. The authors assessed time series data using join-point regression models. Before the first major recordkeeping change in 1995, injuries and illnesses declined annually by 0.5%. In the period 1995-2000 the slope declined by 3.1% annually (95% CI -3.7% to -2.5%), followed by another more precipitous decline occurring in 2001-2003 (-8.3%; 95% CI -10.0% to -6.6%). When stratifying the data, the authors continued to observe significant changes occurring in 1995 and 2001. The substantial declines in the number of injuries and illnesses correspond directly with changes in OSHA recordkeeping rules. Changes in employment, productivity, OSHA enforcement activity and sampling error do not explain the large decline. Based on the baseline slope (join-point regression analysis, 1992-4), the authors expected a decline of 407 964 injuries and illnesses during the period of follow-up if no intervention occurred; they actually observed a decline of 2.4 million injuries and illnesses of which 2 million or 83% of the decline can be attributed to the change in the OSHA recordkeeping rules.

  15. Quantifying trends in disease impact to produce a consistent and reproducible definition of an emerging infectious disease.

    PubMed

    Funk, Sebastian; Bogich, Tiffany L; Jones, Kate E; Kilpatrick, A Marm; Daszak, Peter

    2013-01-01

    The proper allocation of public health resources for research and control requires quantification of both a disease's current burden and the trend in its impact. Infectious diseases that have been labeled as "emerging infectious diseases" (EIDs) have received heightened scientific and public attention and resources. However, the label 'emerging' is rarely backed by quantitative analysis and is often used subjectively. This can lead to over-allocation of resources to diseases that are incorrectly labelled "emerging," and insufficient allocation of resources to diseases for which evidence of an increasing or high sustained impact is strong. We suggest a simple quantitative approach, segmented regression, to characterize the trends and emergence of diseases. Segmented regression identifies one or more trends in a time series and determines the most statistically parsimonious split(s) (or joinpoints) in the time series. These joinpoints in the time series indicate time points when a change in trend occurred and may identify periods in which drivers of disease impact change. We illustrate the method by analyzing temporal patterns in incidence data for twelve diseases. This approach provides a way to classify a disease as currently emerging, re-emerging, receding, or stable based on temporal trends, as well as to pinpoint the time when the change in these trends happened. We argue that quantitative approaches to defining emergence based on the trend in impact of a disease can, with appropriate context, be used to prioritize resources for research and control. Implementing this more rigorous definition of an EID will require buy-in and enforcement from scientists, policy makers, peer reviewers and journal editors, but has the potential to improve resource allocation for global health.
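    The grid-search idea behind a single-joinpoint segmented regression can be sketched as follows, using a synthetic incidence series (hypothetical counts): each candidate year splits the series into two separate linear fits, and the split with the smallest combined squared error is the estimated joinpoint.

```python
import numpy as np

# Synthetic yearly counts: stable through 2004, then a rising trend
years = np.arange(1995, 2016)
cases = np.where(years <= 2004, 100.0, 100.0 + 12.0 * (years - 2003))

def sse_with_break(t, x, y):
    """Total squared error of two separate linear fits, split after year t."""
    total = 0.0
    for mask in (x <= t, x > t):
        coef = np.polyfit(x[mask], y[mask], 1)
        total += np.sum((y[mask] - np.polyval(coef, x[mask])) ** 2)
    return total

# Grid-search the single joinpoint minimizing the combined SSE
candidates = years[2:-3]          # keep a few points in each segment
best = min(candidates, key=lambda t: sse_with_break(t, years, cases))
```

    Here the search recovers 2004 as the joinpoint, i.e. the year the disease would be classified as starting to emerge. Production tools such as the NCI Joinpoint software add permutation tests to decide how many joinpoints are statistically justified.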

  16. Understanding logistic regression analysis.

    PubMed

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
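    A minimal illustration of the interpretation described above, on synthetic data: a logistic regression is fitted by Newton-Raphson, and the exponentiated coefficient of a binary exposure is read as an odds ratio (the true odds ratio is set to 2 here; all numbers are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
exposure = rng.binomial(1, 0.5, n)       # binary explanatory variable
true_log_or = np.log(2.0)                # exposure doubles the odds
logit = -1.0 + true_log_or * exposure
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Newton-Raphson fit of logistic regression (intercept + exposure)
X = np.column_stack([np.ones(n), exposure])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)                 # score vector
    hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information
    beta += np.linalg.solve(hess, grad)

odds_ratio = np.exp(beta[1])   # the exponentiated coefficient is the odds ratio
```

    The fitted odds ratio is close to 2; with several covariates, each exponentiated coefficient is an odds ratio adjusted for the others, which is the confounding-control advantage the abstract mentions.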

  17. Prediction of Cancer Incidence and Mortality in Korea, 2018

    PubMed Central

    Jung, Kyu-Won; Won, Young-Joo; Kong, Hyun-Joo; Lee, Eun Sook

    2018-01-01

    Purpose This study aimed to report on cancer incidence and mortality for the year 2018 to estimate Korea’s current cancer burden. Materials and Methods Cancer incidence data from 1999 to 2015 were obtained from the Korea National Cancer Incidence Database, and cancer mortality data from 1993 to 2016 were acquired from Statistics Korea. Cancer incidence and mortality were projected by fitting a linear regression model to observed age-specific cancer rates against observed years, then multiplying the projected age-specific rates by the age-specific population. The Joinpoint regression model was used to determine the year at which the linear trend changed significantly; only data from the latest trend were used. Results A total of 204,909 new cancer cases and 82,155 cancer deaths are expected to occur in Korea in 2018. The most common cancer sites were lung, followed by stomach, colorectal, breast and liver. These five cancers represent half of the overall burden of cancer in Korea. For mortality, the most common sites were lung, followed by liver, colorectal, stomach and pancreas. Conclusion The incidence rate of all cancers in Korea is estimated to decrease gradually, mainly due to the decrease in thyroid cancer. These up-to-date estimates of the cancer burden in Korea could be an important resource for planning and evaluation of cancer-control programs. PMID:29566480

  18. Sclerotic Regressing Large Congenital Nevus.

    PubMed

    Patsatsi, Aikaterini; Kokolios, Miltiadis; Pikou, Olga; Lambropoulos, Vasilios; Efstratiou, Ioannis; Sotiriadis, Dimitrios

    2016-11-01

    Regression of congenital nevi is usually associated with loss of pigment or halo formation. In rare cases, regression is characterized by sclerosis and hair loss. We describe a rare case of a sclerotic hypopigmented large congenital melanocytic nevus in which a localized scleroderma-like reaction process of regression seemed to have started in utero and progressed throughout early childhood. © 2016 Wiley Periodicals, Inc.

  19. Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors.

    PubMed

    Woodard, Dawn B; Crainiceanu, Ciprian; Ruppert, David

    2013-01-01

    We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials.

  20. Parental report of the early development of children with regressive autism: the delays-plus-regression phenotype.

    PubMed

    Ozonoff, Sally; Williams, Brenda J; Landa, Rebecca

    2005-12-01

    Most children with autism demonstrate developmental abnormalities in their first year, whereas others display regression after mostly normal development. Few studies have examined the early development of the latter group. This study developed a retrospective measure, the Early Development Questionnaire (EDQ), to collect specific, parent-reported information about development in the first 18 months. Based on their EDQ scores, 60 children with autism between the ages of 3 and 9 were divided into three groups: an early onset group (n = 29), a definite regression group (n = 23), and a heterogeneous mixed group (n = 8). Significant differences in early social development were found between the early onset and regression groups. However, over 50 percent of the children who experienced a regression demonstrated some early social deficits during the first year of life, long before regression and the apparent onset of autism. This group, tentatively labeled 'delays-plus-regression', deserves further study.

  1. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    PubMed

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
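    One way to realize the recipe above in code (a hedged sketch: the numeric inputs and the centering of the two groups around the overall response probability are illustrative assumptions, not the paper's exact algorithm): form two groups whose log-odds differ by the slope times twice the covariate's standard deviation, then apply the standard two-proportion sample-size formula.

```python
import numpy as np

slope = 0.35          # hypothesised log-odds change per unit of the covariate
sd_x = 1.2            # standard deviation of the covariate
p_bar = 0.3           # overall response probability
z_a, z_b = 1.959964, 0.841621   # normal quantiles for alpha=0.05 (2-sided), power=0.8

# Equivalent two-sample problem: groups whose log-odds differ by
# slope * 2 * sd_x, centred here on the overall log-odds
delta = slope * 2 * sd_x
logit_bar = np.log(p_bar / (1 - p_bar))
p1 = 1 / (1 + np.exp(-(logit_bar - delta / 2)))
p2 = 1 / (1 + np.exp(-(logit_bar + delta / 2)))

# Standard two-proportion sample size per group
n_per_group = ((z_a * np.sqrt(2 * p_bar * (1 - p_bar))
                + z_b * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
               / (p1 - p2) ** 2)
```

    The total sample size for the regression problem is then read off the equivalent two-sample comparison, which is why the usual power formulas become applicable.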

  2. CART (Classification and Regression Trees) Program: The Implementation of the CART Program and Its Application to Estimating Attrition Rates.

    DTIC Science & Technology

    1985-12-01

    consists of the node t and all descendants of t in T. (3) Definition 3. Pruning a branch Tt from a tree T consists of deleting from T all...The default is 1.0 so that actually, this keyword did not need to appear in the above file. (5) DELETE . This keyword does not appear in our example, but...when it is used associated with some variable names, it indicates that we want to delete these variables from the regression. If this keyword is

  3. Regression Discontinuity Designs in Epidemiology

    PubMed Central

    Moscoe, Ellen; Mutevedzi, Portia; Newell, Marie-Louise; Bärnighausen, Till

    2014-01-01

    When patients receive an intervention based on whether they score below or above some threshold value on a continuously measured random variable, the intervention will be randomly assigned for patients close to the threshold. The regression discontinuity design exploits this fact to estimate causal treatment effects. In spite of its recent proliferation in economics, the regression discontinuity design has not been widely adopted in epidemiology. We describe regression discontinuity, its implementation, and the assumptions required for causal inference. We show that regression discontinuity is generalizable to the survival and nonlinear models that are mainstays of epidemiologic analysis. We then present an application of regression discontinuity to the much-debated epidemiologic question of when to start HIV patients on antiretroviral therapy. Using data from a large South African cohort (2007–2011), we estimate the causal effect of early versus deferred treatment eligibility on mortality. Patients whose first CD4 count was just below the 200 cells/μL CD4 count threshold had a 35% lower hazard of death (hazard ratio = 0.65 [95% confidence interval = 0.45–0.94]) than patients presenting with CD4 counts just above the threshold. We close by discussing the strengths and limitations of regression discontinuity designs for epidemiology. PMID:25061922

  4. Post-processing through linear regression

    NASA Astrophysics Data System (ADS)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
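    The basic OLS post-processing scheme compared above can be sketched with synthetic forecasts (all numbers hypothetical): observations are regressed on the raw, biased and damped forecasts, and the fitted line is used to correct them.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
obs = rng.normal(15.0, 5.0, n)                      # verifying observations
fcst = 0.6 * obs + 8.0 + rng.normal(0.0, 2.0, n)    # biased, damped raw forecast

# OLS post-processing: regress observations on forecasts,
# then use the fitted line to correct future forecasts
a, b = np.polyfit(fcst, obs, 1)
corrected = a * fcst + b

rmse_raw = np.sqrt(np.mean((fcst - obs) ** 2))
rmse_corrected = np.sqrt(np.mean((corrected - obs) ** 2))
```

    The regression slope exceeds 1, re-inflating the damped forecast, and the corrected forecasts have lower RMSE than the raw ones; the other schemes in the paper (TDTR, GM, EVMOS) differ in how this line is estimated, not in this basic mechanism.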

  5. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics, generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process that allows different polynomial regression models to be activated smoothly or abruptly. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  6. Procedures for adjusting regional regression models of urban-runoff quality using local data

    USGS Publications Warehouse

    Hoos, A.B.; Sisolak, J.K.

    1993-01-01

Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P); regression against P (termed MAP-R-P); regression against P and additional local variables (termed MAP-R-P+nV); and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set.
As expected, predictive accuracy of all MAPs for
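
The single-factor procedure described above (regressing observed local values on the regional model's prediction P) can be sketched in a few lines. The calibration data below are hypothetical, purely to illustrate the mechanics:

```python
# Sketch of a single-factor model-adjustment procedure (MAP-1F-P style):
# regress locally observed storm loads on the regional model's predictions P,
# then use the fitted line as the "adjusted" model for unmonitored sites.

def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Hypothetical calibration set: regional predictions vs. observed local loads
P = [2.0, 3.5, 5.0, 6.5, 8.0]
observed = [2.6, 4.2, 5.9, 7.5, 9.2]

a, b = ols_fit(P, observed)          # slope ~ 1.1, intercept ~ 0.38 here
adjusted = lambda p: a + b * p       # adjusted prediction for a new site
```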

  7. A Comparative Study of Classification and Regression Algorithms for Modelling Students' Academic Performance

    ERIC Educational Resources Information Center

    Strecht, Pedro; Cruz, Luís; Soares, Carlos; Mendes-Moreira, João; Abreu, Rui

    2015-01-01

    Predicting the success or failure of a student in a course or program is a problem that has recently been addressed using data mining techniques. In this paper we evaluate some of the most popular classification and regression algorithms on this problem. We address two problems: prediction of approval/failure and prediction of grade. The former is…

  8. Heterogeneity in drug abuse among juvenile offenders: is mixture regression more informative than standard regression?

    PubMed

    Montgomery, Katherine L; Vaughn, Michael G; Thompson, Sanna J; Howard, Matthew O

    2013-11-01

Research on juvenile offenders has largely treated this population as a homogeneous group. However, recent findings suggest that this at-risk population may be considerably more heterogeneous than previously believed. This study compared mixture regression analyses with standard regression techniques in an effort to explain how known factors such as distress, trauma, and personality are associated with drug abuse among juvenile offenders. Researchers recruited 728 juvenile offenders from Missouri juvenile correctional facilities for participation in this study. Researchers investigated past-year substance use in relation to the following variables: demographic characteristics (gender, ethnicity, age, familial use of public assistance), antisocial behavior, and mental illness symptoms (psychopathic traits, psychiatric distress, and prior trauma). Results indicated that both standard and mixture regression approaches identified significant variables related to past-year substance use among this population; however, the mixture regression methods provided greater specificity in results. Mixture regression analytic methods may help policy makers and practitioners better understand and intervene with the substance-related subgroups of juvenile offenders.

  9. Semiparametric regression during 2003–2007*

    PubMed Central

    Ruppert, David; Wand, M.P.; Carroll, Raymond J.

    2010-01-01

    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800

  10. Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use?

    NASA Astrophysics Data System (ADS)

    Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun

    2014-12-01

    Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.

  11. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  12. Fine and Gray competing risk regression model to study the cause-specific under-five child mortality in Bangladesh.

    PubMed

    Mohammad, Khandoker Akib; Fatima-Tuz-Zahura, Most; Bari, Wasimul

    2017-01-28

The cause-specific under-five mortality of Bangladesh has been studied by fitting the cumulative incidence function (CIF) based Fine and Gray competing risk regression model (1999). For the purpose of analysis, the Bangladesh Demographic and Health Survey (BDHS) 2011 data set was used. Three modes of mortality for under-five children are considered: disease, non-disease, and other causes. Product-Limit survival probabilities for under-five child mortality, together with the log-rank test, were used to select a set of covariates for the regression model. Only the covariates found to have a significant association in the bivariate analysis were considered in the regression analysis. The potential determinant of under-five child mortality due to disease is the size of the child at birth; the gender of the child, NGO (non-government organization) membership of the mother, the mother's education level, and the size of the child at birth are determinants of mortality due to non-disease causes; and the age of the mother at birth, NGO membership of the mother, and the mother's education level are determinants of mortality due to other causes. Female participation in education programs needs to be increased to improve child health, and the government should arrange family and social awareness programs as well as health-related programs for women so that they are aware of their children's health.

  13. Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.

    PubMed

    Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo

    2015-08-01

    Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.

  14. An Investigation of Sleep Characteristics, EEG Abnormalities and Epilepsy in Developmentally Regressed and Non-Regressed Children with Autism

    ERIC Educational Resources Information Center

    Giannotti, Flavia; Cortesi, Flavia; Cerquiglini, Antonella; Miraglia, Daniela; Vagnoni, Cristina; Sebastiani, Teresa; Bernabei, Paola

    2008-01-01

    This study investigated sleep of children with autism and developmental regression and the possible relationship with epilepsy and epileptiform abnormalities. Participants were 104 children with autism (70 non-regressed, 34 regressed) and 162 typically developing children (TD). Results suggested that the regressed group had higher incidence of…

  15. Linear regression analysis for comparing two measurers or methods of measurement: but which regression?

    PubMed

    Ludbrook, John

    2010-07-01

    1. There are two reasons for wanting to compare measurers or methods of measurement. One is to calibrate one method or measurer against another; the other is to detect bias. Fixed bias is present when one method gives higher (or lower) values across the whole range of measurement. Proportional bias is present when one method gives values that diverge progressively from those of the other. 2. Linear regression analysis is a popular method for comparing methods of measurement, but the familiar ordinary least squares (OLS) method is rarely acceptable. The OLS method requires that the x values are fixed by the design of the study, whereas it is usual that both y and x values are free to vary and are subject to error. In this case, special regression techniques must be used. 3. Clinical chemists favour techniques such as major axis regression ('Deming's method'), the Passing-Bablok method or the bivariate least median squares method. Other disciplines, such as allometry, astronomy, biology, econometrics, fisheries research, genetics, geology, physics and sports science, have their own preferences. 4. Many Monte Carlo simulations have been performed to try to decide which technique is best, but the results are almost uninterpretable. 5. I suggest that pharmacologists and physiologists should use ordinary least products regression analysis (geometric mean regression, reduced major axis regression): it is versatile, can be used for calibration or to detect bias and can be executed by hand-held calculator or by using the loss function in popular, general-purpose, statistical software.
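
The contrast the author draws between OLS and least-products (reduced major axis) regression is easy to see numerically: the OLS slope is S_xy/S_xx, while the RMA slope is sign(r)·s_y/s_x, so the two diverge whenever correlation is imperfect. A minimal sketch with made-up paired measurements from two methods:

```python
# OLS slope vs. reduced major axis (geometric mean / least products) slope,
# for hypothetical paired measurements from two measurement methods.

def slopes(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b_ols = sxy / sxx
    b_rma = (1 if sxy >= 0 else -1) * (syy / sxx) ** 0.5  # sign(r) * sy/sx
    return b_ols, b_rma

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # method A
y = [2.0, 4.1, 5.9, 8.2, 9.8]   # method B
b_ols, b_rma = slopes(x, y)
# |b_rma| >= |b_ols| always; they coincide only when correlation is perfect
```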

  16. What Works after School? The Relationship between After-School Program Quality, Program Attendance, and Academic Outcomes

    ERIC Educational Resources Information Center

    Leos-Urbel, Jacob

    2015-01-01

    This article examines the relationship between after-school program quality, program attendance, and academic outcomes for a sample of low-income after-school program participants. Regression and hierarchical linear modeling analyses use a unique longitudinal data set including 29 after-school programs that served 5,108 students in Grades 4 to 8…

  17. Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.

    ERIC Educational Resources Information Center

    Waugh, C. Keith

    This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…

  18. National trends in the recommendation of radiotherapy after prostatectomy for prostate cancer before and after the reporting of a survival benefit in March 2009.

    PubMed

    Mahal, Brandon A; Hoffman, Karen E; Efstathiou, Jason A; Nguyen, Paul L

    2015-06-01

Three randomized trials demonstrated that postprostatectomy adjuvant radiotherapy improves biochemical disease-free survival for patients with adverse pathologic features, and 1 trial found adjuvant radiotherapy improves overall survival. We sought to determine whether postprostatectomy radiotherapy (PPRT) utilization changed after publication of the survival benefit in March 2009. The Surveillance, Epidemiology, and End Results database was used to identify men diagnosed with prostate cancer from 2004 to 2011 who met criteria for enrollment in the randomized trials (positive margins and/or pT3-4 disease at radical prostatectomy). Joinpoint regression identified inflection points in PPRT utilization. Logistic regression was used to evaluate factors associated with PPRT recommendation. Of 35,361 men, 5104 (14.4%) received a recommendation for PPRT. In joinpoint regression, 2009 was the inflection point in PPRT utilization. In multivariable analysis, PPRT recommendations were more likely after March 2009 than before (15.8% vs. 13.5%; adjusted odds ratio [AOR], 1.09; 95% confidence interval [CI], 1.02-1.16; P = .008), in men with pT3 (vs. pT2: AOR, 2.81; 95% CI, 2.53-3.11; P < .001), pT4 (vs. pT2: AOR, 4.62; 95% CI, 3.85-5.54; P < .001), or margin-positive (AOR, 1.46; 95% CI, 1.34-1.58; P < .001) disease, and in men who were younger (per year decrease: AOR, 1.02; 95% CI, 1.02-1.03; P < .001), married (AOR, 1.10; 95% CI, 1.02-1.19; P = .01), or lived in metropolitan areas (AOR, 1.30; 95% CI, 1.16-1.47; P < .001). PPRT recommendations increased after the reporting of a survival benefit in March 2009, but absolute utilization rates remain low, suggesting that the oncologic community remains unconvinced that PPRT is needed for most patients with adverse features. Further work is needed to identify patients who might benefit most from PPRT. Copyright © 2015 Elsevier Inc. All rights reserved.
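
For a single binary factor, the unadjusted odds ratio and its Wald confidence interval (the quantities a logistic model generalizes and adjusts) come from a 2×2 table. A sketch with invented counts, not the study's data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and 95% CI from a 2x2 table:
    a = exposed & event, b = exposed & no event,
    c = unexposed & event, d = unexposed & no event."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: PPRT recommended / not recommended, after vs. before 2009
or_, lo, hi = odds_ratio_ci(900, 4800, 700, 4500)
```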

  19. Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Johnson, William L.; Johnson, Annabel M.; Johnson, Jared

    2012-01-01

    Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…

  20. Standards for Standardized Logistic Regression Coefficients

    ERIC Educational Resources Information Center

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  1. Developmental Regression in Autism Spectrum Disorders

    ERIC Educational Resources Information Center

    Rogers, Sally J.

    2004-01-01

    The occurrence of developmental regression in autism is one of the more puzzling features of this disorder. Although several studies have documented the validity of parental reports of regression using home videos, accumulating data suggest that most children who demonstrate regression also demonstrated previous, subtle, developmental differences.…

  2. Logistic regression for dichotomized counts.

    PubMed

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
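
The link the abstract exploits is that, under an assumed Poisson model for the counts, the dichotomized outcome has probability P(Y > 0) = 1 − exp(−λ), so covariate effects on the count mean translate into effects on the binary outcome. A minimal sketch of that relationship, with illustrative λ values only:

```python
import math

def prob_positive(lam):
    """P(Y > 0) when Y ~ Poisson(lam)."""
    return 1.0 - math.exp(-lam)

def log_odds_positive(lam):
    """Log odds of a positive count under the same Poisson model."""
    p = prob_positive(lam)
    return math.log(p / (1.0 - p))

# Raising the mean count raises both the probability and the log odds
p_small, p_large = prob_positive(0.5), prob_positive(1.0)
```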

  3. Incentives, Program Configuration, and Employee Uptake of Workplace Wellness Programs.

    PubMed

    Huang, Haijing; Mattke, Soeren; Batorsky, Benajmin; Miles, Jeremy; Liu, Hangsheng; Taylor, Erin

    2016-01-01

    The aim of this study was to determine the effect of wellness program configurations and financial incentives on employee participation rate. We analyze a nationally representative survey on workplace wellness programs from 407 employers using cluster analysis and multivariable regression analysis. Employers who offer incentives and provide a comprehensive set of program offerings have higher participation rates. The effect of incentives differs by program configuration, with the strongest effect found for comprehensive and prevention-focused programs. Among intervention-focused programs, incentives are not associated with higher participation. Wellness programs can be grouped into distinct configurations, which have different workplace health focuses. Although monetary incentives can be effective in improving employee participation, the magnitude and significance of the effect is greater for some program configurations than others.

  4. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    NASA Astrophysics Data System (ADS)

    Gorgees, Hazim Mansoor; Mahdi, Fatimah Assim

    2018-05-01

This article is concerned with comparing the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ the data obtained from the tagi gas filling company during the period 2008-2010. The main result we reached is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE) than the other stated methods.
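
For centered data with a single predictor, the ordinary ridge estimator shrinks the OLS slope by adding the ridge constant k to S_xx, so k = 0 recovers OLS and larger k pulls the coefficient toward zero. A sketch on made-up data (not the company data used in the article):

```python
def ridge_slope(x, y, k):
    """Ridge estimate of the slope for centered simple regression:
    b(k) = Sxy / (Sxx + k); k = 0 gives the OLS estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / (sxx + k)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
b_ols = ridge_slope(x, y, 0.0)
b_ridge = ridge_slope(x, y, 2.0)   # shrunk toward zero
```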

  5. Multicollinearity and Regression Analysis

    NASA Astrophysics Data System (ADS)

    Daoud, Jamal I.

    2017-12-01

In regression analysis, correlation between the response and the predictor(s) is expected, but correlation among the predictors themselves is undesirable. The number of predictors included in the regression model depends on many factors, among them historical data, experience, etc. In the end, the selection of the most important predictors is a subjective decision of the researcher. Multicollinearity is a phenomenon in which two or more predictors are correlated; when this happens, the standard errors of the coefficients increase [8]. Increased standard errors mean that the coefficients for some or all independent variables may be found not to be significantly different from zero. In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on multicollinearity, its causes, and its consequences for the reliability of the regression model.
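
A standard way to quantify the problem is the variance inflation factor: for two predictors, VIF = 1/(1 − r²), the factor by which a coefficient's sampling variance is inflated by the collinearity. A small sketch with two strongly correlated, hypothetical predictors:

```python
def vif_two_predictors(x1, x2):
    """Variance inflation factor 1 / (1 - r^2) for a two-predictor model."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 * s12 / (s11 * s22)
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.0, 3.2, 3.9, 5.1]    # nearly collinear with x1
vif = vif_two_predictors(x1, x2)  # far above the common rule-of-thumb cutoff of 10
```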

  6. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and an increase in the variance of these parameters. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, first, a robust Principal Component Analysis (PCA) method for high-dimensional data is applied to the independent variables; then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of the RPCR and RSIMPLS methods on an econometric data set, making a comparison of the two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R² value and the Robust Component Selection (RCS) statistic.

  7. [Application of negative binomial regression and modified Poisson regression in the research of risk factors for injury frequency].

    PubMed

    Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan

    2011-11-01

To explore the application of negative binomial regression and modified Poisson regression analysis in analyzing the influential factors for injury frequency and the risk factors leading to an increase in injury frequency. 2917 primary and secondary school students were selected from Hefei by the cluster random sampling method and surveyed by questionnaire. The data on count-based injury events were used to fit modified Poisson regression and negative binomial regression models. The risk factors leading to an increase in unintentional injury frequency among juvenile students were explored, so as to probe the efficiency of these two models in studying the influential factors for injury frequency. The Poisson model exhibited over-dispersion (P < 0.0001) based on the Lagrange multiplier test. Therefore, the over-dispersed data were fitted better by the modified Poisson regression and negative binomial regression models. Both showed that male gender, younger age, father working outside of the hometown, the guardian's education level being above junior high school, and smoking might be associated with higher injury frequencies. Given the tendency of frequency data on injury events to cluster, both modified Poisson regression analysis and negative binomial regression analysis can be used. However, based on our data, the modified Poisson regression fitted better and could give a more accurate interpretation of the relevant factors affecting the frequency of injury.
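
A quick diagnostic behind the model choice the authors describe is the dispersion of the counts: a Poisson model assumes variance ≈ mean, so a variance/mean ratio well above 1 points toward a negative binomial (or otherwise adjusted) model. A sketch on invented injury counts:

```python
def dispersion_ratio(counts):
    """Sample variance / mean; values well above 1 suggest over-dispersion
    relative to a Poisson model."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

# Hypothetical injury counts for 15 students: many zeros, a few large values
counts = [0, 0, 0, 1, 0, 2, 0, 0, 5, 0, 1, 0, 3, 0, 0]
ratio = dispersion_ratio(counts)   # > 1, i.e. over-dispersed
```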

  8. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    NASA Astrophysics Data System (ADS)

    Wu, Cheng; Yu, Jian Zhen

    2018-03-01

Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low-R² XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If the a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
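
Deming regression has a closed-form slope once λ is fixed (here taken as the ratio of the error variance in y to that in x, one common convention): b = [S_yy − λS_xx + sqrt((S_yy − λS_xx)² + 4λS_xy²)] / (2S_xy). A minimal sketch; with λ = 1 this reduces to orthogonal distance regression:

```python
import math

def deming_slope(x, y, lam=1.0):
    """Deming regression slope; lam = (error variance in y)/(error variance in x).
    lam = 1 corresponds to orthogonal distance regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    d = syy - lam * sxx
    return (d + math.sqrt(d * d + 4.0 * lam * sxy * sxy)) / (2.0 * sxy)

# On exactly linear data the Deming slope matches the true slope for any lam
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # y = 2x
```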

  9. Multiple-Instance Regression with Structured Data

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Lane, Terran; Roper, Alex

    2008-01-01

    We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.

  10. [From clinical judgment to linear regression model].

    PubMed

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and normally distributed. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease that occurs in "Y" when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
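
The regression line Y = a + bx and the coefficient of determination R² described above have simple closed forms that are worth seeing side by side; a sketch on made-up height data (not from the article):

```python
def simple_linear_regression(x, y):
    """Return intercept a, slope b and R^2 for Y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope: change in Y per unit change in x
    a = my - b * mx                # intercept: value of Y when x = 0
    r2 = sxy * sxy / (sxx * syy)   # share of variation in Y explained by x
    return a, b, r2

x = [64.0, 66.0, 68.0, 70.0, 72.0]   # hypothetical parent heights (inches)
y = [66.0, 66.5, 67.5, 69.0, 70.0]   # hypothetical child heights (inches)
a, b, r2 = simple_linear_regression(x, y)
```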

  11. Autistic Regression

    ERIC Educational Resources Information Center

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  12. Practical Session: Simple Linear Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

Two exercises are proposed to illustrate simple linear regression. The first one is based on the famous Galton data set on heredity. We use the lm R command and obtain the coefficient estimates, the residual standard error, R², the residuals… In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).

  13. Program Director Participation in a Leadership and Management Skills Fellowship and Characteristics of Program Quality.

    PubMed

    Carek, Peter J; Mims, Lisa D; Conry, Colleen M; Maxwell, Lisa; Greenwood, Vicki; Pugno, Perry A

    2015-01-01

The association between a residency program director's completion of a leadership and management skills fellowship and characteristics of quality and innovation of his/her residency program has not been studied. Therefore, the aim of this study is to examine the association between a residency program director's completion of a specific fellowship addressing these skills (the National Institute for Program Director Development, or NIPDD) and characteristics of quality and innovation of the program they direct. Program characteristics were obtained using information from the American Academy of Family Physicians (AAFP), the National Resident Matching Program (NRMP), and FREIDA®. Descriptive statistics were used to summarize the data. The relationship between programs with a NIPDD graduate as director and program quality measures and indicators of innovation was analyzed using both chi-square tests and logistic regression. Initial analyses showed significant associations between the NIPDD graduate status of a program director and regional location, mean years of program director tenure, and the program's 5-year aggregate ABFM board pass rate from 2007-2011. After grouping the programs into tertiles, the regression model showed significant positive associations with programs offering international experiences and being a NIPDD graduate. Program director participation in a fellowship addressing leadership and management skills (ie, NIPDD) was found to be associated with higher pass rates of new graduates on a Board certification examination and predictive of programs being in the upper tertile of programs in terms of Board pass rates.

  14. Categorical regression dose-response modeling

    EPA Science Inventory

The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression software (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...

  15. Secular trend analysis of lung cancer incidence in Sihui city, China between 1987 and 2011.

    PubMed

    Du, Jin-Lin; Lin, Xiao; Zhang, Li-Fang; Li, Yan-Hua; Xie, Shang-Hang; Yang, Meng-Jie; Guo, Jie; Lin, Er-Hong; Liu, Qing; Hong, Ming-Huang; Huang, Qi-Hong; Liao, Zheng-Er; Cao, Su-Mei

    2015-07-31

With industrial and economic development in recent decades in South China, cancer incidence may have changed due to the changing lifestyle and environment. However, the trends of lung cancer and the roles of smoking and other environmental risk factors in the development of lung cancer in rural areas of South China remain unclear. The purpose of this study was to explore the lung cancer incidence trends and the possible causes of these trends. Joinpoint regression analysis and the age-period-cohort (APC) model were used to analyze the lung cancer incidence trends in Sihui, Guangdong province, China between 1987 and 2011, and explore the possible causes of these trends. A total of 2,397 lung cancer patients were involved in this study. A 3-fold increase in the incidence of lung cancer in both sexes was observed over the 25-year period. Joinpoint regression analysis showed that while the incidence continued to increase steadily in females during the entire period, a sharp acceleration was observed in males starting in 2005. The full APC model was selected to describe age, period, and birth cohort effects on lung cancer incidence trends in Sihui. The age cohorts in both sexes showed a continuously significant increase in the relative risk (RR) of lung cancer, with a peak in the eldest age group (80-84 years). The RR of lung cancer showed a fluctuating curve in both sexes. The birth cohorts identified an increased trend in both males and females; however, males had a plateau in the youngest cohorts who were born during 1955-1969. Increasing trends of the incidence of lung cancer in Sihui were dominated by the effects of age and birth cohorts. Social aging, smoking, and environmental changes may play important roles in such trends.
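
With one change point, a joinpoint model can be written as y = a + b·x + c·max(0, x − τ), continuous at the joinpoint τ; a simple way to fit it is to grid-search τ and solve ordinary least squares at each candidate. A self-contained sketch on synthetic data (the grid search is a simplification of the model-selection machinery in the actual Joinpoint software):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (M[r][n] - sum(M[r][c] * beta[c]
                                 for c in range(r + 1, n))) / M[r][r]
    return beta

def one_joinpoint_fit(xs, ys):
    """Fit y = a + b*x + c*max(0, x - tau), grid-searching tau over interior x."""
    best = None
    for tau in xs[1:-1]:
        X = [[1.0, x, max(0.0, x - tau)] for x in xs]
        xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)]
               for i in range(3)]
        xty = [sum(row[i] * yv for row, yv in zip(X, ys)) for i in range(3)]
        beta = solve(xtx, xty)
        sse = sum((yv - sum(bi * ri for bi, ri in zip(beta, row))) ** 2
                  for row, yv in zip(X, ys))
        if best is None or sse < best[0]:
            best = (sse, tau, beta)
    return best

# Synthetic incidence series: flat for 5 years, then rising 0.5/year after year 4
years = [float(t) for t in range(10)]
rates = [1.0, 1.0, 1.0, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
sse, tau, (a, b, c) = one_joinpoint_fit(years, rates)
```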

  16. [The effect of increasing tobacco tax on tobacco sales in Japan].

    PubMed

    Ito, Yuri; Nakamura, Masakazu

    2013-09-01

    Since the special tobacco tax was established in 1998, the tobacco tax and price of tobacco have been increased three times: in 2003, 2006, and 2010. We evaluated the effect of these tax increases on the consumption and sales of tobacco in Japan using annual data on the number of tobacco products sold and the total sales from Japan Tobacco, Inc. We applied the number of tobacco products sold and the total sales per year to a joinpoint regression model to examine the trends in the data. This model helps identify the years in which a decrease or increase became apparent in the data. In addition, we examined the effect of each tax increase while also considering other factors that may have caused a decrease in tobacco consumption, using the method proposed by Hirano et al. According to the joinpoint regression analysis, the number of tobacco products sold started decreasing in 1998, and the decrease accelerated to 5% per year from 2005. Owing to the tax increases, tobacco sales decreased by 2.4%, 2.9%, and 10.1% (the last corrected for the effect of the Tohoku Great Earthquake), and price elasticity was estimated as -0.30, -0.27, and -0.28 (corrected) in 2003, 2006, and 2010, respectively. The effect of the tobacco tax increase on the decrease in tobacco sales was greatest in 2010, while the price elasticity remained almost the same as during the previous tax increases. The sharp hike in tobacco tax in 2010 decreased the number of tobacco products sold, while the price elasticity in 2010 was similar to that in 2003 and 2006. Our findings suggest that a further increase in tobacco tax is needed to reduce the damage caused by smoking among the people of Japan.
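
The price elasticities quoted above follow the usual arc definition: percentage change in quantity sold divided by percentage change in price. A minimal sketch with invented numbers (the actual analysis additionally corrects for secular trend and the 2011 earthquake):

```python
# Price elasticity of demand: percentage change in quantity sold divided
# by percentage change in price. All numbers below are invented.

def price_elasticity(q0, q1, p0, p1):
    return ((q1 - q0) / q0) / ((p1 - p0) / p0)

# A ~10.1% drop in sales after a hypothetical 33% price rise.
print(round(price_elasticity(100.0, 89.9, 300.0, 400.0), 2))
```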

  17. Extrinsic local regression on manifold-valued data

    PubMed Central

    Lin, Lizhen; St Thomas, Brian; Zhu, Hongtu; Dunson, David B.

    2017-01-01

    We propose an extrinsic regression framework for modeling data with manifold-valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging, and many other areas. Our approach embeds the manifold on which the responses lie into a higher-dimensional Euclidean space, obtains a local regression estimate in that space, and then projects this estimate back onto the image of the manifold. Outside the regression setting, both intrinsic and extrinsic approaches have been proposed for modeling i.i.d. manifold-valued data. However, to our knowledge our work is the first to take an extrinsic approach to the regression problem. The proposed extrinsic regression framework is general, computationally efficient, and theoretically appealing. Asymptotic distributions and convergence rates of the extrinsic regression estimates are derived, and a large class of examples is considered, indicating the wide applicability of our approach. PMID:29225385

  18. Functional Relationships and Regression Analysis.

    ERIC Educational Resources Information Center

    Preece, Peter F. W.

    1978-01-01

    Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…

  19. Logistic Regression: Concept and Application

    ERIC Educational Resources Information Center

    Cokluk, Omay

    2010-01-01

    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…

  20. Predictors of course in obsessive-compulsive disorder: logistic regression versus Cox regression for recurrent events.

    PubMed

    Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M

    2007-09-01

    Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.

  1. A Survey of UML Based Regression Testing

    NASA Astrophysics Data System (ADS)

    Fahad, Muhammad; Nadeem, Aamer

    Regression testing is the process of ensuring software quality by analyzing whether changed parts behave as intended and unchanged parts are not affected by the modifications. Since it is a costly process, many techniques have been proposed in the research literature that suggest how testers can build a regression test suite from an existing test suite at minimum cost. In this paper, we discuss the advantages and drawbacks of using UML diagrams for regression testing and show that UML models help in effectively identifying changes for regression test selection. We survey the existing UML-based regression testing techniques and provide an analysis matrix to give a quick insight into the prominent features of the literature. We discuss open research issues, such as managing and reducing the size of the regression test suite and prioritizing test cases under tight schedules and limited resources, that remain to be addressed for UML-based regression testing.

  2. Regression model for estimating inactivation of microbial aerosols by solar radiation.

    PubMed

    Ben-David, Avishai; Sagripanti, Jose-Luis

    2013-01-01

    The inactivation of pathogenic aerosols by solar radiation is relevant to public health and biodefense. We investigated whether a relatively simple method to calculate solar diffuse and total irradiances could be developed and used in environmental photobiology estimations instead of complex atmospheric radiative transfer computer programs. The second-order regression model that we developed reproduced 13 radiation quantities calculated for equinoxes and solstices at 35° latitude with a computer-intensive and rather complex atmospheric radiative transfer program (MODTRAN), with a mean error <6% (2% for most radiation quantities). Extending the application of the regression model from a reference latitude and date (chosen as 35° latitude on 21 March) to different latitudes and days of the year was accomplished with variable success: usually with a mean error <15% (but as high as 150% for some combinations of latitudes and days of the year). The accuracy of the methodology proposed here compares favorably with photobiological experiments, where microbial survival is usually measured with an accuracy no better than ±0.5 log10 units. The approach and equations presented in this study should assist in estimating the maximum time during which microbial pathogens remain infectious after accidental or intentional aerosolization in open environments. © Published 2013. This article is a U.S. Government work and is in the public domain in the USA. Photochemistry and Photobiology © 2013 The American Society of Photobiology.
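
The idea of replacing an expensive radiative transfer code with a fitted second-order polynomial can be sketched in miniature. This toy example uses fabricated data standing in for MODTRAN output and one predictor instead of the paper's several; it fits a quadratic by solving the normal equations directly:

```python
# Toy surrogate model: fit a second-order (quadratic) regression to samples
# from an "expensive" model, then query the cheap polynomial instead.
# The data are fabricated stand-ins for the radiative transfer output.

def fit_quadratic(xs, ys):
    """Solve the 3x3 normal equations (X'X) beta = X'y for columns 1, x, x^2."""
    X = [[1.0, x, x * x] for x in xs]
    A = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
    for c in range(3):                       # Gaussian elimination, partial pivoting
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, 3):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][k] - f * A[c][k] for k in range(3)]
            b[r] -= f * b[c]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                      # back substitution
        beta[r] = (b[r] - sum(A[r][k] * beta[k] for k in range(r + 1, 3))) / A[r][r]
    return beta

xs = [0, 1, 2, 3, 4, 5]
ys = [2 + 3 * x + 0.5 * x * x for x in xs]   # pretend these came from MODTRAN
print([round(v, 6) for v in fit_quadratic(xs, ys)])
```

Once fitted, evaluating the polynomial is essentially free compared with re-running the full transfer code.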

  3. Regression discontinuity design in criminal justice evaluation: an introduction and illustration.

    PubMed

    Rhodes, William; Jalbert, Sarah Kuck

    2013-01-01

    Corrections agencies frequently place offenders into risk categories, within which offenders receive different levels of supervision and programming. This supervision strategy is seldom evaluated but often can be through routine use of a regression discontinuity design (RDD). This article argues that RDD provides a rigorous and cost-effective method for correctional agencies to evaluate and improve supervision strategies and advocates for using RDD routinely in corrections administration. The objective is to better employ correctional resources. This article uses a Neyman-Pearson counterfactual framework to introduce readers to RDD, to provide intuition for why RDD should be used broadly, and to motivate a deeper reading into the methodology. The article also illustrates an application of RDD to evaluate an intensive supervision program for probationers. Application of the RDD, which requires basic knowledge of regressions and some special diagnostic tools, is within the competencies of many criminal justice evaluators. RDD is shown to be an effective strategy to identify the treatment effect in a community corrections agency using supervision that meets the necessary conditions for RDD. The article concludes with a critical review of how RDD compares to experimental methods to answer policy questions. The article recommends using RDD to evaluate whether differing levels of control and correction reduce criminal recidivism. It also advocates for routine use of RDD as an administrative tool to determine cut points used to assign offenders into different risk categories based on the offenders' risk scores.

  4. Precision Efficacy Analysis for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.

    When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…

  5. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariables. The second is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results of ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  6. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    ERIC Educational Resources Information Center

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  7. Quantile regression applied to spectral distance decay

    USGS Publications Warehouse

    Rocchini, D.; Cade, B.S.

    2008-01-01

    Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.01), considering both OLS and quantile regressions. Nonetheless, the OLS regression estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when the spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. © 2008 IEEE.
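
The contrast between a mean fit and an upper-quantile fit comes from the loss being minimized: OLS minimizes squared error, while quantile regression minimizes the asymmetric pinball loss. A toy, constant-only sketch with a zero-inflated sample (all numbers invented) shows the 0.9-quantile fit sitting far above the mean:

```python
# Pinball (check) loss behind quantile regression, constant-only model:
# minimising it over a grid recovers the q-th sample quantile. The data
# are invented, zero-inflated like many ecological similarity data sets.

def pinball(q, ys, pred):
    return sum(q * (y - pred) if y >= pred else (q - 1) * (y - pred) for y in ys)

ys = [0, 0, 0, 0, 1, 2, 5, 9]
grid = [i / 10 for i in range(91)]          # candidate constants 0.0 .. 9.0
fit90 = min(grid, key=lambda c: pinball(0.9, ys, c))
avg = sum(ys) / len(ys)
print(fit90, avg)
```

The many zeroes pull the mean down, while the 0.9-quantile fit tracks the upper edge of the data, mirroring the letter's finding.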

  8. Investigating bias in squared regression structure coefficients

    PubMed Central

    Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce

    2015-01-01

    The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273

  9. SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES

    PubMed Central

    Zhu, Liping; Huang, Mian; Li, Runze

    2012-01-01

    This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536

  10. A Primer on Logistic Regression.

    ERIC Educational Resources Information Center

    Woldbeck, Tanya

    This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…

  11. Robust Mediation Analysis Based on Median Regression

    PubMed Central

    Yuan, Ying; MacKinnon, David P.

    2014-01-01

    Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
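
The robustness argument rests on a familiar fact: a single heavy-tailed observation can drag the mean (and hence a least-squares fit), while leaving the median almost untouched. A tiny illustration with fabricated residuals:

```python
# One fabricated outlier drags the mean of the residuals but barely moves
# the median: the intuition behind median-regression robustness.
from statistics import mean, median

residuals = [-1.2, -0.5, -0.1, 0.0, 0.3, 0.6, 1.1]
contaminated = residuals + [25.0]   # a single heavy-tailed contaminant
print(round(mean(contaminated), 2), median(contaminated))
```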

  12. Regression Analysis by Example. 5th Edition

    ERIC Educational Resources Information Center

    Chatterjee, Samprit; Hadi, Ali S.

    2012-01-01

    Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…

  13. Deletion Diagnostics for Alternating Logistic Regressions

    PubMed Central

    Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.

    2013-01-01

    Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960

  14. Mental chronometry with simple linear regression.

    PubMed

    Chen, J Y

    1997-10-01

    Typically, mental chronometry is performed by means of introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that involved a selective effect and with a task (word superiority effect) that involved a global effect, by the dominant theories. The results indicate (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These ratify the regression approach as a useful method for doing mental chronometry.

  15. Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.

    PubMed

    Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W

    1992-03-01

    The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy of estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by the Additive Main effects and Multiplicative Interaction (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment recommended AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.

  16. Practical Session: Logistic Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    An exercise is proposed to illustrate logistic regression. One investigates the different risk factors in the occurrence of coronary heart disease. It has been proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science+Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.

  17. Synthesizing Regression Results: A Factored Likelihood Method

    ERIC Educational Resources Information Center

    Wu, Meng-Jia; Becker, Betsy Jane

    2013-01-01

    Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. This method uses the correlations reported…

  18. Detection of epistatic effects with logic regression and a classical linear regression model.

    PubMed

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes that cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as disease) is caused by Boolean combinations of the genotypes of several QTLs, the Cockerham approach is often not capable of identifying them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though the logic regression approach requires considering a larger number of models (and hence more stringent multiple-testing correction), the efficient representation of higher-order logic interactions in logic regression models leads to a significant increase in power to detect such interactions compared with the Cockerham approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and a real data analysis.

  19. Bayesian Unimodal Density Regression for Causal Inference

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2011-01-01

    Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…

  20. Regression of altitude-produced cardiac hypertrophy.

    NASA Technical Reports Server (NTRS)

    Sizemore, D. A.; Mcintyre, T. W.; Van Liere, E. J.; Wilson, M. F.

    1973-01-01

    The rate of regression of cardiac hypertrophy with time has been determined in adult male albino rats. The hypertrophy was induced by intermittent exposure to simulated high altitude. The percentage hypertrophy was much greater (46%) in the right ventricle than in the left (16%). The regression could be adequately fitted to a single exponential function with a half-time of 6.73 ± 0.71 days (90% CI). There was no significant difference in the rates of regression for the two ventricles.

  1. Building Regression Models: The Importance of Graphics.

    ERIC Educational Resources Information Center

    Dunn, Richard

    1989-01-01

    Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)

  2. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are addressed here under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model, due to its low Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model for heart disease prediction overall, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease component-wise given the available clusters. It is deduced that heart disease prediction can be done effectively by identifying the major risks component-wise using a Poisson mixture regression model.

  3. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are addressed here under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model, due to its low Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model for heart disease prediction overall, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease component-wise given the available clusters. It is deduced that heart disease prediction can be done effectively by identifying the major risks component-wise using a Poisson mixture regression model. PMID:27999611

  4. Regression analysis using dependent Polya trees.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J

    2013-11-30

    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.

  5. A Regression Model Approach to First-Year Honors Program Admissions Serving a High-Minority Population

    ERIC Educational Resources Information Center

    Rhea, David M.

    2017-01-01

    Many honors programs make admissions decisions based on student high school GPA and a standardized test score. However, McKay argued that standardized test scores can be a barrier to honors program participation, particularly for minority students. Minority students, particularly Hispanic and African American students, are apt to have lower…

  6. RRegrs: an R package for computer-aided model selection with multiple regression models.

    PubMed

    Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L

    2015-01-01

    Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models and therefore raise model reproducibility and comparison issues. Cheminformatics and bioinformatics make extensive use of predictive modelling and need standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tested several simple and complex regression models and validation schemes, produced unified reports, and offered the option to be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure that produces standardized reports to quickly assess the impact of choices in modelling algorithms and to inspect the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR…
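
    RRegrs itself is an R package built on caret, but the repeated k-fold cross-validation workflow it automates can be sketched in a few lines. The data set, the two models (plain OLS and ridge), and the fold counts below are illustrative stand-ins, not RRegrs defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a QSAR-style data set: 120 samples, 5 descriptors.
n, p = 120, 5
X = rng.normal(size=(n, p))
beta = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

def fit_ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_ridge(X, y, lam=1.0):
    """Ridge regression: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def repeated_kfold_rmse(X, y, fit, k=10, repeats=3, seed=1):
    """Average held-out RMSE over `repeats` random k-fold splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rmses = []
    for _ in range(repeats):
        idx = rng.permutation(n)
        for test_idx in np.array_split(idx, k):
            train_idx = np.setdiff1d(idx, test_idx)
            b = fit(X[train_idx], y[train_idx])
            resid = y[test_idx] - X[test_idx] @ b
            rmses.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(rmses))

# Unified comparison report across methods, in the spirit of RRegrs.
results = {name: repeated_kfold_rmse(X, y, fn)
           for name, fn in [("OLS", fit_ols), ("ridge", fit_ridge)]}
```

    Adding another method is just adding another `fit_*` function to the list, which is the modularity the standardized-report design exploits.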

  7. A Fast Solution of the Lindley Equations for the M-Group Regression Problem. Technical Report 78-3, October 1977 through May 1978.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.

    The technical problems involved in obtaining Bayesian model estimates for the regression parameters in m similar groups are studied. The available computer programs, BPREP (BASIC), and BAYREG, both written in FORTRAN, require an amount of computer processing that does not encourage regular use. These programs are analyzed so that the performance…

  8. Colorectal cancer mortality trends in Córdoba, Argentina.

    PubMed

    Pou, Sonia Alejandra; Osella, Alberto Rubén; Eynard, Aldo Renato; Niclis, Camila; Diaz, María del Pilar

    2009-12-01

    Colorectal cancer is a leading cause of death worldwide for men and women, and one of the most commonly diagnosed cancers in Córdoba, Argentina. The aim of this work was to provide an up-to-date descriptive epidemiology of colorectal cancer in Córdoba through the estimation of mortality trends in the period 1986-2006, using Joinpoint and age-period-cohort (APC) models. Age-standardized (world population) mortality rates (ASMR), overall and truncated (35-64 years), were calculated, and Joinpoint regression was performed to compute the estimated annual percentage changes (EAPC). Poisson sequential models were fitted to estimate the effect of age (11 age groups), period (1986-1990, 1991-1995, 1996-2000 or 2001-2006) and cohort (13 ten-year cohorts overlapping each other by five years) on colorectal cancer mortality rates. ASMR showed an overall significant decrease (EAPC -0.9; 95% CI: -1.7, -0.2) for women, more noticeable from 1996 onwards (EAPC -2.1; 95% CI: -4.0, -0.1). The age effect showed an important rise in both sexes, more evident in males. Birth-cohort and period effects reflected increasing and decreasing tendencies for men and women, respectively. Differences in mortality rates were found according to sex and could be related to age-period-cohort effects linked to the ageing process, health care and lifestyle. Further research is needed to elucidate the specific age-, period- and cohort-related factors.
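
    Within a single joinpoint segment, EAPC figures like those quoted above come from a log-linear regression of the rate on calendar year: EAPC = (exp(slope) - 1) x 100. A minimal sketch with hypothetical rates (not the Córdoba data):

```python
import numpy as np

# Joinpoint software fits piecewise log-linear segments; within one segment
# the estimated annual percentage change comes from the slope of
# log(rate) regressed on year.
years = np.arange(1986, 2007)
true_apc = -0.9   # hypothetical: a 0.9% annual decline
rates = 20.0 * (1 + true_apc / 100.0) ** (years - years[0])

slope, intercept = np.polyfit(years, np.log(rates), 1)
eapc = (np.exp(slope) - 1.0) * 100.0   # recovers -0.9 on this noise-free series
```
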

  9. The National Flood Frequency Program, version 3 : a computer program for estimating magnitude and frequency of floods for ungaged sites

    USGS Publications Warehouse

    Ries, Kernell G.; Crouse, Michele Y.

    2002-01-01

    For many years, the U.S. Geological Survey (USGS) has been developing regional regression equations for estimating flood magnitude and frequency at ungaged sites. These regression equations are used to transfer flood characteristics from gaged to ungaged sites through the use of watershed and climatic characteristics as explanatory or predictor variables. Generally, these equations have been developed on a Statewide or metropolitan-area basis as part of cooperative study programs with specific State Departments of Transportation. In 1994, the USGS released a computer program titled the National Flood Frequency Program (NFF), which compiled all the USGS available regression equations for estimating the magnitude and frequency of floods in the United States and Puerto Rico. NFF was developed in cooperation with the Federal Highway Administration and the Federal Emergency Management Agency. Since the initial release of NFF, the USGS has produced new equations for many areas of the Nation. A new version of NFF has been developed that incorporates these new equations and provides additional functionality and ease of use. NFF version 3 provides regression-equation estimates of flood-peak discharges for unregulated rural and urban watersheds, flood-frequency plots, and plots of typical flood hydrographs for selected recurrence intervals. The Program also provides weighting techniques to improve estimates of flood-peak discharges for gaging stations and ungaged sites. The information provided by NFF should be useful to engineers and hydrologists for planning and design applications. This report describes the flood-regionalization techniques used in NFF and provides guidance on the applicability and limitations of the techniques. The NFF software and the documentation for the regression equations included in NFF are available at http://water.usgs.gov/software/nff.html.
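
    Regional regression equations of the kind NFF compiles are typically power laws in watershed characteristics, fitted after log transformation. The coefficients and drainage areas below are invented for illustration and are not NFF equations:

```python
import numpy as np

# A hypothetical regional equation of the common form Q100 = a * A^b,
# where A is drainage area; taking logs makes the fit linear.
area = np.array([10., 25., 60., 150., 400., 900.])   # square miles (invented)
a_true, b_true = 120.0, 0.75
q100 = a_true * area ** b_true                        # 100-year peak, cfs

b, log_a = np.polyfit(np.log(area), np.log(q100), 1)
a = np.exp(log_a)

# Transfer to a hypothetical ungaged 200-square-mile watershed:
q_ungaged = a * 200.0 ** b
```
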

  10. Rank-preserving regression: a more robust rank regression model against outliers.

    PubMed

    Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M

    2016-08-30

    Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.

  11. Multivariate decoding of brain images using ordinal regression.

    PubMed

    Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F

    2013-11-01

    Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two pharmacological neuroimaging data sets from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigated the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. For risperidone, however, ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification in terms of both accuracy and mean absolute error. For the scopolamine data set, ordinal regression outperformed both multi-class and metric regression techniques considering regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection…

  12. Basis Selection for Wavelet Regression

    NASA Technical Reports Server (NTRS)

    Wheeler, Kevin R.; Lau, Sonie (Technical Monitor)

    1998-01-01

    A wavelet basis selection procedure is presented for wavelet regression. Both the basis and the threshold are selected using cross-validation. The method includes the capability of incorporating prior knowledge on the smoothness (or shape of the basis functions) into the basis selection procedure. The results of the method are demonstrated on sampled functions widely used in the wavelet regression literature. The results of the method are contrasted with other published methods.

  13. Hox-C9 activates the intrinsic pathway of apoptosis and is associated with spontaneous regression in neuroblastoma.

    PubMed

    Kocak, H; Ackermann, S; Hero, B; Kahlert, Y; Oberthuer, A; Juraeva, D; Roels, F; Theissen, J; Westermann, F; Deubzer, H; Ehemann, V; Brors, B; Odenthal, M; Berthold, F; Fischer, M

    2013-04-11

    Neuroblastoma is an embryonal malignancy of the sympathetic nervous system. Spontaneous regression and differentiation of neuroblastoma is observed in a subset of patients, and has been suggested to represent delayed activation of physiologic molecular programs of fetal neuroblasts. Homeobox genes constitute an important family of transcription factors, which play a fundamental role in morphogenesis and cell differentiation during embryogenesis. In this study, we demonstrate that expression of the majority of the human HOX class I homeobox genes is significantly associated with clinical covariates in neuroblastoma using microarray expression data of 649 primary tumors. Moreover, a HOX gene expression-based classifier predicted neuroblastoma patient outcome independently of age, stage and MYCN amplification status. Among all HOX genes, HOXC9 expression was most prominently associated with favorable prognostic markers. Most notably, elevated HOXC9 expression was significantly associated with spontaneous regression in infant neuroblastoma. Re-expression of HOXC9 in three neuroblastoma cell lines led to a significant reduction in cell viability, and abrogated tumor growth almost completely in neuroblastoma xenografts. Neuroblastoma growth arrest was related to the induction of programmed cell death, as indicated by an increase in the sub-G1 fraction and translocation of phosphatidylserine to the outer membrane. Programmed cell death was associated with the release of cytochrome c from the mitochondria into the cytosol and activation of the intrinsic cascade of caspases, indicating that HOXC9 re-expression triggers the intrinsic apoptotic pathway. Collectively, our results show a strong prognostic impact of HOX gene expression in neuroblastoma, and may point towards a role of Hox-C9 in neuroblastoma spontaneous regression.

  14. Stochastic search, optimization and regression with energy applications

    NASA Astrophysics Data System (ADS)

    Hannah, Lauren A.

    Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one-stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage problem. The one-stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression…

  15. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
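
    As a concrete example of one of these models, the sketch below fits a Poisson regression (log link) by iteratively reweighted least squares, the standard GLM fitting algorithm, on synthetic count data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])   # intercept + one covariate
beta_true = np.array([0.3, 0.8])
y = rng.poisson(np.exp(X @ beta_true))  # counts with log-linear mean

# IRLS: repeatedly solve a weighted least-squares problem on the
# working response z, with Poisson weights W = mu.
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu
    W = mu
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
```

    The same loop, with different weights and working response, fits logistic regression; linear regression is the one-step special case with identity link.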

  16. Wrong Signs in Regression Coefficients

    NASA Technical Reports Server (NTRS)

    McGee, Holly

    1999-01-01

    When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
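
    The multicollinearity cause is easy to reproduce: with two nearly identical predictors, the overall fit stays excellent and the coefficient sum is well determined, but the individual coefficients are unstable and one can flip to the "wrong" sign. A small sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost identical to x1
y = 2.0 * x1 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

cond = np.linalg.cond(X)        # a huge condition number flags collinearity
coef_sum = coef[1] + coef[2]    # the *sum* is well determined (about 2.0),
                                # even when the individual signs are not
```
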

  17. Short-term Forecasting of the Prevalence of Trachoma: Expert Opinion, Statistical Regression, versus Transmission Models

    PubMed Central

    Liu, Fengchen; Porco, Travis C.; Amza, Abdou; Kadri, Boubacar; Nassirou, Baido; West, Sheila K.; Bailey, Robin L.; Keenan, Jeremy D.; Solomon, Anthony W.; Emerson, Paul M.; Gambhir, Manoj; Lietman, Thomas M.

    2015-01-01

    Background: Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. Methods: The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts’ opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon’s signed-rank statistic. Findings: Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher’s information. Each individual expert’s forecast was poorer than the sum of experts. Interpretation: Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models. PMID:26302380

  18. Short-term Forecasting of the Prevalence of Trachoma: Expert Opinion, Statistical Regression, versus Transmission Models.

    PubMed

    Liu, Fengchen; Porco, Travis C; Amza, Abdou; Kadri, Boubacar; Nassirou, Baido; West, Sheila K; Bailey, Robin L; Keenan, Jeremy D; Solomon, Anthony W; Emerson, Paul M; Gambhir, Manoj; Lietman, Thomas M

    2015-08-01

    Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts' opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon's signed-rank statistic. Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher's information. Each individual expert's forecast was poorer than the sum of experts. Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models. Clinicaltrials.gov NCT00792922.
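
    The statistical-regression forecaster described above can be sketched simply: regress the square-root-transformed prevalence on assessment time and extrapolate to 36 months. The community prevalences below are invented, not PRET data:

```python
import numpy as np

# Biannual assessments, baseline through 30 months (hypothetical community).
months = np.array([0, 6, 12, 18, 24, 30])
prevalence = np.array([0.40, 0.28, 0.20, 0.15, 0.10, 0.08])

# Linear fit on the square-root scale stabilizes the variance of
# prevalence estimates; forecast, clamp at zero, then back-transform.
slope, intercept = np.polyfit(months, np.sqrt(prevalence), 1)
sqrt_forecast = intercept + slope * 36
forecast_36 = max(sqrt_forecast, 0.0) ** 2
```
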

  19. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    PubMed

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.

  20. Suppression Situations in Multiple Linear Regression

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2006-01-01

    This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…

  1. Variable Selection in Logistic Regression.

    DTIC Science & Technology

    1987-06-01

    Bai, Z. D.; Krishnaiah, P. R.; Zhao, L. C. Center for Multivariate Analysis, University of Pittsburgh; contract F49620-85-C-0008. [Only garbled OCR fragments of the report's cover page are available; no abstract survived extraction.]

  2. Logistic regression for circular data

    NASA Astrophysics Data System (ADS)

    Al-Daffaie, Kadhem; Khan, Shahjahan

    2017-05-01

    This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
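
    The linear-circular embedding amounts to replacing the circular predictor θ with cos θ and sin θ inside an ordinary logistic regression, which can then be fitted by Newton-Raphson as the record describes. A sketch on synthetic data (not the Toowoomba weather records):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
theta = rng.uniform(0, 2 * np.pi, size=n)
# Design matrix: intercept plus the circular predictor embedded as (cos, sin).
X = np.column_stack([np.ones(n), np.cos(theta), np.sin(theta)])
beta_true = np.array([-0.2, 1.5, 0.8])
p = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, p)

# Newton-Raphson for the logistic log-likelihood:
# gradient X'(y - mu), Hessian X' W X with W = mu(1 - mu).
beta = np.zeros(3)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = mu * (1 - mu)
    beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
```
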

  3. Should metacognition be measured by logistic regression?

    PubMed

    Rausch, Manuel; Zehetleitner, Michael

    2017-03-01

    Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as a measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Geodesic least squares regression on information manifolds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be

    We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.

  5. Interquantile Shrinkage in Regression Models

    PubMed Central

    Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.

    2012-01-01

    Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546

  6. Gaussian Process Regression Model in Spatial Logistic Regression

    NASA Astrophysics Data System (ADS)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One popular approach is based on the neighbourhood structure of the regions; unfortunately, it has some limitations, such as difficulty in prediction. We therefore offer Gaussian process regression (GPR) to address this issue. In this paper, we focus on spatial modeling with GPR for binomial data with a logit link function, and investigate the performance of the model. We discuss inference, covering how to estimate the parameters and hyper-parameters and how to predict. Furthermore, simulation studies are presented in the last section.
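
    The record couples a GP prior with a logit link for binomial data; the conjugate Gaussian-likelihood case below shows the core GPR machinery (kernel matrix, posterior mean and covariance) that such a model builds on. Kernel hyper-parameters here are fixed by hand rather than estimated:

```python
import numpy as np

def rbf(A, B, length=0.15, var=1.0):
    """Squared-exponential (RBF) kernel for 1-D inputs."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2)

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.05, size=15)
x_test = np.array([0.25, 0.5, 0.75])

# Standard GP regression equations with observation noise sigma^2.
noise = 0.05 ** 2
K = rbf(x_train, x_train) + noise * np.eye(15)
K_s = rbf(x_test, x_train)
post_mean = K_s @ np.linalg.solve(K, y_train)
post_cov = rbf(x_test, x_test) - K_s @ np.linalg.solve(K, K_s.T)
```

    For binomial data with a logit link, the Gaussian likelihood is replaced by a Bernoulli/binomial one and the posterior is no longer available in closed form, which is where approximate inference enters.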

  7. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    PubMed

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
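
    One simple way to build such explicitly constrained piecewise polynomials is the truncated power basis: a degree-d polynomial plus a (x - k)^d_+ term at each knot, which enforces continuity (and smoothness for d > 1) automatically. A fixed-knot sketch using ordinary least squares rather than a mixed model:

```python
import numpy as np

def truncated_power_basis(x, knots, degree):
    """Polynomial columns 1, x, ..., x^d plus (x - k)_+^d per knot."""
    cols = [x ** d for d in range(degree + 1)]
    cols += [np.clip(x - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
# Piecewise-linear truth, continuous at x = 5, plus noise.
y = np.where(x < 5, 0.5 * x, 2.5 + 2.0 * (x - 5)) + rng.normal(scale=0.05, size=200)

B = truncated_power_basis(x, knots=[5.0], degree=1)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
rmse = float(np.sqrt(np.mean((y - B @ coef) ** 2)))
# coef[2] estimates the slope *change* at the knot (2.0 - 0.5 = 1.5).
```
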

  8. Regression Analysis and the Sociological Imagination

    ERIC Educational Resources Information Center

    De Maio, Fernando

    2014-01-01

    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  9. BIODEGRADATION PROBABILITY PROGRAM (BIODEG)

    EPA Science Inventory

    The Biodegradation Probability Program (BIODEG) calculates the probability that a chemical under aerobic conditions with mixed cultures of microorganisms will biodegrade rapidly or slowly. It uses fragment constants developed using multiple linear and non-linear regressions and d...

  10. Algorithm For Solution Of Subset-Regression Problems

    NASA Technical Reports Server (NTRS)

    Verhaegen, Michel

    1991-01-01

    A reliable and flexible algorithm for solution of the subset-regression problem performs QR decomposition with a new column-pivoting strategy, enabling selection of the subset directly from the originally defined regression parameters. This feature, in combination with a number of extensions, makes the algorithm very flexible for use in the analysis of subset-regression problems in which the parameters have physical meanings. The algorithm is also extended to enable joint processing of columns contaminated by noise with those free of noise, without using scaling techniques.
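
    The column-pivoting idea can be sketched as greedy forward selection: at each step, admit the predictor column that most reduces the residual sum of squares, which mimics what pivoted QR does implicitly. This is an illustrative reimplementation, not the record's algorithm:

```python
import numpy as np

def greedy_subset(X, y, k):
    """Pick k columns of X, each time the one that most reduces the RSS."""
    selected = []
    for _ in range(k):
        best, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
            if rss < best_rss:
                best, best_rss = j, rss
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=100)
chosen = greedy_subset(X, y, 2)   # should recover columns 2 and 4
```
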

  11. Local Linear Regression for Data with AR Errors.

    PubMed

    Li, Runze; Li, Yan

    2009-07-01

    In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.
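
    Plain local linear regression, the building block the record extends with profile least squares for AR errors, fits a weighted straight line around each target point and reads off its value there. The sketch below assumes independent errors:

```python
import numpy as np

def local_linear(x, y, x0, bandwidth=0.05):
    """Gaussian-kernel local linear fit; the intercept is the value at x0."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    Xd = np.column_stack([np.ones_like(x), x - x0])
    WX = w[:, None] * Xd
    coef = np.linalg.solve(Xd.T @ WX, WX.T @ y)
    return coef[0]

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=200)

grid = np.array([0.25, 0.5, 0.75])
fit = np.array([local_linear(x, y, g) for g in grid])
```
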

  12. Incremental Net Effects in Multiple Regression

    ERIC Educational Resources Information Center

    Lipovetsky, Stan; Conklin, Michael

    2005-01-01

    A regular problem in regression analysis is estimating the comparative importance of the predictors in the model. This work considers the 'net effects', or shares of the predictors in the coefficient of the multiple determination, which is a widely used characteristic of the quality of a regression model. Estimation of the net effects can be a…

  13. Automation of Flight Software Regression Testing

    NASA Technical Reports Server (NTRS)

    Tashakkor, Scott B.

    2016-01-01

    NASA is developing the Space Launch System (SLS) to be a heavy lift launch vehicle supporting human and scientific exploration beyond Earth orbit. SLS will have a common core stage, an upper stage, and different permutations of boosters and fairings to perform various crewed or cargo missions. Marshall Space Flight Center (MSFC) is writing the Flight Software (FSW) that will operate the SLS launch vehicle. The FSW is developed in an incremental manner based on "Agile" software techniques. As the FSW is incrementally developed, the functionality of the code needs to be tested continually to ensure that the integrity of the software is maintained. Manually testing the functionality of an ever-growing set of requirements and features is not an efficient solution, so testing needs to be done automatically to ensure it is comprehensive. To support test automation, a framework for a regression test harness has been developed and used on SLS FSW. The test harness provides a modular design approach that can compile or read in the required information specified by the developer of the test. The modularity provides independence between groups of tests and the ability to add and remove tests without disturbing others. This gives the SLS FSW team a time-saving capability that is essential to meeting SLS Program technical and programmatic requirements. During development of SLS FSW, this technique has proved to be a useful tool to ensure all requirements have been tested and that desired functionality is maintained as changes occur. It also provides a mechanism for developers to check the functionality of the code they have developed. With this system, automation of regression testing is accomplished through a scheduling tool and/or commit hooks. Key advantages of this test harness capability include execution support for multiple independent test cases, the ability for developers to specify precisely what they are testing and how, the ability to add

  14. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844

  15. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly with a large number of factors, making it a prime field for applying machine learning concepts to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use while predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers to the features) is used to improve the model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points toward the best application of regression models, in addition to other techniques, to optimize the result.
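    A hedged sketch of the ridge idea mentioned above, on synthetic housing-style data (the feature, coefficients, and penalty are illustrative, not from the paper): polynomial features are standardized, and the ridge closed form shrinks the coefficients to curb overfitting while leaving the intercept unpenalized.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
area = rng.uniform(50, 250, n)                      # living area, m^2 (synthetic)
price = 30 + 2.1 * area + 0.002 * area**2 + rng.normal(0, 20, n)

# Polynomial features, standardized so the penalty treats them equally.
degree, lam = 5, 1.0
X = np.column_stack([area**d for d in range(1, degree + 1)])
X = (X - X.mean(0)) / X.std(0)
Xd = np.column_stack([np.ones(n), X])

# Ridge closed form: (X'X + lam*I)^-1 X'y, intercept unpenalized.
P = lam * np.eye(degree + 1)
P[0, 0] = 0.0
w = np.linalg.solve(Xd.T @ Xd + P, Xd.T @ price)

pred = Xd @ w
rss = ((price - pred) ** 2).sum()   # residual sum of squares
print(w[:3], rss)
```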

  16. Embedded Sensors for Measuring Surface Regression

    NASA Technical Reports Server (NTRS)

    Gramer, Daniel J.; Taagen, Thomas J.; Vermaak, Anton G.

    2006-01-01

    The development and evaluation of new hybrid and solid rocket motors requires accurate characterization of the propellant surface regression as a function of key operational parameters. These characteristics establish the propellant flow rate and are prime design drivers affecting the propulsion system geometry, size, and overall performance. There is a similar need for the development of advanced ablative materials, and the use of conventional ablatives exposed to new operational environments. The Miniature Surface Regression Sensor (MSRS) was developed to serve these applications. It is designed to be cast or embedded in the material of interest and regresses along with it. During this process, the resistance of the sensor is related to its instantaneous length, allowing the real-time thickness of the host material to be established. The time derivative of this data reveals the instantaneous surface regression rate. The MSRS could also be adapted to perform similar measurements for a variety of other host materials when it is desired to monitor thicknesses and/or regression rate for purposes of safety, operational control, or research. For example, the sensor could be used to monitor the thicknesses of brake linings or racecar tires and indicate when they need to be replaced. At the time of this reporting, over 200 of these sensors have been installed into a variety of host materials. An MSRS can be made in either of two configurations, denoted ladder and continuous (see Figure 1). A ladder MSRS includes two highly electrically conductive legs, across which narrow strips of electrically resistive material are placed at small increments of length. These strips resemble the rungs of a ladder and are electrically equivalent to many tiny resistors connected in parallel. A substrate material provides structural support for the legs and rungs. 
The instantaneous sensor resistance is read by an external signal conditioner via wires attached to the conductive legs on the
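    The ladder configuration's electrical behavior can be sketched numerically: the rungs act as identical resistors in parallel, so the net resistance rises as rungs are consumed along with the host material. All component values below are hypothetical, chosen only to illustrate the resistance-to-length mapping.

```python
import numpy as np

# Ladder MSRS model (illustrative values, not from the article):
# N identical resistive rungs in parallel; as the host material
# regresses, rungs burn away and the net resistance rises.
r_rung = 1000.0   # ohms per rung (hypothetical)
n_rungs = 50
spacing = 0.5     # mm between rungs (hypothetical)

remaining = np.arange(n_rungs, 0, -1)   # rungs left as the surface regresses
resistance = r_rung / remaining         # parallel combination: R / k
length = remaining * spacing            # instantaneous sensor length

# Resistance maps one-to-one to remaining length, so sampling R(t)
# and differentiating recovers the surface regression rate.
print(resistance[0], resistance[-1], length[0])
```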

  17. Regressive Imagery in Creative Problem-Solving: Comparing Verbal Protocols of Expert and Novice Visual Artists and Computer Programmers

    ERIC Educational Resources Information Center

    Kozbelt, Aaron; Dexter, Scott; Dolese, Melissa; Meredith, Daniel; Ostrofsky, Justin

    2015-01-01

    We applied computer-based text analyses of regressive imagery to verbal protocols of individuals engaged in creative problem-solving in two domains: visual art (23 experts, 23 novices) and computer programming (14 experts, 14 novices). Percentages of words involving primary process and secondary process thought, plus emotion-related words, were…

  18. Testing Different Model Building Procedures Using Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…

  19. Quantile Regression in the Study of Developmental Sciences

    PubMed Central

    Petscher, Yaacov; Logan, Jessica A. R.

    2014-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
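    Quantile regression can be sketched by minimizing the pinball (check) loss directly. The illustration below uses synthetic heteroscedastic data (not the High School and Beyond or Sustained Effects data), where the slope genuinely differs across the outcome's distribution, so the upper-quantile slope exceeds the lower-quantile slope.

```python
import numpy as np
from scipy.optimize import minimize

def pinball(beta, X, y, q):
    """Quantile (pinball) loss at quantile level q."""
    r = y - X @ beta
    return np.where(r >= 0, q * r, (q - 1) * r).sum()

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
# Heteroscedastic data: the spread grows with x, so quantile slopes differ.
y = 1.0 + 2.0 * x + (0.2 + 0.3 * x) * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

x0 = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the OLS fit
fits = {q: minimize(pinball, x0=x0, args=(X, y, q),
                    method='Nelder-Mead',
                    options={'maxiter': 2000}).x
        for q in (0.1, 0.5, 0.9)}
for q, b in fits.items():
    print(q, b)   # intercept, slope per quantile level
```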

  20. Tendency for age-specific mortality with hypertension in the European Union from 1980 to 2011.

    PubMed

    Tao, Lichan; Pu, Cunying; Shen, Shutong; Fang, Hongyi; Wang, Xiuzhi; Xuan, Qinkao; Xiao, Junjie; Li, Xinli

    2015-01-01

    Trends in mortality from hypertension have not been well characterized in the European Union (EU). Mortality data from 1980 to 2011 in the EU were used to calculate age-standardized mortality rates (ASMR, per 100,000), annual percentage change (APC), and average annual percentage change (AAPC). The Joinpoint Regression Program was used to compare changes in trend. Mortality rates in the most recent year studied varied between countries, with the highest rates observed in men in Slovakia and women in Estonia. A downward trend in ASMR was demonstrated over all age groups. Robust decreases in ASMR were observed for both men (1991-1994, APC = -13.54) and women (1996-1999, APC = -14.80) aged 55-65 years. The trend in systolic blood pressure (SBP) from 1980 to 2009 was consistent with the ASMR, and the largest decrease was observed among Belgian men and French women. In conclusion, SBP-associated ASMR decreased significantly on an annual basis from 1980 to 2009, while a slight increase was observed after 2009. Discrepancies in ASMR between EU countries have been significant during the last three decades. With a better understanding of trends in the prevalence of hypertension and its mortality, efforts can be made to improve awareness and support strict control of hypertension.
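    The APC reported by the Joinpoint Regression Program comes from a log-linear fit within each segment: log(rate) = a + b*year, with APC = 100*(e^b - 1). A one-segment sketch on hypothetical rates (not the EU data analyzed in the article):

```python
import numpy as np

# Illustrative ASMR series (hypothetical values, per 100,000),
# not the EU data analyzed in the article.
years = np.arange(2000, 2011)
rate = np.array([30.0, 28.7, 27.2, 26.1, 25.0, 23.8,
                 22.9, 21.8, 20.9, 20.0, 19.1])

# Joinpoint software fits log(rate) = a + b*year within each segment;
# the annual percent change is APC = 100 * (exp(b) - 1).
b, a = np.polyfit(years, np.log(rate), 1)
apc = 100.0 * (np.exp(b) - 1.0)
print(round(apc, 2))   # roughly -4.4% per year for this series
```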

  2. Trends in Suicide by Level of Urbanization - United States, 1999-2015.

    PubMed

    Kegler, Scott R; Stone, Deborah M; Holland, Kristin M

    2017-03-17

    Suicide is a major and continuing public health concern in the United States. During 1999-2015, approximately 600,000 U.S. residents died by suicide, with the highest annual rate occurring in 2015 (1). Annual county-level mortality data from the National Vital Statistics System (NVSS) and annual county-level population data from the U.S. Census Bureau were used to analyze suicide rate trends during 1999-2015, with special emphasis on comparing more urban and less urban areas. U.S. counties were grouped by level of urbanization using a six-level classification scheme. To evaluate rate trends, joinpoint regression methodology was applied to the time-series data for each level of urbanization. Suicide rates significantly increased over the study period for all county groupings and accelerated significantly in 2007-2008 for the medium metro, small metro, and non-metro groupings. Understanding suicide trends by urbanization level can help identify geographic areas of highest risk and focus prevention efforts. Communities can benefit from implementing policies, programs, and practices based on the best available evidence regarding suicide prevention and key risk factors. Many approaches are applicable regardless of urbanization level, whereas certain strategies might be particularly relevant in less urban areas affected by difficult economic conditions, limited access to helping services, and social isolation.
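    Joinpoint methodology locates the year at which a trend changes. A minimal sketch with a single joinpoint found by grid search over candidate breakpoints (synthetic rates with a built-in trend change, not the NVSS data):

```python
import numpy as np

def one_joinpoint(years, log_rate):
    """Grid search for a single joinpoint: fit two connected
    log-linear segments and keep the breakpoint minimizing SSE."""
    best = (np.inf, None)
    for k in years[2:-2]:                 # candidate joinpoints
        # Continuous piecewise-linear design: slope change at k.
        X = np.column_stack([np.ones_like(years), years,
                             np.maximum(years - k, 0)])
        beta, *_ = np.linalg.lstsq(X, log_rate, rcond=None)
        sse = ((log_rate - X @ beta) ** 2).sum()
        if sse < best[0]:
            best = (sse, k)
    return best[1]

# Hypothetical rate series with a trend acceleration in 2007.
years = np.arange(1999, 2016, dtype=float)
log_rate = np.where(years <= 2007,
                    2.0 + 0.01 * (years - 1999),
                    2.08 + 0.03 * (years - 2007))
jp = one_joinpoint(years, log_rate)
print(jp)  # 2007.0
```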

  3. Rounding the Regression

    ERIC Educational Resources Information Center

    Marland, Eric; Bossé, Michael J.; Rhoads, Gregory

    2018-01-01

    Rounding is a necessary step in many mathematical processes. We are taught early in our education about significant figures and how to properly round a number. So when we are given a data set and asked to find a regression line, we are inclined to offer the line with rounded coefficients to reflect our model. However, the effects are not as…

  4. Curriculum emphasis and resident preparation in postgraduate general dentistry programs.

    PubMed

    Lefever, Karen H; Atchison, Kathryn A; Mito, Ronald S; Lin, Sylvia

    2002-06-01

    In 1999 HRSA contracted with the UCLA School of Dentistry to evaluate the impact of federal funding on postgraduate general dentistry programs. Part of that evaluation analyzed curriculum emphasis and preparation of incoming residents in advanced general dentistry programs over a five-year period. Directors of 208 civilian AEGD and GPR programs were surveyed about the curriculum content of their programs, increased or decreased emphasis in thirty subject areas, and resident preparation and quality (GPA and National Board scores). Results indicate that curriculum changes in AEGD and GPR programs over the time period have been responsive to the changing nature of general practice. At least half of all program directors reported that their residents were less than adequately prepared in fourteen curriculum areas. Sub-analyses were conducted for AEGD/GPR programs and HRSA-funded versus nonfunded programs. Multivariate regression identified lower student quality as the most important program variable in predicting a perceived need for resident remediation. Logistic regression showed that programs with higher resident GPA and National Board Part I scores had less difficulty filling resident positions.

  5. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
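    The MMR model described above can be sketched in a few lines: the product term x*z carries the moderation effect, and the model is estimated by least squares (synthetic data; the true coefficients are chosen for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)          # predictor
z = rng.standard_normal(n)          # moderator
# True model: the slope of y on x depends on z (moderation).
y = 1.0 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(0, 0.5, n)

# MMR: include the product term x*z and estimate by least squares.
X = np.column_stack([np.ones(n), x, z, x * z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approx [1.0, 0.5, 0.3, 0.4]; beta[3] is the moderation effect
```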

  6. Independent contrasts and PGLS regression estimators are equivalent.

    PubMed

    Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

    2012-05-01

    We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.

  7. Logistic models--an odd(s) kind of regression.

    PubMed

    Jupiter, Daniel C

    2013-01-01

    The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
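    A minimal sketch of the logistic model fitted by plain gradient ascent on the log-likelihood (synthetic data; the learning rate and iteration count are illustrative): the exponentiated slope is the odds ratio, the quantity that distinguishes logistic from linear regression in interpretation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
# True model on the log-odds scale: logit(p) = -0.5 + 1.0 * x.
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
yobs = rng.binomial(1, p)

X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(5000):                       # plain gradient ascent
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
    beta += 0.001 * X.T @ (yobs - mu)       # log-likelihood gradient

odds_ratio = np.exp(beta[1])  # multiplicative change in odds per unit x
print(beta, odds_ratio)
```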

  8. Demonstration of a Fiber Optic Regression Probe

    NASA Technical Reports Server (NTRS)

    Korman, Valentin; Polzin, Kurt A.

    2010-01-01

    The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while the second and third are increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics makes it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers changes, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. 
An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for

  9. Correlation and simple linear regression.

    PubMed

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
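    The contrast between the two coefficients can be sketched with SciPy on synthetic data (not the CT-guided intervention data set): for a monotone but nonlinear relationship, the Spearman rho exceeds the Pearson r, while `linregress` gives the simple linear regression fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = np.exp(x) + rng.normal(0, 1, 200)   # monotone but nonlinear

pearson_r, _ = stats.pearsonr(x, y)     # linear association
spearman_rho, _ = stats.spearmanr(x, y) # monotone (rank) association

# Simple linear regression of y on x.
slope, intercept, r, pval, se = stats.linregress(x, y)
print(round(pearson_r, 3), round(spearman_rho, 3), round(slope, 1))
```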

  10. Why are we regressing?

    PubMed

    Jupiter, Daniel C

    2012-01-01

    In this first of a series of statistical methodology commentaries for the clinician, we discuss the use of multivariate linear regression. Copyright © 2012 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  11. Orthogonal Projection in Teaching Regression and Financial Mathematics

    ERIC Educational Resources Information Center

    Kachapova, Farida; Kachapov, Ilias

    2010-01-01

    Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
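    The geometric interpretation can be demonstrated directly on synthetic data: the hat matrix H = X(X'X)^{-1}X' is an orthogonal projection, the fitted values are the projection of y onto the column space of X, and the residual is orthogonal to that space.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = rng.standard_normal(n)

# Hat matrix: orthogonal projection onto the column space of X.
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y                # fitted values = projection of y
residual = y - y_hat

# The residual is orthogonal to every column of X, and the
# projection is idempotent (projecting twice changes nothing).
print(np.abs(X.T @ residual).max(), np.abs(H @ H - H).max())
```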

  12. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    NASA Astrophysics Data System (ADS)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by a Data Acquisition and Evaluation System. The authors coupled the collected TBM drive data with available information on rock mass properties; the data were cleansed, completed with secondary variables, and aggregated by weeks and shifts. Correlation and descriptive-statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR), and field penetration index (FPI) with TBM operational and geotechnical characteristics were developed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable, and the data were screened with different computational approaches, allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and response variables, to search for the best subsets of explanatory variables, and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than the constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  13. Logistic regression of family data from retrospective study designs.

    PubMed

    Whittemore, Alice S; Halpern, Jerry

    2003-11-01

    We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate for beta(RE) and a consistent estimate for its covariance matrix. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother

  14. Prediction of elemental creep. [steady state and cyclic data from regression analysis

    NASA Technical Reports Server (NTRS)

    Davis, J. W.; Rummler, D. R.

    1975-01-01

    Cyclic and steady-state creep tests were performed to provide data which were used to develop predictive equations. These equations, describing creep as a function of stress, temperature, and time, were developed through the use of a least squares regression analysis computer program for both the steady-state and cyclic data sets. Comparison of the data from the two types of tests revealed that there was no significant difference between the cyclic and steady-state creep strains for the L-605 sheet under the experimental conditions investigated (for the same total time at load). Attempts to develop a single linear equation describing the combined steady-state and cyclic creep data resulted in standard errors of estimate higher than those obtained for the individual data sets. A proposed approach to predict elemental creep in metals uses the cyclic creep equation and a computer program which applies strain and time hardening theories of creep accumulation.

  15. Background stratified Poisson regression analysis of cohort data.

    PubMed

    Richardson, David B; Langholz, Bryan

    2012-03-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

  16. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    NASA Astrophysics Data System (ADS)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study using Minnesota, USA, during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
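    First-order harmonic regression of the kind described above can be sketched as an ordinary least squares fit of a Fourier design with an annual period (a synthetic, irregularly sampled series stands in for the Landsat time series):

```python
import numpy as np

rng = np.random.default_rng(0)
# Irregularly timed observations over five years (t in years),
# mimicking a sparse Landsat-style time series.
t = np.sort(rng.uniform(0, 5, 80))
y = (0.4 + 0.25 * np.cos(2 * np.pi * t) + 0.10 * np.sin(2 * np.pi * t)
     + 0.02 * rng.standard_normal(80))

# First-order Fourier (harmonic) design with an annual period:
# y ~ a0 + a1*cos(2*pi*t) + b1*sin(2*pi*t), fit by least squares.
X = np.column_stack([np.ones_like(t),
                     np.cos(2 * np.pi * t),
                     np.sin(2 * np.pi * t)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

amplitude = np.hypot(coef[1], coef[2])   # seasonal amplitude
phase = np.arctan2(coef[2], coef[1])     # seasonal phase
print(coef, amplitude)
```

The estimated coefficients (mean level, cosine and sine terms, or equivalently amplitude and phase) are the kind of predictor variables the study feeds into k-nearest neighbors and random forests.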

  17. Regression: The Apple Does Not Fall Far From the Tree.

    PubMed

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
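
    As a minimal worked example of the simple linear regression the tutorial begins with, the closed-form OLS estimates follow directly from sums of squares (the data here are toy values for illustration only):

```python
def simple_ols(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # slope: covariance over variance
    b0 = my - b1 * mx       # intercept: line passes through the means
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x
b0, b1 = simple_ols(x, y)
```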

  18. Old age pension and intergenerational living arrangements: a regression discontinuity design

    PubMed Central

    2017-01-01

    China launched a pension program for rural residents in 2009, now covering more than 300 million Chinese. This program offers a unique setting for studying the ageing population, given the rapidity of China’s population ageing, traditions of filial piety and co-residence, decreasing number of children, and dearth of formal social security, at a relatively low income level. This paper examines whether receipt of the old-age pension payment equips elderly parents and their adult children to live apart and whether parents substitute service consumption for the time their children spend providing instrumental support. Applying a regression discontinuity design to a primary longitudinal survey conducted in Guizhou province of China, this paper overcomes challenges in the literature that households eligible for pension payment might be systematically different from ineligible households and that it is difficult to separate the effect of pension from that of age or cohort heterogeneity. Around the pension eligibility age cut-off, results reveal a large and significant reduction in intergenerational co-residence of the extended family and an increase in service consumption among elderly parents. PMID:29051720
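
    The identification strategy, comparing households just below and just above the eligibility age, can be sketched as a sharp regression-discontinuity estimate: fit separate lines on each side of the cutoff and take the jump between them at the cutoff. The cutoff, bandwidth, and co-residence rates below are hypothetical, not from the survey.

```python
def rd_estimate(age, y, cutoff, bandwidth):
    """Sharp regression-discontinuity sketch: OLS lines fit within
    `bandwidth` of the cutoff on each side; returns the jump at the cutoff."""
    def ols(pts):
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts)
        b1 = sum((p[0] - mx) * (p[1] - my) for p in pts) / sxx
        return my - b1 * mx, b1
    left = [(a, v) for a, v in zip(age, y) if cutoff - bandwidth <= a < cutoff]
    right = [(a, v) for a, v in zip(age, y) if cutoff <= a <= cutoff + bandwidth]
    bl0, bl1 = ols(left)
    br0, br1 = ols(right)
    return (br0 + br1 * cutoff) - (bl0 + bl1 * cutoff)

age = list(range(50, 71))
# Hypothetical co-residence rate that drops by 0.2 at a pension age of 60
y = [(0.7 if a < 60 else 0.5) - 0.005 * (a - 60) for a in age]
jump = rd_estimate(age, y, cutoff=60, bandwidth=10)
```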

  19. Investigating the Performance of Alternate Regression Weights by Studying All Possible Criteria in Regression Models with a Fixed Set of Predictors

    ERIC Educational Resources Information Center

    Waller, Niels; Jones, Jeff

    2011-01-01

    We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…

  20. The role of verbal memory in regressions during reading.

    PubMed

    Guérard, Katherine; Saint-Aubin, Jean; Maltais, Marilyne

    2013-01-01

    During reading, participants generally move their eyes rightward on the line. A number of eye movements, called regressions, are made leftward, to words that have already been fixated. In the present study, we investigated the role of verbal memory during regressions. In Experiment 1, participants were asked to read sentences for comprehension. After reading, they were asked to make a regression to a target word presented auditorily. The results revealed that their regressions were guided by memory, as they differed from those of a control group who did not read the sentences. The role of verbal memory during regressions was then investigated by combining the reading task with articulatory suppression (Exps. 2 and 3). The results showed that articulatory suppression affected the size and the accuracy of the initial regression but had a minimal effect on corrective saccades. This suggests that verbal memory plays an important role in determining the location of the initial saccade during regressions.

  1. Quasi-experimental evidence on tobacco tax regressivity.

    PubMed

    Koch, Steven F

    2018-01-01

    Tobacco taxes are known to reduce tobacco consumption and to be regressive, such that tobacco control policy may have the perverse effect of further harming the poor. However, if tobacco consumption falls faster amongst the poor than the rich, tobacco control policy can actually be progressive. We take advantage of persistent and committed tobacco control activities in South Africa to examine the household tobacco expenditure burden. For the analysis, we make use of two South African Income and Expenditure Surveys (2005/06 and 2010/11) that span a series of such tax increases and have been matched across the years, yielding 7806 matched pairs of tobacco consuming households and 4909 matched pairs of cigarette consuming households. By matching households across the surveys, we are able to examine both the regressivity of the household tobacco burden and any change in that regressivity, and since tobacco taxes have been a consistent component of tobacco prices, our results also relate to the regressivity of tobacco taxes. Like previous research into cigarette and tobacco expenditures, we find that the tobacco burden is regressive; thus, so are tobacco taxes. However, we find that over the five-year period considered, the tobacco burden has decreased and, most importantly, falls less heavily on the poor. Thus, the tobacco burden and the tobacco tax are less regressive in 2010/11 than in 2005/06. Hence, increased tobacco taxes can, in at least some circumstances, reduce the financial burden that tobacco places on households. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Influence diagnostics in meta-regression model.

    PubMed

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies influence diagnostics in the meta-regression model, including case deletion diagnostics and local influence analysis. We derive the subset deletion formulae for the estimation of the regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residuals and leverage measures are defined. Local influence analyses based on the case-weights, response, covariate, and within-variance perturbation schemes are explored. We introduce a method that simultaneously perturbs responses, covariates, and within-variances to obtain the local influence measure, which has the advantage of allowing the influence magnitudes of influential studies to be compared across different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.

  3. Neither fixed nor random: weighted least squares meta-regression.

    PubMed

    Stanley, T D; Doucouliagos, Hristos

    2017-03-01

    Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how and explain why an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias, is as good as FE-MRA in all cases, and is better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd.
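
    A minimal sketch of the weighted fit at the core of a WLS meta-regression, with inverse-variance weights 1/se². The numbers are toy values; note that the unrestricted multiplicative variance term discussed in the paper affects standard errors rather than these point estimates.

```python
def wls_meta_regression(x, y, se):
    """Weighted least-squares regression of effect sizes y on a
    moderator x, weighting each study by 1/se^2."""
    w = [1 / s ** 2 for s in se]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Toy meta-regression: effects lie exactly on 0.2 + 0.3*x
x = [0, 1, 0, 1]
y = [0.2, 0.5, 0.2, 0.5]
se = [0.1, 0.2, 0.1, 0.2]
b0, b1 = wls_meta_regression(x, y, se)
```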

  4. Ridge Regression Signal Processing

    NASA Technical Reports Server (NTRS)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
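
    For a single centered predictor, the ridge estimator has a closed form that makes the shrinkage behavior explicit; this generic sketch illustrates the principle only and is not the recursive ridge estimator derived in the report.

```python
def ridge_slope(x, y, k):
    """Ridge estimate for one centered predictor:
    b(k) = Sxy / (Sxx + k); k = 0 recovers ordinary least squares,
    and larger k shrinks the slope toward zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / (sxx + k)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]                 # exactly y = 2x
b_ols = ridge_slope(x, y, 0)     # OLS slope
b_ridge = ridge_slope(x, y, 5.0) # shrunken ridge slope
```

    Under poor geometry (near-collinear design), adding k to the denominator keeps the estimate stable at the cost of a small bias, which is the trade-off motivating ridge-based filtering.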

  5. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
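
    The outlier-flagging half of the ROUT idea can be sketched as scoring residuals against a robust median/MAD scale and then applying a Benjamini-Hochberg false-discovery-rate cut. This is a simplification that assumes the residuals from the robust curve fit are already in hand; the published method uses its own Lorentzian-based fit and its own FDR adaptation.

```python
import math

def rout_style_outliers(resid, q=0.01):
    """Flag residuals as outliers: robust scale from median/MAD,
    two-sided normal p-values, Benjamini-Hochberg cut at FDR q."""
    s = sorted(resid)
    med = s[len(s) // 2]                       # simple upper median
    mad = sorted(abs(r - med) for r in resid)[len(resid) // 2]
    scale = 1.4826 * mad                       # MAD -> Gaussian sigma
    # Two-sided normal p-value: P(|Z| > z) = erfc(z / sqrt(2))
    p = [math.erfc(abs(r - med) / (scale * math.sqrt(2))) for r in resid]
    order = sorted(range(len(p)), key=lambda i: p[i])
    n, cut = len(p), -1
    for rank, i in enumerate(order, start=1):
        if p[i] <= q * rank / n:               # Benjamini-Hochberg step-up
            cut = rank
    return set(order[:cut]) if cut > 0 else set()

resid = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05, 0.1, 50.0]
flagged = rout_style_outliers(resid)           # only index 9 is flagged
```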

  6. Robust mislabel logistic regression without modeling mislabel probabilities.

    PubMed

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  7. Impact of the global financial crisis on low birth weight in Portugal: a time-trend analysis.

    PubMed

    Kana, Musa Abubakar; Correia, Sofia; Peleteiro, Barbara; Severo, Milton; Barros, Henrique

    2017-01-01

    The 2007-2008 global financial crisis had adverse consequences on the population health of affected European countries. Few contemporary studies have examined its effect on perinatal indicators with long-lasting influence on adult health. Therefore, in this study, we investigated the impact of the 2007-2008 global financial crisis on low birth weight (LBW) in Portugal. Data on 2 045 155 singleton births from 1995-2014 were obtained from Statistics Portugal. Joinpoint regression analysis was performed to identify the years in which changes in LBW trends occurred and to estimate the annual per cent changes (APC). LBW risks by time period, expressed as prevalence ratios, were computed using Poisson regression. Contextual changes in sociodemographic and economic factors were described by their trends. The joinpoint analysis identified 3 distinct periods (2 joinpoints) with different APC in LBW, corresponding to 1995-1999 (APC=4.4; 95% CI 3.2 to 5.6), 2000-2006 (APC=0.1; 95% CI -0.5 to 0.7) and 2007-2014 (APC=1.6; 95% CI 1.2 to 2.0). For non-Portuguese mothers, the periods were, respectively, 1995-1999 (APC=1.4; 95% CI -3.9 to 7.0), 2000-2007 (APC=-4.2; 95% CI -6.4 to -2.0) and 2008-2014 (APC=3.1; 95% CI 0.8 to 5.5). Compared with 1995-1999, all specific maternal characteristics had a 10-15% increase in LBW risk in 2000-2006 and a 20-25% increase in 2007-2014, except among migrants, for whom LBW risk remained lower than in 1995-1999 but increased after the crisis. The increasing LBW risk coincides with a deceleration in the gross domestic product growth rate and reductions in health expenditure and in social protection allocations for family/children support and sickness. The 2007-2008 global financial crisis was associated with a significant increase in LBW, particularly among infants of non-Portuguese mothers. We recommend strengthening social policies aimed at maternity protection for vulnerable mothers and health system maintenance of social equity in perinatal healthcare.
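
    Within each segment identified by a joinpoint analysis, the annual per cent change is the back-transformed slope of a log-linear fit: APC = 100·(exp(b) − 1), where b is the OLS slope of log(rate) on year. A minimal sketch on synthetic rates growing exactly 2% per year (illustrative data, not the Portuguese LBW series):

```python
import math

def annual_percent_change(years, rates):
    """APC from a log-linear segment fit, as used between joinpoints:
    log(rate) = b0 + b1*year, APC = 100 * (exp(b1) - 1)."""
    ly = [math.log(r) for r in rates]
    n = len(years)
    mx, my = sum(years) / n, sum(ly) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(years, ly))
          / sum((x - mx) ** 2 for x in years))
    return 100 * (math.exp(b1) - 1)

# A segment whose rate grows exactly 2% per year has APC = 2
rates = [10 * 1.02 ** t for t in range(8)]
apc = annual_percent_change(list(range(1995, 2003)), rates)
```

    The joinpoint software additionally searches for the segment boundaries themselves by testing candidate change points; the APC formula above applies within each fitted segment.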

  8. Compound Identification Using Penalized Linear Regression on Metabolomics

    PubMed Central

    Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho

    2014-01-01

    Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data are high dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894

  9. Application of Regression-Discontinuity Analysis in Pharmaceutical Health Services Research

    PubMed Central

    Zuckerman, Ilene H; Lee, Euni; Wutoh, Anthony K; Xue, Zhenyi; Stuart, Bruce

    2006-01-01

    Objective To demonstrate how a relatively underused design, regression-discontinuity (RD), can provide robust estimates of intervention effects when stronger designs are impossible to implement. Data Sources/Study Setting Administrative claims from a Mid-Atlantic state Medicaid program were used to evaluate the effectiveness of an educational drug utilization review intervention. Study Design Quasi-experimental design. Data Collection/Extraction Methods A drug utilization review study was conducted to evaluate a letter intervention to physicians treating Medicaid children with potentially excessive use of short-acting β2-agonist inhalers (SAB). The outcome measure is change in seasonally-adjusted SAB use 5 months pre- and postintervention. To determine if the intervention reduced monthly SAB utilization, results from an RD analysis are compared to findings from a pretest–posttest design using repeated-measure ANOVA. Principal Findings Both analyses indicated that the intervention significantly reduced SAB use among the high users. Average monthly SAB use declined by 0.9 canisters per month (p<.001) according to the repeated-measure ANOVA and by 0.2 canisters per month (p<.001) from RD analysis. Conclusions Regression-discontinuity design is a useful quasi-experimental methodology that has significant advantages in internal validity compared to other pre–post designs when assessing interventions in which subjects' assignment is based on cutoff scores for a critical variable. PMID:16584464

  10. A Simulation Investigation of Principal Component Regression.

    ERIC Educational Resources Information Center

    Allen, David E.

    Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…

  11. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    PubMed

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects, with global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicates that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are clustered only in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus, governments can use the results to manage fire safety at the city scale.
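
    The local fits that distinguish GWR from a global linear model weight each observation by a spatial kernel centered on the regression point, so nearby observations dominate the local coefficient estimates. The two-cluster data below are artificial, chosen so the local slope genuinely differs by location; the study's GTWR additionally uses a temporal kernel.

```python
import math

def gwr_local_slope(coords, x, y, loc, bandwidth):
    """GWR sketch: weighted OLS slope at `loc`, with Gaussian kernel
    weights that decay with distance from the regression point."""
    w = [math.exp(-((cx - loc[0]) ** 2 + (cy - loc[1]) ** 2)
                  / (2 * bandwidth ** 2)) for cx, cy in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

# Two distant clusters where the predictor-outcome slope differs
coords = [(0, 0), (1, 0), (0, 1), (1, 1),
          (100, 0), (101, 0), (100, 1), (101, 1)]
x = [0, 1, 2, 3, 0, 1, 2, 3]
y = [0, 2, 4, 6, 0, -1, -2, -3]      # slope +2 near origin, -1 far away
slope_near = gwr_local_slope(coords, x, y, (0.5, 0.5), 2.0)
slope_far = gwr_local_slope(coords, x, y, (100.5, 0.5), 2.0)
```

    A global LM fit to the pooled data would average these opposing slopes away, which is exactly the heterogeneity GWR is designed to surface.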

  12. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression

    PubMed Central

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-01-01

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects, with global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicates that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are clustered only in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus, governments can use the results to manage fire safety at the city scale. PMID:28397745

  13. Regularized matrix regression

    PubMed Central

    Zhou, Hua; Li, Lexin

    2014-01-01

    Summary Modern technologies are producing a wealth of data with complex structures. For instance, in two-dimensional digital imaging, flow cytometry and electroencephalography, matrix-type covariates frequently arise when measurements are obtained for each combination of two underlying variables. To address scientific questions arising from those data, new regression methods that take matrices as covariates are needed, and sparsity or other forms of regularization are crucial owing to the ultrahigh dimensionality and complex structure of the matrix data. The popular lasso and related regularization methods hinge on the sparsity of the true signal in terms of the number of its non-zero coefficients. However, for the matrix data, the true signal is often of, or can be well approximated by, a low rank structure. As such, the sparsity is frequently in the form of low rank of the matrix parameters, which may seriously violate the assumption of the classical lasso. We propose a class of regularized matrix regression methods based on spectral regularization. A highly efficient and scalable estimation algorithm is developed, and a degrees-of-freedom formula is derived to facilitate model selection along the regularization path. Superior performance of the method proposed is demonstrated on both synthetic and real examples. PMID:24648830

  14. Regression away from the mean: Theory and examples.

    PubMed

    Schwarz, Wolf; Reike, Dennis

    2018-02-01

    Using a standard repeated measures model with arbitrary true score distribution and normal error variables, we present some fundamental closed-form results which explicitly indicate the conditions under which regression effects towards (RTM) and away from the mean are expected. Specifically, we show that for skewed and bimodal distributions many or even most cases will show a regression effect that is in expectation away from the mean, or that is not just towards but actually beyond the mean. We illustrate our results in quantitative detail with typical examples from experimental and biometric applications, which exhibit a clear regression away from the mean ('egression from the mean') signature. We aim not to repeal cautionary advice against potential RTM effects, but to present a balanced view of regression effects, based on a clear identification of the conditions governing the form that regression effects take in repeated measures designs. © 2017 The British Psychological Society.
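
    The classic toward-the-mean effect that this paper generalizes is easy to reproduce by simulation: cases selected for extreme time-1 scores partly owe their extremity to measurement error, so their time-2 scores fall back on average. The simulation below is a quick illustration with skewed (exponential) true scores, not the authors' closed-form results.

```python
import random

random.seed(1)
n = 100_000
# Right-skewed true scores plus independent normal error on each occasion
true = [random.expovariate(1.0) for _ in range(n)]
t1 = [t + random.gauss(0, 1) for t in true]
t2 = [t + random.gauss(0, 1) for t in true]

grand_mean = sum(t1) / n
# Cases selected for scoring well above the mean at time 1...
high = [(a, b) for a, b in zip(t1, t2) if a > grand_mean + 2]
# ...score lower on average at time 2: regression toward the mean
drop = sum(a - b for a, b in high) / len(high)
```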

  15. Two Paradoxes in Linear Regression Analysis.

    PubMed

    Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong

    2016-12-25

    Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.

  16. Superquantile Regression: Theory, Algorithms, and Applications

    DTIC Science & Technology

    2014-12-01

    Example C: Stack loss data scatterplot matrix. [The remainder of this record consists of fragments of coefficient tables comparing least-squares and quantile/superquantile regression fits of the stack loss data (regression coefficients with adjusted R-squared values); the tables did not survive text extraction.]

  17. A regression-based 3-D shoulder rhythm.

    PubMed

    Xu, Xu; Lin, Jia-hua; McGorry, Raymond W

    2014-03-21

    In biomechanical modeling of the shoulder, it is important to know the orientation of each bone in the shoulder girdle when estimating the loads on each musculoskeletal element. However, because of the soft tissue overlying the bones, it is difficult to accurately derive the orientation of the clavicle and scapula using surface markers during dynamic movement. The purpose of this study is to develop two regression models which predict the orientation of the clavicle and the scapula. The first regression model uses humerus orientation and individual factors such as age, gender, and anthropometry data as the predictors. The second regression model includes only the humerus orientation as the predictor. Thirty-eight participants performed 118 static postures covering the volume of the right hand reach. The orientation of the thorax, clavicle, scapula and humerus were measured with a motion tracking system. Regression analysis was performed on the Euler angles decomposed from the orientation of each bone from 26 randomly selected participants. The regression models were then validated with the remaining 12 participants. The results indicate that for the first model, the r(2) of the predicted orientation of the clavicle and the scapula ranged between 0.31 and 0.65, and the RMSE obtained from the validation dataset ranged from 6.92° to 10.39°. For the second model, the r(2) ranged between 0.19 and 0.57, and the RMSE obtained from the validation dataset ranged from 6.62° to 11.13°. The derived regression-based shoulder rhythm could be useful in future biomechanical modeling of the shoulder. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

  18. Examination of influential observations in penalized spline regression

    NASA Astrophysics Data System (ADS)

    Türkan, Semra

    2013-10-01

    In parametric or nonparametric regression models, the results of regression analysis are affected by anomalous observations in the data set. Thus, detection of these observations is one of the major steps in regression analysis. Such observations can be precisely detected by well-known influence measures, of which Pena's statistic is one. In this study, Pena's approach is formulated for penalized spline regression in terms of ordinary residuals and leverages. Real and artificial data are used to illustrate the effectiveness of Pena's statistic relative to Cook's distance in detecting influential observations. The results of the study clearly reveal that the proposed measure is superior to Cook's distance for detecting these observations in large data sets.
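
    For reference, Cook's distance, the benchmark influence measure in this comparison, combines each observation's residual with its leverage. Below is the standard computation for simple linear regression with toy data (a generic illustration, not the study's penalized-spline setting):

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression:
    D_i = r_i^2 / (p * MSE) * h_i / (1 - h_i)^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2                                       # intercept + slope
    mse = sum(r ** 2 for r in resid) / (n - p)
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [r ** 2 / (p * mse) * h / (1 - h) ** 2
            for r, h in zip(resid, lev)]

x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 20]   # last point is high-leverage and off the trend
d = cooks_distance(x, y)
```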

  19. Stepwise versus Hierarchical Regression: Pros and Cons

    ERIC Educational Resources Information Center

    Lewis, Mitzi

    2007-01-01

    Multiple regression is commonly used in social and behavioral data analysis. In multiple regression contexts, researchers are very often interested in determining the "best" predictors in the analysis. This focus may stem from a need to identify those predictors that are supportive of theory. Alternatively, the researcher may simply be interested…

  20. Cascade Optimization for Aircraft Engines With Regression and Neural Network Analysis - Approximators

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.

    2000-01-01

    The NASA Engine Performance Program (NEPP) can configure and analyze almost any type of gas turbine engine that can be generated through the interconnection of a set of standard physical components. In addition, the code can optimize engine performance by changing adjustable variables under a set of constraints. However, for engine cycle problems at certain operating points, the NEPP code can encounter difficulties: nonconvergence in the currently implemented Powell's optimization algorithm and deficiencies in the Newton-Raphson solver during engine balancing. A project was undertaken to correct these deficiencies. Nonconvergence was avoided through a cascade optimization strategy, and deficiencies associated with engine balancing were eliminated through neural network and linear regression methods. An approximation-interspersed cascade strategy was used to optimize the engine's operation over its flight envelope. Replacement of Powell's algorithm by the cascade strategy improved the optimization segment of the NEPP code. The performance of the linear regression and neural network methods as alternative engine analyzers was found to be satisfactory. This report considers two examples-a supersonic mixed-flow turbofan engine and a subsonic waverotor-topped engine-to illustrate the results, and it discusses insights gained from the improved version of the NEPP code.

  1. Blood glucose level prediction based on support vector regression using mobile platforms.

    PubMed

    Reymann, Maximilian P; Dorschky, Eva; Groh, Benjamin H; Martindale, Christine; Blank, Peter; Eskofier, Bjoern M

    2016-08-01

    The correct treatment of diabetes is vital to a patient's health: staying within defined blood glucose levels prevents dangerous short- and long-term effects on the body. Mobile devices informing patients about their future blood glucose levels could enable them to take counter-measures to prevent hypo- or hyperglycemic periods. Previous work addressed this challenge by predicting blood glucose levels using regression models. However, these approaches either required a physiological model, representing the human body's response to insulin and glucose intake, or were not directly applicable to mobile platforms (smart phones, tablets). In this paper, we propose an algorithm for mobile platforms to predict blood glucose levels without the need for a physiological model. Using an online software simulator program, we trained a Support Vector Regression (SVR) model and exported the parameter settings to our mobile platform. The prediction accuracy of our mobile platform was evaluated with pre-recorded data of a type 1 diabetes patient. The blood glucose level was predicted with an error of 19% compared to the true value. Considering the permitted error of 15% for commercially used devices, our algorithm forms the basis for further development of mobile prediction algorithms.

  2. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  3. Regression Analysis: Legal Applications in Institutional Research

    ERIC Educational Resources Information Center

    Frizell, Julie A.; Shippen, Benjamin S., Jr.; Luna, Andrew L.

    2008-01-01

    This article reviews multiple regression analysis, describes how its results should be interpreted, and instructs institutional researchers on how to conduct such analyses using an example focused on faculty pay equity between men and women. The use of multiple regression analysis will be presented as a method with which to compare salaries of…

  4. Two Paradoxes in Linear Regression Analysis

    PubMed Central

    FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

    2016-01-01

    Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214

  5. Using Regression Equations Built from Summary Data in the Psychological Assessment of the Individual Case: Extension to Multiple Regression

    ERIC Educational Resources Information Center

    Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.

    2012-01-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…

  6. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, relying on the repeated partitioning of a sample into subgroups according to a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
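
    The partitioning criterion can be made concrete. The sketch below (a toy illustration, not from the paper) performs a single regression-tree step: it scans candidate thresholds on one predictor and keeps the split minimizing the summed squared error around the two subgroup means:

```python
def best_split(xs, ys):
    """One CART step: the threshold on a single predictor that minimizes
    the summed squared error around the two subgroup means."""
    pairs = sorted(zip(xs, ys))
    best_sse, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum((y - sum(left) / len(left)) ** 2 for y in left) \
            + sum((y - sum(right) / len(right)) ** 2 for y in right)
        if sse < best_sse:
            best_sse = sse
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold

# The outcome jumps between x = 4 and x = 6, so the split lands at 5.0.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1]
```

    A full tree repeats this step recursively within each subgroup until a stopping rule is met, which is what makes the method non-parametric and non-linear.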

  7. Principles of Quantile Regression and an Application

    ERIC Educational Resources Information Center

    Chen, Fang; Chalhoub-Deville, Micheline

    2014-01-01

    Newer statistical procedures are typically introduced to help address the limitations of those already in practice or to deal with emerging research needs. Quantile regression (QR) is introduced in this paper as a relatively new methodology, which is intended to overcome some of the limitations of least squares mean regression (LMR). QR is more…

  8. Examining the Association between Patient-Reported Symptoms of Attention and Memory Dysfunction with Objective Cognitive Performance: A Latent Regression Rasch Model Approach.

    PubMed

    Li, Yuelin; Root, James C; Atkinson, Thomas M; Ahles, Tim A

    2016-06-01

    Patient-reported cognition generally exhibits poor concordance with objectively assessed cognitive performance. In this article, we introduce latent regression Rasch modeling and provide a step-by-step tutorial for applying Rasch methods as an alternative to traditional correlation to better clarify the relationship of self-report and objective cognitive performance. An example analysis using these methods is also included. Introduction to latent regression Rasch modeling is provided together with a tutorial on implementing it using the JAGS programming language for the Bayesian posterior parameter estimates. In an example analysis, data from a longitudinal neurocognitive outcomes study of 132 breast cancer patients and 45 non-cancer matched controls that included self-report and objective performance measures pre- and post-treatment were analyzed using both conventional and latent regression Rasch model approaches. Consistent with previous research, conventional analysis and correlations between neurocognitive decline and self-reported problems were generally near zero. In contrast, application of latent regression Rasch modeling found statistically reliable associations between objective attention and processing speed measures with self-reported Attention and Memory scores. Latent regression Rasch modeling, together with correlation of specific self-reported cognitive domains with neurocognitive measures, helps to clarify the relationship of self-report with objective performance. While the majority of patients attribute their cognitive difficulties to memory decline, the Rasch modeling suggests the importance of processing speed and initial learning. To encourage the use of this method, a step-by-step guide and programming language for implementation are provided. Implications of this method in cognitive outcomes research are discussed. © The Author 2016. Published by Oxford University Press. All rights reserved.
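
    For orientation, the measurement model underneath this approach is the Rasch model, which gives the probability of a correct response from a person ability and an item difficulty; the latent regression extension additionally regresses ability on covariates. A minimal sketch of the base model only (not the authors' JAGS implementation):

```python
import math

def rasch_probability(theta, difficulty):
    """Rasch model: probability of a correct response for a person with
    ability theta on an item with the given difficulty (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))
```

    When ability equals difficulty the probability is exactly 0.5; in the latent regression variant, theta is itself modeled as a linear function of covariates plus noise.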

  9. Using Time-Series Regression to Predict Academic Library Circulations.

    ERIC Educational Resources Information Center

    Brooks, Terrence A.

    1984-01-01

    Four methods were used to forecast monthly circulation totals in 15 midwestern academic libraries: dummy time-series regression, lagged time-series regression, simple average (straight-line forecasting), monthly average (naive forecasting). In tests of forecasting accuracy, dummy regression method and monthly mean method exhibited smallest average…

  10. Simple and multiple linear regression: sample size considerations.

    PubMed

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
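
    For the simple-regression case, the closed-form variance formulae the article draws on can be written down directly; sigma2 denotes the residual variance, and the numbers in any call are purely illustrative:

```python
def slope_variance(xs, sigma2):
    """Var(b1) = sigma^2 / Sxx: sampling variance of the fitted slope
    in simple linear regression."""
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sigma2 / sxx

def mean_response_variance(xs, sigma2, x0):
    """Variance of the fitted mean response at covariate value x0:
    sigma^2 * (1/n + (x0 - xbar)^2 / Sxx)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sigma2 * (1.0 / n + (x0 - xbar) ** 2 / sxx)
```

    Etiological research cares mainly about the first quantity; the clinical, profile-focused genre cares about the second, whose dependence on x0 and on the spread of the xs is what drives the different sample size considerations.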

  11. Breeding value accuracy estimates for growth traits using random regression and multi-trait models in Nelore cattle.

    PubMed

    Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G

    2011-06-28

    We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. 
With random regression models, the highest gains in accuracy were obtained at ages with a low number of
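
    The Legendre covariables used in such random regression models are obtained by standardizing age to [-1, 1] and evaluating the polynomials by recurrence; a minimal sketch (the ages and order below are illustrative, not the paper's):

```python
def legendre_basis(age, age_min, age_max, order):
    """Evaluate Legendre polynomials P_0..P_order at an age standardized
    to [-1, 1], as used for random regression covariables."""
    x = -1.0 + 2.0 * (age - age_min) / (age_max - age_min)
    polys = [1.0, x]
    for n in range(1, order):
        # Bonnet recurrence: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        polys.append(((2 * n + 1) * x * polys[n] - n * polys[n - 1]) / (n + 1))
    return polys[:order + 1]

# Basis for an age halfway through the trajectory, up to order 2.
basis = legendre_basis(4.0, 0.0, 8.0, 2)
```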

  12. Quantile regression models of animal habitat relationships

    USGS Publications Warehouse

    Cade, Brian S.

    2003-01-01

    Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). Depending on whether interactions of the measured habitat and unmeasured variable were negative
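
    The estimator behind these chapters minimizes an asymmetric 'check' loss rather than squared error. The intercept-only sketch below (toy data) shows the defining property: the constant minimizing the total check loss is the sample tau-quantile:

```python
def check_loss(tau, residual):
    """Quantile regression check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(tau, ys, grid):
    """Grid search for the constant minimizing total check loss."""
    return min(grid, key=lambda c: sum(check_loss(tau, y - c) for y in ys))

ys = [1.0, 2.0, 3.0, 4.0, 5.0]
median_fit = best_constant(0.5, ys, ys)   # recovers the sample median
upper_fit = best_constant(0.9, ys, ys)    # recovers an upper quantile
```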

  13. Threshold regression to accommodate a censored covariate.

    PubMed

    Qian, Jing; Chiou, Sy Han; Maye, Jacqueline E; Atem, Folefac; Johnson, Keith A; Betensky, Rebecca A

    2018-06-22

    In several common study designs, regression modeling is complicated by the presence of censored covariates. Examples of such covariates include maternal age of onset of dementia that may be right censored in an Alzheimer's amyloid imaging study of healthy subjects, metabolite measurements that are subject to limit of detection censoring in a case-control study of cardiovascular disease, and progressive biomarkers whose baseline values are of interest, but are measured post-baseline in longitudinal neuropsychological studies of Alzheimer's disease. We propose threshold regression approaches for linear regression models with a covariate that is subject to random censoring. Threshold regression methods allow for immediate testing of the significance of the effect of a censored covariate. In addition, they provide for unbiased estimation of the regression coefficient of the censored covariate. We derive the asymptotic properties of the resulting estimators under mild regularity conditions. Simulations demonstrate that the proposed estimators have good finite-sample performance, and often offer improved efficiency over existing methods. We also derive a principled method for selection of the threshold. We illustrate the approach in application to an Alzheimer's disease study that investigated brain amyloid levels in older individuals, as measured through positron emission tomography scans, as a function of maternal age of dementia onset, with adjustment for other covariates. We have developed an R package, censCov, for implementation of our method, available at CRAN. © 2018, The International Biometric Society.

  14. [Spatial interpolation of soil organic matter using regression Kriging and geographically weighted regression Kriging].

    PubMed

    Yang, Shun-hua; Zhang, Hai-tao; Guo, Long; Ren, Yan

    2015-06-01

    Relative elevation and stream power index were selected as auxiliary variables based on correlation analysis for mapping soil organic matter. Geographically weighted regression Kriging (GWRK) and regression Kriging (RK) were used for spatial interpolation of soil organic matter and compared with ordinary Kriging (OK), which acts as a control. The results indicated that soil organic matter was significantly positively correlated with relative elevation whilst it had a significantly negative correlation with stream power index. Semivariance analysis showed that both soil organic matter content and its residuals (including ordinary least square regression residual and GWR residual) had strong spatial autocorrelation. Interpolation accuracies by different methods were estimated based on a data set of 98 validation samples. Results showed that the mean error (ME), mean absolute error (MAE) and root mean square error (RMSE) of RK were respectively 39.2%, 17.7% and 20.6% lower than the corresponding values of OK, with a relative-improvement (RI) of 20.63. GWRK showed a similar tendency, having its ME, MAE and RMSE to be respectively 60.6%, 23.7% and 27.6% lower than those of OK, with a RI of 59.79. Therefore, both RK and GWRK significantly improved the accuracy of OK interpolation of soil organic matter due to their incorporation of auxiliary variables. In addition, GWRK performed obviously better than RK did in this study, and its improved performance should be attributed to the consideration of sample spatial locations.
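
    The ME, MAE and RMSE comparisons reported above are ordinary validation-sample summaries; a minimal sketch of computing them:

```python
import math

def validation_errors(predicted, observed):
    """Mean error, mean absolute error and root mean square error
    of interpolated values against validation samples."""
    n = len(observed)
    diffs = [p - o for p, o in zip(predicted, observed)]
    me = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return me, mae, rmse
```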

  15. A Quantile Regression Approach to Understanding the Relations Between Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    PubMed Central

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2015-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in Adult Basic Education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. PMID:25351773

  16. A Quantile Regression Approach to Understanding the Relations Among Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students.

    PubMed

    Tighe, Elizabeth L; Schatschneider, Christopher

    2016-07-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82%-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. © Hammill Institute on Disabilities 2014.

  17. Mixed conditional logistic regression for habitat selection studies.

    PubMed

    Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas

    2010-05-01

    1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. Fitness maximization can imply
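
    The IIA assumption discussed above is easiest to see in the fixed-effects conditional logit form, where within-stratum selection probabilities are a softmax of linear habitat scores; a minimal sketch:

```python
import math

def choice_probabilities(scores):
    """Conditional logit: probability of selecting each available habitat
    unit within a stratum, given linear scores x_i . beta."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

    Adding a third alternative changes every probability but leaves the ratio between the first two untouched; that invariance is exactly what inter-individual heterogeneity in the random-coefficient (mixed) version can break.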

  18. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

    PubMed

    He, Dan; Kuhn, David; Parida, Laxmi

    2016-06-15

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.
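
    When every trait shares the same genotype matrix and no coupling penalty is imposed, multiple output least squares decomposes into one ordinary regression per trait; the sketch below shows that baseline for a single marker (data are invented), which the multitask methods improve on by sharing information across correlated traits:

```python
def ols_line(xs, ys):
    """Least squares slope and intercept for one trait on one marker."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return slope, ybar - slope * xbar

def multi_trait_fit(xs, traits):
    """Multiple output regression with a shared design: fit each trait
    independently against the same genotype values."""
    return [ols_line(xs, ys) for ys in traits]

# Two toy traits measured on the same three genotype values.
fits = multi_trait_fit([0.0, 1.0, 2.0], [[0.0, 1.0, 2.0], [1.0, 3.0, 5.0]])
```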

  19. Identifying Strategies Programs Adopt to Meet Healthy Eating and Physical Activity Standards in Afterschool Programs.

    PubMed

    Weaver, Robert G; Moore, Justin B; Turner-McGrievy, Brie; Saunders, Ruth; Beighle, Aaron; Khan, M Mahmud; Chandler, Jessica; Brazendale, Keith; Randell, Allison; Webster, Collin; Beets, Michael W

    2017-08-01

    The YMCA of USA has adopted Healthy Eating and Physical Activity (HEPA) Standards for its afterschool programs (ASPs). Little is known about strategies YMCA ASPs are implementing to achieve the Standards and these strategies' effectiveness. (1) Identify strategies implemented in YMCA ASPs and (2) evaluate the relationship between strategy implementation and meeting Standards. HEPA was measured via accelerometer (moderate-to-vigorous-physical-activity [MVPA]) and direct observation (snacks served) in 20 ASPs. Strategies were identified and mapped onto a capacity building framework (Strategies To Enhance Practice [STEPs]). Mixed-effects regression estimated increases in HEPA outcomes as implementation increased. Model-implied estimates were calculated for high (i.e., highest implementation score achieved), moderate (median implementation score across programs), and low (lowest implementation score achieved) implementation for each HEPA outcome separately. Programs implemented a variety of strategies identified in STEPs. For every 1-point increase in implementation score, 1.45% (95% confidence interval = 0.33% to 2.55%, p ≤ .001) more girls accumulated 30 min/day of MVPA and fruits and/or vegetables were served on 0.11 more days (95% confidence interval = 0.11-0.45, p ≤ .01). Relationships between implementation and other HEPA outcomes did not reach statistical significance. Still, regression estimates indicated that desserts are served on 1.94 fewer days (i.e., 0.40 vs. 2.34) in the highest-implementing program than in the lowest-implementing program and water is served 0.73 more days (i.e., 2.37 vs. 1.64). Adopting HEPA Standards at the national level does not lead to changes in routine practice in all programs. Practical strategies that programs could adopt to more fully comply with the HEPA Standards are identified.

  20. Automated particle identification through regression analysis of size, shape and colour

    NASA Astrophysics Data System (ADS)

    Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.

    2016-04-01

    Rapid point-of-care diagnostic tests and tests to provide therapeutic information are now available for a range of specific conditions, from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity, these tests are often backed up by more conventional lab-based diagnostic methods: for example, a card agglutination test may be carried out for a suspected parasitic infection in the field and, if positive, a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding in the diagnostic process; this system uses a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked by a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After subtracting the objects of interest from the background, the next challenge is to predict whether a given object belongs to a certain category. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such, the computer program should be able to "predict" with a reasonable level of confidence whether a given particle belongs to the kind we are looking for. We show the use of binary logistic regression analysis with three continuous predictors: size, shape and colour histogram. The results suggest these variables could be very useful in a logistic regression equation, as they proved to have relatively high predictive value on their own.
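
    The decision rule described, binary logistic regression over three continuous predictors thresholded to a Boolean, can be sketched as follows; the weights, bias and threshold are illustrative placeholders, not fitted values from the paper:

```python
import math

def is_target_particle(size, shape, colour, weights, bias, threshold=0.5):
    """Binary logistic classification with three continuous predictors:
    returns True when the predicted probability crosses the threshold."""
    z = bias + weights[0] * size + weights[1] * shape + weights[2] * colour
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold
```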

  1. 7 CFR 275.23 - Determination of State agency program performance.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING... section, the adjusted regressed payment error rate shall be calculated to yield the State agency's payment error rate. The adjusted regressed payment error rate is given by r 1″ + r 2″. (ii) If FNS determines...

  2. Deriving the Regression Equation without Using Calculus

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.; Gordon, Florence S.

    2004-01-01

    Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…

  3. Boosted regression tree, table, and figure data

    EPA Pesticide Factsheets

    Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication:Golden , H., C. Lane , A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).

  4. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a statistical learning and data mining method, mixing unsupervised and supervised learning, that is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
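
    The alternation at the heart of the method, partition by residual then refit, can be sketched for the simplest case of two lines and one predictor; the data and the initial lines below are invented, and robust variants would replace the squared residual:

```python
def fit_line(points):
    """Least squares slope and intercept for a list of (x, y) points."""
    xs, ys = zip(*points)
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs) or 1e-12
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return slope, ybar - slope * xbar

def regression_cluster(points, lines, iterations=20):
    """Iterative partition and regression: assign each point to the line
    with the smallest squared residual, then refit each line."""
    for _ in range(iterations):
        groups = [[] for _ in lines]
        for x, y in points:
            k = min(range(len(lines)),
                    key=lambda i: (y - lines[i][0] * x - lines[i][1]) ** 2)
            groups[k].append((x, y))
        lines = [fit_line(g) if g else line for g, line in zip(groups, lines)]
    return lines

# Points drawn from two crossing lines, y = x and y = -x.
pts = [(x, float(x)) for x in range(1, 5)] + [(x, float(-x)) for x in range(1, 5)]
recovered = regression_cluster(pts, [(0.5, 0.0), (-0.5, 0.0)])
```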

  5. Quantum regression theorem and non-Markovianity of quantum dynamics

    NASA Astrophysics Data System (ADS)

    Guarnieri, Giacomo; Smirne, Andrea; Vacchini, Bassano

    2014-08-01

    We explore the connection between two recently introduced notions of non-Markovian quantum dynamics and the validity of the so-called quantum regression theorem. While non-Markovianity of a quantum dynamics has been defined by looking at the behavior in time of the statistical operator, which determines the evolution of mean values, the quantum regression theorem makes statements about the behavior of system correlation functions of order two and higher. The comparison relies on an estimate of the validity of the quantum regression hypothesis, which can be obtained by exactly evaluating two-point correlation functions. To this aim, we consider a qubit undergoing dephasing due to interaction with a bosonic bath, comparing the exact evaluation of the non-Markovianity measures with the violation of the quantum regression theorem for a class of spectral densities. We further study a photonic dephasing model, recently exploited for the experimental measurement of non-Markovianity. It appears that while a non-Markovian dynamics according to either definition brings with it a violation of the regression hypothesis, even Markovian dynamics can lead to a failure of the regression relation.
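
    For orientation, the regression hypothesis at issue can be stated in its standard form: when the mean values of a set of system operators obey closed linear equations, the hypothesis asserts that two-time correlation functions obey the same equations in the second time argument. In the usual notation (G being the coefficient matrix of the mean-value equations):

```latex
% Mean values close under a linear evolution:
%   \frac{d}{dt}\langle A_i(t)\rangle = \sum_j G_{ij}\,\langle A_j(t)\rangle
% The quantum regression hypothesis then asserts, for t,\tau \ge 0:
\frac{d}{d\tau}\,\langle A_i(t+\tau)\,B(t)\rangle
  = \sum_j G_{ij}\,\langle A_j(t+\tau)\,B(t)\rangle
```

    Non-Markovianity measures instead track the statistical operator alone, which is why the two notions need not coincide; checking the relation above on exactly solvable dephasing models is what the comparison in the abstract amounts to.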

  6. Analyzing hospitalization data: potential limitations of Poisson regression.

    PubMed

    Weaver, Colin G; Ravani, Pietro; Oliver, Matthew J; Austin, Peter C; Quinn, Robert R

    2015-08-01

    Poisson regression is commonly used to analyze hospitalization data when outcomes are expressed as counts (e.g. number of days in hospital). However, data often violate the assumptions on which Poisson regression is based. More appropriate extensions of this model, while available, are rarely used. We compared hospitalization data between 206 patients treated with hemodialysis (HD) and 107 treated with peritoneal dialysis (PD) using Poisson regression and compared results from standard Poisson regression with those obtained using three other approaches for modeling count data: negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression. We examined the appropriateness of each model and compared the results obtained with each approach. During a mean 1.9 years of follow-up, 183 of 313 patients (58%) were never hospitalized (indicating an excess of 'zeros'). The data also displayed overdispersion (variance greater than mean), violating another assumption of the Poisson model. Using four criteria, we determined that the NB and ZINB models performed best. According to these two models, patients treated with HD experienced similar hospitalization rates as those receiving PD {NB rate ratio (RR): 1.04 [bootstrapped 95% confidence interval (CI): 0.49-2.20]; ZINB summary RR: 1.21 (bootstrapped 95% CI 0.60-2.46)}. Poisson and ZIP models fit the data poorly and had much larger point estimates than the NB and ZINB models [Poisson RR: 1.93 (bootstrapped 95% CI 0.88-4.23); ZIP summary RR: 1.84 (bootstrapped 95% CI 0.88-3.84)]. We found substantially different results when modeling hospitalization data, depending on the approach used. Our results argue strongly for a sound model selection process and improved reporting around statistical methods used for modeling count data. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
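
    The violations the authors describe are easy to check numerically. The sketch below, on hypothetical counts (not the study's dialysis data), shows the two diagnostics mentioned in the abstract: overdispersion (variance greater than the mean) and an excess of zeros relative to a Poisson model with the same mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hospitalization counts: a zero-inflated, overdispersed mixture
# (60% never hospitalized; the rest follow a negative binomial).
n = 300
never = rng.random(n) < 0.6
counts = np.where(never, 0, rng.negative_binomial(2, 0.3, n))

mean, var = counts.mean(), counts.var()
obs_zero = (counts == 0).mean()
# Under a Poisson model with the same mean, P(0) = exp(-mean)
pois_zero = np.exp(-mean)

print(f"mean={mean:.2f} var={var:.2f}")        # variance far exceeds the mean
print(f"zeros: observed={obs_zero:.2f} Poisson-predicted={pois_zero:.2f}")
```

    When both diagnostics fire, as here, the NB and zero-inflated models the authors recommend are the natural candidates.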

  7. Assessing risk factors for periodontitis using regression

    NASA Astrophysics Data System (ADS)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. A multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL), yielding the regression coefficients along with the p-values from the respective significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues, defined by an Attachment Loss greater than or equal to 4 mm in at least 25% (AL≥4mm/≥25%) of the sites surveyed. The association measures include the Odds Ratios together with the corresponding 95% confidence intervals.
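
    For the logistic part of such an analysis, the association measure is the odds ratio with its 95% confidence interval. A minimal sketch with an invented 2x2 table (the counts are illustrative, not the study's data), using Woolf's standard-error formula for the log odds ratio:

```python
import math

# Hypothetical 2x2 table: smokers vs non-smokers by periodontitis case status
# (AL >= 4 mm in >= 25% of sites). Counts are illustrative only.
#            case   non-case
# smoker      30       20
# non-smoker  25       75
a, b, c, d = 30, 20, 25, 75

odds_ratio = (a * d) / (b * c)
# Woolf's method: SE of log(OR) from the reciprocal cell counts
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)
print(f"OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# → OR = 4.50, 95% CI (2.18, 9.29)
```

    An interval excluding 1, as here, is what would flag the factor as a significant risk factor in the logistic model.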

  8. A gentle introduction to quantile regression for ecologists

    USGS Publications Warehouse

    Cade, B.S.; Noon, B.R.

    2003-01-01

    Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, not all of the factors that affect ecological processes are measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
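
    The estimates discussed here minimize an asymmetric absolute ("check" or "pinball") loss. A small Python sketch on simulated data, showing that for an intercept-only model this criterion recovers the ordinary sample quantile, which is the building block quantile regression extends to linear predictors:

```python
import numpy as np

def check_loss(u, tau):
    """Asymmetric absolute ('pinball') loss used by quantile regression."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

rng = np.random.default_rng(2)
y = rng.exponential(1.0, 1000)        # skewed response, as in many ecological data

# For an intercept-only model, minimizing the check loss over a constant c
# recovers the tau-th sample quantile (here a coarse grid search suffices).
tau = 0.9
grid = np.linspace(y.min(), y.max(), 5001)
losses = [check_loss(y - c, tau).sum() for c in grid]
c_hat = grid[int(np.argmin(losses))]
print(c_hat, np.quantile(y, tau))     # the two agree closely
```

    With covariates, the same loss is minimized over regression coefficients instead of a constant, giving estimates near the upper bound of the response when tau is close to 1.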

  9. Multiple Correlation versus Multiple Regression.

    ERIC Educational Resources Information Center

    Huberty, Carl J.

    2003-01-01

    Describes differences between multiple correlation analysis (MCA) and multiple regression analysis (MRA), showing how these approaches involve different research questions and study designs, different inferential approaches, different analysis strategies, and different reported information. (SLD)

  10. Robust ridge regression estimators for nonlinear models with applications to high throughput screening assay data.

    PubMed

    Lim, Changwon

    2015-03-30

    Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.

  11. Complex regression Doppler optical coherence tomography

    NASA Astrophysics Data System (ADS)

    Elahi, Sahar; Gu, Shi; Thrane, Lars; Rollins, Andrew M.; Jenkins, Michael W.

    2018-04-01

    We introduce a new method to measure Doppler shifts more accurately and extend the dynamic range of Doppler optical coherence tomography (OCT). The two-point estimate of the conventional Doppler method is replaced with a regression that is applied to high-density B-scans in polar coordinates. We built a high-speed OCT system using a 1.68-MHz Fourier domain mode locked laser to acquire high-density B-scans (16,000 A-lines) at high enough frame rates (~100 fps) to accurately capture the dynamics of the beating embryonic heart. Flow phantom experiments confirm that the complex regression lowers the minimum detectable velocity from 12.25 mm/s to 374 μm/s, whereas the maximum velocity of 400 mm/s is measured without phase wrapping. Complex regression Doppler OCT also demonstrates higher accuracy and precision compared with the conventional method, particularly when signal-to-noise ratio is low. The extended dynamic range allows monitoring of blood flow over several stages of development in embryos without adjusting the imaging parameters. In addition, applying complex averaging recovers hidden features in structural images.
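
    The core idea, replacing the two-point phase-difference estimate with a regression over a dense ensemble of A-lines, can be sketched on a toy signal. The model and all numbers below are illustrative assumptions, not the paper's acquisition parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model of a Doppler OCT signal at one pixel: the phase advances by a
# constant step per A-line; additive noise corrupts each complex sample.
n_alines = 64
true_step = 0.05                       # phase shift per A-line (rad)
t = np.arange(n_alines)
z = np.exp(1j * true_step * t) + 0.3 * (rng.normal(size=n_alines)
                                        + 1j * rng.normal(size=n_alines))

# Conventional two-point estimate: phase difference of adjacent A-lines.
two_point = np.angle(z[1] * np.conj(z[0]))

# Regression estimate: least-squares slope of the unwrapped phase over the
# whole high-density ensemble.
phase = np.unwrap(np.angle(z))
slope = np.polyfit(t, phase, 1)[0]

print(abs(two_point - true_step), abs(slope - true_step))
```

    Averaging the noise over many A-lines is what makes the regression estimate typically far closer to the true phase step, mirroring the lower minimum detectable velocity reported in the abstract.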

  12. Interaction Models for Functional Regression.

    PubMed

    Usset, Joseph; Staicu, Ana-Maria; Maity, Arnab

    2016-02-01

    A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data.

  13. The Geometry of Enhancement in Multiple Regression

    ERIC Educational Resources Information Center

    Waller, Niels G.

    2011-01-01

    In linear multiple regression, "enhancement" is said to occur when R² = b′r > r′r, where b is a p x 1 vector of standardized regression coefficients and r is a p x 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b ≅ r and…
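
    Enhancement is easy to exhibit numerically. In the classical suppression setup below (the correlations are illustrative, not from the article), a predictor uncorrelated with the criterion still raises R² above r′r:

```python
import numpy as np

# Classical suppression: x2 is uncorrelated with the criterion y but
# correlated with x1, yet including it raises R^2 above r'r.
Rxx = np.array([[1.0, 0.5],
                [0.5, 1.0]])      # predictor intercorrelation matrix
r = np.array([0.5, 0.0])          # criterion correlations

b = np.linalg.solve(Rxx, r)       # standardized regression coefficients
R2 = b @ r                        # R^2 = b'r
print(b, R2, r @ r)               # R^2 = 1/3 > r'r = 0.25
```

    Here b = (2/3, -1/3): the suppressor receives a negative weight that removes criterion-irrelevant variance from x1, which is exactly the geometry of enhancement.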

  14. Penalized nonparametric scalar-on-function regression via principal coordinates

    PubMed Central

    Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu

    2016-01-01

    A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963

  15. A PHARMACOKINETIC PROGRAM (PKFIT) FOR R

    EPA Science Inventory

    The purpose of this study was to create a nonlinear regression (including a genetic algorithm) program (R script) to handle data fitting for pharmacokinetics (PK) in the R environment using its available packages. We call this tool PKfit.

  16. Refractive regression after laser in situ keratomileusis.

    PubMed

    Yan, Mabel K; Chang, John Sm; Chan, Tommy Cy

    2018-04-26

    Uncorrected refractive errors are a leading cause of visual impairment across the world. In today's society, laser in situ keratomileusis (LASIK) has become the most commonly performed surgical procedure to correct refractive errors. However, regression of the initially achieved refractive correction has been a widely observed phenomenon following LASIK since its inception more than two decades ago. Despite technological advances in laser refractive surgery and various proposed management strategies, post-LASIK regression is still frequently observed and has significant implications for the long-term visual performance and quality of life of patients. This review explores the mechanism of refractive regression after both myopic and hyperopic LASIK, predisposing risk factors and its clinical course. In addition, current preventative strategies and therapies are also reviewed. © 2018 Royal Australian and New Zealand College of Ophthalmologists.

  17. Estimation of Standard Error of Regression Effects in Latent Regression Models Using Binder's Linearization. Research Report. ETS RR-07-09

    ERIC Educational Resources Information Center

    Li, Deping; Oranje, Andreas

    2007-01-01

    Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…

  18. Estimating effects of limiting factors with regression quantiles

    USGS Publications Warehouse

    Cade, B.S.; Terrell, J.W.; Schroeder, R.L.

    1999-01-01

    In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e

  19. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

    EPA Science Inventory

    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

  20. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
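
    The mean-precision parameterization of beta regression with a logit link can be fit by maximum likelihood in a few lines. A hedged sketch on simulated data (the coefficients, precision, sample size, and optimizer choice are all illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(4)

# Simulated (0,1) outcomes: mu = sigmoid(b0 + b1*x), beta shape parameters
# p = mu*phi and q = (1-mu)*phi, where phi is the precision.
n = 2000
x = rng.uniform(-1, 1, n)
b0_true, b1_true, phi_true = 0.5, 1.0, 30.0
mu = 1 / (1 + np.exp(-(b0_true + b1_true * x)))
y = rng.beta(mu * phi_true, (1 - mu) * phi_true)

def negloglik(theta):
    b0, b1, logphi = theta
    phi = np.exp(logphi)              # keep the precision positive
    m = 1 / (1 + np.exp(-(b0 + b1 * x)))
    p, q = m * phi, (1 - m) * phi
    return -np.sum(gammaln(phi) - gammaln(p) - gammaln(q)
                   + (p - 1) * np.log(y) + (q - 1) * np.log(1 - y))

fit = minimize(negloglik, x0=np.array([0.0, 0.0, np.log(10.0)]),
               method="Nelder-Mead")
b0_hat, b1_hat, phi_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(b0_hat, b1_hat, phi_hat)        # near the generating values
```

    Zero-or-one inflation would add a separate (e.g., logistic) component for the point masses at the boundary; replacing boundary values with numbers close to 0/1, as the abstract warns, biases this likelihood instead.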

  1. Assessing Principal Component Regression Prediction of Neurochemicals Detected with Fast-Scan Cyclic Voltammetry

    PubMed Central

    2011-01-01

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586

  2. Assessing principal component regression prediction of neurochemicals detected with fast-scan cyclic voltammetry.

    PubMed

    Keithley, Richard B; Wightman, R Mark

    2011-06-07

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
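
    The basic pipeline, projecting measurements onto leading principal components and then regressing concentrations on the scores, can be sketched as follows. The two-template "voltammogram" model and all numbers are invented stand-ins, not the authors' calibration data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in for a voltammetry training set: each "scan" is a noisy mixture
# of two fixed templates whose weights play the role of analyte concentrations.
n_scans, n_pts = 60, 100
grid = np.linspace(0, 1, n_pts)
templates = np.vstack([np.exp(-((grid - 0.3) / 0.05) ** 2),
                       np.exp(-((grid - 0.6) / 0.05) ** 2)])
conc = rng.uniform(0, 2, (n_scans, 2))
scans = conc @ templates + 0.01 * rng.normal(size=(n_scans, n_pts))

# Principal component regression: project centered scans onto the leading
# principal components, then regress concentrations on the scores.
Xc = scans - scans.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                     # retain the chemically relevant PCs
scores = Xc @ Vt[:k].T
B, *_ = np.linalg.lstsq(np.column_stack([np.ones(n_scans), scores]),
                        conc, rcond=None)

pred = np.column_stack([np.ones(n_scans), scores]) @ B
rmse = np.sqrt(np.mean((pred - conc) ** 2))
print(rmse)  # small: two PCs capture the two-analyte mixture
```

    The diagnostics the paper develops (inspecting the regression vector as a voltammogram, Cook's distance for training-set outliers, residual plots) all operate on the pieces computed here.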

  3. Using ridge regression in systematic pointing error corrections

    NASA Technical Reports Server (NTRS)

    Guiar, C. N.

    1988-01-01

    A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
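
    The ridge estimator referred to here has a simple closed form. A minimal sketch on synthetic, nearly collinear regressors (not the Voyager tracking data), showing how the biased estimate stays stable where least squares can become erratic:

```python
import numpy as np

rng = np.random.default_rng(6)

# Nearly collinear regressors, as with pointing-model terms that track each
# other over an observation pass (values here are synthetic).
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)       # almost a copy of x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)               # ordinary least squares
beta_ridge = ridge(X, y, 1.0)             # biased but stable

print(beta_ols, beta_ridge)
# Under multicollinearity the OLS coefficients can swing wildly between the
# two copies; the ridge penalty pulls them toward a stable, shared value.
```

    The well-conditioned direction (here the sum of the two coefficients, near 2) is barely shrunk, while the ill-conditioned difference is tamed, which is exactly the trade-off exploited for the pointing-error parameters.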

  4. Fuel Regression Rate Behavior of CAMUI Hybrid Rocket

    NASA Astrophysics Data System (ADS)

    Kaneko, Yudai; Itoh, Mitsunori; Kakikura, Akihito; Mori, Kazuhiro; Uejima, Kenta; Nakashima, Takuji; Wakita, Masashi; Totani, Tsuyoshi; Oshima, Nobuyuki; Nagata, Harunori

    A series of static firing tests was conducted to investigate the fuel regression characteristics of a Cascaded Multistage Impinging-jet (CAMUI) type hybrid rocket motor. A CAMUI type hybrid rocket uses the combination of liquid oxygen and a fuel grain made of polyethylene as a propellant. The collision distance divided by the port diameter, H/D, was varied to investigate the effect of the grain geometry on the fuel regression rate. As a result, the H/D geometry has little effect on the regression rate near the stagnation point, where the heat transfer coefficient is high. In contrast, the fuel regression rate decreases near the circumference of the forward-end and backward-end faces of the fuel blocks. In addition to the experiments, computational fluid dynamics calculations clarified the heat transfer distribution on the grain surface for various H/D geometries. The calculations show that, in the region where the fuel regression rate falls as H/D increases, the flow velocity likewise decreases with increasing H/D. To estimate the exact fuel consumption, which is necessary to design a fuel grain, real-time measurement by an ultrasonic pulse-echo method was performed.

  5. Censored quantile regression with recursive partitioning-based weights

    PubMed Central

    Wey, Andrew; Wang, Lan; Rudser, Kyle

    2014-01-01

    Censored quantile regression provides a useful alternative to the Cox proportional hazards model for analyzing survival data. It directly models the conditional quantile of the survival time and hence is easy to interpret. Moreover, it relaxes the proportionality constraint on the hazard function associated with the popular Cox model and is natural for modeling heterogeneity of the data. Recently, Wang and Wang (2009. Locally weighted censored quantile regression. Journal of the American Statistical Association 103, 1117–1128) proposed a locally weighted censored quantile regression approach that allows for covariate-dependent censoring and is less restrictive than other censored quantile regression methods. However, their kernel smoothing-based weighting scheme requires all covariates to be continuous and encounters practical difficulty with even a moderate number of covariates. We propose a new weighting approach that uses recursive partitioning, e.g. survival trees, that offers greater flexibility in handling covariate-dependent censoring in moderately high dimensions and can incorporate both continuous and discrete covariates. We prove that this new weighting scheme leads to consistent estimation of the quantile regression coefficients and demonstrate its effectiveness via Monte Carlo simulations. We also illustrate the new method using a widely recognized data set from a clinical trial on primary biliary cirrhosis. PMID:23975800

  6. Study of Rapid-Regression Liquefying Hybrid Rocket Fuels

    NASA Technical Reports Server (NTRS)

    Zilliac, Greg; DeZilwa, Shane; Karabeyoglu, M. Arif; Cantwell, Brian J.; Castellucci, Paul

    2004-01-01

    A report describes experiments directed toward the development of paraffin-based hybrid rocket fuels that burn at regression rates greater than those of conventional hybrid rocket fuels like hydroxyl-terminated butadiene. The basic approach followed in this development is to use materials such that a hydrodynamically unstable liquid layer forms on the melting surface of a burning fuel body. Entrainment of droplets from the liquid/gas interface can substantially increase the rate of fuel mass transfer, leading to surface regression faster than can be achieved using conventional fuels. The higher regression rate eliminates the need for the complex multi-port grain structures of conventional solid rocket fuels, making it possible to obtain acceptable performance from single-port structures. The high-regression-rate fuels contain no toxic or otherwise hazardous components and can be shipped commercially as non-hazardous commodities. Among the experiments performed on these fuels were scale-up tests using gaseous oxygen. The data from these tests were found to agree with data from small-scale, low-pressure and low-mass-flux laboratory tests and to confirm the expectation that these fuels would burn at high regression rates, chamber pressures, and mass fluxes representative of full-scale rocket motors.

  7. R programming for parameters estimation of geographically weighted ordinal logistic regression (GWOLR) model based on Newton Raphson

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Saputro, Dewi Retno Sari

    2017-03-01

    The GWOLR model represents the relationship between a dependent variable whose categories lie on an ordinal scale and independent variables whose influence depends on the geographical location of the observation site. Maximum likelihood estimation of the GWOLR model parameters leads to a system of nonlinear equations whose solution is hard to obtain analytically. Solving it amounts to an optimization problem, which can be tackled by numerical approximation, one such method being Newton-Raphson. The purpose of this research is to construct a Newton-Raphson iteration algorithm and a program in R software to estimate the GWOLR model. The research shows that the R program can estimate the parameters of the GWOLR model by forming a syntax program around the "while" command.
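
    The iteration the authors program in R has the generic Newton-Raphson form beta ← beta − H⁻¹·gradient, repeated in a while loop until the update is negligible. As a stand-in for the GWOLR likelihood equations, which share this structure, here is a Python sketch of the same loop for ordinary (non-geographically-weighted, binary) logistic regression on simulated data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated binary outcomes from a logistic model with one predictor.
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 1.2])
p = 1 / (1 + np.exp(-X @ beta_true))
yb = (rng.random(n) < p).astype(float)

# Newton-Raphson on the log-likelihood, looping with "while" as in the paper.
beta = np.zeros(2)
step = np.inf
while step > 1e-10:
    mu = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (yb - mu)              # score vector
    H = -(X.T * (mu * (1 - mu))) @ X    # Hessian of the log-likelihood
    delta = np.linalg.solve(H, grad)
    beta = beta - delta                 # Newton update
    step = np.abs(delta).max()

print(beta)  # close to beta_true
```

    The GWOLR case replaces this likelihood with a geographically weighted ordinal one, but the while-loop skeleton (score, Hessian, solve, update, convergence check) is identical.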

  8. U.S. Marine Corps Enlistment Bonus Program

    DTIC Science & Technology

    1983-01-01

    From the table of contents: …Bonus Groups to Alternative Question (B-1); Responses to Attitude Questions for Bonus Groups (B-2); Calculation of Enlistment Supply (C-1). From the abstract: …can be used to measure the effect of changes in bonus levels on enlistment supply. Alternatively, we can measure the effect of the bonus program… Descriptors: EBP (Enlistment Bonus Program), Enlisted Personnel, Enlisted Supply, Marine Corps Personnel, Questionnaires, Recruiting, Regression Analysis, Response

  9. Neural Network and Regression Approximations in High Speed Civil Transport Aircraft Design Optimization

    NASA Technical Reports Server (NTRS)

    Patniak, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.

    1998-01-01

    Nonlinear mathematical-programming-based design optimization can be an elegant method. However, the calculations required to generate the merit function, constraints, and their gradients, which are frequently required, can make the process computationally intensive. The computational burden can be greatly reduced by using approximating analyzers derived from an original analyzer utilizing neural networks and linear regression methods. The experience gained from using both of these approximation methods in the design optimization of a high speed civil transport aircraft is the subject of this paper. The Langley Research Center's Flight Optimization System was selected for the aircraft analysis. This software was exercised to generate a set of training data with which a neural network and a regression method were trained, thereby producing the two approximating analyzers. The derived analyzers were coupled to the Lewis Research Center's CometBoards test bed to provide the optimization capability. With the combined software, both approximation methods were examined for use in aircraft design optimization, and both performed satisfactorily. The CPU time for solution of the problem, which had been measured in hours, was reduced to minutes with the neural network approximation and to seconds with the regression method. Instability encountered in the aircraft analysis software at certain design points was also eliminated. On the other hand, there were costs and difficulties associated with training the approximating analyzers. The CPU time required to generate the input-output pairs and to train the approximating analyzers was seven times that required for solution of the problem.
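
    The regression-based approximating analyzer is, in essence, a response surface fit to input-output pairs from the original analyzer. A minimal sketch with a cheap stand-in function (the quadratic feature set and all numbers are illustrative assumptions, not the FLOPS/CometBoards setup):

```python
import numpy as np

rng = np.random.default_rng(8)

def expensive_analysis(v):
    """Stand-in for a costly analyzer (e.g., a full aircraft analysis code)."""
    x1, x2 = v
    return np.sin(x1) + 0.5 * x2 ** 2 + 0.2 * x1 * x2

# Training data, generated by exercising the original analyzer.
train = rng.uniform(-1, 1, (200, 2))
f = np.array([expensive_analysis(v) for v in train])

# Quadratic response-surface regression: features 1, x1, x2, x1^2, x2^2, x1*x2.
def features(V):
    x1, x2 = V[:, 0], V[:, 1]
    return np.column_stack([np.ones(len(V)), x1, x2, x1**2, x2**2, x1*x2])

coef, *_ = np.linalg.lstsq(features(train), f, rcond=None)

# The cheap surrogate then replaces the analyzer inside the optimizer's loop.
test_pts = rng.uniform(-1, 1, (50, 2))
approx = features(test_pts) @ coef
exact = np.array([expensive_analysis(v) for v in test_pts])
max_err = np.abs(approx - exact).max()
print(max_err)  # small on this smooth, nearly quadratic function
```

    As the abstract notes, the real cost shifts to generating the training pairs; evaluating the fitted polynomial afterwards is nearly free, which is why the solution time drops from hours to seconds.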

  10. Bayesian nonparametric regression with varying residual density

    PubMed Central

    Pati, Debdeep; Dunson, David B.

    2013-01-01

    We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053

  11. Shrinkage regression-based methods for microarray missing value imputation.

    PubMed

    Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng

    2013-01-01

    Missing values commonly occur in microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods on many testing microarray datasets. To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation on six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
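
    The selection-then-regression idea can be sketched in a few lines. The ridge-style shrinkage below is only one simple way to shrink the regression coefficients and merely stands in for the paper's estimator; the synthetic "expression matrix" is likewise invented:

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic expression data: rows = genes, columns = samples. The target gene
# (kept as its own vector for clarity) is a noisy linear combination of three
# other genes, so correlated neighbours carry the information to impute it.
n_genes, n_samples = 50, 30
G = rng.normal(size=(n_genes, n_samples))
target = 0.6 * G[1] + 0.3 * G[2] + 0.1 * G[3] + 0.05 * rng.normal(size=n_samples)

miss = 0                                   # index of the missing sample
obs = np.arange(n_samples) != miss
truth = target[miss]

# 1) Select the k genes most correlated with the target on observed samples.
k = 5
cors = np.array([abs(np.corrcoef(target[obs], g[obs])[0, 1]) for g in G])
nbrs = np.argsort(cors)[::-1][:k]

# 2) Regress the target on the neighbours with shrunken (ridge-style)
#    least-squares coefficients.
Xn = G[nbrs][:, obs].T
lam = 1.0
beta = np.linalg.solve(Xn.T @ Xn + lam * np.eye(k), Xn.T @ target[obs])

# 3) Impute the missing entry from the neighbours' values in that sample.
imputed = G[nbrs][:, miss] @ beta
print(abs(imputed - truth))                # small: neighbours are informative
```

    In a real pipeline this is repeated for every missing entry, and the amount of shrinkage is tuned to the noise level of the dataset.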

  12. The Impact of Prior Programming Knowledge on Lecture Attendance and Final Exam

    ERIC Educational Resources Information Center

    Veerasamy, Ashok Kumar; D'Souza, Daryl; Lindén, Rolf; Laakso, Mikko-Jussi

    2018-01-01

    In this article, we report the results of the impact of prior programming knowledge (PPK) on lecture attendance (LA) and on subsequent final programming exam performance in a university level introductory programming course. This study used Spearman's rank correlation coefficient, multiple regression, Kruskal-Wallis, and Bonferroni correction…

  13. Building Your Own Regression Model

    ERIC Educational Resources Information Center

    Horton, Robert M.; Phillips, Vicki; Kenelly, John

    2004-01-01

    Spreadsheet activities for exploring regression with an Algebra 2 class in a medium-sized rural high school are presented. The use of spreadsheets can help students develop a sophisticated understanding of mathematical models and use them to describe real-world phenomena.

  14. The Oklahoma's Promise Program: A National Model to Promote College Persistence

    ERIC Educational Resources Information Center

    Mendoza, Pilar; Mendez, Jesse P.

    2013-01-01

    Using a multi-method approach involving fixed effects and logistic regressions, this study examined the effect of the Oklahoma's Promise Program on student persistence in relation to the Pell and Stafford federal programs and according to socio-economic characteristics and class level. The Oklahoma's Promise is a hybrid state program that pays…

  15. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  16. Restoration of Monotonicity Respecting in Dynamic Regression

    PubMed Central

    Huang, Yijian

    2017-01-01

    Dynamic regression models, including the quantile regression model and Aalen's additive hazards model, are widely adopted to investigate evolving covariate effects. Yet lack of monotonicity respecting with standard estimation procedures remains an outstanding issue. Advances have recently been made, but none provides a complete resolution. In this article, we propose a novel adaptive interpolation method to restore monotonicity respecting, by successively identifying and then interpolating nearest monotonicity-respecting points of an original estimator. Under mild regularity conditions, the resulting regression coefficient estimator is shown to be asymptotically equivalent to the original. Our numerical studies demonstrate that the proposed estimator is much smoother and may have better finite-sample efficiency than the original, as well as, where available (only in special cases), other competing monotonicity-respecting estimators. Illustration with a clinical study is provided. PMID:29430068

  17. General Nature of Multicollinearity in Multiple Regression Analysis.

    ERIC Educational Resources Information Center

    Liu, Richard

    1981-01-01

    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)

  18. A Powerful Test for Comparing Multiple Regression Functions.

    PubMed

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test to other nonparametric regression setups, e.g., nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  19. Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting

    USGS Publications Warehouse

    Moglen, Glenn E.; Shivers, Dorianne E.

    2006-01-01

    A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques was used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage to the equations developed in this study is that they do not require field measurements of watershed characteristics as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made exclusive use of flow records necessary for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the

  20. Applications of statistics to medical science, III. Correlation and regression.

    PubMed

    Watanabe, Hiroshi

    2012-01-01

    In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.

  1. cp-R, an interface to the R programming language for clinical laboratory method comparisons.

    PubMed

    Holmes, Daniel T

    2015-02-01

    Clinical scientists frequently need to compare two different bioanalytical methods as part of assay validation/monitoring. As a matter of necessity, regression methods for quantitative comparison in clinical chemistry, hematology and other clinical laboratory disciplines must allow for error in both the x and y variables. Traditionally the methods popularized by 1) Deming and 2) Passing and Bablok have been recommended. While commercial tools exist, no simple open source tool is available. The purpose of this work was to develop an entirely open-source GUI-driven program for bioanalytical method comparisons capable of performing these regression methods and able to produce highly customized graphical output. The GUI is written in Python and PyQt4, with R scripts performing regression and graphical functions. The program can be run from source code or as a pre-compiled binary executable. The software performs three forms of regression and offers weighting where applicable. Confidence bands of the regression are calculated using bootstrapping for the Deming and Passing-Bablok methods. Users can customize regression plots according to the tools available in R and can produce output in any of the jpg, png, tiff, or bmp formats at any desired resolution, or in the ps and pdf vector formats. Bland-Altman plots and some regression diagnostic plots are also generated. Correctness of regression parameter estimates was confirmed against existing R packages. The program allows for rapid and highly customizable graphical output capable of conforming to the publication requirements of any clinical chemistry journal. Quick method comparisons can also be performed and the results cut and pasted into spreadsheet or word-processing applications. We present a simple and intuitive open source tool for quantitative method comparison in a clinical laboratory environment. Copyright © 2014 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
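    The Deming regression recommended here has a simple closed form when the ratio of error variances is specified; a minimal numpy sketch (orthogonal regression when `lam=1`; this is the textbook estimator, not the cp-R implementation, which also bootstraps confidence bands):

```python
import numpy as np

def deming(x, y, lam=1.0):
    """Deming regression slope and intercept, allowing error in both
    variables. lam is the ratio of y- to x-error variances; lam=1
    gives orthogonal regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]
    slope = (syy - lam * sxx + np.sqrt((syy - lam * sxx) ** 2
             + 4 * lam * sxy ** 2)) / (2 * sxy)
    return slope, y.mean() - slope * x.mean()
```

    Unlike ordinary least squares, swapping x and y here simply inverts the fitted line, which is the property method-comparison studies need.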

  2. Multivariate Regression Analysis and Slaughter Livestock,

    DTIC Science & Technology

    (*AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS, ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY

  3. Analysis of reciprocal creatinine plots by two-phase linear regression.

    PubMed

    Rowe, P A; Richardson, R E; Burton, P R; Morgan, A G; Burden, R P

    1989-01-01

    The progression of renal diseases is often monitored by the serial measurement of plasma creatinine. The slope of the linear relation that is frequently found between the reciprocal of creatinine concentration and time delineates the rate of change in renal function. Minor changes in slope, perhaps indicating response to therapeutic intervention, can be difficult to identify and yet be of clinical importance. We describe the application of two-phase linear regression to identify and characterise changes in slope using a microcomputer. The method fits two intersecting lines to the data by computing a least-squares estimate of the position of the slope change and its 95% confidence limits. This avoids the potential bias of fixing the change at a preconceived time corresponding with an alteration in treatment. The program then evaluates the statistical and clinical significance of the slope change and produces a graphical output to aid interpretation.
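    The two-intersecting-lines fit described above can be sketched as least squares with a grid search over the breakpoint position (the paper additionally computes a 95% confidence interval for the breakpoint and assesses clinical significance, which this sketch omits):

```python
import numpy as np

def two_phase_fit(t, y, grid=None):
    """Fit two intersecting lines y = a + b*t + c*max(t - tau, 0) by
    least squares, searching the breakpoint tau over a grid."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    if grid is None:
        grid = t[2:-2]               # keep a few points in each segment
    best = None
    for tau in grid:
        A = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = float(np.sum((A @ beta - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, tau, beta)
    sse, tau, (a, b, c) = best
    return tau, a, b, b + c          # breakpoint, intercept, slope 1, slope 2
```

    The hinge term `max(t - tau, 0)` enforces continuity at the breakpoint, so the two fitted lines always intersect there.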

  4. Predicting Word Reading Ability: A Quantile Regression Study

    ERIC Educational Resources Information Center

    McIlraith, Autumn L.

    2018-01-01

    Predictors of early word reading are well established. However, it is unclear if these predictors hold for readers across a range of word reading abilities. This study used quantile regression to investigate predictive relationships at different points in the distribution of word reading. Quantile regression analyses used preschool and…

  5. Risk factors for autistic regression: results of an ambispective cohort study.

    PubMed

    Zhang, Ying; Xu, Qiong; Liu, Jing; Li, She-chang; Xu, Xiu

    2012-08-01

    A subgroup of children diagnosed with autism experience developmental regression, characterized by a loss of previously acquired abilities. The pathogeny of autistic regression is unknown, although many risk factors likely exist. To better characterize autistic regression and investigate the association between autistic regression and potential influencing factors in Chinese autistic children, we conducted an ambispective study with a cohort of 170 autistic subjects. Analyses by multiple logistic regression showed significant correlations between autistic regression and febrile seizures (OR = 3.53, 95% CI = 1.17-10.65, P = .025), as well as with a family history of neuropsychiatric disorders (OR = 3.62, 95% CI = 1.35-9.71, P = .011). This study suggests that febrile seizures and a family history of neuropsychiatric disorders are correlated with autistic regression.

  6. The Use of Linear Programming for Prediction.

    ERIC Educational Resources Information Center

    Schnittjer, Carl J.

    The purpose of the study was to develop a linear programming model to be used for prediction, test the accuracy of the predictions, and compare the accuracy with that produced by curvilinear multiple regression analysis. (Author)

  7. Logistic regression for risk factor modelling in stuttering research.

    PubMed

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
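    A minimal illustration of the kind of logistic-regression risk-factor analysis discussed here, using Newton-Raphson in numpy and reporting odds ratios (a sketch only; statistical packages add standard errors, confidence intervals and tests):

```python
import numpy as np

def logit_fit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson,
    returning coefficients and per-factor odds ratios."""
    X = np.column_stack([np.ones(len(X)), X])   # add intercept
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (W[:, None] * X)              # observed information
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta, np.exp(beta[1:])               # odds ratio per risk factor
```

    The exponentiated slope coefficients are the odds ratios usually quoted in risk-factor studies.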

  8. Spectral distance decay: Assessing species beta-diversity by quantile regression

    USGS Publications Warehouse

    Rocchini, D.; Nagendra, H.; Ghate, R.; Cade, B.S.

    2009-01-01

    Remotely sensed data represent key information for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance may allow us to quantitatively estimate how beta-diversity in species changes with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological datasets are characterized by a high number of zeroes that can add noise to the regression model. Quantile regression can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this paper, we used ordinary least squares (OLS) and quantile regression to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.05) considering both OLS and quantile regression. Nonetheless, the OLS regression estimate of mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when spectral distance approaches zero, was very low compared with the intercepts of upper quantiles, which detected high species similarity when habitats are more similar. In this paper we demonstrated the power of using quantile regression applied to spectral distance decay in order to reveal species diversity patterns otherwise lost or underestimated by OLS regression. © 2009 American Society for Photogrammetry and Remote Sensing.
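    Linear quantile regression of the kind used here can be posed as a linear program; a sketch of the standard formulation (not the authors' code), minimizing the check loss by splitting residuals into positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg(X, y, tau=0.5):
    """Quantile regression at level tau as a linear program:
    minimize tau*u+ + (1-tau)*u- subject to X@beta + u+ - u- = y."""
    X = np.column_stack([np.ones(len(X)), X])
    n, p = X.shape
    # decision vector: [beta+, beta-, u+, u-], all nonnegative
    c = np.concatenate([np.zeros(2 * p),
                        tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]   # fitted quantile coefficients
```

    With `tau=0.5` this is median regression, which is robust to the zero-inflation and outliers the abstract mentions; upper quantiles are obtained by raising `tau`.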

  9. Preserving Institutional Privacy in Distributed binary Logistic Regression.

    PubMed

    Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting the privacy of individual patients have been proposed, it is not clear how to protect institutional privacy, which is often a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE), we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
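    The GLORE-style decomposition underlying this line of work can be sketched as follows: each site shares only its gradient and Hessian contributions for the current coefficients, so patient-level records never leave the institution (function names here are hypothetical; this is the generic federated Newton scheme, not the published implementation):

```python
import numpy as np

def local_stats(X, y, beta):
    """Per-site gradient and Hessian contributions for logistic
    regression -- only these aggregates leave the site."""
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ ((p * (1 - p))[:, None] * X)
    return grad, hess

def federated_fit(sites, dim, n_iter=25):
    """Newton-Raphson driven only by summed site statistics."""
    beta = np.zeros(dim)
    for _ in range(n_iter):
        stats = [local_stats(X, y, beta) for X, y in sites]
        grad = sum(g for g, _ in stats)
        hess = sum(h for _, h in stats)
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

    Because the pooled gradient and Hessian are exact sums of the site contributions, the federated fit matches the fit on the pooled data.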

  10. Poor Smokers, Poor Quitters, and Cigarette Tax Regressivity

    PubMed Central

    Remler, Dahlia K.

    2004-01-01

    The traditional view that excise taxes are regressive has been challenged. I document the history of the term regressive tax, show that traditional definitions have always found cigarette taxes to be regressive, and illustrate the implications of the greater price responsiveness observed among the poor. I explain the different definitions of tax burden: accounting, welfare-based willingness to pay, and welfare-based time inconsistent. Progressivity (equity across income groups) is sensitive to the way in which tax burden is assessed. Analysis of horizontal equity (fairness within a given income group) shows that cigarette taxes heavily burden poor smokers who do not quit, no matter how tax burden is assessed. PMID:14759931

  11. Interpreting Bivariate Regression Coefficients: Going beyond the Average

    ERIC Educational Resources Information Center

    Halcoussis, Dennis; Phillips, G. Michael

    2010-01-01

    Statistics, econometrics, investment analysis, and data analysis classes often review the calculation of several types of averages, including the arithmetic mean, geometric mean, harmonic mean, and various weighted averages. This note shows how each of these can be computed using a basic regression framework. By recognizing when a regression model…

  12. The Variance Normalization Method of Ridge Regression Analysis.

    ERIC Educational Resources Information Center

    Bulcock, J. W.; And Others

    The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…

  13. Impact of multicollinearity on small sample hydrologic regression models

    NASA Astrophysics Data System (ADS)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how best to address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
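    Variance inflation factor screening, one of the four techniques compared, can be sketched in a few lines: each column is regressed on the others and VIF = 1/(1 - R²). The VIF > 10 flag in the usage note below is a conventional rule of thumb, not a threshold from this paper:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X, computed from
    the R^2 of regressing that column on the remaining columns."""
    X = np.asarray(X, float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)
```

    Screening would drop (or combine) explanatory variables whose VIF exceeds roughly 10 before refitting by OLS.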

  14. Advanced colorectal neoplasia risk stratification by penalized logistic regression.

    PubMed

    Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F

    2016-08-01

    Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e., grouping of the categories, and sparsity in the variables, i.e., variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
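    A stripped-down sketch of penalized logistic regression via proximal gradient descent with soft-thresholding. Note the simplification: this applies a plain L1 penalty only, whereas the paper additionally penalizes differences between coefficients of neighbouring categories (the fused term that drives category grouping); the step size and penalty weight are illustrative:

```python
import numpy as np

def l1_logistic(X, y, lam=0.05, step=0.5, n_iter=3000):
    """Sparse logistic regression: gradient step on the average
    log-likelihood, then soft-threshold all non-intercept coefficients."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        beta = beta + step * X.T @ (y - p) / len(y)   # gradient ascent step
        # proximal (soft-threshold) step; intercept left unpenalized
        beta[1:] = np.sign(beta[1:]) * np.maximum(
            np.abs(beta[1:]) - step * lam, 0.0)
    return beta
```

    The soft-threshold step can set weak coefficients exactly to zero, which is what produces variable selection; the fused difference penalty would be handled by an analogous proximal operator.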

  15. Histological characterization of regression in acquired immunodeficiency syndrome-related Kaposi's sarcoma.

    PubMed

    Pantanowitz, Liron; Dezube, Bruce J; Pinkus, Geraldine S; Tahan, Steven R

    2004-01-01

    Kaposi's sarcoma (KS) is an angioproliferative lesion that may regress or progress. Progression is related to spindle cell proliferation and the expression of human herpes virus-8 latency genes, including latent nuclear antigen-1 (LNA-1), cyclin-D1, and bcl-2. KS regression has not been well characterized histologically. Therefore, this study was undertaken to characterize the histopathology of pharmacologically induced regressed cutaneous KS. Skin punch biopsies from eight patients with acquired immunodeficiency syndrome (AIDS)-related KS, that regressed following chemotherapy with paclitaxel or the angiogenesis inhibitor Col-3, were investigated by light microscopy. Comparative immunophenotyping on pre- and post-treatment specimens for CD31, LNA-1, cyclin-D1, bcl-2, and CD117 (c-kit) was performed. Clinical and histologic features of regression were similar for paclitaxel and Col-3 treatment. On clinical examination, lesions flattened, became smaller, and lost their purple-red appearance, resulting in an orange-brown macule. Histological regression was divided into partial (n = 3) and complete (n = 5) regression. Partially regressed lesions had a significant reduction of spindle cells in the dermal interstitium, with residual spindle cells arranged around superficial and mid-dermal capillaries. Complete regression was characterized by an absence of detectable spindle cells, with a slight increase in capillaries of the superficial plexus. All regressed samples exhibited a prominent, superficial, perivascular, lymphocytic infiltrate and abundant dermal hemosiderin-laden macrophages. This clinicopathologic picture resembled the findings of pigmented purpura. CD31 staining correlated with the reduction of spindle cells. Regression was accompanied by a quantitative and qualitative decrease in LNA-1 and cyclin-D1 immunoreactivity, but no change in bcl-2 or c-kit expression. Pharmacologically induced regression of AIDS-related cutaneous KS is characterized by a complete

  16. Application-level regression testing framework using Jenkins

    DOE PAGES

    Budiardja, Reuben; Bouvet, Timothy; Arnold, Galen

    2017-09-26

    Monitoring and testing for regression of large-scale systems such as NCSA's Blue Waters supercomputer are challenging tasks. In this paper, we describe the solution we developed to perform those tasks. The goal was to find an automated solution for running user-level regression tests to evaluate system usability and performance. Jenkins, an automation server, was chosen for its versatility, large user base, and multitude of plugins, including plugins for collecting data and plotting test results over time. We also describe our Jenkins deployment for launching and monitoring jobs on a remote HPC system, performing authentication with one-time passwords, and integrating with our LDAP server for authorization. We show some use cases and describe our best practices for successfully using Jenkins as a user-level, system-wide regression testing and monitoring framework for large supercomputer systems.

  17. Nonparametric instrumental regression with non-convex constraints

    NASA Astrophysics Data System (ADS)

    Grasmair, M.; Scherzer, O.; Vanhems, A.

    2013-03-01

    This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.

  19. Physiologic noise regression, motion regression, and TOAST dynamic field correction in complex-valued fMRI time series.

    PubMed

    Hahn, Andrew D; Rowe, Daniel B

    2012-02-01

    As more evidence is presented suggesting that the phase, as well as the magnitude, of functional MRI (fMRI) time series may contain important information and that there are theoretical drawbacks to modeling functional response in the magnitude alone, removing noise in the phase is becoming more important. Previous studies have shown that retrospective correction of noise from physiologic sources can remove significant phase variance and that dynamic main magnetic field correction and regression of estimated motion parameters also remove significant phase fluctuations. In this work, we investigate the performance of physiologic noise regression in a framework along with correction for dynamic main field fluctuations and motion regression. Our findings suggest that including physiologic regressors provides some benefit in terms of reduction in phase noise power, but it is small compared to the benefit of dynamic field corrections and use of estimated motion parameters as nuisance regressors. Additionally, we show that the use of all three techniques reduces phase variance substantially, removes undesirable spatial phase correlations and improves detection of the functional response in magnitude and phase. Copyright © 2011 Elsevier Inc. All rights reserved.
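    Nuisance regression of motion and physiologic signals, as evaluated in this paper, amounts to least-squares projection of each time series onto the nuisance regressors and keeping the residual; a minimal sketch (the authors' pipeline additionally applies TOAST dynamic field correction, which is not shown):

```python
import numpy as np

def regress_out(ts, nuisance):
    """Remove nuisance signals (e.g. estimated motion parameters or
    physiologic regressors) from a voxel time series by least-squares
    projection; returns the residual time series."""
    N = np.column_stack([np.ones(len(nuisance)), nuisance])
    beta, *_ = np.linalg.lstsq(N, ts, rcond=None)
    return ts - N @ beta
```

    The residual is orthogonal to every nuisance regressor, so variance attributable to those signals (in magnitude or phase data alike) is fully removed from subsequent activation analysis.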

  20. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.
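    The P-spline machinery this abstract builds on (a rich B-spline basis plus a difference penalty on the coefficients, solved as ridge regression) can be sketched in the Eilers-Marx style; the segment count `n_seg` and smoothing parameter `lam` are illustrative, and in practice `lam` would be chosen by the MCV/GCV/EBBS/REML criteria the paper compares:

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_seg=20, lam=1.0, degree=3):
    """P-spline smoother: cubic B-spline basis with a second-order
    difference penalty, solved as a ridge-type linear system."""
    t = np.r_[[x.min()] * degree,
              np.linspace(x.min(), x.max(), n_seg + 1),
              [x.max()] * degree]                    # clamped knot vector
    B = BSpline(t, np.eye(len(t) - degree - 1), degree)(x)  # basis matrix
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)     # 2nd-difference penalty
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ a                                      # fitted values
```

    Because the penalized normal equations have an explicit solution for fixed `lam`, inference and multi-step forecasting by simulation are straightforward, which is the computational appeal the abstract notes.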

  1. Weighted functional linear regression models for gene-based association analysis.

    PubMed

    Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

    2018-01-01

    Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10⁻⁶), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.

  2. Two-Year versus One-Year Head Start Program Impact: Addressing Selection Bias by Comparing Regression Modeling with Propensity Score Analysis

    ERIC Educational Resources Information Center

    Leow, Christine; Wen, Xiaoli; Korfmacher, Jon

    2015-01-01

    This article compares regression modeling and propensity score analysis as different types of statistical techniques used in addressing selection bias when estimating the impact of two-year versus one-year Head Start on children's school readiness. The analyses were based on the national Head Start secondary dataset. After controlling for…

  3. Trend of Suicide Rates According to Urbanity among Adolescents by Gender and Suicide Method in Korea, 1997–2012

    PubMed Central

    Choi, Kyung-Hwa; Kim, Dong-Hyun

    2015-01-01

    This study aims to quantifiably evaluate the trend of the suicide rate among Korean adolescents from 1997 to 2012 according to urbanity. We used national death certificates and registration population data by administrative district for 15–19-year-old adolescents. The annual percent change (APC) and average annual percent change (AAPC) were estimated by the Joinpoint Regression Program. The suicide rate in the rural areas was higher than that in the urban areas in both genders (males (/100,000), 12.2 vs. 8.5; females (/100,000), 10.2 vs. 7.4 in 2012). However, the trend significantly increased only in the urban area (AAPC [95% CI]: males 2.6 [0.7, 4.6], females 3.3 [1.4, 5.2]). In urban areas, the suicide rate by jumping significantly increased in both genders (AAPC [95% CI]: males, 6.7 [4.3, 9.1]; females, 4.5 [3.0, 6.1]). In rural areas, the rate by self-poisoning significantly decreased by 7.9% per year for males (95% CI: −12.5, −3.0) and the rate by hanging significantly increased by 10.1% per year for females (95% CI: 2.6, 18.2). The trend and methods of suicide differ according to urbanity; therefore, a suicide prevention policy based on urbanity needs to be established for adolescents in Korea. PMID:25985313

  4. Trend of Suicide Rates According to Urbanity among Adolescents by Gender and Suicide Method in Korea, 1997-2012.

    PubMed

    Choi, Kyung-Hwa; Kim, Dong-Hyun

    2015-05-13

    This study aims to quantifiably evaluate the trend of the suicide rate among Korean adolescents from 1997 to 2012 according to urbanity. We used national death certificates and registration population data by administrative district for 15-19-year-old adolescents. The annual percent change (APC) and average annual percent change (AAPC) were estimated by the Joinpoint Regression Program. The suicide rate in the rural areas was higher than that in the urban areas in both genders (males (/100,000), 12.2 vs. 8.5; females (/100,000), 10.2 vs. 7.4 in 2012). However, the trend significantly increased only in the urban area (AAPC [95% CI]: males 2.6 [0.7, 4.6], females 3.3 [1.4, 5.2]). In urban areas, the suicide rate by jumping significantly increased in both genders (AAPC [95% CI]: males, 6.7 [4.3, 9.1]; females, 4.5 [3.0, 6.1]). In rural areas, the rate by self-poisoning significantly decreased by 7.9% per year for males (95% CI: -12.5, -3.0) and the rate by hanging significantly increased by 10.1% per year for females (95% CI: 2.6, 18.2). The trend and methods of suicide differ according to urbanity; therefore, a suicide prevention policy based on urbanity needs to be established for adolescents in Korea.
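Both records above rely on the Joinpoint Regression Program, which fits segmented log-linear models of rates on calendar year; within a segment the annual percent change is APC = 100·(e^b − 1), where b is the slope on the log scale. A minimal single-segment sketch, using made-up rates rather than the study's data:

```python
import numpy as np

# Hypothetical annual rates per 100,000 (illustrative values only).
years = np.arange(1997, 2013)
rates = np.array([5.1, 5.4, 5.6, 5.9, 6.3, 6.6, 7.1, 7.4,
                  7.9, 8.3, 8.8, 9.2, 9.8, 10.3, 10.9, 11.5])

# Joinpoint fits segmented log-linear models; for a single segment,
# ln(rate) = a + b*year, and APC = 100 * (exp(b) - 1) percent per year.
b, a = np.polyfit(years, np.log(rates), 1)
apc = 100.0 * (np.exp(b) - 1.0)
```

The software proper also searches for the number and location of joinpoints (segment breaks) and tests whether each segment's APC differs from zero; this sketch shows only the within-segment computation.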

  5. Quantile Regression in the Study of Developmental Sciences

    ERIC Educational Resources Information Center

    Petscher, Yaacov; Logan, Jessica A. R.

    2014-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
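The idea behind quantile regression is to replace squared error with the asymmetric "pinball" (check) loss; minimizing it over a constant recovers the sample quantile, and adding predictors yields conditional quantiles. A small numerical sketch with simulated data, using a grid search for clarity rather than the linear-programming solvers used in practice:

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Asymmetric 'check' loss minimized by quantile regression."""
    r = y - q
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))

rng = np.random.default_rng(1)
y = rng.normal(10, 2, 5001)

# Minimizing the pinball loss over a constant predictor recovers the
# sample tau-quantile; with covariates it gives conditional quantiles.
grid = np.linspace(y.min(), y.max(), 2001)
q_hat = {}
for tau in (0.1, 0.5, 0.9):
    losses = [pinball_loss(y, q, tau) for q in grid]
    q_hat[tau] = grid[int(np.argmin(losses))]
```

This is why quantile regression, unlike ordinary least squares, can describe how predictors relate to the tails of the outcome distribution, not just its mean.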

  6. Testing hypotheses for differences between linear regression lines

    Treesearch

    Stanley J. Zarnoch

    2009-01-01

    Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
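The full-versus-reduced-model comparison described above is typically carried out with an extra-sum-of-squares F test: fit one common line (reduced model) and separate lines per group (full model), then compare residual sums of squares. A minimal sketch with simulated data for two groups whose intercepts differ:

```python
import numpy as np

def f_test(rss_full, df_full, rss_red, df_red):
    """Extra-sum-of-squares F statistic for nested linear models."""
    return ((rss_red - rss_full) / (df_red - df_full)) / (rss_full / df_full)

def rss(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return float(r @ r)

rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 10, n)
g = np.repeat([0, 1], n // 2)                         # group indicator
y = 1.0 + 0.5 * x + 1.5 * g + rng.normal(0, 0.5, n)   # intercepts differ

one = np.ones(n)
# Reduced model: a single line for both groups.
rss_red = rss(np.column_stack([one, x]), y)
# Full model: separate intercept and slope for each group.
rss_full = rss(np.column_stack([one, x, g, g * x]), y)
F = f_test(rss_full, n - 4, rss_red, n - 2)
```

A large F (compared with the F distribution on 2 and n−4 degrees of freedom) rejects the hypothesis of coincident lines; restricting which columns enter the full model yields the other hypotheses (equal slopes, equal intercepts) the paper distinguishes.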

  7. Quantifying components of the hydrologic cycle in Virginia using chemical hydrograph separation and multiple regression analysis

    USGS Publications Warehouse

    Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.

    2012-01-01

    This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.
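The conductance-based separation described above treats streamflow as a two-component mixture of high-conductance base flow and low-conductance surface runoff. A minimal mass-balance sketch; the end-member conductance values below are illustrative, not the study's calibrated values:

```python
import numpy as np

def separate_baseflow(Q, SC, SC_bf=400.0, SC_ro=50.0):
    """Two-component mass balance: split streamflow Q into base flow using
    specific conductance SC, assuming high-SC groundwater (SC_bf) mixes
    with low-SC surface runoff (SC_ro)."""
    bf = Q * (SC - SC_ro) / (SC_bf - SC_ro)
    return np.clip(bf, 0.0, Q)          # base flow cannot exceed streamflow

Q = np.array([10.0, 80.0, 30.0, 12.0])         # streamflow (arbitrary units)
SC = np.array([380.0, 120.0, 250.0, 360.0])    # measured conductance
bf = separate_baseflow(Q, SC)
bf_fraction = bf.sum() / Q.sum()               # base-flow proportion
```

Averaging this base-flow proportion over a record is what yields figures like the 72 percent reported for the 48 study watersheds (versus the graphical PART estimate).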

  8. Robust regression for large-scale neuroimaging studies.

    PubMed

    Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand

    2015-05-01

    Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
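Robust regression of the kind advocated here downweights observations with large residuals instead of letting them dominate the fit. A common variant, not necessarily the exact estimator used in the paper, is Huber regression fit by iteratively reweighted least squares, sketched below on simulated data with gross outliers:

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Huber-type robust regression via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust scale (MAD)
        u = r / (delta * s)
        w = np.where(np.abs(u) <= 1.0, 1.0, 1.0 / np.abs(u))  # Huber weights
        W = np.sqrt(w)
        beta = np.linalg.lstsq(X * W[:, None], y * W, rcond=None)[0]
    return beta

rng = np.random.default_rng(3)
n = 200
x = rng.normal(0, 1, n)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, n)
y[:20] += 15.0                                  # 10% gross outliers
X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_rob = huber_irls(X, y)
```

The outliers pull the OLS intercept well away from its true value of 2, while the Huber fit stays close, which is the behavior that makes robust procedures attractive for large, artifact-prone neuroimaging cohorts.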

  9. A proposed streamflow-data program for Utah

    USGS Publications Warehouse

    Whitaker, G.L.

    1970-01-01

    An evaluation of the streamflow data available in Utah was made to provide guidelines for planning future programs. The basic steps in the evaluation procedure were (1) definition of the long-term goals of the streamflow-data program in quantitative form, (2) examination and analysis of all available data to determine which goals have already been met, and (3) consideration of alternate programs and techniques to meet the remaining objectives. The principal goals are (1) to provide current streamflow data where needed for water management and (2) to define streamflow characteristics at any point on any stream within a specified accuracy. It was found that the first goal generally is being satisfied but that flow characteristics at ungaged sites cannot be estimated within the specified accuracy by regression analysis with the existing data and model now available. This latter finding indicates the need for some changes in the present data program so that the accuracy goals can be approached by alternate methods. The regression method may be more successful at a future time if a more suitable model can be developed, and if an adequate sample of streamflow records can be obtained in all areas. In the meantime, methods of transferring flow characteristics which require some information at the ungaged site may be used. A modified streamflow-data program based on this study is proposed.

  10. Cascade Optimization Strategy with Neural Network and Regression Approximations Demonstrated on a Preliminary Aircraft Engine Design

    NASA Technical Reports Server (NTRS)

    Hopkins, Dale A.; Patnaik, Surya N.

    2000-01-01

    A preliminary aircraft engine design methodology is being developed that utilizes a cascade optimization strategy together with neural network and regression approximation methods. The cascade strategy employs different optimization algorithms in a specified sequence. The neural network and regression methods are used to approximate solutions obtained from the NASA Engine Performance Program (NEPP), which implements engine thermodynamic cycle and performance analysis models. The new methodology is proving to be more robust and computationally efficient than the conventional optimization approach of using a single optimization algorithm with direct reanalysis. The methodology has been demonstrated on a preliminary design problem for a novel subsonic turbofan engine concept that incorporates a wave rotor as a cycle-topping device. Computations of maximum thrust were obtained for a specific design point in the engine mission profile. The results (depicted in the figure) show a significant improvement in the maximum thrust obtained using the new methodology in comparison to benchmark solutions obtained using NEPP in a manual design mode.

  11. Service Content as a Determinant of Homeless Adults' Perceptions of Program Efficacy

    ERIC Educational Resources Information Center

    Sosin, Michael; George, Christine; Grossman, Susan

    2012-01-01

    This work analyzes the relationship between the services clients receive in treatment programs and client ratings of program efficacy. It relies on data from a random sample of clients served by Chicago's homelessness system (N = 554). Regressions utilizing that data suggest that ratings of program efficacy are positively predicted by program…

  12. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    ERIC Educational Resources Information Center

    Azen, Razia; Traxel, Nicole

    2009-01-01

    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R² analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…

  13. Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew; Park, Trevor

    2017-01-01

    A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…

  14. Biases and Standard Errors of Standardized Regression Coefficients

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Chan, Wai

    2011-01-01

    The paper obtains consistent standard errors (SE) and biases of order O(1/n) for the sample standardized regression coefficients with both random and given predictors. Analytical results indicate that the formulas for SEs given in popular text books are consistent only when the population value of the regression coefficient is zero. The sample…

  15. A Growth Model for Academic Program Life Cycle (APLC): A Theoretical and Empirical Analysis

    ERIC Educational Resources Information Center

    Acquah, Edward H. K.

    2010-01-01

    Academic program life cycle concept states each program's life flows through several stages: introduction, growth, maturity, and decline. A mixed-influence diffusion growth model is fitted to enrolment data on academic programs to analyze the factors determining progress of academic programs through their life cycles. The regression analysis yield…

  16. Users manual for flight control design programs

    NASA Technical Reports Server (NTRS)

    Nalbandian, J. Y.

    1975-01-01

    Computer programs for the design of analog and digital flight control systems are documented. The program DIGADAPT uses linear-quadratic-gaussian synthesis algorithms in the design of command response controllers and state estimators, and it applies covariance propagation analysis to the selection of sampling intervals for digital systems. Program SCHED executes correlation and regression analyses for the development of gain and trim schedules to be used in open-loop explicit-adaptive control laws. A linear-time-varying simulation of aircraft motions is provided by the program TVHIS, which includes guidance and control logic, as well as models for control actuator dynamics. The programs are coded in FORTRAN and are compiled and executed on both IBM and CDC computers.

  17. Area-to-point regression kriging for pan-sharpening

    NASA Astrophysics Data System (ADS)

    Wang, Qunming; Shi, Wenzhong; Atkinson, Peter M.

    2016-04-01

    Pan-sharpening is a technique to combine the fine spatial resolution panchromatic (PAN) band with the coarse spatial resolution multispectral bands of the same satellite to create a fine spatial resolution multispectral image. In this paper, area-to-point regression kriging (ATPRK) is proposed for pan-sharpening. ATPRK considers the PAN band as the covariate. Moreover, ATPRK is extended with a local approach, called adaptive ATPRK (AATPRK), which fits a regression model using a local, non-stationary scheme such that the regression coefficients change across the image. The two geostatistical approaches, ATPRK and AATPRK, were compared to the 13 state-of-the-art pan-sharpening approaches summarized in Vivone et al. (2015) in experiments on three separate datasets. ATPRK and AATPRK produced more accurate pan-sharpened images than the 13 benchmark algorithms in all three experiments. Unlike the benchmark algorithms, the two geostatistical solutions precisely preserved the spectral properties of the original coarse data. Furthermore, ATPRK can be enhanced by a local scheme in AATPRK, in cases where the residuals from a global regression model are such that their spatial character varies locally.

  18. Regional regression of flood characteristics employing historical information

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1987-01-01

    Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100 yr peak, at ungauged sites as functions of drainage basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust this generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. © 1987.

  19. Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies

    PubMed Central

    Vatcheva, Kristina P.; Lee, MinJae; McCormick, Joseph B.; Rahbar, Mohammad H.

    2016-01-01

    The adverse impact of ignoring multicollinearity on findings and data interpretation in regression analysis is very well documented in the statistical literature. The failure to identify and report multicollinearity could result in misleading interpretations of the results. A review of epidemiological literature in PubMed from January 2004 to December 2013, illustrated the need for a greater attention to identifying and minimizing the effect of multicollinearity in analysis of data from epidemiologic studies. We used simulated datasets and real life data from the Cameron County Hispanic Cohort to demonstrate the adverse effects of multicollinearity in the regression analysis and encourage researchers to consider the diagnostic for multicollinearity as one of the steps in regression analysis. PMID:27274911

  20. Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies.

    PubMed

    Vatcheva, Kristina P; Lee, MinJae; McCormick, Joseph B; Rahbar, Mohammad H

    2016-04-01

    The adverse impact of ignoring multicollinearity on findings and data interpretation in regression analysis is very well documented in the statistical literature. The failure to identify and report multicollinearity could result in misleading interpretations of the results. A review of epidemiological literature in PubMed from January 2004 to December 2013, illustrated the need for a greater attention to identifying and minimizing the effect of multicollinearity in analysis of data from epidemiologic studies. We used simulated datasets and real life data from the Cameron County Hispanic Cohort to demonstrate the adverse effects of multicollinearity in the regression analysis and encourage researchers to consider the diagnostic for multicollinearity as one of the steps in regression analysis.
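A standard diagnostic for the multicollinearity discussed in these two records is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors; values above roughly 10 are commonly flagged. A minimal sketch on simulated data with two nearly collinear predictors:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R_j^2),
    where R_j^2 is from regressing column j on the other columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ beta
        r2 = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)    # nearly collinear with x1
v = vif(np.column_stack([x1, x2, x3]))
```

Here the collinear pair (x1, x3) gets very large VIFs while the independent predictor x2 stays near 1, illustrating the kind of diagnostic step the authors encourage as part of routine regression analysis.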

  1. Resting-state functional magnetic resonance imaging: the impact of regression analysis.

    PubMed

    Yeh, Chia-Jung; Tseng, Yu-Sheng; Lin, Yi-Ru; Tsai, Shang-Yueh; Huang, Teng-Yi

    2015-01-01

    To investigate the impact of regression methods on resting-state functional magnetic resonance imaging (rsfMRI). During rsfMRI preprocessing, regression analysis is considered effective for reducing the interference of physiological noise on the signal time course. However, it is unclear whether the regression method benefits rsfMRI analysis. Twenty volunteers (10 men and 10 women; aged 23.4 ± 1.5 years) participated in the experiments. We used node analysis and functional connectivity mapping to assess the brain default mode network by using five combinations of regression methods. The results show that regressing the global mean plays a major role in the preprocessing steps. When a global regression method is applied, the values of functional connectivity are significantly lower (P ≤ .01) than those calculated without a global regression. This step increases inter-subject variation and produces anticorrelated brain areas. rsfMRI data processed using regression should be interpreted carefully. The significance of the anticorrelated brain areas produced by global signal removal is unclear. Copyright © 2014 by the American Society of Neuroimaging.
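Global signal regression, the step found most influential above, amounts to projecting each voxel's time course onto the span of the nuisance regressors (here just an intercept and the global mean) and keeping the residuals. A minimal sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n_t, n_vox = 200, 1000
global_noise = rng.normal(size=n_t)                    # shared fluctuation
data = (rng.normal(size=(n_t, n_vox))                  # voxel-specific noise
        + np.outer(global_noise, rng.uniform(0.5, 1.5, n_vox)))

# Nuisance regression: regress the global mean time course out of every
# voxel's time series by ordinary least squares and keep the residuals.
g = data.mean(axis=1)
G = np.column_stack([np.ones(n_t), g])
beta = np.linalg.lstsq(G, data, rcond=None)[0]
cleaned = data - G @ beta

corr_before = np.corrcoef(g, data[:, 0])[0, 1]
corr_after = np.corrcoef(g, cleaned[:, 0])[0, 1]   # ~0 by construction
```

Because the residuals are orthogonal to the regressors, every voxel is left exactly uncorrelated with the global mean; this mean-shifting of the correlation distribution is what produces the anticorrelated areas the abstract warns about.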

  2. Time series regression studies in environmental epidemiology.

    PubMed

    Bhaskaran, Krishnan; Gasparrini, Antonio; Hajat, Shakoor; Smeeth, Liam; Armstrong, Ben

    2013-08-01

    Time series regression studies have been widely used in environmental epidemiology, notably in investigating the short-term associations between exposures such as air pollution, weather variables or pollen, and health outcomes such as mortality, myocardial infarction or disease-specific hospital admissions. Typically, for both exposure and outcome, data are available at regular time intervals (e.g. daily pollution levels and daily mortality counts) and the aim is to explore short-term associations between them. In this article, we describe the general features of time series data, and we outline the analysis process, beginning with descriptive analysis, then focusing on issues in time series regression that differ from other regression methods: modelling short-term fluctuations in the presence of seasonal and long-term patterns, dealing with time varying confounding factors and modelling delayed ('lagged') associations between exposure and outcome. We finish with advice on model checking and sensitivity analysis, and some common extensions to the basic model.
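The core of such a time series regression is a model of daily counts on exposure plus terms that absorb seasonality, since season can confound the exposure-outcome association. A minimal sketch with simulated data, using a single Fourier pair in place of the seasonal splines typically used, and least squares on log counts as a stand-in for Poisson regression:

```python
import numpy as np

rng = np.random.default_rng(6)
days = np.arange(730)
s = np.sin(2 * np.pi * days / 365.25)                  # seasonal cycle
exposure = 20.0 - 3.0 * s + rng.normal(0, 2, 730)      # pollution tracks season
log_rate = 5.0 + 0.02 * exposure + 1.5 * s             # true effect: 0.02
deaths = rng.poisson(np.exp(log_rate))

# Log-linear regression of daily counts on exposure with Fourier terms
# controlling for seasonality.
X = np.column_stack([np.ones(730), exposure, s,
                     np.cos(2 * np.pi * days / 365.25)])
beta = np.linalg.lstsq(X, np.log(deaths), rcond=None)[0]
adjusted = beta[1]                                     # near the true 0.02
naive = np.polyfit(exposure, np.log(deaths), 1)[0]     # no seasonal control
```

Without the seasonal terms the estimated exposure coefficient is badly confounded (here it even changes sign), which is exactly why the article stresses modelling seasonal and long-term patterns before interpreting short-term associations.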

  3. Farmers' Participation in Extension Programs and Technology Adoption in Rural Nepal: A Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Suvedi, Murari; Ghimire, Raju; Kaplowitz, Michael

    2017-01-01

    Purpose: This paper examines the factors affecting farmers' participation in extension programs and adoption of improved seed varieties in the hills of rural Nepal. Methodology/approach: Cross-sectional farm-level data were collected during July and August 2014. A sample of 198 farm households was selected for interviewing by using a multistage,…

  4. Predicting School Enrollments Using the Modified Regression Technique.

    ERIC Educational Resources Information Center

    Grip, Richard S.; Young, John W.

    This report is based on a study in which a regression model was constructed to increase accuracy in enrollment predictions. A model, known as the Modified Regression Technique (MRT), was used to examine K-12 enrollment over the past 20 years in 2 New Jersey school districts of similar size and ethnicity. To test the model's accuracy, MRT was…

  5. Hazard Regression Models of Early Mortality in Trauma Centers

    PubMed Central

    Clark, David E; Qian, Jing; Winchell, Robert J; Betensky, Rebecca A

    2013-01-01

    Background Factors affecting early hospital deaths after trauma may be different from factors affecting later hospital deaths, and the distribution of short and long prehospital times may vary among hospitals. Hazard regression (HR) models may therefore be more useful than logistic regression (LR) models for analysis of trauma mortality, especially when treatment effects at different time points are of interest. Study Design We obtained data for trauma center patients from the 2008–9 National Trauma Data Bank (NTDB). Cases were included if they had complete data for prehospital times, hospital times, survival outcome, age, vital signs, and severity scores. Cases were excluded if pulseless on admission, transferred in or out, or ISS<9. Using covariates proposed for the Trauma Quality Improvement Program and an indicator for each hospital, we compared LR models predicting survival at 8 hours after injury to HR models with survival censored at 8 hours. HR models were then modified to allow time-varying hospital effects. Results 85,327 patients in 161 hospitals met inclusion criteria. Crude hazards peaked initially, then steadily declined. When hazard ratios were assumed constant in HR models, they were similar to odds ratios in LR models associating increased mortality with increased age, firearm mechanism, increased severity, more deranged physiology, and estimated hospital-specific effects. However, when hospital effects were allowed to vary by time, HR models demonstrated that hospital outliers were not the same at different times after injury. Conclusions HR models with time-varying hazard ratios reveal inconsistencies in treatment effects, data quality, and/or timing of early death among trauma centers. HR models are generally more flexible than LR models, can be adapted for censored data, and potentially offer a better tool for analysis of factors affecting early death after injury. PMID:23036828

  6. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.

  7. A Note on the Relationship between the Number of Indicators and Their Reliability in Detecting Regression Coefficients in Latent Regression Analysis

    ERIC Educational Resources Information Center

    Dolan, Conor V.; Wicherts, Jelte M.; Molenaar, Peter C. M.

    2004-01-01

    We consider the question of how variation in the number and reliability of indicators affects the power to reject the hypothesis that the regression coefficients are zero in latent linear regression analysis. We show that power remains constant as long as the coefficient of determination remains unchanged. Any increase in the number of indicators…

  8. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent and dependent variables. In logistic regression the dependent variable is categorical and the model is expressed in terms of odds; when those categories are ordered levels, the model is ordinal logistic regression. The GWOLR model is an ordinal logistic regression model whose parameters depend on the geographical location of the observation site. Parameter estimation is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. The estimation uses data on the number of dengue fever patients in Semarang City, with 144 villages in Semarang City as the observation units. The results give a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.
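The "geographically weighted" part of GWOLR can be sketched generically: each site gets its own regression, with observations weighted by a distance kernel. The sketch below uses a Gaussian kernel and a local linear fit as a stand-in (the ordinal-logistic local likelihood in GWOLR needs an iterative solver); the bandwidth and data are illustrative:

```python
import numpy as np

def gw_weights(coords, site, h=2.0):
    """Gaussian kernel weights by distance from the regression site."""
    d = np.linalg.norm(coords - site, axis=1)
    return np.exp(-0.5 * (d / h) ** 2)

def gw_linear_fit(coords, x, y, site, h=2.0):
    """Local weighted least squares at one site (linear stand-in for the
    local ordinal-logistic fit used by GWOLR)."""
    W = np.sqrt(gw_weights(coords, site, h))
    Xd = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(Xd * W[:, None], y * W, rcond=None)[0]

rng = np.random.default_rng(7)
coords = rng.uniform(0, 10, size=(300, 2))
x = rng.normal(size=300)
slope = 0.5 + 0.2 * coords[:, 0]            # coefficient varies over space
y = 1.0 + slope * x + rng.normal(0, 0.3, 300)

b_west = gw_linear_fit(coords, x, y, site=np.array([1.0, 5.0]))
b_east = gw_linear_fit(coords, x, y, site=np.array([9.0, 5.0]))
```

Fitting one such weighted model per site is what yields a separate local model for each of the 144 villages in the study.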

  9. Linear regression crash prediction models : issues and proposed solutions.

    DOT National Transportation Integrated Search

    2010-05-01

    The paper develops a linear regression model approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novel data aggregation to satisfy linear regression assumptions; namely error structure normality ...

  10. Ridge Regression for Interactive Models.

    ERIC Educational Resources Information Center

    Tate, Richard L.

    1988-01-01

    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…

  11. Is More Always Better in Designing Workplace Wellness Programs?: A Comparison of Wellness Program Components Versus Outcomes.

    PubMed

    Batorsky, Benjamin; Van Stolk, Christian; Liu, Hangsheng

    2016-10-01

    Assess whether adding more components to a workplace wellness program is associated with better outcomes by measuring the relationship of program components to one another and to employee participation and perceptions of program effectiveness. Data came from a 2014 survey of 24,393 employees of 81 employers about services offered, leadership, incentives, and promotion. Logistic regressions were used to model the relationship between program characteristics and outcomes. Components individually are related to better outcomes, but this relationship is weaker in the presence of other components and non-significant for incentives. Within components, a moderate level of services and work time participation opportunities are associated with higher participation and effectiveness. The "more of everything" approach does not appear to be advisable for all programs. Programs should focus on providing ample opportunities for employees to participate and initiatives like results-based incentives.

  12. Publicly accessible decision support system of the spatially referenced regressions on watershed attributes (SPARROW) model and model enhancements in South Carolina

    Treesearch

    Celeste Journey; Anne B. Hoos; David E. Ladd; John W. Brakebill; Richard A. Smith

    2016-01-01

    The U.S. Geological Survey (USGS) National Water Quality Assessment program has developed a web-based decision support system (DSS) to provide free public access to the steady-state SPAtially Referenced Regressions On Watershed attributes (SPARROW) model simulation results on nutrient conditions in streams and rivers and to offer scenario testing capabilities for...

  13. The effect of dual accreditation on family medicine residency programs.

    PubMed

    Mims, Lisa D; Bressler, Lindsey C; Wannamaker, Louise R; Carek, Peter J

    2015-04-01

    In 1985, the American Osteopathic Association (AOA) Board of Trustees agreed to allow residency programs to become dually accredited by the AOA and Accreditation Council for Graduate Medical Education (ACGME). Despite the increase in such programs, there has been minimal research comparing these programs to exclusively ACGME-accredited residencies. This study examines the association between dual accreditation and suggested markers of quality. Standard characteristics such as regional location, program structure (community or university based), postgraduate year one (PGY-1) positions offered, and salary (PGY-1) were obtained for each residency program. In addition, the faculty-to-resident ratio in the family medicine clinic and the number of half days residents spent in the clinic each week were recorded. Initial Match rates and pass rates of new graduates on the ABFM examination from 2009 to 2013 were also obtained. Variables were analyzed using chi-square and Student's t test. Logistic regression models were then created to predict a program's 5-year aggregate initial Match rate and Board pass rate in the top tertile as compared to the lowest tertile. Dual accreditation was obtained by 117 programs (27.0%). Initial analyses revealed associations between dually accredited programs and mean year of initial ACGME program accreditation, regional location, program structure, tracks, and alternative medicine curriculum. When evaluated in logistic regression, dual accreditation status was not associated with Match rates or ABFM pass rates. By examining suggested markers of program quality for dually accredited programs in comparison to ACGME-only accredited programs, this study successfully established both differences and similarities between the two types.

  14. Image interpolation via regularized local linear regression.

    PubMed

    Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang

    2011-12-01

    The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even a small number of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting from the linear regression model, we replace the OLS error norm with the moving least squares (MLS) error norm, which leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l2-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
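
    A minimal sketch of the "regularized local linear regression" idea (kernel-weighted least squares with an l2 penalty; the paper's MLS norm and manifold term are omitted) might look as follows. The Gaussian weighting, bandwidth h, and penalty lam below are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def rllr_fit(coords, values, center, h=1.0, lam=1e-3):
    """Fit a local plane a + b.(x - center) to neighboring samples by
    minimizing sum_i w_i*(v_i - a - b.(x_i - center))^2 + lam*||theta||^2
    with Gaussian weights w_i; return the plane value a at the center."""
    d = coords - center
    w = np.exp(-np.sum(d**2, axis=1) / (2.0 * h**2))
    A = np.column_stack([np.ones(len(coords)), d])   # local linear basis
    AtW = A.T * w                                    # A' W for the normal equations
    theta = np.linalg.solve(AtW @ A + lam * np.eye(A.shape[1]), AtW @ values)
    return theta[0]

# demo: interpolate a sub-pixel location on an exactly planar 4x4 patch
gx, gy = np.meshgrid(np.arange(4.0), np.arange(4.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
values = 2.0 + 3.0 * coords[:, 0] + 4.0 * coords[:, 1]
est = rllr_fit(coords, values, center=np.array([0.5, 0.5]), lam=1e-8)
```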

  15. Diurnal salivary cortisol and regression status in MECP2 Duplication syndrome

    PubMed Central

    Peters, Sarika U.; Byiers, Breanne J.; Symons, Frank J.

    2015-01-01

    MECP2 duplication syndrome is an X-linked genomic disorder that is characterized by infantile hypotonia, intellectual disability, and recurrent respiratory infections. Regression affects a subset of individuals, and the etiology of regression has yet to be examined. In this study, alterations in the hypothalamus-pituitary-adrenal axis, including diurnal patterns in salivary cortisol, were examined in four males with MECP2 duplication syndrome who had regression, and four males with the same syndrome without regression (ages 3–22 years). Individuals who had experienced regression did not exhibit typical diurnal cortisol rhythms, and their profiles were flatter across the day. In contrast, individuals with MECP2 duplication syndrome who had not experienced regression showed more typical patterns of higher cortisol levels in the morning with linear decreases throughout the day. This study is the first to suggest a link between atypical diurnal cortisol rhythms and regression status in MECP2 duplication syndrome, and may have implications for treatment. PMID:25999300

  16. Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert

    2013-01-01

    Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linearly, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.

  17. GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA

    PubMed Central

    Zheng, Qi; Peng, Limin; He, Xuming

    2015-01-01

    Quantile regression has become a valuable tool to analyze heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424

  18. Regression analysis for solving diagnosis problem of children's health

    NASA Astrophysics Data System (ADS)

    Cherkashina, Yu A.; Gerget, O. M.

    2016-04-01

    The paper presents the results of research devoted to the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, vascular-endothelial growth factor) measured at 3-5 days of the children's life. A detailed description of the studied medical data is given. A binary logistic regression procedure is discussed, and the basic results of the research are presented. A classification table of predicted versus observed values is shown, and the overall percentage of correct recognition is determined. The regression equation coefficients are calculated, and the general regression equation is written from them. Based on the results of the logistic regression, a ROC analysis was performed; the sensitivity and specificity of the model were calculated and ROC curves were constructed. These mathematical techniques allow the diagnosis of children's health with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and are of high practical importance in the author's professional activity.
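
    The ROC quantities this record reports are straightforward to compute once predicted probabilities are available. A self-contained sketch with made-up labels and scores (not the study's data):

```python
import numpy as np

def sens_spec(y_true, scores, cutoff):
    """Sensitivity and specificity of the decision rule `score >= cutoff`."""
    pred = scores >= cutoff
    tp = np.sum(pred & (y_true == 1)); fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0)); fp = np.sum(pred & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formula."""
    pos = scores[y_true == 1]; neg = scores[y_true == 0]
    # fraction of (positive, negative) pairs ranked correctly; ties count 1/2
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# toy example: two cases (label 1) and two controls (label 0)
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.8, 0.2, 0.9])
sens, spec = sens_spec(y_true, scores, 0.5)
area = auc(y_true, scores)
```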

  19. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
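
    One plausible reading of the balance-then-bag pipeline can be sketched with scikit-learn (assumed available). This is an illustrative reconstruction, not the paper's code: the cluster count, number of bags, and penalty strength are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def lasso_logistic_ensemble(X, y, n_clusters=5, n_members=10, seed=0):
    """Balance each bag by sampling the majority class (k-means clusters
    guide the sampling so bags cover its structure), fit an L1-penalized
    (Lasso) logistic model per bag, and average predicted probabilities."""
    rng = np.random.default_rng(seed)
    minority = int(np.mean(y) < 0.5)           # binary 0/1 labels assumed
    Xmaj, Xmin = X[y != minority], X[y == minority]
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(Xmaj)
    per_cluster = max(1, len(Xmin) // n_clusters)
    members = []
    for _ in range(n_members):
        idx = np.concatenate([rng.choice(np.where(labels == c)[0],
                                         size=per_cluster)
                              for c in range(n_clusters)])
        Xb = np.vstack([Xmaj[idx], Xmin])
        yb = np.concatenate([np.zeros(len(idx)), np.ones(len(Xmin))])
        members.append(LogisticRegression(penalty="l1", solver="liblinear")
                       .fit(Xb, yb))
    # averaged probability of belonging to the minority (high-risk) class
    return lambda Xn: np.mean([m.predict_proba(Xn)[:, 1] for m in members],
                              axis=0)

# demo on simulated unbalanced data (2000 "good" vs 100 "bad" borrowers)
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(size=(2000, 2)),
                    rng.normal(loc=2.5, size=(100, 2))])
y_demo = np.concatenate([np.zeros(2000), np.ones(100)])
score = lasso_logistic_ensemble(X_demo, y_demo)(X_demo)
```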

  20. Quantile Regression with Censored Data

    ERIC Educational Resources Information Center

    Lin, Guixian

    2009-01-01

    The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitations due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…

  1. Ordinary least squares regression is indicated for studies of allometry.

    PubMed

    Kilmer, J T; Rodríguez, R L

    2017-01-01

    When it comes to fitting simple allometric slopes through measurement data, evolutionary biologists have been torn between regression methods. On the one hand, there is the ordinary least squares (OLS) regression, which is commonly used across many disciplines of biology to fit lines through data, but which has a reputation for underestimating slopes when measurement error is present. On the other hand, there is the reduced major axis (RMA) regression, which is often recommended as a substitute for OLS regression in studies of allometry, but which has several weaknesses of its own. Here, we review statistical theory as it applies to evolutionary biology and studies of allometry. We point out that the concerns that arise from measurement error for OLS regression are small and straightforward to deal with, whereas RMA has several key properties that make it unfit for use in the field of allometry. The recommended approach for researchers interested in allometry is to use OLS regression on measurements taken with low (but realistically achievable) measurement error. If measurement error is unavoidable and relatively large, it is preferable to correct for slope attenuation rather than to turn to RMA regression, or to take the expected amount of attenuation into account when interpreting the data. © 2016 European Society For Evolutionary Biology.
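
    The slope-attenuation correction the authors recommend is simple to demonstrate. Below is an illustrative simulation (all values hypothetical): with measurement error of known variance added to x, the OLS slope shrinks by the reliability ratio and can be divided by it to recover the true slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_slope = 0.75
x = rng.normal(size=n)                      # true trait values
y = true_slope * x + rng.normal(scale=0.3, size=n)
x_obs = x + rng.normal(scale=0.5, size=n)   # x measured with error

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

naive = ols_slope(x_obs, y)
# attenuation factor (reliability): var(x) / (var(x) + var(error))
reliability = 1.0 / (1.0 + 0.5**2)
corrected = naive / reliability             # attenuation-corrected slope
```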

  2. [Risk factors on the recurrence of ischemic stroke and the establishment of a Cox's regression model].

    PubMed

    An, Ya-chen; Chen, Yun-xia; Wang, Yu-xun; Zhao, Xiao-jing; Wang, Yan; Zhang, Jiang; Li, Chun-ling; Peng, Yan-bo; Gao, Su-ling; Chang, Li-sha; Zhang, Li; Xue, Xin-hong; Chen, Rui-ying; Wang, Da-li

    2011-08-01

    To investigate the risk factors and establish a Cox regression model for the recurrence of ischemic stroke. We retrospectively reviewed consecutive patients with ischemic stroke admitted to the Neurology Department of the Hebei United University Affiliated Hospital between January 1, 2008 and December 31, 2009. Cases had been followed since the onset of ischemic stroke, and the follow-up program ended on June 30, 2010. Kaplan-Meier methods were used to describe the recurrence rate. Univariate and multivariate Cox proportional hazards regression models were used to analyze the risk factors associated with episodes of recurrence, and a recurrence model was then set up. During the follow-up period, 79 cases relapsed, with recurrence rates of 12.75% at one year and 18.87% at two years. Univariate and multivariate Cox proportional hazards regression showed that the independent risk factors associated with recurrence were age (X₁) (RR = 1.025, 95%CI: 1.003 - 1.048), history of hypertension (X₂) (RR = 1.976, 95%CI: 1.014 - 3.851), history of family strokes (X₃) (RR = 2.647, 95%CI: 1.175 - 5.961), total cholesterol amount (X₄) (RR = 1.485, 95%CI: 1.214 - 1.817), ESRS total scores (X₅) (RR = 1.327, 95%CI: 1.057 - 1.666) and progression of the disease (X₆) (RR = 1.889, 95%CI: 1.123 - 3.178). The personal prognosis index (PI) of the recurrence model was as follows: PI = 0.025X₁ + 0.681X₂ + 0.973X₃ + 0.395X₄ + 0.283X₅ + 0.636X₆. The smaller the personal prognosis index, the lower the recurrence risk; the larger the index, the higher the risk. Age, history of hypertension, total cholesterol amount, total ESRS scores, together with disease progression were the independent risk factors associated with recurrence of ischemic stroke. Both the recurrence model and the personal prognosis index equation were successfully established.
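
    The reported prognosis index is a plain linear combination of the six covariates, so applying it is one line of arithmetic. The patient values below are hypothetical, and the coding of each covariate (cholesterol units, binary indicators for the history variables) is an assumption, since the abstract does not state them:

```python
# Coefficients of the published PI equation:
# PI = 0.025*X1 + 0.681*X2 + 0.973*X3 + 0.395*X4 + 0.283*X5 + 0.636*X6
coef = [0.025, 0.681, 0.973, 0.395, 0.283, 0.636]

def prognosis_index(x):
    """Larger PI means higher recurrence risk (per the abstract)."""
    return sum(c * xi for c, xi in zip(coef, x))

# hypothetical patient: age 65, hypertensive, no family stroke history,
# total cholesterol 5.2, ESRS score 3, progressive course
pi = prognosis_index([65, 1, 0, 5.2, 3, 1])
```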

  3. Evaluating effects of developmental education for college students using a regression discontinuity design.

    PubMed

    Moss, Brian G; Yeaton, William H

    2013-10-01

    Annually, American colleges and universities provide developmental education (DE) to millions of underprepared students; however, evaluation estimates of DE benefits have been mixed. Using a prototypic exemplar of DE, our primary objective was to investigate the utility of a replicative evaluative framework for assessing program effectiveness. Within the context of the regression discontinuity (RD) design, this research examined the effectiveness of a DE program for five sequential cohorts of first-time college students. Discontinuity estimates were generated for individual terms and cumulatively, across terms. Participants were 3,589 first-time community college students. DE program effects were measured by contrasting both college-level English grades and a dichotomous measure of pass/fail, for DE and non-DE students. Parametric and nonparametric estimates of overall effect were positive for continuous and dichotomous measures of achievement (grade and pass/fail). The variability of program effects over time was determined by tracking results within individual terms and cumulatively, across terms. Applying this replication strategy, DE's overall impact was modest (an effect size of approximately .20) but quite consistent, based on parametric and nonparametric estimation approaches. A meta-analysis of five RD results yielded virtually the same estimate as the overall, parametric findings. Subset analysis, though tentative, suggested that males benefited more than females, while academic gains were comparable for different ethnicities. The cumulative, within-study comparison, replication approach offers considerable potential for the evaluation of new and existing policies, particularly when effects are relatively small, as is often the case in applied settings.

  4. Adolescent Substance Use Following Participation in a Universal Drug Prevention Program: Examining Relationships with Program Recall and Baseline Use Status

    PubMed Central

    Bavarian, Niloofar; Duncan, Robert; Lewis, Kendra M.; Miao, Alicia; Washburn, Isaac J.

    2014-01-01

    Background We examined whether adolescents receiving a universal, school-based, drug-prevention program in grade 7 varied, by student profile, in substance use behaviors post-program implementation. Profiles were a function of recall of program receipt and substance use at baseline. Methods We analyzed data from the Adolescent Substance Abuse Prevention Study, a large, geographically diverse, longitudinal school-based cluster-randomized controlled trial of the Take Charge of Your Life drug-prevention program. Profiles were created using self-reported substance use (pre-intervention) and program recall (post-intervention) at Grade 7. We first examined characteristics of each of the four profiles of treatment students who varied by program recall and baseline substance use. Using multilevel logistic regression analyses, we examined differences in the odds of substance use (alcohol, tobacco, and marijuana) among student profiles at the six additional study waves (Time 2 (Grade 7) through Time 7 (Grade 11)). Results Pearson’s chi-square tests showed sample characteristics varied by student profile. Multilevel logistic regression results were consistent across all examined substance use behaviors at all time points. Namely, as compared to students who had no baseline substance use and had program recall (No Use, Recall), each of the remaining three profiles (No Use, No Recall; Use, Recall; Use, No Recall) were more likely to engage in substance use. Post-hoc analyses showed that for the two sub-profiles of baseline substance users, there were only two observed, and inconsistent, differences in the odds of subsequent substance use by recall status. Conclusions Findings suggest that for students who were not baseline substance users, program recall significantly decreased the likelihood of subsequent substance use. For students who were baseline substance users, program recall did not generally influence subsequent substance use. Implications for school-based drug

  5. Adolescent Substance Use Following Participation in a Universal Drug Prevention Program: Examining Relationships With Program Recall and Baseline Use Status.

    PubMed

    Bavarian, Niloofar; Duncan, Robert; Lewis, Kendra M; Miao, Alicia; Washburn, Isaac J

    2015-01-01

    The study examined whether adolescents receiving a universal, school based, drug prevention program in Grade 7 varied, by student profile, in substance use behaviors post program implementation. Profiles were a function of recall of program receipt and substance use at baseline. A secondary analysis was conducted on data from the Adolescent Substance Abuse Prevention Study, a large, geographically diverse, longitudinal school-based cluster-randomized controlled trial of the Take Charge of Your Life drug prevention program. Profiles were created using self-reported substance use (preintervention) and program recall (postintervention) at Grade 7. First, characteristics of each of the 4 profiles of treatment students who varied by program recall and baseline substance use were explored. Then, multilevel logistic regression analyses were used to examine differences in the odds of substance use (alcohol, tobacco, and marijuana) among student profiles at the 6 additional study waves (Time 2 [Grade 7] through Time 7 [Grade 11]). Pearson's chi-square tests showed sample characteristics varied by student profile. Multilevel logistic regression results were consistent across all examined substance use behaviors at all time points. Namely, as compared with students who had no baseline substance use and had program recall (No Use, Recall), each of the remaining 3 profiles (No Use, No Recall; Use, Recall; Use, No Recall) were more likely to engage in substance use. Post hoc analyses showed that for the 2 subprofiles of baseline substance users, there were only 2 observed, and inconsistent, differences in the odds of subsequent substance use by recall status. Findings suggest that for students who were not baseline substance users, program recall significantly decreased the likelihood of subsequent substance use. For students who were baseline substance users, program recall did not generally influence subsequent substance use. Implications for school-based drug prevention

  6. Meta-regression approximations to reduce publication selection bias.

    PubMed

    Stanley, T D; Doucouliagos, Hristos

    2014-03-01

    Publication selection bias is a serious challenge to the integrity of all empirical sciences. We derive meta-regression approximations to reduce this bias. Our approach employs Taylor polynomial approximations to the conditional mean of a truncated distribution. A quadratic approximation without a linear term, precision-effect estimate with standard error (PEESE), is shown to have the smallest bias and mean squared error in most cases and to outperform conventional meta-analysis estimators, often by a great deal. Monte Carlo simulations also demonstrate how a new hybrid estimator that conditionally combines PEESE and the Egger regression intercept can provide a practical solution to publication selection bias. PEESE is easily expanded to accommodate systematic heterogeneity along with complex and differential publication selection bias that is related to moderator variables. By providing an intuitive reason for these approximations, we can also explain why the Egger regression works so well and when it does not. These meta-regression methods are applied to several policy-relevant areas of research including antidepressant effectiveness, the value of a statistical life, the minimum wage, and nicotine replacement therapy. Copyright © 2013 John Wiley & Sons, Ltd.
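
    In its simplest form, the PEESE estimator described here is a precision-weighted regression of the effect estimates on their squared standard errors, with the intercept taken as the selection-corrected effect. A sketch (the toy meta-analysis data are fabricated to satisfy the model exactly):

```python
import numpy as np

def peese(effects, ses):
    """PEESE: WLS of effect on SE^2 (quadratic term, no linear term),
    weighted by precision 1/SE^2. Returns [b0, b1], where the intercept
    b0 is the publication-selection-corrected effect estimate."""
    X = np.column_stack([np.ones_like(ses), ses**2])
    w = 1.0 / ses**2
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ effects)

# toy meta-analysis: 20 studies whose effects follow the PEESE model exactly
ses = np.linspace(0.05, 0.5, 20)
effects = 0.3 + 2.0 * ses**2     # true effect 0.3 plus selection distortion
b = peese(effects, ses)
```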

  7. Regression Discontinuity for Causal Effect Estimation in Epidemiology.

    PubMed

    Oldenburg, Catherine E; Moscoe, Ellen; Bärnighausen, Till

    Regression discontinuity analyses can generate estimates of the causal effects of an exposure when a continuously measured variable is used to assign the exposure to individuals based on a threshold rule. Individuals just above the threshold are expected to be similar in their distribution of measured and unmeasured baseline covariates to individuals just below the threshold, resulting in exchangeability. At the threshold exchangeability is guaranteed if there is random variation in the continuous assignment variable, e.g., due to random measurement error. Under exchangeability, causal effects can be identified at the threshold. The regression discontinuity intention-to-treat (RD-ITT) effect on an outcome can be estimated as the difference in the outcome between individuals just above (or below) versus just below (or above) the threshold. This effect is analogous to the ITT effect in a randomized controlled trial. Instrumental variable methods can be used to estimate the effect of exposure itself utilizing the threshold as the instrument. We review the recent epidemiologic literature reporting regression discontinuity studies and find that while regression discontinuity designs are beginning to be utilized in a variety of applications in epidemiology, they are still relatively rare, and analytic and reporting practices vary. Regression discontinuity has the potential to greatly contribute to the evidence base in epidemiology, in particular on the real-life and long-term effects and side-effects of medical treatments that are provided based on threshold rules - such as treatments for low birth weight, hypertension or diabetes.
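
    In its simplest sharp-design form, the RD-ITT estimate described above is the difference between two local linear fits evaluated at the threshold. An illustrative sketch on simulated data (the bandwidth, noise level, and true effect of 2.0 are all made up):

```python
import numpy as np

def rd_itt(assign, outcome, threshold, bandwidth):
    """Sharp-RD intention-to-treat estimate: fit a separate local linear
    regression within `bandwidth` on each side of the threshold and take
    the difference of the two intercepts (fitted values at the cutoff)."""
    z = assign - threshold
    intercepts = {}
    for side, mask in (("below", (z < 0) & (z > -bandwidth)),
                       ("above", (z >= 0) & (z < bandwidth))):
        X = np.column_stack([np.ones(mask.sum()), z[mask]])
        beta, *_ = np.linalg.lstsq(X, outcome[mask], rcond=None)
        intercepts[side] = beta[0]
    return intercepts["above"] - intercepts["below"]

# demo: assignment variable with a true jump of 2.0 at the threshold
rng = np.random.default_rng(7)
z = rng.uniform(-1, 1, 20_000)
y = 1.0 + 0.5 * z + 2.0 * (z >= 0) + rng.normal(scale=0.1, size=20_000)
effect = rd_itt(z, y, threshold=0.0, bandwidth=0.5)
```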

  8. Evaluation of regression-based 3-D shoulder rhythms.

    PubMed

    Xu, Xu; Dickerson, Clark R; Lin, Jia-Hua; McGorry, Raymond W

    2016-08-01

    The movements of the humerus, the clavicle, and the scapula are not completely independent. The coupled pattern of movement of these bones is called the shoulder rhythm. To date, multiple studies have focused on providing regression-based 3-D shoulder rhythms, in which the orientations of the clavicle and the scapula are estimated from the orientation of the humerus. In this study, six existing regression-based shoulder rhythms were evaluated on an independent dataset in terms of their predictability. The dataset includes the measured orientations of the humerus, the clavicle, and the scapula of 14 participants over 118 different upper arm postures. The predicted orientations of the clavicle and the scapula were derived by applying those regression-based shoulder rhythms to the humerus orientation. The results indicated that none of those regression-based shoulder rhythms provides consistently more accurate results than the others. For all the joint angles and all the shoulder rhythms, the RMSE are all greater than 5°. Among those shoulder rhythms, the scapula lateral/medial rotation has the strongest correlation between the predicted and the measured angles, while the other thoracoclavicular and thoracoscapular bone orientation angles showed only a weak to moderate correlation. Since the regression-based shoulder rhythm has been adopted in shoulder biomechanical models to estimate shoulder muscle activities and structural loads, further investigation is needed on how the prediction error from the shoulder rhythm affects the output of the biomechanical model. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  9. Multiple Regression: A Leisurely Primer.

    ERIC Educational Resources Information Center

    Daniel, Larry G.; Onwuegbuzie, Anthony J.

    Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researcher is interested in studies of predictability of phenomena of interest. This paper provides an introduction to…

  10. Computer Simulation of Human Service Program Evaluations.

    ERIC Educational Resources Information Center

    Trochim, William M. K.; Davis, James E.

    1985-01-01

    Describes uses of computer simulations for the context of human service program evaluation. Presents simple mathematical models for most commonly used human service outcome evaluation designs (pretest-posttest randomized experiment, pretest-posttest nonequivalent groups design, and regression-discontinuity design). Translates models into single…

  11. Non-ignorable missingness in logistic regression.

    PubMed

    Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

    2017-08-30

    Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under a non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
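
    The kind of bias the authors quantify is easy to reproduce in a toy simulation (not their model or data): when the probability of response depends only on a binary outcome, complete-case logistic regression keeps a nearly unbiased slope but shifts the intercept by the log of the response-probability ratio. The data-generating values and response probabilities below are made up.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(42)
n = 200_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(0.2 + 1.0 * x)))).astype(float)
# non-ignorable missingness: response probability depends on the outcome
observed = rng.uniform(size=n) < np.where(y == 1, 0.9, 0.5)
full = fit_logistic(X, y)
complete_case = fit_logistic(X[observed], y[observed])
expected_shift = np.log(0.9 / 0.5)  # intercept bias when P(obs) depends on y
```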

  12. Satellite rainfall retrieval by logistic regression

    NASA Technical Reports Server (NTRS)

    Chiu, Long S.

    1986-01-01

    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistic model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
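
    The threshold-to-histogram step the abstract describes is simple bookkeeping: exceedance probabilities at increasing thresholds are differenced into bin probabilities, from which the mean and variance follow. The threshold grid and probabilities below are hypothetical, not GATE values:

```python
import numpy as np

# exceedance probabilities P(rainrate > t_k) for one pixel, e.g. produced
# by a sequence of logistic models fitted at increasing rain-rate thresholds
thresholds = np.array([0.0, 1.0, 2.0, 5.0, 10.0])   # mm/h (hypothetical)
p_exceed = np.array([0.6, 0.4, 0.25, 0.1, 0.0])

# probability mass in each bin [t_k, t_{k+1}); dry pixels contribute zero
bin_prob = p_exceed[:-1] - p_exceed[1:]
midpoints = 0.5 * (thresholds[:-1] + thresholds[1:])
mean_rain = np.sum(midpoints * bin_prob)             # histogram-based mean
var_rain = np.sum(midpoints**2 * bin_prob) - mean_rain**2
```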

  13. Weighted regression analysis and interval estimators

    Treesearch

    Donald W. Seegrist

    1974-01-01

    A method is presented for deriving the weighted least squares estimators for the parameters of a multiple regression model. Confidence intervals for expected values and prediction intervals for the means of future samples are given.
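
    A compact sketch of the weighted least squares estimator and a large-sample confidence interval for an expected value (using a normal quantile rather than the report's exact intervals; the weights and data below are made up):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: beta = (X'WX)^{-1} X'Wy."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def wls_ci_mean(X, y, w, x0, z=1.96):
    """Approximate 95% CI for the expected value at covariate vector x0,
    assuming var(e_i) = sigma^2 / w_i with known relative weights w_i."""
    beta = wls(X, y, w)
    resid = y - X @ beta
    sigma2 = np.sum(w * resid**2) / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ (X * w[:, None]))
    se = np.sqrt(x0 @ cov @ x0)
    fit = x0 @ beta
    return fit - z * se, fit + z * se

# demo with heteroscedastic noise whose variance is proportional to 1/w
rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
w = 1.0 / (1.0 + x)                   # assumed known relative precisions
y = 2.0 + 0.7 * x + rng.normal(scale=np.sqrt(1.0 / w))
beta = wls(X, y, w)
lo, hi = wls_ci_mean(X, y, w, np.array([1.0, 5.0]))
```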

  14. Quality of life in breast cancer patients--a quantile regression analysis.

    PubMed

    Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma

    2008-01-01

    Quality of life studies play an important role in health care, especially for chronic diseases, in clinical judgment and in the allocation of medical resources. Statistical tools such as linear regression are widely used to assess the predictors of quality of life, but when the response is not normally distributed the results can be misleading. The aim of this study was to determine the predictors of quality of life in breast cancer patients using a quantile regression model and to compare the results with linear regression. A cross-sectional study was conducted on 119 breast cancer patients who were admitted and treated in the chemotherapy ward of Namazi hospital in Shiraz. We used the QLQ-C30 questionnaire to assess quality of life in these patients. A quantile regression was employed to assess the associated factors, and the results were compared with linear regression. All analyses were carried out using SAS. The mean score for the global health status of breast cancer patients was 64.92 ± 11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In contrast to linear regression, financial difficulties were not significant in the quantile regression analysis, and dyspnea was significant only for the first quartile. Emotional functioning and duration of disease also statistically predicted the QOL score in the third quartile. The results demonstrate that using quantile regression leads to better interpretation and richer inference about predictors of quality of life in breast cancer patients.
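
    The contrast between modeling the mean and modeling quantiles can be sketched on synthetic right-skewed scores. The data are illustrative, and the check-loss fit is approximated here by iteratively reweighted least squares rather than the usual linear-programming formulation (the paper's analyses used SAS):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic, right-skewed "scores": ordinary regression targets the conditional
# mean, quantile regression targets conditional quartiles.
n = 500
x = rng.uniform(0.0, 1.0, n)
y = 50.0 + 10.0 * x + rng.exponential(10.0, n)   # skewed errors

X = np.column_stack([np.ones(n), x])

def quantile_fit(X, y, tau, iters=200, eps=1e-6):
    """Approximate tau-th quantile regression by iteratively reweighted least squares."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ w
        wt = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        Xw = X * wt[:, None]
        w = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return w

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares (mean)
beta_q1 = quantile_fit(X, y, 0.25)               # first quartile
beta_q3 = quantile_fit(X, y, 0.75)               # third quartile
```

    Because the error distribution is skewed, the first- and third-quartile fits sit well apart, and a covariate can matter at one quartile but not another, which is the phenomenon the abstract reports for dyspnea.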

  15. School Exits in the Milwaukee Parental Choice Program: Evidence of a Marketplace?

    ERIC Educational Resources Information Center

    Ford, Michael

    2011-01-01

    This article examines whether the large number of school exits from the Milwaukee school voucher program is evidence of a marketplace. Two logistic regression and multinomial logistic regression models tested the relation between a private school's inability to draw large numbers of voucher students and its ability to remain viable. Data on…

  16. SPSS and SAS programming for the testing of mediation models.

    PubMed

    Dudley, William N; Benuzillo, Jose G; Carrico, Mineh S

    2004-01-01

    Mediation modeling can explain the nature of the relation among three or more variables. In addition, it can be used to show how a variable mediates the relation between levels of intervention and outcome. The Sobel test, developed in 1990, provides a statistical method for determining the influence of a mediator on an intervention or outcome. Although interactive Web-based and stand-alone methods exist for computing the Sobel test, SPSS and SAS programs that automatically run the required regression analyses and computations increase the accessibility of mediation modeling to nursing researchers. The aim of this article is to illustrate the utility of the Sobel test and to make this programming available to the Nursing Research audience in both SAS and SPSS. The history, logic, and technical aspects of mediation testing are introduced. The syntax files sobel.sps and sobel.sas, created to automate the computation of the regression analysis and test statistic, are available from the corresponding author. The reported programming allows the user to complete mediation testing with the user's own data in a single-step fashion. A technical manual included with the programming provides instruction on program use and interpretation of the output. Mediation modeling is a useful tool for describing the relation among three or more variables. Programming and manuals for using this model are made available.
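
    The Sobel statistic itself is easy to state: with path a (predictor to mediator) and path b (mediator to outcome, adjusting for the predictor), z = ab / sqrt(b²·sa² + a²·sb²). A sketch on simulated mediation data (the sobel.sps/sobel.sas files themselves are only available from the corresponding author):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated mediation chain: X -> M -> Y
n = 300
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients and standard errors (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

# Path a: regress M on X.  Path b: regress Y on M, adjusting for X.
coef_a, se_a = ols(np.column_stack([np.ones(n), x]), m)
a, sa = coef_a[1], se_a[1]
coef_b, se_b = ols(np.column_stack([np.ones(n), m, x]), y)
b, sb = coef_b[1], se_b[1]

# Sobel test statistic for the indirect effect a*b
z = (a * b) / np.sqrt(b**2 * sa**2 + a**2 * sb**2)
```

    A |z| beyond the normal critical value indicates a significant indirect (mediated) effect; this is the computation the cited syntax files automate around the two regressions.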

  17. Regression Models For Multivariate Count Data.

    PubMed

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  18. Digression and Value Concatenation to Enable Privacy-Preserving Regression.

    PubMed

    Li, Xiao-Bai; Sarkar, Sumit

    2014-09-01

    Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression , which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.

  19. Regression Model Optimization for the Analysis of Experimental Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2009-01-01

    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
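
    The search metric is cheap to compute: PRESS (leave-one-out) residuals follow from the hat matrix as e_i / (1 − h_ii), so no model has to be refit n times. A sketch comparing two candidate models on synthetic data (illustrative only; it omits the algorithm's singularity and term-significance constraints):

```python
import numpy as np

rng = np.random.default_rng(4)

# True response is quadratic; compare a linear and a quadratic candidate model.
n = 60
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(0.0, 0.1, n)

def press_residuals(X, y):
    """Leave-one-out (PRESS) residuals via the hat matrix: e_i / (1 - h_ii)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    return e / (1.0 - np.diag(H))

models = {
    "linear": np.column_stack([np.ones(n), x]),
    "quadratic": np.column_stack([np.ones(n), x, x**2]),
}
# Search metric: standard deviation of the PRESS residuals, minimized over models.
metric = {name: press_residuals(X, y).std(ddof=1) for name, X in models.items()}
best = min(metric, key=metric.get)
```

    Because PRESS residuals measure out-of-sample error, the metric penalizes both underfitting and overfitting, which is what makes it a usable model-selection criterion.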

  20. Do Developmental Mathematics Programs Have a Causal Impact on Student Retention? An Application of Discrete-Time Survival and Regression-Discontinuity Analysis

    ERIC Educational Resources Information Center

    Lesik, Sally A.

    2007-01-01

    The impact of academic programs--such as developmental mathematics programs--on student retention, has been a controversial topic for administrators, policy makers, and faculty in higher education. Despite deep interest in the effectiveness of these programs in retaining students, scholars have been unable to determine whether such programs have a…

  1. Sample size determination for logistic regression on a logit-normal distribution.

    PubMed

    Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance

    2017-06-01

    Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) applicability to interim or group-sequential designs, and (iii) a much smaller required sample size.
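
    The abstract does not give the proposed formulas, so as a generic point of comparison here is a simulation-based sketch of logistic-regression sample size determination: estimate power at each candidate n by repeatedly simulating data and applying a Wald test, then take the smallest n on the grid reaching the target. All parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_logistic(X, y, iters=25):
    """Logistic fit by Newton's method; returns coefficients and covariance."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        H = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-6 * np.eye(X.shape[1])
        w = w + np.linalg.solve(H, X.T @ (y - p))
    return w, np.linalg.inv(H)

def power(n, beta0=-0.5, beta1=0.8, sims=200):
    """Fraction of simulated datasets in which the Wald test rejects H0: beta1 = 0."""
    hits = 0
    for _ in range(sims):
        x = rng.normal(size=n)
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
        y = (rng.uniform(size=n) < p).astype(float)
        X = np.column_stack([np.ones(n), x])
        w, cov = fit_logistic(X, y)
        hits += abs(w[1] / np.sqrt(cov[1, 1])) > 1.96
    return hits / sims

# Smallest n on a grid reaching 80% power at the assumed effect size
grid = [40, 60, 80, 100]
powers = {n: power(n) for n in grid}
n_needed = next((n for n in grid if powers[n] >= 0.8), None)
```

    A closed-form method like the one the paper proposes replaces this Monte Carlo loop with an analytic calculation, which is what makes interim and group-sequential designs tractable.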

  2. Regression Commonality Analysis: A Technique for Quantitative Theory Building

    ERIC Educational Resources Information Center

    Nimon, Kim; Reio, Thomas G., Jr.

    2011-01-01

    When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…

  3. Correlation Weights in Multiple Regression

    ERIC Educational Resources Information Center

    Waller, Niels G.; Jones, Jeff A.

    2010-01-01

    A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…

  4. The devil is in the details: trends in avoidable hospitalization rates by geography in British Columbia, 1990–2000

    PubMed Central

    Cloutier-Fisher, Denise; Penning, Margaret J; Zheng, Chi; Druyts, Eric-Bené F

    2006-01-01

    Background Researchers and policy makers have focused on the development of indicators to help monitor the success of regionalization, primary care reform and other health sector restructuring initiatives. Certain indicators are useful in examining issues of equity in service provision, especially among older populations, regardless of where they live. Avoidable hospitalization rates (AHRs) are used as an indicator of primary care system efficiency and thus reveal information about access to general practitioners. The purpose of this paper is to examine trends in AHRs during a period characterized by several waves of health sector restructuring and regionalization in British Columbia. AHRs are examined in relation to non-avoidable and total hospitalization rates as well as by urban and rural geography across the province. Methods Analyses draw on linked administrative health data from the province of British Columbia for 1990 through 2000 for the population aged 50 and over. Joinpoint regression analyses and t-tests are used to detect and describe trends in the data. Results Generally speaking, non-avoidable hospitalizations constitute the vast majority of hospitalizations in a given year (around 95%), with AHRs constituting the remaining 5%. Comparing rural and urban areas reveals that standardized rates of avoidable, non-avoidable and total hospitalizations are consistently higher in rural areas. Joinpoint regression results show significantly decreasing trends overall; the rural and urban lines are parallel in the case of avoidable hospitalizations and diverging for non-avoidable and total hospitalizations, with the gap between rural and urban areas being wider at the end of the time interval than at the beginning. 
Conclusion These data suggest that access to effective primary care in rural communities remains problematic in BC given that rural areas did not make any gains in AHRs relative to urban areas under recent health sector
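
    Joinpoint (segmented) regression of the kind used above can be sketched as a grid search over candidate breakpoints: fit a continuous two-segment line at each candidate and keep the one with the smallest error. The annual rates below are synthetic, and production joinpoint software additionally runs permutation tests to choose the number of joinpoints.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic annual rates whose slope changes at year 6 (one joinpoint).
years = np.arange(11).astype(float)
rate = np.where(years < 6, 20.0 - 1.0 * years, 14.0 - 0.2 * (years - 6.0))
rate = rate + rng.normal(0.0, 0.1, len(years))

def segmented_sse(t, x, y):
    """SSE of a continuous two-segment linear fit with its joinpoint at t."""
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - t, 0.0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

candidates = years[2:-2]          # keep a few observations in each segment
best_t = min(candidates, key=lambda t: segmented_sse(t, years, rate))
```

    The hinge term max(x − t, 0) forces the two segments to meet at the joinpoint, so the fitted trend is continuous while its slope is free to change there.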

  5. [Incidence and survival of esophageal cancer with different histological types in Linzhou between 2003 and 2012].

    PubMed

    Liu, S Z; Yu, L; Chen, Q; Quan, P L; Cao, X Q; Sun, X B

    2017-05-06

    Objective: To investigate the incidence and survival of esophageal cancer with different histological types and to understand the incidence trend and burden of esophageal cancer in Linzhou during 2003-2012. Methods: All incidence records of esophageal cancer and the population reported were collected from the Linzhou Cancer Registry for 2003-2012. Incidence rates were calculated by gender and histological type. Age-standardized incidence rates were calculated according to Segi's world standard population and Chinese census data in 2000. The age-standardized incidence rate by world population between 2003 and 2012 was analyzed with the JoinPoint regression model, and the estimated annual percentage change (EAPC) was calculated. The 5-year survival rate was calculated with the Kaplan-Meier model. Results: There were 8 229 esophageal cancer cases in Linzhou during 2003-2012. The average annual incidence rate was 80.08/100 000 (8 229/10 276 481). Among all esophageal cancer cases, 7 019 (85.3%) were diagnosed as esophageal squamous cell carcinoma (ESCC). In Linzhou, the age-standardized incidence rates by Chinese and world standard populations were 80.92/100 000 and 81.85/100 000 in 2003, and 67.97/100 000 and 68.63/100 000 in 2012. The JoinPoint regression model showed that the EAPC was -12.9% (95% CI: -16.4% to -9.1%) for other and unspecified histological types between 2003 and 2012. The EAPC was -5.5% (95% CI: -9.2% to -1.6%) for esophageal cancer between 2007 and 2012, -5.4% (95% CI: -7.0% to -3.9%) for esophageal cancer in females between 2006 and 2012, and -4.9% (95% CI: -9.5% to -0.1%) for ESCC between 2007 and 2012. The 5-year prevalence of esophageal cancer was 215.49 per 100 000 (2 337/1 084 493), and 5 489 patients died within 5 years of diagnosis. The 5-year survival rate of esophageal cancer was 34.6% (95% CI: 33.5%-35.6%). Conclusion: Esophageal cancer showed a decreasing trend in Linzhou and the survival rate was increasing, but esophageal cancer remains a major burden in Linzhou. 
The major histological type was
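
    The EAPC reported by joinpoint software is derived from a log-linear fit of the rate on calendar year, with EAPC = 100·(exp(slope) − 1). A minimal sketch on a synthetic rate series declining exactly 3% per year (not the Linzhou data):

```python
import numpy as np

# Synthetic age-standardized rates declining 3% per year.
years = np.arange(2003, 2013)
rates = 80.0 * 0.97 ** (years - 2003)

# Fit ln(rate) = a + b * year; then EAPC = 100 * (exp(b) - 1).
X = np.column_stack([np.ones(len(years)), years - 2003])
b = np.linalg.lstsq(X, np.log(rates), rcond=None)[0][1]
eapc = 100.0 * (np.exp(b) - 1.0)
```

    Within a joinpoint analysis, this same calculation is applied separately to each fitted segment, which is why the abstract reports different EAPCs over different year ranges.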

  6. Impact of the global financial crisis on low birth weight in Portugal: a time-trend analysis

    PubMed Central

    Kana, Musa Abubakar; Correia, Sofia; Peleteiro, Barbara; Severo, Milton; Barros, Henrique

    2017-01-01

    Background The 2007–2008 global financial crisis had adverse consequences on population health of affected European countries. Few contemporary studies have studied its effect on perinatal indicators with long-lasting influence on adult health. Therefore, in this study, we investigated the impact of the 2007–2008 global financial crisis on low birth weight (LBW) in Portugal. Methods Data on 2 045 155 singleton births of 1995–2014 were obtained from Statistics Portugal. Joinpoint regression analysis was performed to identify the years in which changes in LBW trends occurred, and to estimate the annual per cent changes (APC). LBW risk by time period expressed as prevalence ratios were computed using the Poisson regression. Contextual changes in sociodemographic and economic factors were provided by their trends. Results The joinpoint analysis identified three distinct periods (two joinpoints) with different APC in LBW, corresponding to 1995–1999 (APC=4.4; 95% CI 3.2 to 5.6), 2000–2006 (APC=0.1; 95% CI −0.5 to 0.7) and 2007–2014 (APC=1.6; 95% CI 1.2 to 2.0). For non-Portuguese, it was, respectively, 1995–1999 (APC=1.4; 95% CI −3.9 to 7.0), 2000–2007 (APC=−4.2; 95% CI −6.4 to −2.0) and 2008–2014 (APC=3.1; 95% CI 0.8 to 5.5). Compared with 1995–1999, all specific maternal characteristics had a 10–15% increase in LBW risk in 2000–2006 and a 20–25% increase in 2007–2014, except among migrants, for which LBW risk remained lower than in 1995–1999 but increased after the crisis. The increasing LBW risk coincides with a deceleration in gross domestic product growth rate, reduction in health expenditure, social protection allocation on family/children support and sickness. Conclusions The 2007–2008 global financial crisis was associated with a significant increase in LBW, particularly among infants of non-Portuguese mothers. We recommend strengthening social policies aimed at maternity protection for vulnerable mothers and health

  7. Adjusted variable plots for Cox's proportional hazards regression model.

    PubMed

    Hall, C B; Zeger, S L; Bandeen-Roche, K J

    1996-01-01

    Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each plot, the regression coefficient and standard error from a Cox proportional hazards regression are obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.

  8. Targeting regressions: do readers pay attention to the left?

    PubMed

    Apel, Jens K; Henderson, John M; Ferreira, Fernanda

    2012-12-01

    The perceptual span during normal reading extends approximately 14 to 15 characters to the right and three to four characters to the left of a current fixation. In the present study, we investigated whether the perceptual span extends farther than three to four characters to the left immediately before readers execute a regression. We used a display-change paradigm in which we masked words beyond the three-to-four-character range to the left of a fixation. We hypothesized that if reading behavior was affected by this manipulation before regressions but not before progressions, we would have evidence that the perceptual span extends farther left before leftward eye movements. We observed significantly shorter regressive saccades and longer fixation and gaze durations in the masked condition when a regression was executed. Forward saccades were entirely unaffected by the manipulations. We concluded that the perceptual span during reading changes, depending on the direction of a following saccade.

  9. Linear regression models for solvent accessibility prediction in proteins.

    PubMed

    Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław

    2005-04-01

    The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple

  10. Locally linear regression for pose-invariant face recognition.

    PubMed

    Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen

    2007-07-01

    The variation of facial appearance due to the viewpoint (pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating a virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show a distinct advantage of the proposed method over the Eigen light-field method.

  11. The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?

    PubMed

    Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping

    2013-01-01

    Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
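
    The two fitting strategies can be sketched directly. The measurements below are synthetic power-law data with multiplicative (lognormal) error, the regime in which log-transformed LR is the appropriate choice; the NLR fit uses a damped Gauss-Newton iteration as a generic stand-in for a nonlinear least-squares routine (none of this is the authors' data or code):

```python
import numpy as np

rng = np.random.default_rng(7)

# Power-law allometry M = a * D^b with multiplicative (lognormal) error.
n = 150
D = rng.uniform(5.0, 50.0, n)
M = 0.05 * D**2.4 * np.exp(rng.normal(0.0, 0.4, n))

# LR: linear regression on log-transformed data, ln M = ln a + b * ln D
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), np.log(D)]),
                           np.log(M), rcond=None)
b_lr = coef[1]

# NLR: nonlinear least squares on the raw scale (damped Gauss-Newton),
# started from the LR solution.
a, b = np.exp(coef[0]), coef[1]
for _ in range(100):
    f = a * D**b
    J = np.column_stack([D**b, f * np.log(D)])   # derivatives of f wrt a and b
    step, *_ = np.linalg.lstsq(J, M - f, rcond=None)
    a, b = a + 0.5 * step[0], b + 0.5 * step[1]
b_nlr = b
```

    With multiplicative error the log-scale fit weights all trees evenly, while raw-scale NLR is dominated by the largest (and noisiest, in absolute terms) trees; this differing weighting is what drives the biased stand-level extrapolations the paper reports for NLR.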

  12. Stochastic Approximation Methods for Latent Regression Item Response Models

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  13. A manual-based group program to improve mental health: what kind of teachers are interested and who stands to benefit from this program?

    PubMed

    Unterbrink, Thomas; Pfeifer, Ruth; Krippeit, Lorena; Zimmermann, Linda; Rose, Uwe; Joos, Andreas; Hartmann, Armin; Wirsching, Michael; Bauer, Joachim

    2014-01-01

    In order to evaluate a manual-based group program for teachers aiming at strengthening mental health, we examined (1) whether the teachers interested in participating differ from their colleagues without interest and (2) whether there is evidence of subgroups benefiting more than others among those who participated. Out of a basic sample of 949 schoolteachers, 337 teachers declared interest in a group program. All teachers were surveyed with the "General Health Questionnaire", the "Maslach Burnout Inventory" and the "Effort Reward Imbalance Questionnaire". In addition, participating teachers were screened with the "Symptom Checklist 27". T-tests and χ² tests were calculated to detect differences between those interested in the program and the remaining 612 teachers. Six factors were established and used for a regression analysis that identified specific parameters more or less correlating with health benefits of those who participated in the program. Findings showed that those declaring interest in the intervention displayed a higher degree of occupational stress according to all health parameters examined. Teachers interested in the program were significantly younger, more frequently female and single. The regression analysis showed that the baseline scores of the six health parameters were the strongest predictors for improvement. Worse scores before the beginning of the intervention correlated with a more positive effect. Intervention programs aiming at alleviating the mental stress of teachers attract the interest of those who need them most. More importantly, the latter are the ones who, at least if our program is applied, benefit most.

  14. Intermediate and advanced topics in multilevel logistic regression analysis

    PubMed Central

    Merlo, Juan

    2017-01-01

    Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
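
    Two of the cluster-level summaries mentioned have simple closed forms under the latent-variable formulation: VPC = σ² / (σ² + π²/3), where π²/3 is the residual variance of the standard logistic distribution, and MOR = exp(√(2σ²)·Φ⁻¹(0.75)). A sketch with an illustrative between-cluster variance (not a value from the myocardial-infarction analysis):

```python
import math
from statistics import NormalDist

# Illustrative between-cluster variance on the log-odds scale.
sigma2 = 0.5

# Variance partition coefficient: share of total latent variance at the
# cluster level; the level-1 residual variance is pi^2 / 3.
vpc = sigma2 / (sigma2 + math.pi**2 / 3)

# Median odds ratio: the median OR comparing two identical subjects drawn
# from a randomly chosen higher-risk vs lower-risk cluster.
mor = math.exp(math.sqrt(2.0 * sigma2) * NormalDist().inv_cdf(0.75))
```

    An MOR of 1 would indicate no cluster-level heterogeneity; values well above 1 quantify the general contextual effect on the familiar odds-ratio scale.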

  15. Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2009-01-01

    In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…

  16. Use of probabilistic weights to enhance linear regression myoelectric control

    NASA Astrophysics Data System (ADS)

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

    2015-12-01

    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
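
    The weighting scheme can be sketched for a single DOF: Gaussian class-conditional models give a posterior probability that movement was intended, and that probability scales the linear regression output, suppressing extraneous drive near rest. All data, class definitions, and parameter values below are synthetic stand-ins for real EMG features:

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic single-DOF data: an EMG-like feature with three movement classes
# (0 = no movement, 1 = one direction, 2 = the other direction).
n = 300
labels = rng.integers(0, 3, n)
class_mu = np.array([0.1, 1.0, -1.0])
feat = class_mu[labels] + rng.normal(0.0, 0.25, n)
vel = np.where(labels == 0, 0.0, feat)           # intended prosthesis velocity

# Plain linear regression from the feature to velocity.
X = np.column_stack([np.ones(n), feat])
w, *_ = np.linalg.lstsq(X, vel, rcond=None)

# Gaussian class-conditional models with a shared variance; their posterior
# movement probability weights the regression output.
mu_hat = np.array([feat[labels == k].mean() for k in range(3)])
var_hat = feat.var()

def weighted_output(f):
    """Regression output scaled by the posterior probability of intended movement."""
    like = np.exp(-(f - mu_hat) ** 2 / (2.0 * var_hat))
    post = like / like.sum()
    return (post[1] + post[2]) * (w[0] + w[1] * f)
```

    Near the rest class the movement posterior is low, so the weighted output is pulled toward zero, while confident movement-class features pass through nearly unchanged; this is the mechanism behind the reduced extraneous movement reported above.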

  17. The National Streamflow Statistics Program: A Computer Program for Estimating Streamflow Statistics for Ungaged Sites

    USGS Publications Warehouse

    Ries, Kernell G. (compiler); with sections by Atkins, J.B.; Hummel, P.R.; Gray, Matthew J.; Dusenbury, R.; Jennings, M.E.; Kirby, W.H.; Riggs, H.C.; Sauer, V.B.; Thomas, W.O.

    2007-01-01

    The National Streamflow Statistics (NSS) Program is a computer program that should be useful to engineers, hydrologists, and others for planning, management, and design applications. NSS compiles all current U.S. Geological Survey (USGS) regional regression equations for estimating streamflow statistics at ungaged sites in an easy-to-use interface that operates on computers with Microsoft Windows operating systems. NSS expands on the functionality of the USGS National Flood Frequency Program, and replaces it. The regression equations included in NSS are used to transfer streamflow statistics from gaged to ungaged sites through the use of watershed and climatic characteristics as explanatory or predictor variables. Generally, the equations were developed on a statewide or metropolitan-area basis as part of cooperative study programs. Equations are available for estimating rural and urban flood-frequency statistics, such as the 100-year flood, for every state, for Puerto Rico, and for the island of Tutuila, American Samoa. Equations are available for estimating other statistics, such as the mean annual flow, monthly mean flows, flow-duration percentiles, and low-flow frequencies (such as the 7-day, 10-year low flow) for less than half of the states. All equations available for estimating streamflow statistics other than flood-frequency statistics assume rural (non-regulated, non-urbanized) conditions. The NSS output provides indicators of the accuracy of the estimated streamflow statistics. The indicators may include any combination of the standard error of estimate, the standard error of prediction, the equivalent years of record, or 90 percent prediction intervals, depending on what was provided by the authors of the equations. The program includes several other features that can be used only for flood-frequency estimation. These include the ability to generate flood-frequency plots, and plots of typical flood hydrographs for selected recurrence intervals.
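
    As an illustration of the kind of equation NSS evaluates, the sketch below applies a log-linear regional regression of the form log10(Q100) = a + b·log10(area) + c·log10(precipitation), together with an approximate 90-percent prediction interval. The coefficients and standard error are invented for illustration; real NSS equations are state-specific and published by the USGS.

```python
import math

# Hypothetical regional regression equation in the log-linear form used by
# regional flood-frequency studies. Coefficients below are NOT from any
# real USGS report; they only illustrate the computation.
a, b, c = 1.2, 0.85, 0.55
se_pred_log10 = 0.25   # assumed standard error of prediction, log10 units

def q100_estimate(drainage_area_sqmi, mean_annual_precip_in):
    """Return (estimate, low, high) for the 100-year flood, in cfs."""
    log_q = (a + b * math.log10(drainage_area_sqmi)
               + c * math.log10(mean_annual_precip_in))
    # Approximate 90-percent prediction interval: +/- 1.645 SE in log space.
    lo = 10 ** (log_q - 1.645 * se_pred_log10)
    hi = 10 ** (log_q + 1.645 * se_pred_log10)
    return 10 ** log_q, lo, hi

q, lo, hi = q100_estimate(25.0, 45.0)
print(round(q, 1), round(lo, 1), round(hi, 1))
```

    Because the regression is fit in log space, the prediction interval is multiplicative rather than additive around the estimate, which is typical of the indicators NSS reports.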

  18. Regression Models For Multivariate Count Data

    PubMed Central

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2016-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. PMID:28348500
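
    One familiar member of the flexible count-model family this abstract motivates is the Dirichlet-multinomial, which relaxes the multinomial's restrictive mean-variance structure through over-dispersion. A minimal log-likelihood sketch (not the authors' implementation, which covers a broader class of models):

```python
import numpy as np
from scipy.special import gammaln

def dirmult_logpmf(x, alpha):
    """log P(x | alpha) for a Dirichlet-multinomial with counts x and
    Dirichlet parameters alpha (elementwise positive)."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, a0 = x.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()        # multinomial coefficient
            + gammaln(a0) - gammaln(n + a0)              # Dirichlet normalizers
            + (gammaln(x + alpha) - gammaln(alpha)).sum())

print(dirmult_logpmf([3, 2, 5], [1.0, 1.0, 1.0]))
```

    With all alpha equal to 1, the distribution is uniform over the compositions of n into K parts, which gives a convenient correctness check.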

  19. Competing risks regression for clustered data

    PubMed Central

    Zhou, Bingqing; Fine, Jason; Latouche, Aurelien; Labopin, Myriam

    2012-01-01

    A population average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine–Gray proportional hazards model for the subdistribution to situations where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study shows that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry. PMID:22045910

  20. Are increases in cigarette taxation regressive?

    PubMed

    Borren, P; Sutton, M

    1992-12-01

    Using the latest published data from Tobacco Advisory Council surveys, this paper re-evaluates the question of whether or not increases in cigarette taxation are regressive in the United Kingdom. The extended data set shows no evidence of increasing price-elasticity by social class as found in a major previous study. To the contrary, there appears to be no clear pattern in the price responsiveness of smoking behaviour across different social classes. Increases in cigarette taxation, while reducing smoking levels in all groups, fall most heavily on men and women in the lowest social class. Men and women in social class five can expect to pay eight and eleven times more of a tax increase respectively, than their social class one counterparts. Taken as a proportion of relative incomes, the regressive nature of increases in cigarette taxation is even more pronounced.

  1. Geographically weighted regression model on poverty indicator

    NASA Astrophysics Data System (ADS)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) to analyze poverty in Central Java, using a Gaussian kernel as the weight function. GWR uses the diagonal matrix obtained by evaluating the Gaussian kernel function as the weight matrix in the regression model. The kernel weights handle spatial effects in the data, so that a separate model is obtained for each location. The purpose of this paper is to model the poverty percentage data in Central Java province using GWR with a Gaussian kernel weight function and to determine the influencing factors in each regency/city of the province. Based on the research, we obtained a geographically weighted regression model with a Gaussian kernel weight function for the poverty percentage data in Central Java province. We found that the percentage of the population working as farmers, the population growth rate, the percentage of households with regular sanitation, and the number of BPJS beneficiaries affect the percentage of poverty in Central Java province. The coefficient of determination R2 is 68.64%. There are two categories of district, each influenced by a different set of significant factors.
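
    The core GWR computation, a weighted least-squares fit at each target location with Gaussian kernel weights, can be sketched as follows. The coordinates, covariate, and bandwidth are synthetic stand-ins for the Central Java data.

```python
import numpy as np

# GWR sketch: at each target location, ordinary least squares is replaced by
# weighted least squares with Gaussian kernel weights w_i = exp(-(d_i/h)^2 / 2),
# so nearby locations influence the local fit more than distant ones.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(50, 2))            # synthetic locations
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.1, size=50)

def gwr_fit(target, h=3.0):
    d = np.linalg.norm(coords - target, axis=1)      # distances to target
    w = np.exp(-0.5 * (d / h) ** 2)                  # Gaussian kernel weights
    W = np.diag(w)
    # Local WLS estimate: (X' W X)^(-1) X' W y
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

beta_local = gwr_fit(np.array([5.0, 5.0]))
print(beta_local)
```

    Repeating `gwr_fit` at every regency/city centroid yields the location-specific coefficient sets the abstract refers to; significance testing of each local coefficient then identifies which factors matter where.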

  2. Problems and Solutions in Evaluating Child Outcomes of Large-Scale Educational Programs.

    ERIC Educational Resources Information Center

    Abrams, Allan S.; And Others

    1979-01-01

    Evaluation of large-scale programs is problematical because of inherent bias in assignment of treatment and control groups, resulting in serious regression artifacts even with the use of analysis of covariance designs. Nonuniformity of program implementation across sites and classrooms is also a problem. (Author/GSK)

  3. Cactus: An Introduction to Regression

    ERIC Educational Resources Information Center

    Hyde, Hartley

    2008-01-01

    When the author first used "VisiCalc," he thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates, he learned to use multiple linear regression software and suddenly it all clicked into…

  4. Spatial Assessment of Model Errors from Four Regression Techniques

    Treesearch

    Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove

    2005-01-01

    Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...

  5. The Precision Efficacy Analysis for Regression Sample Size Method.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.

    The general purpose of this study was to examine the efficiency of the Precision Efficacy Analysis for Regression (PEAR) method for choosing appropriate sample sizes in regression studies used for precision. The PEAR method, which is based on the algebraic manipulation of an accepted cross-validity formula, essentially uses an effect size to…

  6. Floating Data and the Problem with Illustrating Multiple Regression.

    ERIC Educational Resources Information Center

    Sachau, Daniel A.

    2000-01-01

    Discusses how to introduce basic concepts of multiple regression by creating a large-scale, three-dimensional regression model using the classroom walls and floor. Addresses teaching points that should be covered and reveals student reaction to the model. Finds that the greatest benefit of the model is the low fear, walk-through, nonmathematical…

  7. Regression Effects in Angoff Ratings: Examples from Credentialing Exams

    ERIC Educational Resources Information Center

    Wyse, Adam E.

    2018-01-01

    This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…

  8. A note on variance estimation in random effects meta-regression.

    PubMed

    Sidik, Kurex; Jonkman, Jeffrey N

    2005-01-01

    For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
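
    The working-covariance idea can be illustrated with a small synthetic meta-regression: the model-based variance takes the estimated inverse-variance weights at face value, while the robust (sandwich) variance corrects it using squared residuals. This is a generic sketch of that contrast, not the note's exact estimator or the Knapp–Hartung adjustment.

```python
import numpy as np

# Synthetic meta-regression: k studies, one moderator, known-ish within-study
# variances v (in practice these are estimated, which is the note's concern).
rng = np.random.default_rng(2)
k = 20
x = rng.normal(size=k)
X = np.column_stack([np.ones(k), x])
v = rng.uniform(0.05, 0.3, size=k)        # estimated within-study variances
y = 0.5 + 0.2 * x + rng.normal(scale=np.sqrt(v))

W = np.diag(1.0 / v)                      # working (estimated) weights
bread = np.linalg.inv(X.T @ W @ X)
beta = bread @ X.T @ W @ y

var_model = bread                         # model-based variance of beta-hat
e = y - X @ beta
meat = X.T @ W @ np.diag(e ** 2) @ W @ X
var_robust = bread @ meat @ bread         # sandwich: bread * meat * bread

print(np.sqrt(np.diag(var_model)), np.sqrt(np.diag(var_robust)))
```

    The sandwich form protects against misspecification of the working covariance, at the cost of extra variability in small meta-analyses, which is consistent with the note's finding that the Knapp and Hartung estimator can still perform better.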

  9. Wavelet regression model in forecasting crude oil price

    NASA Astrophysics Data System (ADS)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.
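
    A one-level Haar transform is enough to show the decomposition step; the study's actual wavelet family, decomposition depth, and input selection are not specified here, so treat this as a generic sketch. The resulting approximation and detail sub-series are the kind of decomposed components that would serve as candidate MLR inputs.

```python
import numpy as np

# One-level Haar DWT: split a series of even length into a smooth
# approximation and a detail sub-series (each half the original length).
def haar_dwt(x):
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

price = np.array([50.1, 50.9, 52.0, 51.5, 53.2, 52.8, 54.0, 53.1])  # invented
a, d = haar_dwt(price)
print(a.round(3), d.round(3))
```

    The transform is orthogonal, so it preserves energy and reconstructs the series exactly; deeper decompositions simply re-apply `haar_dwt` to the approximation.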

  10. Feature Selection for Ridge Regression with Provable Guarantees.

    PubMed

    Paul, Saurabh; Drineas, Petros

    2016-04-01

    We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets (a subset of the TechTC-300 data sets) to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
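
    The leverage-score half of the method can be sketched directly from an SVD: column leverage scores are the squared row norms of the top-k right singular vectors, and features are sampled in proportion to them. The dimensions and data below are synthetic, and the deterministic spectral-sparsification variant is not shown.

```python
import numpy as np

# Leverage-score sampling sketch for feature selection.
rng = np.random.default_rng(3)
n, d, k = 100, 30, 5
A = rng.normal(size=(n, d))
A[:, :k] *= 10.0                        # make the first k features dominant

_, _, Vt = np.linalg.svd(A, full_matrices=False)
scores = (Vt[:k] ** 2).sum(axis=0)      # column leverage scores; they sum to k
probs = scores / scores.sum()

r = 10                                  # number of features to sample
sampled = rng.choice(d, size=r, replace=False, p=probs)
print(sorted(sampled.tolist()))
```

    Sampling proportional to leverage keeps the features that carry the top-k spectral structure of the data matrix with high probability, which is the mechanism behind the paper's risk bounds.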

  11. Biostatistics Series Module 6: Correlation and Linear Regression.

    PubMed

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation assesses whether the linear relationship between the two variables is likely to hold in the underlying population; if it does, the test will typically return P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated to give an idea of the correlation in the population. The value r2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.

  12. Biostatistics Series Module 6: Correlation and Linear Regression

    PubMed Central

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation assesses whether the linear relationship between the two variables is likely to hold in the underlying population; if it does, the test will typically return P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated to give an idea of the correlation in the population. The value r2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous. PMID:27904175
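
    The quantities defined in this module reduce to a few lines of NumPy: Pearson's r, the least-squares line y = a + bx, and the coefficient of determination r2. The data points are invented for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x, y)[0, 1]                          # Pearson's r
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # least-squares slope
a = y.mean() - b * x.mean()                          # intercept

print(round(r, 4), round(a, 3), round(b, 3))
```

    For simple linear regression these definitions are consistent: r squared equals one minus the ratio of residual to total sum of squares, i.e. the coefficient of determination described above.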

  13. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data

    PubMed Central

    Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732

  14. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.

    PubMed

    Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks.
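
    The bootstrap-quantile selection idea can be sketched with ridge regression standing in for the penalized estimators (ridge has a closed form, so the sketch needs only NumPy): a predictor is retained when the bootstrap quantile interval of its coefficient excludes zero. The paper's QNT procedure and its annotated R code differ in details.

```python
import numpy as np

# Synthetic data: only the first 3 of 10 predictors truly matter.
rng = np.random.default_rng(4)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -1.0, 0.8]
y = X @ beta_true + rng.normal(size=n)

lam = 1.0
def ridge(Xb, yb):
    # Closed-form ridge estimate: (X'X + lam I)^(-1) X'y
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(p), Xb.T @ yb)

B = 200
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)   # resample rows with replacement
    boot[b] = ridge(X[idx], y[idx])

lo, hi = np.quantile(boot, [0.025, 0.975], axis=0)
selected = np.where((lo > 0) | (hi < 0))[0]   # intervals excluding zero
print(selected)
```

    With many highly correlated neural predictors, the same recipe applies with a sparsity-inducing penalty in place of ridge, which is where the stability advantages reported in the paper come from.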

  15. Regression Models for Identifying Noise Sources in Magnetic Resonance Images

    PubMed Central

    Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

    2009-01-01

    Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478

  16. Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis

    ERIC Educational Resources Information Center

    Kim, Rae Seon

    2011-01-01

    When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…

  17. Intermediate and advanced topics in multilevel logistic regression analysis.

    PubMed

    Austin, Peter C; Merlo, Juan

    2017-09-10

    Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R2 measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
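
    Two of the summary measures named above have standard closed forms on the log-odds scale and can be computed directly from an assumed between-cluster variance; the value of sigma-squared below is hypothetical.

```python
import math

def variance_partition_coefficient(sigma2):
    # Latent-variable VPC for logistic models: sigma^2 / (sigma^2 + pi^2/3),
    # where pi^2/3 is the variance of the standard logistic distribution.
    return sigma2 / (sigma2 + math.pi ** 2 / 3)

def median_odds_ratio(sigma2):
    # MOR = exp( sqrt(2*sigma^2) * Phi^(-1)(0.75) ),
    # the median odds ratio between two randomly chosen clusters.
    z75 = 0.674489750196   # 75th percentile of the standard normal
    return math.exp(math.sqrt(2 * sigma2) * z75)

sigma2 = 0.5   # hypothetical hospital-level variance on the log-odds scale
print(round(variance_partition_coefficient(sigma2), 3),
      round(median_odds_ratio(sigma2), 3))
```

    Both measures equal their null values (0 and 1, respectively) when the cluster variance is zero and grow with it, which is what makes them useful summaries of the general contextual effect.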

  18. An Effect Size for Regression Predictors in Meta-Analysis

    ERIC Educational Resources Information Center

    Aloe, Ariel M.; Becker, Betsy Jane

    2012-01-01

    A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
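
    The index r_sp can be computed directly by residualizing the focal predictor on the other predictors and correlating that residual with the outcome. The data below are synthetic and only make the computation concrete; Aloe and Becker's contribution is recovering r_sp from reported regression summaries, which is not shown here.

```python
import numpy as np

# Semipartial correlation of x2 with y, controlling the other predictor (x1)
# out of x2 only.
rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)       # correlated predictors
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

# Residualize the focal predictor x2 on the remaining predictors (here x1).
Z = np.column_stack([np.ones(n), x1])
coef, *_ = np.linalg.lstsq(Z, x2, rcond=None)
x2_resid = x2 - Z @ coef

r_sp = np.corrcoef(y, x2_resid)[0, 1]
print(round(r_sp, 4))
```

    Squaring r_sp gives the increment in R2 from adding the focal predictor, which is what makes it a natural per-predictor effect size for meta-analysis.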

  19. Multi-Target Regression via Robust Low-Rank Learning.

    PubMed

    Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo

    2018-02-01

    Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
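
    The rank constraint at the heart of such models can be shown in its simplest form: reduced-rank regression by SVD truncation of the jointly fitted OLS coefficient matrix. MMR's kernel trick and matrix elastic-net structure learning are not represented in this sketch.

```python
import numpy as np

# Synthetic multi-target problem with a genuinely low-rank coefficient matrix.
rng = np.random.default_rng(6)
n, p, q, r = 200, 8, 5, 2              # samples, inputs, targets, rank
B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))   # rank-r truth
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

# Joint OLS over all targets, then truncate the SVD to encode shared
# structure across targets.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
Uo, s, Vt = np.linalg.svd(B_ols, full_matrices=False)
B_rr = Uo[:, :r] @ np.diag(s[:r]) @ Vt[:r]   # rank-r approximation

print(np.linalg.matrix_rank(B_rr))
```

    The truncation forces the q target-specific coefficient vectors to live in an r-dimensional subspace, which is the basic device for exploiting inter-target correlations.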

  20. Analyzing degradation data with a random effects spline regression model

    DOE PAGES

    Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip

    2017-03-17

    This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach, which allows prediction of the degradation of a population over time and makes estimation of reliability straightforward.
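
    The fixed-effect part of a spline degradation model can be sketched with a truncated power basis, which lets the mean degradation curve bend at chosen knots without committing to a parametric form. The knots and data are invented, and the random effects distribution and Bayesian machinery of the paper are omitted.

```python
import numpy as np

def spline_basis(t, knots, degree=1):
    """Truncated power basis: polynomial terms plus one hinge per knot."""
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.clip(t - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

# Synthetic degradation path whose slope changes at t = 5.
rng = np.random.default_rng(7)
t = np.linspace(0, 10, 60)
true = 1.0 + 0.2 * t + 0.5 * np.clip(t - 5, 0, None)
y = true + rng.normal(scale=0.05, size=t.size)

B = spline_basis(t, knots=[5.0])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(np.round(coef, 2))
```

    In the paper's setting each item would get its own coefficient vector drawn from a common distribution, so that the population degradation, and hence reliability at a failure threshold, can be predicted over time.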