A unified framework for image retrieval using keyword and visual features.
Jing, Feng; Li, Mingjing; Zhang, Hong-Jiang; Zhang, Bo
2005-07-01
In this paper, a unified image retrieval framework based on both keyword annotations and visual features is proposed. In this framework, a set of statistical models is built on the visual features of a small set of manually labeled images to represent semantic concepts, and these models are used to propagate keywords to other, unlabeled images. The models are updated periodically as more images implicitly labeled by users become available through relevance feedback. In this sense, the keyword models serve to accumulate and memorize knowledge learned from user-provided relevance feedback. Furthermore, two sets of effective and efficient similarity measures and relevance feedback schemes are proposed for the query-by-keyword and query-by-image-example scenarios, respectively. Keyword models are combined with visual features in these schemes. In particular, a new entropy-based active learning strategy is introduced to improve the efficiency of relevance feedback for query by keyword. In addition, a new algorithm is proposed to estimate the keyword features of the search concept for query by image example; it is shown to be more appropriate than two existing relevance feedback algorithms. Experimental results demonstrate the effectiveness of the proposed framework.
Jackknife Variance Estimator for Two Sample Linear Rank Statistics
1988-11-01
Keywords: strong consistency; linear rank test; influence function.
Estimation of Lithological Classification in Taipei Basin: A Bayesian Maximum Entropy Method
NASA Astrophysics Data System (ADS)
Wu, Meng-Ting; Lin, Yuan-Chien; Yu, Hwa-Lung
2015-04-01
In environmental and other scientific applications, a certain understanding of the geological lithological composition is required. Because of real-world restrictions, only a limited amount of data can be acquired. To find out the lithological distribution in a study area, many spatial statistical methods are used to estimate the lithological composition at unsampled points or grids. This study applied the Bayesian Maximum Entropy (BME) method, an emerging method in the field of geological spatiotemporal statistics. The BME method can identify the spatiotemporal correlation of the data and can combine not only hard data but also soft data to improve estimation. Lithological classification data are discrete categorical data; therefore, this research applied categorical BME to establish a complete three-dimensional lithological estimation model, using the limited hard data from cores and the soft data generated from geological dating data and virtual wells to estimate the three-dimensional lithological classification in the Taipei Basin. Keywords: Categorical Bayesian Maximum Entropy method, Lithological Classification, Hydrogeological Setting
Interest in Anesthesia as Reflected by Keyword Searches using Common Search Engines
Liu, Renyu; García, Paul S.; Fleisher, Lee A.
2012-01-01
Background Since current general interest in anesthesia is unknown, we analyzed internet keyword searches to gauge general interest in anesthesia in comparison with surgery and pain. Methods The trend of keyword searches from 2004 to 2010 related to anesthesia and anaesthesia was investigated using Google Insights for Search. The trend in the number of peer-reviewed articles on anesthesia cited in PubMed and Medline from 2004 to 2010 was investigated. The average cost of advertising on anesthesia, surgery and pain was estimated using Google AdWords. Search results in other common search engines were also analyzed. Correlation between year and relative number of searches was determined, with p < 0.05 considered statistically significant. Results Searches for the keyword “anesthesia” or “anaesthesia” have diminished since 2004, as reflected by Google Insights for Search (p < 0.05). The search for “anesthesia side effects” is trending up over the same time period, while the search for “anesthesia and safety” is trending down. The phrase “before anesthesia” is searched more frequently than “preanesthesia”, and the search for “before anesthesia” is trending up. Use of “pain” as a keyword is steadily increasing over the years indicated. While different search engines may return different total numbers of search results (available posts), the ratios of search results between some common keywords related to perioperative care are comparable, indicating a similar trend. The number of peer-reviewed manuscripts on “anesthesia” and the proportion of papers on “anesthesia and outcome” are trending up. Estimated advertising spending is lower for anesthesia-related terms than for pain or surgery, owing to the relatively smaller search traffic. Conclusions General interest in anesthesia (anaesthesia), as measured by internet searches, appears to be decreasing. 
Pain, preanesthesia evaluation, anesthesia outcomes, and side effects of anesthesia are the critical areas that anesthesiologists should focus on to address the increasing concerns. PMID:23853739
Kurtosis Approach Nonlinear Blind Source Separation
NASA Technical Reports Server (NTRS)
Duong, Vu A.; Stubberud, Allen R.
2005-01-01
In this paper, we introduce a new algorithm for blind source signal separation for post-nonlinear mixtures. The mixtures are assumed to be linearly mixed from unknown sources first and then distorted by memoryless nonlinear functions. The nonlinear functions are assumed to be smooth and can be approximated by polynomials. Both the coefficients of the unknown mixing matrix and the coefficients of the approximating polynomials are estimated by the gradient descent method, subject to the higher-order statistical requirements. The results of simulation experiments presented in this paper demonstrate the validity and usefulness of our approach for nonlinear blind source signal separation. Keywords: Independent Component Analysis, Kurtosis, Higher order statistics.
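The contrast function named in the abstract above is kurtosis. As a hedged illustration (not the authors' algorithm), the excess kurtosis that such separation methods drive away from zero can be computed as:

```python
def excess_kurtosis(x):
    # Fourth standardized moment minus 3: zero for a Gaussian signal,
    # nonzero for most non-Gaussian sources, which is what
    # kurtosis-based contrast functions exploit.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / var ** 2 - 3.0
```

A symmetric two-point signal such as a square wave reaches the sub-Gaussian extreme of -2, while a spiky, heavy-tailed signal gives a positive value.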
A Statistical Ontology-Based Approach to Ranking for Multiword Search
ERIC Educational Resources Information Center
Kim, Jinwoo
2013-01-01
Keyword search is a prominent data retrieval method for the Web, largely because the simple and efficient nature of keyword processing allows a large amount of information to be searched with fast response. However, keyword search approaches do not formally capture the clear meaning of a keyword query and fail to address the semantic relationships…
Nishimoto, Naoki; Ota, Mizuki; Yagahara, Ayako; Ogasawara, Katsuhiko
2016-11-25
Background After the Fukushima Dai-ichi Nuclear Power Station accident in Japan on March 11, 2011, a large number of comments, both positive and negative, were posted on social media. Objective The objective of this study was to clarify the characteristics of the trend in the number of tweets posted on Twitter, and to estimate how long public concern regarding the accident continued. We surveyed the attenuation period of the first term occurrence related to radiation exposure as a surrogate endpoint for the duration of concern. Methods We retrieved 18,891,284 tweets from Twitter data between March 11, 2011 and March 10, 2012, containing 143 variables in Japanese. We selected radiation, radioactive, Sievert (Sv), Becquerel (Bq), and gray (Gy) as keywords to estimate the attenuation period of public concern regarding radiation exposure. These data, formatted as comma-separated values, were transferred into a Statistical Analysis System (SAS) dataset for analysis, and survival analysis methodology was followed using the SAS LIFETEST procedure. This study was approved by the institutional review board of Hokkaido University and informed consent was waived. Results A Kaplan-Meier curve was used to show the rate of Twitter users posting a message after the accident that included one or more of the keywords. The term Sv occurred in tweets up to one year after the first tweet. Among the Twitter users studied, 75.32% (880,108/1,168,542) tweeted the word radioactive and 9.20% (107,522/1,168,542) tweeted the term Sv. The first reduction was observed within the first 7 days after March 11, 2011. The means and standard errors (SEs) of the duration from the first tweet on March 11, 2011 were 31.9 days (SE 0.096) for radioactive and 300.6 days (SE 0.181) for Sv. These keywords were still being used at the end of the study period. The mean attenuation period for radioactive was one month, and approximately one year for radiation and radiation units. 
The difference in mean duration between the keywords was attributed to the effect of mass media. Regularly posted messages, such as daily radiation dose reports, were relatively easy to detect from their time and formatted contents. The survival estimation indicated that public concern about the nuclear power plant accident remained after one year. Conclusions Although the simple plot of the number of tweets did not show clear results, we estimated the mean attenuation period as approximately one month for the keyword radioactive, and found that the keywords were still being used in posts at the end of the study period. Further research is required to quantify the effect of other phrases in social media data. The results of this exploratory study should advance progress in influencing and quantifying the communication of risk. PMID:27888168
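The study ran its survival analysis with the SAS LIFETEST procedure; the underlying Kaplan-Meier product-limit estimator is simple enough to sketch directly. The following is a minimal illustration (not the study's code): each subject contributes an observation time (e.g. days until the last tweet containing a keyword) and a flag marking whether stopping was observed or the subject was censored at the end of the study.

```python
from collections import Counter

def kaplan_meier(durations, events):
    # Kaplan-Meier product-limit estimator.
    # durations: observation time per subject.
    # events:    1 if the event (stopping keyword use) was observed,
    #            0 if the subject was censored.
    # Returns a list of (time, survival probability) points:
    # S(t) = product over event times t_i <= t of (1 - d_i / n_i).
    deaths = Counter(t for t, e in zip(durations, events) if e)
    at_risk = len(durations)
    s, curve = 1.0, []
    for t in sorted(set(durations)):
        d = deaths.get(t, 0)
        if d:
            s *= 1.0 - d / at_risk
        curve.append((t, s))
        # Everyone observed (event or censored) at time t leaves the risk set.
        at_risk -= sum(1 for u in durations if u == t)
    return curve
```

Censored subjects lower nobody's survival estimate directly; they only shrink the risk set for later event times, which is what distinguishes this from a naive fraction-still-active plot.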
Evaluation of an Automated Keywording System.
ERIC Educational Resources Information Center
Malone, Linda C.; And Others
1990-01-01
Discussion of automated indexing techniques focuses on ways to statistically document improvements in the development of an automated keywording system over time. The system developed by the Joint Chiefs of Staff to automate the storage, categorization, and retrieval of information from military exercises is explained, and performance measures are…
Markers of data quality in computer audit: the Manchester Orthopaedic Database.
Ricketts, D; Newey, M; Patterson, M; Hitchin, D; Fowler, S
1993-11-01
This study investigates the efficiency of the Manchester Orthopaedic Database (MOD), a computer software package for record collection and audit. Data is entered into the system in the form of diagnostic, operative and complication keywords. We have calculated the completeness, accuracy and quality (completeness × accuracy) of keyword data in the MOD in two departments of orthopaedics (Departments A and B). In each department, 100 sets of inpatient notes were reviewed. Department B obtained results which were significantly better than those in A at the 5% level. We attribute this to the presence of a systems coordinator to motivate and organise the team for audit. Senior and junior staff did not differ significantly with respect to completeness, accuracy and quality measures, but locum junior staff recorded data with a quality of 0%. Statistically, the biggest difference between the departments was the quality of operation keywords. Sample sizes were too small to permit effective statistical comparisons between the quality of complication keywords. In both departments, however, the poorest quality data was seen in complication keywords. The low complication keyword completeness contributed to this; on average, the true complication rate (39%) was twice the recorded complication rate (17%). In the recent Royal College of Surgeons of England Confidential Comparative Audit, the recorded complication rate was 4.7%. In the light of the above findings, we suggest that the true complication rate of the RCS CCA should approach 9%.
Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood
NASA Astrophysics Data System (ADS)
Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. To estimate the corresponding regression coefficients, CRPS minimization has been performed in many meteorological post-processing studies over the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules, used as optimization criteria, should be able to locate a similar, unknown optimum. Discrepancies may result from a wrong distributional assumption about the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real-world case study of surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
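For the Gaussian assumption discussed in the abstract above, the CRPS has a closed form, which is what makes minimum-CRPS estimation practical. A hedged sketch of the two objectives being compared (mu, sigma, y denote the forecast mean, forecast spread, and observation; this illustrates the scores, not the study's post-processing code):

```python
import math

SQRT2 = math.sqrt(2.0)

def crps_gaussian(mu, sigma, y):
    # Closed-form CRPS for a N(mu, sigma^2) forecast and observation y:
    # CRPS = sigma * ( z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ).
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(z / SQRT2))                  # standard normal cdf
    return sigma * (z * (2.0 * Phi - 1.0) + 2.0 * phi - 1.0 / math.sqrt(math.pi))

def gaussian_nll(mu, sigma, y):
    # Negative log-likelihood of the same forecast; maximum likelihood
    # estimation minimizes the mean of this over the training sample.
    z = (y - mu) / sigma
    return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
```

Both scores are proper for the Gaussian, so their training-sample means are minimized near the same regression coefficients when the distributional assumption holds; discrepancies between the two estimators are a symptom of a misspecified distribution, which is the comparison the study makes.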
Quantitative analysis of the evolution of novelty in cinema through crowdsourced keywords.
Sreenivasan, Sameet
2013-09-26
The generation of novelty is central to any creative endeavor. Novelty generation and the relationship between novelty and individual hedonic value have long been subjects of study in social psychology. However, few studies have utilized large-scale datasets to quantitatively investigate these issues. Here we consider the domain of American cinema and explore these questions using a database of films spanning a 70 year period. We use crowdsourced keywords from the Internet Movie Database as a window into the contents of films, and prescribe novelty scores for each film based on occurrence probabilities of individual keywords and keyword-pairs. These scores provide revealing insights into the dynamics of novelty in cinema. We investigate how novelty influences the revenue generated by a film, and find a relationship that resembles the Wundt-Berlyne curve. We also study the statistics of keyword occurrence and the aggregate distribution of keywords over a 100 year period.
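The paper's exact scoring is more elaborate, but its core idea, scoring a film by how improbable its keyword pairs are given earlier films, can be sketched as follows. The add-one smoothing and the base-2 surprisal are illustrative choices, not the authors':

```python
import math
from collections import Counter
from itertools import combinations

def novelty_score(film_keywords, prior_films):
    # Average surprisal, -log2 p, over the film's keyword pairs,
    # with p estimated from pair frequencies in earlier films.
    pair_counts = Counter()
    for kws in prior_films:
        pair_counts.update(combinations(sorted(set(kws)), 2))
    total = sum(pair_counts.values())
    pairs = list(combinations(sorted(set(film_keywords)), 2))
    # Add-one smoothing so unseen pairs get a finite, high surprisal.
    return sum(-math.log2((pair_counts[p] + 1) / (total + 1))
               for p in pairs) / len(pairs)
```

A film whose keyword pairs have rarely co-occurred before scores higher than one that recombines familiar pairings, which is the axis along which the Wundt-Berlyne-style revenue relationship is measured.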
Rear-End Crashes: Problem Size Assessment And Statistical Description
DOT National Transportation Integrated Search
1993-05-01
Keywords: research and development (R&D); advanced vehicle control & safety systems (AVCSS); intelligent vehicle initiative (IVI). This document presents problem size assessments and statistical crash descriptions for rear-end crashes, inc...
ERIC Educational Resources Information Center
Aharony, Noa
2012-01-01
The current study seeks to describe and analyze journal research publications in the top 10 Library and Information Science journals from 2007-8. The paper presents a statistical descriptive analysis of authorship patterns (geographical distribution and affiliation) and keywords. Furthermore, it displays a thorough content analysis of keywords and…
Text mining by Tsallis entropy
NASA Astrophysics Data System (ADS)
Jamaati, Maryam; Mehri, Ali
2018-01-01
Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word-ranking metric for extracting the keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method for keyword extraction. We find that Tsallis entropy has reliable word-ranking performance, at the same level as the best previous ranking methods.
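As a hedged sketch of the idea (the paper's metric additionally exploits spatial correlation lengths), the Tsallis entropy of a word's occurrence distribution across equal text segments rewards spatial clustering: a clustered keyword concentrates its mass in few segments and gets lower entropy than a uniformly spread function word.

```python
def tsallis_entropy(probs, q=2.0):
    # S_q = (1 - sum_i p_i^q) / (q - 1); recovers Shannon entropy as q -> 1.
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def segment_distribution(tokens, word, n_segments=4):
    # Fraction of the word's occurrences falling in each equal segment.
    seg_len = max(1, len(tokens) // n_segments)
    counts = [tokens[i * seg_len:(i + 1) * seg_len].count(word)
              for i in range(n_segments)]
    total = sum(counts)
    return [c / total for c in counts] if total else []
```

For q = 2 a uniform spread over four segments gives S_2 = 0.75, while a word confined to a single segment gives S_2 = 0, so ranking by ascending entropy (per the clustering intuition) surfaces topical words.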
NASA Astrophysics Data System (ADS)
Li, Huajiao; An, Haizhong; Wang, Yue; Huang, Jiachen; Gao, Xiangyun
2016-05-01
Keeping abreast of trends in articles and rapidly grasping a body of articles' key points and relationships from a holistic perspective is a new challenge in both literature research and text mining. As an important component, keywords can present the core idea of an academic article. Usually, articles on a single theme or area share one or more keywords, and we can analyze the topological features and evolution of article co-keyword networks and keyword co-occurrence networks to realize an in-depth analysis of the articles. This paper seeks to integrate statistics, text mining, complex networks and visualization to analyze all of the academic articles on one given theme, complex network(s). All 5944 "complex networks" articles that were published between 1990 and 2013 and are available on the Web of Science were extracted. Based on two-mode affiliation network theory, a new frontier of complex networks, we constructed two different networks: one taking the articles as nodes, the co-keyword relationships as edges and the quantity of co-keywords as the weight, to construct the article co-keyword network; and another taking the articles' keywords as nodes, the co-occurrence relationships as edges and the quantity of simultaneous co-occurrences as the weight, to construct the keyword co-occurrence network. An integrated method for analyzing the topological features and evolution of the article co-keyword network and keyword co-occurrence networks is proposed, and we also defined a new function to measure the innovation coefficient of the articles at the annual level. This paper provides a useful tool and process for successfully achieving in-depth analysis and rapid understanding of the trends and relationships of articles from a holistic perspective.
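The two network constructions described above both reduce to counting shared keywords. A minimal sketch of the keyword co-occurrence network (the article co-keyword network is built analogously by swapping the roles of articles and keywords; this is an illustration, not the paper's pipeline):

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence_network(articles):
    # articles: list of keyword lists, one per article.
    # Returns weighted edges: (keyword_a, keyword_b) -> co-occurrence count,
    # i.e. the number of articles in which both keywords appear together.
    edges = Counter()
    for keywords in articles:
        edges.update(combinations(sorted(set(keywords)), 2))
    return edges
```

Sorting each keyword pair makes the edge undirected, and the Counter weight plays the role of the "quantity of simultaneous co-occurrences" used as edge weight in the abstract.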
Direction of Arrival Estimation Using a Reconfigurable Array
2005-05-06
civilian world. Keywords: direction-of-arrival estimation; MUSIC algorithm; reconfigurable array; experimental.
Level statistics of words: Finding keywords in literary texts and symbolic sequences
NASA Astrophysics Data System (ADS)
Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V.; Oliver, J. L.
2009-03-01
Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to automatically extract keywords from literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method also works on generic symbolic sequences (continuous texts without spaces), suggesting its general applicability.
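A hedged distillation of this idea (the paper's level-statistics measure is more refined): for each word, examine the gaps between successive occurrences. Randomly placed words give roughly exponential gaps with coefficient of variation near 1, while self-attracting, relevant words give a CV above 1.

```python
def spacing_cv(positions):
    # Coefficient of variation (std / mean) of the gaps between occurrences.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return var ** 0.5 / mean

def rank_words(tokens, min_count=4):
    # Rank candidate keywords of a single document by gap clustering,
    # with no reference corpus needed.
    positions = {}
    for i, w in enumerate(tokens):
        positions.setdefault(w, []).append(i)
    scored = [(spacing_cv(p), w) for w, p in positions.items() if len(p) >= min_count]
    return sorted(scored, reverse=True)
```

Evenly spaced occurrences score 0, random placement scores about 1, and bursty, clustered occurrences score above 1, so sorting descending puts topical words first.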
Modeling for Ultrasonic Health Monitoring of Foams with Embedded Sensors
NASA Technical Reports Server (NTRS)
Wang, L.; Rokhlin, S. I.
2005-01-01
In this report analytical and numerical methods are proposed to estimate the effective elastic properties of regular and random open-cell foams. The methods are based on the principle of minimum energy and on structural beam models. The analytical solutions are obtained using symbolic processing software. The microstructure of the random foam is simulated using Voronoi tessellation together with a rate-dependent random close-packing algorithm. The statistics of the geometrical properties of random foams corresponding to different packing fractions have been studied. The effects of the packing fraction on elastic properties of the foams have been investigated by decomposing the compliance into bending and axial compliance components. It is shown that the bending compliance increases and the axial compliance decreases when the packing fraction increases. Keywords: Foam; Elastic properties; Finite element; Randomness
Computer-Automated Approach for Scoring Short Essays in an Introductory Statistics Course
ERIC Educational Resources Information Center
Zimmerman, Whitney Alicia; Kang, Hyun Bin; Kim, Kyung; Gao, Mengzhao; Johnson, Glenn; Clariana, Roy; Zhang, Fan
2018-01-01
Over two semesters short essay prompts were developed for use with the Graphical Interface for Knowledge Structure (GIKS), an automated essay scoring system. Participants were students in an undergraduate-level online introductory statistics course. The GIKS compares students' writing samples with an expert's to produce keyword occurrence and…
Intelligent Information Retrieval for a Multimedia Database Using Captions
1992-07-23
The user was allowed to retrieve any of several multimedia types depending on the descriptors entered. An example mentioned was the assembly of a... Statistics showed some performance improvements over a keyword search. Similar work was described by Wong et al. (1987), where a vector space representation... keyword) lists for searching the lexicon (a syntactic parser is not used); a type hierarchy of terms was used in the process. The system then checked the
ERIC Educational Resources Information Center
Brooker, Heather Rogers
2013-01-01
It is estimated that nearly 70% of high school students in the United States need some form of reading remediation, with the most common need being the ability to comprehend the content and significance of the text (Biancarosa & Snow, 2004). Research findings support the use of visual imagery and keyword cues as effective comprehension…
Epidemiology of brucellosis in Iran: A comprehensive systematic review and meta-analysis study.
Mirnejad, Reza; Jazi, Faramarz Masjedian; Mostafaei, Shayan; Sedighi, Mansour
2017-08-01
Brucellosis is still one of the most challenging issues for health and the economy in many developing countries such as Iran. Considering the high prevalence of brucellosis, the aim of the current study was to systematically review published data on the annual incidence rate of this infection in different parts of Iran and to provide an overall relative frequency (RF) for Iran using meta-analysis. We searched several databases, including PubMed, ISI Web of Science, Scopus, Google Scholar, IranMedex and the Iranian Scientific Information Database (SID), using the following keywords: "Brucella", "Brucellosis", "Malta fever", "Mediterranean fever", "undulant fever", "zoonosis" and "Iran" in Title/Abstract/Keywords fields. Articles/abstracts that used clinical specimens and reported the incidence of brucellosis were included in this review. The quality of studies was assessed with the STROBE and PRISMA checklists. All statistical analyses were performed using STATA 11.0 (STATA Corp, College Station, TX), and P-values under 0.05 were considered statistically significant. Out of 8326 results, we found 34 articles suitable, according to the inclusion and exclusion criteria, for inclusion in this systematic review and meta-analysis. The pooled incidence of brucellosis was estimated at 0.001% (95% confidence interval (CI) = 0.0005-0.0015%) annually. The relative frequency of brucellosis in different studies varied from 7.0/100,000 to 276.41/100,000, in Qom and Kermanshah provinces, respectively. This systematic review and meta-analysis showed that the highest incidences of brucellosis occur in the west and northwest regions of Iran. Overall, the incidence of the disease in Iran is in the high range.
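At its simplest, the pooling step in such a meta-analysis is inverse-variance weighting of the per-study estimates. The fixed-effect sketch below only illustrates the mechanics; the study itself may well have used a random-effects model, and the numbers are hypothetical:

```python
def pooled_estimate(estimates, variances):
    # Fixed-effect inverse-variance pooling: weight each study by w_i = 1 / v_i,
    # so precise studies dominate the pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5  # standard error of the pooled estimate
    return pooled, se
```

The returned standard error underlies the 95% CI reported in abstracts like the one above (pooled ± 1.96 × se).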
1985-12-01
consists of the node t and all descendants of t in T. Definition 3. Pruning a branch Tt from a tree T consists of deleting from T all... The default is 1.0, so this keyword did not actually need to appear in the above file. DELETE: this keyword does not appear in our example, but when it is used with some variable names, it indicates that we want to delete these variables from the regression. If this keyword is
Comparing the Hierarchy of Keywords in On-Line News Portals
Tibély, Gergely; Sousa-Rodrigues, David; Pollner, Péter; Palla, Gergely
2016-01-01
Hierarchical organization is prevalent in networks representing a wide range of systems in nature and society. An important example is given by the tag hierarchies extracted from large on-line data repositories such as scientific publication archives, file sharing portals, blogs, on-line news portals, etc. The tagging of the stored objects with informative keywords in such repositories has become very common, and in most cases the tags on a given item are free words chosen by the authors independently. Therefore, the relations among keywords appearing in an on-line data repository are unknown in general. However, in most cases the topics and concepts described by these keywords are forming a latent hierarchy, with the more general topics and categories at the top, and more specialized ones at the bottom. There are several algorithms available for deducing this hierarchy from the statistical features of the keywords. In the present work we apply a recent, co-occurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorized low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news portals. PMID:27802319
Patient's Guide to Arrhythmogenic Right Ventricular Dysplasia/Cardiomyopathy
2013-01-01
Background Primary care databases are a major source of data for epidemiological and health services research. However, most studies are based on coded information, ignoring information stored in free text. Using the early presentation of rheumatoid arthritis (RA) as an exemplar, our objective was to estimate the extent of data hidden within free text, using a keyword search. Methods We examined the electronic health records (EHRs) of 6,387 patients from the UK, aged 30 years and older, with a first coded diagnosis of RA between 2005 and 2008. We listed indicators for RA which were present in coded format and ran keyword searches for similar information held in free text. The frequency of indicator code groups and keywords from one year before to 14 days after RA diagnosis were compared, and temporal relationships examined. Results One or more keyword for RA was found in the free text in 29% of patients prior to the RA diagnostic code. Keywords for inflammatory arthritis diagnoses were present for 14% of patients whereas only 11% had a diagnostic code. Codes for synovitis were found in 3% of patients, but keywords were identified in an additional 17%. In 13% of patients there was evidence of a positive rheumatoid factor test in text only, uncoded. No gender differences were found. Keywords generally occurred close in time to the coded diagnosis of rheumatoid arthritis. They were often found under codes indicating letters and communications. Conclusions Potential cases may be missed or wrongly dated when coded data alone are used to identify patients with RA, as diagnostic suspicions are frequently confined to text. The use of EHRs to create disease registers or assess quality of care will be misleading if free text information is not taken into account. Methods to facilitate the automated processing of text need to be developed and implemented. PMID:23964710
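The keyword search over free text used here to surface uncoded indicators reduces to a pattern match plus a comparison against the coded flag. A minimal sketch; the record layout (a `free_text` string and a per-indicator boolean) is hypothetical:

```python
import re

def keyword_only_patients(records, keywords, coded_flag):
    """Count patients whose free text mentions any keyword but who lack
    the corresponding code.  `records` is a list of dicts with the
    hypothetical fields 'free_text' (str) and `coded_flag` (bool)."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keywords)) + r")\b",
        re.IGNORECASE)
    hits = [r for r in records if pattern.search(r["free_text"])]
    # evidence confined to text: keyword present, code absent
    return sum(1 for r in hits if not r[coded_flag])
```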
Xie, Hongbo; Vucetic, Slobodan; Iakoucheva, Lilia M.; Oldfield, Christopher J.; Dunker, A. Keith; Uversky, Vladimir N.; Obradovic, Zoran
2008-01-01
Identifying relationships between function, amino acid sequence and protein structure represents a major challenge. In this study we propose a bioinformatics approach that identifies functional keywords in the Swiss-Prot database that correlate with intrinsic disorder. A statistical evaluation is employed to rank the significance of these correlations. Protein sequence data redundancy and the relationship between protein length and protein structure were taken into consideration to ensure the quality of the statistical inferences. Over 200,000 proteins from Swiss-Prot database were analyzed using this approach. The predictions of intrinsic disorder were carried out using PONDR VL3E predictor of long disordered regions that achieves an accuracy of above 86%. Overall, out of the 710 Swiss-Prot functional keywords that were each associated with at least 20 proteins, 238 were found to be strongly positively correlated with predicted long intrinsically disordered regions, whereas 302 were strongly negatively correlated with such regions. The remaining 170 keywords were ambiguous without strong positive or negative correlation with the disorder predictions. These functions cover a large variety of biological activities and imply that disordered regions are characterized by a wide functional repertoire. Our results agree well with literature findings, as we were able to find at least one illustrative example of functional disorder or order shown experimentally for the vast majority of keywords showing the strongest positive or negative correlation with intrinsic disorder. This work opens a series of three papers, which enriches the current view of protein structure-function relationships, especially with regards to functionalities of intrinsically disordered proteins and provides researchers with a novel tool that could be used to improve the understanding of the relationships between protein structure and function. 
The first paper of the series describes our statistical approach, outlines the major findings and provides illustrative examples of biological processes and functions positively and negatively correlated with intrinsic disorder. PMID:17391014
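The statistical ranking of keyword-disorder correlations can be illustrated with a z-score of each keyword's mean predicted disorder fraction against the database-wide mean. This is an illustrative stand-in, not the paper's exact test:

```python
import math

def keyword_disorder_z(disorder_by_protein, keyword_map, min_proteins=20):
    """z-score of the mean predicted disorder fraction of each keyword's
    proteins against the overall mean; large positive values suggest
    keywords positively correlated with disorder."""
    values = list(disorder_by_protein.values())
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    scores = {}
    for kw, prots in keyword_map.items():
        if len(prots) < min_proteins:   # skip sparsely annotated keywords
            continue
        m = len(prots)
        mean_kw = sum(disorder_by_protein[p] for p in prots) / m
        scores[kw] = (mean_kw - mu) / (sigma / math.sqrt(m))
    return scores
```

The `min_proteins` cutoff mirrors the paper's restriction to keywords with at least 20 associated proteins.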
Keyword extraction by nonextensivity measure.
Mehri, Ali; Darooneh, Amir H
2011-05-01
The presence of a long-range correlation in the spatial distribution of a relevant word type, in spite of random occurrences of an irrelevant word type, is an important feature of human-written texts. We classify the correlation between the occurrences of words by nonextensive statistical mechanics for the word-ranking process. In particular, we look at the nonextensivity parameter as an alternative metric to measure the spatial correlation in the text, from which the words may be ranked in terms of this measure. Finally, we compare different methods for keyword extraction. © 2011 American Physical Society
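A simpler cousin of the nonextensivity measure, built on the same intuition that relevant words cluster spatially while irrelevant ones occur at random, scores each word by the normalized standard deviation of the gaps between its successive occurrences (values well above 1 indicate clustering). This illustrates the spatial-correlation idea only; it is not the authors' nonextensivity parameter:

```python
def sigma_score(positions):
    """Normalized standard deviation of inter-occurrence gaps for one
    word type; ~1 for random placement, larger for clustered words."""
    if len(positions) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return (var ** 0.5) / mean
```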
An, Ruopeng; Ji, Mengmeng; Zhang, Sheng
2017-11-01
We reviewed the scientific literature regarding the effectiveness of social media-based interventions on weight-related behaviors and body weight status. A keyword search was performed in May 2017 in the ClinicalTrials.gov, Cochrane Library, PsycINFO, PubMed, and Web of Science databases. We conducted a meta-analysis to estimate the pooled effect size of social media-based interventions on weight-related outcome measures. We identified 22 interventions from the keyword and reference search, including 12 randomized controlled trials, 6 pre-post studies, and 3 cohort studies conducted in 9 countries during 2010-2016. The majority (N = 17) used Facebook, followed by Twitter (N = 4) and Instagram (N = 1). Intervention durations averaged 17.8 weeks with a mean sample size of 69. The meta-analysis showed that social media-based interventions were associated with a statistically significant, but clinically modest, reduction of body weight by 1.01 kg, body mass index by 0.92 kg/m2, and waist circumference by 2.65 cm, and an increase of 1530 in the daily number of steps taken. In the meta-regression there was no dose-response effect with respect to intervention duration. The boom of social media provides an unprecedented opportunity to implement health promotion programs. Future interventions should make efforts to improve intervention scalability and effectiveness.
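Pooling effect sizes across studies like the 22 interventions above follows standard inverse-variance weighting. A minimal fixed-effect sketch (the review may well have used a random-effects model; the inputs are illustrative):

```python
def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance meta-analysis: returns the pooled
    estimate and its standard error."""
    weights = [1.0 / v for v in variances]       # precision weights
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / total
    se = (1.0 / total) ** 0.5
    return est, se
```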
Parsaei-Mohammadi, Parastoo; Ghasemi, Ali Hossein; Hassanzadeh-Beheshtabad, Raziyeh
2017-01-01
Introduction: In the present era, thesauri as indexing tools play an effective role in integrating retrieval, preventing fragmentation and a multiplicity of terminologies, and conveying the information content of documents. Goals: This study aimed to investigate the keywords of articles indexed in IranMedex in terms of origin, structure, and indexing situation, and their compliance with the Persian Medical Thesaurus and Medical Subject Headings (MeSH). Materials and Methods: This is an applied research study, conducted as a survey. The statistical population includes 32,850 Persian articles indexed in IranMedex during the years 1385–1391; 379 cases were selected as the sample of the study. Data collection was done using a checklist, and the findings were analyzed with SPSS software. Findings: Although there was no significant difference in terms of indexing origin between the proportions of different types of Persian and English keywords of articles indexed in IranMedex, the compliance rates of the Persian and English keywords with the Persian Medical Thesaurus and MeSH differed across years. Meanwhile, the structure of keywords leans more towards phrases than single words, and the majority of keywords are selected from titles and abstracts. Conclusion: The authors' familiarity with thesauri and controlled tools brings homogeneity to assigned keywords and provides more precise, faster, and easier retrieval. It is suggested that a mixture of natural and controlled languages be used in this database in order to reach more comprehensive results. PMID:28546967
IDA Cost Research Symposium Held 25 May 1995.
1995-08-01
Excel Spreadsheet Publications: MCR Report TR-9507/01 Category: II.B Keywords: Government, Estimating, Missiles, Analysis, Production, Data...originally developed by Martin Marietta as part of the SASET software estimating model. To be implemented as part of the SoftEST Software Estimating Tool...following documents to report the results of its work. Reports Reports are the most authoritative and most carefully considered products IDA
Battery Power Management in Heavy-duty HEVs based on the Estimated Critical Surface Charge
2011-03-01
health prospects without any penalty on fuel efficiency. Keywords: Lithium-ion battery; power management; critical surface charge; Lithium-ion concentration; estimation; extended...Di Domenico, D., Fiengo, G., and Stefanopoulou, A. (2008) 'Lithium-ion battery state of charge estimation with a Kalman filter based on a
NASA Astrophysics Data System (ADS)
Wilson, B.; Paradise, T. R.
2016-12-01
The influx of millions of Syrian refugees into Turkey has rapidly changed the population distribution along the Dead Sea Rift and East Anatolian Fault zones. In contrast to other countries in the Middle East where refugees are accommodated in camp environments, the majority of displaced individuals in Turkey are integrated into cities, towns, and villages—placing stress on urban settings and increasing potential exposure to strong shaking. Yet, displaced populations are not traditionally captured in data sources used in earthquake risk analysis or loss estimations. Accordingly, we present a district-level analysis assessing the spatial overlap of earthquake hazards and refugee locations in southeastern Turkey to determine how migration patterns are altering seismic risk in the region. Using migration estimates from the U.S. Humanitarian Information Unit, we create three district-level population scenarios that combine official population statistics, refugee camp populations, and low, median, and high bounds for integrated refugee populations. We perform probabilistic seismic hazard analysis alongside these population scenarios to map spatial variations in seismic risk between 2011 and late 2015. Our results show a significant relative southward increase of seismic risk for this period due to refugee migration. Additionally, we calculate earthquake fatalities for simulated earthquakes using a semi-empirical loss estimation technique to determine degree of under-estimation resulting from forgoing migration data in loss modeling. We find that including refugee populations increased casualties by 11-12% using median population estimates, and upwards of 20% using high population estimates. These results communicate the ongoing importance of placing environmental hazards in their appropriate regional and temporal context which unites physical, political, cultural, and socio-economic landscapes. 
Keywords: Earthquakes, Hazards, Loss-Estimation, Syrian Crisis, Migration, Refugees
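The 11-20% under-estimation reported above can be reproduced in spirit by scaling expected district fatalities linearly with exposed population. This is an illustrative simplification, not the paper's semi-empirical loss model; all inputs are hypothetical:

```python
def casualty_increase(base_pop, refugee_pop, fatality_rate):
    """Relative increase in expected fatalities when refugee populations
    are added, assuming fatalities scale linearly with exposed
    population per district."""
    base = sum(p * r for p, r in zip(base_pop, fatality_rate))
    with_ref = sum((p + q) * r
                   for p, q, r in zip(base_pop, refugee_pop, fatality_rate))
    return (with_ref - base) / base
```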
Ciardo, Delia; Gerardi, Marianna Alessandra; Vigorito, Sabrina; Morra, Anna; Dell'acqua, Veronica; Diaz, Federico Javier; Cattani, Federica; Zaffino, Paolo; Ricotti, Rosalinda; Spadea, Maria Francesca; Riboldi, Marco; Orecchia, Roberto; Baroni, Guido; Leonardi, Maria Cristina; Jereczek-Fossa, Barbara Alicja
2017-04-01
Atlas-based automatic segmentation (ABAS) addresses the challenges of accuracy and reliability in manual segmentation. We aim to evaluate the contribution of specific-purpose in ABAS of breast cancer (BC) patients with respect to generic-purpose libraries. One generic-purpose and 9 specific-purpose libraries, stratified according to type of surgery and size of thorax circumference, were obtained from the computed tomography of 200 BC patients. Keywords about contralateral breast volume and presence of breast expander/prostheses were recorded. ABAS was validated on 47 independent patients, considering manual segmentation from scratch as reference. Five ABAS datasets were obtained, testing single-ABAS and multi-ABAS with simultaneous truth and performance level estimation (STAPLE). Center of mass distance (CMD), average Hausdorff distance (AHD) and Dice similarity coefficient (DSC) between corresponding ABAS and manual structures were evaluated and statistically significant differences between different surgeries, structures and ABAS strategies were investigated. Statistically significant differences between patients who underwent different surgery were found, with superior results for conservative-surgery group, and between different structures were observed: ABAS of heart, lungs, kidneys and liver was satisfactory (median values: CMD<2 mm, DSC≥0.80, AHD<1.5 mm), whereas chest wall, breast and spinal cord obtained moderate performance (median values: 2 mm ≤ CMD<5 mm, 0.60 ≤ DSC<0.80, 1.5 mm ≤ AHD<4 mm) and esophagus, stomach, brachial plexus and supraclavicular nodes obtained poor performance (median CMD≥5 mm, DSC<0.60, AHD≥4 mm). The application of STAPLE algorithm generally yields higher performance and the use of keywords improves results for breast ABAS. The homogeneity in the selection of atlases based on multiple anatomical and clinical features and the use of specific-purpose libraries can improve ABAS performance with respect to generic-purpose libraries. 
Copyright © 2016 Elsevier Ltd. All rights reserved.
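The Dice similarity coefficient (DSC) used in the validation above is twice the overlap of two segmentations divided by the sum of their sizes. A minimal sketch, with the assumption that masks are represented as sets of voxel coordinates:

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks given as
    sets of voxel coordinates; 1.0 = perfect overlap, 0.0 = disjoint."""
    inter = len(mask_a & mask_b)
    return 2.0 * inter / (len(mask_a) + len(mask_b))
```

On this scale the abstract's thresholds read directly: DSC >= 0.80 satisfactory, 0.60-0.80 moderate, below 0.60 poor.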
Google and Women's Health-Related Issues: What Does the Search Engine Data Reveal?
Baazeem, Mazin; Abenhaim, Haim
2014-01-01
Identifying the gaps in public knowledge of women's health-related issues has always been difficult. With the increasing number of Internet users in the United States, we sought to use the Internet as a tool to help us identify such gaps and to estimate women's most prevalent health concerns by examining commonly searched health-related keywords in the Google search engine. We collected a large pool of possible search keywords from two independent practicing obstetrician/gynecologists, classified them into five main categories (obstetrics, gynecology, infertility, urogynecology/menopause, and oncology), and measured the monthly average search volume within the United States for each keyword, with all its possible combinations, using the Google AdWords tool. We found that pregnancy-related keywords were less frequently searched in general compared to other categories, with an average of 145,400 hits per month for the top twenty keywords. Among the most common pregnancy-related keywords was "pregnancy and sex," while pregnancy-related diseases were uncommonly searched. HPV alone was searched 305,400 times per month. Of the cancers affecting women, breast cancer was the most commonly searched, with an average of 247,190 times per month, followed by cervical cancer and then ovarian cancer. The commonly searched keywords are often issues that are not discussed in our daily practice or in public health messages. The search volume is broadly related to disease prevalence, with the exception of ovarian cancer, which could signify a public fear.
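Aggregating per-keyword monthly volumes into the five category totals is a simple grouped sum. The data layout below is hypothetical (the study exported volumes from the Google AdWords tool):

```python
from collections import defaultdict

def category_volumes(keyword_volumes, keyword_category):
    """Sum monthly search volumes per category and rank categories by
    total volume, descending."""
    totals = defaultdict(int)
    for kw, vol in keyword_volumes.items():
        totals[keyword_category[kw]] += vol
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```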
Entropy, recycling and macroeconomics of water resources
NASA Astrophysics Data System (ADS)
Karakatsanis, Georgios; Mamassis, Nikos; Koutsoyiannis, Demetris
2014-05-01
We propose a macroeconomic model for water quantity and quality supply multipliers derived by water recycling (Karakatsanis et al. 2013). Macroeconomic models that incorporate natural resource conservation have become increasingly important (European Commission et al. 2012). In addition, as an estimated 80% of globally used freshwater is not reused (United Nations 2012), under increasing population trends water recycling becomes a solution of high priority. Recycling of water resources creates two major conservation effects: (1) conservation of water in reservoirs and aquifers and (2) conservation of ecosystem carrying capacity due to wastewater flux reduction. Statistical distribution properties of the recycling efficiencies (on both water quantity and quality) for each sector are of vital economic importance. Uncertainty and complexity of water reuse across sectors are statistically quantified by entropy. High entropy of recycling efficiency values signifies greater efficiency dispersion, which in turn may indicate the need for additional infrastructure for both shifting the statistical distribution and concentrating it towards higher efficiencies that lead to higher supply multipliers. Keywords: Entropy, water recycling, water supply multipliers, conservation, recycling efficiencies, macroeconomics. References: 1. European Commission (EC), Food and Agriculture Organization (FAO), International Monetary Fund (IMF), Organization of Economic Cooperation and Development (OECD), United Nations (UN) and World Bank (2012), System of Environmental and Economic Accounting (SEEA) Central Framework (White cover publication), United Nations Statistics Division. 2. Karakatsanis, G., N. Mamassis, D. Koutsoyiannis and A. Efstratiades (2013), Entropy and reliability of water use via a statistical approach of scarcity, 5th EGU Leonardo Conference - Hydrofractals 2013 - STAHY '13, Kos Island, Greece, European Geosciences Union, International Association of Hydrological Sciences, International Union of Geodesy and Geophysics. 3. United Nations (UN) (2012), World Water Development Report 4, UNESCO Publishing.
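The entropy quantification of recycling-efficiency dispersion can be sketched as the Shannon entropy of a binned efficiency distribution; the bin count here is an arbitrary illustrative choice:

```python
import math

def efficiency_entropy(efficiencies, bins=10):
    """Shannon entropy (in nats) of a binned distribution of recycling
    efficiencies in [0, 1]; higher entropy = greater dispersion."""
    counts = [0] * bins
    for e in efficiencies:
        counts[min(int(e * bins), bins - 1)] += 1
    n = len(efficiencies)
    return -sum((c / n) * math.log(c / n) for c in counts if c)
```

A sector whose efficiencies all fall in one bin yields entropy 0; efficiencies spread across many bins yield entropy up to log(bins), flagging the dispersion the abstract associates with possible infrastructure needs.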
NASA Astrophysics Data System (ADS)
Kusche, J.; Forootan, E.; Eicker, A.; Hoffmann-Dobrev, H.
2012-04-01
West-African countries have been exposed to changes in rainfall patterns over the last decades, including a significant negative trend. This causes adverse effects on water resources, for instance reduced freshwater availability, and changes in the frequency, duration and magnitude of droughts and floods. Extracting the main patterns of water storage change in West Africa from remote sensing and linking them to climate variability, is therefore an essential step to understand the hydrological aspects of the region. In this study, the higher order statistical method of Independent Component Analysis (ICA) is employed to extract statistically independent water storage patterns from monthly Gravity Recovery And Climate Experiment (GRACE), from the WaterGAP Global Hydrology Model (WGHM) and from Tropical Rainfall Measuring Mission (TRMM) products over West Africa, for the period 2002-2012. Then, to reveal the influences of climatic teleconnections on the individual patterns, these results were correlated to the El Nino-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD) indices. To study the predictability of water storage changes, advanced statistical methods were applied on the main independent Sea Surface Temperature (SST) patterns over the Atlantic and Indian Oceans for the period 2002-2012 and the ICA results. Our results show a water storage decrease over the coastal regions of West Africa (including Sierra Leone, Liberia, Togo and Nigeria), associated with rainfall decrease. The comparison between GRACE estimations and WGHM results indicates some inconsistencies that underline the importance of forcing data for hydrological modeling of West Africa. Keywords: West Africa; GRACE-derived water storage; ICA; ENSO; IOD
Statistics of co-occurring keywords in confined text messages on Twitter
NASA Astrophysics Data System (ADS)
Mathiesen, J.; Angheluta, L.; Jensen, M. H.
2014-09-01
Online social media such as the micro-blogging site Twitter has become a rich source of real-time data on online human behaviors. Here we analyze the occurrence and co-occurrence frequency of keywords in user posts on Twitter. From the occurrence rate of major international brand names, we provide examples of predictions of brand-user behaviors. From the co-occurrence rates, we further analyze the user-perceived relationships between international brand names and construct the corresponding relationship networks. In general the user activity on Twitter is highly intermittent and we show that the occurrence rate of brand names forms a highly correlated time signal.
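A user-perceived relationship network of the kind described above can be built by counting brand-name co-occurrences within individual posts. The substring matching below is a deliberately crude sketch, not the authors' pipeline:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(posts, brands, threshold=1):
    """Edges between brands whose names co-occur in at least
    `threshold` posts (case-insensitive substring match)."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        present = [b for b in brands if b.lower() in text]
        for a, b in combinations(sorted(set(present)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= threshold}
```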
Understanding Depressive Symptoms and Psychosocial Stressors on Twitter: A Corpus-Based Study.
Mowery, Danielle; Smith, Hilary; Cheney, Tyler; Stoddard, Greg; Coppersmith, Glen; Bryan, Craig; Conway, Mike
2017-02-28
With a lifetime prevalence of 16.2%, major depressive disorder is the fifth biggest contributor to the disease burden in the United States. The aim of this study, building on previous work qualitatively analyzing depression-related Twitter data, was to describe the development of a comprehensive annotation scheme (ie, coding scheme) for manually annotating Twitter data with Diagnostic and Statistical Manual of Mental Disorders, Edition 5 (DSM 5) major depressive symptoms (eg, depressed mood, weight change, psychomotor agitation, or retardation) and Diagnostic and Statistical Manual of Mental Disorders, Edition IV (DSM-IV) psychosocial stressors (eg, educational problems, problems with primary support group, housing problems). Using this annotation scheme, we developed an annotated corpus, Depressive Symptom and Psychosocial Stressors Acquired Depression, the SAD corpus, consisting of 9300 tweets randomly sampled from the Twitter application programming interface (API) using depression-related keywords (eg, depressed, gloomy, grief). An analysis of our annotated corpus yielded several key results. First, 72.09% (6829/9473) of tweets containing relevant keywords were nonindicative of depressive symptoms (eg, "we're in for a new economic depression"). Second, the most prevalent symptoms in our dataset were depressed mood and fatigue or loss of energy. Third, less than 2% of tweets contained more than one depression related category (eg, diminished ability to think or concentrate, depressed mood). Finally, we found very high positive correlations between some depression-related symptoms in our annotated dataset (eg, fatigue or loss of energy and educational problems; educational problems and diminished ability to think). We successfully developed an annotation scheme and an annotated corpus, the SAD corpus, consisting of 9300 tweets randomly-selected from the Twitter application programming interface using depression-related keywords. 
Our analyses suggest that keyword queries alone might not be suitable for public health monitoring because context can change the meaning of a keyword in a statement. However, postprocessing approaches could be useful for reducing the noise and improving the signal needed to detect depression symptoms using social media. ©Danielle Mowery, Hilary Smith, Tyler Cheney, Greg Stoddard, Glen Coppersmith, Craig Bryan, Mike Conway. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 28.02.2017.
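The correlations between binary symptom annotations reported above can be computed as phi coefficients over per-tweet label vectors. A sketch; the paper does not specify its correlation statistic, so phi is an assumption:

```python
def phi(labels_a, labels_b):
    """Phi (Pearson) correlation between two binary label vectors,
    e.g. per-tweet presence of two depression-related categories."""
    n = len(labels_a)
    n11 = sum(1 for a, b in zip(labels_a, labels_b) if a and b)
    n1_ = sum(labels_a)
    n_1 = sum(labels_b)
    num = n * n11 - n1_ * n_1
    den = (n1_ * (n - n1_) * n_1 * (n - n_1)) ** 0.5
    return num / den if den else 0.0
```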
Effects of extreme weather on human health: methodology review
NASA Astrophysics Data System (ADS)
Wu, R.; Liss, A.; Naumova, E. N.
2012-12-01
This work critically evaluates current methodology applied to estimate the effects of extreme weather events (EWE) on human health. Specifically, we focus on uncertainties associated with: a) the main statistical approaches for estimating the effects of EWE, b) definitions of health outcomes and EWE, and c) possible sources of errors and biases in currently available data sets. EWE, which include heat waves, cold spells, ice storms, floods, droughts and tornadoes, are known for their massive effects on ecosystems, economies, and infrastructures. In particular, human lives and health are frequently impacted by EWE; however, estimating such effects is complex and lacks a systematic methodology. An accurate and reliable estimate of health impacts is critical for developing preparedness and effective prevention strategies, better allocating scarce resources for mitigating negative impacts of EWE, and detecting vulnerable populations and regions in a timely manner. We reviewed 82 manuscripts published between 1993 and 2011, selected from MedPub and Medline databases using predetermined sets of keywords, such as extreme weather, mortality, morbidity and hospitalization. We classified publications based on their geographical locations, types of included health outcomes, methods for detecting EWE, and the statistical methodology employed to determine the presence and magnitude of EWE-associated health outcomes. We determined that 57% of the reviewed manuscripts applied time-series or association analyses and were conducted in temperate regions of the US, Canada, Korea, Japan and Europe. About 60% of reviewed studies focused primarily on mortality data, 30% on morbidity outcomes, and 9% studied both mortality and morbidity with respect to direct effects of extreme heat waves and cold spells. A wide range of EWE definitions were employed in those manuscripts, which limited the ability to compare the results to a certain degree.
We observed at least three main sources of uncertainty, which may lead to estimation bias: potential misrepresentation and misspecification of the biological causal mechanism in statistical models, incompleteness and quality of reporting of EWE-specific health outcomes, and incomplete accounting for spatial uncertainties in historical environmental records. Finally, we show that some of these systematic biases can be reduced by performing proper adjustments, while others still require further study. Reducing bias provides a more accurate representation of disease burden. Better understanding of EWE and their impacts on human health, combined with other preventive strategies, can provide better protection from EWE for vulnerable populations in the future.
1990-01-01
This document contains summaries of fifteen of the well-known books which underlie the Total Quality Management philosophy. Members of the DCASR St Louis staff offer comments and opinions on how the authors have presented the quality concept in today's business environment. Keywords: TQM (Total Quality Management), quality concepts, statistical process control.
Statistical downscaling of summer precipitation over northwestern South America
NASA Astrophysics Data System (ADS)
Palomino Lemus, Reiner; Córdoba Machado, Samir; Raquel Gámiz Fortis, Sonia; Castro Díez, Yolanda; Jesús Esteban Parra, María
2015-04-01
In this study a statistical downscaling (SD) model using Principal Component Regression (PCR) was developed to simulate summer precipitation in Colombia during the period 1950-2005, and climate projections for the 2071-2100 period were obtained by applying the resulting SD model. To this end, the Principal Components (PCs) of the SLP reanalysis data from NCEP were used as predictor variables, while the observed gridded summer precipitation was the predictand variable. The period 1950-1993 was used for calibration and 1994-2010 for validation. Bootstrap resampling with replacement was applied to provide estimates of the statistical errors. All models perform reasonably well at regional scales, and the spatial distribution of the correlation coefficients between predicted and observed gridded precipitation values shows high values (between 0.5 and 0.93) along the Andes range and in the north and north Pacific of Colombia. Additionally, the ability of the MIROC5 GCM to simulate summer precipitation in Colombia for the present climate (1971-2005) has been analyzed by calculating the differences between the simulated and observed precipitation values. The simulation obtained by this GCM strongly overestimates the precipitation along a horizontal sector through the center of Colombia, especially at the east and west of the country. However, the SD model applied to the SLP of the GCM shows its ability to faithfully reproduce the rainfall field. Finally, in order to obtain summer precipitation projections in Colombia for the period 1971-2100, the downscaled model, recalibrated for the total period 1950-2010, was applied to the SLP output from the MIROC5 model under the RCP2.6, RCP4.5 and RCP8.5 scenarios.
The changes estimated by the SD models are not significant under the RCP2.6 scenario, while for the RCP4.5 and RCP8.5 scenarios a significant increase of precipitation appears with respect to present values in all the regions, reaching around 27% in the northern Colombia region under the RCP8.5 scenario. Keywords: statistical downscaling, precipitation, Principal Component Regression, climate change, Colombia. ACKNOWLEDGEMENTS This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).
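The Principal Component Regression at the core of the SD model above can be sketched in a few lines. This is a generic illustration, not the authors' implementation; the predictor matrix, predictand, and number of retained PCs below are placeholders:

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_new, n_pc=2):
    """Principal Component Regression sketch: standardize predictors, project
    onto the leading principal components, fit OLS on the scores, predict."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Z = (X_train - mu) / sd
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_pc].T                                  # PC loadings
    S = Z @ V                                        # training scores
    A = np.column_stack([np.ones(len(S)), S])        # add intercept
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    S_new = ((X_new - mu) / sd) @ V
    return np.column_stack([np.ones(len(S_new)), S_new]) @ coef

# Toy predictor field (e.g., gridded SLP reduced to 2 columns) and predictand.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = X @ np.array([1.0, 2.0])          # toy "precipitation", linear in predictors
pred = pcr_fit_predict(X, y, X, n_pc=2)
```

With all PCs retained, the regression reduces to ordinary least squares on the standardized predictors; retaining fewer PCs is what gives the downscaling its regularization.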
[Pruritus in Germany-a Google search engine analysis].
Zink, A; Rüth, M; Schuster, B; Darsow, U; Biedermann, T; Ständer, S
2018-06-06
Because affected persons often do not visit a doctor, the prevalence of chronic and acute pruritus in the general population is difficult to determine. The aim of this study is to estimate the frequency and the most common locations of pruritus in German internet users, who, at 62.4 million persons, represent a large majority of the German population, by analysing the Google search volume. Relevant keywords for the subject "pruritus" were identified and analysed using the Google AdWords Keyword Planner. The assessment period was January 2015 to December 2016. In total, the Google AdWords Keyword Planner identified 701 keywords for the topic "Juckreiz" (German lay word for pruritus), resulting in 7,531,890 pruritus-related Google searches during the assessment period. The most common search terms were the German lay term for atopic eczema ("Neurodermitis", 23.7%), the German lay term for psoriasis ("Schuppenflechte", 17.8%) and "psoriasis" (13%). The German lay term for pruritus ("Juckreiz") was only the sixth most searched term (3%). Most searches (72%) focused on influencing factors for pruritus, especially skin diseases and skin conditions. The most commonly searched location was pruritus on the whole body, followed by anal pruritus. Analysis of the temporal course showed a higher monthly search volume during winter. With its unconventional methodology, a Google search engine analysis, this study allows a rough estimation of the medical need related to pruritus in the German general population, which seems to be higher than expected. Pruritus in the anal area in particular was identified as an unmet medical need.
Mild cognitive impairment and fMRI studies of brain functional connectivity: the state of the art
Farràs-Permanyer, Laia; Guàrdia-Olmos, Joan; Peró-Cebollero, Maribel
2015-01-01
In the last 15 years, many articles have studied brain connectivity in Mild Cognitive Impairment (MCI) patients with fMRI techniques, seemingly using a different connectivity statistical model in each investigation to identify complex connectivity structures and to recognize typical behavior in this type of patient. This diversity in statistical approaches may cause problems when comparing results. This paper seeks to describe how researchers approached the study of brain connectivity in MCI patients using fMRI techniques from 2002 to 2014. The focus is on the statistical analysis proposed by each research group, with reference to the limitations and possibilities of those techniques, in order to identify some recommendations to improve the study of functional connectivity. The included articles came from a search of Web of Science and PsycINFO using the following keywords: fMRI, MCI, and functional connectivity. Eighty-one papers were found, but two of them were discarded because of the lack of statistical analysis. Accordingly, 79 articles were included in this review. We summarized parts of the articles, including the goal of every investigation, the cognitive paradigm and methods used, the brain regions involved, the use of ROI analysis, and the statistical analysis, with emphasis on the connectivity estimation model used in each investigation. The present analysis allowed us to confirm the remarkable variability of the statistical analysis methods found. Additionally, the study of brain connectivity in this type of population is not providing, at the moment, any significant information or results related to clinical aspects relevant for prediction and treatment. We propose that following guidelines for publishing fMRI data would be a good solution to the problem of study replication. The latter aspect could be important for future publications because a higher homogeneity would benefit the comparison between publications and the generalization of results. PMID:26300802
Use of environmental isotope tracer and GIS techniques to estimate basin recharge
NASA Astrophysics Data System (ADS)
Odunmbaku, Abdulganiu A. A.
The extensive use of groundwater only began with the advances in pumping technology in the early portion of the 20th century. Groundwater provides the majority of the fresh water supply for municipal, agricultural and industrial uses, primarily because it requires little to no treatment. Estimating the volume of groundwater available in a basin is a daunting task, and no accurate measurements can be made. Usually water budgets and simulation models are used to estimate the volume of water in a basin. Precipitation, land surface cover and subsurface geology are factors that affect recharge; these factors affect percolation, which invariably affects groundwater recharge. Depending on precipitation, soil chemistry, groundwater chemical composition, gradient and depth, the age and rate of recharge can be estimated. The present research proposes to estimate the recharge in the Mimbres, Tularosa and Diablo Basins using the chloride environmental isotope, the chloride mass-balance approach, and GIS. It also proposes to determine the effect of elevation on recharge rate. The Mimbres and Tularosa Basins are located in southern New Mexico and extend southward into Mexico. The Diablo Basin is located in Texas and extends southward. This research utilizes the chloride mass-balance approach to estimate the recharge rate through collection of groundwater data from wells and precipitation data. The data were analysed statistically to eliminate duplication, outliers, and incomplete records. Cluster analysis, Piper diagrams and statistical significance tests were performed on the groundwater parameters; the infiltration rate was determined using the chloride mass-balance technique. The data were then analysed spatially using ArcGIS 10. Regions of active recharge were identified in the Mimbres and Diablo Basins, but could not be clearly identified in the Tularosa Basin.
The CMB recharge estimate for the Tularosa Basin is 0.04037 mm/yr (0.0016 in/yr), for the Diablo Basin 0.047 mm/yr (0.0016 in/yr), and for the Mimbres Basin 0.2153 mm/yr (0.00848 in/yr). The elevation where active recharge occurs was determined to be 1,500 m for the Mimbres and Tularosa Basins and 1,200 m for the Diablo Basin. The results obtained in this study were consistent with results obtained by other researchers working in basins with similar semiarid mountainous conditions, thereby validating the applicability of CMB in the three basins. Keywords: recharge, chloride mass balance, elevation, Mimbres, Tularosa, Diablo Basin, GIS.
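The chloride mass-balance (CMB) approach used above estimates recharge as R = P × Cl_p / Cl_gw, where P is mean annual precipitation and Cl_p and Cl_gw are chloride concentrations in precipitation and groundwater. A minimal sketch with illustrative semiarid-basin values, not the study's measurements:

```python
def cmb_recharge(precip_mm_yr, cl_precip_mg_l, cl_groundwater_mg_l):
    """Chloride mass balance: R = P * Cl_p / Cl_gw (chloride flux in = flux out)."""
    return precip_mm_yr * cl_precip_mg_l / cl_groundwater_mg_l

# Illustrative values: 250 mm/yr precipitation, 0.3 mg/L chloride in rainfall,
# 350 mg/L chloride in groundwater.
print(round(cmb_recharge(250.0, 0.3, 350.0), 4))  # → 0.2143 (mm/yr)
```

The higher the groundwater chloride relative to rainfall chloride, the smaller the inferred recharge, which is why concentrated (old, slowly recharged) groundwater yields the low rates reported above.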
Generalizing Word Lattice Translation
2008-02-01
Our experiments evaluating the approach demonstrate substantial gains for Chinese-English and Arabic-English translation. Keywords: word lattice translation, phrase-based and hierarchical...
Toxoplasmosis and epilepsy--systematic review and meta analysis.
Ngoungou, Edgard B; Bhalla, Devender; Nzoghe, Amandine; Dardé, Marie-Laure; Preux, Pierre-Marie
2015-02-01
Toxoplasmosis is an important, widespread, parasitic infection caused by Toxoplasma gondii. Chronic infection in immunocompetent patients, usually considered asymptomatic, is now suspected to be a risk factor for various neurological disorders, including epilepsy. We aimed to conduct a systematic review and meta-analysis of the available literature to estimate the risk of epilepsy due to toxoplasmosis. A systematic literature search was conducted of several databases and journals to identify studies published in English or French, without date restriction, which looked at toxoplasmosis (as exposure) and epilepsy (as disease) and met certain other inclusion criteria. The search was based on keywords and suitable combinations in English and French. Fixed and random effects models were used to determine odds ratios, and statistical significance was set at 5.0%. Six studies were identified, with an estimated total of 2888 subjects, of whom 1280 had epilepsy (477 positive for toxoplasmosis) and 1608 did not (503 positive for toxoplasmosis). The common odds ratio, calculated by a random effects model, was 2.25 (95% CI 1.27-3.9), p = 0.005. Despite the limited number of studies and a lack of high-quality data, toxoplasmosis should continue to be regarded as an epilepsy risk factor. More and better studies are needed to determine the real impact of this parasite on the occurrence of epilepsy.
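Random-effects pooling of odds ratios of the kind reported above is commonly done with the DerSimonian-Laird estimator. A minimal sketch; the 2x2 tables below are hypothetical, not the six reviewed studies:

```python
import math

def log_or(a, b, c, d):
    """Log odds ratio and its variance for a 2x2 table: a = exposed cases,
    b = unexposed cases, c = exposed controls, d = unexposed controls."""
    return math.log(a * d / (b * c)), 1/a + 1/b + 1/c + 1/d

def pooled_or_random(tables):
    """DerSimonian-Laird random-effects pooled odds ratio with 95% CI."""
    ys, vs = zip(*(log_or(*t) for t in tables))
    w = [1 / v for v in vs]                          # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, ys))   # heterogeneity Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)         # between-study variance
    wr = [1 / (v + tau2) for v in vs]                # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(wr, ys)) / sum(wr)
    se = math.sqrt(1 / sum(wr))
    return math.exp(mu), math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se)

# Two hypothetical studies (exposed cases, unexposed cases, exposed controls,
# unexposed controls).
or_, lo, hi = pooled_or_random([(10, 5, 20, 40), (8, 12, 18, 30)])
```

When between-study heterogeneity (tau²) is zero, the estimate collapses to the inverse-variance fixed-effect pooled odds ratio.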
Molecular Dynamics Simulation of the Antiamoebin Ion Channel: Linking Structure and Conductance
NASA Technical Reports Server (NTRS)
Wilson, Michael A.; Wei, Chenyu; Bjelkmar, Paer; Wallace, B. A.; Pohorille, Andrew
2011-01-01
Molecular dynamics simulations were carried out in order to ascertain which of the potential multimeric forms of the transmembrane peptaibol channel, antiamoebin, is consistent with its measured conductance. Estimates of the conductance obtained through counting ions that cross the channel and by solving the Nernst-Planck equation yield consistent results, indicating that the motion of ions inside the channel can be satisfactorily described as diffusive. The calculated conductance of octameric channels is markedly higher than the conductance measured in single-channel recordings, whereas the tetramer appears to be non-conducting. The conductance of the hexamer was estimated to be 115+/-34 pS and 74+/-20 pS at 150 mV and 75 mV, respectively, in satisfactory agreement with the value of 90 pS measured at 75 mV. On this basis we propose that the antiamoebin channel consists of six monomers. Its pore is large enough to accommodate K(+) and Cl(-) with their first solvation shells intact. The free energy barrier encountered by K(+) is only 2.2 kcal/mol, whereas Cl(-) encounters a substantially higher barrier of nearly 5 kcal/mol. This difference makes the channel selective for cations. Ion crossing events are shown to be uncorrelated and follow Poisson statistics. Keywords: ion channels, peptaibols, channel conductance, molecular dynamics
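Estimating conductance by counting crossings, as described above, amounts to g = I/V with I = Ne/t. A minimal sketch; the crossing count and simulation time are illustrative, not the paper's trajectory data:

```python
E_CHARGE = 1.602176634e-19  # elementary charge, coulombs

def conductance_pS(n_crossings, sim_time_ns, voltage_mV):
    """Channel conductance from counted permeation events: g = I/V with
    I = N*e/t; returned in picosiemens."""
    current = n_crossings * E_CHARGE / (sim_time_ns * 1e-9)   # amperes
    return current / (voltage_mV * 1e-3) * 1e12

# Illustrative: 10 crossings observed in 100 ns at an applied 150 mV.
print(round(conductance_pS(10, 100.0, 150.0)))  # → 107 (pS)
```

Because crossings follow Poisson statistics, the relative statistical error of such an estimate scales as 1/sqrt(N), which is why long trajectories or elevated voltages are needed for tight conductance estimates.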
Automated Semantic Indexing of Figure Captions to Improve Radiology Image Retrieval
Kahn, Charles E.; Rubin, Daniel L.
2009-01-01
Objective We explored automated concept-based indexing of unstructured figure captions to improve retrieval of images from radiology journals. Design The MetaMap Transfer program (MMTx) was used to map the text of 84,846 figure captions from 9,004 peer-reviewed, English-language articles to concepts in three controlled vocabularies from the UMLS Metathesaurus, version 2006AA. Sampling procedures were used to estimate the standard information-retrieval metrics of precision and recall, and to evaluate the degree to which concept-based retrieval improved image retrieval. Measurements Precision was estimated based on a sample of 250 concepts. Recall was estimated based on a sample of 40 concepts. The authors measured the impact of concept-based retrieval to improve upon keyword-based retrieval in a random sample of 10,000 search queries issued by users of a radiology image search engine. Results Estimated precision was 0.897 (95% confidence interval, 0.857–0.937). Estimated recall was 0.930 (95% confidence interval, 0.838–1.000). In 5,535 of 10,000 search queries (55%), concept-based retrieval found results not identified by simple keyword matching; in 2,086 searches (21%), more than 75% of the results were found by concept-based search alone. Conclusion Concept-based indexing of radiology journal figure captions achieved very high precision and recall, and significantly improved image retrieval. PMID:19261938
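The precision and recall estimated above are simple ratios over sampled relevance judgments. A minimal sketch with hypothetical counts, not the study's samples:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN), from sampled judgments."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical judgment counts for one sampled concept:
p, r = precision_recall(tp=45, fp=5, fn=5)
print(p, r)  # → 0.9 0.9
```

In the study's design, precision is judged over retrieved captions for sampled concepts and recall over a manually constructed relevant set, then confidence intervals are attached to the sampled estimates.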
2011-01-01
present performance statistics to explain the scalability behavior. Keywords: atmospheric models, time integrators, MPI, scalability, performance. ... across inter-element boundaries. Basis functions are constructed as tensor products of Lagrange polynomials, ψ_i(x) = h_α(ξ) ⊗ h_β(η) ⊗ h_γ(ζ), where h_α
Saez, M; Figueiras, A; Ballester, F; Perez-Hoyos, S; Ocana, R; Tobias, A
2001-01-01
STUDY OBJECTIVE—The objective of this paper is to introduce a different approach, called ecological-longitudinal, to carrying out pooled analysis in time-series ecological studies. Because it gives a larger number of data points and, hence, increases the statistical power of the analysis, this approach, unlike conventional ones, allows the accommodation of aspects such as random-effects models, lags, and interactions between pollutants and between pollutants and meteorological variables, which are hardly implemented in conventional approaches. DESIGN—The approach is illustrated by providing quantitative estimates of the short-term effects of air pollution on mortality in three Spanish cities, Barcelona, Valencia and Vigo, for the period 1992-1994. Because the dependent variable was a count, a Poisson generalised linear model was first specified. Several modelling issues are worth mentioning. Firstly, because the relations between mortality and the explanatory variables were non-linear, cubic splines were used for covariate control, leading to a generalised additive model (GAM). Secondly, the effects of the predictors on the response were allowed to occur with some lag. Thirdly, the residual autocorrelation, due to imperfect control, was controlled for by means of an autoregressive Poisson GAM. Finally, the longitudinal design demanded consideration of individual heterogeneity, requiring the use of mixed models. MAIN RESULTS—The estimates of the relative risks obtained from the individual analyses varied across cities, particularly those associated with sulphur dioxide. The highest relative risks corresponded to black smoke in Valencia. These estimates were higher than those obtained from the ecological-longitudinal analysis.
Relative risks estimated from this latter analysis were practically identical across cities, 1.00638 (95% confidence intervals 1.0002, 1.0011) for a black smoke increase of 10 µg/m3 and 1.00415 (95% CI 1.0001, 1.0007) for an increase of 10 µg/m3 of sulphur dioxide. Because the statistical power is higher than in the individual analysis, more interactions were statistically significant, especially those among air pollutants and meteorological variables. CONCLUSIONS—Air pollutant levels were related to mortality in the three cities of the study, Barcelona, Valencia and Vigo. These results were consistent with similar studies in other cities and with other multicentric studies, and coherent with both previous individual studies for each city and multicentric studies for all three cities. Keywords: air pollution; mortality; longitudinal studies PMID:11351001
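Relative risks per 10 µg/m3 of the kind quoted above follow from a Poisson log-linear coefficient as RR = exp(10·β), with the CI from β ± 1.96·SE. A minimal sketch; the coefficient and standard error below are hypothetical:

```python
import math

def rr_per_increment(beta, se, delta=10.0):
    """Relative risk (with 95% CI) for a `delta`-unit pollutant increase,
    given a Poisson log-linear coefficient `beta` (per unit) and its SE."""
    rr = math.exp(delta * beta)
    lo = math.exp(delta * (beta - 1.96 * se))
    hi = math.exp(delta * (beta + 1.96 * se))
    return rr, lo, hi

# Hypothetical fitted coefficient: beta = 0.0005 per ug/m3, SE = 0.0002.
rr, lo, hi = rr_per_increment(0.0005, 0.0002)
print(round(rr, 4))  # → 1.005
```

The exponential transform is why tiny coefficients on the log scale still translate into measurable excess mortality at the population level.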
Martin, Alexandra; Stewart, J Ryan; Gaskins, Jeremy; Medlin, Erin
2018-01-20
The Internet is a major source of health information for gynecologic cancer patients. In this study, we systematically explore common Google search terms related to gynecologic cancer and calculate the readability of the top resulting websites. We used Google AdWords Keyword Planner to generate a list of commonly searched keywords related to gynecologic oncology, which were sorted into five groups (cervical cancer, ovarian cancer, uterine cancer, vulvar cancer, vaginal cancer) using five patient education websites from sgo.org. Each keyword was searched on Google to create a list of top websites. The Python programming language (version 3.5.1) was used to describe frequencies of keywords, top-level domains (TLDs), and domains, and the readability of top websites using four validated formulae. Of the estimated 1,846,950 monthly searches resulting in 62,227 websites, the most common was cancer.org. The most common TLD was *.com. Most websites were above the eighth-grade reading level recommended by the American Medical Association (AMA) and the National Institutes of Health (NIH). The SMOG Index was the most reliable formula. The mean grade-level readability for all sites using SMOG was 9.4 ± 2.3, with 23.9% of sites falling at or below the eighth-grade reading level. The first ten results for each Google keyword were the easiest to read, with results beyond the first page of Google being consistently more difficult. Keywords related to gynecologic malignancies are Google-searched frequently. Most websites are difficult to read without a high school education. This knowledge may help gynecologic oncology providers adequately meet the needs of their patients.
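The SMOG Index mentioned above is computed from the count of polysyllabic (3+ syllable) words in a sample of sentences: grade = 1.0430 · sqrt(polysyllables × 30/sentences) + 3.1291. A minimal sketch with illustrative counts:

```python
import math

def smog_grade(polysyllables, sentences):
    """SMOG Index: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    return 1.0430 * math.sqrt(polysyllables * 30.0 / sentences) + 3.1291

# Illustrative: 30 polysyllabic words counted across a 30-sentence sample.
print(round(smog_grade(30, 30), 1))  # → 8.8
```

The 30/sentences factor normalizes any sample to the canonical 30-sentence SMOG sample, so longer pages can be scored without truncation.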
ERIC Educational Resources Information Center
Calzada Pérez, María
2013-01-01
The present paper revolves around MaxiECPC, one of the various sub-corpora that make up ECPC (the European Comparable and Parallel Corpora), an electronic archive of speeches delivered at different parliaments (i.e. the European Parliament-EP; the Spanish Congreso de los Diputados-CD; and the British House of Commons-HC) from 1996 to 2009. In…
Monitoring Influenza Epidemics in China with Search Query from Baidu
Lv, Benfu; Peng, Geng; Chunara, Rumi; Brownstein, John S.
2013-01-01
Several approaches have been proposed for near real-time detection and prediction of the spread of influenza. These include search query data for influenza-related terms, which has been explored as a tool for augmenting traditional surveillance methods. In this paper, we present a method that uses Internet search query data from Baidu to model and monitor influenza activity in China. The objectives of the study are to present a comprehensive technique for: (i) keyword selection, (ii) keyword filtering, (iii) index composition and (iv) modeling and detection of influenza activity in China. Sequential time-series for the selected composite keyword index is significantly correlated with Chinese influenza case data. In addition, one-month ahead prediction of influenza cases for the first eight months of 2012 has a mean absolute percent error less than 11%. To our knowledge, this is the first study on the use of search query data from Baidu in conjunction with this approach for estimation of influenza activity in China. PMID:23750192
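The mean absolute percent error used above to evaluate one-month-ahead predictions can be sketched as follows; the monthly counts below are hypothetical, not the Chinese surveillance data:

```python
def mape(actual, predicted):
    """Mean absolute percent error of predictions against observed counts."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly influenza case counts and model predictions:
obs = [120, 150, 90, 60]
pred = [110, 160, 95, 55]
print(round(mape(obs, pred), 1))  # → 7.2
```

A MAPE below 11%, as reported in the study, means the predicted monthly counts deviate from the observed counts by about one-tenth on average.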
NASA Astrophysics Data System (ADS)
Myo Lin, Nay; Rutten, Martine
2017-04-01
The Sittaung River is one of four major rivers in Myanmar. The river basin is developing fast and faces problems with flooding, sedimentation, river bank erosion and salt intrusion. At present, more than 20 reservoirs have already been constructed for multiple purposes such as irrigation, domestic water supply, hydro-power generation, and flood control. Rainfall-runoff models are required for the operational management of this reservoir system. In this study, the river basin is divided into 64 sub-catchments, and Sacramento Soil Moisture Accounting (SAC-SMA) models are developed using satellite rainfall and Geographic Information System (GIS) data. The SAC-SMA model has sixteen calibration parameters and also uses a unit hydrograph for surface flow routing. The Sobek software package is used for SAC-SMA modelling and simulation of the river system. The models are calibrated and tested using observed discharge and water level data. The statistical results show that the model is applicable for data-scarce regions. Keywords: Sacramento, Sobek, rainfall runoff, reservoir
Widowhood and Mortality: A Meta-Analysis and Meta-Regression
Roelfs, David J.; Shor, Eran; Curreli, Misty; Clemow, Lynn; Burg, Matthew M.; Schwartz, Joseph E.
2013-01-01
The study of spousal bereavement and mortality has long been a major topic of interest for social scientists, but much remains unknown with respect to important moderating factors such as age, follow-up duration, and geographic region. The present study examines these factors using meta-analysis. Keyword searches were conducted in multiple electronic databases, supplemented by extensive iterative hand searches. We extracted 1381 mortality risk estimates from 124 publications, providing data on more than 500 million persons. Compared to married people, widowers had a mean hazard ratio (HR) of 1.23 (95% confidence interval [CI], 1.19–1.28) among HRs adjusted for age and additional covariates and a high subjective quality score. The mean HR was higher for men (HR, 1.27; 95% CI, 1.19–1.35) than for women (HR, 1.15; 95% CI: 1.08–1.22). A significant interaction effect was found between gender and mean age, with HRs decreasing more rapidly for men than for women as age increased. Other significant predictors of HR magnitude included sample size, geographic region, level of statistical adjustment, and study quality. PMID:22427278
NASA Astrophysics Data System (ADS)
Arunachalam, M. S.; Obili, Manjula; Srimurali, M.
2016-07-01
Long-term variation of surface ozone, NO2, temperature, relative humidity and crop yield datasets over thirteen districts of Andhra Pradesh (AP) has been studied with the help of OMI, MODIS, AIRS, ERA-Interim re-analysis and the Directorate of Economics and Statistics (DES) of AP. An inter-comparison of crop yield loss estimates has been made according to exposure metrics such as AOT40 (accumulated ozone exposure over a threshold of 40 ppb) and the non-linear variation of surface temperature for twenty and eighteen varieties of the two major crop growing seasons, namely kharif (April-September) and rabi (October-March), respectively. The study is carried out to establish a new crop-yield-exposure relationship for different crop cultivars of AP. Both ozone and temperature show correlation coefficients of 0.66 and 0.87 with relative humidity, and 0.72 and 0.80 with NO2. Alleviation of high surface ozone results in greater food security, improves the economy, and thereby reduces the induced warming of the troposphere caused by ozone. Keywords: surface ozone, NO2, temperature, relative humidity, crop yield, AOT40.
An efficient scheme for automatic web pages categorization using the support vector machine
NASA Astrophysics Data System (ADS)
Bhalla, Vinod Kumar; Kumar, Neeraj
2016-07-01
In the past few years, with the evolution of the Internet and related technologies, the number of Internet users has grown exponentially. These users demand access to relevant web pages from the Internet within a fraction of a second. To achieve this goal, an efficient categorization of web page contents is required. Manual categorization of these billions of web pages with high accuracy is a challenging task, and most of the existing techniques reported in the literature are semi-automatic, so a high level of accuracy cannot be achieved using them. To achieve these goals, this paper proposes an automatic categorization of web pages into domain categories. The proposed scheme is based on the identification of specific and relevant features of the web pages. In the proposed scheme, extraction and evaluation of features are done first, followed by filtering of the feature set for categorization of domain web pages. A feature extraction tool based on the HTML document object model of the web page is developed in the proposed scheme. Feature extraction and weight assignment are based on a collection of domain-specific keyword lists developed by considering various domain pages. Moreover, the keyword list is reduced on the basis of the ids of keywords in the keyword list. Also, stemming of keywords and tag text is done to achieve higher accuracy. An extensive feature set is generated to develop a robust classification technique. The proposed scheme was evaluated using a machine learning method in combination with feature extraction and statistical analysis, using a support vector machine kernel as the classification tool. The results obtained confirm the effectiveness of the proposed scheme in terms of its accuracy on different categories of web pages.
Statistical Cost Estimation in Higher Education: Some Alternatives.
ERIC Educational Resources Information Center
Brinkman, Paul T.; Niwa, Shelley
Recent developments in econometrics that are relevant to the task of estimating costs in higher education are reviewed. The relative effectiveness of alternative statistical procedures for estimating costs is also tested. Statistical cost estimation involves three basic parts: a model, a data set, and an estimation procedure. Actual data are used…
NASA Astrophysics Data System (ADS)
Araya, F. Z.; Abdul-Aziz, O. I.
2017-12-01
This study utilized a systematic data analytics approach to determine the relative linkages of stream dissolved oxygen (DO) with hydro-climatic and biogeochemical drivers across the U.S. Pacific Coast. Multivariate statistical techniques, namely the Pearson correlation matrix, principal component analysis, and factor analysis, were applied to a complex water quality dataset (1998-2015) from 35 water quality monitoring stations of USGS NWIS and EPA STORET. Power-law-based partial least squares regression (PLSR) models with a bootstrap Monte Carlo procedure (1000 iterations) were developed to reliably estimate the relative linkages by resolving multicollinearity (Nash-Sutcliffe Efficiency, NSE = 0.50-0.94). Based on the dominant drivers, four environmental regimes were identified that adequately described the variance in the system data. In the Pacific Northwest and Southern California, water temperature was the most dominant driver of DO in the majority of streams. However, in Central and Northern California, stream DO was controlled by multiple drivers (i.e., water temperature, pH, stream flow, and total phosphorus), exhibiting a transitional environmental regime. Further, total phosphorus (TP) appeared to be the limiting nutrient for most streams. The estimated linkages and insights would be useful for identifying management priorities to achieve healthy coastal stream ecosystems across the Pacific Coast of the U.S.A. and similar regions around the world. Keywords: data analytics, water quality, coastal streams, dissolved oxygen, environmental regimes, Pacific Coast, United States.
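The Nash-Sutcliffe Efficiency used above to assess the PLSR models compares model error to the variance of observations about their mean. A minimal sketch with toy data:

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about the
    mean; 1.0 is a perfect fit, 0.0 is no better than predicting the mean."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst

# Toy observed vs simulated DO values (illustrative only):
print(round(nse([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]), 2))  # → 0.99
```

The reported NSE range of 0.50-0.94 thus means the PLSR models explained between half and nearly all of the observed DO variance, depending on the station.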
A Step Beyond Simple Keyword Searches: Services Enabled by a Full Content Digital Journal Archive
NASA Technical Reports Server (NTRS)
Boccippio, Dennis J.
2003-01-01
The problems of managing and searching large archives of scientific journal articles can potentially be addressed through data mining and statistical techniques matured primarily for quantitative scientific data analysis. A journal paper could be represented by a multivariate descriptor, e.g., the occurrence counts of a number of key technical terms or phrases (keywords), perhaps derived from a controlled vocabulary (e.g., the American Meteorological Society's Glossary of Meteorology) or bootstrapped from the journal archive itself. With this representation, conventional statistical classification tools can be leveraged to address challenges faced by both scientists and professional societies in knowledge management. For example, cluster analyses can be used to find bundles of "most-related" papers and to address the issue of journal bifurcation (when is a new journal necessary, and what topics should it encompass?). Similarly, neural networks can be trained to predict the optimal journal (within a society's collection) in which a newly submitted paper should be published. Comparable techniques could enable very powerful end-user tools for journal searches, all premised on the view of a paper as a data point in a multidimensional descriptor space, e.g.: "find papers most similar to the one I am reading", "build a personalized subscription service based on the content of the papers I am interested in, rather than preselected keywords", "find suitable reviewers based on the content of their own published works", etc. Such services may represent the next "quantum leap" beyond the rudimentary search interfaces currently provided to end-users, as well as a compelling value-added component needed to bridge the print-to-digital-medium gap and help stabilize professional societies' revenue streams during the print-to-digital transition.
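The descriptor-space view can be sketched concretely. This toy example assumes a four-term vocabulary and cosine similarity as the ranking metric; neither is specified by the abstract, and real systems would use far larger controlled vocabularies.

```python
from collections import Counter

def keyword_vector(text, vocabulary):
    """Occurrence counts of controlled-vocabulary terms in a document."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

vocab = ["lightning", "convection", "radar", "aerosol"]
papers = {
    "A": "lightning convection lightning radar",
    "B": "aerosol aerosol radar",
    "C": "convection lightning",
}
vecs = {k: keyword_vector(t, vocab) for k, t in papers.items()}
# "find papers most similar to the one I am reading" (paper A)
ranked = sorted((cosine(vecs["A"], vecs[k]), k) for k in papers if k != "A")
most_similar = ranked[-1][1]
```

The same vectors feed directly into clustering or classification, which is the premise behind the journal-bifurcation and reviewer-matching services described above.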
2016-07-27
make risk-informed decisions during serious games. Statistical models of intra-game performance were developed to determine whether behaviors in...specific facets of the gameplay workflow were predictive of analytical performance and game outcomes. A study of over seventy instrumented teams revealed...more accurate game decisions. Keywords: Humatics · Serious Games · Human-System Interaction · Instrumentation · Teamwork · Communication Analysis
TQM (Total Quality Management) SPARC (Special Process Action Review Committees) Handbook
1989-08-01
This document describes the techniques used to support and guide the Special Process Action Review Committees in accomplishing their goals for Total Quality Management (TQM). It includes concepts and definitions, checklists, sample formats, and assessment criteria. Keywords: Continuous process improvement; Logistics information; Process analysis; Quality control; Quality assurance; Total Quality Management; Statistical processes; Management planning and control; Management training; Management information systems.
Territorial Developments Based on Graffiti: a Statistical Mechanics Approach
2011-10-28
defined on a lattice. We introduce a two-gang Hamiltonian model where agents have red or blue affiliation but are otherwise indistinguishable. In this...ramifications of our results. Keywords: Territorial Formation, Spin Systems, Phase Transitions 1. Introduction Lattice models have been extensively used in...inconsequential. In short, lattice models have proved extremely useful in the context of the physical, biological and even chemical sciences. In more
Use of keyword hierarchies to interpret gene expression patterns.
Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J
2001-04-01
High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
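The hierarchy idea can be illustrated with a toy two-level vocabulary standing in for the Medical Subject Headings tree (the terms below are hypothetical examples, not MeSH content): two genes whose keywords differ at the leaf level still share ancestor concepts once each term is expanded upward.

```python
# Toy hierarchy: each term maps to its parent (root terms map to None)
PARENT = {
    "Glycolysis": "Carbohydrate Metabolism",
    "Gluconeogenesis": "Carbohydrate Metabolism",
    "Carbohydrate Metabolism": "Metabolism",
    "Apoptosis": "Cell Death",
    "Cell Death": "Cell Physiological Phenomena",
    "Metabolism": None,
    "Cell Physiological Phenomena": None,
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    out = []
    while term is not None:
        out.append(term)
        term = PARENT[term]
    return out

def shared_concepts(keywords_a, keywords_b):
    """Concepts shared once each gene's keywords are expanded upward."""
    ex_a = {t for k in keywords_a for t in ancestors(k)}
    ex_b = {t for k in keywords_b for t in ancestors(k)}
    return ex_a & ex_b

gene1 = ["Glycolysis"]
gene2 = ["Gluconeogenesis"]
shared = shared_concepts(gene1, gene2)
```

Aggregating such shared ancestors over all gene pairs in a cluster gives a view of the cluster's conceptual coherence, which is the role the MeSH hierarchy plays in the method described above.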
A statistical view of FMRFamide neuropeptide diversity.
Espinoza, E; Carrigan, M; Thomas, S G; Shaw, G; Edison, A S
2000-01-01
FMRFamide-like peptide (FLP) amino acid sequences have been collected and statistically analyzed. FLP amino acid composition as a function of position in the peptide is graphically presented for several major phyla. Results of total amino acid composition and frequencies of pairs of FLP amino acids have been computed and compared with corresponding values from the entire GenBank protein sequence database. The data for pairwise distributions of amino acids should help in future structure-function studies of FLPs. To aid in future peptide discovery, a computer program and search protocol was developed to identify FLPs from the GenBank protein database without the use of keywords.
J-Adaptive estimation with estimated noise statistics. [for orbit determination
NASA Technical Reports Server (NTRS)
Jazwinski, A. H.; Hipkins, C.
1975-01-01
The J-Adaptive estimator described by Jazwinski and Hipkins (1972) is extended to include the simultaneous estimation of the statistics of the unmodeled system accelerations. With the aid of simulations it is demonstrated that the J-Adaptive estimator with estimated noise statistics can automatically estimate satellite orbits to an accuracy comparable with the data noise levels, when excellent, continuous tracking coverage is available. Such tracking coverage will be available from satellite-to-satellite tracking.
An adaptive state of charge estimation approach for lithium-ion series-connected battery system
NASA Astrophysics Data System (ADS)
Peng, Simin; Zhu, Xuelai; Xing, Yinjiao; Shi, Hongbing; Cai, Xu; Pecht, Michael
2018-07-01
Due to the incorrect or unknown noise statistics of a battery system and its cell-to-cell variations, state of charge (SOC) estimation of a lithium-ion series-connected battery system is usually inaccurate or even divergent using model-based methods, such as the extended Kalman filter (EKF) and unscented Kalman filter (UKF). To resolve this problem, an adaptive unscented Kalman filter (AUKF) based on a noise statistics estimator and a model parameter regulator is developed to accurately estimate the SOC of a series-connected battery system. An equivalent circuit model is first built based on the model parameter regulator that captures the influence of cell-to-cell variation on the battery system. A noise statistics estimator is then used to adaptively obtain the estimated noise statistics for the AUKF when its prior noise statistics are inaccurate or not exactly Gaussian. The accuracy and effectiveness of the SOC estimation method are validated by comparing the developed AUKF with the UKF when the model and measurement noise statistics, respectively, are inaccurate. Compared with the UKF and EKF, the developed method shows the highest SOC estimation accuracy.
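The AUKF above operates on a full equivalent-circuit battery model; as a greatly simplified stand-in, a scalar random-walk Kalman filter can adapt its measurement-noise estimate R from the innovation sequence, using the fact that E[v²] = P_prior + R for innovation v. All numbers below are synthetic and the scheme is only a sketch of the noise-statistics-estimator idea, not the paper's algorithm.

```python
import random

def run_adaptive_kf(measurements, q=0.01, r_init=1.0, alpha=0.01):
    """Scalar random-walk Kalman filter that adapts its measurement-noise
    estimate R from the innovations: R ~ E[v^2] - P_prior."""
    x, p, r = measurements[0], 1.0, r_init
    for z in measurements[1:]:
        p_prior = p + q                  # predict
        v = z - x                        # innovation
        # exponential moving-average noise statistics estimator
        r = max(1e-6, (1 - alpha) * r + alpha * (v * v - p_prior))
        k = p_prior / (p_prior + r)      # gain with adapted R
        x = x + k * v                    # update state estimate
        p = (1 - k) * p_prior            # update error covariance
    return x, r

rng = random.Random(7)
true_soc, true_r = 5.0, 4.0   # invented "SOC" level and measurement variance
zs = [true_soc + rng.gauss(0.0, true_r ** 0.5) for _ in range(2000)]
x_hat, r_hat = run_adaptive_kf(zs)
```

Starting from a measurement-noise guess four times too small, the filter's adapted R drifts toward the true variance, which is the behavior that keeps the AUKF from diverging when prior noise statistics are wrong.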
Calculating weighted estimates of peak streamflow statistics
Cohn, Timothy A.; Berenbrock, Charles; Kiang, Julie E.; Mason, Jr., Robert R.
2012-01-01
According to the Federal guidelines for flood-frequency estimation, the uncertainty of peak streamflow statistics, such as the 1-percent annual exceedance probability (AEP) flow at a streamgage, can be reduced by combining the at-site estimate with the regional regression estimate to obtain a weighted estimate of the flow statistic. The procedure assumes the estimates are independent, which is reasonable in most practical situations. The purpose of this publication is to describe and make available a method for calculating a weighted estimate from the uncertainty or variance of the two independent estimates.
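The weighting procedure follows the standard inverse-variance rule for combining two independent estimates, which can be sketched directly (the flow values below are invented for illustration):

```python
def weighted_estimate(x_site, var_site, x_regional, var_regional):
    """Inverse-variance weighting of two independent flow-statistic
    estimates; the combined variance is always below either input."""
    w1 = 1.0 / var_site
    w2 = 1.0 / var_regional
    x_w = (w1 * x_site + w2 * x_regional) / (w1 + w2)
    var_w = 1.0 / (w1 + w2)   # variance of the weighted estimate
    return x_w, var_w

# Hypothetical 1-percent AEP flow: at-site estimate 1200 (variance 400),
# regional-regression estimate 1000 (variance 100)
xw, vw = weighted_estimate(1200.0, 400.0, 1000.0, 100.0)
```

The combined estimate is pulled toward the lower-variance regional value, and its variance (80 here) is smaller than either input variance, which is precisely the uncertainty reduction the guidelines rely on.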
Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun
2015-01-01
Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. 
The analysis of these interaction types and their associated gene-gene pairs uncovered many scientific insights. INO provides a novel approach for defining hierarchical interaction types and related keywords for literature mining. The ontology-based literature mining, in combination with an INO-based statistical interaction enrichment test, provides a new platform for efficient mining and analysis of topic-specific gene interaction networks.
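The authors' modified test is not specified in detail here; as a baseline, a plain one-sided Fisher's exact test for over-representation on a 2x2 contingency table can be computed from the hypergeometric distribution. The counts below are invented, and this is a stand-in for, not a reproduction of, the paper's modified statistic.

```python
from math import comb

def hypergeom_pmf(k, K, n, N):
    """P(X = k) when drawing n items from N containing K 'successes'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def fisher_over_represented(a, b, c, d):
    """One-sided Fisher's exact p-value for over-representation of the
    top-left cell in the 2x2 table [[a, b], [c, d]]."""
    N = a + b + c + d
    K = a + b        # row 1 total (e.g., vaccine-associated gene-pairs)
    n = a + c        # column 1 total (gene-pairs with this interaction type)
    kmax = min(K, n)
    return sum(hypergeom_pmf(k, K, n, N) for k in range(a, kmax + 1))

# Hypothetical counts: an interaction type seen in 8 of 20 vaccine
# gene-pairs versus 10 of 80 background gene-pairs
p = fisher_over_represented(8, 12, 10, 70)
```

A small p-value flags the interaction type as over-represented in the vaccine sub-network; running the test in the other tail flags under-representation.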
Mann, Michael P.; Rizzardo, Jule; Satkowski, Richard
2004-01-01
Accurate streamflow statistics are essential to water resource agencies involved in both science and decision-making. When long-term streamflow data are lacking at a site, estimation techniques are often employed to generate streamflow statistics. However, procedures for accurately estimating streamflow statistics often are lacking, and when estimation procedures are developed, they often are not evaluated properly before being applied. Use of unevaluated or underevaluated flow-statistic estimation techniques can result in improper water-resources decision-making. The California State Water Resources Control Board (SWRCB) uses two key techniques, a modified rational equation and drainage basin area-ratio transfer, to estimate streamflow statistics at ungaged locations. These techniques have been implemented to varying degrees, but have not been formally evaluated. For estimating peak flows at the 2-, 5-, 10-, 25-, 50-, and 100-year recurrence intervals, the SWRCB uses the U.S. Geological Survey's (USGS) regional peak-flow equations. In this study, done cooperatively by the USGS and SWRCB, the SWRCB estimated several flow statistics at 40 USGS streamflow gaging stations in the north coast region of California. The SWRCB estimates were made without reference to USGS flow data. The USGS used the streamflow data from the 40 stations to generate flow statistics that could be compared with the SWRCB estimates for accuracy. While some SWRCB estimates compared favorably with USGS statistics, results were subject to varying degrees of error over the region. Flow-based estimation techniques generally performed better than rain-based methods, especially for estimation of December 15 to March 31 mean daily flows. The USGS peak-flow equations also performed well, but tended to underestimate peak flows. The USGS equations performed within reported error bounds, but will require updating in the future as peak-flow data sets grow larger.
Little correlation was discovered between estimation errors and geographic locations or various basin characteristics. However, for 25-percentile year mean-daily-flow estimates for December 15 to March 31, the greatest estimation errors were at east San Francisco Bay area stations with mean annual precipitation less than or equal to 30 inches, and estimated 2-year/24-hour rainfall intensity less than 3 inches.
Comparative statistics of Garman-Klass, Parkinson, Roger-Satchell and bridge estimators
NASA Astrophysics Data System (ADS)
Lapinova, S.; Saichev, A.
2017-01-01
Comparative statistical properties of the Parkinson, Garman-Klass, Roger-Satchell and bridge oscillation estimators are discussed. Point and interval estimates associated with these estimators are considered.
STATISTICAL ESTIMATION AND VISUALIZATION OF GROUND-WATER CONTAMINATION DATA
This work presents methods of visualizing and animating statistical estimates of ground water and/or soil contamination over a region from observations of the contaminant for that region. The primary statistical methods used to produce the regional estimates are nonparametric re...
NASA Technical Reports Server (NTRS)
Zimmerman, G. A.; Olsen, E. T.
1992-01-01
Noise power estimation in the High-Resolution Microwave Survey (HRMS) sky survey element is considered as an example of a constant false alarm rate (CFAR) signal detection problem. Order-statistic-based noise power estimators for CFAR detection are considered in terms of required estimator accuracy and estimator dynamic range. By limiting the dynamic range of the value to be estimated, the performance of an order-statistic estimator can be achieved by simpler techniques requiring only a single pass of the data. Simple threshold-and-count techniques are examined, and it is shown how several parallel threshold-and-count estimation devices can be used to expand the dynamic range to meet HRMS system requirements with minimal hardware complexity. An input/output (I/O) efficient limited-precision order-statistic estimator with wide but limited dynamic range is also examined.
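Assuming exponentially distributed power samples (the usual model for noncoherently detected Gaussian noise; this distributional assumption is mine, not stated in the abstract), both estimator families reduce to short closed forms: the median order statistic gives sigma2 = median / ln 2, and a single-pass threshold count gives sigma2 = -t / ln(fraction above t).

```python
import math
import random

def median_noise_power(samples):
    """Order-statistic noise power estimate for exponentially distributed
    power samples: median = sigma2 * ln 2, so sigma2 = median / ln 2."""
    s = sorted(samples)
    n = len(s)
    med = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return med / math.log(2)

def threshold_and_count(samples, threshold):
    """Single-pass alternative: for exponential samples
    P(x > t) = exp(-t / sigma2), so sigma2 = -t / ln(fraction above t)."""
    above = sum(1 for x in samples if x > threshold)
    return -threshold / math.log(above / len(samples))

rng = random.Random(1)
true_power = 2.0
data = [rng.expovariate(1.0 / true_power) for _ in range(50_000)]
est_med = median_noise_power(data)
est_thr = threshold_and_count(data, threshold=2.0)
```

The threshold-and-count form is why the hardware simplification works: counting exceedances needs one comparator and a counter per threshold, and several thresholds in parallel extend the usable dynamic range, as the abstract describes.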
A Novel Method for Estimating Transgender Status Using Electronic Medical Records
Roblin, Douglas; Barzilay, Joshua; Tolsma, Dennis; Robinson, Brandi; Schild, Laura; Cromwell, Lee; Braun, Hayley; Nash, Rebecca; Gerth, Joseph; Hunkeler, Enid; Quinn, Virginia P.; Tangpricha, Vin; Goodman, Michael
2016-01-01
Background We describe a novel algorithm for identifying transgender people and determining their male-to-female (MTF) or female-to-male (FTM) identity in the electronic medical records (EMR) of an integrated health system. Methods A SAS program scanned Kaiser Permanente Georgia EMR from January 2006 through December 2014 for relevant diagnostic codes and the presence of specific keywords (e.g., “transgender” or “transsexual”) in clinical notes. Eligibility was verified by review of de-identified text strings containing targeted keywords and, if needed, by an additional in-depth review of records. Once transgender status was confirmed, FTM or MTF identity was assessed using a second SAS program and another round of text string reviews. Results Of 813,737 members, 271 were identified as possibly transgender: 137 through keywords only, 25 through diagnostic codes only, and 109 through both codes and keywords. Of these individuals, 185 (68%, 95% confidence interval [CI]: 62-74%) were confirmed as definitely transgender. The proportions (95% CIs) of definite transgender status among persons identified via keywords, diagnostic codes, and both were 45% (37-54%), 56% (35-75%), and 100% (96-100%), respectively. Of the 185 definitely transgender people, 99 (54%, 95% CI: 46-61%) were MTF and 84 (45%, 95% CI: 38-53%) were FTM. For two persons, gender identity remained unknown. Prevalence of transgender people (per 100,000 members) was 4.4 (95% CI: 2.6-7.4) in 2006 and 38.7 (95% CI: 32.4-46.2) in 2014. Conclusions The proposed method of identifying candidates for transgender health studies is low cost and relatively efficient, and can be applied in other similar health care systems. PMID:26907539
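A minimal sketch of the keyword-screening pass, with hypothetical record ids and note text; the actual SAS programs, diagnostic-code logic, and multi-round review workflow are not reproduced here.

```python
import re

KEYWORDS = re.compile(r"\b(transgender|transsexual)\b", re.IGNORECASE)

def flag_candidates(records):
    """First-pass screen: return ids of records whose notes mention a
    target keyword, with a short text string around each hit so a
    reviewer can verify eligibility from de-identified context."""
    hits = {}
    for rec_id, note in records.items():
        m = KEYWORDS.search(note)
        if m:
            start = max(0, m.start() - 20)
            hits[rec_id] = note[start:m.end() + 20]
        # records with no keyword hit are simply not flagged
    return hits

# Invented example notes (not real clinical text)
notes = {
    1: "Patient identifies as transgender; referred to endocrinology.",
    2: "Routine visit, no acute complaints.",
    3: "History notes a prior transsexual diagnosis code.",
}
flagged = flag_candidates(notes)
```

In the study's design, records flagged by keywords alone were confirmed in only 45% of cases, which is why the snippet review and the second-pass gender-identity classification matter.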
Does Twitter trigger bursts in signature collections?
Yamaguchi, Rui; Imoto, Seiya; Kami, Masahiro; Watanabe, Kenji; Miyano, Satoru; Yuji, Koichiro
2013-01-01
The quantification of social media impacts on societal and political events is a difficult undertaking. The Japanese Society of Oriental Medicine started a signature-collecting campaign to oppose a medical policy of the Government Revitalization Unit to exclude a traditional Japanese medicine, "Kampo," from the public insurance system. The signature count showed a series of aberrant bursts from November 26 to 29, 2009. In the same interval, the number of messages on Twitter including the keywords "Signature" and "Kampo" increased abruptly. Moreover, the number of messages on an Internet forum that discussed the policy and called for signatures showed a train of spikes. In order to estimate the contributions of social media, we developed a statistical model within a state-space modeling framework that distinguishes the contributions of multiple social media in time series of collected public opinions. We applied the model to the time series of signature counts of the campaign and quantified the contributions of two social media, i.e., Twitter and an Internet forum. We found that a considerable portion (78%) of the signatures was influenced by one of the two social media throughout the campaign, and that the Twitter effect (26%) was smaller than the forum effect (52%) in total, although Twitter probably triggered the initial two bursts of signatures. Comparisons of the estimated profiles of the two effects suggested distinctions between the social media in terms of the sustainable impact of messages or tweets. Twitter shows messages on various topics on a timeline, with newer messages pushing out older ones, which may diminish the impact of messages that are tweeted intermittently. The quantification of social media impacts is beneficial to better understanding people's tendencies and may promote developing strategies to engage public opinion effectively. Our proposed method is a promising tool to explore information hidden in social phenomena.
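The paper's state-space model is considerably more elaborate; as a simplified stand-in, an ordinary least-squares decomposition of daily signature counts onto two media time series shows how per-channel contribution shares of the kind quoted above (26% vs. 52%) can be derived. The counts below are invented.

```python
def two_regressor_ols(y, x1, x2):
    """Least squares y ~ b1*x1 + b2*x2 (no intercept), solved from the
    2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2

# Synthetic daily counts: signatures driven 1x by tweets, 3x by forum posts
tweets = [10, 40, 5, 0, 2, 1, 0]
forum = [0, 5, 20, 30, 10, 4, 1]
signatures = [t * 1.0 + f * 3.0 for t, f in zip(tweets, forum)]
b_twitter, b_forum = two_regressor_ols(signatures, tweets, forum)
share_twitter = b_twitter * sum(tweets) / sum(signatures)
share_forum = b_forum * sum(forum) / sum(signatures)
```

Unlike this static regression, the state-space formulation lets the per-channel effects evolve over time, which is what allows the paper to attribute the initial bursts to Twitter even though the forum dominates in total.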
Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses
Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy
2015-01-01
Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579
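The weighting step itself is simple to illustrate: when each bin's re-estimated tree carries one vote per gene in the bin, the tally recovers per-gene voting rather than per-bin voting. A toy sketch, with topology labels standing in for actual gene trees and invented bin sizes:

```python
from collections import Counter

def weighted_topology_frequencies(bins):
    """Each bin yields one re-estimated gene tree (here just a topology
    label) and a bin size; weighting the tree by its bin size restores
    one 'vote' per original gene."""
    votes = Counter()
    for topology, bin_size in bins:
        votes[topology] += bin_size
    total = sum(votes.values())
    return {t: v / total for t, v in votes.items()}

# Three bins: re-estimated topology and the number of genes binned together
bins = [("((A,B),C)", 7), ("((A,C),B)", 2), ("((A,B),C)", 1)]
freqs = weighted_topology_frequencies(bins)
```

Unweighted binning would score the two topologies 2:1 regardless of bin sizes; the weighted tally scores them 8:2, matching the underlying gene counts, which is the intuition behind the consistency proof.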
J-adaptive estimation with estimated noise statistics
NASA Technical Reports Server (NTRS)
Jazwinski, A. H.; Hipkins, C.
1973-01-01
The J-adaptive sequential estimator is extended to include simultaneous estimation of the noise statistics in a model for system dynamics. This extension completely automates the estimator, eliminating the requirement of an analyst in the loop. Simulations in satellite orbit determination demonstrate the efficacy of the sequential estimation algorithm.
Unsupervised Topic Discovery by Anomaly Detection
2013-09-01
Kullback, and R. A. Leibler, “On information and sufficiency,” Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951. [14] S. Basu, A...read known publicly. There is a strong interest in the analysis of these opinions and comments as they provide useful information about the sentiments...them as topics. The difficulty in this approach is finding a good set of keywords that accurately represents the documents. The method used to
An Evaluation Method of Words Tendency Depending on Time-Series Variation and Its Improvements.
ERIC Educational Resources Information Center
Atlam, El-Sayed; Okada, Makoto; Shishibori, Masami; Aoe, Jun-ichi
2002-01-01
Discussion of word frequency and keywords in text focuses on a method to estimate automatically the stability classes that indicate a word's popularity with time-series variations based on the frequency change in past electronic text data. Compares the evaluation of decision tree stability class results with manual classification results.…
Using DMSP/OLS nighttime imagery to estimate carbon dioxide emission
NASA Astrophysics Data System (ADS)
Desheng, B.; Letu, H.; Bao, Y.; Naizhuo, Z.; Hara, M.; Nishio, F.
2012-12-01
This study presents a method for estimating CO2 emissions from electric power plants using the Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS) stable light image product for 1999. CO2 emissions from power plants account for a high percentage of CO2 emissions from fossil fuel consumption, since thermal power plants generate electricity by burning fossil fuels and thus emit CO2 directly. In many Asian countries, such as China, Japan, India, and South Korea, thermal generation accounted for over 58% of total electric power in 1999. To date, CO2 emission figures have been obtained mainly by traditional statistical methods. Moreover, because the statistical data are summarized by administrative region, it is difficult to examine the spatial distribution across non-administrative divisions, and in some countries the reliability of such CO2 emission data is relatively low. Satellite remote sensing, however, can observe the earth's surface without the limitation of administrative boundaries, making it a valuable tool for estimating CO2 emissions. In this study, we estimated CO2 emissions from fossil fuel consumption by electric power plants in Japan using the stable light image of the DMSP/OLS satellite data for 1999 after correcting for the saturation effect. Digital number (DN) values of the stable light images in city centers are saturated due to the large nighttime light intensities and the characteristics of the OLS satellite sensors. To estimate CO2 emissions more accurately from the stable light images, a saturation correction method was developed using the DMSP radiance calibration image, which contains no saturated pixels. A regression equation was developed from the relationship between DN values of non-saturated pixels in the stable light image and those in the radiance calibration image, and this equation was used to adjust the DNs of the radiance calibration image.
Then, the saturated DNs of the stable light image were corrected using the adjusted radiance calibration image. Regression analyses were then performed among the cumulative DNs of the corrected stable light image, electric power consumption, electric power generation, and CO2 emissions from fossil fuel consumption by electric power plants. Results indicated good relationships (R2 > 90%) between the DNs of the corrected stable light image and the other parameters. Based on these results, we estimated the CO2 emissions from electric power plants using the corrected stable light image. Keywords: DMSP/OLS, stable light, saturation light correction method, regression analysis. Acknowledgment: The research was financially supported by the Sasakawa Scientific Research Grant from the Japan Science Society.
Mohd Yusof, Mohd Yusmiaidil Putera; Wan Mokhtar, Ilham; Rajasekharan, Sivaprakash; Overholser, Rosanna; Martens, Luc
2017-11-01
Through numerous validation and method comparison studies on different populations, the Willems method has exhibited superior accuracy. This article aims to systematically examine how accurately the Willems dental age method applies to children of different age groups and how it performs across various populations and regions. A strategic literature search of PubMed, MEDLINE, Web of Science and EMBASE, supplemented by hand searching, was used to identify studies published up to September 2014 that estimated dental age using the Willems method (modified Demirjian), with a population, intervention, comparison and outcome (PICO) search strategy using MeSH keywords, focusing on the question: how much does the Willems method deviate from chronological age when estimating age in children? Standardized mean differences were calculated for the difference between dental age and chronological age using a random effects model. Subgroup analyses were performed to evaluate potential heterogeneity. Of 116 titles retrieved by the standardized search strategy, only 19 articles fulfilled the inclusion criteria for quantitative analysis. Pooled estimates were computed separately for studies reporting underestimation (n=7) and overestimation (n=12) of chronological age for both genders. On absolute values, females (underestimated by 0.13; 95% CI: 0.09-0.18 and overestimated by 0.27; 95% CI: 0.17-0.36) exhibited better accuracy than males (underestimated by 0.28; 95% CI: 0.14-0.42 and overestimated by 0.33; 95% CI: 0.22-0.44). For comparison purposes, the overall pooled estimate overestimated age by 0.10 (95% CI: -0.06 to 0.26) and 0.09 (95% CI: -0.09 to 0.19) for males and females, respectively. There was no significant difference between younger and older children in the subgroup analysis using an omnibus test, and mean differences between regions were not statistically significant. 
The use of the Willems method to estimate age in children is appropriate, considering its accuracy across different populations, investigators and age groups.
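The random-effects pooling of per-study mean differences can be sketched with the standard DerSimonian-Laird estimator (a common choice; the review's exact model is not stated). The effect sizes and variances below are invented, loosely echoing the magnitudes quoted above.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study mean
    differences (dental age minus chronological age)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)              # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical per-study mean differences (years) and sampling variances
effects = [0.27, -0.13, 0.33, -0.28, 0.10]
variances = [0.004, 0.002, 0.006, 0.005, 0.003]
pooled, ci = random_effects_pool(effects, variances)
```

When the between-study variance tau² is large, as with studies that disagree on the direction of bias, the random-effects interval widens accordingly, which is why the overall pooled estimates above straddle zero.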
Statistical properties of alternative national forest inventory area estimators
Francis Roesch; John Coulston; Andrew D. Hill
2012-01-01
The statistical properties of potential estimators of forest area for the USDA Forest Service's Forest Inventory and Analysis (FIA) program are presented and discussed. The current FIA area estimator is compared and contrasted with a weighted mean estimator and an estimator based on the Polya posterior, in the presence of nonresponse. Estimator optimality is...
NASA Astrophysics Data System (ADS)
El Sharif, H.; Teegavarapu, R. S.
2012-12-01
Spatial interpolation methods used to estimate missing precipitation data at a site are seldom checked for their ability to preserve site and regional statistics. Such statistics are primarily defined by spatial correlations and other site-to-site statistics in a region, and their preservation offers a means of assessing the validity of missing precipitation estimates at a site. This study evaluates the efficacy of a fuzzy-logic methodology for infilling missing historical daily precipitation data with respect to preserving site and regional statistics. Rain gauge sites in the state of Kentucky, USA, are used as a case study to evaluate the newly proposed method against traditional data infilling techniques. Several error and performance measures are used to evaluate the methods and the trade-offs between estimation accuracy and preservation of site and regional statistics.
Evaluation of wind field statistics near and inside clouds using a coherent Doppler lidar
NASA Astrophysics Data System (ADS)
Lottman, Brian Todd
1998-09-01
This work proposes advanced techniques for measuring the spatial wind field statistics near and inside clouds using a vertically pointing solid state coherent Doppler lidar on a fixed ground-based platform. The coherent Doppler lidar is an ideal instrument for high spatial and temporal resolution velocity estimates. The basic parameters of lidar are discussed, including a complete statistical description of the Doppler lidar signal. This description is extended to cases with simple functional forms for aerosol backscatter and velocity. An estimate of the mean velocity over a sensing volume is produced by estimating the mean spectra. There are many traditional spectral estimators, which are useful under conditions with slowly varying velocity and backscatter. A new class of estimators, termed novel estimators, is introduced that produces reliable velocity estimates under conditions with large variations in aerosol backscatter and velocity with range, such as cloud conditions. Performance of the traditional and novel estimators is computed for a variety of deterministic atmospheric conditions using computer-simulated data. Wind field statistics are produced from actual data for a cloud deck and for multi-layer clouds. Unique results include detection of possible spectral signatures for rain, estimates of the structure function inside a cloud deck, reliable velocity estimation techniques near and inside thin clouds, and estimates of simple wind field statistics between cloud layers.
Efficient and Robust Signal Approximations
2009-05-01
otherwise. Remark. Permutation matrices are both orthogonal and doubly-stochastic [62]. We will now show how to further simplify the Robust Coding... Keywords: signal processing, image compression, independent component analysis, sparse
Ries(compiler), Kernell G.; With sections by Atkins, J. B.; Hummel, P.R.; Gray, Matthew J.; Dusenbury, R.; Jennings, M.E.; Kirby, W.H.; Riggs, H.C.; Sauer, V.B.; Thomas, W.O.
2007-01-01
The National Streamflow Statistics (NSS) Program is a computer program that should be useful to engineers, hydrologists, and others for planning, management, and design applications. NSS compiles all current U.S. Geological Survey (USGS) regional regression equations for estimating streamflow statistics at ungaged sites in an easy-to-use interface that operates on computers with Microsoft Windows operating systems. NSS expands on the functionality of the USGS National Flood Frequency Program, and replaces it. The regression equations included in NSS are used to transfer streamflow statistics from gaged to ungaged sites through the use of watershed and climatic characteristics as explanatory or predictor variables. Generally, the equations were developed on a statewide or metropolitan-area basis as part of cooperative study programs. Equations are available for estimating rural and urban flood-frequency statistics, such as the 100-year flood, for every state, for Puerto Rico, and for the island of Tutuila, American Samoa. Equations are available for estimating other statistics, such as the mean annual flow, monthly mean flows, flow-duration percentiles, and low-flow frequencies (such as the 7-day, 10-year low flow) for less than half of the states. All equations available for estimating streamflow statistics other than flood-frequency statistics assume rural (non-regulated, non-urbanized) conditions. The NSS output provides indicators of the accuracy of the estimated streamflow statistics. The indicators may include any combination of the standard error of estimate, the standard error of prediction, the equivalent years of record, or 90 percent prediction intervals, depending on what was provided by the authors of the equations. The program includes several other features that can be used only for flood-frequency estimation.
These include the ability to generate flood-frequency plots, and plots of typical flood hydrographs for selected recurrence intervals, estimates of the probable maximum flood, extrapolation of the 500-year flood when an equation for estimating it is not available, and weighting techniques to improve flood-frequency estimates for gaging stations and ungaged sites on gaged streams. This report describes the regionalization techniques used to develop the equations in NSS and provides guidance on the applicability and limitations of the techniques. The report also includes a user's manual and a summary of equations available for estimating basin lagtime, which is needed by the program to generate flood hydrographs. The NSS software and accompanying database, and the documentation for the regression equations included in NSS, are available on the Web at http://water.usgs.gov/software/.
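The regression-equation transfer described above can be illustrated with a short sketch. The coefficients and the drainage area below are invented for illustration (real NSS equations, their predictor variables, and their applicable ranges come from the USGS documentation); only the power-law structure Q_T = a·A^b is the standard form.

```python
# Hypothetical illustration of an NSS-style regional regression equation
# transferring a flood-frequency statistic to an ungaged site.
# The coefficients a and b are invented for demonstration only.

def q100_estimate(drainage_area_sq_mi, a=120.0, b=0.65):
    """100-year peak-flow estimate (cfs) from a power-law regression
    on drainage area, the usual form Q_T = a * A**b."""
    return a * drainage_area_sq_mi ** b

q = q100_estimate(50.0)   # hypothetical 50-square-mile basin
print(round(q, 1))
```

In practice each equation also carries an applicable range of basin characteristics and an accuracy indicator (standard error of prediction, equivalent years of record), as the abstract notes.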
NASA Astrophysics Data System (ADS)
Sikora, Grzegorz; Teuerle, Marek; Wyłomańska, Agnieszka; Grebenkov, Denis
2017-08-01
The most common way of estimating the anomalous scaling exponent from single-particle trajectories consists of a linear fit of the dependence of the time-averaged mean-square displacement on the lag time at the log-log scale. We investigate the statistical properties of this estimator in the case of fractional Brownian motion (FBM). We determine the mean value, the variance, and the distribution of the estimator. Our theoretical results are confirmed by Monte Carlo simulations. In the limit of long trajectories, the estimator is shown to be asymptotically unbiased, consistent, and with vanishing variance. These properties ensure an accurate estimation of the scaling exponent even from a single (long enough) trajectory. As a consequence, we prove that the usual way to estimate the diffusion exponent of FBM is correct from the statistical point of view. Moreover, the knowledge of the estimator distribution is the first step toward new statistical tests of FBM and toward a more reliable interpretation of the experimental histograms of scaling exponents in microbiology.
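The log-log fitting procedure described above is easy to sketch. A minimal example using ordinary Brownian motion (the special case of FBM with H = 0.5, simulated here as a cumulative sum of Gaussian steps) so that the expected slope of the fit is close to 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scaling-exponent estimator: linear fit of log(TAMSD) vs log(lag).
# For Brownian motion (FBM with H = 0.5) the slope should be near 1.
x = np.cumsum(rng.standard_normal(100_000))   # 1-D Brownian trajectory

lags = np.arange(1, 21)
# time-averaged mean-square displacement at each lag
tamsd = np.array([np.mean((x[m:] - x[:-m]) ** 2) for m in lags])

slope, intercept = np.polyfit(np.log(lags), np.log(tamsd), 1)
print(round(slope, 2))   # expected near 1.0 for this long trajectory
```

The paper's point is that for a single, sufficiently long trajectory this estimator is asymptotically unbiased and consistent, so the fitted slope concentrates around the true exponent.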
The estimation of the measurement results with using statistical methods
NASA Astrophysics Data System (ADS)
Velychko, O.; Gordiyenko, T.
2015-02-01
A number of international standards and guides describe various statistical methods that apply to the management, control, and improvement of processes for the purpose of analyzing technical measurement results. An analysis of international standards and guides on statistical methods for the estimation of measurement results, with recommendations for their application in laboratories, is presented. To carry out this analysis, cause-and-effect (Ishikawa) diagrams concerning the application of statistical methods for the estimation of measurement results were constructed.
Assaad, Houssein I; Choudhary, Pankaj K
2013-01-01
The L-statistics form an important class of estimators in nonparametric statistics. Its members include trimmed means and sample quantiles and functions thereof. This article is devoted to theory and applications of L-statistics for repeated measurements data, wherein the measurements on the same subject are dependent and the measurements from different subjects are independent. This article has three main goals: (a) Show that the L-statistics are asymptotically normal for repeated measurements data. (b) Present three statistical applications of this result, namely, location estimation using trimmed means, quantile estimation and construction of tolerance intervals. (c) Obtain a Bahadur representation for sample quantiles. These results are generalizations of similar results for independently and identically distributed data. The practical usefulness of these results is illustrated by analyzing a real data set involving measurement of systolic blood pressure. The properties of the proposed point and interval estimators are examined via simulation.
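As a concrete instance of an L-statistic, a symmetric trimmed mean can be sketched in a few lines (a minimal illustration of the estimator itself, not the repeated-measurements theory of the paper):

```python
import numpy as np

def trimmed_mean(x, prop=0.1):
    """Symmetric trimmed mean: drop a proportion `prop` of the sorted
    sample from each tail, then average the rest (an L-statistic)."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(round(prop * len(x)))
    return x[k:len(x) - k].mean()

data = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 100.0]  # one outlier
tm = trimmed_mean(data, prop=0.1)   # drops 1.0 and 100.0
print(tm)
```

Because the extreme order statistics are discarded, the estimate is far less sensitive to the outlier than the ordinary mean; sample quantiles (e.g., the median) are L-statistics in the same sense.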
Relative risk estimates from spatial and space-time scan statistics: Are they biased?
Prates, Marcos O.; Kulldorff, Martin; Assunção, Renato M.
2014-01-01
The purely spatial and space-time scan statistics have been successfully used by many scientists to detect and evaluate geographical disease clusters. Although the scan statistic has high power in correctly identifying a cluster, no study has considered the estimates of the cluster relative risk in the detected cluster. In this paper we evaluate whether there is any bias in these estimated relative risks. Intuitively, one may expect that the estimated relative risks have an upward bias, since the scan statistic cherry-picks high-rate areas to include in the cluster. We show that this intuition is correct for clusters with low statistical power, but with medium to high power the bias becomes negligible. The same behaviour is not observed for the prospective space-time scan statistic, where there is an increasing conservative downward bias of the relative risk as the power to detect the cluster increases. PMID:24639031
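The cherry-picking intuition can be illustrated with a toy simulation (not the actual scan statistic, which scans over spatial windows; here we simply take the maximum observed rate over independent regions under the null):

```python
import numpy as np

rng = np.random.default_rng(1)

# Under the null (relative risk 1 everywhere), reporting the rate of
# whichever region happens to have the highest observed count gives an
# upward-biased "cluster" relative risk: the selection effect above.
n_regions, expected, reps = 50, 20.0, 2000
max_rr = np.empty(reps)
for r in range(reps):
    counts = rng.poisson(expected, n_regions)   # null: all RR = 1
    max_rr[r] = counts.max() / expected         # RR of the "best" region

print(round(max_rr.mean(), 2))   # noticeably above 1.0
```

The paper's finding is that for true clusters detected with medium to high power, the signal dominates this selection effect and the bias becomes negligible.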
Pattern statistics on Markov chains and sensitivity to parameter estimation
Nuel, Grégory
2006-01-01
Background: In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameters must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...). Results: In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta-method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. Conclusion: We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916
Pattern statistics on Markov chains and sensitivity to parameter estimation.
Nuel, Grégory
2006-10-17
In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameters must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...). In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta-method to give an explicit expression of sigma, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.
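The binomial approximation mentioned above can be sketched for the simplest (order-0, i.i.d.) model; the sequence length, pattern, and observed count below are illustrative, not from the paper:

```python
import math

# Binomial approximation for the count of a k-letter pattern in an
# i.i.d. (order-0 Markov) sequence of length n: each of the n-k+1
# positions matches with probability p, so the count is approximately
# Binomial(n-k+1, p) (ignoring overlaps), giving a z-score.
def pattern_zscore(observed, n, word_prob, k):
    trials = n - k + 1
    mean = trials * word_prob
    sd = math.sqrt(trials * word_prob * (1 - word_prob))
    return (observed - mean) / sd

# e.g. the word "ACGT" under uniform base frequencies: p = 0.25**4
z = pattern_zscore(observed=25, n=10_000, word_prob=0.25 ** 4, k=4)
print(round(z, 2))
```

The paper's point is that `word_prob` itself is computed from estimated Markov parameters, and for high-order models the propagated estimation error in such z-scores can be large.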
NASA Technical Reports Server (NTRS)
Starlinger, Alois; Duffy, Stephen F.; Palko, Joseph L.
1993-01-01
New methods are presented that utilize the optimization of goodness-of-fit statistics in order to estimate Weibull parameters from failure data. It is assumed that the underlying population is characterized by a three-parameter Weibull distribution. Goodness-of-fit tests are based on the empirical distribution function (EDF). The EDF is a step function, calculated using failure data, and represents an approximation of the cumulative distribution function for the underlying population. Statistics (such as the Kolmogorov-Smirnov statistic and the Anderson-Darling statistic) measure the discrepancy between the EDF and the cumulative distribution function (CDF). These statistics are minimized with respect to the three Weibull parameters. Due to nonlinearities encountered in the minimization process, Powell's numerical optimization procedure is applied to obtain the minimum values of the EDF statistics. Numerical examples show the applicability of these new estimation methods. The results are compared to the estimates obtained with Cooper's nonlinear regression algorithm.
SAP- FORTRAN STATIC SOURCE CODE ANALYZER PROGRAM (IBM VERSION)
NASA Technical Reports Server (NTRS)
Manteufel, R.
1994-01-01
The FORTRAN Static Source Code Analyzer program, SAP, was developed to automatically gather statistics on the occurrences of statements and structures within a FORTRAN program and to provide for the reporting of those statistics. Provisions have been made for weighting each statistic and to provide an overall figure of complexity. Statistics, as well as figures of complexity, are gathered on a module by module basis. Overall summed statistics are also accumulated for the complete input source file. SAP accepts as input syntactically correct FORTRAN source code written in the FORTRAN 77 standard language. In addition, code written using features in the following languages is also accepted: VAX-11 FORTRAN, IBM S/360 FORTRAN IV Level H Extended; and Structured FORTRAN. The SAP program utilizes two external files in its analysis procedure. A keyword file allows flexibility in classifying statements and in marking a statement as either executable or non-executable. A statistical weight file allows the user to assign weights to all output statistics, thus allowing the user flexibility in defining the figure of complexity. The SAP program is written in FORTRAN IV for batch execution and has been implemented on a DEC VAX series computer under VMS and on an IBM 370 series computer under MVS. The SAP program was developed in 1978 and last updated in 1985.
SAP- FORTRAN STATIC SOURCE CODE ANALYZER PROGRAM (DEC VAX VERSION)
NASA Technical Reports Server (NTRS)
Merwarth, P. D.
1994-01-01
The FORTRAN Static Source Code Analyzer program, SAP, was developed to automatically gather statistics on the occurrences of statements and structures within a FORTRAN program and to provide for the reporting of those statistics. Provisions have been made for weighting each statistic and to provide an overall figure of complexity. Statistics, as well as figures of complexity, are gathered on a module by module basis. Overall summed statistics are also accumulated for the complete input source file. SAP accepts as input syntactically correct FORTRAN source code written in the FORTRAN 77 standard language. In addition, code written using features in the following languages is also accepted: VAX-11 FORTRAN, IBM S/360 FORTRAN IV Level H Extended; and Structured FORTRAN. The SAP program utilizes two external files in its analysis procedure. A keyword file allows flexibility in classifying statements and in marking a statement as either executable or non-executable. A statistical weight file allows the user to assign weights to all output statistics, thus allowing the user flexibility in defining the figure of complexity. The SAP program is written in FORTRAN IV for batch execution and has been implemented on a DEC VAX series computer under VMS and on an IBM 370 series computer under MVS. The SAP program was developed in 1978 and last updated in 1985.
Humans make efficient use of natural image statistics when performing spatial interpolation.
D'Antona, Anthony D; Perry, Jeffrey S; Geisler, Wilson S
2013-12-16
Visual systems learn through evolution and experience over the lifespan to exploit the statistical structure of natural images when performing visual tasks. Understanding which aspects of this statistical structure are incorporated into the human nervous system is a fundamental goal in vision science. To address this goal, we measured human ability to estimate the intensity of missing image pixels in natural images. Human estimation accuracy is compared with various simple heuristics (e.g., local mean) and with optimal observers that have nearly complete knowledge of the local statistical structure of natural images. Human estimates are more accurate than those of simple heuristics, and they match the performance of an optimal observer that knows the local statistical structure of relative intensities (contrasts). This optimal observer predicts the detailed pattern of human estimation errors and hence the results place strong constraints on the underlying neural mechanisms. However, humans do not reach the performance of an optimal observer that knows the local statistical structure of the absolute intensities, which reflect both local relative intensities and local mean intensity. As predicted from a statistical analysis of natural images, human estimation accuracy is negligibly improved by expanding the context from a local patch to the whole image. Our results demonstrate that the human visual system exploits efficiently the statistical structure of natural images.
A General Model for Estimating and Correcting the Effects of Nonindependence in Meta-Analysis.
ERIC Educational Resources Information Center
Strube, Michael J.
A general model is described which can be used to represent the four common types of meta-analysis: (1) estimation of effect size by combining study outcomes; (2) estimation of effect size by contrasting study outcomes; (3) estimation of statistical significance by combining study outcomes; and (4) estimation of statistical significance by…
DOT National Transportation Integrated Search
2010-03-01
This document provides guidance for using the ACS Statistical Analyzer. It is an Excel-based template for users of estimates from the American Community Survey (ACS) to assess the precision of individual estimates and to compare pairs of estimates fo...
Nowcasting Cloud Fields for U.S. Air Force Special Operations
2017-03-01
The application of Bayes' Rule offers many advantages over Kernel Density Estimation (KDE) and other commonly used statistical post-processing methods... reflectance and probability of cloud. A statistical post-processing technique is applied using Bayesian estimation to train the system from a set of past... Keywords: nowcasting, low cloud forecasting, cloud reflectance, ISR, Bayesian estimation, statistical post-processing, machine learning
2009-01-09
Vector Ecology 34(1): 99-103. 2009. Keyword Index: house fly, Musca domestica, trapping. INTRODUCTION: Traps have been a mainstay of house fly (Musca domestica)...
Three Important Taylor Series for Introductory Physics
2009-09-01
Approximating a Taylor series by the sum of its first few terms is useful throughout an introductory physics course. Example applications [1, 2] include estimating square... Lat. Am. J. Phys. Educ. Vol. 3, No. 3, Sept. 2009, 535. http://www.journal.lapen.org.mx ... one dimension, which instructively ties the mathematical development to physics concepts already presented in introductory courses. Keywords...
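The square-root estimation mentioned in the excerpt presumably uses the binomial series (1 + x)^(1/2) ≈ 1 + x/2 − x²/8 for small x; a worked example under that assumption:

```python
# Estimate sqrt(26) by factoring out the nearest perfect square:
# sqrt(26) = 5 * sqrt(1 + 0.04), then truncating the binomial series
# (1 + x)**0.5 ≈ 1 + x/2 - x**2/8.
x = 0.04
approx = 5 * (1 + x / 2 - x ** 2 / 8)
exact = 26 ** 0.5
print(approx, round(exact, 6))
```

With only three terms the error is already below 1e-4, which is the pedagogical point: a few terms of the series suffice for typical physics estimates.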
Fast Algorithms for Estimating Mixture Parameters
1989-08-30
The investigation is a two-year project, with the first year sponsored by the Army Research Office and the second year by the National Science Foundation (Grant...). Numerical testing of the accelerated fixed-point method was completed. The work on relaxation methods will be done under the sponsorship of the National Science Foundation during the coming year. Keywords: fast algorithms; algorithms; mixture distribution; random variables. (KR)
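For context, the plain (unaccelerated) fixed-point iteration for mixture parameters is the EM update; a minimal two-component Gaussian sketch, as an illustrative baseline only and not the accelerated method studied in the project:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-component Gaussian mixture data: components at 0 and 5.
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])

w, mu1, mu2, s1, s2 = 0.5, -1.0, 6.0, 1.0, 1.0
for _ in range(200):
    # E-step: posterior responsibility of component 1 for each point
    p1 = w * np.exp(-0.5 * ((x - mu1) / s1) ** 2) / s1
    p2 = (1 - w) * np.exp(-0.5 * ((x - mu2) / s2) ** 2) / s2
    r = p1 / (p1 + p2)
    # M-step: reweighted parameter updates (the fixed-point map)
    w = r.mean()
    mu1, mu2 = np.average(x, weights=r), np.average(x, weights=1 - r)
    s1 = np.sqrt(np.average((x - mu1) ** 2, weights=r))
    s2 = np.sqrt(np.average((x - mu2) ** 2, weights=1 - r))

print(round(mu1, 1), round(mu2, 1), round(w, 2))
```

Acceleration schemes of the kind the project describes aim to speed up exactly this kind of slowly converging fixed-point map.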
Debris Hazards Due to Overloaded Conventional Construction Facades
2015-12-01
hazards to buildings. This work will present results for experiments involving conventional façade materials (glass, concrete, and masonry) that have... experiments and a discussion of the distribution parameters are presented. Keywords: blast, fragmentation, concrete, masonry, debris... concrete, glass, and concrete masonry. It was also desired to produce data for which the state of stress and strain rates could be estimated. There were...
Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values
Tong, Tiejun; Feng, Zeny; Hilton, Julia S.; Zhao, Hongyu
2013-01-01
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance. PMID:24078762
Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values.
Tong, Tiejun; Feng, Zeny; Hilton, Julia S; Zhao, Hongyu
2013-01-01
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.
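A simplified sketch of the idea, assuming a Storey-type ratio estimator evaluated on a grid of λ with a linear fit extrapolated to λ = 1 (the paper's exact fitting window and weighting may differ; the simulated p-values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Mix 800 true-null p-values (uniform) with 200 alternatives
# (concentrated near zero), so the true pi0 is 0.8.
p = np.concatenate([rng.uniform(size=800), rng.beta(0.5, 20, size=200)])
m = len(p)

# For each lambda, #{p > lambda} / (m * (1 - lambda)) estimates pi0;
# fit a line over a grid of lambda and read off the fitted value at
# lambda = 1 instead of relying on a single lambda.
lams = np.arange(0.2, 0.85, 0.05)
pi0_lam = np.array([np.mean(p > lam) / (1 - lam) for lam in lams])
slope, intercept = np.polyfit(lams, pi0_lam, 1)
pi0_hat = slope * 1.0 + intercept
print(round(pi0_hat, 2))
```

Using the whole pattern of the ratios rather than a single λ is what stabilizes the estimate when the test statistics are dependent.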
12 CFR Appendix A to Subpart A of... - Appendix A to Subpart A of Part 327
Code of Federal Regulations, 2010 CFR
2010-01-01
... pricing multipliers are derived from: • A model (the Statistical Model) that estimates the probability..., which is four basis points higher than the minimum rate. II. The Statistical Model The Statistical Model... to 1997. As a result, and as described in Table A.1, the Statistical Model is estimated using a...
NASA Astrophysics Data System (ADS)
de Macedo, Isadora A. S.; da Silva, Carolina B.; de Figueiredo, J. J. S.; Omoboya, Bode
2017-01-01
Wavelet estimation as well as seismic-to-well tie procedures are at the core of every seismic interpretation workflow. In this paper we perform a comparative study of wavelet estimation methods for seismic-to-well tie. Two approaches to wavelet estimation are discussed: a deterministic estimation, based on both seismic and well log data, and a statistical estimation, based on predictive deconvolution and the classical assumptions of the convolutional model, which provides a minimum-phase wavelet. For both methods, our algorithms introduce a semi-automatic approach that determines the optimum parameters of the deterministic and statistical wavelet estimations and then estimates the optimum seismic wavelets by searching for the highest correlation coefficient between the recorded trace and the synthetic trace when the time-depth relationship is accurate. Tests with numerical data yield some qualitative conclusions, which are probably useful for seismic inversion and interpretation of field data, by comparing deterministic and statistical wavelet estimation in detail, especially for the field data example. The feasibility of this approach is verified on real seismic and well data from the Viking Graben field, North Sea, Norway. Our results also show the influence of washout zones in the well log data on the quality of the well-to-seismic tie.
Uncertainties in Estimates of Fleet Average Fuel Economy : A Statistical Evaluation
DOT National Transportation Integrated Search
1977-01-01
Research was performed to assess the current Federal procedure for estimating the average fuel economy of each automobile manufacturer's new car fleet. Test vehicle selection and fuel economy estimation methods were characterized statistically and so...
Cheng, Paula Glenda Ferrer; Ramos, Roann Munoz; Bitsch, Jó Ágila; Jonas, Stephan Michael; Ix, Tim; See, Portia Lynn Quetulio; Wehrle, Klaus
2016-07-20
Language reflects the state of one's mental health and personal characteristics. It also reveals preoccupations with a particular schema, thus possibly providing insights into psychological conditions. Using text or lexical analysis in exploring depression, negative schemas and self-focusing tendencies may be depicted. As mobile technology has become highly integrated in daily routine, mobile devices have the capacity for ecological momentary assessment (EMA), specifically the experience sampling method (ESM), where behavior is captured in real-time or closer in time to experience in one's natural environment. Extending mobile technology to psychological health could augment initial clinical assessment, particularly of mood disturbances, such as depression, and analyze daily activities, such as language use in communication. Here, we present the process of lexicon generation and development and the initial validation of Psychologist in a Pocket (PiaP), a mobile app designed to screen signs of depression through text analysis. The main objectives of the study are (1) to generate and develop a depressive lexicon that can be used for screening text-input in mobile apps to be used in the PiaP; and (2) to conduct content validation as initial validation. The first phase of our research focused on lexicon development. Words related to depression and its symptoms based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines classification systems were gathered from focus group discussions with Filipino college students, interviews with mental health professionals, and the review of established scales for depression and other related constructs. The lexicon development phase yielded a database consisting of 13 categories based on the criteria for depressive symptoms in the DSM-5 and ICD-10.
For the draft of the depression lexicon for PiaP, we were able to gather 1762 main keywords and 9655 derivatives of main keywords. In addition, we compiled 823,869 spelling variations. Keywords included negatively-valenced words like "sad", "unworthy", or "tired", which are almost always accompanied by personal pronouns, such as "I", "I'm" or "my" and, in Filipino, "ako" or "ko". For the content validation, only keywords with a CVR equal to or greater than 0.75 were included in the depression lexicon test-run version. The mean of all CVRs yielded a high overall CVI of 0.90. A total of 1498 main keywords, 8911 derivatives of main keywords, and 783,140 spelling variations, with a total of 793,553 keywords, now comprise the test-run version. The generation of the depression lexicon is relatively exhaustive. The breadth of keywords used in text analysis incorporates the characteristic expressions of depression and its related constructs by a particular culture and age group. A content-validated mobile health app, PiaP may help augment a more effective and early detection of depressive symptoms.
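The CVR screening step can be sketched, assuming Lawshe's content validity ratio CVR = (n_e − N/2)/(N/2) and the CVI as the mean CVR of retained items (the abstract does not state which CVR formula was used, and the panel size and ratings below are hypothetical):

```python
# Content validity ratio (CVR) and content validity index (CVI),
# assuming Lawshe's formula: n_e of N panelists rate an item essential.
def cvr(n_essential, n_panelists):
    half = n_panelists / 2
    return (n_essential - half) / half

ratings = [8, 7, 8, 6, 8]                # hypothetical n_e per keyword, N = 8
cvrs = [cvr(n, 8) for n in ratings]
kept = [c for c in cvrs if c >= 0.75]    # the 0.75 cutoff used above
cvi = sum(kept) / len(kept)              # CVI over retained keywords
print(cvrs, round(cvi, 2))
```

Under these hypothetical ratings one keyword (CVR = 0.5) would be dropped, mirroring how the draft lexicon was reduced to the test-run version.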
NASA Astrophysics Data System (ADS)
Aleman, A.
2017-12-01
This presentation will provide an overview and discussion of the Global Change Master Directory (GCMD) Keywords and their applications in Earth science data discovery. The GCMD Keywords are a hierarchical set of controlled keywords covering the Earth science disciplines, including: science keywords, service keywords, data centers, projects, location, data resolution, instruments and platforms. Controlled vocabularies (keywords) help users accurately, consistently and comprehensively categorize their data and also allow for the precise search and subsequent retrieval of data. The GCMD Keywords are a community resource and are developed collaboratively with input from various stakeholders, including GCMD staff, keyword users and metadata providers. The GCMD Keyword Landing Page and GCMD Keyword Community Forum provide access to keyword resources and an area for discussion of topics related to the GCMD Keywords. See https://earthdata.nasa.gov/about/gcmd/global-change-master-directory-gcmd-keywords
DOT National Transportation Integrated Search
1997-10-01
In order to provide waterborne commerce information as soon as possible, the Waterborne Commerce Statistics Center (WCSC) has prepared this summary document of estimated waterborne commerce statistics for calendar year 1996. The foreign import and ex...
DOT National Transportation Integrated Search
1999-07-30
In order to provide waterborne commerce information as soon as possible, the Waterborne Commerce Statistics Center(WCSC) has prepared this summary document of estimated waterborne commerce statistics for calendar year 1998. The foreign import and exp...
Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.
2005-01-01
Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes.
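The deposit-density guide can be sketched with hypothetical numbers (the density, tract area, and confidence levels below are invented for illustration; real densities come from well-explored control areas):

```python
from scipy.stats import poisson

# Deposit-density estimate with Poisson confidence limits, as one of
# the statistical guides described above.  Hypothetical inputs: a
# control-area density of 0.004 deposits per km^2 of permissive rock,
# applied to a 5,000 km^2 permissive tract.
density, area = 0.004, 5000.0
expected = density * area            # expected number of deposits

# 90% interval from the Poisson quantile function
lo, hi = poisson.ppf([0.05, 0.95], expected)
print(expected, int(lo), int(hi))
```

The Poisson assumption corresponds to the randomly distributed deposits mentioned in the abstract; a negative binomial would widen the interval to reflect clustering.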
Probabilistic models in human sensorimotor control
Wolpert, Daniel M.
2009-01-01
Sensory and motor uncertainty form a fundamental constraint on human sensorimotor control. Bayesian decision theory (BDT) has emerged as a unifying framework to understand how the central nervous system performs optimal estimation and control in the face of such uncertainty. BDT has two components: Bayesian statistics and decision theory. Here we review Bayesian statistics and show how it applies to estimating the state of the world and our own body. Recent results suggest that when learning novel tasks we are able to learn the statistical properties of both the world and our own sensory apparatus so as to perform estimation using Bayesian statistics. We review studies which suggest that humans can combine multiple sources of information to form maximum likelihood estimates, can incorporate prior beliefs about possible states of the world so as to generate maximum a posteriori estimates and can use Kalman filter-based processes to estimate time-varying states. Finally, we review Bayesian decision theory in motor control and how the central nervous system processes errors to determine loss functions and optimal actions. We review results that suggest we plan movements based on statistics of our actions that result from signal-dependent noise on our motor outputs. Taken together these studies provide a statistical framework for how the motor system performs in the presence of uncertainty. PMID:17628731
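The maximum-likelihood cue-combination rule reviewed above has a simple closed form for independent Gaussian cues: weight each cue by its inverse variance. A minimal sketch with hypothetical numbers:

```python
# Maximum-likelihood combination of independent Gaussian cues:
# the optimal estimate weights each cue by its inverse variance,
# and the combined variance is smaller than either cue's alone.
def combine(estimates, variances):
    weights = [1 / v for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    return mean, 1 / total   # combined estimate and its variance

# e.g. vision reports 10.0 (variance 1.0), proprioception 12.0 (variance 4.0)
est, var = combine([10.0, 12.0], [1.0, 4.0])
print(est, var)
```

The combined estimate lands closer to the more reliable cue, and the combined variance (0.8 here) is below both inputs, which is the signature of statistically optimal integration reported in the studies reviewed.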
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
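A minimal sketch of a minimum density power divergence fit, specialized to binary logistic regression (the polytomous model with two categories). The objective below is the standard DPD criterion with tuning parameter α, which downweights outlying observations and recovers maximum likelihood as α → 0; the simulated data and optimizer choice are illustrative, not the paper's setup:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(5)

# Simulated binary logistic data: intercept 0.5, slope 2.0.
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.5, 2.0])
y = rng.uniform(size=n) < expit(X @ beta_true)

def dpd_loss(beta, alpha=0.5):
    """Density power divergence criterion for binary logistic regression:
    mean over i of sum_y f(y)^(1+alpha) - (1 + 1/alpha) f(y_i)^alpha."""
    p = expit(X @ beta)
    f_obs = np.where(y, p, 1 - p)   # model probability of observed label
    return np.mean(p ** (1 + alpha) + (1 - p) ** (1 + alpha)
                   - (1 + 1 / alpha) * f_obs ** alpha)

res = minimize(dpd_loss, x0=np.zeros(2), method="BFGS")
print(np.round(res.x, 1))
```

Larger α buys more robustness to mislabeled or outlying observations at some cost in efficiency, which is why the article proposes a data-driven choice of the tuning parameter.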
Zhou, Xiang
2017-12-01
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, the restricted maximum likelihood estimation method (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case-control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while remaining computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
Nonparametric estimation and testing of fixed effects panel data models
Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi
2009-01-01
In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335
Rapid automatic keyword extraction for information retrieval and analysis
Rose, Stuart J [Richland, WA]; Cowley, Wendy E [Richland, WA]; Crow, Vernon L [Richland, WA]; Cramer, Nicholas O [Richland, WA]
2012-03-06
Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
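The extraction procedure described in this abstract (split on stop words and delimiters, score words by degree and frequency, score phrases by summed word scores) can be sketched as follows; the stop-word list and scoring choice (degree/frequency) are illustrative, not taken verbatim from the patent:

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "for", "and", "or", "in", "to", "is", "are"}

def rake_keywords(text, stop_words=STOP_WORDS):
    """Rapid-automatic-keyword-extraction-style scoring sketch."""
    # Parse words and split into candidate phrases at stop words.
    words = re.findall(r"[a-zA-Z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in stop_words:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)

    # Word score = degree / frequency; a word's degree counts its
    # co-occurrences within candidate phrases (including itself).
    freq, degree = {}, {}
    for phrase in phrases:
        for w in phrase:
            freq[w] = freq.get(w, 0) + 1
            degree[w] = degree.get(w, 0) + len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    # Keyword score of a candidate phrase = sum of its word scores.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: -kv[1])

print(rake_keywords("rapid automatic keyword extraction for information retrieval and analysis"))
```

Longer multi-word phrases accumulate higher degrees and therefore dominate the ranking, which is the behavior the patent exploits.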
Estimating the proportion of true null hypotheses when the statistics are discrete.
Dialsingh, Isaac; Austin, Stefanie R; Altman, Naomi S
2015-07-15
In high-dimensional testing problems, π0, the proportion of null hypotheses that are true, is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support, and the null distribution may depend on an ancillary statistic, such as a table margin, that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. This article introduces a number of π0 estimators, the regression and 'T' methods, that perform well with discrete test statistics, and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. The methods are implemented in R. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
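For context, the standard continuous-case baseline the abstract alludes to is a Storey-type tail estimator of π0, which assumes null P values are uniform above a threshold λ; this is the assumption that breaks for discrete tests. A minimal sketch (numbers illustrative):

```python
def storey_pi0(p_values, lam=0.5):
    """Storey-type estimator of the proportion of true nulls, pi0.
    Assumes null p-values are (approximately) uniform above lam -- an
    assumption that can fail for discrete test statistics, which is the
    situation the article's regression and 'T' estimators target."""
    m = len(p_values)
    tail = sum(1 for p in p_values if p > lam)
    return min(1.0, tail / ((1.0 - lam) * m))

# 80 evenly spread "null" p-values plus 20 small "alternative" p-values:
nulls = [(i + 0.5) / 80 for i in range(80)]
alts = [0.001] * 20
print(storey_pi0(nulls + alts))  # close to the true pi0 = 0.8
```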
Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting.
Wöllmer, Martin; Marchi, Erik; Squartini, Stefano; Schuller, Björn
2011-09-01
Highly spontaneous, conversational, and potentially emotional and noisy speech is known to be a challenge for today's automatic speech recognition (ASR) systems, which highlights the need for advanced algorithms that improve speech features and models. Histogram Equalization is an efficient method to reduce the mismatch between clean and noisy conditions by normalizing all moments of the probability distribution of the feature vector components. In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech. To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates generated by a long short-term memory neural network. The proposed techniques are evaluated on the SEMAINE database-a corpus containing emotionally colored conversations with a cognitive system for "Sensitive Artificial Listening".
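Histogram equalization of feature components, as described above, amounts to mapping each value through the empirical CDF of the noisy data and the inverse empirical CDF of a clean reference, so all moments (not just mean and variance) are normalized. The following quantile-mapping sketch uses synthetic data and is not the paper's implementation:

```python
import numpy as np

def histogram_equalize(noisy, clean_ref):
    """Map noisy feature values through their empirical CDF and the
    inverse empirical CDF of a clean reference distribution."""
    ranks = np.argsort(np.argsort(noisy))        # rank of each value
    cdf = (ranks + 0.5) / len(noisy)             # empirical CDF positions
    return np.quantile(np.sort(clean_ref), cdf)  # inverse CDF of reference

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 5000)
noisy = 2.5 * rng.normal(0.0, 1.0, 5000) + 4.0   # shifted, scaled "noisy" features
eq = histogram_equalize(noisy, clean)
print(eq.mean(), eq.std())  # roughly matches the clean statistics (0, 1)
```

The mapping is rank-preserving, so the relative ordering of feature values (which carries the phonetic information) is untouched while the distribution mismatch is removed.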
NASA Astrophysics Data System (ADS)
Palomino-Lemus, Reiner; Córdoba-Machado, Samir; Quishpe-Vásquez, César; García-Valdecasas-Ojeda, Matilde; Gámiz-Fortis, Sonia Raquel; Castro-Díez, Yolanda; Esteban-Parra, María Jesús
2017-04-01
In this study the Principal Component Regression (PCR) method has been used as a statistical downscaling technique for simulating boreal winter precipitation in Tropical America during the period 1950-2010, and then for generating climate change projections for the 2071-2100 period. The study uses the Global Precipitation Climatology Centre (GPCC, version 6) data set over the Tropical America region [30°N-30°S, 120°W-30°W] as predictand variable in the downscaling model. The mean monthly sea level pressure (SLP) from the National Center for Environmental Prediction - National Center for Atmospheric Research (NCEP-NCAR) reanalysis project has been used as predictor variable, covering a more extended area [30°N-30°S, 180°W-30°W]. Also, the SLP outputs from 20 GCMs, taken from the Coupled Model Intercomparison Project (CMIP5), have been used. The model data include simulations with historical atmospheric concentrations and future projections for the representative concentration pathways RCP2.6, RCP4.5, and RCP8.5. The ability of the different GCMs to simulate the winter precipitation in the study area for present climate (1971-2000) was analyzed by calculating the differences between the simulated and observed precipitation values. Additionally, the statistical significance at the 95% confidence level of these differences has been estimated by means of the bilateral rank sum test of Wilcoxon-Mann-Whitney. Finally, to project winter precipitation in the area for the period 2071-2100, the downscaling model, recalibrated for the total period 1950-2010, was applied to the SLP outputs of the GCMs under the RCP2.6, RCP4.5, and RCP8.5 scenarios. The results show that, generally, for present climate the statistical downscaling shows a high ability to faithfully reproduce the precipitation field, while the simulations performed directly by using the non-downscaled outputs of the GCMs strongly distort the precipitation field.
For future climate, the projected predictions under the RCP4.5 and RCP8.5 scenarios show large areas with significant changes. For the RCP2.6 scenario, projected results present a predominance of very moderate decreases in rainfall, although significant in some models. Keywords: climate change projections, precipitation, Tropical America, statistical downscaling. Acknowledgements: This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).
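The PCR downscaling step described above can be sketched as: extract the leading EOFs/PCs of the predictor field, regress the predictand on the PC scores, then apply the fit to new predictor fields. The sketch below uses a synthetic "pressure field" and is not the study's actual configuration:

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_new, n_pc=3):
    """Principal component regression: project the predictor field onto
    its leading patterns, regress the predictand on the PC scores, and
    apply the fitted model to new predictor fields."""
    Xm = X_train.mean(axis=0)
    Xa = X_train - Xm                              # anomalies
    U, s, Vt = np.linalg.svd(Xa, full_matrices=False)
    pcs = Xa @ Vt[:n_pc].T                         # PC scores (time x n_pc)
    A = np.column_stack([np.ones(len(pcs)), pcs])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pcs_new = (X_new - Xm) @ Vt[:n_pc].T
    return np.column_stack([np.ones(len(pcs_new)), pcs_new]) @ coef

# Synthetic example: one spatial pattern whose amplitude drives "precipitation".
rng = np.random.default_rng(1)
t, grid = 60, 40
pattern = rng.normal(size=grid)
driver = rng.normal(size=t)
X = np.outer(driver, pattern) + 0.1 * rng.normal(size=(t, grid))
y = 2.0 * driver + 0.1 * rng.normal(size=t)
pred = pcr_fit_predict(X[:50], y[:50], X[50:])
print(np.corrcoef(pred, y[50:])[0, 1])  # high correlation on held-out years
```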
2015-05-12
Deficiencies That Affect the Reliability of Estimates ... Statistical Precision Could Be Improved ... statistical precision of improper payments estimates in seven of the DoD payment programs through the use of stratified sample designs. DoD improper payments not subject to sampling, which made the results statistically invalid. We made a recommendation to correct this problem in a previous report.
NASA Astrophysics Data System (ADS)
Farmer, W. H.; Kiang, J. E.
2017-12-01
The development, deployment and maintenance of water resources management infrastructure and practices rely on hydrologic characterization, which requires an understanding of local hydrology. With regards to streamflow, this understanding is typically quantified with statistics derived from long-term streamgage records. However, a fundamental problem is how to characterize local hydrology without the luxury of streamgage records, a problem that complicates water resources management at ungaged locations and for long-term future projections. This problem has typically been addressed through the development of point estimators, such as regression equations, to estimate particular statistics. Physically-based precipitation-runoff models, which are capable of producing simulated hydrographs, offer an alternative to point estimators. The advantage of simulated hydrographs is that they can be used to compute any number of streamflow statistics from a single source (the simulated hydrograph) rather than relying on a diverse set of point estimators. However, the use of simulated hydrographs introduces a degree of model uncertainty that is propagated through to estimated streamflow statistics and may have drastic effects on management decisions. We compare the accuracy and precision of streamflow statistics (e.g. the mean annual streamflow, the annual maximum streamflow exceeded in 10% of years, and the minimum seven-day average streamflow exceeded in 90% of years, among others) derived from point estimators (e.g. regressions, kriging, machine learning) to that of statistics derived from simulated hydrographs across the continental United States. Initial results suggest that the error introduced through hydrograph simulation may substantially bias the resulting hydrologic characterization.
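Computing the statistics named above from a simulated hydrograph is straightforward once the daily series exists; that single-source convenience is the advantage the abstract describes. A minimal sketch (synthetic flows, simplified 365-day years, names illustrative):

```python
import numpy as np

def flow_statistics(daily, days_per_year=365):
    """Mean annual flow and the minimum 7-day average flow exceeded in
    90% of years, computed from a daily hydrograph (simulated or observed)."""
    n_years = len(daily) // days_per_year
    years = daily[: n_years * days_per_year].reshape(n_years, days_per_year)
    mean_annual = years.mean(axis=1)
    kernel = np.ones(7) / 7.0
    # Annual minima of the 7-day moving-average flow:
    min7 = np.array([np.convolve(y, kernel, mode="valid").min() for y in years])
    return {
        "mean_annual_flow": mean_annual.mean(),
        "min_7day_90pct": np.percentile(min7, 10),  # exceeded in ~90% of years
    }

rng = np.random.default_rng(2)
flows = rng.lognormal(mean=2.0, sigma=0.5, size=20 * 365)  # 20 years of daily flow
stats = flow_statistics(flows)
print(stats)
```

In practice, bias in the simulated hydrograph propagates directly into every statistic computed this way, which is the uncertainty issue the study quantifies.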
Nonlinear Statistical Estimation with Numerical Maximum Likelihood
1974-10-01
probably most directly attributable to the speed, precision and compactness of the linear programming algorithm exercised; the mutual primal-dual ... discriminant analysis is to classify the individual as a member of π1 or π2 according to the relative ... Introduction to the Dissertation; Introduction to Statistical Estimation Theory; Choice of Estimator: Density Functions; Choice of Estimator
NASA Technical Reports Server (NTRS)
Tilton, J. C.; Swain, P. H. (Principal Investigator); Vardeman, S. B.
1981-01-01
A key input to a statistical classification algorithm, which exploits the tendency of certain ground cover classes to occur more frequently in some spatial context than in others, is a statistical characterization of the context: the context distribution. An unbiased estimator of the context distribution is discussed which, besides having the advantage of statistical unbiasedness, has the additional advantage over other estimation techniques of being amenable to an adaptive implementation in which the context distribution estimate varies according to local contextual information. Results from applying the unbiased estimator to the contextual classification of three real LANDSAT data sets are presented and contrasted with results from non-contextual classifications and from contextual classifications utilizing other context distribution estimation techniques.
Study on Hybrid Image Search Technology Based on Texts and Contents
NASA Astrophysics Data System (ADS)
Wang, H. T.; Ma, F. L.; Yan, C.; Pan, H.
2018-05-01
Text-based and content-based image search were first studied separately. For text-based search, an image feature extraction method integrating statistical and topic features was proposed, in view of the limitation of extracting keywords from the statistical features of words alone. For content-based search, a search-by-image method based on multi-feature fusion was proposed, in view of the imprecision of searching by a single feature. A layered search method, relying primarily on text-based search and secondarily on content-based search, was then proposed in view of the differences between the two methods and the difficulty of fusing them directly. The feasibility and effectiveness of the hybrid search algorithm were experimentally verified.
Assimilating NOAA SST data into BSH operational circulation model for North and Baltic Seas
NASA Astrophysics Data System (ADS)
Losa, Svetlana; Schroeter, Jens; Nerger, Lars; Janjic, Tijana; Danilov, Sergey; Janssen, Frank
A data assimilation (DA) system is developed for the BSH operational circulation model in order to improve forecasts of current velocities, sea surface height, temperature and salinity in the North and Baltic Seas. Assimilated data are NOAA sea surface temperature (SST) data for the period 01.10.07-30.09.08. All data assimilation experiments are based on the implementation of one of the so-called statistical DA methods, the Singular Evolutive Interpolated Kalman (SEIK) filter, with different ways of prescribing the assumed model and data error statistics. Results of the experiments will be shown and compared against each other. Hydrographic data from MARNET stations and sea level at a series of tide gauges are used as independent information to validate the data assimilation system. Keywords: Operational Oceanography and forecasting
Seeking health information online: does Wikipedia matter?
Laurent, Michaël R; Vickers, Tim J
2009-01-01
OBJECTIVE To determine the significance of the English Wikipedia as a source of online health information. DESIGN The authors measured Wikipedia's ranking on general Internet search engines by entering keywords from MedlinePlus, NHS Direct Online, and the National Organization of Rare Diseases as queries into search engine optimization software. We assessed whether article quality influenced this ranking. The authors tested whether traffic to Wikipedia coincided with epidemiological trends and news of emerging health concerns, and how it compares to MedlinePlus. MEASUREMENTS Cumulative incidence and average position of Wikipedia compared to other Web sites among the first 20 results on general Internet search engines (Google, Google UK, Yahoo, and MSN), and page view statistics for selected Wikipedia articles and MedlinePlus pages. RESULTS Wikipedia ranked among the first ten results in 71-85% of search engines and keywords tested. Wikipedia surpassed MedlinePlus and NHS Direct Online (except for queries from the latter on Google UK), and ranked higher with quality articles. Wikipedia ranked highest for rare diseases, although its incidence in several categories decreased. Page views increased parallel to the occurrence of 20 seasonal disorders and news of three emerging health concerns. Wikipedia articles were viewed more often than MedlinePlus Topic pages (p = 0.001) but for MedlinePlus Encyclopedia pages, the trend was not significant (p = 0.07-0.10). CONCLUSIONS Based on its search engine ranking and page view statistics, the English Wikipedia is a prominent source of online health information compared to the other online health information providers studied.
Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang
2009-01-01
To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment, ROS was applied to a cadmium residual data set from global food contaminant monitoring; the mean residual was estimated using SAS programming and compared with the results from substitution methods. The results show that ROS clearly performs better than substitution methods, being robust and convenient for subsequent analysis. Regression on order statistics is worth adopting, but more effort should be devoted to the details of applying the method.
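The ROS idea: fit a regression of log(detected values) against normal quantiles at their plotting positions, impute the censored (nondetect) values from the fitted line at the lowest ranks, and then summarize all values together. The following single-detection-limit sketch uses made-up residue numbers, not the study's data:

```python
import math
from statistics import NormalDist, mean

def ros_mean(detects, n_nondetect):
    """Regression-on-order-statistics estimate of the mean when
    n_nondetect values are censored below a single detection limit."""
    n = len(detects) + n_nondetect
    # Plotting positions for all ranks; the lowest ranks are the nondetects.
    pp = [(i + 0.5) / n for i in range(n)]
    z = [NormalDist().inv_cdf(p) for p in pp]
    detects_sorted = sorted(detects)
    logs = [math.log(v) for v in detects_sorted]
    z_det = z[n_nondetect:]
    # Least-squares fit log(v) = a + b * z over the detected portion.
    zb, lb = mean(z_det), mean(logs)
    b = sum((zi - zb) * (li - lb) for zi, li in zip(z_det, logs)) / \
        sum((zi - zb) ** 2 for zi in z_det)
    a = lb - b * zb
    # Impute the censored values from the fitted line at the low ranks.
    imputed = [math.exp(a + b * zi) for zi in z[:n_nondetect]]
    return mean(imputed + detects_sorted)

# Six detected residues plus three nondetects below a 0.05 detection limit:
est = ros_mean([0.06, 0.08, 0.11, 0.15, 0.22, 0.31], 3)
print(est)
```

Unlike substitution (e.g. replacing nondetects with half the detection limit), the imputed values follow the fitted distribution, which is why ROS is more robust for downstream analysis.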
Alternative Statistical Frameworks for Student Growth Percentile Estimation
ERIC Educational Resources Information Center
Lockwood, J. R.; Castellano, Katherine E.
2015-01-01
This article suggests two alternative statistical approaches for estimating student growth percentiles (SGP). The first is to estimate percentile ranks of current test scores conditional on past test scores directly, by modeling the conditional cumulative distribution functions, rather than indirectly through quantile regressions. This would…
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
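For reference, the kappa point estimate whose variance the paper studies is, for a 2x2 matched-pair agreement table, the usual chance-corrected agreement; the clustered variance estimator itself is beyond a short sketch. Counts below are illustrative:

```python
def kappa(n11, n12, n21, n22):
    """Kappa for a 2x2 matched-pair table: counts of pairs where the two
    procedures rate yes/yes, yes/no, no/yes, no/no."""
    n = n11 + n12 + n21 + n22
    po = (n11 + n22) / n                                 # observed agreement
    p_yes1, p_yes2 = (n11 + n12) / n, (n11 + n21) / n    # marginal "yes" rates
    pe = p_yes1 * p_yes2 + (1 - p_yes1) * (1 - p_yes2)   # chance agreement
    return (po - pe) / (1 - pe)

print(kappa(40, 5, 10, 45))
```

With clustered pairs, this point estimate is unchanged; what the proposed method corrects is its variance, since pairs within a cluster are not independent.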
Should Security Researchers Experiment More and Draw More Inferences?
2011-08-01
knowledge would be enormous. To obtain a large and representative sample of keystroke-dynamics research papers, we consulted the IEEE Xplore database ... IEEE Xplore are similar to those published elsewhere), these confidence intervals estimate the regions where those true percentages would lie with 95 ... of articles and conference proceedings published by the IEEE, to which our university maintains a subscription. We conducted two keyword searches for
Treatment of Vestibular Dysfunction Using a Portable Stimulator
2017-04-01
AWARD NUMBER: W81XWH-14-2-0012. TITLE: Treatment of Vestibular Dysfunction Using a Portable Stimulator. PRINCIPAL INVESTIGATOR: Jorge M. Serrador (jorge.serrador@rutgers.edu).
2006-08-21
Dynamic Testing of In-Situ Composite Floors and Evaluation of Vibration Serviceability Using the Finite Element Method, by Anthony R. Barrett ... Setareh, Alfred L. Wicks. 21 August 2006, Blacksburg, VA. Keywords: vibration, floor, serviceability, walking, modal analysis, fundamental frequency.
Ries, Kernell G.; Eng, Ken
2010-01-01
The U.S. Geological Survey, in cooperation with the Maryland Department of the Environment, operated a network of 20 low-flow partial-record stations during 2008 in a region that extends from southwest of Baltimore to the northeastern corner of Maryland to obtain estimates of selected streamflow statistics at the station locations. The study area is expected to face a substantial influx of new residents and businesses as a result of military and civilian personnel transfers associated with the Federal Base Realignment and Closure Act of 2005. The estimated streamflow statistics, which include monthly 85-percent duration flows, the 10-year recurrence-interval minimum base flow, and the 7-day, 10-year low flow, are needed to provide a better understanding of the availability of water resources in the area to be affected by base-realignment activities. Streamflow measurements collected for this study at the low-flow partial-record stations and measurements collected previously for 8 of the 20 stations were related to concurrent daily flows at nearby index streamgages to estimate the streamflow statistics. Three methods were used to estimate the streamflow statistics and two methods were used to select the index streamgages. Of the three methods used to estimate the streamflow statistics, two of them--the Moments and MOVE1 methods--rely on correlating the streamflow measurements at the low-flow partial-record stations with concurrent streamflows at nearby, hydrologically similar index streamgages to determine the estimates. These methods, recommended for use by the U.S. Geological Survey, generally require about 10 streamflow measurements at the low-flow partial-record station. The third method transfers the streamflow statistics from the index streamgage to the partial-record station based on the average of the ratios of the measured streamflows at the partial-record station to the concurrent streamflows at the index streamgage. 
This method can be used with as few as one pair of streamflow measurements made on a single streamflow recession at the low-flow partial-record station, although additional pairs of measurements will increase the accuracy of the estimates. Errors associated with the two correlation methods generally were lower than the errors associated with the flow-ratio method, but the advantages of the flow-ratio method are that it can produce reasonably accurate estimates from streamflow measurements much faster and at lower cost than estimates obtained using the correlation methods. The two index-streamgage selection methods were (1) selection based on the highest correlation coefficient between the low-flow partial-record station and the index streamgages, and (2) selection based on Euclidean distance, where the Euclidean distance was computed as a function of geographic proximity and the basin characteristics: drainage area, percentage of forested area, percentage of impervious area, and the base-flow recession time constant, t. Method 1 generally selected index streamgages that were significantly closer to the low-flow partial-record stations than method 2. The errors associated with the estimated streamflow statistics generally were lower for method 1 than for method 2, but the differences were not statistically significant. The flow-ratio method for estimating streamflow statistics at low-flow partial-record stations was shown to be independent from the two correlation-based estimation methods. As a result, final estimates were determined for eight low-flow partial-record stations by weighting estimates from the flow-ratio method with estimates from one of the two correlation methods according to the respective variances of the estimates. Average standard errors of estimate for the final estimates ranged from 90.0 to 7.0 percent, with an average value of 26.5 percent. 
Average standard errors of estimate for the weighted estimates were, on average, 4.3 percent less than the best average standard errors of estimate.
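The variance-based weighting of the flow-ratio and correlation estimates described above is standard inverse-variance combination. A minimal sketch (the numbers are illustrative, not from the report):

```python
def weighted_estimate(est_a, var_a, est_b, var_b):
    """Combine two independent estimates of the same streamflow statistic,
    weighting each inversely to its variance."""
    w_a = var_b / (var_a + var_b)
    combined = w_a * est_a + (1 - w_a) * est_b
    combined_var = (var_a * var_b) / (var_a + var_b)
    return combined, combined_var

# A flow-ratio estimate (higher variance) combined with a correlation
# (e.g. MOVE.1-type) estimate of the same statistic:
est, var = weighted_estimate(12.0, 4.0, 10.0, 1.0)
print(est, var)  # pulled toward the lower-variance estimate
```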
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; Kim, Jin-Guk; Jung, Il-Won
2015-04-01
It must be acknowledged that applications of rainfall-runoff models to simulate rainfall-runoff processes are successful in gauged watersheds. However, there still remain some issues that need to be further discussed. In particular, the quantitative representation of the nonstationarity in basin response (e.g. concentration time, storage coefficient and roughness), as well as its treatment in ungauged watersheds, needs to be studied. In this regard, this study aims to investigate nonstationarity in basin response so as to potentially provide useful information for simulating runoff processes in ungauged watersheds. For this purpose, the HEC-1 rainfall-runoff model was mainly utilized. In addition, this study combined the HEC-1 model with a Bayesian statistical model to estimate the uncertainty of the parameters, an approach called Bayesian HEC-1 (BHEC-1). The proposed rainfall-runoff model is applied to various catchments with various rainfall patterns to understand nonstationarities in catchment response. Further discussion addresses the nonstationarity in catchment response and possible regionalization of the parameters for ungauged watersheds. KEYWORDS: Nonstationary, Catchment response, Uncertainty, Bayesian. Acknowledgement: This research was supported by a Grant (13SCIPA01) from the Smart Civil Infrastructure Research Program funded by the Ministry of Land, Infrastructure and Transport (MOLIT) of the Korea government and the Korea Agency for Infrastructure Technology Advancement (KAIA).
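Coupling a runoff model with Bayesian parameter-uncertainty estimation, as in BHEC-1, can be illustrated with a toy example: random-walk Metropolis sampling of a single storage coefficient of a linear-reservoir recession. This is a deliberately simplified stand-in, not the actual HEC-1 coupling:

```python
import math
import random

def log_likelihood(k, q_obs, q0, sigma=0.05):
    """Gaussian log-likelihood of observed recession flows under a
    linear-reservoir model q(t) = q0 * exp(-t / k)."""
    return -0.5 * sum(((q - q0 * math.exp(-t / k)) / sigma) ** 2
                      for t, q in enumerate(q_obs))

def metropolis(q_obs, q0, n_iter=20000, step=0.2, seed=3):
    """Random-walk Metropolis sampling of the storage coefficient k,
    with a flat prior on k > 0."""
    rng = random.Random(seed)
    k, ll = 5.0, log_likelihood(5.0, q_obs, q0)
    samples = []
    for _ in range(n_iter):
        k_new = k + rng.gauss(0.0, step)
        if k_new > 0:
            ll_new = log_likelihood(k_new, q_obs, q0)
            if math.log(rng.random()) < ll_new - ll:
                k, ll = k_new, ll_new
        samples.append(k)
    return samples[n_iter // 2:]          # discard burn-in

# Synthetic recession generated with true k = 8.0:
true_k, q0 = 8.0, 10.0
data_rng = random.Random(4)
obs = [q0 * math.exp(-t / true_k) + data_rng.gauss(0.0, 0.05) for t in range(30)]
post = metropolis(obs, q0)
print(sum(post) / len(post))  # posterior mean near the true value
```

The posterior sample, rather than a single calibrated value, is what allows catchment response (and its nonstationarity) to be described with uncertainty bounds.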
NASA Astrophysics Data System (ADS)
Stevens, T.; Olsen, L. M.; Ritz, S.; Morahan, M.; Aleman, A.; Cepero, L.; Gokey, C.; Holland, M.; Cordova, R.; Areu, S.; Cherry, T.; Tran-Ho, H.
2012-12-01
Discovering Earth science data can be complex if the catalog holding the data lacks structure. Controlled keyword vocabularies within metadata catalogs can improve data discovery. NASA's Global Change Master Directory (GCMD) Keyword Management System (KMS) is a recently released RESTful web service for managing and providing access to controlled keywords (science keywords, service keywords, platforms, instruments, providers, locations, projects, data resolution, etc.). The KMS introduces a completely new paradigm for the use and management of the keywords and allows access to these keywords as SKOS Concepts (RDF), OWL, standard XML, and CSV. A universally unique identifier (UUID) is automatically assigned to each keyword, which uniquely identifies each concept and its associated information. A component of the KMS is the keyword manager, an internal tool that allows GCMD science coordinators to manage concepts. This includes adding, modifying, and deleting broader, narrower, or related concepts and associated definitions. The controlled keyword vocabulary represents over 20 years of effort and collaboration with the Earth science community. The maintenance, stability, and ongoing vigilance in maintaining mutually exclusive and parallel keyword lists is important for a "normalized" search and discovery, and provides a unique advantage for the science community. Modifications and additions are made based on community suggestions and internal review. To help maintain keyword integrity, science keyword rules and procedures for modification of keywords were developed. This poster will highlight the use of the KMS as a beneficial service for the stewardship and access of the GCMD keywords. Users will learn how to access the KMS and utilize the keywords. Best practices for managing an extensive keyword hierarchy will also be discussed.
Participants will learn the process for making keyword suggestions, which subsequently help in building a controlled keyword vocabulary to improve earth science data discovery and access.
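To illustrate the UUID-per-concept and broader/narrower structure the KMS exposes, here is an offline sketch that builds such a hierarchy from a CSV export. The column layout and sample rows are assumptions for illustration, not the actual KMS response format:

```python
import csv
import io
import uuid

# Illustrative CSV: each row is a category > topic > term path.
sample = """category,topic,term
EARTH SCIENCE,ATMOSPHERE,PRECIPITATION
EARTH SCIENCE,ATMOSPHERE,CLOUDS
EARTH SCIENCE,OCEANS,SEA SURFACE TEMPERATURE
"""

def build_hierarchy(csv_text):
    """Assign a UUID to every concept at every depth and record
    broader -> narrower links."""
    ids, narrower = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = (row["category"], row["topic"], row["term"])
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            if node not in ids:
                ids[node] = str(uuid.uuid4())   # one stable id per concept
            if depth > 1:
                narrower.setdefault(path[:depth - 1], set()).add(node)
    return ids, narrower

ids, narrower = build_hierarchy(sample)
print(len(ids))  # distinct concepts across all depths
print(sorted(n[-1] for n in narrower[("EARTH SCIENCE", "ATMOSPHERE")]))
```

The real service returns these relationships as SKOS `broader`/`narrower` properties; the point of the sketch is only the data model (stable UUIDs plus a strict hierarchy).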
Estimating chronic disease rates in Canada: which population-wide denominator to use?
Ellison, J; Nagamuthu, C; Vanderloo, S; McRae, B; Waters, C
2016-10-01
Chronic disease rates are produced from the Public Health Agency of Canada's Canadian Chronic Disease Surveillance System (CCDSS) using administrative health data from provincial/territorial health ministries. Denominators for these rates are based on estimates of populations derived from health insurance files. However, these data may not be accessible to all researchers. Another source for population size estimates is the Statistics Canada census. The purpose of our study was to calculate the major differences between the CCDSS and Statistics Canada's population denominators and to identify the sources or reasons for the potential differences between these data sources. We compared the 2009 denominators from the CCDSS and Statistics Canada. The CCDSS denominator was adjusted for the growth components (births, deaths, emigration and immigration) from Statistics Canada's census data. The unadjusted CCDSS denominator was 34 429 804, 3.2% higher than Statistics Canada's estimate of population in 2009. After the CCDSS denominator was adjusted for the growth components, the difference between the two estimates was reduced to 431 323 people, a difference of 1.3%. The CCDSS overestimates the population relative to Statistics Canada overall. The largest difference between the two estimates was from the migrant growth component, while the smallest was from the emigrant component. By using data descriptions by data source, researchers can make decisions about which population to use in their calculations of disease frequency.
Methods for estimating low-flow statistics for Massachusetts streams
Ries, Kernell G.; Friesz, Paul J.
2000-01-01
Methods and computer software are described in this report for determining flow duration, low-flow frequency statistics, and August median flows. These low-flow statistics can be estimated for unregulated streams in Massachusetts using different methods depending on whether the location of interest is at a streamgaging station, a low-flow partial-record station, or an ungaged site where no data are available. Low-flow statistics for streamgaging stations can be estimated using standard U.S. Geological Survey methods described in the report. The MOVE.1 mathematical method and a graphical correlation method can be used to estimate low-flow statistics for low-flow partial-record stations. The MOVE.1 method is recommended when the relation between measured flows at a partial-record station and daily mean flows at a nearby, hydrologically similar streamgaging station is linear, and the graphical method is recommended when the relation is curved. Equations are presented for computing the variance and equivalent years of record for estimates of low-flow statistics for low-flow partial-record stations when either a single or multiple index stations are used to determine the estimates. The drainage-area ratio method or regression equations can be used to estimate low-flow statistics for ungaged sites where no data are available. The drainage-area ratio method is generally as accurate as or more accurate than regression estimates when the drainage-area ratio for an ungaged site is between 0.3 and 1.5 times the drainage area of the index data-collection site. Regression equations were developed to estimate the natural, long-term 99-, 98-, 95-, 90-, 85-, 80-, 75-, 70-, 60-, and 50-percent duration flows; the 7-day, 2-year and the 7-day, 10-year low flows; and the August median flow for ungaged sites in Massachusetts. Streamflow statistics and basin characteristics for 87 to 133 streamgaging stations and low-flow partial-record stations were used to develop the equations. 
The streamgaging stations had from 2 to 81 years of record, with a mean record length of 37 years. The low-flow partial-record stations had from 8 to 36 streamflow measurements, with a median of 14 measurements. All basin characteristics were determined from digital map data. The basin characteristics that were statistically significant in most of the final regression equations were drainage area, the area of stratified-drift deposits per unit of stream length plus 0.1, mean basin slope, and an indicator variable that was 0 in the eastern region and 1 in the western region of Massachusetts. The equations were developed by use of weighted-least-squares regression analyses, with weights assigned proportional to the years of record and inversely proportional to the variances of the streamflow statistics for the stations. Standard errors of prediction ranged from 70.7 to 17.5 percent for the equations to predict the 7-day, 10-year low flow and 50-percent duration flow, respectively. The equations are not applicable for use in the Southeast Coastal region of the State, or where basin characteristics for the selected ungaged site are outside the ranges of those for the stations used in the regression analyses. A World Wide Web application was developed that provides streamflow statistics for data collection stations from a data base and for ungaged sites by measuring the necessary basin characteristics for the site and solving the regression equations. Output provided by the Web application for ungaged sites includes a map of the drainage-basin boundary determined for the site, the measured basin characteristics, the estimated streamflow statistics, and 90-percent prediction intervals for the estimates. An equation is provided for combining regression and correlation estimates to obtain improved estimates of the streamflow statistics for low-flow partial-record stations. 
An equation is also provided for combining regression and drainage-area ratio estimates to obtain improved estimates.
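The drainage-area ratio method described in the report above scales a flow statistic from an index streamgage to an ungaged site by the ratio of drainage areas. A minimal sketch with made-up numbers (the function name, values, and range check are illustrative, not from the report):

```python
# Drainage-area ratio estimate: the low-flow statistic at an ungaged site is
# the index-gage value scaled by the ratio of drainage areas. The report
# recommends the method only when the ungaged drainage area is roughly
# 0.3 to 1.5 times that of the index site.

def drainage_area_ratio_estimate(q_index, area_index, area_ungaged,
                                 low=0.3, high=1.5):
    """Estimate a streamflow statistic at an ungaged site by area scaling."""
    ratio = area_ungaged / area_index
    if not (low <= ratio <= high):
        raise ValueError(f"area ratio {ratio:.2f} outside recommended range")
    return q_index * ratio

# Example: a 7-day, 10-year low flow of 12.0 ft^3/s at an index gage draining
# 85 mi^2, transferred to an ungaged site draining 60 mi^2.
q_ungaged = drainage_area_ratio_estimate(12.0, 85.0, 60.0)
print(round(q_ungaged, 2))  # 8.47
```

The range check mirrors the report's guidance that the method degrades when the two basins differ too much in size.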
Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations.
Szarka, Arpad Z; Hayworth, Carol G; Ramanarayanan, Tharacad S; Joseph, Robert S I
2018-06-26
The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaptation of established statistical techniques, the Kaplan-Meier estimator (K-M), robust regression on order statistics (ROS), and the maximum-likelihood estimator (MLE), to quantify pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data in the medical and environmental sciences. This work presents a case study in which data on thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from these statistical techniques were evaluated and compared with commonly used simple substitution methods for determining summary statistics. The MLE was found to be the most appropriate statistical method for analyzing this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
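The MLE approach the abstract favors treats non-detects as contributing the probability mass below the limit of detection rather than an imputed value. A minimal sketch (not the authors' code; the synthetic data and function name are illustrative) for a left-censored lognormal fit:

```python
# MLE for left-censored lognormal residue data: detected values contribute
# the log-density; non-detects contribute the log-CDF at the limit of
# detection (LOD). Work on the log scale, where the model is normal.
import numpy as np
from scipy import optimize, stats

def censored_lognormal_mle(values, detected, lod):
    """values: residues (non-detects reported at the LOD); detected: bool mask."""
    logs = np.log(values)
    def nll(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)                 # keep sigma positive
        ll = stats.norm.logpdf(logs[detected], mu, sigma).sum()
        ll += stats.norm.logcdf(np.log(lod), mu, sigma) * (~detected).sum()
        return -ll
    res = optimize.minimize(nll, x0=[logs.mean(), 0.0], method="Nelder-Mead")
    mu, sigma = res.x[0], np.exp(res.x[1])
    return mu, sigma, np.exp(mu + sigma**2 / 2)   # lognormal mean

rng = np.random.default_rng(0)
true = rng.lognormal(mean=-1.0, sigma=0.8, size=500)
lod = 0.2
obs = np.maximum(true, lod)                       # non-detects reported at LOD
mu, sigma, mean_est = censored_lognormal_mle(obs, true >= lod, lod)
print(round(mu, 2), round(sigma, 2))
```

With roughly a fifth of the sample censored, the fit should recover the generating parameters far better than substituting LOD/2 for non-detects would.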
Hu, Jiande; Dong, Yonghai; Chen, Xiaodan; Liu, Yun; Ma, Dongyang; Liu, Xiaoyun; Zheng, Ruizhi; Mao, Xiangqun; Chen, Ting; He, Wei
2015-08-01
According to the World Health Organization, for every completed suicide there are at least 20 suicide attempts. In the last decade, despite increasing awareness of suicide attempts among adolescents in China, there has been no comprehensive system reporting vital statistics; consequently, the prevalence of suicide attempts reported across studies has varied widely. Therefore, the purpose of this study was to provide the first meta-analysis of cross-sectional studies of suicide attempts to fill this gap. Two reviewers independently screened potentially relevant cross-sectional studies of suicide attempts through the PubMed-Medline, Embase, Wanfang Data, Chongqing VIP and Chinese National Knowledge Infrastructure databases using the core terms 'suicid*'/'suicide attempt*'/'attempted suicide' and 'adolescen*'/'youth'/'child*'/'student*' and 'China'/'Chinese' in the article titles, abstracts and keywords. The chi-square-based Q test and the I² statistic were used to assess heterogeneity. A forest plot was used to display results graphically. Potential publication bias was assessed with a funnel plot and Begg's and Egger's tests. In total, 43 studies with 200,124 participants met the eligibility criteria. The pooled prevalence of suicide attempts among Chinese adolescents was 2.94% (95% CI: 2.53%-3.41%). Substantial heterogeneity in prevalence estimates was revealed. Subgroup analyses showed that the prevalence was 2.50% (95% CI: 2.08%-3.01%) for males and 3.17% (95% CI: 2.56%-3.91%) for females. In sum, across the literature, the prevalence of suicide attempts among Chinese adolescents was moderate compared with other countries around the world. Necessary preventive measures should be set out in the future.
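The Q test and I² statistic used above have a simple closed form: Q is the inverse-variance-weighted sum of squared deviations from the fixed-effect pooled value, and I² = max(0, (Q − df)/Q) × 100. A sketch with made-up survey data (not the study's 43 studies), working on the logit-prevalence scale:

```python
# Cochran's Q and the I^2 heterogeneity statistic for a set of study
# estimates with known within-study variances.
import math

def q_and_i2(estimates, variances):
    w = [1.0 / v for v in variances]
    theta = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)  # fixed-effect pooled value
    q = sum(wi * (e - theta) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return theta, q, i2

# Hypothetical logit-prevalence estimates from five surveys:
est = [-3.5, -3.4, -3.6, -3.1, -3.8]
var = [0.010, 0.015, 0.008, 0.020, 0.012]
pooled, q, i2 = q_and_i2(est, var)
prevalence = 1 / (1 + math.exp(-pooled))    # back-transform to a proportion
print(round(q, 1), round(i2, 1))
```

An I² well above 50%, as here, is the kind of "substantial heterogeneity" that motivates the subgroup analyses the abstract reports.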
Park, Chan Hyuk; Kim, Eun Hye; Roh, Yun Ho; Kim, Ha Yan; Lee, Sang Kil
2014-01-01
Although many case reports have described patients with proton pump inhibitor (PPI)-induced hypomagnesemia, the impact of PPI use on hypomagnesemia has not been fully clarified through comparative studies. We aimed to evaluate the association between the use of PPI and the risk of developing hypomagnesemia by conducting a systematic review with meta-analysis. We conducted a systematic search of MEDLINE, EMBASE, and the Cochrane Library using the primary keywords "proton pump," "dexlansoprazole," "esomeprazole," "ilaprazole," "lansoprazole," "omeprazole," "pantoprazole," "rabeprazole," "hypomagnesemia," "hypomagnesaemia," and "magnesium." Studies were included if they evaluated the association between PPI use and hypomagnesemia and reported relative risks or odds ratios or provided data for their estimation. Pooled odds ratios with 95% confidence intervals were calculated using the random effects model. Statistical heterogeneity was assessed with Cochran's Q test and the I² statistic. Nine studies including 115,455 patients were analyzed. The median Newcastle-Ottawa quality score for the included studies was seven (range, 6-9). Among patients taking PPIs, the median proportion of patients with hypomagnesemia was 27.1% (range, 11.3-55.2%) across all included studies. Among patients not taking PPIs, the median proportion of patients with hypomagnesemia was 18.4% (range, 4.3-52.7%). On meta-analysis, the pooled odds ratio for PPI use was found to be 1.775 (95% confidence interval 1.077-2.924). Significant heterogeneity was identified using Cochran's Q test (df = 7, P<0.001, I² = 98.0%). PPI use may increase the risk of hypomagnesemia. However, significant heterogeneity among the included studies prevented us from reaching a definitive conclusion.
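The random-effects pooling described above is commonly done with the DerSimonian-Laird estimator: a between-study variance tau² is estimated from Q, the study weights are re-inflated by it, and the pooled log odds ratio is exponentiated. A sketch with five made-up studies (not the nine in the meta-analysis):

```python
# DerSimonian-Laird random-effects pooling on the log odds-ratio scale.
import math

def dersimonian_laird(log_or, variances):
    w = [1 / v for v in variances]
    sw = sum(w)
    mean_fe = sum(wi * y for wi, y in zip(w, log_or)) / sw
    q = sum(wi * (y - mean_fe) ** 2 for wi, y in zip(w, log_or))
    df = len(log_or) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_re = [1 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_or)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
    return math.exp(pooled), (math.exp(lo), math.exp(hi)), tau2

# Five illustrative studies: log odds ratios and their variances.
or_pooled, ci, tau2 = dersimonian_laird(
    [0.7, 0.3, 1.1, 0.1, 0.6], [0.04, 0.06, 0.05, 0.09, 0.07])
print(round(or_pooled, 2), round(ci[0], 2), round(ci[1], 2))
```

When tau² is truncated to zero the weights collapse to the fixed-effect case, which is why heterogeneous data sets like this one widen the confidence interval.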
NASA Astrophysics Data System (ADS)
Swade, D. A.; Gardner, L.; Hopkins, E.; Kimball, T.; Lezon, K.; Rose, J.; Shiao, B.
STScI has undertaken a project to place all HST keyword information in a single source, the keyword database, and to make this keyword information accessible to all HST users through the keyword dictionary, a WWW interface to the keyword database.
Samadzadeh, Gholam Reza; Rigi, Tahereh; Ganjali, Ali Reza
2013-01-01
Surveying the most recent and valuable information on the internet has become vital for researchers and scholars: every day, thousands and perhaps millions of scientific works are published as digital resources, and researchers cannot ignore this great resource when looking for related documents for their literature search, documents that may not be found in any library. Given the variety of documents presented on the internet, search engines are among the most effective tools for finding information. The aim of this study was to evaluate three criteria, recall, preciseness and importance, for four search engines, PubMed, Science Direct, Google Scholar and the federated search of the Iranian National Medical Digital Library, in the field of addiction (prevention and treatment), in order to select the most effective search engine for literature research. This was a cross-sectional study in which four popular search engines in the medical sciences were evaluated. Keywords were selected using Medical Subject Headings (MeSH). The keywords were entered into each search engine and the first 10 entries of each search were evaluated. Direct observation was used for data collection, and the data were analyzed with descriptive statistics (number, percentage and mean) and inferential statistics, one-way analysis of variance (ANOVA) and post hoc Tukey tests, in SPSS 15. A P value < 0.05 was considered statistically significant. The results showed that the search engines performed differently with regard to the evaluated criteria: P values of 0.004 for preciseness and 0.002 for importance indicated significant differences among the search engines. PubMed, Science Direct and Google Scholar were the best in recall, preciseness and importance, respectively. 
Because literature research is one of the most important stages of research, researchers, especially Substance-Related Disorders scholars, should use the search engines with the best recall, preciseness and importance in their subject field to reach desirable results, rather than depending on just one search engine.
Confidence Intervals for Effect Sizes: Applying Bootstrap Resampling
ERIC Educational Resources Information Center
Banjanovic, Erin S.; Osborne, Jason W.
2016-01-01
Confidence intervals for effect sizes (CIES) provide readers with an estimate of the strength of a reported statistic as well as the relative precision of the point estimate. These statistics offer more information and context than null hypothesis statistical testing. Although confidence intervals have been recommended by scholars for many years,…
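The bootstrap resampling the title refers to can be sketched for a common effect size, Cohen's d, with a percentile interval. The data and function names below are illustrative, not from the ERIC record:

```python
# Percentile-bootstrap confidence interval for Cohen's d: resample each
# group with replacement, recompute d, and take the empirical percentiles.
import numpy as np

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

def bootstrap_ci(a, b, n_boot=5000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    ds = [cohens_d(rng.choice(a, len(a)), rng.choice(b, len(b)))
          for _ in range(n_boot)]
    return np.percentile(ds, [100 * alpha / 2, 100 * (1 - alpha / 2)])

group_a = np.random.default_rng(1).normal(0.5, 1.0, 60)   # treatment
group_b = np.random.default_rng(2).normal(0.0, 1.0, 60)   # control
lo, hi = bootstrap_ci(group_a, group_b)
print(round(cohens_d(group_a, group_b), 2), round(lo, 2), round(hi, 2))
```

Reporting the interval [lo, hi] alongside the point estimate conveys both the strength and the precision the abstract emphasizes.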
NASA Technical Reports Server (NTRS)
Barth, Timothy J.
2016-01-01
This chapter discusses the ongoing development of combined uncertainty and error bound estimates for computational fluid dynamics (CFD) calculations subject to imposed random parameters and random fields. An objective of this work is the construction of computable error bound formulas for output uncertainty statistics that guide CFD practitioners in systematically determining how accurately CFD realizations should be approximated and how accurately uncertainty statistics should be approximated for output quantities of interest. Formal error bounds formulas for moment statistics that properly account for the presence of numerical errors in CFD calculations and numerical quadrature errors in the calculation of moment statistics have been previously presented in [8]. In this past work, hierarchical node-nested dense and sparse tensor product quadratures are used to calculate moment statistics integrals. In the present work, a framework has been developed that exploits the hierarchical structure of these quadratures in order to simplify the calculation of an estimate of the quadrature error needed in error bound formulas. When signed estimates of realization error are available, this signed error may also be used to estimate output quantity of interest probability densities as a means to assess the impact of realization error on these density estimates. Numerical results are presented for CFD problems with uncertainty to demonstrate the capabilities of this framework.
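The hierarchical, node-nested quadrature idea above can be illustrated in one dimension: evaluate a moment statistic with a nested rule at two refinement levels and use the level difference as a computable estimate of the quadrature error. This is only a sketch of the general idea under simplifying assumptions (a composite trapezoid rule on [0, 1], a scalar quantity of interest), not the chapter's formulas:

```python
# Moment statistic of an output quantity of interest with a two-level
# quadrature-error estimate. Levels of the trapezoid rule on equispaced
# nodes are node-nested: the fine level reuses every coarse node.
import numpy as np

def trapezoid_moment(f, level):
    n = 2 ** level                                # 2**level + 1 nodes on [0, 1]
    x = np.linspace(0.0, 1.0, n + 1)
    y = f(x)
    return (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1]) / n

def moment_with_error_estimate(f, level):
    coarse = trapezoid_moment(f, level - 1)
    fine = trapezoid_moment(f, level)
    return fine, abs(fine - coarse)               # value, quadrature-error estimate

# Output quantity of interest as a function of a uniform random parameter:
qoi = lambda xi: np.exp(-2.0 * xi)
mean, err = moment_with_error_estimate(qoi, level=6)
print(mean, err)
```

In the chapter's setting the same difference-of-levels device supplies the quadrature-error term that enters the combined uncertainty and numerical-error bound.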
Garg, Rakesh
2016-09-01
The conduct of research requires a systematic approach involving diligent planning and its execution as planned. It comprises various essential predefined components such as aims, population, conduct/technique, outcome and statistical considerations. These need to be objective, reliable and in a repeatable format. Hence, the understanding of the basic aspects of methodology is essential for any researcher. This is a narrative review and focuses on various aspects of the methodology for conduct of a clinical research. The relevant keywords were used for literature search from various databases and from bibliographies of the articles.
Nowcasting and Forecasting the Monthly Food Stamps Data in the US Using Online Search Data
Fantazzini, Dean
2014-01-01
We propose the use of Google online search data for nowcasting and forecasting the number of food stamps recipients. We perform a large out-of-sample forecasting exercise with almost 3000 competing models with forecast horizons up to 2 years ahead, and we show that models including Google search data statistically outperform the competing models at all considered horizons. These results hold also with several robustness checks, considering alternative keywords, a falsification test, different out-of-samples, directional accuracy and forecasts at the state-level. PMID:25369315
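The paper's forecasting comparison can be caricatured with a toy example: score one-step-ahead forecasts from an autoregressive baseline against the same model augmented with a (synthetic) search index, using out-of-sample RMSE. Everything below is simulated for illustration; it is not the paper's data or model set:

```python
# One-step-ahead forecasts of a series from an AR(1) baseline vs. an AR(1)
# augmented with a lagged search index, scored out of sample by RMSE.
import numpy as np

rng = np.random.default_rng(42)
n = 160
search = rng.normal(size=n)                        # stand-in for a Google index
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):                              # series depends on lagged search
    y[t] = 0.6 * y[t - 1] + 0.8 * search[t - 1] + rng.normal(scale=0.5)

def ols_forecasts(use_search, split=120):
    errs = []
    for t in range(split, n):                      # expanding-window refits
        X = [np.ones(t - 1), y[:t - 1]]
        if use_search:
            X.append(search[:t - 1])
        X = np.column_stack(X)
        beta, *_ = np.linalg.lstsq(X, y[1:t], rcond=None)
        x_new = [1.0, y[t - 1]] + ([search[t - 1]] if use_search else [])
        errs.append(y[t] - np.dot(beta, x_new))
    return np.sqrt(np.mean(np.square(errs)))

print(ols_forecasts(False), ols_forecasts(True))   # augmented RMSE should be lower
```

The expanding-window refit at each forecast origin is the standard out-of-sample design the paper's 3000-model exercise scales up.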
Eash, David A.; Barnes, Kimberlee K.
2017-01-01
A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. 
For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. 
Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized-least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to estimate the actual flow conditions if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided. 
StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these seven selected statistics are provided for the streamgage.
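The weighted-least-squares regressions described above weight each streamgage by its record length when fitting log-flow on basin characteristics. A minimal numpy sketch with synthetic data (the coefficients, record lengths, and single predictor are made up; the study's equations use many more basin characteristics):

```python
# Weighted least squares for a low-flow statistic: log-flow regressed on
# log drainage area, with weights proportional to years of record.
import numpy as np

rng = np.random.default_rng(7)
n = 40
log_area = rng.uniform(1.0, 3.5, n)                 # log10 drainage area, mi^2
years = rng.integers(10, 60, n)                     # years of record per gage
log_q = -1.0 + 1.1 * log_area + rng.normal(0, 0.25 / np.sqrt(years / 30), n)

W = np.diag(years.astype(float))                    # weight ∝ record length
X = np.column_stack([np.ones(n), log_area])
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ log_q)
intercept, slope = beta
print(round(intercept, 2), round(slope, 2))

# Predict the statistic for an ungaged 100-mi^2 basin (log10(100) = 2):
print(round(10 ** (intercept + slope * 2.0), 1))    # flow in ft^3/s
```

Longer-record gages get more weight because their computed statistics carry less sampling variance, which is exactly the rationale stated in the abstract.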
Statistical plant set estimation using Schroeder-phased multisinusoidal input design
NASA Technical Reports Server (NTRS)
Bayard, D. S.
1992-01-01
A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
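A Schroeder-phased multisinusoid of the kind the abstract uses places energy only at chosen discrete frequencies while keeping the crest factor (peak/RMS) low. A sketch using one common form of Schroeder's phase rule for flat amplitude spectra (the specific rule and harmonic set here are illustrative, not necessarily the paper's):

```python
# Schroeder-phased multisine: sum of equal-amplitude cosines at chosen
# harmonics, with phases -pi*i*(i-1)/K to suppress constructive peaking.
import numpy as np

def schroeder_multisine(n_samples, harmonics):
    K = len(harmonics)
    t = np.arange(n_samples) / n_samples            # one period, unit length
    u = np.zeros(n_samples)
    for i, k in enumerate(harmonics, start=1):
        phase = -np.pi * i * (i - 1) / K            # Schroeder phase rule
        u += np.cos(2 * np.pi * k * t + phase)
    return u

u = schroeder_multisine(1024, harmonics=range(1, 21))   # 20 sinusoids
crest = np.max(np.abs(u)) / np.sqrt(np.mean(u ** 2))

# Compare with zero phases, where all cosines peak together at t = 0:
zero_phase = sum(np.cos(2 * np.pi * k * np.arange(1024) / 1024)
                 for k in range(1, 21))
crest0 = np.max(np.abs(zero_phase)) / np.sqrt(np.mean(zero_phase ** 2))
print(round(crest, 2), round(crest0, 2))   # Schroeder crest is much lower
```

The low crest factor lets the experimenter inject maximal energy at the computation frequencies without saturating actuators, which is what makes the frequency-domain error statistics tractable.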
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
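For linear models, the online-updating idea above reduces to maintaining the sufficient statistics X'X and X'y across arriving data blocks. A minimal sketch (illustrative class, not the authors' software), with a pseudoinverse to tolerate early rank deficiency from rare-event covariates:

```python
# Online least squares: accumulate X'X and X'y block by block; the running
# estimate equals the full-data fit without storing historical rows.
import numpy as np

class OnlineOLS:
    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)

    def update(self, X_block, y_block):
        self.xtx += X_block.T @ X_block
        self.xty += X_block.T @ y_block

    def coef(self):
        # pinv tolerates rank deficiency in early blocks (rare covariates)
        return np.linalg.pinv(self.xtx) @ self.xty

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=1000)

model = OnlineOLS(p=3)
for start in range(0, 1000, 100):                  # stream in 10 blocks
    model.update(X[start:start + 100], y[start:start + 100])

batch = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(model.coef(), batch))            # True: matches full-data fit
```

The storage cost is O(p²) regardless of how many rows have streamed past, which is the "minimally storage-intensive" property the abstract claims.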
Incorporating GIS and remote sensing for census population disaggregation
NASA Astrophysics Data System (ADS)
Wu, Shuo-Sheng'derek'
Census data are the primary source of demographic data for a variety of research and applications. For confidentiality issues and administrative purposes, census data are usually released to the public in aggregated areal units. In the United States, the smallest census unit is the census block. Due to data aggregation, users of census data may have problems visualizing population distribution within census blocks and estimating population counts for areas not coinciding with census block boundaries. The main purpose of this study is to develop methodology for estimating sub-block areal populations and assessing the estimation errors. The City of Austin, Texas was used as a case study area. Based on tax parcel boundaries and parcel attributes derived from ancillary GIS and remote sensing data, detailed urban land use classes were first classified using a per-field approach. After that, statistical models by land use class were built to infer population density from other predictor variables, including four census demographic statistics (the Hispanic percentage, the married percentage, the unemployment rate, and per capita income) and three physical variables derived from remote sensing images and building footprints vector data (a landscape heterogeneity statistic, a building pattern statistic, and a building volume statistic). In addition to statistical models, deterministic models were proposed to directly infer populations from building volumes and three housing statistics, including the average space per housing unit, the housing unit occupancy rate, and the average household size. After population models were derived or proposed, how well the models predict populations for another set of sample blocks was assessed. The results show that deterministic models were more accurate than statistical models. 
Further, by simulating the base unit for modeling from aggregating blocks, I assessed how well the deterministic models estimate sub-unit-level populations. I also assessed the aggregation effects and the rescaling effects on sub-unit estimates. Lastly, from another set of mixed-land-use sample blocks, a mixed-land-use model was derived and compared with a residential-land-use model. The results of per-field land use classification are satisfactory, with a Kappa accuracy statistic of 0.747. Model assessments by land use show that population estimates for multi-family land use areas have higher errors than those for single-family land use areas, and population estimates for mixed land use areas have higher errors than those for residential land use areas. The assessments of sub-unit estimates using a simulation approach indicate that smaller areas show higher estimation errors, estimation errors do not relate to the base unit size, and rescaling improves all levels of sub-unit estimates.
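The deterministic model described above chains three housing statistics onto building volume. A one-function sketch with illustrative parameter values (not the study's calibrated ones):

```python
# Deterministic population estimate: building volume -> housing units ->
# occupied units -> persons, using average space per unit, occupancy rate,
# and average household size.
def population_from_volume(building_volume_m3, space_per_unit_m3,
                           occupancy_rate, household_size):
    housing_units = building_volume_m3 / space_per_unit_m3
    return housing_units * occupancy_rate * household_size

# A block with 90,000 m^3 of residential building volume, assuming 600 m^3
# per housing unit, 95% occupancy, and 2.5 persons per household:
print(round(population_from_volume(90_000, 600, 0.95, 2.5)))  # 356
```

Because every factor is a measurable or published quantity, no regression fitting is needed, which is one plausible reason the deterministic models outperformed the statistical ones in the assessment.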
Simulating the Snow Water Equivalent and its changing pattern over Nepal
NASA Astrophysics Data System (ADS)
Niroula, S.; Joseph, J.; Ghosh, S.
2016-12-01
Snowfall in the Himalayan region is one of the primary sources of fresh water and accounts for around 10% of the total precipitation of Nepal. Snow water is difficult to estimate at global and regional scales because of the spatial variability associated with rugged topography. This study focuses on simulating Snow Water Equivalent (SWE) with a macroscale hydrologic model, the Variable Infiltration Capacity (VIC) model. Because all of Nepal, including its Himalaya, lies within the catchment of the Ganga River in India, contributing at least 40% of the annual discharge of the Ganges, the model was run over the entire watershed, which also covers parts of Tibet and Bangladesh. Meteorological inputs for 29 years (1979-2007) are drawn from the ERA-Interim and APHRODITE datasets at a horizontal resolution of 0.25 degrees. The analysis examined the temporal variability of SWE in the Himalayan region of Nepal. The model was calibrated against observed streamflows of the tributaries of the Gandaki River in Nepal, which ultimately feed the Ganga. The simulated SWE was then used to estimate streamflow in this river basin. Because Nepal accumulates more snow cover at high altitudes during the monsoon season than in winter, the seasonal fluctuations in SWE that affect streamflow are characterized. Statistical analysis indicates that the model provides fair estimates of SWE and streamflow. Streamflows are sensitive to changes in snow water, which can negatively affect power generation in a country with huge hydroelectric potential. In addition, our results on simulated SWE in the second-largest snow-fed catchment of the country will be helpful for reservoir management, flood forecasting and other water resource management issues. Keywords: Hydrology, Snow Water Equivalent, Variable Infiltration Capacity, Gandaki River Basin, Stream Flow
Does Twitter Trigger Bursts in Signature Collections?
Yamaguchi, Rui; Imoto, Seiya; Kami, Masahiro; Watanabe, Kenji; Miyano, Satoru; Yuji, Koichiro
2013-01-01
Introduction The quantification of social media impacts on societal and political events is a difficult undertaking. The Japanese Society of Oriental Medicine started a signature-collecting campaign to oppose a medical policy of the Government Revitalization Unit to exclude a traditional Japanese medicine, “Kampo,” from the public insurance system. The signature count showed a series of aberrant bursts from November 26 to 29, 2009. In the same interval, the number of messages on Twitter including the keywords “Signature” and “Kampo,” increased abruptly. Moreover, the number of messages on an Internet forum that discussed the policy and called for signatures showed a train of spikes. Methods and Findings In order to estimate the contributions of social media, we developed a statistical model with state-space modeling framework that distinguishes the contributions of multiple social media in time-series of collected public opinions. We applied the model to the time-series of signature counts of the campaign and quantified contributions of two social media, i.e., Twitter and an Internet forum, by the estimation. We found that a considerable portion (78%) of the signatures was affected from either of the social media throughout the campaign and the Twitter effect (26%) was smaller than the Forum effect (52%) in total, although Twitter probably triggered the initial two bursts of signatures. Comparisons of the estimated profiles of the both effects suggested distinctions between the social media in terms of sustainable impact of messages or tweets. Twitter shows messages on various topics on a time-line; newer messages push out older ones. Twitter may diminish the impact of messages that are tweeted intermittently. Conclusions The quantification of social media impacts is beneficial to better understand people’s tendency and may promote developing strategies to engage public opinions effectively. 
Our proposed method is a promising tool to explore information hidden in social phenomena. PMID:23484004
Ramos, Roann Munoz; Bitsch, Jó Ágila; Jonas, Stephan Michael; Ix, Tim; See, Portia Lynn Quetulio; Wehrle, Klaus
2016-01-01
Background Language reflects the state of one's mental health and personal characteristics. It also reveals preoccupations with a particular schema, thus possibly providing insights into psychological conditions. Text or lexical analysis in exploring depression may depict negative schemas and self-focusing tendencies. As mobile technology has become highly integrated into daily routine, mobile devices have the capacity for ecological momentary assessment (EMA), specifically the experience sampling method (ESM), in which behavior is captured in real time, or close in time to the experience, in one's natural environment. Extending mobile technology to psychological health could augment initial clinical assessment, particularly of mood disturbances such as depression, and analyze daily activities such as language use in communication. Here, we present the process of lexicon generation and development and the initial validation of Psychologist in a Pocket (PiaP), a mobile app designed to screen for signs of depression through text analysis. Objective The main objectives of the study are (1) to generate and develop a depressive lexicon that can be used for screening text input in mobile apps, to be used in PiaP; and (2) to conduct content validation as initial validation. Methods The first phase of our research focused on lexicon development. Words related to depression and its symptoms, based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines classification systems, were gathered from focus group discussions with Filipino college students, interviews with mental health professionals, and a review of established scales for depression and other related constructs. Results The lexicon development phase yielded a database consisting of 13 categories based on the criteria for depressive symptoms in the DSM-5 and ICD-10. 
For the draft of the depression lexicon for PiaP, we gathered 1762 main keywords and 9655 derivatives of main keywords. In addition, we compiled 823,869 spelling variations. Keywords included negatively valenced words like “sad”, “unworthy”, or “tired”, which are almost always accompanied by personal pronouns such as “I”, “I’m”, or “my” and, in Filipino, “ako” or “ko”. For the content validation, only keywords with a content validity ratio (CVR) equal to or greater than 0.75 were included in the depression lexicon test-run version. The mean of all CVRs yielded a high overall content validity index (CVI) of 0.90. A total of 1498 main keywords, 8911 derivatives of main keywords, and 783,140 spelling variations, for a total of 793,553 keywords, now comprise the test-run version. Conclusions The generation of the depression lexicon is relatively exhaustive. The breadth of keywords used in text analysis incorporates the characteristic expressions of depression and its related constructs by a particular culture and age group. As a content-validated mobile health app, PiaP may help enable more effective and earlier detection of depressive symptoms. PMID:27439444
Estimating procedure times for surgeries by determining location parameters for the lognormal model.
Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H
2004-05-01
We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.
ERIC Educational Resources Information Center
Herlihy, Lester B.; Deffenbaugh, Walter S.
1938-01-01
This report presents statistics of city school systems for the school year 1935-36. Prior to 1933-34, school statistics for cities included in county unit systems were estimated. Most of these cities are in Florida, Louisiana, Maryland, and West Virginia. Since the method of estimating school statistics for the cities included with the counties in…
NASA Astrophysics Data System (ADS)
Stevens, T.
2016-12-01
NASA's Global Change Master Directory (GCMD) curates a hierarchical set of controlled vocabularies (keywords) covering Earth sciences and associated information (data centers, projects, platforms, and instruments). The purpose of the keywords is to describe Earth science data and services in a consistent and comprehensive manner, allowing for precise metadata search and subsequent retrieval of data and services. The keywords are accessible in a standardized SKOS/RDF/OWL representation and are used as an authoritative taxonomy, as a source for developing ontologies, and to search and access Earth Science data within online metadata catalogs. The keyword curation approach involves: (1) receiving community suggestions; (2) triaging community suggestions; (3) evaluating keywords against a set of criteria coordinated by the NASA Earth Science Data and Information System (ESDIS) Standards Office; (4) implementing the keywords; and (5) publication/notification of keyword changes. This approach emphasizes community input, which helps ensure a high quality, normalized, and relevant keyword structure that will evolve with users' changing needs. The Keyword Community Forum, which promotes a responsive, open, and transparent process, is an area where users can discuss keyword topics and make suggestions for new keywords. Others could potentially use this formalized approach as a model for keyword curation.
Can EDA Combat the Rise of Electronic Counterfeiting?
2012-06-01
Categories and Subject Descriptors: B.7 [Hardware]: Integrated Circuits. General Terms: Design, Security. Keywords: counterfeiting; reliability; device and … SIA at $7.5B. Very recently, EE Times estimated that IC counterfeiting losses are as high as $169B annually. Therefore, the …
A General Purpose Feature Extractor for Light Detection and Ranging Data
2010-11-17
Feature extraction is a central step of processing Light Detection and Ranging (LIDAR) data. Existing detectors tend to exploit … a detector for both 2D and 3D LIDAR data that is applicable to virtually any environment. Our method adapts classic feature detection methods from the image … datasets, and the 3D MIT DARPA Urban Challenge dataset. Keywords: SLAM; LIDARs; feature detection; uncertainty estimates; descriptors.
Targeting ESR1-Mutant Breast Cancer
2015-09-01
Award W81XWH-14-1-0359. Keywords: Estrogen Receptor; Estrogen Response Element; Metastatic Breast Cancer; Ligand Binding Domain Mutation.
2015-01-01
Development of the Average Likelihood Function for Code Division Multiple Access (CDMA) Using BPSK and QPSK Symbols (reporting period October 2013 – October 2014). This research aims to establish a foundation for new classification and estimation of CDMA signals. Keywords: DS/CDMA signals; BPSK; QPSK.
YY1 Control of AID-Dependent Lymphomagenesis
2015-07-01
… lymphomagenesis by conditional deletion of the yy1 gene in germinal center B cells using γ1-CRE mice. Keywords: Yin-Yang 1 (YY1); B Cell Lymphoma; Activation… Award W81XWH-14-1-0171. Principal Investigator: Michael Atchison, Ph.D.
Epidemiology from Tweets: Estimating Misuse of Prescription Opioids in the USA from Social Media.
Chary, Michael; Genes, Nicholas; Giraud-Carrier, Christophe; Hanson, Carl; Nelson, Lewis S; Manini, Alex F
2017-12-01
The misuse of prescription opioids (MUPO) is a leading public health concern. Social media are playing an expanded role in public health research, but there are few methods for estimating established epidemiological metrics from social media. The purpose of this study was to demonstrate that the geographic variation of social media posts mentioning prescription opioid misuse strongly correlates with government estimates of MUPO in the last month. We wrote software to acquire publicly available tweets from Twitter from 2012 to 2014 that contained at least one keyword related to prescription opioid use (n = 3,611,528). A medical toxicologist and emergency physician curated the list of keywords. We used the semantic distance (SemD) to automatically quantify the similarity of meaning between tweets and identify tweets that mentioned MUPO. We defined the SemD between two words as the shortest distance between the two corresponding word-centroids. Each word-centroid represented all recognized meanings of a word. We validated this automatic identification with manual curation. We used Twitter metadata to estimate the location of each tweet. We compared our estimated geographic distribution with the 2013-2015 National Surveys on Drug Use and Health (NSDUH). Tweets that mentioned MUPO formed a distinct cluster far away from semantically unrelated tweets. The state-by-state correlation between Twitter and NSDUH was highly significant across all NSDUH survey years. The correlation was strongest between Twitter and NSDUH data from those aged 18-25 (r = 0.94, p < 0.01 for 2012; r = 0.94, p < 0.01 for 2013; r = 0.71, p = 0.02 for 2014). The correlation was driven by discussions of opioid use, even after controlling for geographic variation in Twitter usage. Mentions of MUPO on Twitter correlate strongly with state-by-state NSDUH estimates of MUPO.
We have also demonstrated that natural language processing can be used to analyze social media to provide insights for syndromic toxicosurveillance.
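The study's SemD is the shortest distance between word-centroids, where a centroid aggregates all recognized meanings of a word. A minimal sketch of that idea follows; the words and 2-D sense vectors are invented for illustration and are not the study's embedding space:

```python
from math import dist, fsum

# Hypothetical sense inventory: each word maps to one vector per recognized
# sense. A real system would use high-dimensional embeddings.
SENSES = {
    "oxy":   [(1.0, 0.2), (0.9, 0.4)],
    "percs": [(0.8, 0.3)],
    "party": [(-0.7, -0.5), (-0.6, -0.4)],
}

def centroid(vectors):
    """Word-centroid: componentwise mean over all sense vectors."""
    n = len(vectors)
    return tuple(fsum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def semantic_distance(w1, w2, senses=SENSES):
    """SemD proxy: Euclidean distance between the two word-centroids."""
    return dist(centroid(senses[w1]), centroid(senses[w2]))
```

Under this scheme, tweets mentioning opioid misuse would sit at small mutual SemD and far from semantically unrelated tweets, which is the clustering behavior the abstract reports.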
Crown, William; Chang, Jessica; Olson, Melvin; Kahler, Kristijan; Swindle, Jason; Buzinec, Paul; Shah, Nilay; Borah, Bijan
2015-09-01
Missing data, particularly missing variables, can create serious analytic challenges in observational comparative effectiveness research studies. Statistical linkage of datasets is a potential method for incorporating missing variables. Prior studies have focused upon the bias introduced by imperfect linkage. This analysis uses a case study of hepatitis C patients to estimate the net effect of statistical linkage on bias, also accounting for the potential reduction in missing variable bias. The results show that statistical linkage can reduce bias while also enabling parameter estimates to be obtained for the formerly missing variables. The usefulness of statistical linkage will vary depending upon the strength of the correlations of the missing variables with the treatment variable, as well as the outcome variable of interest.
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
DOT National Transportation Integrated Search
1981-10-01
Two statistical procedures have been developed to estimate hourly or daily aircraft counts. These counts can then be transformed into estimates of instantaneous air counts. The first procedure estimates the stable (deterministic) mean level of hourly...
This analysis updates EPA's standard VSL estimate by using a more comprehensive collection of VSL studies that include studies published between 1992 and 2000, as well as applying a more appropriate statistical method. We provide a pooled effect VSL estimate by applying the empi...
Short-term international migration trends in England and Wales from 2004 to 2009.
Whitworth, Simon; Loukas, Konstantinos; McGregor, Ian
2011-01-01
Short-term migration estimates for England and Wales are the latest addition to the Office for National Statistics (ONS) migration statistics. This article discusses definitions of short-term migration and the methodology that is used to produce the estimates. Some of the estimates and the changes in the estimates over time are then discussed. The article includes previously unpublished short-term migration statistics and therefore helps to give a more complete picture of the size and characteristics of short-term international migration for England and Wales than has previously been possible. ONS have identified a clear user requirement for short-term migration estimates at local authority (LA) level. Consequently, attention is also paid to the progress that has been made and future work that is planned to distribute England and Wales short-term migration estimates to LA level.
UWB pulse detection and TOA estimation using GLRT
NASA Astrophysics Data System (ADS)
Xie, Yan; Janssen, Gerard J. M.; Shakeri, Siavash; Tiberius, Christiaan C. J. M.
2017-12-01
In this paper, a novel statistical approach is presented for time-of-arrival (TOA) estimation based on first path (FP) pulse detection using a sub-Nyquist sampling ultra-wide band (UWB) receiver. The TOA measurement accuracy, which cannot be improved by averaging of the received signal, can be enhanced by the statistical processing of a number of TOA measurements. The TOA statistics are modeled and analyzed for a UWB receiver using threshold crossing detection of a pulse signal with noise. The detection and estimation scheme based on the Generalized Likelihood Ratio Test (GLRT) detector, which captures the full statistical information of the measurement data, is shown to achieve accurate TOA estimation and allows for a trade-off between the threshold level, the noise level, the amplitude and the arrival time of the first path pulse, and the accuracy of the obtained final TOA.
Estimating the Probability of Traditional Copying, Conditional on Answer-Copying Statistics.
Allen, Jeff; Ghattas, Andrew
2016-06-01
Statistics for detecting copying on multiple-choice tests produce p values measuring the probability of a value at least as large as that observed, under the null hypothesis of no copying. The posterior probability of copying is arguably more relevant than the p value, but cannot be derived from Bayes' theorem unless the population probability of copying and probability distribution of the answer-copying statistic under copying are known. In this article, the authors develop an estimator for the posterior probability of copying that is based on estimable quantities and can be used with any answer-copying statistic. The performance of the estimator is evaluated via simulation, and the authors demonstrate how to apply the formula using actual data. Potential uses, generalizability to other types of cheating, and limitations of the approach are discussed.
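The Bayes'-theorem relationship the authors build on can be written down directly. This sketch takes the population copying probability and the two density values as given; the article's contribution is precisely how to estimate those quantities from data:

```python
def posterior_copying(stat_density_copy, stat_density_null, prior_copy):
    """Posterior P(copying | observed statistic) via Bayes' theorem.

    stat_density_copy: density of the answer-copying statistic under copying
    stat_density_null: density of the statistic under no copying
    prior_copy: population probability of copying
    """
    num = prior_copy * stat_density_copy
    den = num + (1.0 - prior_copy) * stat_density_null
    return num / den
```

With a rare prior (2%), a statistic 50 times more likely under copying than under no copying still yields only about a 50% posterior, which is the sense in which the posterior is more relevant than the p value alone.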
Technique for estimation of streamflow statistics in mineral areas of interest in Afghanistan
Olson, Scott A.; Mack, Thomas J.
2011-01-01
A technique for estimating streamflow statistics at ungaged stream sites in areas of mineral interest in Afghanistan using drainage-area-ratio relations of historical streamflow data was developed and is documented in this report. The technique can be used to estimate the following streamflow statistics at ungaged sites: (1) 7-day low flow with a 10-year recurrence interval, (2) 7-day low flow with a 2-year recurrence interval, (3) daily mean streamflow exceeded 90 percent of the time, (4) daily mean streamflow exceeded 80 percent of the time, (5) mean monthly streamflow for each month of the year, (6) mean annual streamflow, and (7) minimum monthly streamflow for each month of the year. Because they are based on limited historical data, the estimates of streamflow statistics at ungaged sites are considered preliminary.
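A drainage-area-ratio transfer of a streamflow statistic can be sketched as below. The exponent and the applicability limits used in the report are not reproduced here, so simple proportionality (exponent b = 1) is assumed:

```python
def drainage_area_ratio_estimate(q_gaged, area_gaged, area_ungaged, exponent=1.0):
    """Transfer a streamflow statistic from a gaged to an ungaged site:
    Q_u = Q_g * (A_u / A_g) ** b.

    q_gaged: streamflow statistic at the gaged site (e.g., m^3/s)
    area_gaged, area_ungaged: drainage areas in the same units (e.g., km^2)
    exponent: scaling exponent b, assumed 1.0 (simple proportionality)
    """
    return q_gaged * (area_ungaged / area_gaged) ** exponent
```

As the abstract notes, estimates produced this way from limited historical data should be treated as preliminary.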
Estimates of School Statistics, 1971-72.
ERIC Educational Resources Information Center
Flanigan, Jean M.
This report presents public school statistics for the 50 States, the District of Columbia, and the regions and outlying areas of the United States. The text presents national data for each of the past 10 years and defines the basic series of statistics. Tables present the revised estimates by State and region for 1970-71 and the preliminary…
Radhakrishnan, Srinivasan; Erbis, Serkan; Isaacs, Jacqueline A; Kamarthi, Sagar
2017-01-01
Systematic reviews of scientific literature are important for mapping the existing state of research and highlighting further growth channels in a field of study, but systematic reviews are inherently tedious, time consuming, and manual in nature. In recent years, keyword co-occurrence networks (KCNs) are exploited for knowledge mapping. In a KCN, each keyword is represented as a node and each co-occurrence of a pair of words is represented as a link. The number of times that a pair of words co-occurs in multiple articles constitutes the weight of the link connecting the pair. The network constructed in this manner represents cumulative knowledge of a domain and helps to uncover meaningful knowledge components and insights based on the patterns and strength of links between keywords that appear in the literature. In this work, we propose a KCN-based approach that can be implemented prior to undertaking a systematic review to guide and accelerate the review process. The novelty of this method lies in the new metrics used for statistical analysis of a KCN that differ from those typically used for KCN analysis. The approach is demonstrated through its application to nano-related Environmental, Health, and Safety (EHS) risk literature. The KCN approach identified the knowledge components, knowledge structure, and research trends that match with those discovered through a traditional systematic review of the nanoEHS field. Because KCN-based analyses can be conducted more quickly to explore a vast amount of literature, this method can provide a knowledge map and insights prior to undertaking a rigorous traditional systematic review. This two-step approach can significantly reduce the effort and time required for a traditional systematic literature review. The proposed KCN-based pre-systematic review method is universal. It can be applied to any scientific field of study to prepare a knowledge map.
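The KCN construction described above (keywords as nodes, article-level co-occurrence counts as link weights) can be sketched with standard-library tools; the three-article corpus below is invented for illustration:

```python
from collections import Counter
from itertools import combinations

def build_kcn(articles):
    """Build a keyword co-occurrence network from per-article keyword lists.

    Nodes are keywords; the weight of link (a, b) is the number of articles
    in which both keywords appear. Keys are alphabetically ordered pairs so
    each undirected link is counted once.
    """
    weights = Counter()
    for keywords in articles:
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return weights

corpus = [
    ["nanoparticle", "toxicity", "exposure"],
    ["nanoparticle", "toxicity", "regulation"],
    ["exposure", "regulation"],
]
kcn = build_kcn(corpus)
```

Statistical analysis of the resulting weighted network (degree, link strength, community structure) is then what surfaces the knowledge components and trends before a full systematic review.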
Consistency of extreme flood estimation approaches
NASA Astrophysics Data System (ADS)
Felder, Guido; Paquet, Emmanuel; Penot, David; Zischg, Andreas; Weingartner, Rolf
2017-04-01
Estimations of low-probability flood events are frequently used for planning infrastructure and for dimensioning flood protection measures. There are several well-established methodical procedures for estimating low-probability floods. However, a global assessment of the consistency of these methods is difficult to achieve, since the "true value" of an extreme flood is not observable. Nevertheless, a detailed comparison performed on a given case study brings useful information about the statistical and hydrological processes involved in the different methods. In this study, the following three approaches for estimating low-probability floods are compared: a purely statistical approach (ordinary extreme value statistics), a statistical approach based on stochastic rainfall-runoff simulation (the SCHADEX method), and a deterministic approach (physically based PMF estimation). These methods are tested on two different Swiss catchments. The results and some intermediate variables are used to assess the potential strengths and weaknesses of each method, as well as to evaluate the consistency of the methods.
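As one concrete instance of the "purely statistical approach", a method-of-moments Gumbel fit to an annual-maximum series yields a T-year return level. The study's actual fitting choices (distribution family, estimator) may differ, and the series below is invented:

```python
import math

def gumbel_fit_moments(annual_maxima):
    """Method-of-moments fit of a Gumbel distribution to annual maxima:
    scale = sqrt(6 * var) / pi, loc = mean - 0.5772 * scale."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    scale = math.sqrt(6.0 * var) / math.pi
    loc = mean - 0.5772 * scale  # 0.5772 ~ Euler-Mascheroni constant
    return loc, scale

def return_level(loc, scale, T):
    """Flood magnitude with return period T years (exceedance prob. 1/T)."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / T))

# Illustrative annual-maximum discharges (m^3/s), not catchment data.
maxima = [310, 450, 280, 520, 390, 610, 340, 470, 295, 505]
loc, scale = gumbel_fit_moments(maxima)
```

Comparing such a statistical return level against SCHADEX and PMF results for the same catchment is the kind of consistency check the study performs.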
NASA Astrophysics Data System (ADS)
Pickard, William F.
2004-10-01
The classical PERT inverse statistics problem requires estimation of the mean, \bar{m}, and standard deviation, s, of a unimodal distribution given estimates of its mode, m, and of the smallest, a, and largest, b, values likely to be encountered. After placing the problem in historical perspective and showing that it is ill-posed because it is underdetermined, this paper offers an approach to resolve the ill-posedness: (a) by interpreting a and b as modes of order statistic distributions; (b) by requiring also an estimate of the number of samples, N, considered in estimating the set {m, a, b}; and (c) by maximizing a suitable likelihood, having made the traditional assumption that the underlying distribution is beta. Exact formulae relating the four parameters of the beta distribution to {m, a, b, N} and the assumed likelihood function are then used to compute the four underlying parameters of the beta distribution; and from them, \bar{m} and s are computed using exact formulae.
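For orientation, the forward PERT problem uses the textbook beta-based approximations mean = (a + 4m + b)/6 and s = (b - a)/6; these are the quantities the paper's inverse treatment tries to recover on a sounder statistical footing:

```python
def pert_mean_sd(a, m, b):
    """Classical forward PERT estimates under the beta assumption.

    a: smallest value likely to be encountered
    m: mode (most likely value)
    b: largest value likely to be encountered
    Returns (mean, standard deviation).
    """
    return (a + 4 * m + b) / 6.0, (b - a) / 6.0
```

The paper's point is that {m, a, b} alone underdetermine the beta distribution, which is why it also requires the sample size N and a likelihood to maximize.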
Jesus, Tiago S; Papadimitriou, Christina; Pinho, Cátia S; Hoenig, Helen
2018-06-01
To characterize the peer-reviewed quality improvement (QI) literature in rehabilitation. Five electronic databases were searched for English-language articles from 2010 to 2016. Keywords for QI and safety management were searched for in combination with keywords for rehabilitation content and journals. Secondary searches (eg, references-list scanning) were also performed. Two reviewers independently selected articles using working definitions of rehabilitation and QI study types; of 1016 references, 112 full texts were assessed for eligibility. Reported study characteristics including study focus, study setting, use of inferential statistics, stated limitations, and use of improvement cycles and theoretical models were extracted by 1 reviewer, with a second reviewer consulted whenever inferences or interpretation were involved. Fifty-nine empirical rehabilitation QI studies were found: 43 reporting on local QI activities, 7 reporting on QI effectiveness research, 8 reporting on QI facilitators or barriers, and 1 systematic review of a specific topic. The number of publications had significant yearly growth between 2010 and 2016 (P=.03). Among the 43 reports on local QI activities, 23.3% did not explicitly report any study limitations; 39.5% did not use inferential statistics to measure the QI impact; 95.3% did not cite/mention the appropriate reporting guidelines; only 18.6% reported multiple QI cycles; just over 50% reported using a model to guide the QI activity; and only 7% reported the use of a particular theoretical model. Study sites and focuses were diverse; however, nearly a third (30.2%) examined early mobilization in intensive care units. The number of empirical, peer-reviewed rehabilitation QI publications is growing but remains a tiny fraction of rehabilitation research publications.
Rehabilitation QI studies could be strengthened by greater use of extant models and theory to guide the QI work, consistent reporting of study limitations, and use of inferential statistics. Copyright © 2017 American Congress of Rehabilitation Medicine. All rights reserved.
Robust estimators for speech enhancement in real environments
NASA Astrophysics Data System (ADS)
Sandoval-Ibarra, Yuma; Diaz-Ramirez, Victor H.; Kober, Vitaly
2015-09-01
Common statistical estimators for speech enhancement rely on several assumptions about the stationarity of speech signals and noise. These assumptions may not always be valid in real life owing to the nonstationary characteristics of speech and noise processes. We propose new estimators that extend existing ones by incorporating rank-order statistics. The proposed estimators are better adapted to the nonstationary characteristics of speech signals and noise processes. Through computer simulations we show that the proposed estimators yield better performance in terms of objective metrics than known estimators when speech signals are contaminated with airport, babble, restaurant, and train-station noise.
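One way to incorporate rank-order statistics, in the spirit of (though not identical to) the proposed estimators, is to track the noise floor with a running median of frame energies, which resists nonstationary bursts better than a running mean:

```python
def median_noise_track(frame_energies, window=5):
    """Rank-order noise tracking: running median over a sliding window of
    past frame energies. A burst of speech or impulsive noise barely moves
    the median, whereas it would drag a running mean upward."""
    estimates = []
    for i in range(len(frame_energies)):
        hist = sorted(frame_energies[max(0, i - window + 1): i + 1])
        mid = len(hist) // 2
        med = hist[mid] if len(hist) % 2 else (hist[mid - 1] + hist[mid]) / 2
        estimates.append(med)
    return estimates
```

The window length trades adaptation speed against robustness; the authors' actual estimators and their parameterization are not reproduced here.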
NASA Technical Reports Server (NTRS)
Freilich, M. H.; Pawka, S. S.
1987-01-01
The statistics of Sxy estimates derived from orthogonal-component measurements are examined. Based on results of Goodman (1957), the probability density function (pdf) for Sxy(f) estimates is derived, and a closed-form solution for arbitrary moments of the distribution is obtained. Characteristic functions are used to derive the exact pdf of Sxy(tot). In practice, a simple Gaussian approximation is found to be highly accurate even for relatively few degrees of freedom. Implications for experiment design are discussed, and a maximum-likelihood estimator for a posteriori estimation is outlined.
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Goyal, Ravi; De Gruttola, Victor
2018-01-30
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.
Transmission overhaul and replacement predictions using Weibull and renewal theory
NASA Technical Reports Server (NTRS)
Savage, M.; Lewicki, D. G.
1989-01-01
A method to estimate the frequency of transmission overhauls is presented. This method is based on the two-parameter Weibull statistical distribution for component life. A second method is presented to estimate the number of replacement components needed to support the transmission overhaul pattern. The second method is based on renewal theory. Confidence statistics are applied with both methods to improve the statistical estimate of sample behavior. A transmission example is also presented to illustrate the use of the methods. Transmission overhaul frequency and component replacement calculations are included in the example.
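The two-parameter Weibull model underlying the overhaul-frequency method can be sketched as a survival function plus its inverse; the shape and scale values used below are illustrative, not transmission data from the report:

```python
import math

def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull survival: R(t) = exp(-(t / eta) ** beta).

    beta: shape parameter (beta > 1 means wear-out failures)
    eta: characteristic life (time by which ~63.2% of units fail)
    """
    return math.exp(-((t / eta) ** beta))

def life_at_failure_fraction(p, beta, eta):
    """Time by which a fraction p of components fail (the 'B-life'),
    obtained by inverting F(t) = 1 - R(t)."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)
```

An overhaul interval chosen at, say, the B10 life (10% failure probability) then feeds the renewal-theory calculation of how many replacement components the overhaul pattern consumes.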
Bayesian approach to inverse statistical mechanics.
Habeck, Michael
2014-05-01
Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.
NASA Technical Reports Server (NTRS)
Fisher, Brad; Wolff, David B.
2010-01-01
Passive and active microwave rain sensors onboard earth-orbiting satellites estimate monthly rainfall from the instantaneous rain statistics collected during satellite overpasses. It is well known that climate-scale rain estimates from meteorological satellites incur sampling errors resulting from the process of discrete temporal sampling and statistical averaging. Sampling and retrieval errors ultimately become entangled in the estimation of the mean monthly rain rate. The sampling component of the error budget effectively introduces statistical noise into climate-scale rain estimates that obscures the error component associated with the instantaneous rain retrieval. Estimating the accuracy of the retrievals on monthly scales therefore necessitates a decomposition of the total error budget into sampling and retrieval error quantities. This paper presents results from a statistical evaluation of the sampling and retrieval errors for five different space-borne rain sensors on board nine orbiting satellites. Using an error decomposition methodology developed by one of the authors, sampling and retrieval errors were estimated at 0.25° resolution within 150 km of ground-based weather radars located at Kwajalein, Marshall Islands, and Melbourne, Florida. Error and bias statistics were calculated according to the land, ocean, and coast classifications of the surface terrain mask developed for the Goddard Profiling (GPROF) rain algorithm. Variations in the comparative error statistics are attributed to various factors related to differences in the swath geometry of each rain sensor, the orbital and instrument characteristics of the satellite, and the regional climatology. The most significant result from this study is that each of the satellites incurred negative long-term oceanic retrieval biases of 10 to 30%.
Efficient estimation of Pareto model: Some modified percentile estimators.
Bhatti, Sajjad Haider; Hussain, Shahzad; Ahmad, Tanvir; Aslam, Muhammad; Aftab, Muhammad; Raza, Muhammad Ali
2018-01-01
The article proposes three modified percentile estimators for parameter estimation of the Pareto distribution. These modifications are based on median, geometric mean and expectation of empirical cumulative distribution function of first-order statistic. The proposed modified estimators are compared with traditional percentile estimators through a Monte Carlo simulation for different parameter combinations with varying sample sizes. Performance of different estimators is assessed in terms of total mean square error and total relative deviation. It is determined that modified percentile estimator based on expectation of empirical cumulative distribution function of first-order statistic provides efficient and precise parameter estimates compared to other estimators considered. The simulation results were further confirmed using two real life examples where maximum likelihood and moment estimators were also considered.
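For orientation, the classical (unmodified) percentile approach that the proposed estimators build on can be sketched as follows; the percentile pair 0.25/0.75 is an arbitrary illustrative choice, not the article's.

```python
import math, random

# Traditional two-percentile estimator for the Pareto distribution
# F(x) = 1 - (xm / x)^alpha, x >= xm: equating two sample percentiles to
# their theoretical counterparts and solving for alpha and xm.
random.seed(1)
alpha_true, xm_true = 3.0, 2.0
n = 20000
# inverse-transform sampling: x = xm * (1 - U)^(-1/alpha)
xs = sorted(xm_true * (1.0 - random.random()) ** (-1.0 / alpha_true)
            for _ in range(n))

def percentile(sorted_xs, p):
    # simple order-statistic percentile
    return sorted_xs[min(len(sorted_xs) - 1, int(p * len(sorted_xs)))]

p1, p2 = 0.25, 0.75
x1, x2 = percentile(xs, p1), percentile(xs, p2)
# From log(1 - p) = alpha * (log xm - log x_p) at the two percentiles:
alpha_hat = math.log((1 - p1) / (1 - p2)) / (math.log(x2) - math.log(x1))
xm_hat = x1 * (1 - p1) ** (1.0 / alpha_hat)
```

The article's modifications replace ingredients of this scheme (e.g., using the median, geometric mean, or the expected ECDF of the first order statistic) to reduce mean square error.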
Research in Biomaterials and Tissue Engineering: Achievements and perspectives.
Ventre, Maurizio; Causa, Filippo; Netti, Paolo A; Pietrabissa, Riccardo
2015-01-01
Research on biomaterials and related subjects has been active in Italy. Starting from the very first examples of biomaterials and biomedical devices, Italian researchers have always provided valuable scientific contributions. This trend has steadily increased. To provide a rough estimate of this, it is sufficient to search PubMed, a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics, with the keywords "biomaterials" or "tissue engineering" and sort the results by affiliation. Again, even though this is a crude estimate, the results speak for themselves, as Italy is the third European country, in terms of publications, with an astonishing 3,700 products in the last decade.
New features added to EVALIDator: ratio estimation and county choropleth maps
Patrick D. Miles; Mark H. Hansen
2012-01-01
The EVALIDator Web application, developed in 2007, provides estimates and sampling errors for many user selected forest statistics from the Forest Inventory and Analysis Database (FIADB). Among the statistics estimated are forest area, number of trees, biomass, volume, growth, removals, and mortality. A new release of EVALIDator, developed in 2012, has an option to...
Spatial Analysis of Stover Moisture Content During Harvest Season in the U.S.
Oyedeji, Oluwafemi A.; Sokhansanj, Shahab; Webb, Erin
2017-01-01
The moisture content of a maturing crop varies as the harvest season progresses. For crop residues such as corn stover, moisture content at the time of harvest can range from as high as 75% (wet mass basis) to less than 20% depending on the geographic location (climate conditions) and stage of harvest. Biomass moisture content is critical for baling and extended storage. It is therefore essential to have an estimate of the quantities of corn stover available as wet or dry for various parts of the U.S. To this end, we analyzed hourly weather data (temperature, humidity, and rainfall) from the Typical Meteorological Year v.3 (TMY3) dataset developed by the National Renewable Energy Laboratory. A recently published set of equations for calculating the moisture content of stover as a function of hourly temperature, humidity, and rainfall was used. The annual start and end of corn grain harvest along with annual grain production (in bushels) for each state were extracted from USDA National Agricultural Statistics Service reports. Using these datasets and moisture sorption equations, the percentage of corn stover tonnage with moisture content less than 20%, between 20% and 40%, or greater than 40% was estimated from the length of time that the biomass was in these moisture content ranges. These calculations were carried out for several locations within each of the states for which TMY data were available. It was concluded that about 37.2% of corn stover is dry (<20% moisture content), whereas 36.5% is wet (>40% moisture content) nationwide. The remaining 27.0% of corn stover is between 20% and 40% moisture content. Keywords: Corn stover, Equilibrium moisture content, Field drying, Moisture content, Stover harvest, Typical Meteorological Year data.
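The binning step described above can be sketched as follows. Note that the drying model here is a made-up first-order relaxation toward a humidity-driven equilibrium, not the published sorption equations, and the weather cycle is invented.

```python
import math

# Hypothetical sketch: classify hourly stover moisture into the study's
# three bins (<20%, 20-40%, >40%) over a 30-day harvest window.
moisture = []
m = 75.0  # initial wet-basis moisture, %
for h in range(24 * 30):
    rh = 60 + 30 * math.sin(2 * math.pi * h / 24)  # assumed humidity cycle, %
    m_eq = 10 + 0.4 * rh       # hypothetical equilibrium moisture content
    m += 0.05 * (m_eq - m)     # assumed first-order drying rate per hour
    moisture.append(m)

bins = {"dry": 0, "mid": 0, "wet": 0}
for mval in moisture:
    key = "dry" if mval < 20 else ("wet" if mval > 40 else "mid")
    bins[key] += 1
fracs = {k: v / len(moisture) for k, v in bins.items()}
```

The study performs the analogous tally with real TMY3 weather and published sorption equations, then weights the fractions by state-level stover tonnage.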
NASA Astrophysics Data System (ADS)
Sutton, Virginia Kay
This paper examines statistical issues associated with estimating the paths of juvenile salmon through the intakes of Kaplan turbines. Passive sensors (hydrophones) detecting signals from ultrasonic transmitters implanted in individual fish released into the preturbine region were used to obtain the information to estimate fish paths through the intake. The aim and location of the sensors affect the spatial region in which the transmitters can be detected, and formulas relating this region to sensor aiming directions are derived. Cramer-Rao lower bounds for the variance of estimators of fish location are used to optimize the placement of each sensor. Finally, a statistical methodology is developed for analyzing angular data collected from optimally placed sensors.
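The Cramer-Rao optimization idea can be illustrated for a generic range-measurement geometry; this is not the thesis' actual sensor model, and the positions, noise level, and 2-D simplification are all assumptions.

```python
import math

# Hedged illustration: Cramer-Rao lower bound for a 2-D position estimated
# from noisy range measurements. The Fisher information matrix is
# sum_i u_i u_i^T / sigma^2, where u_i is the unit vector from sensor i to
# the target; the CRLB is the inverse of that matrix, and its trace bounds
# the total variance of any unbiased location estimator.
def crlb_trace(sensors, target, sigma=1.0):
    a = b = c = 0.0  # Fisher information entries [[a, b], [b, c]]
    for sx, sy in sensors:
        dx, dy = target[0] - sx, target[1] - sy
        r = math.hypot(dx, dy)
        ux, uy = dx / r, dy / r
        a += ux * ux / sigma ** 2
        b += ux * uy / sigma ** 2
        c += uy * uy / sigma ** 2
    det = a * c - b * b
    return (a + c) / det  # trace of the inverse 2x2 matrix

target = (0.0, 0.0)
orthogonal = [(10.0, 0.0), (0.0, 10.0)]   # well-spread aim directions
collinear = [(10.0, 0.0), (12.0, 0.001)]  # nearly collinear: poor geometry
```

Minimizing such a bound over candidate placements, as the thesis does for hydrophone aiming, favors geometries with well-spread viewing directions.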
Wu, S.-S.; Wang, L.; Qiu, X.
2008-01-01
This article presents a deterministic model for sub-block-level population estimation based on the total building volumes derived from geographic information system (GIS) building data and three census block-level housing statistics. To assess the model, we generated artificial blocks by aggregating census block areas and calculating the respective housing statistics. We then applied the model to estimate populations for sub-artificial-block areas and assessed the estimates with census populations of the areas. Our analyses indicate that the average percent error of population estimation for sub-artificial-block areas is comparable to those for sub-census-block areas of the same size relative to associated blocks. The smaller the sub-block-level areas, the higher the population estimation errors. For example, the average percent error for residential areas is approximately 0.11 percent for 100 percent block areas and 35 percent for 5 percent block areas.
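In its simplest conceivable form, volume-proportional allocation looks like the sketch below; the article's model additionally uses three census block-level housing statistics, which are omitted here.

```python
# Minimal sketch of volume-proportional population allocation (a simplified
# stand-in for the article's deterministic model). Building names and
# volumes are hypothetical.
def subblock_population(block_pop, building_volumes, selected):
    """Allocate a block's population to selected buildings by volume share."""
    total = sum(building_volumes.values())
    share = sum(building_volumes[b] for b in selected) / total
    return block_pop * share

volumes = {"b1": 12000.0, "b2": 8000.0, "b3": 4000.0}  # hypothetical m^3
est = subblock_population(600, volumes, ["b1", "b3"])
print(est)  # 600 * (16000 / 24000) = 400.0
```

The article's finding that errors grow as the sub-block area shrinks is intuitive under this kind of model: smaller selections average over fewer buildings, so per-building deviations from the volume-population assumption dominate.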
NASA Technical Reports Server (NTRS)
Stevens, T.; Ritz, S.; Aleman, A.; Genazzio, M.; Morahan, M.; Wharton, S.
2016-01-01
NASA's Global Change Master Directory (GCMD) develops and expands a hierarchical set of controlled vocabularies (keywords) covering the Earth sciences and associated information (data centers, projects, platforms, instruments, etc.). The purpose of the keywords is to describe Earth science data and services in a consistent and comprehensive manner, allowing for the precise searching of metadata and subsequent retrieval of data and services. The keywords are accessible in a standardized SKOS/RDF/OWL representation and are used as an authoritative taxonomy, as a source for developing ontologies, and to search and access Earth science data within online metadata catalogues. The keyword development approach involves: (1) receiving community suggestions, (2) triaging community suggestions, (3) evaluating the keywords against a set of criteria coordinated by the NASA ESDIS Standards Office, and (4) publication/notification of the keyword changes. This approach emphasizes community input, which helps ensure a high-quality, normalized, and relevant keyword structure that will evolve with users' changing needs. The Keyword Community Forum, which promotes a responsive, open, and transparent process, is an area where users can discuss keyword topics and make suggestions for new keywords. The formalized approach could potentially be used as a model for keyword development.
Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords.
Koyabu, Shun; Phan, Thi Thanh Thuy; Ohkawa, Takenao
2015-01-01
For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword directly related to interaction, such as "bind" or "interact", plays an important role in training classifiers. We call such a keyword, which affects the capability of the classifier, a dominant keyword. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is initially assumed to be a dominant keyword. The classifiers are then trained separately on the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers, and the assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction.
Seeking Health Information Online: Does Wikipedia Matter?
Laurent, Michaël R.; Vickers, Tim J.
2009-01-01
Objective To determine the significance of the English Wikipedia as a source of online health information. Design The authors measured Wikipedia's ranking on general Internet search engines by entering keywords from MedlinePlus, NHS Direct Online, and the National Organization of Rare Diseases as queries into search engine optimization software. We assessed whether article quality influenced this ranking. The authors tested whether traffic to Wikipedia coincided with epidemiological trends and news of emerging health concerns, and how it compares to MedlinePlus. Measurements Cumulative incidence and average position of Wikipedia® compared to other Web sites among the first 20 results on general Internet search engines (Google®, Google UK®, Yahoo®, and MSN®), and page view statistics for selected Wikipedia articles and MedlinePlus pages. Results Wikipedia ranked among the first ten results in 71–85% of search engines and keywords tested. Wikipedia surpassed MedlinePlus and NHS Direct Online (except for queries from the latter on Google UK), and ranked higher with quality articles. Wikipedia ranked highest for rare diseases, although its incidence in several categories decreased. Page views increased parallel to the occurrence of 20 seasonal disorders and news of three emerging health concerns. Wikipedia articles were viewed more often than MedlinePlus Topic (p = 0.001) but for MedlinePlus Encyclopedia pages, the trend was not significant (p = 0.07–0.10). Conclusions Based on its search engine ranking and page view statistics, the English Wikipedia is a prominent source of online health information compared to the other online health information providers studied. PMID:19390105
Comparing estimates of climate change impacts from process-based and statistical crop models
NASA Astrophysics Data System (ADS)
Lobell, David B.; Asseng, Senthold
2017-01-01
The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. 
At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.
Parkhurst, David L.; Appelo, C.A.J.
2013-01-01
PHREEQC version 3 is a computer program written in the C and C++ programming languages that is designed to perform a wide variety of aqueous geochemical calculations. PHREEQC implements several types of aqueous models: two ion-association aqueous models (the Lawrence Livermore National Laboratory model and WATEQ4F), a Pitzer specific-ion-interaction aqueous model, and the SIT (Specific ion Interaction Theory) aqueous model. Using any of these aqueous models, PHREEQC has capabilities for (1) speciation and saturation-index calculations; (2) batch-reaction and one-dimensional (1D) transport calculations with reversible and irreversible reactions, which include aqueous, mineral, gas, solid-solution, surface-complexation, and ion-exchange equilibria, and specified mole transfers of reactants, kinetically controlled reactions, mixing of solutions, and pressure and temperature changes; and (3) inverse modeling, which finds sets of mineral and gas mole transfers that account for differences in composition between waters within specified compositional uncertainty limits. Many new modeling features were added to PHREEQC version 3 relative to version 2. The Pitzer aqueous model (pitzer.dat database, with keyword PITZER) can be used for high-salinity waters that are beyond the range of application for the Debye-Hückel theory. The Peng-Robinson equation of state has been implemented for calculating the solubility of gases at high pressure. Specific volumes of aqueous species are calculated as a function of the dielectric properties of water and the ionic strength of the solution, which allows calculation of pressure effects on chemical reactions and the density of a solution. The specific conductance and the density of a solution are calculated and printed in the output file. In addition to Runge-Kutta integration, a stiff ordinary differential equation solver (CVODE) has been included for kinetic calculations with multiple rates that occur at widely different time scales. 
Surface complexation can be calculated with the CD-MUSIC (Charge Distribution MUltiSIte Complexation) triple-layer model in addition to the diffuse-layer model. The composition of the electrical double layer of a surface can be estimated by using the Donnan approach, which is more robust and faster than the alternative Borkovec-Westall integration. Multicomponent diffusion, diffusion in the electrostatic double layer on a surface, and transport of colloids with simultaneous surface complexation have been added to the transport module. A series of keyword data blocks has been added for isotope calculations—ISOTOPES, CALCULATE_VALUES, ISOTOPE_ALPHAS, ISOTOPE_RATIOS, and NAMED_EXPRESSIONS. Solution isotopic data can be input in conventional units (for example, permil, percent modern carbon, or tritium units) and the numbers are converted to moles of isotope by PHREEQC. The isotopes are treated as individual components (they must be defined as individual master species) so that each isotope has its own set of aqueous species, gases, and solids. The isotope-related keywords allow calculating equilibrium fractionation of isotopes among the species and phases of a system. The calculated isotopic compositions are printed in easily readable conventional units. New keywords and options facilitate the setup of input files and the interpretation of the results. Keyword data blocks can be copied (keyword COPY) and deleted (keyword DELETE). Keyword data items can be altered by using the keyword data blocks with the _MODIFY extension and a simulation can be run with all reactants of a given index number (keyword RUN_CELLS). The definition of the complete chemical state of all reactants of PHREEQC can be saved in a file in a raw data format ( DUMP and _RAW keywords). The file can be read as part of another input file with the INCLUDE$ keyword. 
These keywords facilitate the use of IPhreeqc, a module implementing all PHREEQC version 3 capabilities that is designed to be used in other programs needing geochemical calculations, for example, transport codes. Charting capabilities have been added to Windows distributions of PHREEQC version 3 (charting on Linux requires installation of Wine). The keyword data block USER_GRAPH allows selection of data for plotting and manipulation of chart appearance. Almost any results from geochemical simulations (for example, concentrations, activities, or saturation indices) can be retrieved by using Basic language functions and specified as data for plotting in USER_GRAPH. Results of transport simulations can be plotted against distance or time. Data can be added to a chart from tab-separated-values files. All input for PHREEQC version 3 is defined in keyword data blocks, each of which may have a series of identifiers for specific types of data. This report provides a complete description of each keyword data block and its associated identifiers. Input files for 22 examples that demonstrate most of the capabilities of PHREEQC version 3 are described and the results of the example simulations are presented and discussed.
A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix.
Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun
2017-09-21
The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.
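A minimal sketch of the underlying task, using a generic linear shrinkage toward the identity rather than any of the eight methods compared in the paper:

```python
import math, random

# Hedged sketch: estimate the log-determinant of a covariance matrix from
# the sample covariance, with optional linear shrinkage
# S_l = (1 - l) * S + l * I, a generic device for stabilizing the estimate
# when n is small relative to p. Dimensions and lambda are ad hoc choices.
random.seed(2)
p, n = 4, 12
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]  # true cov = I

def sample_cov(X):
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / (n - 1)
             for j in range(p)] for i in range(p)]

def logdet(A):
    # log-determinant via Cholesky (A must be symmetric positive definite)
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(p))

S = sample_cov(X)
lam = 0.2  # shrinkage intensity, chosen arbitrarily here
S_shrunk = [[(1 - lam) * S[i][j] + (lam if i == j else 0.0) for j in range(p)]
            for i in range(p)]
ld_raw, ld_shrunk = logdet(S), logdet(S_shrunk)
```

When p approaches or exceeds n the raw sample covariance becomes singular and its log-determinant diverges, which is exactly the regime the paper's comparison of high-dimensional estimators addresses.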
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kane, V.E.
1982-01-01
A class of goodness-of-fit estimators is found to provide a useful alternative, in certain situations, to the standard maximum likelihood method, which has some undesirable characteristics for estimation from the three-parameter lognormal distribution. The goodness-of-fit tests considered include the Shapiro-Wilk and Filliben tests, which reduce to a weighted linear combination of the order statistics that can be maximized in estimation problems. The weighted order statistic estimators are compared to the standard procedures in Monte Carlo simulations. The robustness of the procedures is examined and example data sets are analyzed.
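A Filliben-style probability-plot-correlation criterion, one member of this class, can be sketched as follows for estimating the threshold of a three-parameter lognormal; the grid search, plotting positions, and simulation settings are all illustrative choices rather than the report's procedure.

```python
import math, random
from statistics import NormalDist

# Hedged sketch: choose the threshold gamma of a three-parameter lognormal
# to maximize the correlation between sorted log(x - gamma) and normal
# order-statistic scores (a Filliben-type criterion).
random.seed(3)
gamma_true, mu, sigma = 5.0, 1.0, 1.0
xs = sorted(gamma_true + math.exp(random.gauss(mu, sigma)) for _ in range(500))
n = len(xs)
# Filliben-type plotting positions mapped to normal scores (middle formula
# used throughout for simplicity)
scores = [NormalDist().inv_cdf((i - 0.3175) / (n + 0.365))
          for i in range(1, n + 1)]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db)

def fit_corr(gamma):
    return corr([math.log(x - gamma) for x in xs], scores)

# crude grid search strictly below the sample minimum
cands = [xs[0] - 3.0 + 0.01 * k for k in range(295)]
gamma_hat = max(cands, key=fit_corr)
```

Maximizing a correlation-type statistic avoids the degenerate likelihood behavior the report notes for maximum likelihood as the threshold approaches the smallest observation.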
ERIC Educational Resources Information Center
Population Reference Bureau, Inc., Washington, DC.
This poster-size data sheet presents population estimates and selected demographic indicators for the nation's 281 metropolitan areas. These areas are divided into 261 Metropolitan Statistical Areas (MSAs) and 20 Consolidated Metropolitan Statistical Areas (CMSAs), reporting units which replace the Standard Metropolitan Statistical Areas (SMSAs)…
Statistical tools for transgene copy number estimation based on real-time PCR.
Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal
2007-11-01
Compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR-based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control needed to render a reliable estimate of copy number with a prediction value. Despite recent progress in the statistical analysis of real-time PCR, few publications have integrated these advancements into real-time PCR-based transgene copy number determination. Three experimental designs and four statistical models with integrated data quality control are presented. In the first method, external calibration curves are established for the transgene based on serially diluted templates. The Ct numbers from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two-group t-test procedures were combined to model the data from this design. In the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data, based on two different approaches to amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination.
These statistical methods allow real-time PCR-based transgene copy number estimation to be more reliable and precise. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four statistical methods are compared with respect to their advantages and disadvantages. Moreover, the statistical methods can also be applied to other real-time PCR-based quantification assays, including transfection efficiency analysis and pathogen quantification.
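For orientation, the comparative-Ct arithmetic underlying such designs can be sketched as follows, assuming perfect doubling efficiency and ignoring the standard curves and regression modeling the paper advocates; all Ct values are invented.

```python
from statistics import mean

# Hedged sketch of comparative-Ct copy number estimation: the copy ratio is
# 2 ** -(ddCt), where ddCt = (Ct_tg - Ct_ref)_sample - (Ct_tg - Ct_ref)_calibrator.
# Perfect amplification efficiency (doubling per cycle) is assumed.
sample_tg = [21.1, 21.0, 21.2]   # hypothetical Ct replicates, transgene
sample_ref = [20.0, 20.1, 19.9]  # hypothetical Ct replicates, reference gene
calib_tg = [22.1, 22.0, 22.2]    # calibrator event known to carry one copy
calib_ref = [20.1, 20.0, 20.2]

d_sample = mean(sample_tg) - mean(sample_ref)
d_calib = mean(calib_tg) - mean(calib_ref)
ratio = 2.0 ** -(d_sample - d_calib)
copies = round(ratio)  # relative to the single-copy calibrator
```

The paper's point is precisely that this bare point estimate is not enough: replicate-based regression models and confidence intervals are needed before rounding to an integer copy number can be trusted.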
An audit of the statistics and the comparison with the parameter in the population
NASA Astrophysics Data System (ADS)
Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad
2015-10-01
Whether a sample is large enough for its statistics to closely estimate the corresponding population parameters is a recurring issue. Although a sample size may have been calculated with reference to the objective of a study, it is difficult to confirm whether the resulting statistics are close to the parameters of a particular population. A p-value of less than 0.05 is widely used as inferential evidence. This study therefore audited results computed from various sub-samples and statistical analyses and compared them with the parameters of three different populations. Eight types of statistical analysis and eight sub-samples for each statistical analysis were examined. The statistics were consistent and close to the parameters when the sample covered at least 15% to 35% of the population. A larger sample size is needed to estimate parameters involving categorical variables than those involving numerical variables. Sample sizes of 300 to 500 are sufficient to estimate the parameters for a medium-sized population.
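The audit idea, reduced to a toy example with a hypothetical population rather than the study's data:

```python
import random
from statistics import mean

# Toy re-creation of the audit: draw sub-samples of increasing coverage and
# check how far the sub-sample mean drifts from the population parameter.
# The population here is invented (Normal(50, 10)).
random.seed(4)
population = [random.gauss(50, 10) for _ in range(10000)]
param = mean(population)  # the "parameter": the full-population mean

errs = {}
for frac in (0.05, 0.15, 0.35):
    sub = random.sample(population, int(frac * len(population)))
    errs[frac] = abs(mean(sub) - param)
```

As the coverage fraction grows toward the 15%-35% range the study identifies, the discrepancy between statistic and parameter shrinks roughly with the square root of the sub-sample size.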
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE), which is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
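A minimal sketch of the self-consistency (Efron-Petrosian-type) fixed point for the NPMLE under double truncation; the paper's contribution, the closed-form covariance estimator, is not reproduced here.

```python
# Hedged sketch of the self-consistency iteration for the NPMLE with doubly
# truncated data: subject i is observed only because u[i] <= x[i] <= v[i].
def npmle_doubly_truncated(x, u, v, iters=300):
    n = len(x)
    # J[i][j] = 1 if x[j] is observable within subject i's truncation window
    J = [[1.0 if u[i] <= x[j] <= v[i] else 0.0 for j in range(n)]
         for i in range(n)]
    f = [1.0 / n] * n  # point masses at the observed values
    for _ in range(iters):
        # F[i]: mass visible within subject i's window under current f
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        f = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        s = sum(f)
        f = [fj / s for fj in f]  # renormalize to a distribution
    return f

# sanity check: with no effective truncation the NPMLE is the empirical 1/n
x = [1.0, 2.0, 3.0, 4.0]
f_free = npmle_doubly_truncated(x, [0.0] * 4, [10.0] * 4)
# one subject observable only for x >= 2.5: its window reweights the masses
f_trunc = npmle_doubly_truncated(x, [0.0, 0.0, 0.0, 2.5], [10.0] * 4)
```

Values that fall inside many subjects' truncation windows are over-represented in the sample, so the fixed point downweights them, which is visible in the second call above.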
NASA Astrophysics Data System (ADS)
Sutawanir
2015-12-01
Mortality tables play an important role in actuarial studies such as life annuities, premium determination, premium reserves, pension plan valuation, and pension funding. Some well-known mortality tables are the CSO mortality table, the Indonesian Mortality Table, the Bowers mortality table, and the Japan mortality table. For actuarial applications, tables are constructed under different environments such as single-decrement, double-decrement, and multiple-decrement settings. There are two approaches to mortality table construction: a mathematical approach and a statistical approach. Distribution models and estimation theory are the statistical concepts used in mortality table construction. This article discusses the statistical approach. The distributional assumptions are the uniform distribution of deaths (UDD) and constant force (exponential). Moment estimation and maximum likelihood are used to estimate the mortality parameters. Moment estimation methods are easier to manipulate than maximum likelihood estimation (MLE), but they do not use the complete mortality data. Maximum likelihood exploits all available information in mortality estimation, although some MLE equations are complicated and must be solved using numerical methods. The article focuses on single-decrement estimation using moment and maximum likelihood estimation; an extension to double decrement is introduced. A simple dataset is used to illustrate the mortality estimation and the mortality table.
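The two single-decrement estimates under these assumptions can be sketched with illustrative numbers (not an actual table):

```python
import math

# Hedged single-decrement sketch: estimate the one-year death probability q
# from deaths and central exposure under the two assumptions named above.
# The counts are invented for illustration.
deaths = 18
central_exposure = 950.0  # person-years lived in the age interval

# constant force: mu = D / central exposure, then q = 1 - exp(-mu)
mu_hat = deaths / central_exposure
q_constant_force = 1.0 - math.exp(-mu_hat)

# UDD-style actuarial (moment) estimate: q = D / (E + D/2)
q_udd = deaths / (central_exposure + deaths / 2.0)
```

For realistic mortality levels the two assumptions give nearly identical q values; they diverge only at high forces of mortality, which is where the choice of assumption (and moment versus maximum likelihood fitting) starts to matter.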
1980-02-01
Ill Posed Problems: Numerical and Statistical Methods for Mildly… (Wisconsin Univ-Madison, Dept. of Statistics). …whether it is possible to estimate f well, moderately well, or poorly. The sensitivity of a regularized estimate of f to the noise is made explicit. After giving the … estimate f given z. We first define the intrinsic rank of the problem where ∫₀¹ K(s,t) f(t) dt is known exactly. This definition is used to provide insight…
An alternative approach to confidence interval estimation for the win ratio statistic.
Luo, Xiaodong; Tian, Hong; Mohanty, Surya; Tsai, Wei Yann
2015-03-01
Pocock et al. (2012, European Heart Journal 33, 176-182) proposed a win ratio approach to analyzing composite endpoints comprised of outcomes with different clinical priorities. In this article, we establish a statistical framework for this approach. We derive the null hypothesis and propose a closed-form variance estimator for the win ratio statistic in the all-pairwise-matching situation. Our simulation study shows that the proposed variance estimator performs well regardless of the magnitude of the treatment effect size and the type of the joint distribution of the outcomes. © 2014, The International Biometric Society.
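The win ratio point estimate itself can be sketched for a two-level composite endpoint. The censoring handling below (an unobserved event treated as better than any observed event time) is a deliberate simplification, and the article's closed-form variance estimator is not reproduced.

```python
import math

# Hedged sketch of the win ratio over all treatment-control pairs. Each
# subject is (death_time, hospitalization_time), compared in priority order;
# math.inf encodes "event not observed" and so beats any observed time.
def wins(a, b):
    # 1 if a wins over b, -1 if a loses, 0 if tied on both priorities
    for xa, xb in zip(a, b):
        if xa != xb:
            return 1 if xa > xb else -1
    return 0

treatment = [(math.inf, 10.0), (8.0, math.inf), (math.inf, math.inf)]
control = [(6.0, 4.0), (math.inf, 3.0), (7.0, math.inf)]

w = sum(1 for t in treatment for c in control if wins(t, c) == 1)
l = sum(1 for t in treatment for c in control if wins(t, c) == -1)
win_ratio = w / l  # 8 wins, 1 loss out of 9 pairs
```

The article's framework supplies what this sketch lacks: a null hypothesis for win_ratio and a closed-form variance for confidence intervals, in place of bootstrapping.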
Xiang, G.; Ferson, S.; Ginzburg, L.; Longpré, L.; Mayorga, E.; Kosheleva, O.
2013-01-01
To preserve privacy, the original data points (with exact values) are replaced by boxes containing each (inaccessible) data point. This privacy-motivated uncertainty leads to uncertainty in the statistical characteristics computed based on this data. In a previous paper, we described how to minimize this uncertainty under the assumption that we use the same standard statistical estimates for the desired characteristics. In this paper, we show that we can further decrease the resulting uncertainty if we allow fuzzy-motivated weighted estimates, and we explain how to optimally select the corresponding weights. PMID:25187183
Ulus, Tumer; Yurtseven, Eray; Cavdar, Sabanur; Erginoz, Ethem; Erdogan, M. Sarper
2012-01-01
Aim To compare the quality of the 2008 cancer mortality data of the Istanbul Directorate of Cemeteries (IDC) with the 2008 data of the International Agency for Research on Cancer (IARC) and the Turkish Statistical Institute (TUIK), and to discuss the suitability of this databank for future estimations of cancer mortality. Methods We used the 2008 and 2010 death records of the IDC and compared them with TUIK and IARC data. Results According to WHO statistics, there were 67 255 estimated cancer deaths in Turkey in 2008. As the population of Turkey was 71 517 100, the cancer mortality rate was 9.4 per 10 000. According to IDC statistics, the cancer mortality rate in Istanbul in 2008 was 5.97 per 10 000. Conclusion IDC estimates were lower than WHO estimates, probably because WHO bases its estimates on a sample group and because of the restrictions of the IDC data collection method. Death certificates could be a reliable and accurate data source for mortality statistics if the problems of data collection are solved. PMID:23100210
Standardization of Keyword Search Mode
ERIC Educational Resources Information Center
Su, Di
2010-01-01
In spite of its popularity, keyword search mode has not been standardized. Though information professionals are quick to adapt to various presentations of keyword search mode, novice end-users may find keyword search confusing. This article compares keyword search mode in some major reference databases and calls for standardization. (Contains 3…
Özgür, Arzucan; Hur, Junguk; He, Yongqun
2016-01-01
The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support the mining of gene-gene interactions from the biomedical literature. However, previous work using INO focused on single-keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature-mining keywords co-occurring in one sentence to represent specific INO interaction classes. Such keyword combinations and the related INO interaction type information can be automatically obtained via SPARQL queries, exported in Excel format, and used in SciMiner, an INO-supported in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. The INO ontology currently has 575 terms, including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated by running the Stanford Parser to obtain dependency relation types. Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations. The LLL dataset contained 34 gene regulation interaction types, each of which is associated with multiple keywords.
A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. The phenomenon of having multi-keyword interaction types was also frequently observed in the vaccine dataset. By modeling and representing multiple textual keywords for interaction types, the extended INO enabled the identification of complex biological gene-gene interactions represented with multiple keywords.
Reyes-Aldasoro, Constantino Carlos
2017-01-01
In this work, the public database of biomedical literature PubMed was mined using queries with combinations of keywords and year restrictions. It was found that the proportion of Cancer-related entries per year in PubMed has risen from around 6% in 1950 to more than 16% in 2016. This increase is not shared by other conditions such as AIDS, Malaria, Tuberculosis, Diabetes, Cardiovascular, Stroke and Infection, some of which have, on the contrary, decreased as a proportion of the total entries per year. Organ-related queries were performed to analyse the variation of some specific cancers. A series of queries related to incidence, funding, and the relationship with DNA, Computing and Mathematics were performed to test correlation between the keywords, with the hope of elucidating the cause behind the rise of Cancer in PubMed. Interestingly, the proportion of Cancer-related entries that contain "DNA", "Computational" or "Mathematical" has increased, which suggests that the impact of these scientific advances on Cancer has been stronger than on other conditions. It is important to highlight that the results obtained with the data mining approach presented here are limited to the presence or absence of the keywords in a single, yet extensive, database. Therefore, the results should be observed with caution. All the data used for this work are publicly available through PubMed and the UK's Office for National Statistics. All queries and figures were generated with the software platform Matlab and the files are available as supplementary material.
Kruschke, John K; Liddell, Torrin M
2018-02-01
In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
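As a small illustration of estimation with quantified uncertainty in the Bayesian sense (not taken from the article; the function name and numbers are invented for the example), the following computes an equal-tailed credible interval for a binomial proportion from the Beta posterior on a grid, avoiding any special-function library:

```python
def beta_posterior_interval(successes, n, alpha0=1.0, beta0=1.0,
                            level=0.95, grid=20001):
    """Equal-tailed credible interval for a binomial proportion under a
    Beta(alpha0, beta0) prior, computed numerically on a grid."""
    a = alpha0 + successes
    b = beta0 + n - successes
    # unnormalized Beta(a, b) density on a grid of p values
    ps = [i / (grid - 1) for i in range(grid)]
    dens = [p ** (a - 1) * (1 - p) ** (b - 1) if 0 < p < 1 else 0.0 for p in ps]
    total = sum(dens)
    cdf, acc = [], 0.0
    for d in dens:
        acc += d
        cdf.append(acc / total)
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    lo = next(p for p, c in zip(ps, cdf) if c >= lo_q)
    hi = next(p for p, c in zip(ps, cdf) if c >= hi_q)
    return lo, hi

lo, hi = beta_posterior_interval(7, 20)  # 7 successes in 20 trials, uniform prior
```

Unlike a frequentist confidence interval, the resulting interval is a direct probability statement about the parameter given the data and prior.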
Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S
2015-01-16
Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques are compared: a traditional method and a Bayesian mixed-model approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed-model approach assimilates information from replicate QMM assays, improving reliability and inter-assay homogeneity, and providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed-model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
Eash, David A.; Barnes, Kimberlee K.; O'Shea, Padraic S.
2016-09-19
A statewide study was conducted to develop regression equations for estimating three selected spring and three selected fall low-flow frequency statistics for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include spring (April through June) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years and fall (October through December) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years. Estimates of the three selected spring statistics are provided for 241 U.S. Geological Survey continuous-record streamgages, and estimates of the three selected fall statistics are provided for 238 of these streamgages, using data through June 2014. Because only 9 years of fall streamflow record were available, three streamgages included in the development of the spring regression equations were not included in the development of the fall regression equations. Because of regulation, diversion, or urbanization, 30 of the 241 streamgages were not included in the development of the regression equations. The study area includes Iowa and adjacent areas within 50 miles of the Iowa border. Because trend analyses indicated statistically significant positive trends when considering the period of record for most of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. Geographic information system software was used to measure 63 selected basin characteristics for each of the 211 streamgages used to develop the regional regression equations.
The study area was divided into three low-flow regions that were defined in a previous study for the development of regional regression equations. Because several streamgages included in the development of regional regression equations have estimates of zero flow calculated from observed streamflow for selected spring and fall low-flow frequency statistics, the final equations for the three low-flow regions were developed using two types of regression analyses—left-censored and generalized-least-squares regression analyses. A total of 211 streamgages were included in the development of nine spring regression equations—three equations for each of the three low-flow regions. A total of 208 streamgages were included in the development of nine fall regression equations—three equations for each of the three low-flow regions. A censoring threshold was used to develop 15 left-censored regression equations to estimate the three fall low-flow frequency statistics for each of the three low-flow regions and to estimate the three spring low-flow frequency statistics for the southern and northwest regions. For the northeast region, generalized-least-squares regression was used to develop three equations to estimate the three spring low-flow frequency statistics. For the northeast region, average standard errors of prediction range from 32.4 to 48.4 percent for the spring equations and average standard errors of estimate range from 56.4 to 73.8 percent for the fall equations. For the northwest region, average standard errors of estimate range from 58.9 to 62.1 percent for the spring equations and from 83.2 to 109.4 percent for the fall equations.
For the southern region, average standard errors of estimate range from 43.2 to 64.0 percent for the spring equations and from 78.1 to 78.7 percent for the fall equations. The regression equations are applicable only to stream sites in Iowa with low flows not substantially affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. The regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system application. StreamStats allows users to click on any ungaged stream site and compute estimates of the six selected spring and fall low-flow statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged site are provided. StreamStats also allows users to click on any Iowa streamgage to obtain computed estimates for the six selected spring and fall low-flow statistics.
Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis
Montemurro, Marcelo A.; Zanette, Damián H.
2013-01-01
The Voynich manuscript has so far remained a mystery for linguists and cryptologists. While the text, written on medieval parchment using an unknown script system, shows basic statistical patterns that bear resemblance to those of real languages, some features have suggested to some researchers that the manuscript was a forgery intended as a hoax. Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with that found in real language sequences. We are also able to extract some of the most significant semantic word networks in the text. These results, together with some previously known statistical features of the Voynich manuscript, support the presence of a genuine message inside the book. PMID:23805215
Lin, Jen-Jen; Cheng, Jung-Yu; Huang, Li-Fei; Lin, Ying-Hsiu; Wan, Yung-Liang; Tsui, Po-Hsiang
2017-05-01
The Nakagami distribution is a useful approximation to the statistics of ultrasound backscattered signals for tissue characterization. The choice of estimator may affect the Nakagami parameter used to detect changes in backscattered statistics. In particular, the moment-based estimator (MBE) and the maximum likelihood estimator (MLE) are the two primary methods used to estimate the Nakagami parameters of ultrasound signals. This study explored the effects of the MBE and different MLE approximations on Nakagami parameter estimation. Ultrasound backscattered signals of different scatterer number densities were generated using a simulation model, and phantom experiments and measurements of human liver tissues were also conducted to acquire real backscattered echoes. Envelope signals were employed to estimate the Nakagami parameters by using the MBE, the first- and second-order approximations of the MLE (MLE1 and MLE2, respectively), and the Greenwood approximation (MLEgw) for comparison. The simulation results demonstrated that, compared with the MBE and MLE1, the MLE2 and MLEgw enabled more stable parameter estimation with small sample sizes. Notably, the required data length of the envelope signal was 3.6 times the pulse length. The phantom and tissue measurement results also showed that the Nakagami parameters estimated using the MLE2 and MLEgw could simultaneously differentiate various scatterer concentrations with lower standard deviations and reliably reflect the physical meanings associated with the backscattered statistics. Therefore, the MLE2 and MLEgw are suggested as estimators for the development of Nakagami-based methodologies for ultrasound tissue characterization. Copyright © 2017 Elsevier B.V. All rights reserved.
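The moment-based estimator mentioned above has a simple closed form, m = E[R²]² / Var(R²) and Ω = E[R²]. The following sketch (illustrative, not the authors' code) applies it to a simulated Rayleigh envelope, for which the true Nakagami shape parameter is m = 1:

```python
import random, math

def nakagami_mbe(envelope):
    """Moment-based estimator (MBE) of Nakagami parameters from an
    envelope sample R: m = E[R^2]^2 / Var(R^2), Omega = E[R^2]."""
    r2 = [r * r for r in envelope]
    mean_r2 = sum(r2) / len(r2)
    var_r2 = sum((x - mean_r2) ** 2 for x in r2) / len(r2)
    return mean_r2 ** 2 / var_r2, mean_r2

# Rayleigh envelope (many independent scatterers) should give m close to 1
random.seed(0)
env = [math.sqrt(random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2)
       for _ in range(20000)]
m_hat, omega_hat = nakagami_mbe(env)
```

Because the MBE divides by a sample variance, it becomes unstable for short envelopes, which is the motivation the abstract gives for preferring the MLE2 and MLEgw approximations at small sample sizes.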
ERIC Educational Resources Information Center
St-Onge, Christina; Valois, Pierre; Abdous, Belkacem; Germain, Stephane
2009-01-01
To date, there have been no studies comparing parametric and nonparametric Item Characteristic Curve (ICC) estimation methods on the effectiveness of Person-Fit Statistics (PFS). The primary aim of this study was to determine if the use of ICCs estimated by nonparametric methods would increase the accuracy of item response theory-based PFS for…
Evaluation of assumptions in soil moisture triple collocation analysis
USDA-ARS?s Scientific Manuscript database
Triple collocation analysis (TCA) enables estimation of error variances for three or more products that retrieve or estimate the same geophysical variable using mutually independent methods. Several statistical assumptions regarding the nature of the errors (e.g., mutual independence and orthogonality) ...
Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure
Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss diff...
NASA Technical Reports Server (NTRS)
Conrad, A. R.; Lupton, W. F.
1992-01-01
Each Keck instrument presents a consistent software view to the user interface programmer. The view consists of a small library of functions, which are identical for all instruments, and a large set of keywords that vary from instrument to instrument. All knowledge of the underlying task structure is hidden from the application programmer by the keyword layer. Image capture software uses the same function library to collect data for the image header. Because the image capture software and the instrument control software are built on top of the same keyword layer, a given observation can be 'replayed' by extracting keyword-value pairs from the image header and passing them back to the control system. The keyword layer features non-blocking as well as blocking I/O. A non-blocking keyword write operation (such as setting a filter position) specifies a callback to be invoked when the operation is complete. A non-blocking keyword read operation specifies a callback to be invoked whenever the keyword changes state. The keyword-callback style meshes well with the widget-callback style commonly used in X window programs. The first keyword library was built for the two Keck optical instruments. More recently, keyword libraries have been developed for the infrared instruments and for telescope control. Although the underlying mechanisms used for inter-process communication by each of these systems vary widely (Lick MUSIC, Sun RPC, and direct socket I/O, respectively), a basic user interface has been written that can be used with any of these systems. Since the keyword libraries are bound to user interface programs dynamically at run time, only a single set of user interface executables is needed. For example, the same program, 'xshow', can be used to continuously display the telescope's position, the time left in an instrument's exposure, or both values simultaneously.
Less generic tools that operate on specific keywords, for example an X display that controls optical instrument exposures, have also been written using the keyword layer.
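The callback style described above can be caricatured in a few lines. This is a hypothetical sketch, not the actual Keck keyword library API: a write may carry a completion callback, and a monitor registers a callback fired on every state change, which is what lets image-header capture and GUI displays share one layer.

```python
class KeywordLayer:
    """Hypothetical sketch of a keyword layer: instrument state is exposed
    as keyword-value pairs; callbacks hide the underlying task structure."""
    def __init__(self):
        self._values = {}
        self._monitors = {}   # keyword -> callbacks fired on each change

    def write(self, keyword, value, callback=None):
        """Set a keyword; optionally invoke a completion callback
        (the non-blocking write style described in the abstract)."""
        self._values[keyword] = value
        for cb in self._monitors.get(keyword, []):
            cb(keyword, value)            # state-change monitors
        if callback:
            callback(keyword, value)      # completion callback

    def read(self, keyword):
        """Blocking read of the current keyword value."""
        return self._values[keyword]

    def monitor(self, keyword, callback):
        """Non-blocking read: invoke callback whenever the keyword changes."""
        self._monitors.setdefault(keyword, []).append(callback)

kl = KeywordLayer()
seen = []
kl.monitor("FILTER", lambda k, v: seen.append(v))   # GUI-style monitor
kl.write("FILTER", "B-band")
header = {k: kl.read(k) for k in ["FILTER"]}        # header capture reuses reads
```

Replaying an observation then amounts to feeding `header` items back through `write`, exactly the round trip the abstract describes.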
2013-01-01
Demographic estimates of population at risk often underpin epidemiologic research and public health surveillance efforts. In spite of their central importance to epidemiology and public-health practice, little previous attention has been paid to evaluating the magnitude of errors associated with such estimates or the sensitivity of epidemiologic statistics to these effects. In spite of the well-known observation that accuracy in demographic estimates declines as the size of the population to be estimated decreases, demographers continue to face pressure to produce estimates for increasingly fine-grained population characteristics at ever-smaller geographic scales. Unfortunately, little guidance on the magnitude of errors that can be expected in such estimates is currently available in the literature and available for consideration in small-area epidemiology. This paper attempts to fill this gap by producing a Vintage 2010 set of single-year-of-age estimates for census tracts, then evaluating their accuracy and precision in light of the results of the 2010 Census. These estimates are produced and evaluated for 499 census tracts in New Mexico for single years of age from 0 to 21 and for each sex individually. The error distributions associated with these estimates are characterized statistically using non-parametric statistics including the median and the 2.5th and 97.5th percentiles. The impact of these errors is considered through simulations in which observed and estimated 2010 population counts are used as alternative denominators and simulated event counts are used to compute a realistic range of prevalence values. The implications of the results of this study for small-area epidemiologic research in cancer and environmental health are considered. PMID:24359344
NASA Astrophysics Data System (ADS)
Yan, Zheng; Mingzhong, Tian; Hengli, Wang
2010-05-01
Chinese hand-written local records originated in the first century. Generally, these local records include the geography, evolution, customs, education, products, people, historical sites, and writings of an area. Through such endeavors, the record of China's natural events has had virtually no "dark ages" over the evolution of its 5000-year-old civilization. A compilation of all meaningful historical data on natural disasters that took place in Alxa, Inner Mongolia, the second largest desert in China, is used here to construct a 500-year high-resolution database. The database is divided into subsets according to the type of natural disaster, such as sand-dust storms, drought events, and cold waves. By applying trend, correlation, wavelet, and spectral analyses to these data, we can estimate the statistical periodicity of different natural disasters, detect and quantify similarities and patterns among the periodicities of these records, and finally take these results in aggregate to find a strong and coherent cyclicity through the last 500 years that serves as the driving mechanism of these geological hazards. Based on the periodicity obtained from the above analysis, the paper discusses the probability of forecasting natural disasters and suitable measures to reduce disaster losses through historical records. Keywords: Chinese local records; Alxa; natural disasters; database; periodicity analysis
Data Validation in the Kepler Science Operations Center Pipeline
NASA Technical Reports Server (NTRS)
Wu, Hayley; Twicken, Joseph D.; Tenenbaum, Peter; Clarke, Bruce D.; Li, Jie; Quintana, Elisa V.; Allen, Christopher; Chandrasekaran, Hema; Jenkins, Jon M.; Caldwell, Douglas A.;
2010-01-01
We present an overview of the Data Validation (DV) software component and its context within the Kepler Science Operations Center (SOC) pipeline and overall Kepler Science mission. The SOC pipeline performs a transiting planet search on the corrected light curves for over 150,000 targets across the focal plane array. We discuss the DV strategy for automated validation of Threshold Crossing Events (TCEs) generated in the transiting planet search. For each TCE, a transiting planet model is fitted to the target light curve. A multiple planet search is conducted by repeating the transiting planet search on the residual light curve after the model flux has been removed; if an additional detection occurs, a planet model is fitted to the new TCE. A suite of automated tests is performed after all planet candidates have been identified. We describe a centroid motion test to determine the significance of the motion of the target photocenter during transit and to estimate the coordinates of the transit source within the photometric aperture; a series of eclipsing binary discrimination tests on the parameters of the planet model fits to all transits and to the sequences of odd and even transits; and a statistical bootstrap to assess the likelihood that the TCE would have been generated purely by chance given the target light curve with all transits removed. Keywords: photometry, data validation, Kepler, Earth-size planets
McCormack, Gavin R; Virk, Jagdeep S
2014-09-01
Higher levels of sedentary behavior are associated with adverse health outcomes. Over-reliance on private motor vehicles for transportation is a potential contributor to the obesity epidemic. The objective of this study was to review evidence on the relationship between motor vehicle travel distance and time and weight status among adults. Keywords associated with driving and weight status were entered into four databases (PubMed, Medline, Transportation Research Information Database, and Web of Science) and retrieved article titles and abstracts were screened for relevance. Relevant articles were assessed for their eligibility for inclusion in the review (English-language articles with a sample ≥ 16 years of age that included a measure of time or distance traveled in a motor vehicle and weight status and estimated the association between driving and weight status). The database search yielded 2781 articles, from which 88 were deemed relevant and 10 studies met the inclusion criteria. Of the 10 studies included in the review, 8 found a statistically significant positive association between time and distance traveled in a motor vehicle and weight status. Multilevel interventions that make alternatives to driving private motor vehicles, such as walking and cycling, more convenient are needed to promote healthy weight in the adult population. Copyright © 2014 Elsevier Inc. All rights reserved.
On the climate impacts from the volcanic and solar forcings
NASA Astrophysics Data System (ADS)
Varotsos, Costas A.; Lovejoy, Shaun
2016-04-01
Observed and modelled estimates show that the main forcings on the atmosphere are of volcanic and solar origin, which, however, act in opposite ways: the former can be very strong but decreases at short time scales, whereas the latter increases with time scale. Correspondingly, the observed fluctuations in temperature increase at long scales (e.g. centennial and millennial), as do the solar forcings. The common practice is to reduce forcings to radiative equivalents assuming that their combination is linear. In order to clarify the validity of the linearity assumption and determine its range of validity, we systematically compare the statistical properties of solar-only, volcanic-only, and combined solar and volcanic forcings over the range of time scales from one to 1000 years. Additionally, we attempt to investigate plausible reasons for the discrepancies observed between the measured and modeled anomalies of tropospheric temperatures in the tropics. For this purpose, we analyse tropospheric temperature anomalies for both the measured and modeled time series. The results obtained show that the measured temperature fluctuations reveal white-noise behavior, while the modeled ones exhibit long-range power-law correlations. We suggest that the persistent signal should be removed from the modeled values in order to achieve better agreement with observations. Keywords: Scaling, Nonlinear variability, Climate system, Solar radiation
Horakova, Dagmar; Bouchalova, Katerina; Cwiertka, Karel; Stepanek, Ladislav; Vlckova, Jana; Kollarova, Helena
2018-05-15
Triple-negative breast cancer (TNBC) is an aggressive form of breast cancer (BC) with a poor prognosis. Moreover, patients cannot benefit from targeted therapy, except for those with BRCA1/2 mutations, for whom poly(ADP-ribose) polymerase (PARP) inhibition therapy using olaparib has recently been approved. As global priorities continue to be the epidemiological analysis of BC risk factors and early diagnosis, this review focuses on the risk and protective factors associated with TNBC. A PubMed keyword search for new knowledge on the risk and protective factors for TNBC was carried out. We also drew statistical information from current online databases concerning the estimated incidence, prevalence, and mortality of this cancer worldwide. Traditional risk factors for BC and TNBC are those related to reproduction, such as age at menarche, age at first birth, parity, breastfeeding, and age at menopause. Attention also needs to be paid to familial BC, weight control, alcohol consumption, and regular physical activity. Epidemiological studies on TNBC provide evidence for protective factors such as regular consumption of soya, seafood, green tea, folic acid, and vitamin D. Potential risk factors may include night work and viral infectious agents such as human papillomavirus (HPV) and Epstein-Barr virus (EBV). Droplet digital methylation-specific PCR (ddMSP) is a possible new screening method for the detection of BC, including TNBC. Further research is necessary to validate these new factors.
Nonparametric entropy estimation using kernel densities.
Lake, Douglas E
2009-01-01
The entropy of experimental data from the biological and medical sciences provides additional information over summary statistics. Calculating entropy involves estimates of probability density functions, which can be effectively accomplished using kernel density methods. Kernel density estimation has been widely studied and a univariate implementation is readily available in MATLAB. The traditional definition of Shannon entropy is part of a larger family of statistics, called Renyi entropy, which is useful in applications that require a measure of the Gaussianity of data. Of particular note is the quadratic entropy, which is related to the Friedman-Tukey (FT) index, a widely used measure in the statistical community. One application where quadratic entropy is very useful is the detection of abnormal cardiac rhythms, such as atrial fibrillation (AF). Asymptotic and exact small-sample results for optimal bandwidth and kernel selection to estimate the FT index are presented and lead to improved methods for entropy estimation.
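A minimal version of the kernel-based entropy estimate described above (an illustrative sketch, not the paper's implementation): the density at each sample point is estimated with a Gaussian kernel and Silverman's rule-of-thumb bandwidth, and the Shannon entropy is the negative mean log density (the resubstitution estimator).

```python
import math, random

def kde_entropy(data):
    """Resubstitution estimate of differential Shannon entropy:
    H = -(1/n) * sum log f_hat(x_i), Gaussian kernel, Silverman bandwidth."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    h = 1.06 * sd * n ** (-1 / 5)        # Silverman's rule-of-thumb bandwidth
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    ent = 0.0
    for xi in data:
        f = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in data)
        ent -= math.log(f)
    return ent / n

random.seed(1)
sample = [random.gauss(0, 1) for _ in range(500)]
H = kde_entropy(sample)   # true value for N(0,1) is 0.5*ln(2*pi*e), about 1.419
```

The same kernel density estimates can feed a quadratic (Renyi order-2) entropy by averaging the density itself rather than its logarithm, which is the FT-index connection the abstract mentions.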
NASA Astrophysics Data System (ADS)
Liu, Deyang; An, Ping; Ma, Ran; Yang, Chao; Shen, Liquan; Li, Kai
2016-07-01
Three-dimensional (3-D) holoscopic imaging, also known as integral imaging, light field imaging, or plenoptic imaging, can provide natural and fatigue-free 3-D visualization. However, a large amount of data is required to represent the 3-D holoscopic content. Therefore, efficient coding schemes for this particular type of image are needed. A 3-D holoscopic image coding scheme with kernel-based minimum mean square error (MMSE) estimation is proposed. In the proposed scheme, the coding block is predicted by an MMSE estimator under statistical modeling. In order to obtain the signal statistical behavior, kernel density estimation (KDE) is utilized to estimate the probability density function of the statistical modeling. As bandwidth estimation (BE) is a key issue in the KDE problem, we also propose a BE method based on kernel trick. The experimental results demonstrate that the proposed scheme can achieve a better rate-distortion performance and a better visual rendering quality.
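The paper's exact coding pipeline is not reproduced here, but the core idea of kernel-based MMSE prediction can be sketched: under a Gaussian-KDE model of the joint density of a context variable and a target, the conditional-mean (MMSE) predictor reduces to a Nadaraya-Watson weighted average. All data and the bandwidth below are illustrative assumptions.

```python
import numpy as np

def nw_mmse(y_train, x_train, y_query, h):
    """Kernel-based MMSE predictor: with a Gaussian-KDE model of the joint
    density p(x, y), E[X | Y = y] reduces to the Nadaraya-Watson weighted
    average of the training targets."""
    w = np.exp(-0.5 * ((y_query - y_train) / h) ** 2)  # Gaussian kernel weights
    return np.sum(w * x_train) / np.sum(w)

rng = np.random.default_rng(1)
y = rng.uniform(-2, 2, 400)
x = y ** 2 + 0.1 * rng.standard_normal(400)  # target statistically tied to context
print(nw_mmse(y, x, 1.0, h=0.2))  # approximately 1.0 (= E[X | Y = 1])
```

Bandwidth selection (the "BE" problem the abstract highlights) directly controls the bias-variance trade-off of this predictor, which is why it is a key design choice.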
Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)
NASA Astrophysics Data System (ADS)
Kasibhatla, P.
2004-12-01
In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed-form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
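The mechanics of MCMC sampling for a tracer inversion can be shown on a toy problem. This is a minimal random-walk Metropolis sketch for a two-source, three-observation linear system with an assumed Gaussian likelihood and a weak prior; the TransCom3 setup itself is far larger and uses more general error models.

```python
import numpy as np

# Toy linear inverse problem d = G s + noise: two sources, three observations.
rng = np.random.default_rng(2)
G = np.array([[1.0, 0.3], [0.4, 1.0], [0.7, 0.7]])
s_true = np.array([2.0, -1.0])
d = G @ s_true + 0.1 * rng.standard_normal(3)

def log_post(s, sigma=0.1, prior_sd=10.0):
    """Unnormalized log posterior: Gaussian likelihood + weak Gaussian prior."""
    resid = d - G @ s
    return -0.5 * np.sum(resid**2) / sigma**2 - 0.5 * np.sum(s**2) / prior_sd**2

s = np.zeros(2)
lp = log_post(s)
chain = []
for _ in range(20000):
    prop = s + 0.1 * rng.standard_normal(2)   # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
        s, lp = prop, lp_prop
    chain.append(s)

post_mean = np.mean(chain[5000:], axis=0)     # discard burn-in
print(post_mean)  # approximately s_true
```

Replacing `log_post` with a heavier-tailed likelihood or a non-Gaussian prior requires no other change, which is exactly the flexibility the abstract argues for.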
Wedemeyer, Gary A.; Nelson, Nancy C.
1975-01-01
Gaussian and nonparametric (percentile estimate and tolerance interval) statistical methods were used to estimate normal ranges for blood chemistry (bicarbonate, bilirubin, calcium, hematocrit, hemoglobin, magnesium, mean cell hemoglobin concentration, osmolality, inorganic phosphorus, and pH) for juvenile rainbow trout (Salmo gairdneri, Shasta strain) held under defined environmental conditions. The percentile estimate and Gaussian methods gave similar normal ranges, whereas the tolerance interval method gave consistently wider ranges for all blood variables except hemoglobin. If the underlying frequency distribution is unknown, the percentile estimate procedure would be the method of choice.
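Two of the compared estimators are easy to sketch side by side: a Gaussian mean ± 1.96 SD reference interval and a nonparametric 2.5th-97.5th percentile estimate. The sample below is synthetic, not the study's trout data.

```python
import numpy as np

def normal_ranges(values):
    """Gaussian (mean +/- 1.96 SD) and nonparametric percentile-estimate
    95 percent reference ranges for a sample of blood-chemistry values."""
    m, s = np.mean(values), np.std(values, ddof=1)
    gaussian = (m - 1.96 * s, m + 1.96 * s)
    percentile = tuple(np.percentile(values, [2.5, 97.5]))
    return gaussian, percentile

rng = np.random.default_rng(3)
hematocrit = rng.normal(35.0, 3.0, 500)  # hypothetical blood-variable sample
g, p = normal_ranges(hematocrit)
print(g, p)
```

On near-normal data the two ranges agree closely, matching the study's finding; the percentile method's advantage appears only when the distribution departs from Gaussian.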
The Role of Anisotropy and Intermittency in Solar Wind/Magnetosphere Coupling
NASA Astrophysics Data System (ADS)
Jankovicova, D.; Voros, Z.
2006-12-01
Turbulent fluctuations are common in the solar wind as well as in the Earth's magnetosphere. The fluctuations of both magnetic field and plasma parameters exhibit non-Gaussian statistics. Neither the amplitude of these fluctuations nor their spectral characteristics can provide a full statistical description of multi-scale features in turbulence. This motivates a statistical approach that includes the estimation of experimentally accessible statistical moments. In this contribution, we directly estimate the third (skewness) and the fourth (kurtosis) statistical moments from the available time series of magnetic measurements in the solar wind (ACE and WIND spacecraft) and in the Earth's magnetosphere (SYM-H index). We then evaluate how the statistical moments change during strong and weak solar wind/magnetosphere coupling intervals.
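The moment estimation described above is standard; a minimal sketch on synthetic series (not spacecraft data) shows how intermittent, heavy-tailed fluctuations stand out in the fourth moment.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(4)
gaussian = rng.standard_normal(100000)
# Intermittent, heavy-tailed proxy: Gaussian signal with random burst amplitude
bursty = rng.standard_normal(100000) * rng.exponential(1.0, 100000)

# For a Gaussian signal, skewness ~ 0 and excess kurtosis ~ 0; intermittent
# turbulence-like fluctuations show strongly elevated kurtosis.
print(skew(gaussian), kurtosis(gaussian))
print(skew(bursty), kurtosis(bursty))
```

Note that `scipy.stats.kurtosis` returns excess kurtosis (Fisher's definition), so a Gaussian reads ~0 rather than ~3.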
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. 
As a particular special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator using super-learning to fit the propensity score.
Statistical methods of estimating mining costs
Long, K.R.
2011-01-01
Until it was defunded in 1995, the U.S. Bureau of Mines maintained a Cost Estimating System (CES) for prefeasibility-type economic evaluations of mineral deposits and estimating costs at producing and non-producing mines. This system had a significant role in mineral resource assessments to estimate costs of developing and operating known mineral deposits and predicted undiscovered deposits. For legal reasons, the U.S. Geological Survey cannot update and maintain CES. Instead, statistical tools are under development to estimate mining costs from basic properties of mineral deposits such as tonnage, grade, mineralogy, depth, strip ratio, distance from infrastructure, rock strength, and work index. The first step was to reestimate "Taylor's Rule" which relates operating rate to available ore tonnage. The second step was to estimate statistical models of capital and operating costs for open pit porphyry copper mines with flotation concentrators. For a sample of 27 proposed porphyry copper projects, capital costs can be estimated from three variables: mineral processing rate, strip ratio, and distance from nearest railroad before mine construction began. Of all the variables tested, operating costs were found to be significantly correlated only with strip ratio.
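Re-estimating a power-law rule such as "Taylor's Rule" amounts to a straight-line fit in log-log space. The sketch below uses synthetic deposits with an assumed exponent; the actual coefficients the USGS estimated are not reproduced here.

```python
import numpy as np

# Hypothetical illustration: a Taylor's-Rule-type relation
# rate ~ a * tonnage^b, so b is the slope of an OLS fit in log-log space.
rng = np.random.default_rng(5)
tonnage = 10 ** rng.uniform(5, 9, 60)        # 1e5 to 1e9 tonne deposits
b_true, log_a_true = 0.75, -1.0              # assumed values, not USGS estimates
rate = 10 ** (log_a_true + b_true * np.log10(tonnage)
              + 0.05 * rng.standard_normal(60))

b_hat, log_a_hat = np.polyfit(np.log10(tonnage), np.log10(rate), 1)
print(b_hat)  # close to the assumed 0.75
```

The same log-log regression template extends to the multivariate capital-cost models the abstract describes (processing rate, strip ratio, distance to rail) by swapping `polyfit` for a multiple-regression fit.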
Beaton, Alan A; Gruneberg, Michael M; Hyde, Christopher; Shufflebottom, Alex; Sykes, Robert N
2005-07-01
Ellis and Beaton (1993a) reported that the keyword method of learning enhanced memory of foreign vocabulary items when receptive learning was measured. However, for productive learning, rote repetition was superior to the keyword method. The first two experiments reported here show that, in comparison with rote repetition, both receptive and productive learning can be enhanced by the keyword method, provided that the quality of the keyword images is adequate. In a third experiment using a subset of words from Ellis and Beaton (1993a), the finding they reported, that for productive learning rote repetition was superior to the keyword method, was reversed. The quality of keyword images will vary from study to study and any generalisation regarding the efficacy of the keyword method must take this into account.
Fractal analysis of the short time series in a visibility graph method
NASA Astrophysics Data System (ADS)
Li, Ruixue; Wang, Jiang; Yu, Haitao; Deng, Bin; Wei, Xile; Chen, Yingyuan
2016-05-01
The aim of this study is to evaluate the performance of the visibility graph (VG) method on short fractal time series. In this paper, time series of fractional Brownian motion (fBm), characterized by different Hurst exponents H, are simulated and then mapped into scale-free visibility graphs, whose degree distributions show the power-law form. The maximum likelihood estimation (MLE) is applied to estimate power-law indexes of the degree distribution, and in this process, the Kolmogorov-Smirnov (KS) statistic is used to test the performance of the estimation of the power-law index, aiming to avoid the influence of the drooping head and heavy tail of the degree distribution. As a result, we find that the MLE gives an optimal estimation of the power-law index when the KS statistic reaches its first local minimum. Based on the results from the KS statistic, the relationship between the power-law index and the Hurst exponent is reexamined and then amended to suit short time series. Thus, a method combining VG, MLE and KS statistics is proposed to estimate Hurst exponents from short time series. Lastly, this paper also offers an example to verify the effectiveness of the combined method. In addition, the corresponding results show that the VG can provide a reliable estimation of Hurst exponents.
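The mapping from a time series to a natural visibility graph follows a simple geometric rule: two samples are linked if the straight line between them clears every sample in between. A brute-force O(n²·n) sketch (adequate for the short series the paper targets):

```python
import numpy as np

def visibility_graph(y):
    """Natural visibility graph: nodes a < b are linked if every sample c
    between them lies strictly below the line joining (a, y[a]) and (b, y[b])."""
    n = len(y)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            visible = all(
                y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges

series = [3.0, 1.0, 2.0, 0.0, 4.0]
g = visibility_graph(series)
degrees = np.bincount([i for e in g for i in e], minlength=len(series))
print(sorted(g), degrees)
```

The degree sequence computed this way is the input to the MLE/KS power-law fitting step the abstract describes; consecutive samples are always mutually visible, so every node has degree at least one.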
NASA Technical Reports Server (NTRS)
Abbey, Craig K.; Eckstein, Miguel P.
2002-01-01
We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
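For a linear-template observer, the classification-image estimate reduces to differencing the mean noise fields sorted by the observer's responses. The simulation below is a hypothetical sketch (simulated observer, 16-sample "images"), not the paper's experimental data or exact estimator.

```python
import numpy as np

rng = np.random.default_rng(6)
template = np.sin(np.linspace(0, np.pi, 16))  # hypothetical filter weights
template /= np.linalg.norm(template)

# Noise-only trials judged by a deterministic linear observer.
n_trials = 20000
noise = rng.standard_normal((n_trials, 16))
responses = noise @ template > 0              # observer's binary decisions

# Classification image: mean noise on "yes" trials minus mean on "no" trials,
# which is proportional to the observer's template under the linear model.
classification_image = noise[responses].mean(0) - noise[~responses].mean(0)
corr = np.corrcoef(classification_image, template)[0, 1]
print(corr)  # close to 1 for this noiseless linear observer
```

With internal noise or a nonlinear decision rule the recovered image degrades gracefully, which is why the paper pairs the estimator with hypothesis tests on features derived from it.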
The effects of estimation of censoring, truncation, transformation and partial data vectors
NASA Technical Reports Server (NTRS)
Hartley, H. O.; Smith, W. B.
1972-01-01
The purpose of this research was to address statistical problems concerning the estimation of distributions for predicting and measuring assembly performance as it appears in biological and physical situations. Various statistical procedures were proposed for problems of this sort, that is, to produce the statistical distributions of the outcomes of biological and physical situations that employ characteristics measured on constituent parts. The techniques are described.
Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory
2016-05-12
Second, a statistical method is developed to estimate the memory depth of discrete-time and continuously-valued time series from a sample. (A practical algorithm to compute the estimator is a work in progress.) Third, finitely-valued spatial processes... Keywords: mathematical statistics; time series; Markov chains; random...
The Shock and Vibration Digest. Volume 13, Number 12
1981-12-01
Book review: Lyon, R.H., Statistical Energy Analysis of Dynamic Systems, MIT Press, Cambridge, MA; reviewed by F.C. Nelson, SVD, 13 (8), pp 30-31 (Aug 1981). Statistical energy analysis (SEA) is expressed in terms of random response quantities such as stress, acceleration, modes, and pressure, and provides estimations of average system behavior.
Koltun, G.F.; Kula, Stephanie P.
2013-01-01
This report presents the results of a study to develop methods for estimating selected low-flow statistics and for determining annual flow-duration statistics for Ohio streams. Regression techniques were used to develop equations for estimating 10-year recurrence-interval (10-percent annual-nonexceedance probability) low-flow yields, in cubic feet per second per square mile, with averaging periods of 1, 7, 30, and 90 days, and for estimating the yield corresponding to the long-term 80-percent duration flow. These equations, which estimate low-flow yields as a function of a streamflow-variability index, are based on previously published low-flow statistics for 79 long-term continuous-record streamgages with at least 10 years of data collected through water year 1997. When applied to the calibration dataset, average absolute percent errors for the regression equations ranged from 15.8 to 42.0 percent. The regression results have been incorporated into the U.S. Geological Survey (USGS) StreamStats application for Ohio (http://water.usgs.gov/osw/streamstats/ohio.html) in the form of a yield grid to facilitate estimation of the corresponding streamflow statistics in cubic feet per second. Logistic-regression equations also were developed and incorporated into the USGS StreamStats application for Ohio for selected low-flow statistics to help identify occurrences of zero-valued statistics. Quantiles of daily and 7-day mean streamflows were determined for annual and annual-seasonal (September–November) periods for each complete climatic year of streamflow-gaging station record for 110 selected streamflow-gaging stations with 20 or more years of record. The quantiles determined for each climatic year were the 99-, 98-, 95-, 90-, 80-, 75-, 70-, 60-, 50-, 40-, 30-, 25-, 20-, 10-, 5-, 2-, and 1-percent exceedance streamflows.
Selected exceedance percentiles of the annual-exceedance percentiles were subsequently computed and tabulated to help facilitate consideration of the annual risk of exceedance or nonexceedance of annual and annual-seasonal-period flow-duration values. The quantiles are based on streamflow data collected through climatic year 2008.
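The flow-duration quantiles tabulated above follow a simple convention: the q-percent exceedance flow is the value equalled or exceeded q percent of the time, i.e. the (100 − q)th percentile of the record. A minimal sketch on a synthetic daily record:

```python
import numpy as np

def exceedance_quantiles(daily_flows, exceed_pcts):
    """Flow-duration statistics: the q-percent exceedance flow is the
    (100 - q)th percentile of the daily streamflow record."""
    return {q: np.percentile(daily_flows, 100 - q) for q in exceed_pcts}

rng = np.random.default_rng(7)
flows = rng.lognormal(mean=3.0, sigma=1.0, size=365 * 20)  # synthetic 20-year record
stats = exceedance_quantiles(flows, [95, 90, 80, 50, 20])
print(stats[80])  # the flow equalled or exceeded 80 percent of the time
```

By construction the 95-percent exceedance flow (a low flow) is smaller than the 20-percent exceedance flow (a high flow), matching the ordering of the report's tabulated percentiles.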
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.
Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray
2016-12-01
In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
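The multiple-testing step the abstract cites is the standard Benjamini-Hochberg (1995) step-up procedure, which can be sketched directly; the p-values below are illustrative, not the enterovirus data.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: reject the k smallest p-values,
    where k is the largest index with p_(k) <= (k/m) * alpha."""
    p = np.sort(np.asarray(pvals))
    m = len(p)
    below = np.nonzero(p <= alpha * np.arange(1, m + 1) / m)[0]
    if below.size == 0:
        return np.zeros(m, dtype=bool)
    k = below[-1] + 1                      # number of rejections
    return np.asarray(pvals) <= p[k - 1]   # reject all p-values up to p_(k)

rejected = benjamini_hochberg([0.01, 0.5, 0.03, 0.02], alpha=0.05)
print(rejected)  # rejects 0.01, 0.02 and 0.03 but not 0.5
```

Note the step-up character: 0.03 exceeds its own per-test threshold under Bonferroni (0.0125) but is still rejected because a later comparison in the sorted sequence succeeds.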
Estimating regional plant biodiversity with GIS modelling
Louis R. Iverson; Anantha M. Prasad; Anantha M. Prasad
1998-01-01
We analyzed a statewide species database together with a county-level geographic information system to build a model based on well-surveyed areas to estimate species richness in less surveyed counties. The model involved GIS (Arc/Info) and statistics (S-PLUS), including spatial statistics (S+SpatialStats).
StreamStats: a U.S. geological survey web site for stream information
Kernell, G. Ries; Gray, John R.; Renard, Kenneth G.; McElroy, Stephen A.; Gburek, William J.; Canfield, H. Evan; Scott, Russell L.
2003-01-01
The U.S. Geological Survey has developed a Web application, named StreamStats, for providing streamflow statistics, such as the 100-year flood and the 7-day, 10-year low flow, to the public. Statistics can be obtained for data-collection stations and for ungaged sites. Streamflow statistics are needed for water-resources planning and management; for design of bridges, culverts, and flood-control structures; and for many other purposes. StreamStats users can point and click on data-collection stations shown on a map in their Web browser window to obtain previously determined streamflow statistics and other information for the stations. Users also can point and click on any stream shown on the map to get estimates of streamflow statistics for ungaged sites. StreamStats determines the watershed boundaries and measures physical and climatic characteristics of the watersheds for the ungaged sites by use of a Geographic Information System (GIS), and then it inserts the characteristics into previously determined regression equations to estimate the streamflow statistics. Compared to manual methods, StreamStats reduces the average time needed to estimate streamflow statistics for ungaged sites from several hours to several minutes.
Modeling Ka-band low elevation angle propagation statistics
NASA Technical Reports Server (NTRS)
Russell, Thomas A.; Weinfield, John; Pearson, Chris; Ippolito, Louis J.
1995-01-01
The statistical variability of the secondary atmospheric propagation effects on satellite communications cannot be ignored at frequencies of 20 GHz or higher, particularly if the propagation margin allocation is such that link availability falls below 99 percent. The secondary effects considered in this paper are gaseous absorption, cloud absorption, and tropospheric scintillation; rain attenuation is the primary effect. Techniques and example results are presented for estimation of the overall combined impact of the atmosphere on satellite communications reliability. Statistical methods are employed throughout and the most widely accepted models for the individual effects are used wherever possible. The degree of correlation between the effects is addressed and some bounds on the expected variability in the combined effects statistics are derived from the expected variability in correlation. Example estimates of combined effects statistics are presented for the Washington, D.C. area at 20 GHz and a 5-deg elevation angle. The statistics of water vapor are shown to be sufficient for estimation of the statistics of gaseous absorption at 20 GHz. A computer model based on monthly surface weather is described and tested. Significant improvement in prediction of absorption extremes is demonstrated with the use of path weather data instead of surface data.
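One simple way to combine independent effects statistically is Monte Carlo summation of sampled attenuations followed by reading exceedance values off the empirical distribution. The distributions and parameters below are illustrative assumptions, not the paper's models.

```python
import numpy as np

# Hedged sketch: combine independent attenuation effects by Monte Carlo
# summation; all distributions/parameters here are assumed for illustration.
rng = np.random.default_rng(8)
n = 200_000
rain = rng.lognormal(mean=0.0, sigma=1.0, size=n)      # dB, dominant effect
gaseous = rng.normal(1.0, 0.1, size=n).clip(min=0.0)   # dB
scintillation = np.abs(rng.normal(0.0, 0.3, size=n))   # dB
combined = rain + gaseous + scintillation

# Attenuation not exceeded 99 percent of the time (1 percent outage)
a99_combined = np.percentile(combined, 99)
a99_rain = np.percentile(rain, 99)
print(a99_combined, a99_rain)
```

Introducing correlation between the sampled effects (e.g., rain and cloud both tied to the same weather state) shifts the combined exceedance curve, which is the variability the paper bounds.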
Falgreen, Steffen; Laursen, Maria Bach; Bødker, Julie Støve; Kjeldsen, Malene Krag; Schmitz, Alexander; Nyegaard, Mette; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin
2014-06-05
In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves' dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. 
Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significant diverse range of responses ensuring it is useful for biological interpretations. Time independent summary statistics may aid the understanding of drugs' action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies.
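Step 2 of the workflow, isotonic regression of a dose-response curve, can be sketched with the classic pool-adjacent-violators algorithm (PAVA); since viability must fall as dose rises, the fit is constrained to be nonincreasing. The response values below are hypothetical.

```python
import numpy as np

def pava_decreasing(y):
    """Pool-adjacent-violators fit of a nonincreasing sequence, the isotonic
    regression step used for a dose-response curve that must fall with dose."""
    vals = [-v for v in y]   # fit an increasing sequence to -y, then negate back
    levels = []              # stack of (block mean, block size)
    for v in vals:
        levels.append([v, 1])
        # merge adjacent blocks until block means are nondecreasing
        while len(levels) > 1 and levels[-2][0] > levels[-1][0]:
            m2, n2 = levels.pop()
            m1, n1 = levels.pop()
            levels.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fit = np.concatenate([[m] * n for m, n in levels])
    return -fit

viability = [0.95, 0.90, 0.97, 0.60, 0.40, 0.45, 0.10]  # fraction surviving per dose
fit = pava_decreasing(viability)
print(fit)
```

PAVA pools only the violating neighbours, so the fit preserves the overall mean and is the least-squares monotone approximation to the raw responses.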
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, Richard O.
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
ASYMPTOTIC DISTRIBUTION OF ΔAUC, NRIs, AND IDI BASED ON THEORY OF U-STATISTICS
Demler, Olga V.; Pencina, Michael J.; Cook, Nancy R.; D’Agostino, Ralph B.
2017-01-01
The change in AUC (ΔAUC), the IDI, and NRI are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues we unite the ΔAUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ΔAUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ΔAUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ΔAUC, NRIs, or IDI. In the former case SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ΔAUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ΔAUC. PMID:28627112
Asymptotic distribution of ∆AUC, NRIs, and IDI based on theory of U-statistics.
Demler, Olga V; Pencina, Michael J; Cook, Nancy R; D'Agostino, Ralph B
2017-09-20
The change in area under the curve (∆AUC), the integrated discrimination improvement (IDI), and net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ∆AUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ∆AUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ∆AUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ∆AUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use the Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ∆AUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ∆AUC. Copyright © 2017 John Wiley & Sons, Ltd.
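The U-statistic framing of the AUC is concrete: the AUC equals the Mann-Whitney two-sample U-statistic, the fraction of positive/negative score pairs that are correctly ordered (counting ties as half). A minimal sketch:

```python
import numpy as np

def auc_u_statistic(scores_pos, scores_neg):
    """AUC as a two-sample U-statistic: the Mann-Whitney estimate of
    P(score_pos > score_neg) + 0.5 * P(tie), averaged over all pairs."""
    diff = np.subtract.outer(scores_pos, scores_neg)
    return np.mean((diff > 0) + 0.5 * (diff == 0))

pos = np.array([0.9, 0.8, 0.6])
neg = np.array([0.7, 0.3, 0.2, 0.1])
print(auc_u_statistic(pos, neg))  # 11 of 12 pairs concordant -> 0.9166...
```

Because ∆AUC is a difference of two such U-statistics computed on the same sample, the asymptotic theory for U-statistics applies directly, which is the basis of the SE results above.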
Combining statistical inference and decisions in ecology
Williams, Perry J.; Hooten, Mevin B.
2016-01-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
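The loss-dependence of a Bayes point estimate is easy to demonstrate numerically: under squared-error loss the optimal decision is the posterior mean, under absolute-error loss it is the posterior median, and for a skewed posterior these differ. The posterior draws below are synthetic, not from the grassland-bird example.

```python
import numpy as np

rng = np.random.default_rng(9)
posterior_draws = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # skewed posterior

def expected_loss(estimate, draws, loss):
    """Monte Carlo estimate of posterior expected loss for a point estimate."""
    return np.mean(loss(draws - estimate))

# Minimize expected loss over a grid of candidate point estimates.
grid = np.linspace(0.1, 3.0, 291)
sq_best = grid[np.argmin([expected_loss(a, posterior_draws, np.square) for a in grid])]
abs_best = grid[np.argmin([expected_loss(a, posterior_draws, np.abs) for a in grid])]
print(sq_best, abs_best)  # near the posterior mean (~1.65) and median (~1.0)
```

The same machinery extends to applied decisions such as choosing a fire-rotation interval: replace the grid of point estimates with the set of management actions and the loss with their ecological cost.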
Methods for estimating flow-duration and annual mean-flow statistics for ungaged streams in Oklahoma
Esralew, Rachel A.; Smith, S. Jerrod
2010-01-01
Flow statistics can be used to provide decision makers with surface-water information needed for activities such as water-supply permitting, flow regulation, and other water rights issues. Flow statistics could be needed at any location along a stream. Most often, streamflow statistics are needed at ungaged sites, where no flow data are available to compute the statistics. Methods are presented in this report for estimating flow-duration and annual mean-flow statistics for ungaged streams in Oklahoma. Flow statistics included the (1) annual (period of record), (2) seasonal (summer-autumn and winter-spring), and (3) 12 monthly duration statistics, including the 20th, 50th, 80th, 90th, and 95th percentile flow exceedances, and the annual mean-flow (mean of daily flows for the period of record). Flow statistics were calculated from daily streamflow information collected from 235 streamflow-gaging stations throughout Oklahoma and areas in adjacent states. A drainage-area ratio method is the preferred method for estimating flow statistics at an ungaged location that is on a stream near a gage. The method generally is reliable only if the drainage-area ratio of the two sites is between 0.5 and 1.5. Regression equations that relate flow statistics to drainage-basin characteristics were developed for the purpose of estimating selected flow-duration and annual mean-flow statistics for ungaged streams that are not near gaging stations on the same stream. Regression equations were developed from flow statistics and drainage-basin characteristics for 113 unregulated gaging stations. Separate regression equations were developed by using U.S. Geological Survey streamflow-gaging stations in regions with similar drainage-basin characteristics. These equations can increase the accuracy of regression equations used for estimating flow-duration and annual mean-flow statistics at ungaged stream locations in Oklahoma. 
Streamflow-gaging stations were grouped by selected drainage-basin characteristics by using a k-means cluster analysis. Three regions were identified for Oklahoma on the basis of the clustering of gaging stations and a manual delineation of distinguishable hydrologic and geologic boundaries: Region 1 (western Oklahoma excluding the Oklahoma and Texas Panhandles), Region 2 (north- and south-central Oklahoma), and Region 3 (eastern and central Oklahoma). A total of 228 regression equations (225 flow-duration regressions and three annual mean-flow regressions) were developed using ordinary least-squares and left-censored (Tobit) multiple-regression techniques. These equations can be used to estimate 75 flow-duration statistics and annual mean-flow for ungaged streams in the three regions. Drainage-basin characteristics that were statistically significant independent variables in the regression analyses were (1) contributing drainage area; (2) station elevation; (3) mean drainage-basin elevation; (4) channel slope; (5) percentage of forested canopy; (6) mean drainage-basin hillslope; (7) soil permeability; and (8) mean annual, seasonal, and monthly precipitation. The accuracy of flow-duration regression equations generally decreased from high-flow exceedance (low-exceedance probability) to low-flow exceedance (high-exceedance probability). This decrease may have happened because a greater uncertainty exists for low-flow estimates and low flow is largely affected by localized geology that was not quantified by the drainage-basin characteristics selected. The standard errors of estimate of regression equations for Region 1 (western Oklahoma) were substantially larger than those for the other regions, especially for low-flow exceedances. These errors may be a result of greater variability in low flow because of increased irrigation activities in this region. 
Regression equations may not be reliable for sites where the drainage-basin characteristics are outside the range of values of independent vari
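The drainage-area ratio method named above as the preferred approach for an ungaged site near a gage on the same stream can be sketched in a few lines. The function name is invented for illustration; the 0.5-1.5 reliability bounds follow the range stated in the report:

```python
def drainage_area_ratio_estimate(q_gaged, area_gaged, area_ungaged,
                                 min_ratio=0.5, max_ratio=1.5):
    """Transfer a flow statistic from a gaged to an ungaged site on the
    same stream in proportion to drainage area: Q_u = Q_g * (A_u / A_g).
    Raises ValueError outside the ratio range the report deems reliable."""
    ratio = area_ungaged / area_gaged
    if not (min_ratio <= ratio <= max_ratio):
        raise ValueError(f"drainage-area ratio {ratio:.2f} outside reliable range")
    return q_gaged * ratio

# Ungaged site draining 120 mi^2, downstream of a 100 mi^2 gage
# whose 50-percent-duration flow is 50 ft^3/s:
print(drainage_area_ratio_estimate(50.0, 100.0, 120.0))  # → 60.0
```

Outside the 0.5-1.5 ratio range the report instead recommends the regional regression equations, which this simple proportional transfer does not replace.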
The importance of the keyword-generation method in keyword mnemonics.
Campos, Alfredo; Amor, Angeles; González, María Angeles
2004-01-01
Keyword mnemonics is, under certain conditions, an effective approach for learning foreign-language vocabulary. It appears to be effective for words with high image vividness but not for words with low image vividness. In this study, two experiments were performed to assess the efficacy of a new keyword-generation procedure (peer generation). In Experiment 1, a sample of 363 high-school students was randomly assigned to four groups. The subjects were required to learn L1 equivalents of a list of 16 Latin words (8 with high image vividness, 8 with low image vividness), using a) the rote method, or the keyword method with b) keywords and images generated and supplied by the experimenter, c) keywords and images generated by themselves, or d) keywords and images previously generated by peers (i.e., subjects with similar sociodemographic characteristics). Recall was tested immediately and one week later. For high-vividness words, recall was significantly better in the keyword groups than in the rote-method group. For low-vividness words, learning method had no significant effect. Experiment 2 was basically identical, except that the word lists comprised 32 words (16 high-vividness, 16 low-vividness). In this experiment, the peer-generated-keyword group showed significantly better recall of high-vividness words than the rote-method group and the subject-generated-keyword group; again, however, learning method had no significant effect on recall of low-vividness words.
Artifact interactions retard technological improvement: An empirical study
Magee, Christopher L.
2017-01-01
Empirical research has shown that performance in many different technological domains improves exponentially, but with widely varying improvement rates. What causes some technologies to improve faster than others? Previous quantitative modeling research has identified artifact interactions, where a design change in one component influences others, as an important determinant of improvement rates. The models predict that the improvement rate for a domain is proportional to the inverse of the domain's interaction parameter. However, no empirical research has previously studied and tested the dependence of improvement rates on artifact interactions. A challenge to testing the dependence is that any method for measuring interactions has to be applicable to a wide variety of technologies. Here we propose a novel patent-based method that is both technology domain-agnostic and less costly than alternative methods. We use textual content from patent sets in 27 domains to find the influence of interactions on improvement rates. Qualitative analysis identified six specific keywords that signal artifact interactions. Patent sets from each domain were then examined to determine the total count of these six keywords in each domain, giving an estimate of artifact interactions in each domain. It is found that improvement rates are positively correlated with the inverse of the total count of keywords, with a Pearson correlation coefficient of +0.56 and a p-value of 0.002. The results agree with model predictions, and provide, for the first time, empirical evidence that artifact interactions have a retarding effect on improvement rates of technological domains. PMID:28777798
A novel word spotting method based on recurrent neural networks.
Frinken, Volkmar; Fischer, Andreas; Manmatha, R; Bunke, Horst
2012-02-01
Keyword spotting refers to the process of retrieving all instances of a given keyword from a document. In the present paper, a novel keyword spotting method for handwritten documents is described. It is derived from a neural network-based system for unconstrained handwriting recognition. As such it performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set. The keyword spotting is done using a modification of the CTC Token Passing algorithm in conjunction with a recurrent neural network. We demonstrate that the proposed systems outperform not only a classical dynamic time warping-based approach but also a modern keyword spotting system, based on hidden Markov models. Furthermore, we analyze the performance of the underlying neural networks when using them in a recognition task followed by keyword spotting on the produced transcription. We point out the advantages of keyword spotting when compared to classic text line recognition.
Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches
Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer
2014-01-01
Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332
Ahlborn, W; Tuz, H J; Uberla, K
1990-03-01
In cohort studies, the Mantel-Haenszel estimator OR_MH is computed from sample data and is used as a point estimator of relative risk. Test-based confidence intervals are estimated with the help of the asymptotically chi-squared distributed MH statistic χ²_MHS. The Mantel-extension chi-squared is used as a test statistic for a dose-response relationship. Both test statistics, the Mantel-Haenszel chi as well as the Mantel-extension chi, assume homogeneity of risk across strata, which is rarely present. An extended nonparametric statistic proposed by Terpstra, which is based on the Mann-Whitney statistics, also assumes homogeneity of risk across strata. We have earlier defined four risk measures RR_kj (k = 1, 2, ..., 4) in the population and considered their estimates and the corresponding asymptotic distributions. To overcome the homogeneity assumption, we use the delta method to obtain "test-based" confidence intervals. Because the four risk measures RR_kj are presented as functions of four weights g_ik, we consequently give the asymptotic variances of these risk estimators also as functions of the weights g_ik in closed form. Approximations to these variances are given. For testing a dose-response relationship, we propose a new class of χ²(1)-distributed global measures G_k and the corresponding global χ²-test. In contrast to the Mantel-extension chi, homogeneity of risk across strata need not be assumed. These global test statistics are of the Wald type for composite hypotheses. (ABSTRACT TRUNCATED AT 250 WORDS)
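For reference, the Mantel-Haenszel common odds ratio that OR_MH denotes pools stratified 2x2 tables as a ratio of weighted sums. A minimal sketch (illustrative code, not the authors' implementation):

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across 2x2 strata.
    Each stratum is (a, b, c, d): exposed cases, exposed non-cases,
    unexposed cases, unexposed non-cases.
    OR_MH = sum_i(a_i * d_i / n_i) / sum_i(b_i * c_i / n_i)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two strata, each with a within-stratum odds ratio of 2.0,
# pool to a common OR_MH of 2.0:
print(mantel_haenszel_or([(10, 5, 10, 10), (20, 10, 20, 20)]))  # → 2.0
```

The abstract's point is that the usual test-based interval around this estimator assumes the stratum-specific risks are homogeneous, which motivates the delta-method variances it derives instead.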
Wu, Desheng; Song, Yu; Xie, Kefan; Zhang, Baofeng
2018-04-25
Chemical accidents are major causes of environmental losses and have drawn debate because of the potential threat to human beings and the environment. Compared with a single statistical analysis, co-word analysis of chemical accidents reveals significant traits at various levels and presents the data as a visual network. This study applies co-word analysis to keywords extracted from Web-crawled texts on environmental loss-related chemical accidents and uses Pearson's correlation coefficient to examine their internal attributes. To visualize the keywords of the accidents, the study carries out a multidimensional scaling analysis using PROXSCAL and centrality identification. The results show that these accidents exact an enormous environmental cost, and that environmental loss-related chemical accidents exhibit distinct geographical features. Meanwhile, each event often brings more than one environmental impact. Large quantities of chemical substances are released in solid, liquid, and gas form, leading to serious consequences. Eight clusters that represent the traits of these accidents are formed: "leakage," "poisoning," "explosion," "pipeline crack," "river pollution," "dust pollution," "emission," and "industrial effluent." "Explosion" and "gas" possess a strong correlation with "poisoning," located at the center of the visualization map.
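The core of a co-word analysis is counting how often pairs of keywords appear in the same document; the resulting co-occurrence matrix is what the correlation and scaling steps operate on. A minimal sketch (function name and toy keyword sets are invented for the example):

```python
from itertools import combinations
from collections import Counter

def coword_matrix(documents):
    """Count keyword co-occurrences: each document is a set of keywords,
    and every unordered pair appearing together in a document adds 1."""
    pairs = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

docs = [
    {"explosion", "gas", "poisoning"},
    {"explosion", "gas"},
    {"leakage", "river pollution"},
]
# "explosion" and "gas" co-occur in two of the three toy documents.
print(coword_matrix(docs)[("explosion", "gas")])  # → 2
```

In the study, this count matrix is then normalized (via Pearson correlation between keyword profiles) before multidimensional scaling positions the keywords on the visualization map.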
A bibliometric analysis of occupational therapy publications.
Brown, Ted; Gutman, Sharon A; Ho, Yuh-Shan; Fong, Kenneth N K
2018-01-01
Bibliometrics involves the statistical analysis of the publications in a specific discipline or subject area. A bibliometric analysis of the occupational therapy refereed literature is needed. A bibliometric analysis was completed of the occupational therapy literature from 1991-2014, indexed in the Science Citation Index-Expanded or the Social Sciences Citation Index. Publications were searched by title, abstract, keywords, and KeyWords Plus. Total number of article citations, citations per journal, and contributions per country, individual authors, and institution were calculated. 5,315 occupational therapy articles were published in 821 journals. It appears that there is a citation window of an approximate 10-year period between the time of publication and the peak number of citations an article receives. The top three most highly cited articles were published in Developmental Medicine and Child Neurology, JAMA, and Lancet. AJOT, BJOT and AOTJ published the largest number of occupational therapy articles with the United States, Australia, and Canada producing the highest number of publications. McMaster University, the University of Queensland, and the University of Toronto were the institutions that published the largest number of occupational therapy journal articles. The occupational therapy literature is growing and the frequency of article citation is increasing.
Analysis models for the estimation of oceanic fields
NASA Technical Reports Server (NTRS)
Carter, E. F.; Robinson, A. R.
1987-01-01
A general model for statistically optimal estimates is presented for dealing with scalar, vector and multivariate datasets. The method deals with anisotropic fields and treats space and time dependence equivalently. Problems addressed include the analysis, or the production of synoptic time series of regularly gridded fields from irregular and gappy datasets, and the estimate of fields by compositing observations from several different instruments and sampling schemes. Technical issues are discussed, including the convergence of statistical estimates, the choice of representation of the correlations, the influential domain of an observation, and the efficiency of numerical computations.
Gazoorian, Christopher L.
2015-01-01
A graphical user interface, with an integrated spreadsheet summary report, has been developed to estimate and display the daily mean streamflows and statistics and to evaluate different water management or water withdrawal scenarios with the estimated monthly data. This package of regression equations, U.S. Geological Survey streamgage data, and spreadsheet application produces an interactive tool to estimate an unaltered daily streamflow hydrograph and streamflow statistics at ungaged sites in New York. Among other uses, the New York Streamflow Estimation Tool can assist water managers with permitting water withdrawals, implementing habitat protection, estimating contaminant loads, or determining the potential effect of chemical spills.
An Introduction to Confidence Intervals for Both Statistical Estimates and Effect Sizes.
ERIC Educational Resources Information Center
Capraro, Mary Margaret
This paper summarizes methods of estimating confidence intervals, including classical intervals and intervals for effect sizes. The recent American Psychological Association (APA) Task Force on Statistical Inference report suggested that confidence intervals should always be reported, and the fifth edition of the APA "Publication Manual"…
Statistics as Unbiased Estimators: Exploring the Teaching of Standard Deviation
ERIC Educational Resources Information Center
Wasserman, Nicholas H.; Casey, Stephanie; Champion, Joe; Huey, Maryann
2017-01-01
This manuscript presents findings from a study about the knowledge for and planned teaching of standard deviation. We investigate how understanding variance as an unbiased (inferential) estimator--not just a descriptive statistic for the variation (spread) in data--is related to teachers' instruction regarding standard deviation, particularly…
Early estimate of motor vehicle traffic fatalities in 2009 : a brief statistical summary
DOT National Transportation Integrated Search
2010-03-01
A statistical projection of traffic fatalities in 2009 shows that an estimated 33,963 people died in motor vehicle traffic crashes. This represents a decline of about 8.9 percent as compared to the 37,261 fatalities that occurred in 2008, as shown in T...
NASA Astrophysics Data System (ADS)
Kottmann, R.; Ratmeyer, V.; Pop Ristov, A.; Boetius, A.
2012-04-01
More and more seagoing scientific expeditions use video-controlled research platforms such as Remotely Operated Vehicles (ROV), Autonomous Underwater Vehicles (AUV), and towed camera systems. These produce many hours of video material which contains detailed and scientifically highly valuable footage of the biological, chemical, geological, and physical aspects of the oceans. Many of the videos contain unique observations of unknown life-forms which are rare, and which cannot be sampled and studied otherwise. To make such video material accessible online and to create a collaborative annotation environment, the "Video Annotation and processing platform" (V-App) was developed. A first, solely web-based installation for ROV videos has been set up at the German Center for Marine Environmental Sciences (available at http://videolib.marum.de). It allows users to search and watch videos with a standard web browser based on the HTML5 standard. Moreover, V-App implements social web technologies allowing a distributed world-wide scientific community to collaboratively annotate videos anywhere at any time. It has several features fully implemented, among which are: • User login system for fine-grained permission and access control • Video watching • Video search using keywords, geographic position, depth and time range, and any combination thereof • Video annotation organised in themes (tracks) such as biology and geology, among others, in standard or full-screen mode • Annotation keyword management: administrative users can add, delete, and update single keywords for annotation or upload sets of keywords from Excel sheets • Download of products for scientific use This unique web application system helps make costly ROV videos available online (estimated costs range between 5,000 and 10,000 Euros per hour, depending on the combination of ship and ROV). Moreover, with this system each expert annotation adds instantaneously available and valuable knowledge to otherwise uncharted material.
Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J
2006-01-01
One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
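The TFIDF weighting compared above can be sketched in a few lines. This is a minimal illustration; the smoothing-free IDF form log(N/df) is one common variant, not necessarily the exact formula the authors used:

```python
import math
from collections import Counter

def tfidf(documents):
    """Term frequency-inverse document frequency weights, returned as one
    dict per document. Each document is a list of keyword tokens.
    weight(t, d) = (count of t in d / len(d)) * log(N / docs containing t)."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # document frequency: one count per document
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n_docs / df[t])
                        for t in tf})
    return weights

docs = [["kinase", "signaling", "kinase"], ["signaling", "receptor"]]
w = tfidf(docs)
# A keyword appearing in every document has IDF log(1) = 0, so it carries
# no discriminating weight for clustering:
print(w[0]["signaling"])  # → 0.0
```

This down-weighting of ubiquitous keywords is precisely why TFIDF-weighted lists can separate gene clusters better than a frequency-only (z-score style) weighting.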
[Research & development on computer expert system for forensic bones estimation].
Zhao, Jun-ji; Zhang, Jan-zheng; Liu, Nin-guo
2005-08-01
The objective was to build an expert system for forensic bone estimation. Using an object-oriented method and statistical data from forensic anthropology, the system combines frame-based representation of the statistical knowledge with production rules, fuzzy matching, and Dempster-Shafer (DS) evidence theory. Software for forensic estimation of sex, age, and height with an open knowledge base was designed. The system is reliable and effective, and should be a useful assistant for forensic technicians.
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
NASA Astrophysics Data System (ADS)
Alexakis, Dimitrios; Seiradakis, Kostas; Tsanis, Ioannis
2016-04-01
This article presents a remote sensing approach for spatio-temporal monitoring of both soil erosion and roughness using an Unmanned Aerial Vehicle (UAV). Soil erosion by water is commonly known as one of the main reasons for land degradation. Gully erosion causes considerable soil loss and soil degradation. Furthermore, quantification of soil roughness (irregularities of the soil surface due to soil texture) is important and affects surface storage and infiltration. Soil roughness is one of the soil characteristics most susceptible to variation in time and space and depends on different parameters such as cultivation practices and soil aggregation. A UAV equipped with a digital camera was employed to monitor soil in terms of erosion and roughness in two different study areas in Chania, Crete, Greece. The UAV followed predicted flight paths computed by the relevant flight planning software. The photogrammetric image processing enabled the development of sophisticated Digital Terrain Models (DTMs) and ortho-image mosaics with very high resolution on a sub-decimeter level. The DTMs were developed using photogrammetric processing of more than 500 images acquired with the UAV from different heights above the ground level. As the geomorphic formations can be observed from above using UAVs, shadowing effects do not generally occur and the generated point clouds have very homogeneous and high point densities. The DTMs generated from the UAV were compared in terms of vertical absolute accuracies with a Global Navigation Satellite System (GNSS) survey. The developed data products were used for quantifying gully erosion and soil roughness in 3D as well as for the analysis of the surrounding areas. The significant elevation changes from multi-temporal UAV elevation data were used for diachronic estimation of soil loss and sediment delivery without installing sediment traps. 
Concerning roughness, statistical indicators of surface elevation point measurements were estimated, and various parameters such as the standard deviation of the DTM, the deviation of residuals, and the standard deviation of prominence were calculated directly from the extracted DTM. Sophisticated statistical filters and elevation indices were developed to quantify both soil erosion and roughness. The applied methodology for monitoring both soil erosion and roughness provides an optimum way of reducing the existing gap between field scale and satellite scale. Keywords: UAV, soil, erosion, roughness, DTM
Austin, Peter C; Schuster, Tibor; Platt, Robert W
2015-10-15
Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods. We used an extensive series of Monte Carlo simulations to compare the statistical power of an IPTW analysis of an observational study with time-to-event outcomes with that of an analysis of a similarly-structured RCT. We examined the impact of four factors on the statistical power function: number of observed events, prevalence of treatment, the marginal hazard ratio, and the strength of the treatment-selection process. We found that, on average, an IPTW analysis had lower statistical power compared to an analysis of a similarly-structured RCT. The difference in statistical power increased as the magnitude of the treatment-selection model increased. The statistical power of an IPTW analysis tended to be lower than the statistical power of a similarly-structured RCT.
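The IPTW weights at the heart of the compared analysis are simple to form once propensity scores are available (an illustrative sketch; propensity estimation itself, e.g. by logistic regression on the confounders, is omitted):

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights: 1/e(x) for treated
    subjects and 1/(1 - e(x)) for controls, where e(x) is the propensity
    score (probability of treatment given measured covariates)."""
    return [1.0 / p if t else 1.0 / (1.0 - p)
            for t, p in zip(treated, propensity)]

# A treated subject with e(x) = 0.25 is up-weighted by 4 (rare treatment
# given those covariates); a control with e(x) = 0.5 gets weight 2.
print(iptw_weights([1, 0], [0.25, 0.5]))  # → [4.0, 2.0]
```

Weighting creates a pseudo-population in which treatment is independent of the measured confounders; the abstract's finding is that the resulting analysis still tends to have less power than an RCT of the same structure, increasingly so as treatment selection grows stronger.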
Two-stage approach to keyword spotting in handwritten documents
NASA Astrophysics Data System (ADS)
Haji, Mehdi; Ameri, Mohammad R.; Bui, Tien D.; Suen, Ching Y.; Ponson, Dominique
2013-12-01
Separation of keywords from non-keywords is the main problem in keyword spotting systems which has traditionally been approached by simplistic methods, such as thresholding of recognition scores. In this paper, we analyze this problem from a machine learning perspective, and we study several standard machine learning algorithms specifically in the context of non-keyword rejection. We propose a two-stage approach to keyword spotting and provide a theoretical analysis of the performance of the system which gives insights on how to design the classifier in order to maximize the overall performance in terms of F-measure.
Wise, Michael J
2003-10-29
The late embryogenesis abundant (LEA) proteins cover a number of loosely related groups of proteins, originally found in plants but now being found in non-plant species. Their precise function is unknown, though considerable evidence suggests that LEA proteins are involved in desiccation resistance. Using a number of statistically-based bioinformatics tools, the classification of a large set of LEA proteins, covering all Groups, is reexamined together with some previous findings. Searches based on peptide composition return proteins with similar composition to different LEA Groups; keyword clustering is then applied to reveal keywords and phrases suggestive of the Groups' properties. Previous research has suggested that glycine is characteristic of LEA proteins, but it is only highly over-represented in Groups 1 and 2, while alanine, thought characteristic of Group 2, is over-represented in Groups 3, 4, and 6 but under-represented in Groups 1 and 2. However, for LEA Groups 1, 2, and 3 it is shown that glutamine is very significantly over-represented, while cysteine, phenylalanine, isoleucine, leucine and tryptophan are significantly under-represented. There is also evidence that the Group 4 LEA proteins are more appropriately redistributed to Group 2 and Group 3. Similarly, Group 5 is better placed among the Group 3 LEA proteins. There is evidence that Group 2 and Group 3 LEA proteins, though distinct, might be related. This relationship is also evident in the overlapping sets of keywords for the two Groups, emphasising alpha-helical structure and, at a larger scale, filaments, all of which fits well with experimental evidence that proteins from both Groups are natively unstructured, but become structured under stress conditions. The keywords support localisation of LEA proteins both in the nucleus and associated with the cytoskeleton, and a mode of action similar to chaperones, perhaps the cold shock chaperones, via a role in DNA-binding. 
In general, non-globular and low-complexity proteins, such as the LEA proteins, pose particular challenges in determining their functions and modes of action. Rather than masking off and ignoring low-complexity domains, novel tools and tool combinations are needed which are capable of analysing such proteins in their entirety.
Tabano, David C; Bol, Kirk; Newcomer, Sophia R; Barrow, Jennifer C; Daley, Matthew F
2017-12-06
Measuring obesity prevalence across geographic areas should account for environmental and socioeconomic factors that contribute to spatial autocorrelation, the dependency of values in estimates across neighboring areas, to mitigate the bias in measures and risk of type I errors in hypothesis testing. Dependency among observations across geographic areas violates statistical independence assumptions and may result in biased estimates. Empirical Bayes (EB) estimators reduce the variability of estimates with spatial autocorrelation, which limits the overall mean square-error and controls for sample bias. Using the Colorado Body Mass Index (BMI) Monitoring System, we modeled the spatial autocorrelation of adult (≥ 18 years old) obesity (BMI ≥ 30 kg/m²) measurements using patient-level electronic health record data from encounters between January 1, 2009, and December 31, 2011. Obesity prevalence was estimated for Denver County census tracts with 10 or more observations during the study period. We calculated the Moran's I statistic to test for spatial autocorrelation across census tracts, and mapped crude and EB obesity prevalence across geographic areas. In Denver County, there were 143 census tracts with 10 or more observations, representing a total of 97,710 adults with a valid BMI. The crude obesity prevalence for adults in Denver County was 29.8 percent (95% CI 28.4-31.1%) and ranged from 12.8 to 45.2 percent across individual census tracts. EB obesity prevalence was 30.2 percent (95% CI 28.9-31.5%) and ranged from 15.3 to 44.3 percent across census tracts. Statistical tests using the Moran's I statistic suggest adult obesity prevalence in Denver County was distributed in a non-random pattern. Clusters of EB obesity estimates were highly significant (α = 0.05) in neighboring census tracts. Concentrations of obesity estimates were primarily in the west and north of Denver County. 
Statistical tests reveal adult obesity prevalence exhibit spatial autocorrelation in Denver County at the census tract level. EB estimates for obesity prevalence can be used to control for spatial autocorrelation between neighboring census tracts and may produce less biased estimates of obesity prevalence.
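The Moran's I statistic used in the tests above can be computed directly from area-level rates and a spatial weights matrix. An illustrative sketch with a toy 4-area example (not the study's data); values near +1 indicate that neighboring areas have similar rates:

```python
def morans_i(values, weights):
    """Moran's I for spatial autocorrelation.
    `values` is a list of area-level rates; `weights[i][j]` is the
    spatial weight between areas i and j (0 on the diagonal).
    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Two neighboring pairs with similar prevalences: I close to +1,
# i.e. strong positive spatial autocorrelation.
vals = [0.40, 0.42, 0.15, 0.13]
w = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(morans_i(vals, w))
```

Significance in practice is usually judged against a permutation distribution of I under random relabeling of areas, which is what supports the clustering claims in the study.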
Wiley, Jeffrey B.; Curran, Janet H.
2003-01-01
Methods for estimating daily mean flow-duration statistics for seven regions in Alaska and low-flow frequencies for one region, southeastern Alaska, were developed from daily mean discharges for streamflow-gaging stations in Alaska and conterminous basins in Canada. The 15-, 10-, 9-, 8-, 7-, 6-, 5-, 4-, 3-, 2-, and 1-percent duration flows were computed for the October-through-September water year for 222 stations in Alaska and conterminous basins in Canada. The 98-, 95-, 90-, 85-, 80-, 70-, 60-, and 50-percent duration flows were computed for the individual months of July, August, and September for 226 stations in Alaska and conterminous basins in Canada. The 98-, 95-, 90-, 85-, 80-, 70-, 60-, and 50-percent duration flows were computed for the season July-through-September for 65 stations in southeastern Alaska. The 7-day, 10-year and 7-day, 2-year low-flow frequencies for the season July-through-September were computed for 65 stations for most of southeastern Alaska. Low-flow analyses were limited to particular months or seasons in order to omit winter low flows, when ice effects reduce the quality of the records and validity of statistical assumptions. Regression equations for estimating the selected high-flow and low-flow statistics for the selected months and seasons for ungaged sites were developed from an ordinary-least-squares regression model using basin characteristics as independent variables. Drainage area and precipitation were significant explanatory variables for high flows, and drainage area, precipitation, mean basin elevation, and area of glaciers were significant explanatory variables for low flows. The estimating equations can be used at ungaged sites in Alaska and conterminous basins in Canada where streamflow regulation, streamflow diversion, urbanization, and natural damming and releasing of water do not affect the streamflow data for the given month or season. 
Standard errors of estimate ranged from 15 to 56 percent for high-duration flow statistics, 25 to greater than 500 percent for monthly low-duration flow statistics, 32 to 66 percent for seasonal low-duration flow statistics, and 53 to 64 percent for low-flow frequency statistics.
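The regional-regression approach described above can be sketched in a few lines: fit a log-log ordinary-least-squares model of a flow statistic on basin characteristics and report a standard error of estimate in percent. The sketch below uses synthetic data; the variable names, coefficients, and units are illustrative assumptions, not values from the report.

```python
import numpy as np

# Hedged sketch of a log-log OLS regional regression of a high-flow statistic
# on basin characteristics. All data below are synthetic and illustrative.
rng = np.random.default_rng(0)
n = 40
drainage_area = rng.uniform(10, 5000, n)        # mi^2 (synthetic)
precipitation = rng.uniform(20, 200, n)         # in/yr (synthetic)
# Synthetic "true" relation: Q = 0.5 * A^0.9 * P^0.7 * noise
q10 = 0.5 * drainage_area**0.9 * precipitation**0.7 * rng.lognormal(0, 0.2, n)

# Fit log10(Q) = b0 + b1*log10(A) + b2*log10(P) by least squares.
X = np.column_stack([np.ones(n), np.log10(drainage_area), np.log10(precipitation)])
y = np.log10(q10)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard error of estimate expressed in percent, a common reporting convention.
resid = y - X @ beta
se_log = np.sqrt(resid @ resid / (n - X.shape[1]))
se_percent = 100 * np.sqrt(np.exp((np.log(10) * se_log) ** 2) - 1)
print(beta, se_percent)
```

At an ungaged site, the fitted equation would be applied to that site's drainage area and precipitation, subject to the applicability limits the abstract describes.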
Watson, Kara M.; McHugh, Amy R.
2014-01-01
Regional regression equations were developed for estimating monthly flow-duration and monthly low-flow frequency statistics for ungaged streams in Coastal Plain and non-coastal regions of New Jersey for baseline and current land- and water-use conditions. The equations were developed to estimate 87 different streamflow statistics, which include the monthly 99-, 90-, 85-, 75-, 50-, and 25-percentile flow-durations of the minimum 1-day daily flow; the August–September 99-, 90-, and 75-percentile minimum 1-day daily flow; and the monthly 7-day, 10-year (M7D10Y) low-flow frequency. These 87 streamflow statistics were computed for 41 continuous-record streamflow-gaging stations (streamgages) with 20 or more years of record and 167 low-flow partial-record stations in New Jersey with 10 or more streamflow measurements. The regression analyses used to develop equations to estimate selected streamflow statistics were performed by testing the relation between flow-duration statistics and low-flow frequency statistics for 32 basin characteristics (physical characteristics, land use, surficial geology, and climate) at the 41 streamgages and 167 low-flow partial-record stations. The regression analyses determined drainage area, soil permeability, average April precipitation, average June precipitation, and percent storage (water bodies and wetlands) were the significant explanatory variables for estimating the selected flow-duration and low-flow frequency statistics. Streamflow estimates were computed for two land- and water-use conditions in New Jersey—land- and water-use during the baseline period of record (defined as the years a streamgage had little to no change in development and water use) and current land- and water-use conditions (1989–2008)—for each selected station using data collected through water year 2008. The baseline period of record is representative of a period when the basin was unaffected by change in development. 
The current period is representative of the increased development of the last 20 years (1989–2008). The two different land- and water-use conditions were used as surrogates for development to determine whether there have been changes in low-flow statistics as a result of changes in development over time. The State was divided into two low-flow regression regions, the Coastal Plain and the non-coastal region, in order to improve the accuracy of the regression equations. The left-censored parametric survival regression method was used for the analyses to account for streamgages and partial-record stations that had zero flow values for some of the statistics. The average standard error of estimate for the 348 regression equations ranged from 16 to 340 percent. These regression equations and basin characteristics are presented in the U.S. Geological Survey (USGS) StreamStats Web-based geographic information system application. This tool allows users to click on an ungaged site on a stream in New Jersey and get the estimated flow-duration and low-flow frequency statistics. Additionally, the user can click on a streamgage or partial-record station and get the “at-site” streamflow statistics. The low-flow characteristics of a stream ultimately affect the use of the stream by humans. Specific information on the low-flow characteristics of streams is essential to water managers who deal with problems related to municipal and industrial water supply, fish and wildlife conservation, and dilution of wastewater.
Wood, Molly S.
2014-01-01
The U.S. Geological Survey (USGS), in cooperation with the Bureau of Land Management (BLM), estimated streamflow statistics for stream segments designated “Wild,” “Scenic,” or “Recreational” under the National Wild and Scenic Rivers System in the Owyhee Canyonlands Wilderness in southwestern Idaho. The streamflow statistics were used by the BLM to develop and file a draft, federal reserved water right claim to protect federally designated “outstanding remarkable values” in the Jarbidge River. The BLM determined that the daily mean streamflows that are equaled or exceeded 20, 50, and 80 percent of the time during bimonthly periods (two periods per month) and the bankfull (66.7-percent annual exceedance probability) streamflow are important thresholds for maintaining outstanding remarkable values. Although streamflow statistics for the Jarbidge River below Jarbidge, Nevada (USGS 13162225) were published previously in 2013 and used for the draft water right claim, the BLM and USGS have since recognized the need to refine the streamflow statistics, given the approximately 40 river miles and intervening tributaries between the original point of estimation (USGS 13162225) and the mouth of the Jarbidge River, which is the downstream end of the Wild and Scenic River segment. A drainage-area-ratio method was used in 2013 to estimate bimonthly exceedance probability streamflow statistics at the mouth of the Jarbidge River based on available streamgage data on the Jarbidge and East Fork Jarbidge Rivers. The resulting bimonthly streamflow statistics were further adjusted using a scaling factor calculated from a water balance on streamflow statistics calculated for the Bruneau and East Fork Bruneau Rivers and Sheep Creek.
The final, adjusted bimonthly exceedance probability and bankfull streamflow statistics compared well with available verification datasets (including discrete streamflow measurements made at the mouth of the Jarbidge River) and are considered the best available estimates for streamflow statistics in the Jarbidge Wild and Scenic River segment.
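The drainage-area-ratio transfer mentioned above is a simple scaling. A minimal sketch, with illustrative numbers that are not from the report:

```python
# Hedged sketch of a drainage-area-ratio transfer: a streamflow statistic at
# an ungaged site is scaled from a nearby gage by the ratio of drainage areas
# raised to an exponent (often near 1). All values here are illustrative.
def drainage_area_ratio(q_gaged, area_gaged, area_ungaged, exponent=1.0):
    """Estimate a flow statistic at an ungaged site from a gaged one."""
    return q_gaged * (area_ungaged / area_gaged) ** exponent

# Example: an 80-percent-exceedance flow of 12 ft3/s at a 95 mi2 gage,
# transferred to a 130 mi2 ungaged site downstream.
q_est = drainage_area_ratio(12.0, 95.0, 130.0)
print(round(q_est, 2))  # 16.42
```

The subsequent water-balance adjustment described in the abstract would multiply such estimates by an additional scaling factor.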
Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.
Dazard, Jean-Eudes; Rao, J Sunil
2012-07-01
The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, and (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common-value shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
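A minimal illustration of why variance shrinkage helps when the number of variables dwarfs the sample size. This sketch implements the simple common-value shrinkage that the abstract uses as a comparator, not the MVR clustering procedure itself; all numbers are synthetic.

```python
import numpy as np

# In p >> n settings, per-variable sample variances are noisy; shrinking them
# toward a pooled value reduces error. This is the common-value shrinkage
# mentioned as a comparator above, NOT the MVR procedure itself.
rng = np.random.default_rng(1)
p, n = 2000, 5                        # many variables, tiny sample size
true_var = 1.0
data = rng.normal(0.0, np.sqrt(true_var), size=(p, n))

s2 = data.var(axis=1, ddof=1)         # variable-wise sample variances (noisy)
pooled = s2.mean()                    # pooled (common-value) estimate
lam = 0.5                             # illustrative shrinkage weight
s2_shrunk = lam * pooled + (1 - lam) * s2

mse_raw = np.mean((s2 - true_var) ** 2)
mse_shrunk = np.mean((s2_shrunk - true_var) ** 2)
print(mse_raw, mse_shrunk)            # shrinkage reduces mean squared error
```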
The limits of protein sequence comparison?
Pearson, William R; Sierk, Michael L
2010-01-01
Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
NASA Astrophysics Data System (ADS)
Huang, Jinxin; Clarkson, Eric; Kupinski, Matthew; Rolland, Jannick P.
2014-03-01
The prevalence of Dry Eye Disease (DED) in the USA is approximately 40 million aging adults, with an economic burden of about $3.8 billion. However, a comprehensive understanding of tear film dynamics, which is a prerequisite to advancing the management of DED, is yet to be realized. To extend our understanding of tear film dynamics, we investigate the simultaneous estimation of the lipid- and aqueous-layer thicknesses by combining optical coherence tomography (OCT) and statistical decision theory. Specifically, we develop a mathematical model for Fourier-domain OCT that takes into account the different statistical processes associated with the imaging chain. We formulate the first-order and second-order statistics of the output of the OCT system, from which simulated OCT spectra can be generated. The object being imaged is a tear film model comprising a lipid and an aqueous layer on top of a rough corneal surface. We then implement a maximum-likelihood (ML) estimator to interpret the simulated OCT data and estimate the thicknesses of both layers of the tear film. Results show that an axial resolution of 1 μm allows estimates down to the nanometer scale. We use the root mean square error of the estimates as a metric for evaluating system parameters, such as the tradeoff between imaging speed and estimation precision. This framework further provides the theoretical basis for optimizing the imaging setup for a specific thickness-estimation task.
Automatic medical image annotation and keyword-based image retrieval using relevance feedback.
Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal
2012-08-01
This paper presents a novel multiple-keyword annotation method for medical images, keyword-based medical image retrieval, and a relevance feedback method for enhancing image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center-symmetric local binary patterns with random forests. For keyword-based image retrieval, our retrieval system uses a confidence score that is assigned to each annotated keyword by combining the probabilities of the random forests with a predefined body-relation graph. To overcome the limitations of keyword-based image retrieval, we combine our image retrieval system with a relevance feedback mechanism based on visual features and a pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.
Willis, Brian H; Riley, Richard D
2017-09-20
An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice? That is, does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity, where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how we may test meta-analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
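The leave-one-out idea behind the validation statistic can be sketched as follows: each study is held out in turn, the remaining studies are pooled (fixed-effect inverse-variance weighting here, for simplicity), and the held-out estimate is compared with the pooled prediction. This illustrates only the cross-validation scheme, not the authors' Vn formula; the inputs are invented.

```python
import numpy as np

# Hedged sketch of leave-one-out cross-validation of a meta-analysis estimate.
yi = np.array([0.30, 0.25, 0.42, 0.18, 0.35])   # study effect estimates (invented)
vi = np.array([0.02, 0.03, 0.04, 0.02, 0.05])   # within-study variances (invented)

def pooled(y, v):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    w = 1.0 / v
    return (w * y).sum() / w.sum(), 1.0 / w.sum()

z = []
for i in range(len(yi)):
    mask = np.arange(len(yi)) != i
    mu_i, var_mu = pooled(yi[mask], vi[mask])
    # standardized discrepancy between the held-out study and the pooled prediction
    z.append((yi[i] - mu_i) / np.sqrt(vi[i] + var_mu))
z = np.array(z)
print(np.round(z, 2))
```

Large standardized discrepancies would flag studies whose setting the pooled estimate does not transfer to, which is the intuition the paper formalizes.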
Effects of communicating DNA-based disease risk estimates on risk-reducing behaviours.
Marteau, Theresa M; French, David P; Griffin, Simon J; Prevost, A T; Sutton, Stephen; Watkinson, Clare; Attwood, Sophie; Hollands, Gareth J
2010-10-06
There are high expectations regarding the potential for the communication of DNA-based disease risk estimates to motivate behaviour change. To assess the effects of communicating DNA-based disease risk estimates on risk-reducing behaviours and motivation to undertake such behaviours. We searched the following databases using keywords and medical subject headings: Cochrane Central Register of Controlled Trials (CENTRAL, The Cochrane Library, Issue 4 2010), MEDLINE (1950 to April 2010), EMBASE (1980 to April 2010), PsycINFO (1985 to April 2010) using OVID SP, and CINAHL (EBSCO) (1982 to April 2010). We also searched reference lists, conducted forward citation searches of potentially eligible articles and contacted authors of relevant studies for suggestions. There were no language restrictions. Unpublished or in press articles were eligible for inclusion. Randomised or quasi-randomised controlled trials involving adults (aged 18 years and over) in which one group received actual (clinical studies) or imagined (analogue studies) personalised DNA-based disease risk estimates for diseases for which the risk could plausibly be reduced by behavioural change. Eligible studies had to include a primary outcome measure of risk-reducing behaviour or motivation (e.g. intention) to alter such behaviour. Two review authors searched for studies and independently extracted data. We assessed risk of bias according to the Cochrane Handbook for Systematic Reviews of Interventions. For continuous outcome measures, we report effect sizes as standardised mean differences (SMDs). For dichotomous outcome measures, we report effect sizes as odds ratios (ORs). We obtained pooled effect sizes with 95% confidence intervals (CIs) using the random effects model applied on the scale of standardised differences and log odds ratios. We examined 5384 abstracts and identified 21 studies as potentially eligible. 
Following a full text analysis, we included 14 papers reporting results of 7 clinical studies (2 papers report on the same trial) and 6 analogue studies. Of the seven clinical studies, five assessed smoking cessation. Meta-analyses revealed no statistically significant effects on either short-term (less than 6 months) smoking cessation (OR 1.35, 95% CI 0.76 to 2.39, P = 0.31, n = 3 studies) or cessation after six months (OR 1.07, 95% CI 0.64 to 1.78, P = 0.80, n = 4 studies). Two clinical studies assessed diet and found effects that significantly favoured DNA-based risk estimates (OR 2.24, 95% CI 1.17 to 4.27, P = 0.01). No statistically significant effects were found in the two studies assessing physical activity (OR 1.03, 95% CI 0.59 to 1.80, P = 0.92) or the one study assessing medication or vitamin use aimed at reducing disease risks (OR 1.26, 95% CI 0.58 to 2.72, P = 0.56). For the six non-clinical analogue studies, meta-analysis revealed a statistically significant effect of DNA-based risk on intention to change behaviour (SMD 0.16, 95% CI 0.04 to 0.29, P = 0.01). There was no evidence that communicating DNA-based disease risk estimates had any unintended adverse effects. Two studies that assessed fear arousal immediately after the presentation of risk information did, however, report greater fear arousal in the DNA-based disease risk estimate groups compared to comparison groups. The quality of included studies was generally poor. None of the clinical or analogue studies were considered to have a low risk of bias, due to either a lack of clarity in reporting, or where details were reported, evidence of a failure to sufficiently safeguard against the risk of bias. Mindful of the weak evidence based on a small number of studies of limited quality, the results of this review suggest that communicating DNA-based disease risk estimates has little or no effect on smoking and physical activity.
It may have a small effect on self-reported diet and on intentions to change behaviour. Claims that receiving DNA-based test results motivates people to change their behaviour are not supported by evidence. Larger and better-quality RCTs are needed.
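The kind of computation behind summary odds ratios like those above can be sketched with DerSimonian-Laird random-effects pooling on the log scale. The per-study inputs below are invented for illustration; they are not the review's data.

```python
import numpy as np

# Hedged sketch of DerSimonian-Laird random-effects pooling of odds ratios.
log_or = np.log(np.array([1.20, 1.60, 1.05]))   # per-study ORs (invented)
se = np.array([0.30, 0.25, 0.40])               # standard errors of log OR (invented)
v = se ** 2

w_fixed = 1.0 / v
mu_fixed = (w_fixed * log_or).sum() / w_fixed.sum()
q = (w_fixed * (log_or - mu_fixed) ** 2).sum()          # Cochran's Q
df = len(log_or) - 1
c = w_fixed.sum() - (w_fixed ** 2).sum() / w_fixed.sum()
tau2 = max(0.0, (q - df) / c)                           # between-study variance

w = 1.0 / (v + tau2)                                    # random-effects weights
mu = (w * log_or).sum() / w.sum()
se_mu = np.sqrt(1.0 / w.sum())
or_pooled = np.exp(mu)
ci = np.exp([mu - 1.96 * se_mu, mu + 1.96 * se_mu])
print(round(or_pooled, 2), np.round(ci, 2))
```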
Adaptive Error Estimation in Linearized Ocean General Circulation Models
NASA Technical Reports Server (NTRS)
Chechelnitsky, Michael Y.
1999-01-01
Data assimilation methods are routinely used in oceanography. The statistics of the model and measurement errors need to be specified a priori. This study addresses the problem of estimating model and measurement error statistics from observations. We start by testing innovation-based methods of adaptive error estimation with low-dimensional models, applying them in the North Pacific (5-60 deg N, 132-252 deg E) to TOPEX/POSEIDON (T/P) sea level anomaly data, acoustic tomography data from the ATOC project, and the MIT General Circulation Model (GCM). A reduced state linear model that describes large scale internal (baroclinic) error dynamics is used. The methods are shown to be sensitive to the initial guess for the error statistics and the type of observations. A new off-line approach is developed, the covariance matching approach (CMA), where covariance matrices of model-data residuals are "matched" to their theoretical expectations using familiar least squares methods. This method uses observations directly instead of the innovations sequence and is shown to be related to the MT method and the method of Fu et al. (1993). Twin experiments using the same linearized MIT GCM suggest that altimetric data are ill-suited to the estimation of internal GCM errors, but that such estimates can in theory be obtained using acoustic data. The CMA is then applied to T/P sea level anomaly data and a linearization of a global GFDL GCM which uses two vertical modes. We show that the CMA method can be used with a global model and a global data set, and that the estimates of the error statistics are robust. We show that the fraction of the GCM-T/P residual variance explained by the model error is larger than that derived in Fukumori et al. (1999) with the method of Fu et al. (1993). Most of the model error is explained by the barotropic mode. However, we find that the impact of the change in the error statistics on the data assimilation estimates is very small.
This is explained by the large representation error, i.e., the dominance of the mesoscale eddies in the T/P signal, which are not resolved by the 2° by 1° GCM. Therefore, the impact of the observations on the assimilation is very small even after the adjustment of the error statistics. This work demonstrates that simultaneous estimation of the model and measurement error statistics for data assimilation with global ocean data sets and linearized GCMs is possible. However, the error covariance estimation problem is in general highly underdetermined, much more so than the state estimation problem. In other words, there exists a very large number of statistical models that can be made consistent with the available data. Therefore, methods for obtaining quantitative error estimates, powerful though they may be, cannot replace physical insight. Used in the right context, as a tool for guiding the choice of a small number of model error parameters, covariance matching can be a useful addition to the repertory of tools available to oceanographers.
Residential radon and environmental burden of disease among Non-smokers.
Noh, Juhwan; Sohn, Jungwoo; Cho, Jaelim; Kang, Dae Ryong; Joo, Sowon; Kim, Changsoo; Shin, Dong Chun
2016-01-01
Lung cancer had the second-highest absolute cancer incidence globally and was the leading cause of cancer mortality in 2014. Indoor radon is the second leading risk factor for lung cancer after cigarette smoking among ever-smokers and the first among non-smokers. The environmental burden of disease (EBD) attributable to residential radon among non-smokers is critical for identifying threats to population health and planning health policy. To identify and retrieve literature describing the environmental burden of lung cancer attributable to residential radon, we searched databases including Ovid-MEDLINE and -EMBASE from 1980 to 2016. Search terms included patient keywords using 'lung' and 'neoplasm', exposure keywords using 'residential' and 'radon', and outcome keywords using 'years of life lost', 'years of life lost due to disability', and 'burden'. The search identified 261 documents; a further 9 documents were identified by manual searching. Two researchers independently assessed the 271 abstracts eligible for inclusion at the abstract level. Full-text reviews were conducted for selected publications after the first assessment. Ten studies were included in the final evaluation. Global disability-adjusted life years (DALYs) (95% uncertainty interval) for lung cancer increased by 35.9%, from 23,850,000 (18,835,000-29,845,000) in 1990 to 32,405,000 (24,400,000-38,334,000) in 2000. DALYs attributable to residential radon were 2,114,000 (273,000-4,660,000) in 2010. Lung cancer caused 34,732,900 (33,042,600-36,328,100) DALYs in 2013. DALYs attributable to residential radon were 1,979,000 (1,331,000-2,768,000) in 2013. The number of attributable lung cancer cases was 70-900, and the EBD for radon was 1,000-14,000 DALYs, in the Netherlands. The years of life lost were 0.066 years among never-smokers and 0.198 years among ever-smokers in Canada. In summary, the estimated global EBD attributable to residential radon was 1,979,000 DALYs for both sexes in 2013.
In the Netherlands, the EBD for radon was 1,000-14,000 DALYs. The smoking population lost three times more years than never-smokers in Canada. There was no study estimating the EBD of residential radon among never-smokers in Korea or other Asian countries. In addition, there were few studies reflecting the age of buildings, even though residential radon exposure levels depend on building age. A further EBD study reflecting Korean disability weights and building age is required to estimate the EBD precisely.
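Attributable-burden figures like the DALY estimates above are typically derived by applying a population attributable fraction (PAF) to the total disease burden. The sketch below uses an assumed exposure prevalence and relative risk purely for illustration; only the total DALY figure is taken from the abstract.

```python
# Hedged sketch of an attributable-burden calculation via the population
# attributable fraction: PAF = p(RR-1) / (1 + p(RR-1)). The prevalence and
# relative risk below are illustrative assumptions, not review estimates.
prevalence = 0.10         # fraction exposed to elevated radon (assumed)
relative_risk = 1.6       # lung-cancer relative risk in the exposed (assumed)
total_dalys = 34_732_900  # global lung-cancer DALYs in 2013 (from the abstract)

paf = prevalence * (relative_risk - 1) / (1 + prevalence * (relative_risk - 1))
attributable_dalys = paf * total_dalys
print(round(paf, 3), int(attributable_dalys))
```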
NASA Technical Reports Server (NTRS)
Todling, Ricardo
2015-01-01
Recently, this author studied an approach to the estimation of system error based on combining observation residuals derived from a sequential filter and fixed lag-1 smoother. While extending the methodology to a variational formulation, experimenting with simple models and making sure consistency was found between the sequential and variational formulations, the limitations of the residual-based approach came clearly to the surface. This note uses the sequential assimilation application to simple nonlinear dynamics to highlight the issue. Only when some of the underlying error statistics are assumed known is it possible to estimate the unknown component. In general, when considerable uncertainties exist in the underlying statistics as a whole, attempts to obtain separate estimates of the various error covariances are bound to lead to misrepresentation of errors. The conclusions are particularly relevant to present-day attempts to estimate observation-error correlations from observation residual statistics. A brief illustration of the issue is also provided by comparing estimates of error correlations derived from a quasi-operational assimilation system and a corresponding Observing System Simulation Experiments framework.
Wendel, Jeanne; Dumitras, Diana
2005-06-01
This paper describes an analytical methodology for obtaining statistically unbiased outcomes estimates for programs in which participation decisions may be correlated with variables that impact outcomes. This methodology is particularly useful for intraorganizational program evaluations conducted for business purposes. In this situation, data is likely to be available for a population of managed care members who are eligible to participate in a disease management (DM) program, with some electing to participate while others eschew the opportunity. The most pragmatic analytical strategy for in-house evaluation of such programs is likely to be the pre-intervention/post-intervention design in which the control group consists of people who were invited to participate in the DM program, but declined the invitation. Regression estimates of program impacts may be statistically biased if factors that impact participation decisions are correlated with outcomes measures. This paper describes an econometric procedure, the Treatment Effects model, developed to produce statistically unbiased estimates of program impacts in this type of situation. Two equations are estimated to (a) estimate the impacts of patient characteristics on decisions to participate in the program, and then (b) use this information to produce a statistically unbiased estimate of the impact of program participation on outcomes. This methodology is well-established in economics and econometrics, but has not been widely applied in the DM outcomes measurement literature; hence, this paper focuses on one illustrative application.
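The two-step logic of the Treatment Effects model described above can be sketched on synthetic data: (a) a probit model of program participation, then (b) an outcome regression that adds a selection-correction ("hazard") term alongside the participation indicator. Everything here is synthetic and illustrative, including the "severity" covariate and the true effect size.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Hedged sketch of a two-step treatment-effects estimator on synthetic data.
rng = np.random.default_rng(2)
n = 2000
severity = rng.normal(size=n)                     # drives participation and cost
eps = rng.normal(size=n)
u = 0.6 * eps + 0.8 * rng.normal(size=n)          # correlated errors -> selection bias
d = (0.5 + 1.0 * severity + u > 0).astype(float)  # participation decision
true_effect = -2.0                                # program lowers cost by 2 units
y = 10.0 + 3.0 * severity + true_effect * d + 2.0 * eps

# Step 1: probit of participation on severity (maximum likelihood).
Z = np.column_stack([np.ones(n), severity])
def negll(g):
    p = norm.cdf(Z @ g).clip(1e-10, 1 - 1e-10)
    return -(d * np.log(p) + (1 - d) * np.log(1 - p)).sum()
g = minimize(negll, np.zeros(2), method="BFGS").x

# Step 2: outcome regression with the selection-correction (hazard) term.
zg = Z @ g
lam = np.where(d == 1, norm.pdf(zg) / norm.cdf(zg),
               -norm.pdf(zg) / (1 - norm.cdf(zg)))
X = np.column_stack([np.ones(n), severity, d, lam])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
naive, *_ = np.linalg.lstsq(X[:, :3], y, rcond=None)
print(round(beta[2], 2), round(naive[2], 2))  # corrected vs naive program effect
```

The naive regression (without the correction term) overstates the program effect because sicker members self-select, which is exactly the bias the paper's methodology addresses.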
Connell, J.F.; Bailey, Z.C.
1989-01-01
A total of 338 single-well aquifer tests from Bear Creek and Melton Valley, Tennessee were statistically grouped to estimate hydraulic conductivities for the geologic formations in the valleys. A cross-sectional simulation model linked to a regression model was used to further refine the statistical estimates for each of the formations and to improve understanding of ground-water flow in Bear Creek Valley. Median hydraulic-conductivity values were used as initial values in the model. Model-calculated estimates of hydraulic conductivity were generally lower than the statistical estimates. Simulations indicate that (1) the Pumpkin Valley Shale controls groundwater flow between Pine Ridge and Bear Creek; (2) all the recharge on Chestnut Ridge discharges to the Maynardville Limestone; (3) the formations having smaller hydraulic gradients may have a greater tendency for flow along strike; (4) local hydraulic conditions in the Maynardville Limestone cause inaccurate model-calculated estimates of hydraulic conductivity; and (5) the conductivity of deep bedrock neither affects the results of the model nor does it add information on the flow system. Improved model performance would require: (1) more water level data for the Copper Ridge Dolomite; (2) improved estimates of hydraulic conductivity in the Copper Ridge Dolomite and Maynardville Limestone; and (3) more water level data and aquifer tests in deep bedrock. (USGS)
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
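The Gauss-Hermite method discussed above approximates the marginal likelihood of a mixed-effects logistic model by integrating out the random effect with Hermite nodes and weights. A minimal sketch for a random-intercept model, on two invented clusters of binary responses:

```python
import numpy as np

# Hedged sketch of (non-adaptive) Gauss-Hermite quadrature for the marginal
# log-likelihood of a random-intercept logistic model. Purely illustrative.
def marginal_loglik(beta0, sigma_u, y, n_nodes=15):
    # integral over u ~ N(0, sigma_u^2) of prod_j p(y_j | u), per cluster
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma_u * nodes            # change of variables
    ll = 0.0
    for cluster in y:                             # each cluster shares one u
        eta = beta0 + u[None, :]                  # shape (1, n_nodes)
        p = 1.0 / (1.0 + np.exp(-eta))
        lik_given_u = np.prod(np.where(cluster[:, None] == 1, p, 1 - p), axis=0)
        ll += np.log((weights / np.sqrt(np.pi) * lik_given_u).sum())
    return ll

# Two tiny clusters of binary responses (synthetic).
y = [np.array([1, 1, 0]), np.array([0, 0, 1])]
print(round(marginal_loglik(0.0, 1.0, y), 3))
```

Maximizing this function over the fixed effect and the random-effect standard deviation is what the quadrature-based packages compared in the article do, with adaptive node placement and many refinements.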
NASA Astrophysics Data System (ADS)
Kato, Takeyoshi; Sugimoto, Hiroyuki; Suzuoki, Yasuo
We established a procedure for estimating regional electricity demand and the regional potential capacity of distributed generators (DGs) by using a grid-square statistics data set. A photovoltaic power system (PV system) for residential use and a co-generation system (CGS) for both residential and commercial use were taken into account. As an example, results for Aichi prefecture are presented in this paper. Statistical data on the number of households by family type and the number of employees by business category for about 4,000 grid squares of 1 km × 1 km were used to estimate the floor space and the electricity demand distribution. The rooftop area available for installing PV systems was also estimated with the grid-square statistics data set. Considering the relation between the capacity of an existing CGS and a scale index of the building where the CGS is installed, the potential capacity of CGS was estimated for three business categories: hotels, hospitals, and stores. In some regions, the potential capacity of PV systems was estimated to be about 10,000 kW/km2, which corresponds to the density of existing areas with intensive installation of PV systems. Finally, we discuss the ratio of the regional potential capacity of DGs to the regional maximum electricity demand for deducing the appropriate capacity of DGs in a model of a future electricity distribution system.
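A back-of-the-envelope version of the grid-square PV estimate described above: potential capacity for one square from household counts, rooftop area, a usable-area fraction, and a panel capacity density. All parameter values are assumptions for illustration, not figures from the paper.

```python
# Hedged sketch of a per-grid-square PV potential estimate. Every parameter
# value below is an assumption chosen for illustration only.
households = 3000                 # households in the 1 km x 1 km square (assumed)
roof_area_m2 = 70.0               # rooftop area per household, m^2 (assumed)
usable_fraction = 0.4             # fraction of roof usable for panels (assumed)
kw_per_m2 = 0.15                  # PV capacity density, kW/m^2 (assumed)

potential_kw = households * roof_area_m2 * usable_fraction * kw_per_m2
print(int(potential_kw))  # 12600 kW for this square, i.e. ~12.6 MW/km^2
```

With these assumed inputs the result lands near the ~10,000 kW/km2 density the abstract cites for areas of intensive PV installation.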
Asquith, William H.; Barbie, Dana L.
2014-01-01
Selected summary statistics (L-moments) and estimates of respective sampling variances were computed for the 35 streamgages lacking statistically significant trends. From the L-moments and estimated sampling variances, weighted means or regional values were computed for each L-moment. An example application is included demonstrating how the L-moments could be used to evaluate the magnitude and frequency of annual mean streamflow.
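The weighting step described above can be sketched in a few lines. This is a minimal illustration only: the function names, the probability-weighted-moment formulas for the first two L-moments, and the inverse-variance weighting are assumptions of this sketch, not the report's documented procedure.

```python
import numpy as np

def sample_l_moments(x):
    """First two sample L-moments (l1, l2) via probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    b0 = x.mean()                                   # PWM beta_0
    b1 = np.sum((np.arange(n) / (n - 1)) * x) / n   # PWM beta_1
    return b0, 2 * b1 - b0                          # l1 = b0, l2 = 2*b1 - b0

def regional_l_moment(l_values, sampling_vars):
    """Weighted regional mean of one L-moment across streamgages,
    weighting each gage by the inverse of its estimated sampling variance."""
    w = 1.0 / np.asarray(sampling_vars, dtype=float)
    return float(np.sum(w * np.asarray(l_values, dtype=float)) / np.sum(w))
```

For example, gages with small sampling variance dominate the regional value, which is the intended behavior of inverse-variance weighting.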
DOT National Transportation Integrated Search
2009-06-01
A statistical projection of traffic fatalities for the first quarter of 2009 shows that an estimated 7,689 people died in motor vehicle traffic crashes. This represents a decline of about 9 percent as compared to the 8,451 fatalities that occurred in...
DOT National Transportation Integrated Search
2010-09-01
A statistical projection of traffic fatalities for the first half of 2010 shows that an estimated 14,996 people died in motor vehicle traffic crashes. This represents a decline of about 9.2 percent as compared to the 16,509 fatalities that occu...
Non-convex Statistical Optimization for Sparse Tensor Graphical Model
Sun, Wei; Wang, Zhaoran; Liu, Han; Cheng, Guang
2016-01-01
We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. The penalized maximum likelihood estimation of this model involves minimizing a non-convex objective function. In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence as well as consistent graph recovery. Notably, such an estimator achieves estimation consistency with only one tensor sample, a result not established in previous work. Our theoretical results are backed by thorough numerical studies. PMID:28316459
Statistical estimation via convex optimization for trending and performance monitoring
NASA Astrophysics Data System (ADS)
Samar, Sikandar
This thesis presents an optimization-based statistical estimation approach to find unknown trends in noisy data. A Bayesian framework is used to explicitly take into account prior information about the trends via trend models and constraints. The main focus is on convex formulation of the Bayesian estimation problem, which allows efficient computation of (globally) optimal estimates. There are two main parts of this thesis. The first part formulates trend estimation in systems described by known detailed models as a convex optimization problem. Statistically optimal estimates are then obtained by maximizing a concave log-likelihood function subject to convex constraints. We address the growth of the problem dimension as more measurements become available, and introduce a moving-horizon framework that enables recursive estimation of the unknown trend by solving a fixed-size convex optimization problem at each horizon. We also present a distributed estimation framework, based on the dual decomposition method, for a system formed by a network of complex sensors with local (convex) estimation. Two specific applications of the convex optimization-based Bayesian estimation approach are described in the second part of the thesis. Batch estimation for parametric diagnostics in a flight control simulation of a space launch vehicle is shown to detect incipient fault trends despite the natural masking properties of feedback in the guidance and control loops. The moving-horizon approach is used to estimate time-varying fault parameters in a detailed nonlinear simulation model of an unmanned aerial vehicle. Excellent performance is demonstrated in the presence of winds and turbulence.
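The fixed-size moving-horizon idea can be caricatured with a minimal convex trend estimator: at each time step a regularized least-squares problem (a convex QP with a closed-form solution) is solved over a sliding window of the most recent measurements. The horizon length, the first-difference smoothness penalty, and its weight are illustrative assumptions, not the thesis's actual models.

```python
import numpy as np

def moving_horizon_trend(y, horizon=20, lam=10.0):
    """At each step, solve min_x ||y_window - x||^2 + lam*||D x||^2
    over a fixed-size window (D = first-difference operator) and
    report the trend estimate at the current time."""
    y = np.asarray(y, dtype=float)
    est = []
    for t in range(len(y)):
        w = y[max(0, t - horizon + 1): t + 1]
        n = len(w)
        D = np.diff(np.eye(n), axis=0)                    # (n-1) x n differences
        x = np.linalg.solve(np.eye(n) + lam * D.T @ D, w)  # closed-form QP solution
        est.append(x[-1])
    return np.array(est)
```

Because each window has fixed size, the per-step cost stays constant as measurements accumulate, which is the point of the moving-horizon formulation.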
Generalized Hurst exponent estimates differentiate EEG signals of healthy and epileptic patients
NASA Astrophysics Data System (ADS)
Lahmiri, Salim
2018-01-01
The aim of our current study is to check whether the multifractal patterns of electroencephalographic (EEG) signals of normal and epileptic patients are statistically similar or different. To this end, the generalized Hurst exponent (GHE) method is used for robust estimation of the multifractals in each type of EEG signal, and three powerful statistical tests are performed to test for differences between the GHEs estimated from healthy control subjects and from epileptic patients. The obtained results show that multifractals exist in both types of EEG signals. In particular, the degree of fractality is more pronounced in short variations of normal EEG signals than in short variations of EEG signals with seizure-free intervals. Conversely, it is more pronounced in long variations of EEG signals with seizure-free intervals than in normal EEG signals. Importantly, both parametric and nonparametric statistical tests show strong evidence that the estimated GHEs of normal EEG signals are statistically and significantly different from those with seizure-free intervals. Therefore, GHEs can be used efficiently to distinguish between healthy subjects and patients suffering from epilepsy.
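A common way to estimate the generalized Hurst exponent is from the scaling of the q-th order structure function, K_q(tau) ~ tau^(q*H(q)), via a log-log fit. The sketch below follows that textbook recipe; the lag range and fitting details are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def generalized_hurst(x, q=2, taus=range(1, 20)):
    """Estimate H(q) from the scaling K_q(tau) = mean(|x(t+tau)-x(t)|^q)
    by fitting log K_q(tau) against log tau and dividing the slope by q."""
    x = np.asarray(x, dtype=float)
    logt, logk = [], []
    for tau in taus:
        diffs = np.abs(x[tau:] - x[:-tau])
        logk.append(np.log(np.mean(diffs ** q)))
        logt.append(np.log(tau))
    slope = np.polyfit(logt, logk, 1)[0]
    return slope / q
```

As a sanity check, an ordinary random walk should yield H(2) close to 0.5; persistent signals give larger values and anti-persistent signals smaller ones.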
NASA Astrophysics Data System (ADS)
Rosas, Pedro; Wagemans, Johan; Ernst, Marc O.; Wichmann, Felix A.
2005-05-01
A number of models of depth-cue combination suggest that the final depth percept results from a weighted average of independent depth estimates based on the different cues available. The weight of each cue in such an average is thought to depend on the reliability of each cue. In principle, such a depth estimation could be statistically optimal in the sense of producing the minimum-variance unbiased estimator that can be constructed from the available information. Here we test such models by using visual and haptic depth information. Different texture types produce differences in slant-discrimination performance, thus providing a means for testing a reliability-sensitive cue-combination model with texture as one of the cues to slant. Our results show that the weights for the cues were generally sensitive to their reliability but fell short of statistically optimal combination - we find reliability-based reweighting but not statistically optimal cue combination.
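The minimum-variance unbiased combination referred to above has a simple closed form for independent cues: each cue's weight is its inverse variance, normalized to sum to one. The sketch below illustrates that rule; it is a generic formula, not the authors' experimental model.

```python
import numpy as np

def combine_cues(estimates, variances):
    """Reliability-weighted cue combination: weights are normalized inverse
    variances, and the fused variance is the harmonic combination
    1 / sum(1/var_i), which is never larger than any single cue's variance."""
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances
    w /= w.sum()
    fused = float(np.dot(w, np.asarray(estimates, dtype=float)))
    fused_var = 1.0 / float(np.sum(1.0 / variances))
    return fused, fused_var, w
```

For instance, a visual slant estimate four times more reliable than the haptic one receives four times the weight, and the fused estimate is less variable than either cue alone.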
Mahmood, Iftekhar
2004-01-01
The objective of this study was to evaluate the performance of the Wagner-Nelson, Loo-Riegelman, and statistical moments methods in determining the absorption rate constant(s) in the presence of a secondary peak. These methods were also evaluated when there were two absorption rates without a secondary peak. Different sets of plasma concentration versus time data for a hypothetical drug following one- or two-compartment models were generated by simulation. The true ka was compared with the ka estimated by the Wagner-Nelson, Loo-Riegelman, and statistical moments methods. The results of this study indicate that the Wagner-Nelson, Loo-Riegelman, and statistical moments methods may not be suitable for the estimation of absorption rate constants in the presence of a secondary peak or when absorption takes place with two absorption rates.
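For the one-compartment case, the Wagner-Nelson fraction absorbed is F(t) = (C(t) + ke*AUC(0,t)) / (ke*AUC(0,inf)). The sketch below uses trapezoidal AUC and the standard C_last/ke tail extrapolation; both are assumptions of this illustration, not details taken from the study.

```python
import numpy as np

def wagner_nelson(t, c, ke):
    """Wagner-Nelson fraction absorbed for a one-compartment model.
    AUC(0,t) is computed by the trapezoidal rule; AUC(0,inf) is
    approximated as AUC(0,t_last) + C_last/ke."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    auc_t = np.concatenate(([0.0], np.cumsum(np.diff(t) * (c[1:] + c[:-1]) / 2)))
    auc_inf = auc_t[-1] + c[-1] / ke
    return (c + ke * auc_t) / (ke * auc_inf)
```

A quick consistency check: for a pure IV-bolus elimination profile (no absorption phase), the fraction absorbed is identically 1 at all sampled times, up to quadrature error.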
Web page sorting algorithm based on query keyword distance relation
NASA Astrophysics Data System (ADS)
Yang, Han; Cui, Hong Gang; Tang, Hao
2017-08-01
In order to improve web page ranking, we examine the distance relationships among the search keywords within a web page and propose the idea of clustering the query keywords, converting these relationships into a degree of aggregation of the search keywords in the page. Building on the PageRank algorithm, a clustering-degree factor for the query keywords is added so that it participates in the quantitative calculation. This paper thus proposes an improved PageRank algorithm based on the distance relations between search keywords. The experimental results show the feasibility and effectiveness of the method.
Estimation of two ordered mean residual lifetime functions.
Ebrahimi, N
1993-06-01
In many statistical studies involving failure data, biometric mortality data, and actuarial data, mean residual lifetime (MRL) function is of prime importance. In this paper we introduce the problem of nonparametric estimation of a MRL function on an interval when this function is bounded from below by another such function (known or unknown) on that interval, and derive the corresponding two functional estimators. The first is to be used when there is a known bound, and the second when the bound is another MRL function to be estimated independently. Both estimators are obtained by truncating the empirical estimator discussed by Yang (1978, Annals of Statistics 6, 112-117). In the first case, it is truncated at a known bound; in the second, at a point somewhere between the two empirical estimates. Consistency of both estimators is proved, and a pointwise large-sample distribution theory of the first estimator is derived.
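The first of the two estimators described above, the empirical MRL truncated at a known lower bound, is simple to sketch. The empirical MRL here is the sample average of X - t over observations exceeding t, and the truncation is a plain maximum with the bound; function names and edge-case handling are illustrative assumptions.

```python
import numpy as np

def empirical_mrl(data, t):
    """Empirical mean residual life at t: mean of (X - t) over all X > t.
    Returns 0.0 when no observation exceeds t."""
    data = np.asarray(data, dtype=float)
    tail = data[data > t]
    return float(tail.mean() - t) if tail.size else 0.0

def truncated_mrl(data, t, lower_bound):
    """Estimator for the case of a known lower bound: the empirical MRL
    is truncated so that it never falls below the bound."""
    return max(empirical_mrl(data, t), lower_bound)
```

The second estimator in the paper replaces the known bound with an independently estimated empirical MRL and truncates at a point between the two curves, which the sketch does not attempt to reproduce.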
Nariai, N; Kim, S; Imoto, S; Miyano, S
2004-01-01
We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.
Blood pulse wave velocity measured by photoacoustic microscopy
NASA Astrophysics Data System (ADS)
Yeh, Chenghung; Hu, Song; Maslov, Konstantin; Wang, Lihong V.
2013-03-01
Blood pulse wave velocity (PWV) is an important indicator of vascular stiffness. In this letter, we present electrocardiogram-synchronized photoacoustic microscopy for in vivo noninvasive quantification of the PWV in the peripheral vessels of mice. Interestingly, a strong correlation between blood flow speed and the ECG was clearly observed in arteries but not in veins. The PWV is measured from the pulse travel time and the distance between two spots of a chosen vessel, with simultaneously recorded electrocardiograms serving as references. Statistical analysis shows a linear correlation between the PWV and the vessel diameter, which agrees with known physiology. Keywords: photoacoustic microscopy, photoacoustic spectroscopy, bilirubin, scattering medium.
Combining statistical inference and decisions in ecology.
Williams, Perry J; Hooten, Mevin B
2016-09-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods, including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem. © 2016 by the Ecological Society of America.
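Central to SDT is minimizing posterior expected loss, and for the standard loss functions the Bayes point estimate has a well-known closed form: the posterior mean under squared-error loss and the posterior median under absolute loss. A minimal sketch using posterior samples (e.g. MCMC draws) makes this concrete; the sample-based setting is an assumption of the illustration.

```python
import numpy as np

def bayes_point_estimate(posterior_samples, loss="squared"):
    """Bayes estimator = minimizer of posterior expected loss.
    Squared-error loss -> posterior mean; absolute loss -> posterior median."""
    s = np.asarray(posterior_samples, dtype=float)
    if loss == "squared":
        return float(s.mean())
    if loss == "absolute":
        return float(np.median(s))
    raise ValueError("unsupported loss: " + str(loss))
```

With a right-skewed posterior, the two losses give visibly different decisions, which is exactly why the choice of loss function matters in an SDT analysis.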
Peter, Raphael Simon; Brehme, Torben; Völzke, Henry; Muche, Rainer; Rothenbacher, Dietrich; Büchele, Gisela
2016-06-01
Knowledge of epidemiologic research topics and trends is useful for scientific societies, researchers, and funding agencies. In recent years researchers have recognized the usefulness of keyword network analysis for visualizing and analyzing scientific research topics. We therefore applied keyword network analysis to present an overview of current epidemiologic research topics in Germany. Accepted submissions to the 9th annual congress of the German Society for Epidemiology (DGEpi) in 2014 were used as the data source. Submitters had to choose one of 19 subject areas and were asked to provide a title, a structured abstract, the names of authors along with their affiliations, and a list of freely selectable keywords. Keywords had been provided for 262 (82 %) submissions, 1030 keywords in total. Overall the most common keywords were: "migration" (18 times), "prevention" (15 times), followed by "children", "cohort study", "physical activity", and "secondary data analysis" (11 times each). Some keywords showed a certain concentration under one specific subject area, e.g. "migration" with 8 of 18 occurrences in social epidemiology or "breast cancer" with 4 of 7 in cancer epidemiology, while others, like "physical activity", were distributed evenly over multiple subject areas (cardiovascular & metabolic diseases, ageing, methods, paediatrics, prevention & health service research). This keyword network analysis demonstrated the high diversity of epidemiologic research topics, with a large number of distinct keywords, as presented at the annual conference of the DGEpi.
Hogben; Lawson
1997-07-01
The literature on keyword training presents a confusing picture of the usefulness of the keyword method for foreign language vocabulary learning by students with strong verbal knowledge backgrounds. This paper reviews research which notes the existence of conflicting sets of findings concerning the verbal background-keyword training relationship and presents the results of analyses which argue against the assertion made by McDaniel and Pressley (1984) that keyword training will have minimal effect on students with high verbal ability. Findings from regression analyses of data from two studies did not show that the relationship between keyword training and immediate recall performance was moderated by verbal knowledge background. The disparate sets of findings related to the keyword training-verbal knowledge relationship and themes emerging from other research suggest that this relationship requires further examination.
Alić, Nikola; Papen, George; Saperstein, Robert; Milstein, Laurence; Fainman, Yeshaiahu
2005-06-13
Exact signal statistics for fiber-optic links containing a single optical pre-amplifier are calculated and applied to sequence estimation for electronic dispersion compensation. The performance is evaluated and compared with results based on the approximate chi-square statistics. We show that detection in existing systems based on exact statistics can be improved relative to using a chi-square distribution for realistic filter shapes. In contrast, for high-spectral efficiency systems the difference between the two approaches diminishes, and performance tends to be less dependent on the exact shape of the filter used.
High cumulants of conserved charges and their statistical uncertainties
NASA Astrophysics Data System (ADS)
Li-Zhu, Chen; Ye-Yin, Zhao; Xue, Pan; Zhi-Ming, Li; Yuan-Fang, Wu
2017-10-01
We study the influence of measured high cumulants of conserved charges on their associated statistical uncertainties in relativistic heavy-ion collisions. With a given number of events, the measured cumulants randomly fluctuate with an approximately normal distribution, while the estimated statistical uncertainties are found to be correlated with corresponding values of the obtained cumulants. Generally, with a given number of events, the larger the cumulants we measure, the larger the statistical uncertainties that are estimated. The error-weighted averaged cumulants are dependent on statistics. Despite this effect, however, it is found that the three sigma rule of thumb is still applicable when the statistics are above one million. Supported by NSFC (11405088, 11521064, 11647093), Major State Basic Research Development Program of China (2014CB845402) and Ministry of Science and Technology (MoST) (2016YFE0104800)
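The quantities discussed above, cumulants up to fourth order and their statistical uncertainties, can be sketched directly from central moments, with a bootstrap as one generic way to attach uncertainties. The bootstrap is an assumption of this sketch; the paper's uncertainty estimation may differ.

```python
import numpy as np

def cumulants(x):
    """Sample cumulants C1..C4 from central moments:
    C1 = mean, C2 = m2, C3 = m3, C4 = m4 - 3*m2^2."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return x.mean(), m2, m3, m4 - 3.0 * m2**2

def bootstrap_errors(x, n_boot=200, seed=0):
    """Statistical uncertainties of C1..C4 via bootstrap resampling."""
    rng = np.random.default_rng(seed)
    reps = np.array([cumulants(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return reps.std(axis=0)
```

Resampling the same events many times shows directly how the spread of the measured C4 grows relative to C2, the effect the abstract describes.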
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.
Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen
2015-05-01
Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
Ahearn, Elizabeth A.
2010-01-01
Multiple linear regression equations for determining flow-duration statistics were developed to estimate select flow exceedances ranging from 25- to 99-percent for six 'bioperiods'-Salmonid Spawning (November), Overwinter (December-February), Habitat Forming (March-April), Clupeid Spawning (May), Resident Spawning (June), and Rearing and Growth (July-October)-in Connecticut. Regression equations also were developed to estimate the 25- and 99-percent flow exceedances without reference to a bioperiod. In total, 32 equations were developed. The predictive equations were based on regression analyses relating flow statistics from streamgages to GIS-determined basin and climatic characteristics for the drainage areas of those streamgages. Thirty-nine streamgages (and an additional 6 short-term streamgages and 28 partial-record sites for the non-bioperiod 99-percent exceedance) in Connecticut and adjacent areas of neighboring States were used in the regression analysis. Weighted least squares regression analysis was used to determine the predictive equations; weights were assigned based on record length. The basin characteristics-drainage area, percentage of area with coarse-grained stratified deposits, percentage of area with wetlands, mean monthly precipitation (November), mean seasonal precipitation (December, January, and February), and mean basin elevation-are used as explanatory variables in the equations. Standard errors of estimate of the 32 equations ranged from 10.7 to 156 percent with medians of 19.2 and 55.4 percent to predict the 25- and 99-percent exceedances, respectively. Regression equations to estimate high and median flows (25- to 75-percent exceedances) are better predictors (smaller variability of the residual values around the regression line) than the equations to estimate low flows (less than 75-percent exceedance). The Habitat Forming (March-April) bioperiod had the smallest standard errors of estimate, ranging from 10.7 to 20.9 percent. 
In contrast, the Rearing and Growth (July-October) bioperiod had the largest standard errors, ranging from 30.9 to 156 percent. The adjusted coefficient of determination of the equations ranged from 77.5 to 99.4 percent with medians of 98.5 and 90.6 percent to predict the 25- and 99-percent exceedances, respectively. Descriptive information on the streamgages used in the regression, measured basin and climatic characteristics, and estimated flow-duration statistics are provided in this report. Flow-duration statistics and the 32 regression equations for estimating flow-duration statistics in Connecticut are stored on the U.S. Geological Survey World Wide Web application "StreamStats" (http://water.usgs.gov/osw/streamstats/index.html). The regression equations developed in this report can be used to produce unbiased estimates of select flow exceedances statewide.
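The weighted least squares step underlying these predictive equations has a compact closed form, beta = (X'WX)^{-1} X'Wy, with weights assigned by record length. The sketch below shows only that generic computation; the variable selection and weighting details of the report are not reproduced.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """WLS regression coefficients beta = (X'WX)^{-1} X'Wy, where the
    diagonal weight matrix W encodes, e.g., streamgage record lengths."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    W = np.diag(np.asarray(weights, dtype=float))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Gages with longer records receive larger weights and therefore pull the fitted exceedance equations more strongly toward their observed flow statistics.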
Robustness of S1 statistic with Hodges-Lehmann for skewed distributions
NASA Astrophysics Data System (ADS)
Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping
2016-10-01
Analysis of variance (ANOVA) is a commonly used parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under non-normal and heteroscedastic settings. When the assumptions are violated, researchers look for alternatives such as the nonparametric Kruskal-Wallis test or robust methods. This study focused on a flexible method, the S1 statistic, for comparing groups using the median as the location estimator. The S1 statistic was modified by substituting the median with the Hodges-Lehmann estimator, and the default scale estimator with the variance of Hodges-Lehmann and with MADn, to produce two different test statistics for comparing groups. The bootstrap method was used for testing the hypotheses, since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistics in terms of Type I error was measured and compared against the original S1 statistic, ANOVA, and Kruskal-Wallis. The proposed procedures show improvement over the original statistic, especially under extremely skewed distributions.
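The two robust estimators substituted into the S1 statistic are standard and easy to sketch: the one-sample Hodges-Lehmann estimator is the median of all Walsh averages (pairwise means, each point also paired with itself), and MADn is the median absolute deviation rescaled for consistency at the normal distribution. The sketch shows only these two building blocks, not the full bootstrap test.

```python
import numpy as np
from itertools import combinations_with_replacement

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann location estimator:
    the median of all Walsh averages (x_i + x_j)/2 for i <= j."""
    x = np.asarray(x, dtype=float)
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]
    return float(np.median(walsh))

def madn(x):
    """MADn scale estimator: median absolute deviation times 1.4826,
    the consistency constant for normal data."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * float(np.median(np.abs(x - np.median(x))))
```

Both estimators have high breakdown points, which is why substituting them into S1 helps under extreme skewness.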
Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos
2014-04-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.
Statistical inference involving binomial and negative binomial parameters.
García-Pérez, Miguel A; Núñez-Antón, Vicente
2009-05-01
Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Methods for estimating selected low-flow frequency statistics for unregulated streams in Kentucky
Martin, Gary R.; Arihood, Leslie D.
2010-01-01
This report provides estimates of, and presents methods for estimating, selected low-flow frequency statistics for unregulated streams in Kentucky including the 30-day mean low flows for recurrence intervals of 2 and 5 years (30Q2 and 30Q5) and the 7-day mean low flows for recurrence intervals of 5, 10, and 20 years (7Q2, 7Q10, and 7Q20). Estimates of these statistics are provided for 121 U.S. Geological Survey streamflow-gaging stations with data through the 2006 climate year, which is the 12-month period ending March 31 of each year. Data were screened to identify the periods of homogeneous, unregulated flows for use in the analyses. Logistic-regression equations are presented for estimating the annual probability of the selected low-flow frequency statistics being equal to zero. Weighted-least-squares regression equations were developed for estimating the magnitude of the nonzero 30Q2, 30Q5, 7Q2, 7Q10, and 7Q20 low flows. Three low-flow regions were defined for estimating the 7-day low-flow frequency statistics. The explicit explanatory variables in the regression equations include total drainage area and the mapped streamflow-variability index measured from a revised statewide coverage of this characteristic. The percentage of the station low-flow statistics correctly classified as zero or nonzero by use of the logistic-regression equations ranged from 87.5 to 93.8 percent. The average standard errors of prediction of the weighted-least-squares regression equations ranged from 108 to 226 percent. The 30Q2 regression equations have the smallest standard errors of prediction, and the 7Q20 regression equations have the largest standard errors of prediction. The regression equations are applicable only to stream sites with low flows unaffected by regulation from reservoirs and local diversions of flow and to drainage basins in specified ranges of basin characteristics. 
Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features.
NASA Technical Reports Server (NTRS)
Krajewski, Witold F.; Rexroth, David T.; Kiriaki, Kiriakie
1991-01-01
Two problems related to radar rainfall estimation are described. The first part is a description of a preliminary data analysis for the purpose of statistical estimation of rainfall from multiple (radar and raingage) sensors. Raingage, radar, and joint radar-raingage estimation is described, and some results are given. Statistical parameters of rainfall spatial dependence are calculated and discussed in the context of optimal estimation. Quality control of radar data is also described. The second part describes radar scattering by ellipsoidal raindrops. An analytical solution is derived for the Rayleigh scattering regime. Single and volume scattering are presented. Comparison calculations with the known results for spheres and oblate spheroids are shown.
Equations for estimating selected streamflow statistics in Rhode Island
Bent, Gardner C.; Steeves, Peter A.; Waite, Andrew M.
2014-01-01
The equations, which are based on data from streams with little to no flow alterations, will provide an estimate of the natural flows for a selected site. They will not estimate flows for altered sites with dams, surface-water withdrawals, groundwater withdrawals (pumping wells), diversions, and wastewater discharges. If the equations are used to estimate streamflow statistics for altered sites, the user should adjust the flow estimates for the alterations. The regression equations should be used only for ungaged sites with drainage areas between 0.52 and 294 square miles and stream densities between 0.94 and 3.49 miles per square mile; these are the ranges of the explanatory variables in the equations.
Harris, Alexandre M.; DeGiorgio, Michael
2016-01-01
Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator’s variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, H̃_BLUE, relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of H̃_BLUE on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of H̃_BLUE leads to improved estimates of the population differentiation statistic, FST, which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data. PMID:28040781
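The "original unbiased estimator" the abstract refers to, valid for unrelated, non-inbred samples, is H = n/(n-1) * (1 - sum of squared allele frequencies), where n counts allele copies. A minimal sketch (not the paper's BLUE-based H̃_BLUE, which additionally requires kinship information):

```python
import numpy as np
from collections import Counter

def expected_heterozygosity(alleles):
    """Sample-size-corrected estimator of gene diversity at one locus:
    H = n/(n-1) * (1 - sum_k p_k^2), with p_k the sample allele frequencies
    computed over n allele copies. Assumes unrelated, non-inbred individuals."""
    n = len(alleles)
    freqs = np.array([c / n for c in Counter(alleles).values()])
    return float(n / (n - 1) * (1.0 - np.sum(freqs ** 2)))
```

With relatives in the sample, this estimator is downward-biased, which is the motivation for the kinship-aware estimators the paper compares.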
Covariance estimation in Terms of Stokes Parameters with Application to Vector Sensor Imaging
2016-12-15
Maximum a posteriori decoder for digital communications
NASA Technical Reports Server (NTRS)
Altes, Richard A. (Inventor)
1997-01-01
A system and method for decoding by identification of the most likely phase coded signal corresponding to received data. The present invention has particular application to communication with signals that experience spurious random phase perturbations. The generalized estimator-correlator uses a maximum a posteriori (MAP) estimator to generate phase estimates for correlation with incoming data samples and for correlation with mean phases indicative of unique hypothesized signals. The result is a MAP likelihood statistic for each hypothesized transmission, wherein the highest value statistic identifies the transmitted signal.
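The decision stage of an estimator-correlator can be caricatured by a plain correlation receiver: correlate the received samples against each hypothesized signal template and pick the largest normalized score. This sketch omits the MAP phase estimation that distinguishes the invention; the normalization and decision rule are assumptions of the illustration.

```python
import numpy as np

def decode(received, templates):
    """Correlation receiver: return the index of the hypothesized signal
    whose normalized correlation with the received samples is largest."""
    received = np.asarray(received, dtype=float)
    scores = [float(np.dot(received, np.asarray(s, dtype=float))
                    / np.linalg.norm(s))
              for s in templates]
    return int(np.argmax(scores))
```

In the patented scheme the templates would instead be built from MAP phase estimates, so that random phase perturbations are accounted for before the likelihood statistics are compared.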
NASA Langley's Approach to the Sandia's Structural Dynamics Challenge Problem
NASA Technical Reports Server (NTRS)
Horta, Lucas G.; Kenny, Sean P.; Crespo, Luis G.; Elliott, Kenny B.
2007-01-01
The objective of this challenge is to develop a data-based probabilistic model of uncertainty to predict the behavior of subsystems (payloads) by themselves and while coupled to a primary (target) system. Although this type of analysis is routinely performed and representative of issues faced in real-world system design and integration, there are still several key technical challenges that must be addressed when analyzing uncertain interconnected systems. For example, one key technical challenge is related to the fact that there is limited data on target configurations. Moreover, it is typical to have multiple data sets from experiments conducted at the subsystem level, but often sample sizes are not sufficient to compute high-confidence statistics. In this challenge problem additional constraints are placed as ground rules for the participants. One such rule is that mathematical models of the subsystem are limited to linear approximations of the nonlinear physics of the problem at hand. Also, participants are constrained to use these models and the multiple data sets to make predictions about the target system response under completely different input conditions. Our approach initially involved the screening of several different methods. Three of those considered are presented herein. The first is based on the transformation of the modal data to an orthogonal space where the mean and covariance of the data are matched by the model. The other two approaches work in physical space, where the uncertain parameter set is made of masses, stiffnesses and damping coefficients; one matches confidence intervals of low-order moments of the statistics via optimization, while the second uses a kernel density estimation approach. The paper will touch on all the approaches, lessons learned, validation metrics and their comparison, data quantity restrictions, and assumptions/limitations of each approach.
Keywords: Probabilistic modeling, model validation, uncertainty quantification, kernel density
Park, Chan Hyuk; Kim, Eun Hye; Roh, Yun Ho; Kim, Ha Yan; Lee, Sang Kil
2014-01-01
Background Although many case reports have described patients with proton pump inhibitor (PPI)-induced hypomagnesemia, the impact of PPI use on hypomagnesemia has not been fully clarified through comparative studies. We aimed to evaluate the association between the use of PPI and the risk of developing hypomagnesemia by conducting a systematic review with meta-analysis. Methods We conducted a systematic search of MEDLINE, EMBASE, and the Cochrane Library using the primary keywords “proton pump,” “dexlansoprazole,” “esomeprazole,” “ilaprazole,” “lansoprazole,” “omeprazole,” “pantoprazole,” “rabeprazole,” “hypomagnesemia,” “hypomagnesaemia,” and “magnesium.” Studies were included if they evaluated the association between PPI use and hypomagnesemia and reported relative risks or odds ratios or provided data for their estimation. Pooled odds ratios with 95% confidence intervals were calculated using the random effects model. Statistical heterogeneity was assessed with Cochran’s Q test and I² statistics. Results Nine studies including 115,455 patients were analyzed. The median Newcastle-Ottawa quality score for the included studies was seven (range, 6–9). Among patients taking PPIs, the median proportion of patients with hypomagnesemia was 27.1% (range, 11.3–55.2%) across all included studies. Among patients not taking PPIs, the median proportion of patients with hypomagnesemia was 18.4% (range, 4.3–52.7%). On meta-analysis, the pooled odds ratio for PPI use was found to be 1.775 (95% confidence interval 1.077–2.924). Significant heterogeneity was identified using Cochran’s Q test (df = 7, P<0.001, I² = 98.0%). Conclusions PPI use may increase the risk of hypomagnesemia. However, significant heterogeneity among the included studies prevented us from reaching a definitive conclusion. PMID:25394217
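The pooled odds ratio under a random-effects model, as reported above, is conventionally computed on the log odds-ratio scale with the DerSimonian-Laird moment estimator of between-study variance. The abstract does not specify the exact implementation, so the sketch below is a standard version, and the per-study log odds ratios and variances are hypothetical illustration data.

```python
import numpy as np

def dersimonian_laird(log_or, var):
    """Random-effects pooling of study-level log odds ratios
    (DerSimonian-Laird). Returns the pooled OR, its 95% CI,
    Cochran's Q, and the between-study variance tau^2."""
    log_or = np.asarray(log_or, float)
    var = np.asarray(var, float)
    w = 1.0 / var                                   # fixed-effect weights
    fixed = np.sum(w * log_or) / np.sum(w)
    # Cochran's Q and the DL moment estimator of tau^2
    q = np.sum(w * (log_or - fixed) ** 2)
    df = len(log_or) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = 1.0 / (var + tau2)                     # random-effects weights
    pooled = np.sum(w_star * log_or) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    ci = (np.exp(pooled - 1.96 * se), np.exp(pooled + 1.96 * se))
    return np.exp(pooled), ci, q, tau2

# hypothetical three-study example
or_hat, ci, q, tau2 = dersimonian_laird([0.4, 0.9, 0.6], [0.04, 0.09, 0.05])
```

When Q exceeds its degrees of freedom, tau² is positive and the CI widens relative to a fixed-effect analysis, which is how the heterogeneity noted in the abstract propagates into the pooled estimate.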
Southard, Rodney E.
2013-01-01
The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought and methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States. For streamgages with more than 10 years of record, Kendall’s tau was computed to test for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams.
Statistical estimates on one of these streams can be calculated at an ungaged location that has a drainage area that is at least 40 percent of the drainage area of the farthest upstream streamgage and within 150 percent of the drainage area of the farthest downstream streamgage along the stream of interest. The second method may be used on any stream with a streamgage that has operated for 10 years or longer and for which anthropogenic effects have not changed the low-flow characteristics at the ungaged location since collection of the streamflow data. A ratio of the drainage area of the stream at the ungaged location to the drainage area of the stream at the streamgage was computed to estimate the statistic at the ungaged location. The range of applicability is between 40 and 150 percent of the drainage area of the streamgage, and the ungaged location must be located on the same stream as the streamgage. The third method uses regional regression equations to estimate selected low-flow frequency statistics for unregulated streams in Missouri. This report presents regression equations to estimate frequency statistics for the 10-year recurrence interval and for the N-day durations of 1, 2, 3, 7, 10, 30, and 60 days. Basin and climatic characteristics were computed using geographic information system software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses based on existing digital geospatial data and previous studies. Spatial analyses for geographical bias in the predictive accuracy of the regional regression equations defined three low-flow regions within the State, representing the three major physiographic provinces in Missouri. Region 1 includes the Central Lowlands, Region 2 includes the Ozark Plateaus, and Region 3 includes the Mississippi Alluvial Plain. A total of 207 streamgages were used in the regression analyses for the regional equations. Of the 207 U.S.
Geological Survey streamgages, 77 were located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be affected minimally by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow-variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on estimating low-flow frequency statistics at ungaged locations are dependent on the method used. The first method outlined for use in Missouri, power curve equations, was developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. A second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at an ungaged location to the drainage area at a streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area.
The third method is the use of the regional regression equations. The limits for the use of these equations are based on the ranges of the characteristics used as independent variables and that streams must be affected minimally by anthropogenic activities.
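The drainage-area ratio method (the report's second method) lends itself to a short sketch. Simple linear scaling by the area ratio is assumed here for illustration; the report defines the exact procedure, and the function name and values below are hypothetical.

```python
def drainage_area_ratio_estimate(stat_gage, area_gage, area_ungaged):
    """Scale a streamflow statistic from a gaged to an ungaged
    location on the same stream by the drainage-area ratio,
    enforcing the 40-150 percent applicability range stated in
    the report. Linear scaling is an illustrative assumption."""
    ratio = area_ungaged / area_gage
    if not 0.40 <= ratio <= 1.50:
        raise ValueError("ungaged area outside 40-150% applicability range")
    return stat_gage * ratio

# e.g. a 10 cfs low-flow statistic at a 100 sq mi gage,
# transferred to a 50 sq mi ungaged site upstream
estimate = drainage_area_ratio_estimate(10.0, 100.0, 50.0)
```

The range check matters: outside 40-150 percent of the gage's drainage area the report directs users to the power-curve or regional-regression methods instead.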
Using Sentinel-1 and Landsat 8 satellite images to estimate surface soil moisture content.
NASA Astrophysics Data System (ADS)
Mexis, Philippos-Dimitrios; Alexakis, Dimitrios D.; Daliakopoulos, Ioannis N.; Tsanis, Ioannis K.
2016-04-01
The potential for more accurate assessment of Soil Moisture (SM) content using Earth Observation (EO) technology, through synergistic approaches among a variety of EO instruments, has emerged. This study is the first to investigate the potential of Synthetic Aperture Radar (SAR) (Sentinel-1) and optical (Landsat 8) images in combination with ground measurements to estimate volumetric SM content in support of water management and agricultural practices. SAR and optical data are downloaded and corrected in terms of atmospheric, geometric and radiometric corrections. SAR images are also corrected for roughness and vegetation with the synergistic use of the Oh and Topp models, using a dataset consisting of backscattering coefficients and corresponding direct measurements of ground parameters (moisture, roughness). Next, various vegetation indices (NDVI, SAVI, MSAVI, EVI, etc.) are estimated to record diachronically the vegetation regime within the study area and serve as auxiliary data in the final modeling. Furthermore, thermal images from the optical data are corrected and incorporated into the overall approach. The basic principle of the Thermal InfraRed (TIR) method is that Land Surface Temperature (LST) is sensitive to surface SM content due to its impact on the surface heating process (heat capacity and thermal conductivity) under bare soil or sparse vegetation cover conditions. Ground truth data are collected from a Time-Domain Reflectometry (TDR) gauge network established in western Crete, Greece, during 2015. Sophisticated algorithms based on Artificial Neural Networks (ANNs) and Multiple Linear Regression (MLR) approaches are used to explore the statistical relationship between backscattering measurements and SM content. Results highlight the potential of SAR and optical satellite images to contribute to effective SM content detection in support of water resources management and precision agriculture.
Keywords: Sentinel-1, Landsat 8, Soil moisture content, Artificial Neural Network, Multiple Linear Regression The study was fully supported by the CASCADE project. The CASCADE Project is financed by the European Commission FP7 program, ENV.2011.2.1.4-2 - 'Behaviour of ecosystems, thresholds and tipping points', EU Grant agreement: 283068.
Virtual water balance estimation in Tunisia
NASA Astrophysics Data System (ADS)
Stambouli, Talel; Benalaya, Abdallah; Ghezal, Lamia; Ali, Chebil; Hammami, Rifka; Souissi, Asma
2015-04-01
The water in Tunisia is limited and unevenly distributed among regions, especially in arid zones. The annual rainfall average varies from less than 100 mm in the extreme South to over 1500 mm in the extreme North of the country. Currently, the conventional water resource potential of the country is estimated at about 4.84 billion m³/year, of which 2.7 billion m³/year is surface water and 2.14 billion m³/year is groundwater, characterizing a structural shortage for water safety in Tunisia (under 500 m³/inhabitant/year). More than 80% of these volumes have been mobilized for agriculture. The virtual water concept, defined by Allan (1997) as the amount of water needed to generate a product of either natural or artificial origin, establishes a similarity between product marketing and water trade. Given the influence of water in food production, virtual water studies generally focus on food products. The influence of these product markets on water management has so far been appreciated only by analyzing water-scarce countries at a global scale; the level of detail should be increased, as most studies consider a country as a single geographical point, leading to considerable inaccuracies. The main objective of this work is the virtual water balance estimation of strategic crops in Tunisia (both irrigated and dry crops) to determine their influence on water resources management and to establish patterns for improving it. The virtual water balance was performed based on farmers' surveys, crop and meteorological data, irrigation management and regional statistics. Results show that the majority of farmers waste irrigation water, especially on vegetable crops and fruit trees. Thus, good control of the cultural package may result in lower quantities of water used by crops while ensuring good production with suitable economic profitability.
Then, integrating the virtual water concept into the choice of production systems and into policies affecting water use is very useful to conserve this scarce resource, to support farmers in their production activities, and to maintain the sustainability of farms. Keywords: Virtual water, water balance, irrigation, Tunisia
ESIP Documentation Cluster Session: GCMD Keyword Update
NASA Technical Reports Server (NTRS)
Stevens, Tyler
2018-01-01
The Global Change Master Directory (GCMD) Keywords are a hierarchical set of controlled Earth Science vocabularies that help ensure Earth science data and services are described in a consistent and comprehensive manner and allow for the precise searching of collection-level metadata and subsequent retrieval of data and services. Initiated over twenty years ago, the GCMD Keywords are periodically analyzed for relevancy and will continue to be refined and expanded in response to user needs. This talk explores the current status of the GCMD keywords, the value and usage that the keywords bring to different tools/agencies as it relates to data discovery, and how the keywords relate to SWEET (Semantic Web for Earth and Environmental Terminology) Ontologies.
Hill, Timothy; Chocholek, Melanie; Clement, Robert
2017-06-01
Eddy covariance (EC) continues to provide invaluable insights into the dynamics of Earth's surface processes. However, despite its many strengths, spatial replication of EC at the ecosystem scale is rare. High equipment costs are likely to be partially responsible. This contributes to the low sampling, and even lower replication, of ecoregions in Africa, Oceania (excluding Australia) and South America. The level of replication matters as it directly affects statistical power. While the ergodicity of turbulence and temporal replication allow an EC tower to provide statistically robust flux estimates for its footprint, these principles do not extend to larger ecosystem scales. Despite the challenge of spatially replicating EC, it is clearly of interest to be able to use EC to provide statistically robust flux estimates for larger areas. We ask: How much spatial replication of EC is required for statistical confidence in our flux estimates of an ecosystem? We provide the reader with tools to estimate the number of EC towers needed to achieve a given statistical power. We show that for a typical ecosystem, around four EC towers are needed to have 95% statistical confidence that the annual flux of an ecosystem is nonzero. Furthermore, if the true flux is small relative to instrument noise and spatial variability, the number of towers needed can rise dramatically. We discuss approaches for improving statistical power and describe one solution: an inexpensive EC system that could help by making spatial replication more affordable. However, we note that diverting limited resources from other key measurements in order to allow spatial replication may not be optimal, and a balance needs to be struck. While individual EC towers are well suited to providing fluxes from the flux footprint, we emphasize that spatial replication is essential for statistically robust fluxes if a wider ecosystem is being studied. 
© 2016 The Authors Global Change Biology Published by John Wiley & Sons Ltd.
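The "how many towers" question above reduces to a standard sample-size calculation. Below is a hedged sketch assuming a normal approximation for a two-sided one-sample test of a nonzero mean flux (the authors' own tools may differ in detail); with an effect size of 2 (mean annual flux twice the between-tower standard deviation), it reproduces the roughly four towers quoted in the abstract.

```python
import math
from statistics import NormalDist

def towers_needed(mean_flux, sigma_between, alpha=0.05, power=0.95):
    """Number of eddy covariance tower replicates needed so a
    two-sided one-sample z-test detects a nonzero mean annual flux,
    given the between-tower (spatial + instrument) standard
    deviation. Normal approximation; an illustrative sketch only."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # critical value
    z_b = NormalDist().inv_cdf(power)               # power term
    n = ((z_a + z_b) * sigma_between / mean_flux) ** 2
    return math.ceil(n)

# effect size 2: flux is twice the between-tower standard deviation
n_towers = towers_needed(mean_flux=2.0, sigma_between=1.0)
```

As the abstract warns, when the true flux is small relative to noise and spatial variability the required replication rises quadratically with sigma/mean.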
Measurement and Estimation of Riverbed Scour in a Mountain River
NASA Astrophysics Data System (ADS)
Song, L. A.; Chan, H. C.; Chen, B. A.
2016-12-01
Mountain rivers in Taiwan are steep with rapid flows. After a structure is installed in a mountain river, scour usually occurs around it because of the high energy gradient. Excessive scouring has been reported as one of the main causes of failure of river structures. Flood-related scouring disasters can be reduced if riverbed variation can be properly evaluated from the flow conditions. This study measures riverbed scour using an improved "float-out device". Scouring and hydrodynamic data were simultaneously collected in the Mei River, Nantou County, in central Taiwan. Semi-empirical models proposed by previous researchers were used to estimate scour depths from the measured flow characteristics. The differences between measured and estimated scour depths are discussed. Attempts were then made to improve the estimates by developing a semi-empirical model that predicts riverbed scour from the local field data. The goal is to set up a warning system for river-structure safety based on flow conditions. Keywords: scour, model, float-out device
Arcos-Machancoses, J V; Ruiz Hernández, C; Martin de Carpi, J; Pinillos Pisón, S
2018-02-09
Congenital diaphragmatic hernia survivors are a well-known group at risk of developing gastroesophageal reflux disease that may be particularly severe in the long term. The aim of this study is to provide a systematic review of the prevalence of gastroesophageal reflux in infant and child survivors treated for congenital diaphragmatic hernia. Electronic and manual searches were performed with keywords related to congenital diaphragmatic hernia, gastroesophageal reflux disease, and epidemiology terms. Summary estimates of the prevalence were calculated. The effect model was chosen depending on heterogeneity (I²). Factors potentially related to the prevalence, including study quality and the diagnostic strategy followed, were assessed by subgroup and meta-regression analyses. Risk of publication bias was studied by funnel plot analysis and the Egger test. The search yielded 140 articles, 26 of which were included in the analyses and provided 34 estimates of prevalence: 21 in patients aged 12 months or younger, and 13 in older children. The overall prevalence of gastroesophageal reflux disease in infants was 52.7% (95% confidence interval [CI]: 43.2% to 62.1%, I² = 88.7%) and, in children over 1 year old, 35.1% (95% CI: 25.4% to 45.3%, I² = 73.5%). Significant clinical and statistical heterogeneity was found. The strategy chosen for gastroesophageal reflux diagnosis influenced the reported prevalence. The only estimate obtained with systematic use of multichannel intraluminal impedance showed a higher prevalence in both age groups: 83.3% (95% CI: 67.2% to 93.6%) and 61.1% (95% CI: 43.5% to 76.9%), respectively. This last prevalence did not significantly differ from that obtained using only low risk-of-bias estimates. In conclusion, gastroesophageal reflux disease is commonly observed after congenital diaphragmatic hernia repair and is almost constantly present in the first months of life. It may be underdiagnosed if esophageal monitoring is not performed systematically.
This should be considered when proposing follow-up and management protocols for congenital diaphragmatic hernia survivors. © The Author(s) 2018. Published by Oxford University Press on behalf of International Society for Diseases of the Esophagus. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Combined evaluation of optical and microwave satellite dataset for soil moisture deficit estimation
NASA Astrophysics Data System (ADS)
Srivastava, Prashant K.; Han, Dawei; Islam, Tanvir; Singh, Sudhir Kumar; Gupta, Manika; Gupta, Dileep Kumar; Kumar, Pradeep
2016-04-01
Soil moisture is a key variable responsible for water and energy exchanges from the land surface to the atmosphere (Srivastava et al., 2014). On the other hand, the Soil Moisture Deficit (SMD) can help regulate the proper use of water at specified times to avoid agricultural losses (Srivastava et al., 2013b) and could help in preventing natural disasters, e.g. flood and drought (Srivastava et al., 2013a). In this study, evaluation of Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature (LST) and soil moisture from the Soil Moisture and Ocean Salinity (SMOS) satellite is attempted for prediction of SMD. A sophisticated algorithm, the Adaptive Neuro Fuzzy Inference System (ANFIS), is used for prediction of SMD from the MODIS and SMOS datasets. The benchmark SMD, estimated from the Probability Distributed Model (PDM) over the Brue catchment, Southwest England, U.K., is used for all validation. Performance is assessed in terms of the Nash-Sutcliffe Efficiency, Root Mean Square Error and percentage bias between the ANFIS-simulated SMD and the benchmark. The performance statistics revealed good agreement between the benchmark and the ANFIS-estimated SMD using the MODIS dataset. The assessment of the products against this evidence is an important step for successful development of a hydro-meteorological model and forecasting system. The analysis of the satellite products (viz. SMOS soil moisture and MODIS LST) for SMD prediction is a crucial step for successful hydrological modelling, agriculture and water resource management, and can provide important assistance in policy and decision making. Keywords: Land Surface Temperature, MODIS, SMOS, Soil Moisture Deficit, Fuzzy Logic System References: Srivastava, P.K., Han, D., Ramirez, M.A., Islam, T., 2013a. Appraisal of SMOS soil moisture at a catchment scale in a temperate maritime climate. Journal of Hydrology 498, 292-304.
Srivastava, P.K., Han, D., Rico-Ramirez, M.A., Al-Shrafany, D., Islam, T., 2013b. Data fusion techniques for improving soil moisture deficit using SMOS satellite and WRF-NOAH land surface model. Water Resources Management 27, 5069-5087. Srivastava, P.K., Han, D., Rico-Ramirez, M.A., O'Neill, P., Islam, T., Gupta, M., 2014. Assessment of SMOS soil moisture retrieval parameters using tau-omega algorithms for soil moisture deficit estimation. Journal of Hydrology 519, 574-587.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Craig
It is argued by extrapolation of general relativity and quantum mechanics that a classical inertial frame corresponds to a statistically defined observable that rotationally fluctuates due to Planck scale indeterminacy. Physical effects of exotic nonlocal rotational correlations on large scale field states are estimated. Their entanglement with the strong interaction vacuum is estimated to produce a universal, statistical centrifugal acceleration that resembles the observed cosmological constant.
ERIC Educational Resources Information Center
Poon, Wai-Yin; Wong, Yuen-Kwan
2004-01-01
This study uses a Cook's distance type diagnostic statistic to identify unusual observations in a data set that unduly influence the estimation of a covariance matrix. Similar to many other deletion-type diagnostic statistics, the proposed measure is susceptible to masking or swamping effects in the presence of several unusual observations. In…
Targeted On-Demand Team Performance App Development
2016-10-01
from three sites; 6) Preliminary analysis indicates larger than estimate effect size and study is sufficiently powered for generalizable outcomes...statistical analyses, and examine any resulting qualitative data for trends or connections to statistical outcomes. On Schedule 21 Predictive...What opportunities for
Statistical estimators for monitoring spotted owls in Oregon and Washington in 1987.
Timothy A. Max; Ray A. Souter; Kathleen A. O'Halloran
1990-01-01
Spotted owls (Strix occidentalis) were monitored on 11 National Forests in the Pacific Northwest Region of the USDA Forest Service between March and August of 1987. The basic intent of monitoring was to provide estimates of occupancy and reproduction rates for pairs of spotted owls. This paper documents the technical details of the statistical...
A smoothed residual based goodness-of-fit statistic for nest-survival models
Rodney X. Sturdivant; Jay J. Rotella; Robin E. Russell
2008-01-01
Estimating nest success and identifying important factors related to nest-survival rates is an essential goal for many wildlife researchers interested in understanding avian population dynamics. Advances in statistical methods have led to a number of estimation methods and approaches to modeling this problem. Recently developed models allow researchers to include a...
Fetal Alcohol Spectrum Disorders (FASDs): Data and Statistics
... alcohol screening and counseling for all women. Data & Statistics: Prevalence of ... conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a ...
Toppi, J; Petti, M; Vecchiato, G; Cincotti, F; Salinari, S; Mattia, D; Babiloni, F; Astolfi, L
2013-01-01
Partial Directed Coherence (PDC) is a spectral multivariate estimator of effective connectivity relying on the concept of Granger causality. Although its original definition derived directly from information theory, two modifications were introduced to provide better physiological interpretations of the estimated networks: i) normalization of the estimator by rows; ii) squared transformation. In the present paper we investigated the effect of PDC normalization on the performance achieved by applying the statistical validation process to the investigated connectivity patterns under different conditions of signal-to-noise ratio (SNR) and amount of data available for the analysis. Results of the statistical analysis revealed an effect of PDC normalization only on the percentages of type I and type II errors incurred when using the shuffling procedure for the assessment of connectivity patterns. The PDC formulation had no effect on the performance achieved when the validation process was instead executed by means of the asymptotic statistic approach. Moreover, the percentages of both false positives and false negatives committed by the asymptotic statistic approach are always lower than those of the shuffling procedure for each type of normalization.
Estimating and comparing microbial diversity in the presence of sequencing errors
Chiu, Chun-Huo
2016-01-01
Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. 
This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872
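The Hill number of order q described above has a simple closed form, qD = (Σ p_i^q)^(1/(1-q)), with the q → 1 limit equal to the exponential of Shannon entropy. A plug-in sketch is given below; note that the paper's approach first corrects the frequency counts (e.g. replacing spurious singletons) before any such computation, which is not shown here.

```python
import numpy as np

def hill_number(abundances, q):
    """Plug-in Hill number (effective number of taxa) of order q
    for one community, from raw abundance counts. q = 0 gives taxa
    richness, q = 1 exp(Shannon entropy), q = 2 inverse Simpson."""
    p = np.asarray(abundances, float)
    p = p[p > 0] / p.sum()                 # relative abundances
    if q == 1:
        # limit of the general formula as q -> 1
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

# a perfectly even 4-taxon community has qD = 4 for every order q
profile = [hill_number([10, 10, 10, 10], q) for q in (0, 1, 2)]
```

Plotting qD against q yields the diversity profile the abstract describes; uneven communities show a profile that declines as q puts more weight on common taxa.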
Mittal, Manish; Harrison, Donald L; Thompson, David M; Miller, Michael J; Farmer, Kevin C; Ng, Yu-Tze
2016-01-01
While the choice of analytical approach affects study results and their interpretation, there is no consensus to guide the choice of statistical approaches for evaluating public health policy change. This study compared and contrasted three statistical estimation procedures in assessing the effect of a U.S. Food and Drug Administration (FDA) suicidality warning, communicated in January 2008 and implemented in May 2009, on antiepileptic drug (AED) prescription claims. Longitudinal designs were used to evaluate Oklahoma (U.S. state) Medicaid claims data from January 2006 through December 2009. The study included 9289 continuously eligible individuals with prevalent diagnoses of epilepsy and/or psychiatric disorder. Segmented regression models using three estimation procedures [i.e., generalized linear models (GLM), generalized estimating equations (GEE), and generalized linear mixed models (GLMM)] were used to estimate trends in AED prescription claims across three time periods: before (January 2006-January 2008), during (February 2008-May 2009), and after (June 2009-December 2009) the FDA warning. All three statistical procedures estimated an increasing trend (P < 0.0001) in AED prescription claims before the FDA warning period. No procedure detected a significant change in trend during (GLM: -30.0%, 99% CI: -60.0% to 10.0%; GEE: -20.0%, 99% CI: -70.0% to 30.0%; GLMM: -23.5%, 99% CI: -58.8% to 1.2%) or after (GLM: 50.0%, 99% CI: -70.0% to 160.0%; GEE: 80.0%, 99% CI: -20.0% to 200.0%; GLMM: 47.1%, 99% CI: -41.2% to 135.3%) the FDA warning when compared to the pre-warning period. Although the three procedures provided consistent inferences, the GEE and GLMM approaches accounted appropriately for correlation. Further, marginal models estimated using GEE produced more robust and valid population-level estimates. Copyright © 2016 Elsevier Inc. All rights reserved.
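Segmented (interrupted time series) regression of the kind described above fits a baseline trend plus slope-change terms at each policy time point. A minimal sketch using ordinary least squares on simulated monthly data; the change points, coefficients, and data are invented for illustration, and the study's actual GLM/GEE/GLMM fits are not reproduced here:

```python
import numpy as np

# Hypothetical monthly claim counts over 48 months; warning communicated at
# month 25, implemented at month 41 (invented change points)
t = np.arange(1, 49)
during = (t >= 25).astype(float)
after = (t >= 41).astype(float)

rng = np.random.default_rng(0)
y = (100 + 2.0 * t                     # baseline level and trend
     - 0.5 * during * (t - 24)         # slope change during the warning period
     + 1.0 * after * (t - 40)          # additional slope change after implementation
     + rng.normal(0, 3, t.size))       # noise

# Design matrix: intercept, baseline trend, and the two slope-change terms
X = np.column_stack([np.ones_like(t, dtype=float), t,
                     during * (t - 24), after * (t - 40)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[1] estimates the pre-warning trend; beta[2], beta[3] the trend changes
```

GEE or GLMM versions would add a working correlation structure or random effects on top of this same design matrix.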
NASA Astrophysics Data System (ADS)
Asai, Kikuo; Kondo, Kimio; Kobayashi, Hideaki; Saito, Fumihiko
We developed a prototype system to support telecommunication by using keywords selected by the speaker in a videoconference. In the traditional presentation style, a speaker talks and uses audiovisual materials, and the audience at remote sites looks at these materials. Unfortunately, the audience often loses concentration and attention during the talk. To overcome this problem, we investigate a keyword presentation style, in which the speaker holds keyword cards that enable the audience to see additional information. Although keyword captions were originally intended for use in video materials for learning foreign languages, they can also be used to improve the quality of distance lectures in videoconferences. Our prototype system recognizes printed keywords in a video image at a server, and transfers the data to clients as multimedia functions such as language translation, three-dimensional (3D) model visualization, and audio reproduction. The additional information is collocated to the keyword cards in the display window, thus forming a spatial relationship between them. We conducted an experiment to investigate the properties of the keyword presentation style for an audience. The results suggest the potential of the keyword presentation style for improving the audience's concentration and attention in distance lectures by providing an environment that facilitates eye contact during videoconferencing.
Attribute-Based Proxy Re-Encryption with Keyword Search
Shi, Yanfeng; Liu, Jiqiang; Han, Zhen; Zheng, Qingji; Zhang, Rui; Qiu, Shuo
2014-01-01
Keyword search on encrypted data allows one to issue a search token and conduct search operations on encrypted data while still preserving keyword privacy. In the present paper, we consider the keyword search problem further and introduce a novel notion called attribute-based proxy re-encryption with keyword search (ABRKS), which introduces a promising feature: in addition to supporting keyword search on encrypted data, it enables data owners to delegate the keyword search capability to other data users complying with a specific access control policy. To be specific, ABRKS allows (i) the data owner to outsource his encrypted data to the cloud and then ask the cloud to conduct keyword search on the outsourced encrypted data with a given search token, and (ii) the data owner to delegate keyword search capability to other data users in a fine-grained access control manner, by allowing the cloud to re-encrypt the stored encrypted data with a re-encryption key (embedding some form of access control policy). We formalize the syntax and security definitions for ABRKS, and propose two concrete constructions for ABRKS: key-policy ABRKS and ciphertext-policy ABRKS. In a nutshell, our constructions can be treated as an integration of technologies from the fields of attribute-based cryptography and proxy re-encryption cryptography. PMID:25549257
Estimating liver cancer deaths in Thailand based on verbal autopsy study.
Waeto, Salwa; Pipatjaturon, Nattakit; Tongkumchum, Phattrawan; Choonpradub, Chamnein; Saelim, Rattikan; Makaje, Nifatamah
2014-01-01
Liver cancer mortality is high in Thailand, but the utility of related vital statistics is limited because national vital registration (VR) data under-report specific causes of death. Accurate methodologies and reliable supplementary data are needed to provide sound national vital statistics. This study aimed to model liver cancer deaths based on a verbal autopsy (VA) study conducted in 2005, in order to provide more accurate estimates of liver cancer deaths than those reported. The results were used to estimate the number of liver cancer deaths during 2000-2009. The VA study, carried out in 2005 on a sample of 9,644 deaths from nine provinces, provided reliable information on causes of death by gender, age group, location of death in or outside hospital, and the causes of death recorded in the VR database. Logistic regression was used to model liver cancer deaths against the other variables. The estimated probabilities from the model were applied to liver cancer deaths in the VR database for 2000-2009, yielding more accurate VA-estimated numbers of liver cancer deaths. The model fits the data quite well, with sensitivity 0.64. The confidence intervals from the statistical model provide the estimates and their precisions. The VA-estimated numbers of liver cancer deaths were higher than the corresponding VR counts, with inflation factors of 1.56 for males and 1.64 for females. The statistical methods used in this study can be applied to available mortality data in developing countries where national vital registration data are of low quality and reliable supplementary data are available.
Estimating the deposition of urban atmospheric NO2 to the urban forest in Portland-Vancouver USA
NASA Astrophysics Data System (ADS)
Rao, M.; Gonzalez Abraham, R.; George, L. A.
2016-12-01
Cities are hotspots of atmospheric emissions of reactive nitrogen oxides, including nitrogen dioxide (NO2), a US EPA criteria pollutant that affects both human and environmental health. A fraction of this anthropogenic, atmospheric NO2 is deposited onto the urban forest, potentially mitigating the impact of NO2 on respiratory health within cities. However, the role of the urban forest in removal of atmospheric NO2 through deposition has not been well studied. Here, using an observationally-based statistical model, we first estimate the reduction of NO2 associated with the urban forest in Portland-Vancouver, USA, and the health benefits accruing from this reduction. In order to assess if this statistically observed reduction in NO2 associated with the urban forest is consistent with deposition, we then compare the amount of NO2 removed through deposition to the urban forest as estimated using a 4km CMAQ simulation. We further undertake a sensitivity analysis in CMAQ to estimate the range of NO2 removed as a function of bulk stomatal resistance. We find that NO2 deposition estimated by CMAQ accounts for roughly one-third of the reduction in NO2 shown by the observationally-based statistical model (Figure). Our sensitivity analysis shows that a 3-10 fold increase in the bulk stomatal resistance parameter in CMAQ would align CMAQ-estimated deposition with the statistical model. The reduction of NO2 by the urban forest in the Portland-Vancouver area may yield a health benefit of at least $1.5 million USD annually, providing strong motivation to better understand the mechanism through which the urban forest may be removing air pollutants such as NO2 and thus helping create healthier urban atmospheres. Figure: Comparing the amount of NO2 deposition as estimated by CMAQ and the observationally-based statistical model (LURF). Each point corresponds to a single 4 x 4km CMAQ grid cell.
[Flavouring estimation of quality of grape wines with use of methods of mathematical statistics].
Yakuba, Yu F; Khalaphyan, A A; Temerdashev, Z A; Bessonov, V V; Malinkin, A D
2016-01-01
The formation of an integral estimate of a wine's flavour during tasting is discussed, along with the advantages and disadvantages of the existing procedures. The materials investigated were natural white and red wines from Russian manufacturers, produced using traditional technologies from Vitis vinifera, direct hybrids, and blends, together with experimental wines (more than 300 different samples). The aim of the research was to establish, using methods of mathematical statistics, the correlation between the content of a wine's nonvolatile matter and its tasting quality rating. The contents of organic acids, amino acids and cations in the wines were considered the main factors influencing flavour, since they largely define the beverage's quality. These components were determined in the wine samples by capillary electrophoresis («CAPEL» system). Alongside the analytical quality checking of the wine samples, a representative group of specialists carried out a tasting evaluation of the wines using a 100-point scoring system. The possibility of statistically modelling the correlation of the tasting scores with the analytical data on amino acid and cation content, which reasonably describe the wine's flavour, was examined. Statistical modelling of the correlation between the tasting scores and the content of the major cations (ammonium, potassium, sodium, magnesium, calcium) and free amino acids (proline, threonine, arginine), taking into account their level of influence on flavour and the analytical evaluation within fixed limits of quality conformance, was performed with Statistica. Adequate statistical models have been constructed that can predict the tasting score, that is, determine a wine's quality, from the content of the components forming its flavour properties.
It is emphasized that, along with the aromatic (volatile) substances, nonvolatile matter (mineral substances and amino acids such as proline, threonine and arginine) influences a wine's flavour properties. It has been shown that the nonvolatile components contribute to the organoleptic and flavour quality of wines much as the aromatic volatile substances do, and that they take part in forming the expert evaluation.
NASA Technical Reports Server (NTRS)
Houston, A. G.; Feiveson, A. H.; Chhikara, R. S.; Hsu, E. M. (Principal Investigator)
1979-01-01
A statistical methodology was developed to check the accuracy of the products of the experimental operations throughout crop growth and to determine whether the procedures are adequate to accomplish the desired accuracy and reliability goals. It has allowed the identification and isolation of key problems in wheat area yield estimation, some of which have been corrected and some of which remain to be resolved. The major unresolved problem in accuracy assessment is that of precisely estimating the bias of the LACIE production estimator. Topics covered include: (1) evaluation techniques; (2) variance and bias estimation for the wheat production estimate; (3) the 90/90 evaluation; (4) comparison of the LACIE estimate with reference standards; and (5) first and second order error source investigations.
Improved population estimates through the use of auxiliary information
Johnson, D.H.; Ralph, C.J.; Scott, J.M.
1981-01-01
When estimating the size of a population of birds, the investigator may have, in addition to an estimator based on a statistical sample, information on one of several auxiliary variables, such as: (1) estimates of the population made on previous occasions, (2) measures of habitat variables associated with the size of the population, and (3) estimates of the population sizes of other species that correlate with the species of interest. Although many studies have described the relationships between each of these kinds of data and the population size to be estimated, very little work has been done to improve the estimator by incorporating such auxiliary information. A statistical methodology termed 'empirical Bayes' seems to be appropriate to these situations. The potential that empirical Bayes methodology has for improved estimation of the population size of the Mallard (Anas platyrhynchos) is explored. In the example considered, three empirical Bayes estimators were found to reduce the error by one-fourth to one-half of that of the usual estimator.
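The empirical Bayes idea sketched above, shrinking a direct survey estimate toward a prediction from auxiliary information, can be illustrated with a simple precision-weighted combination. All numbers and names below are hypothetical, not from the Mallard analysis:

```python
import numpy as np

def eb_shrink(y, se2, mu, tau2):
    """Empirical Bayes estimate: shrink each direct survey estimate y_i
    (sampling variance se2_i) toward an auxiliary prediction mu_i, where
    tau2 is the estimated between-site variance of the true values."""
    y, se2, mu = map(np.asarray, (y, se2, mu))
    w = tau2 / (tau2 + se2)          # weight on the direct estimate
    return w * y + (1 - w) * mu      # noisier estimates are shrunk harder

# Hypothetical counts from three survey plots, with habitat-based predictions
y = np.array([120.0, 80.0, 200.0])     # direct population estimates
se2 = np.array([400.0, 100.0, 900.0])  # their sampling variances
mu = np.array([100.0, 90.0, 180.0])    # predictions from auxiliary variables
est = eb_shrink(y, se2, mu, tau2=300.0)
```

In a full empirical Bayes analysis, `tau2` itself would be estimated from the data rather than fixed.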
Explorations in Statistics: the Bootstrap
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2009-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fourth installment of Explorations in Statistics explores the bootstrap. The bootstrap gives us an empirical approach to estimate the theoretical variability among possible values of a sample statistic such as the…
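The bootstrap this installment explores estimates the sampling variability of a statistic by resampling the observed data with replacement. A minimal sketch for the standard error of a sample mean, with data simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=50)   # one observed sample

# Bootstrap: resample with replacement, recompute the statistic each time
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])
se_boot = boot_means.std(ddof=1)                        # bootstrap standard error
se_theory = sample.std(ddof=1) / np.sqrt(sample.size)   # classical formula
```

For the mean the two estimates agree closely; the bootstrap's value is that the same recipe works for statistics with no simple theoretical formula.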
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Joslyn, Cliff A.; Chappell, Alan R.
As semantic datasets grow to be very large and divergent, there is a need to identify and exploit their inherent semantic structure for discovery and optimization. Towards that end, we present here a novel methodology to identify the semantic structures inherent in an arbitrary semantic graph dataset. We first present the concept of an extant ontology as a statistical description of the semantic relations present amongst the typed entities modeled in the graph. This serves as a model of the underlying semantic structure to aid in discovery and visualization. We then describe a method of ontological scaling in which the ontology is employed as a hierarchical scaling filter to infer different resolution levels at which the graph structures are to be viewed or analyzed. We illustrate these methods on three large and publicly available semantic datasets containing more than one billion edges each. Keywords: Semantic Web; Visualization; Ontology; Multi-resolution Data Mining
Improvement of short-term numerical wind predictions
NASA Astrophysics Data System (ADS)
Bedard, Joel
Geophysic Model Output Statistics (GMOS) are developed to optimize the use of NWP for complex sites. GMOS differs from the other MOS widely used by meteorological centers in the following aspects: it takes into account surrounding geophysical parameters such as surface roughness and terrain height, along with wind direction, and it can be applied directly without any training, although training will further improve the results. GMOS was applied to improve the Environment Canada GEM-LAM 2.5 km forecasts at North Cape (PEI, Canada): it improves the prediction RMSE by 25-30% for all time horizons and almost all meteorological conditions; the topographic signature of the forecast error due to insufficient grid refinement is eliminated; and the NWP combined with GMOS outperforms persistence from a 2 h horizon, instead of 4 h without GMOS. Finally, GMOS was applied at another site (Bouctouche, NB, Canada): similar improvements were observed, showing its general applicability. Keywords: wind energy, wind power forecast, numerical weather prediction, complex sites, model output statistics
Stability of INFIT and OUTFIT Compared to Simulated Estimates in Applied Setting.
Hodge, Kari J; Morgan, Grant B
Residual-based fit statistics are commonly used as an indication of the extent to which item response data fit the Rasch model. Fit statistic estimates are influenced by sample size, and rule-of-thumb critical values may lead to incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items' INFIT distributions using this 95% confidence-like interval, an 18 percentage point difference in items classified as acceptable. Forty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule-of-thumb range, whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items' OUTFIT distributions, a 13 percentage point difference in items classified as acceptable. When using the rule-of-thumb ranges for fit estimates, the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that using confidence intervals as critical values for fit statistics leads to different model-data fit conclusions than traditional rule-of-thumb critical values.
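For reference, the INFIT and OUTFIT mean-square statistics discussed above are, for dichotomous Rasch data, the information-weighted and unweighted means of squared standardized residuals. A sketch of the computation; the simulated probabilities below stand in for a fitted Rasch model, and this is not the authors' code:

```python
import numpy as np

def infit_outfit(X, P):
    """Item INFIT/OUTFIT mean-square statistics for dichotomous data.
    X: 0/1 responses (persons x items); P: model-expected probabilities."""
    W = P * (1 - P)                 # model variance of each response
    R2 = (X - P) ** 2               # squared score residuals
    outfit = (R2 / W).mean(axis=0)              # unweighted mean-square
    infit = R2.sum(axis=0) / W.sum(axis=0)      # information-weighted mean-square
    return infit, outfit

# Data generated exactly from the model probabilities should fit (values near 1.0)
rng = np.random.default_rng(1)
P = rng.uniform(0.2, 0.8, size=(500, 5))        # 500 persons, 5 items
X = (rng.uniform(size=P.shape) < P).astype(float)
infit, outfit = infit_outfit(X, P)
```

Repeating the simulation many times for each item, as the study does, yields the reference distributions from which the 2.5th and 97.5th percentiles are taken.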
Riley, Richard D.
2017-01-01
An important question for clinicians appraising a meta‐analysis is: are the findings likely to be valid in their own practice—does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity—where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple (‘leave‐one‐out’) cross‐validation technique, we demonstrate how we may test meta‐analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta‐analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta‐analysis and a tailored meta‐regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within‐study variance, between‐study variance, study sample size, and the number of studies in the meta‐analysis. Finally, we apply Vn to two published meta‐analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta‐analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28620945
High throughput nonparametric probability density estimation.
Farmer, Jenny; Jacobs, Donald
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.
NASA Technical Reports Server (NTRS)
Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)
2000-01-01
Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates due to changes in rain statistics arising 1) from evolution of the official algorithms used to process the data and 2) from differences from other remote sensing systems, such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.
ERIC Educational Resources Information Center
Yang, Le
2016-01-01
This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…
The Keyword Method of Vocabulary Acquisition: An Experimental Evaluation.
ERIC Educational Resources Information Center
Griffith, Douglas
The keyword method of vocabulary acquisition is a two-step mnemonic technique for learning vocabulary terms. The first step, the acoustic link, generates a keyword based on the sound of the foreign word. The second step, the imagery link, ties the keyword to the meaning of the item to be learned, via an interactive visual image or other…
An empirical analysis of the distribution of overshoots in a stationary Gaussian stochastic process
NASA Technical Reports Server (NTRS)
Carter, M. C.; Madison, M. W.
1973-01-01
The frequency distribution of overshoots in a stationary Gaussian stochastic process is analyzed. The primary tools in this analysis are computer simulation and statistical estimation. Computer simulation is used to generate stationary Gaussian stochastic processes with selected autocorrelation functions. An analysis of the simulation results reveals a frequency distribution for overshoots with a functional dependence on the mean and variance of the process. Statistical estimation is then used to estimate the mean and variance of a process. It is shown that, given an autocorrelation function and estimates of the mean and variance of the process, a frequency distribution for the number of overshoots can be estimated.
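The simulation half of this procedure, generating a stationary Gaussian process with a chosen autocorrelation and tallying overshoots of a level, can be sketched with a first-order autoregressive process. The AR(1) choice, level, and parameters here are illustrative, not those of the report:

```python
import numpy as np

def count_upcrossings(x, level):
    """Number of upcrossings of `level` by the discrete sample path x."""
    above = x > level
    return int(np.sum(~above[:-1] & above[1:]))

# Stationary Gaussian AR(1) process with lag-1 autocorrelation phi
rng = np.random.default_rng(7)
phi, n = 0.9, 50_000
eps = rng.normal(0.0, np.sqrt(1 - phi ** 2), size=n)  # scaled so marginal variance is 1
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = phi * x[i - 1] + eps[i]

crossings = count_upcrossings(x, level=1.0)
rate = crossings / n   # empirical overshoot (upcrossing) rate per step
```

Repeating this over a range of autocorrelation parameters and levels gives the empirical frequency distribution of overshoots described in the abstract.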
Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.
2015-01-01
Objective: Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method: Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. Results: All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions: Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830
Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J
2015-03-01
Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.
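One of the robust alternatives compared above, the Winsorized correlation, replaces the most extreme fraction of each variable with the nearest retained quantile before computing Pearson's r. An illustrative sketch; the data, trim level, and function name are invented, and this is not the authors' implementation:

```python
import numpy as np

def winsorized_corr(x, y, gamma=0.2):
    """Winsorized correlation: clamp the lowest/highest gamma fraction of each
    variable to the corresponding quantile, then take the Pearson correlation."""
    def winsorize(v):
        lo, hi = np.quantile(v, [gamma, 1 - gamma])
        return np.clip(v, lo, hi)
    xw = winsorize(np.asarray(x, dtype=float))
    yw = winsorize(np.asarray(y, dtype=float))
    return float(np.corrcoef(xw, yw)[0, 1])

# A single gross outlier wrecks Pearson r but barely moves the Winsorized estimate
rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = x + rng.normal(scale=0.5, size=50)                   # strong positive relation
x_out, y_out = np.append(x, 10.0), np.append(y, -10.0)   # one gross outlier
r_pearson = float(np.corrcoef(x_out, y_out)[0, 1])
r_wins = winsorized_corr(x_out, y_out)
```

The percentage bend and skipped correlations in the study follow the same spirit but downweight or remove outliers by different rules.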
An open source framework for tracking and state estimation ('Stone Soup')
NASA Astrophysics Data System (ADS)
Thomas, Paul A.; Barr, Jordi; Balaji, Bhashyam; White, Kruger
2017-05-01
The ability to detect and unambiguously follow all moving entities in a state-space is important in multiple domains both in defence (e.g. air surveillance, maritime situational awareness, ground moving target indication) and the civil sphere (e.g. astronomy, biology, epidemiology, dispersion modelling). However, tracking and state estimation researchers and practitioners have difficulties recreating state-of-the-art algorithms in order to benchmark their own work. Furthermore, system developers need to assess which algorithms meet operational requirements objectively and exhaustively rather than intuitively or driven by personal favourites. We have therefore commenced the development of a collaborative initiative to create an open source framework for production, demonstration and evaluation of Tracking and State Estimation algorithms. The initiative will develop a (MIT-licensed) software platform for researchers and practitioners to test, verify and benchmark a variety of multi-sensor and multi-object state estimation algorithms. The initiative is supported by four defence laboratories, who will contribute to the development effort for the framework. The tracking and state estimation community will derive significant benefits from this work, including: access to repositories of verified and validated tracking and state estimation algorithms, a framework for the evaluation of multiple algorithms, standardisation of interfaces and access to challenging data sets. Keywords: Tracking,
Analysis of ground-water data for selected wells near Holloman Air Force Base, New Mexico, 1950-95
Huff, G.F.
1996-01-01
Ground-water-level, ground-water-withdrawal, and ground-water-quality data were evaluated for trends. Holloman Air Force Base is located in the west-central part of Otero County, New Mexico. Ground-water-data analyses include assembly and inspection of U.S. Geological Survey and Holloman Air Force Base data, including ground-water-level data for public-supply and observation wells and withdrawal and water-quality data for public-supply wells in the area. Well Douglas 4 shows a statistically significant decreasing trend in water levels for 1972-86 and a statistically significant increasing trend in water levels for 1986-90. Water levels in wells San Andres 5 and San Andres 6 show statistically significant decreasing trends for 1972-93 and 1981-89, respectively. A mixture of statistically significant increasing trends, statistically significant decreasing trends, and lack of statistically significant trends over periods ranging from the early 1970's to the early 1990's are indicated for the Boles wells and wells near the Boles wells. Well Boles 5 shows a statistically significant increasing trend in water levels for 1981-90. Well Boles 5 and well 17S.09E.25.343 show no statistically significant trends in water levels for 1990-93 and 1988-93, respectively. For 1986-93, well Frenchy 1 shows a statistically significant decreasing trend in water levels. Ground-water withdrawal from the San Andres and Douglas wells regularly exceeded estimated ground-water recharge from San Andres Canyon for 1963-87. For 1951-57 and 1960-86, ground-water withdrawal from the Boles wells regularly exceeded total estimated ground-water recharge from Mule, Arrow, and Lead Canyons. Ground-water withdrawal from the San Andres and Douglas wells and from the Boles wells nearly equaled estimated ground-water recharge for 1989-93 and 1986-93, respectively.
For 1987-93, ground-water withdrawal from the Escondido well regularly exceeded estimated ground-water recharge from Escondido Canyon, and ground-water withdrawal from the Frenchy wells regularly exceeded total estimated ground-water recharge from Dog and Deadman Canyons. Water-quality samples were collected from selected Douglas, San Andres, and Boles public-supply wells from December 1994 to February 1995. Concentrations of dissolved nitrate show the most consistent increases between current and historical data. Current concentrations of dissolved nitrate are greater than historical concentrations in 7 of 10 wells.
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
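The bootstrap optimism correction evaluated in this paper can be sketched without any modeling library: fit on each bootstrap resample, score on both the resample and the original data, and subtract the mean optimism from the apparent C statistic. The least-squares linear score below is a stand-in for a clinical prediction model, and the data are invented; both are assumptions for illustration:

```python
import numpy as np

def auc(y, score):
    """C statistic via the rank (Mann-Whitney) formulation."""
    ranks = np.argsort(np.argsort(score)) + 1.0
    n1 = y.sum(); n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def fit_score(X_train, y_train, X_eval):
    """Least-squares linear score: a stand-in for the screening model."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_eval)), X_eval]) @ beta

def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Apparent C statistic minus the mean bootstrap optimism
    (performance on the resample minus performance on the original)."""
    rng = np.random.default_rng(seed)
    apparent = auc(y, fit_score(X, y, X))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # bootstrap resample
        if y[idx].min() == y[idx].max():
            continue                           # need both outcome classes
        optimism.append(auc(y[idx], fit_score(X[idx], y[idx], X[idx]))
                        - auc(y, fit_score(X[idx], y[idx], X)))
    return apparent - np.mean(optimism), apparent

# small sample with many predictors: a setting prone to optimism
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = (X[:, 0] + rng.normal(size=60) > 0).astype(float)
corrected, apparent = optimism_corrected_auc(X, y)
```

With eight predictors and sixty observations the apparent C statistic is inflated by overfitting, and the corrected estimate lands noticeably lower.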
Statistical Techniques for Signal Processing
1993-01-12
functions and extended influence functions of the associated underlying estimators. An interesting application of the influence function and its...and related filter structures. While the influence function is best known for its role in characterizing the robustness of estimators, the mathematical...statistics can be designed and analyzed for performance using the influence function as a tool. In particular, we have examined the mean-median
A statistical approach to estimate O3 uptake of ponderosa pine in a mediterranean climate
N.E. Grulke; H.K. Preisler; C.C. Fan; W.A. Retzlaff
2002-01-01
In highly polluted sites, stomatal behavior is sluggish with respect to light, vapor pressure deficit, and internal CO2 concentration (Ci) and poorly described by existing models. Statistical models were developed to estimate stomatal conductance (gs) of 40-year-old ponderosa pine at three sites differing in pollutant exposure for the purpose of...
Adding a Parameter Increases the Variance of an Estimated Regression Function
ERIC Educational Resources Information Center
Withers, Christopher S.; Nadarajah, Saralees
2011-01-01
The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
ERIC Educational Resources Information Center
Schochet, Peter Z.
2015-01-01
This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…
Analysis of half diallel mating designs I: a practical analysis procedure for ANOVA approximation.
G.R. Johnson; J.N. King
1998-01-01
Procedures to analyze half-diallel mating designs using the SAS statistical package are presented. The procedure requires two runs of PROC VARCOMP and results in estimates of additive and non-additive genetic variation. The procedures described can be modified to work on most statistical software packages that can compute variance component estimates. The...
NASA Astrophysics Data System (ADS)
Karakatsanis, L. P.; Iliopoulos, A. C.; Pavlos, E. G.; Pavlos, G. P.
2018-02-01
In this paper, we perform statistical analysis of time series derived from Earth's climate. The time series concern Geopotential Height (GH) and correspond to temporal and spatial components of the global distribution of monthly average values during the period 1948-2012. The analysis is based on Tsallis non-extensive statistical mechanics, in particular on the estimation of Tsallis' q-triplet, namely {qstat, qsens, qrel}, the reconstructed phase space, and the estimation of the correlation dimension and the Hurst exponent from rescaled range (R/S) analysis. The deviation of the Tsallis q-triplet from unity indicates a non-Gaussian (Tsallis q-Gaussian), non-extensive character with heavy-tailed probability density functions (PDFs), multifractal behavior, and long-range dependence for all time series considered. Noticeable differences in the q-triplet estimates were also found between time series at distinct spatial or temporal regions. Moreover, the reconstructed phase space revealed a lower-dimensional fractal set in the GH dynamical phase space (strong self-organization), and the estimation of the Hurst exponent indicated multifractality, non-Gaussianity, and persistence. The analysis provides significant information for identifying and characterizing the dynamical characteristics of Earth's climate.
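The rescaled-range (R/S) estimate of the Hurst exponent mentioned above can be sketched in its simplified textbook form; the doubling window sizes and the test signals here are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def hurst_rs(x, min_window=8):
    """Hurst exponent via rescaled-range (R/S) analysis: the slope of
    log(mean R/S) against log(window size)."""
    x = np.asarray(x, float)
    n = len(x)
    sizes, rs_vals = [], []
    size = min_window
    while size <= n // 2:
        ratios = []
        for start in range(0, n - size + 1, size):
            seg = x[start:start + size]
            dev = np.cumsum(seg - seg.mean())  # cumulative deviation
            r = dev.max() - dev.min()          # range
            s = seg.std()                      # scale
            if s > 0:
                ratios.append(r / s)
        sizes.append(size)
        rs_vals.append(np.mean(ratios))
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_vals), 1)
    return slope

rng = np.random.default_rng(0)
h_noise = hurst_rs(rng.normal(size=4096))            # uncorrelated noise
h_walk = hurst_rs(np.cumsum(rng.normal(size=4096)))  # strongly persistent walk
```

Uncorrelated noise yields an exponent near 0.5, while a persistent signal such as a random walk pushes the estimate toward 1, which is the sense in which the abstract reads persistence from the Hurst exponent.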
Estimating Selected Streamflow Statistics Representative of 1930-2002 in West Virginia
Wiley, Jeffrey B.
2008-01-01
Regional equations and procedures were developed for estimating 1-, 3-, 7-, 14-, and 30-day 2-year; 1-, 3-, 7-, 14-, and 30-day 5-year; and 1-, 3-, 7-, 14-, and 30-day 10-year hydrologically based low-flow frequency values for unregulated streams in West Virginia. Regional equations and procedures also were developed for estimating the 1-day, 3-year and 4-day, 3-year biologically based low-flow frequency values; the U.S. Environmental Protection Agency harmonic-mean flows; and the 10-, 25-, 50-, 75-, and 90-percent flow-duration values. Regional equations were developed using ordinary least-squares regression using statistics from 117 U.S. Geological Survey continuous streamflow-gaging stations as dependent variables and basin characteristics as independent variables. Equations for three regions in West Virginia - North, South-Central, and Eastern Panhandle - were determined. Drainage area, precipitation, and longitude of the basin centroid are significant independent variables in one or more of the equations. Estimating procedures are presented for determining statistics at a gaging station, a partial-record station, and an ungaged location. Examples of some estimating procedures are presented.
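Regional equations of this kind are usually power laws fit by ordinary least squares in log space. Here is a hedged sketch; the drainage areas, flows, and resulting coefficients below are entirely hypothetical, not values from the report:

```python
import numpy as np

# hypothetical sites: drainage area (square miles) and
# 7-day, 10-year low flow (cubic feet per second)
area = np.array([12.0, 45.0, 88.0, 150.0, 310.0, 520.0])
q7_10 = np.array([0.8, 3.5, 7.2, 13.0, 29.0, 50.0])

# ordinary least squares in log10 space: log Q = b0 + b1 * log A
X = np.column_stack([np.ones(len(area)), np.log10(area)])
b, *_ = np.linalg.lstsq(X, np.log10(q7_10), rcond=None)

def estimate_q(drainage_area):
    """Q = 10**b0 * A**b1, the power-law form of a regional equation."""
    return 10 ** (b[0] + b[1] * np.log10(drainage_area))
```

In the report's actual equations, additional basin characteristics such as precipitation and the longitude of the basin centroid enter as further regression terms of the same form.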
Louwerse, Max M; Benesh, Nick
2012-01-01
Spatial mental representations can be derived from linguistic and non-linguistic sources of information. This study tested whether these representations could be formed from statistical linguistic frequencies of city names, and to what extent participants differed in their performance when they estimated spatial locations from language or maps. In a computational linguistic study, we demonstrated that co-occurrences of cities in Tolkien's Lord of the Rings trilogy and The Hobbit predicted the authentic longitude and latitude of those cities in Middle Earth. In a human study, we showed that human spatial estimates of the location of cities were very similar regardless of whether participants read Tolkien's texts or memorized a map of Middle Earth. However, text-based location estimates obtained from statistical linguistic frequencies better predicted the human text-based estimates than the human map-based estimates. These findings suggest that language encodes spatial structure of cities, and that human cognitive map representations can come from implicit statistical linguistic patterns, from explicit non-linguistic perceptual information, or from both. Copyright © 2012 Cognitive Science Society, Inc.
NASA Astrophysics Data System (ADS)
Popova, Olga H.
Dental hygiene students must embody effective critical thinking skills in order to provide evidence-based comprehensive patient care. The problem addressed in this study was that it was not known if, and to what extent, concept mapping and reflective journaling activities embedded in a curriculum over a 4-week period impacted the critical thinking skills of 22 first- and second-year dental hygiene students attending a community college in the Midwest. The overarching research questions were: what is the effect of concept mapping, and what is the effect of reflective journaling, on the level of critical thinking skills of first- and second-year dental hygiene students? This quantitative study employed a quasi-experimental, pretest-posttest design. Analysis of Covariance (ANCOVA) assessed students' mean scores of critical thinking on the California Critical Thinking Skills Test (CCTST) pretest and posttest for the concept mapping and reflective journaling treatment groups. The results of the study found an increase in CCTST posttest scores with the use of both concept mapping and reflective journaling. However, the increase in scores was not found to be statistically significant. Hence, this study identified concept mapping using Ausubel's assimilation theory and reflective journaling incorporating Johns's revision of Carper's patterns of knowing as potential instructional strategies and theoretical models to enhance undergraduate students' critical thinking skills. More research is required in this area to draw further conclusions. Keywords: Critical thinking, critical thinking development, critical thinking skills, instructional strategies, concept mapping, reflective journaling, dental hygiene, college students.
Changes in newspaper coverage of mental illness from 2008 to 2014 in England.
Rhydderch, D; Krooupa, A-M; Shefer, G; Goulden, R; Williams, P; Thornicroft, A; Rose, D; Thornicroft, G; Henderson, C
2016-08-01
This study evaluates English newspaper coverage of mental health topics between 2008 and 2014 to provide context for the concomitant improvement in public attitudes and seek evidence for changes in coverage. Articles in 27 newspapers were retrieved using keyword searches on two randomly chosen days each month in 2008-2014, excluding 2012 due to restricted resources. Content analysis used a structured coding framework. Univariate logistic regression models were used to estimate the odds of each hypothesised element occurring each year compared to 2008. There was a substantial increase in the number of articles covering mental health between 2008 and 2014. We found an increase in the proportion of antistigmatising articles which approached significance at P < 0.05 (OR = 1.21, P = 0.056). The decrease in stigmatising articles was not statistically significant (OR = 0.90, P = 0.312). There was a significant decrease in the proportion of articles featuring the stigmatising elements 'danger to others' and 'personal responsibility', and an increase in 'hopeless victim'. There was a significant proportionate increase in articles featuring the antistigmatising elements 'injustice' and 'stigma', but a decrease in 'sympathetic portrayal of people with mental illness'. We found a decrease in articles promoting ideas about dangerousness or mental illness being self-inflicted, but an increase in articles portraying people as incapable. Yet, these findings were not consistent over time. © 2016 The Authors. Acta Psychiatrica Scandinavica Published by John Wiley & Sons Ltd.
Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray
2004-01-01
One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
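The TFIDF weighting that performed best in this study has the standard form tf × log(N/df). A minimal sketch on toy keyword lists (the gene keyword lists below are invented for illustration, not extracted from MEDLINE):

```python
import math
from collections import Counter

def tfidf(docs):
    """TFIDF weights: term frequency times log(N / document frequency).
    docs is a list of keyword lists, one per gene."""
    n = len(docs)
    df = Counter(kw for doc in docs for kw in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({kw: (count / len(doc)) * math.log(n / df[kw])
                        for kw, count in tf.items()})
    return weights

# invented keyword lists for three genes
genes = [
    ["kinase", "phosphorylation", "signaling"],
    ["kinase", "cell-cycle", "phosphorylation"],
    ["ribosome", "translation", "rna"],
]
w = tfidf(genes)
```

Keywords shared across many genes are down-weighted by the log(N/df) factor, while keywords specific to one gene receive the highest weights; this discrimination is what drives the higher cluster purity reported for TFIDF.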
ERIC Educational Resources Information Center
Klein, Gary M.
1994-01-01
Online public access catalogs from 67 libraries using NOTIS software were searched using Internet connections to determine the positional operators selected as the default keyword operator on each catalog. Results indicate the lack of a processing standard for keyword searches. Five tables provide information. (Author/AEF)
NASA Astrophysics Data System (ADS)
Saini, K. K.; Sehgal, R. K.; Sethi, B. L.
2008-10-01
In this paper, major reliability estimators are analyzed and their comparative results are discussed. Their strengths and weaknesses are evaluated in this case study. Each of the reliability estimators has certain advantages and disadvantages. Inter-rater reliability is one of the best ways to estimate reliability when your measure is an observation. However, it requires multiple raters or observers. As an alternative, you could look at the correlation of ratings of the same single observer repeated on two different occasions. Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater reliability estimates will be lower in value than the parallel-forms and internal-consistency ones because they involve measuring at different times or with different raters. These differences matter because reliability estimates are often used in statistical analyses of quasi-experimental designs.
Robust functional statistics applied to Probability Density Function shape screening of sEMG data.
Boudaoud, S; Rix, H; Al Harrach, M; Marin, F
2014-01-01
Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data in several contexts, such as fatigue and increasing muscle force. Following this idea, criteria have been proposed to monitor these shape modifications, mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample sizes in the estimation process. Small sample sizes induce errors in the estimated HOS parameters, limiting real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from the CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine kernel density estimation and PDF shape distances to evaluate shape modifications even in the presence of small sample sizes. The proposed statistics are then tested, using Monte Carlo simulations, on both normal and log-normal PDFs that mimic the observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics appear more robust than HOS parameters to small-sample-size effects and more accurate in sEMG PDF shape screening applications.
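The small-sample instability of HOS parameters that motivates these functional statistics can be seen directly in simulation. The sketch below only illustrates that instability for sample skewness on Gaussian data (the sample sizes and replicate counts are arbitrary choices); it is not the CSM-based statistic itself:

```python
import numpy as np
from scipy import stats

def skewness_spread(n, reps=400, seed=0):
    """Standard deviation of the sample-skewness estimate across
    Monte Carlo replicates of Gaussian data of size n."""
    rng = np.random.default_rng(seed)
    return np.std([stats.skew(rng.normal(size=n)) for _ in range(reps)])

spread_small = skewness_spread(30)    # sEMG-like small sample
spread_large = skewness_spread(3000)
```

For Gaussian data the standard error of sample skewness is roughly sqrt(6/n), so the small-sample estimate is an order of magnitude noisier, which is exactly the estimation error the abstract says restrains precise PDF shape monitoring.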
Central On-Line Data Directory
NASA Technical Reports Server (NTRS)
Thieman, J.
1986-01-01
The National Space Science Data Center (NSSDC) Central On-Line Data Directory (CODD), which allows the general scientist remote access to information about data sets available not only at NSSDC, but throughout the scientific community, is discussed. A user may search for data set information within CODD by specifying spacecraft name, experiment name, investigator name, and/or keywords. CODD will include information on atmospheric science data sets contained not only within the PCDS, but also within other data sets that are deemed important. Keywords to be used in locating these data sets are currently being formulated. The main type of keyword to be used for categorization of data sets will be discipline related. The primary discipline keyword for PCDS-type data sets would be ATMOSPHERIC SCIENCE. A good set of subdiscipline keywords is needed under this discipline to subdivide the data sets. A sheet containing a strawman set of subdiscipline keywords was distributed, and a request was made for the knowledgeable scientists to modify or replace the proposed keywords.
NASA Technical Reports Server (NTRS)
Currit, P. A.
1983-01-01
The Cleanroom software development methodology is designed to take the gamble out of product releases for both suppliers and receivers of the software. The ingredients of this procedure are a life cycle of executable product increments, representative statistical testing, and a standard estimate of the MTTF (Mean Time To Failure) of the product at the time of its release. A statistical approach to software product testing using randomly selected samples of test cases is considered. A statistical model is defined for the certification process which uses the timing data recorded during test. A reasonableness argument for this model is provided that uses previously published data on software product execution. Also included is a derivation of the certification model estimators and a comparison of the proposed least squares technique with the more commonly used maximum likelihood estimators.
Martin, Gary R.; Fowler, Kathleen K.; Arihood, Leslie D.
2016-09-06
Information on low-flow characteristics of streams is essential for the management of water resources. This report provides equations for estimating the 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years and the harmonic-mean flow at ungaged, unregulated stream sites in Indiana. These equations were developed using the low-flow statistics and basin characteristics for 108 continuous-record streamgages in Indiana with at least 10 years of daily mean streamflow data through the 2011 climate year (April 1 through March 31). The equations were developed in cooperation with the Indiana Department of Environmental Management. Regression techniques were used to develop the equations for estimating low-flow frequency statistics and the harmonic-mean flows on the basis of drainage-basin characteristics. A geographic information system was used to measure basin characteristics for selected streamgages. A final set of 25 basin characteristics measured at all the streamgages was evaluated to choose the best predictors of the low-flow statistics. Logistic-regression equations applicable statewide are presented for estimating the probability that selected low-flow frequency statistics equal zero. These equations use the explanatory variables total drainage area, average transmissivity of the full thickness of the unconsolidated deposits within 1,000 feet of the stream network, and latitude of the basin outlet. The percentage of the streamgage low-flow statistics correctly classified as zero or nonzero using the logistic-regression equations ranged from 86.1 to 88.9 percent. Generalized-least-squares regression equations applicable statewide for estimating nonzero low-flow frequency statistics use total drainage area, the average hydraulic conductivity of the top 70 feet of unconsolidated deposits, the slope of the basin, and the index of permeability and thickness of the Quaternary surficial sediments as explanatory variables.
The average standard error of prediction of these regression equations ranges from 55.7 to 61.5 percent. Regional weighted-least-squares regression equations were developed for estimating the harmonic-mean flows by dividing the State into three low-flow regions. The Northern region uses total drainage area and the average transmissivity of the entire thickness of unconsolidated deposits as explanatory variables. The Central region uses total drainage area, the average hydraulic conductivity of the entire thickness of unconsolidated deposits, and the index of permeability and thickness of the Quaternary surficial sediments. The Southern region uses total drainage area and the percent of the basin covered by forest. The average standard error of prediction for these equations ranges from 39.3 to 66.7 percent. The regional regression equations are applicable only to stream sites with low flows unaffected by regulation and to stream sites with drainage basin characteristic values within specified limits. Caution is advised when applying the equations for basins with characteristics near the applicable limits, for basins with karst drainage features, and for urbanized basins. Extrapolations near and beyond the applicable basin characteristic limits will have unknown errors that may be large. Equations are presented for use in estimating the 90-percent prediction interval of the low-flow statistics estimated by use of the regression equations at a given stream site. The regression equations are to be incorporated into the U.S. Geological Survey StreamStats Web-based application for Indiana. StreamStats allows users to select a stream site on a map and automatically measure the needed basin characteristics and compute the estimated low-flow statistics and associated prediction intervals.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bao, Rong; Li, Yongdong; Liu, Chunliang
2016-07-15
The output power fluctuations caused by the weights of macro particles used in particle-in-cell (PIC) simulations of a backward wave oscillator and a travelling wave tube are statistically analyzed. It is found that the velocities of electrons passing a specific slow-wave structure form a specific electron velocity distribution. The electron velocity distribution obtained in a PIC simulation with a relatively small macro-particle weight is taken as an initial distribution. By analyzing this initial distribution with a statistical method, estimates of the output power fluctuations caused by different macro-particle weights are obtained. The statistical method is verified by comparing the estimates with the simulation results. The fluctuations become stronger with increasing macro-particle weight, which can also be determined in reverse from estimates of the output power fluctuations. With the macro-particle weights optimized by the statistical method, the output power fluctuations in PIC simulations are relatively small and acceptable.
The new statistics: why and how.
Cumming, Geoff
2014-01-01
We need to make substantial changes to how we conduct research. First, in response to heightened concern that our published research literature is incomplete and untrustworthy, we need new requirements to ensure research integrity. These include prespecification of studies whenever possible, avoidance of selection and other inappropriate data-analytic practices, complete reporting, and encouragement of replication. Second, in response to renewed recognition of the severe flaws of null-hypothesis significance testing (NHST), we need to shift from reliance on NHST to estimation and other preferred techniques. The new statistics refers to recommended practices, including estimation based on effect sizes, confidence intervals, and meta-analysis. The techniques are not new, but adopting them widely would be new for many researchers, as well as highly beneficial. This article explains why the new statistics are important and offers guidance for their use. It describes an eight-step new-statistics strategy for research with integrity, which starts with formulation of research questions in estimation terms, has no place for NHST, and is aimed at building a cumulative quantitative discipline.
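A small example of the estimation thinking the article advocates: report an effect size with a confidence interval rather than only a significance verdict. The Welch-style interval and the toy groups below are illustrative assumptions, not material from the article:

```python
import numpy as np
from scipy import stats

def mean_diff_ci(a, b, conf=0.95):
    """Difference in means with a Welch confidence interval: the kind of
    estimate the 'new statistics' recommends reporting instead of a bare
    significant/non-significant decision."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    diff = a.mean() - b.mean()
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    dof = se ** 4 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    t = stats.t.ppf(0.5 + conf / 2, dof)
    return diff, (diff - t * se, diff + t * se)

rng = np.random.default_rng(0)
treatment = rng.normal(loc=0.5, size=40)  # invented groups
control = rng.normal(loc=0.0, size=40)
diff, (lo, hi) = mean_diff_ci(treatment, control)
```

The interval conveys both the size of the effect and the precision with which it is estimated, which is the information a meta-analysis later needs.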
Median statistics estimates of Hubble and Newton's constants
NASA Astrophysics Data System (ADS)
Bethapudi, Suryarao; Desai, Shantanu
2017-02-01
Robustness of any statistic depends upon the number of assumptions it makes about the measured data. We point out the advantages of median statistics using toy numerical experiments and demonstrate its robustness when the number of assumptions we can make about the data is limited. We then apply the median statistics technique to obtain estimates of two constants of nature, the Hubble constant (H0) and Newton's gravitational constant (G), both of which show significant differences between different measurements. For H0, we update the analyses done by Chen and Ratra (2011) and Gott et al. (2001) using 576 measurements. We find that, after grouping the different results according to their primary type of measurement, the median estimates are H0 = 72.5^{+2.5}_{-8} km/sec/Mpc, with errors corresponding to 95% c.l. (2σ), and G = 6.674702^{+0.0014}_{-0.0009} × 10^{-11} N m^2 kg^-2, corresponding to 68% c.l. (1σ).
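The median-statistics idea, a central value whose confidence range follows from binomial order statistics alone, with no assumption about the error distribution, can be sketched as follows. The H0 values listed are illustrative placeholders, not the paper's compilation of 576 measurements:

```python
import numpy as np
from scipy import stats

def median_ci(values, conf=0.95):
    """Median with a distribution-free confidence interval: under the
    null that each measurement independently falls above or below the
    true median with probability 1/2, the bounds are binomial order
    statistics (a conservative textbook construction)."""
    x = np.sort(np.asarray(values, float))
    n = len(x)
    k = int(stats.binom.ppf((1 - conf) / 2, n, 0.5))
    return np.median(x), (x[max(k - 1, 0)], x[min(n - k, n - 1)])

# illustrative H0 measurements (km/s/Mpc), not the paper's data
h0 = [67.4, 68.0, 69.0, 69.8, 70.0, 71.1, 72.5, 73.0, 73.2, 74.0]
med, (lo, hi) = median_ci(h0)
```

No Gaussianity or error-bar information enters anywhere, which is precisely the robustness the abstract emphasizes.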
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model
NASA Astrophysics Data System (ADS)
Al Sobhi, Mashail M.
2015-02-01
Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
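The MCMC machinery behind such Bayes estimates can be illustrated with a plain Metropolis sampler. The model below, a standard Weibull shape parameter with fixed scale, flat prior, and complete data, is a deliberately simplified stand-in for the exponentiated-Weibull/DGOS likelihood of the paper:

```python
import numpy as np

def log_post(k, data):
    """Log-posterior of Weibull shape k (scale fixed at 1, flat prior
    on k > 0): n*log(k) + (k-1)*sum(log x) - sum(x**k)."""
    if k <= 0:
        return -np.inf
    return (len(data) * np.log(k)
            + (k - 1) * np.log(data).sum()
            - (data ** k).sum())

def metropolis(data, n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis sampler over the shape parameter."""
    rng = np.random.default_rng(seed)
    k, lp = 1.0, log_post(1.0, data)
    chain = []
    for _ in range(n_iter):
        proposal = k + step * rng.normal()
        lp_prop = log_post(proposal, data)
        if np.log(rng.random()) < lp_prop - lp:  # accept/reject
            k, lp = proposal, lp_prop
        chain.append(k)
    return np.array(chain[1000:])  # drop burn-in

data = np.random.default_rng(1).weibull(2.0, size=200)  # true shape 2
chain = metropolis(data)
post_mean = chain.mean()
```

The posterior mean of the chain serves as the Bayes estimate under squared-error loss; asymmetric loss functions, as considered in the paper, would be evaluated over the same chain.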
Global, Regional, and National Fossil-Fuel CO2 Emissions, 1751 - 2008 (Version 2011)
Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Marland, G. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory
2011-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2010), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2010) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
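For each fuel, nation, and year, the Marland and Rotty-style accounting reduces to a multiplicative identity: emissions = fuel quantity × energy content × carbon emission factor × fraction oxidized. A sketch with illustrative coefficients (the numbers below are placeholders of roughly the right magnitude, not the published CDIAC factors):

```python
def fuel_carbon_emissions(quantity, energy_per_unit, carbon_per_energy,
                          fraction_oxidized):
    """Carbon emitted (t C) from burning a quantity of one fuel.
    All four factors are fuel-specific; the estimates described above
    apply this identity per fuel, nation, and year."""
    return quantity * energy_per_unit * carbon_per_energy * fraction_oxidized

# illustrative: 1000 t of hard coal at ~0.0293 TJ/t, ~25.8 t C per TJ,
# 98 percent oxidized (placeholder values, not CDIAC's)
coal_tC = fuel_carbon_emissions(1000.0, 0.0293, 25.8, 0.98)
co2_t = coal_tC * 44.0 / 12.0  # convert carbon mass to CO2 mass
```

Summing this quantity over fuels and years, with trade data adjusting apparent consumption, yields the national time series; cement production and gas flaring are added as separate terms.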
Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2010) (V. 2013)
Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory; Marland, G.
2013-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al.(1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2013), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of Interior's Geological Survey (USGS 2012) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2014) (V. 2017)
Boden, T. A. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Andres, R. J. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Marland, G. [Appalachian State University, Boone, NC (USA)
2017-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2017), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of the Interior's Geological Survey (USGS 2017) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2013) (V. 2016)
Boden, T. A. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Andres, R. J. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Marland, G. [Appalachian State University, Boone, NC (USA)
2016-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2016), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of the Interior's Geological Survey (USGS 2016) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2011) (V. 2015)
Boden, T. A. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Andres, R. J. [CDIAC, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (USA); Marland, G. [Appalachian State University Boone, NC (USA)
2015-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2014), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of the Interior's Geological Survey (USGS 2014) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
Global, Regional, and National Fossil-Fuel CO2 Emissions (1751 - 2009) (V. 2012)
Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [Oak Ridge National Laboratory; Marland, G. [Research Institute for Environment, Energy and Economics, Appalachian State University
2012-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2012), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of the Interior's Geological Survey (USGS 2011) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
Global, Regional, and National Fossil-Fuel CO2 Emissions, 1751 - 2007 (Version 2010)
Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Marland, G. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory
2010-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2009), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of the Interior's Geological Survey (USGS 2009) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
Global, Regional, and National Fossil-Fuel CO2 Emissions, 1751 - 2006 (published 2009)
Boden, Thomas A. [CDIAC, Oak Ridge National Laboratory; Marland, G. [CDIAC, Oak Ridge National Laboratory; Andres, Robert J. [CDIAC, Oak Ridge National Laboratory
2009-01-01
Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). Further details on the contents and processing of the historical energy statistics are provided in Andres et al. (1999). The 1950 to present CO2 emission estimates are derived primarily from energy statistics published by the United Nations (2008), using the methods of Marland and Rotty (1984). The energy statistics were compiled primarily from annual questionnaires distributed by the U.N. Statistical Office and supplemented by official national statistical publications. As stated in the introduction of the Statistical Yearbook, "in a few cases, official sources are supplemented by other sources and estimates, where these have been subjected to professional scrutiny and debate and are consistent with other independent sources." Data from the U.S. Department of the Interior's Geological Survey (USGS 2008) were used to estimate CO2 emitted during cement production. Values for emissions from gas flaring were derived primarily from U.N. data but were supplemented with data from the U.S. Department of Energy's Energy Information Administration (1994), Rotty (1974), and data provided by G. Marland. Greater details about these methods are provided in Marland and Rotty (1984), Boden et al. (1995), and Andres et al. (1999).
On-line estimation of error covariance parameters for atmospheric data assimilation
NASA Technical Reports Server (NTRS)
Dee, Dick P.
1995-01-01
A simple scheme is presented for on-line estimation of covariance parameters in statistical data assimilation systems. The scheme is based on a maximum-likelihood approach in which estimates are produced on the basis of a single batch of simultaneous observations. Single-sample covariance estimation is reasonable as long as the number of available observations exceeds the number of tunable parameters by two or three orders of magnitude. Not much is known at present about model error associated with actual forecast systems. Our scheme can be used to estimate some important statistical model error parameters such as regionally averaged variances or characteristic correlation length scales. The advantage of the single-sample approach is that it does not rely on any assumptions about the temporal behavior of the covariance parameters: time-dependent parameter estimates can be continuously adjusted on the basis of current observations. This is of practical importance since it is likely to be the case that both model error and observation error strongly depend on the actual state of the atmosphere. The single-sample estimation scheme can be incorporated into any four-dimensional statistical data assimilation system that involves explicit calculation of forecast error covariances, including optimal interpolation (OI) and the simplified Kalman filter (SKF). The computational cost of the scheme is high but not prohibitive; on-line estimation of one or two covariance parameters in each analysis box of an operational boxed-OI system is currently feasible. A number of numerical experiments performed with an adaptive SKF and an adaptive version of OI, using a linear two-dimensional shallow-water model and artificially generated model error, are described. The performance of the nonadaptive versions of these methods turns out to depend rather strongly on correct specification of model error parameters.
These parameters are estimated under a variety of conditions, including uniformly distributed model error and time-dependent model error statistics.
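As a toy illustration of the single-batch maximum-likelihood idea (not Dee's operational scheme), the snippet below tunes one scalar forecast-error variance parameter alpha from a single batch of innovations whose covariance is assumed to take the simple form S(alpha) = (alpha + r) I; the intensity of the search grid and all values are invented.

```python
import numpy as np

# Hedged sketch: maximum-likelihood tuning of a single forecast-error
# variance parameter alpha from one batch of innovations d, assuming the
# innovation covariance is simply (alpha + r) * I. Illustrative only.

def loglik(alpha, d, r):
    s = alpha + r                      # scalar innovation variance
    n = d.size
    return -0.5 * (n * np.log(2 * np.pi * s) + np.dot(d, d) / s)

def ml_alpha(d, r, grid):
    return grid[np.argmax([loglik(a, d, r) for a in grid])]

rng = np.random.default_rng(0)
true_alpha, r = 2.0, 0.5
d = rng.normal(0.0, np.sqrt(true_alpha + r), size=5000)  # simulated innovations
grid = np.linspace(0.1, 5.0, 491)
est = ml_alpha(d, r, grid)             # should land near true_alpha
```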
Automatic Keyword Extraction from Individual Documents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rose, Stuart J.; Engel, David W.; Cramer, Nicholas O.
2010-05-03
This paper introduces a novel and domain-independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. We describe the method’s configuration parameters and algorithm, and present an evaluation on a benchmark corpus of technical abstracts. We also present a method for generating lists of stop words for specific corpora and domains, and evaluate its ability to improve keyword extraction on the benchmark corpus. Finally, we apply our method of automatic keyword extraction to a corpus of news articles and define metrics for characterizing the exclusivity, essentiality, and generality of extracted keywords within a corpus.
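The method described here is widely known as RAKE (Rapid Automatic Keyword Extraction). The following is a simplified reimplementation sketch, not the authors' code; the stop-word list is a tiny placeholder.

```python
import re
from collections import defaultdict

# Simplified RAKE sketch: candidate keywords are word sequences delimited by
# stop words and punctuation; each candidate is scored by summing its member
# words' degree/frequency ratios. The stop list is a placeholder.

STOP = {"a", "an", "the", "of", "for", "and", "is", "in", "to", "from"}

def candidates(text):
    words = re.split(r"[^a-zA-Z]+", text.lower())
    phrase, out = [], []
    for w in words:
        if not w or w in STOP:
            if phrase:
                out.append(tuple(phrase))
                phrase = []
        else:
            phrase.append(w)
    if phrase:
        out.append(tuple(phrase))
    return out

def rake_scores(text):
    freq, degree = defaultdict(int), defaultdict(int)
    phrases = candidates(text)
    for p in phrases:
        for w in p:
            freq[w] += 1
            degree[w] += len(p)        # word degree within its phrase
    return {p: sum(degree[w] / freq[w] for w in p) for p in set(phrases)}

scores = rake_scores("automatic extraction of keywords from individual documents")
```

Multi-word candidates score higher than lone words here, which is the behavior the degree/frequency ratio is designed to produce.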
How Valid are Estimates of Occupational Illness?
ERIC Educational Resources Information Center
Hilaski, Harvey J.; Wang, Chao Ling
1982-01-01
Examines some of the methods of estimating occupational diseases and suggests that a consensus on the adequacy and reliability of estimates by the Bureau of Labor Statistics and others is not likely. (SK)
Area estimation using multiyear designs and partial crop identification
NASA Technical Reports Server (NTRS)
Sielken, R. L., Jr.
1983-01-01
Progress is reported for the following areas: (1) estimating the stratum's crop acreage proportion using the multiyear area estimation model; (2) assessment of multiyear sampling designs; and (3) development of statistical methodology for incorporating partially identified sample segments into crop area estimation.
Fast maximum likelihood estimation using continuous-time neural point process models.
Lepage, Kyle Q; MacDonald, Christopher J
2015-06-01
A recent report estimates that the number of simultaneously recorded neurons is growing exponentially. A commonly employed statistical paradigm using discrete-time point process models of neural activity involves the computation of a maximum-likelihood estimate. The time to compute this estimate, per neuron, is proportional to the number of bins in a finely spaced discretization of time. By using continuous-time models of neural activity and the optimally efficient Gaussian quadrature, memory requirements and computation times are dramatically decreased in the commonly encountered situation where the number of parameters p is much less than the number of time-bins n. In this regime, with q equal to the quadrature order, memory requirements are decreased from O(np) to O(qp), and the number of floating-point operations is decreased from O(np^2) to O(qp^2). Accuracy of the proposed estimates is assessed based upon physiological considerations, error bounds, and mathematical results describing the relation between numerical integration error and numerical error affecting both parameter estimates and the observed Fisher information. A check is provided which is used to adapt the order of numerical integration. The procedure is verified in simulation and for hippocampal recordings. It is found that in 95% of hippocampal recordings a q of 60 yields numerical error negligible with respect to parameter estimate standard error. Statistical inference using the proposed methodology is a fast and convenient alternative to statistical inference performed using a discrete-time point process model of neural activity. It enables the employment of the statistical methodology available with discrete-time inference, but is faster, uses less memory, and avoids any error due to discretization.
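As a hedged sketch of the core trick, the snippet below evaluates an inhomogeneous-Poisson log-likelihood (sum of log-intensities at spike times minus the integrated intensity) with a q-point Gauss-Legendre rule in place of a fine time discretization; the intensity model and spike times are invented for illustration.

```python
import numpy as np

# Hedged sketch: replace the fine time-binned integral in a point-process
# log-likelihood with low-order Gauss-Legendre quadrature. The exponential
# intensity model and the spike times below are made-up examples.

def loglik_quad(beta, spikes, T, q=60):
    lam = lambda t: np.exp(beta[0] + beta[1] * t)   # example intensity model
    nodes, weights = np.polynomial.legendre.leggauss(q)
    t = 0.5 * T * (nodes + 1.0)                      # map [-1, 1] -> [0, T]
    integral = 0.5 * T * np.sum(weights * lam(t))    # ~ integral of lam on [0, T]
    return np.sum(np.log(lam(spikes))) - integral

beta = np.array([0.0, 0.1])
spikes = np.array([0.5, 1.2, 3.3])
ll = loglik_quad(beta, spikes, T=5.0)
```

For this smooth intensity the q-point rule matches the closed-form integral essentially to machine precision, which is the memory/computation saving the abstract describes.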
New U.S. Geological Survey Method for the Assessment of Reserve Growth
Klett, Timothy R.; Attanasi, E.D.; Charpentier, Ronald R.; Cook, Troy A.; Freeman, P.A.; Gautier, Donald L.; Le, Phuong A.; Ryder, Robert T.; Schenk, Christopher J.; Tennyson, Marilyn E.; Verma, Mahendra K.
2011-01-01
Reserve growth is defined as the estimated increases in quantities of crude oil, natural gas, and natural gas liquids that have the potential to be added to remaining reserves in discovered accumulations through extension, revision, improved recovery efficiency, and additions of new pools or reservoirs. A new U.S. Geological Survey method was developed to assess the reserve-growth potential of technically recoverable crude oil and natural gas to be added to reserves under proven technology currently in practice within the trend or play, or which reasonably can be extrapolated from geologically similar trends or plays. This method currently is in use to assess potential additions to reserves in discovered fields of the United States. The new approach involves (1) individual analysis of selected large accumulations that contribute most to reserve growth, and (2) conventional statistical modeling of reserve growth in remaining accumulations. This report will focus on the individual accumulation analysis. In the past, the U.S. Geological Survey estimated reserve growth by statistical methods using historical recoverable-quantity data. Those statistical methods were based on growth rates averaged by the number of years since accumulation discovery. Accumulations in mature petroleum provinces with volumetrically significant reserve growth, however, bias statistical models of the data; therefore, accumulations with significant reserve growth are best analyzed separately from those with less significant reserve growth. Large (greater than 500 million barrels) and older (with respect to year of discovery) oil accumulations increase in size at greater rates late in their development history in contrast to more recently discovered accumulations that achieve most growth early in their development history. Such differences greatly affect the statistical methods commonly used to forecast reserve growth. 
The individual accumulation-analysis method involves estimating the in-place petroleum quantity and its uncertainty, as well as the estimated (forecasted) recoverability and its respective uncertainty. These variables are assigned probabilistic distributions and are combined statistically to provide probabilistic estimates of ultimate recoverable quantities. Cumulative production and remaining reserves are then subtracted from the estimated ultimate recoverable quantities to provide potential reserve growth. In practice, results of the two methods are aggregated to various scales, the highest of which includes an entire country or the world total. The aggregated results are reported along with the statistically appropriate uncertainties.
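A minimal Monte Carlo sketch of the individual-accumulation analysis described above, with invented distributions and values rather than USGS inputs:

```python
import numpy as np

# Hedged sketch: combine probabilistic in-place volume and recovery-factor
# distributions by Monte Carlo, then subtract known cumulative production
# plus remaining reserves to get potential reserve growth. All distributions
# and numbers below are illustrative placeholders, not USGS assessment inputs.

rng = np.random.default_rng(42)
n = 100_000
in_place = rng.lognormal(mean=np.log(1000.0), sigma=0.2, size=n)  # MMbbl
recovery = rng.uniform(0.25, 0.45, size=n)                        # fraction
ultimate = in_place * recovery                 # ultimately recoverable quantity
produced_plus_reserves = 250.0                 # known quantities, MMbbl
growth = ultimate - produced_plus_reserves
p5, p50, p95 = np.percentile(growth, [5, 50, 95])  # probabilistic estimate
```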
Rivera-Rodriguez, Claudia L; Resch, Stephen; Haneuse, Sebastien
2018-01-01
In many low- and middle-income countries, the costs of delivering public health programs such as for HIV/AIDS, nutrition, and immunization are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example, via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating the total program costs. This may be due to a lack of awareness/understanding of (1) the technical details regarding uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various potential specific sources that contribute to overall uncertainty. We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. 
This method can be used not only to estimate the total cost of delivering established health programs but also to decrease uncertainty when the interest lies in assessing the incremental effect of an intervention. Measures of statistical uncertainty associated with survey-based estimates of program costs, such as standard errors and 95% confidence intervals, provide important contextual information for health policy decision-making and key inputs for the design of future costing studies. Such measures are often not reported, possibly because of technical challenges associated with their calculation and a lack of awareness of appropriate software. Modern statistical analysis methods for survey data, such as calibration, provide a means to exploit additional information that is readily available but was not used in the design of the study to significantly improve the estimation of total cost through the reduction of statistical uncertainty.
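A hedged illustration of the resampling route mentioned above: a percentile bootstrap for the standard error and 95% confidence interval of a total program cost estimated from a facility subsample. The sampling design, estimator, and data are simplified inventions, not the Honduras study's.

```python
import numpy as np

# Hedged sketch: bootstrap SE and 95% CI for a total program cost estimated
# by scaling a simple random subsample of facility costs up to all N
# facilities. The gamma cost model and every number here are invented.

rng = np.random.default_rng(1)
N = 500                                   # facilities in the whole program
sampled_costs = rng.gamma(shape=2.0, scale=5000.0, size=40)  # n = 40 sample
total_hat = N * sampled_costs.mean()      # simple expansion estimator

boot = np.array([
    N * rng.choice(sampled_costs, size=sampled_costs.size).mean()
    for _ in range(2000)
])                                        # resample facilities with replacement
se = boot.std(ddof=1)                     # bootstrap standard error
ci = np.percentile(boot, [2.5, 97.5])     # percentile bootstrap 95% CI
```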
Resch, Stephen
2018-01-01
Objectives: In many low- and middle-income countries, the costs of delivering public health programs such as for HIV/AIDS, nutrition, and immunization are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example, via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating the total program costs. This may be due to a lack of awareness/understanding of (1) the technical details regarding uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various potential specific sources that contribute to overall uncertainty. Methods: We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. Results: A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. 
This method can be used not only to estimate the total cost of delivering established health programs but also to decrease uncertainty when the interest lies in assessing the incremental effect of an intervention. Conclusion: Measures of statistical uncertainty associated with survey-based estimates of program costs, such as standard errors and 95% confidence intervals, provide important contextual information for health policy decision-making and key inputs for the design of future costing studies. Such measures are often not reported, possibly because of technical challenges associated with their calculation and a lack of awareness of appropriate software. Modern statistical analysis methods for survey data, such as calibration, provide a means to exploit additional information that is readily available but was not used in the design of the study to significantly improve the estimation of total cost through the reduction of statistical uncertainty. PMID:29636964
Ensemble Kalman filters for dynamical systems with unresolved turbulence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grooms, Ian, E-mail: grooms@cims.nyu.edu; Lee, Yoonsang; Majda, Andrew J.
Ensemble Kalman filters are developed for turbulent dynamical systems where the forecast model does not resolve all the active scales of motion. Coarse-resolution models are intended to predict the large-scale part of the true dynamics, but observations invariably include contributions from both the resolved large scales and the unresolved small scales. The error due to the contribution of unresolved scales to the observations, called ‘representation’ or ‘representativeness’ error, is often included as part of the observation error, in addition to the raw measurement error, when estimating the large-scale part of the system. It is here shown how stochastic superparameterization (a multiscale method for subgridscale parameterization) can be used to provide estimates of the statistics of the unresolved scales. In addition, a new framework is developed wherein small-scale statistics can be used to estimate both the resolved and unresolved components of the solution. The one-dimensional test problem from dispersive wave turbulence used here is computationally tractable yet is particularly difficult for filtering because of the non-Gaussian extreme event statistics and substantial small-scale turbulence: a shallow energy spectrum proportional to k^(−5/6) (where k is the wavenumber) results in two-thirds of the climatological variance being carried by the unresolved small scales. Because the unresolved scales contain so much energy, filters that ignore the representation error fail utterly to provide meaningful estimates of the system state. Inclusion of a time-independent climatological estimate of the representation error in a standard framework leads to inaccurate estimates of the large-scale part of the signal; accurate estimates of the large scales are only achieved by using stochastic superparameterization to provide evolving, large-scale dependent predictions of the small-scale statistics.
Again, because the unresolved scales contain so much energy, even an accurate estimate of the large-scale part of the system does not provide an accurate estimate of the true state. By providing simultaneous estimates of both the large- and small-scale parts of the solution, the new framework is able to provide accurate estimates of the true system state.
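The baseline treatment the paper improves on, folding a representation-error variance into the observation-error covariance, can be sketched with a minimal stochastic (perturbed-observations) EnKF analysis step; the dimensions, values, and the fixed inflation itself are illustrative assumptions, not the paper's superparameterized scheme.

```python
import numpy as np

# Hedged sketch: one stochastic EnKF analysis step in which the observation
# error covariance is augmented by a fixed representation-error variance
# (the simple climatological treatment). All values are illustrative.

def enkf_update(ens, y, H, r_meas, r_repr, rng):
    n, m = ens.shape                        # state dim, ensemble size
    R = (r_meas + r_repr) * np.eye(len(y))  # measurement + representation error
    X = ens - ens.mean(axis=1, keepdims=True)
    P = X @ X.T / (m - 1)                   # ensemble forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    pert = rng.normal(0.0, np.sqrt(r_meas + r_repr), size=(len(y), m))
    return ens + K @ (y[:, None] + pert - H @ ens)  # perturbed-obs update

rng = np.random.default_rng(0)
ens = rng.normal(0.0, 1.0, size=(2, 50))    # 2-variable state, 50 members
H = np.array([[1.0, 0.0]])                  # observe first component only
analysis = enkf_update(ens, np.array([1.0]), H, r_meas=0.1, r_repr=0.4, rng=rng)
```

The analysis mean of the observed component is pulled toward the observation, with the pull weakened by the inflated error variance.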
Estimation versus falsification approaches in sport and exercise science.
Wilkinson, Michael; Winter, Edward M
2018-05-22
There has been a recent resurgence in debate about methods for statistical inference in science. The debate addresses statistical concepts and their impact on the value and meaning of analyses' outcomes. In contrast, philosophical underpinnings of approaches and the extent to which analytical tools match philosophical goals of the scientific method have received less attention. This short piece considers application of the scientific method to "what is the influence of x on y" type questions characteristic of sport and exercise science. We consider applications and interpretations of estimation-based versus falsification-based statistical approaches and their value in addressing how much x influences y, and in measurement error and method agreement settings. We compare estimation using magnitude-based inference (MBI) with falsification using null hypothesis significance testing (NHST), and highlight the limited value of both falsification and NHST for addressing problems in sport and exercise science. We recommend adopting an estimation approach, expressing the uncertainty of effects of x on y, and their practical/clinical value against pre-determined effect magnitudes using MBI.
Uncertainty Quantification and Statistical Convergence Guidelines for PIV Data
NASA Astrophysics Data System (ADS)
Stegmeir, Matthew; Kassen, Dan
2016-11-01
As Particle Image Velocimetry has continued to mature, it has developed into a robust and flexible technique for velocimetry used by expert and non-expert users. While historical estimates of PIV accuracy have typically relied heavily on "rules of thumb" and analysis of idealized synthetic images, recently increased emphasis has been placed on better quantifying real-world PIV measurement uncertainty. Multiple techniques have been developed to provide per-vector instantaneous uncertainty estimates for PIV measurements. Often real-world experimental conditions introduce complications in collecting "optimal" data, and the effect of these conditions is important to consider when planning an experimental campaign. The current work utilizes the results of PIV Uncertainty Quantification techniques to develop a framework for PIV users to utilize estimated PIV confidence intervals to compute reliable data convergence criteria for optimal sampling of flow statistics. Results are compared using experimental and synthetic data, and recommended guidelines and procedures leveraging estimated PIV confidence intervals for efficient sampling for converged statistics are provided.
Statistical analysis of the determinations of the Sun's Galactocentric distance
NASA Astrophysics Data System (ADS)
Malkin, Zinovy
2013-02-01
Based on several tens of R0 measurements made during the past two decades, several studies have been performed to derive the best estimate of R0. Some used just simple averaging to derive a result, whereas others provided comprehensive analyses of possible errors in published results. In either case, detailed statistical analyses of data used were not performed. However, a computation of the best estimates of the Galactic rotation constants is not only an astronomical but also a metrological task. Here we perform an analysis of 53 R0 measurements (published in the past 20 years) to assess the consistency of the data. Our analysis shows that they are internally consistent. It is also shown that any trend in the R0 estimates from the last 20 years is statistically negligible, which renders the presence of a bandwagon effect doubtful. On the other hand, the formal errors in the published R0 estimates improve significantly with time.
Robin M. Reich; C. Aguirre-Bravo; M.S. Williams
2006-01-01
A statistical strategy for spatial estimation and modeling of natural and environmental resource variables and indicators is presented. This strategy is part of an inventory and monitoring pilot study that is being carried out in the Mexican states of Jalisco and Colima. Fine spatial resolution estimates of key variables and indicators are outputs that will allow the...
NASA Astrophysics Data System (ADS)
Phillips, Thomas J.; Gates, W. Lawrence; Arpe, Klaus
1992-12-01
The effects of sampling frequency on the first- and second-moment statistics of selected European Centre for Medium-Range Weather Forecasts (ECMWF) model variables are investigated in a simulation of "perpetual July" with a diurnal cycle included and with surface and atmospheric fields saved at hourly intervals. The shortest characteristic time scales (as determined by the e-folding time of lagged autocorrelation functions) are those of ground heat fluxes and temperatures, precipitation and runoff, convective processes, cloud properties, and atmospheric vertical motion, while the longest time scales are exhibited by soil temperature and moisture, surface pressure, and atmospheric specific humidity, temperature, and wind. The time scales of surface heat and momentum fluxes and of convective processes are substantially shorter over land than over oceans. An appropriate sampling frequency for each model variable is obtained by comparing the estimates of first- and second-moment statistics determined at intervals ranging from 2 to 24 hours with the "best" estimates obtained from hourly sampling. Relatively accurate estimation of first- and second-moment climate statistics (10% errors in means, 20% errors in variances) can be achieved by sampling a model variable at intervals that usually are longer than the bandwidth of its time series but that often are shorter than its characteristic time scale. For the surface variables, sampling at intervals that are nonintegral divisors of a 24-hour day yields relatively more accurate time-mean statistics because of a reduction in errors associated with aliasing of the diurnal cycle and higher-frequency harmonics. The superior estimates of first-moment statistics are accompanied by inferior estimates of the variance of the daily means due to the presence of systematic biases, but these probably can be avoided by defining a different measure of low-frequency variability. 
Estimates of the intradiurnal variance of accumulated precipitation and surface runoff also are strongly impacted by the length of the storage interval. In light of these results, several alternative strategies for storage of the ECMWF model variables are recommended.
Kalman filter for statistical monitoring of forest cover across sub-continental regions [Symposium
Raymond L. Czaplewski
1991-01-01
The Kalman filter is a generalization of the composite estimator. The univariate composite estimate combines 2 prior estimates of population parameter with a weighted average where the scalar weight is inversely proportional to the variances. The composite estimator is a minimum variance estimator that requires no distributional assumptions other than estimates of the...
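The univariate composite estimate described above has a closed form: weight each prior estimate inversely to its variance. A minimal sketch (the function name and numbers are illustrative, not from the symposium paper):

```python
def composite_estimate(x1, var1, x2, var2):
    """Combine two independent estimates of the same parameter by
    inverse-variance weighting (the univariate composite estimator)."""
    w = var2 / (var1 + var2)             # weight on x1, inversely proportional to var1
    est = w * x1 + (1 - w) * x2
    var = (var1 * var2) / (var1 + var2)  # variance of the combined estimate
    return est, var

# Example: combine an estimate 10.0 (variance 4) with an estimate 12.0 (variance 1)
est, var = composite_estimate(10.0, 4.0, 12.0, 1.0)
# est = 0.2*10 + 0.8*12 = 11.6 ; var = 4*1/(4+1) = 0.8
```

Because the weights are inversely proportional to the variances, the combined variance (here 0.8) is never larger than the smaller of the two input variances, which is what makes the composite a minimum variance estimator.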
Kwon, Deukwoo; Reis, Isildinha M
2015-08-12
When conducting a meta-analysis of a continuous outcome, estimated means and standard deviations from the selected studies are required in order to obtain an overall estimate of the mean effect and its confidence interval. If these quantities are not directly reported in the publications, they must be estimated from other reported summary statistics, such as the median, the minimum, the maximum, and quartiles. We propose a simulation-based estimation approach using the Approximate Bayesian Computation (ABC) technique for estimating mean and standard deviation based on various sets of summary statistics found in published studies. We conduct a simulation study to compare the proposed ABC method with the existing methods of Hozo et al. (2005), Bland (2015), and Wan et al. (2014). In the estimation of the standard deviation, our ABC method performs better than the other methods when data are generated from skewed or heavy-tailed distributions. The corresponding average relative error (ARE) approaches zero as sample size increases. In data generated from the normal distribution, our ABC performs well. However, the Wan et al. method is best for estimating standard deviation under normal distribution. In the estimation of the mean, our ABC method is best regardless of assumed distribution. ABC is a flexible method for estimating the study-specific mean and standard deviation for meta-analysis, especially with underlying skewed or heavy-tailed distributions. The ABC method can be applied using other reported summary statistics such as the posterior mean and 95 % credible interval when Bayesian analysis has been employed.
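The ABC idea in this abstract can be sketched as a rejection sampler: propose (mean, SD) pairs, simulate a study of size n under an assumed normal model, and keep the proposals whose simulated median/min/max best match the reported ones. The priors, distance, and normal data model below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def abc_mean_sd(median, minimum, maximum, n, n_draws=5000, keep=0.02, seed=None):
    """Rejection-ABC sketch: estimate a study's mean and SD from its
    reported {median, min, max, n}, assuming an underlying normal model."""
    rng = np.random.default_rng(seed)
    # Crude priors centred on moment-style starting values (Hozo-type formulas)
    mu0 = (minimum + 2 * median + maximum) / 4
    sd0 = (maximum - minimum) / 4
    mus = rng.normal(mu0, 2 * sd0, n_draws)
    sds = rng.uniform(sd0 / 10, 3 * sd0, n_draws)
    dist = np.empty(n_draws)
    for i in range(n_draws):
        x = rng.normal(mus[i], sds[i], n)           # simulate one study
        s = np.array([np.median(x), x.min(), x.max()])
        dist[i] = np.abs(s - [median, minimum, maximum]).sum() / sd0
    idx = np.argsort(dist)[: max(1, int(keep * n_draws))]  # keep closest draws
    return mus[idx].mean(), sds[idx].mean()

mu_hat, sd_hat = abc_mean_sd(median=10.0, minimum=4.0, maximum=16.0, n=100, seed=0)
```

For a sample of 100 from a normal distribution, the expected range is roughly 5 standard deviations, so a reported range of 12 around a median of 10 should recover a mean near 10 and an SD near 2.4.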
Dechartres, Agnes; Bond, Elizabeth G; Scheer, Jordan; Riveros, Carolina; Atal, Ignacio; Ravaud, Philippe
2016-11-30
Publication bias and other reporting bias have been well documented for journal articles, but no study has evaluated the nature of results posted at ClinicalTrials.gov. We aimed to assess how many randomized controlled trials (RCTs) with results posted at ClinicalTrials.gov report statistically significant results and whether the proportion of trials with significant results differs when no treatment effect estimate or p-value is posted. We searched ClinicalTrials.gov in June 2015 for all studies with results posted. We included completed RCTs with a superiority hypothesis and considered results for the first primary outcome with results posted. For each trial, we assessed whether a treatment effect estimate and/or p-value was reported at ClinicalTrials.gov and, if yes, whether results were statistically significant. If no treatment effect estimate or p-value was reported, we calculated the treatment effect and corresponding p-value using results per arm posted at ClinicalTrials.gov when sufficient data were reported. From the 17,536 studies with results posted at ClinicalTrials.gov, we identified 2823 completed phase 3 or 4 randomized trials with a superiority hypothesis. Of these, 1400 (50%) reported a treatment effect estimate and/or p-value. Results were statistically significant for 844 trials (60%), with a median p-value of 0.01 (Q1-Q3: 0.001-0.26). For the 1423 trials with no treatment effect estimate or p-value posted, we could calculate the treatment effect and corresponding p-value using results reported per arm for 929 (65%). For 494 trials (35%), p-values could not be calculated, mainly because of insufficient reporting, censored data, or repeated measurements over time. For the 929 trials for which p-values could be calculated, we found statistically significant results for 342 (37%), with a median p-value of 0.19 (Q1-Q3: 0.005-0.59). 
Half of the trials with results posted at ClinicalTrials.gov reported a treatment effect estimate and/or p-value, with significant results for 60% of these. p-values could be calculated from results reported per arm at ClinicalTrials.gov for only 65% of the other trials. The proportion of significant results was much lower for these trials, which suggests a selective posting of treatment effect estimates and/or p-values when results are statistically significant.
SPARK: Adapting Keyword Query to Semantic Search
NASA Astrophysics Data System (ADS)
Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong
Semantic search promises to provide more accurate results than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
Zong, Nansu; Lee, Sungin; Ahn, Jinhyun; Kim, Hong-Gee
2017-08-01
The keyword-based entity search restricts search space based on the preference of search. When the given keywords and preferences are not related to the same biomedical topic, existing biomedical Linked Data search engines fail to deliver satisfactory results. This research aims to tackle this issue by supporting an inter-topic search: improving search when the inputs (keywords and preferences) fall under different topics. This study developed an effective algorithm in which the relations between biomedical entities were used in tandem with a keyword-based entity search, Siren. The algorithm, PERank, which is an adaptation of Personalized PageRank (PPR), uses a pair of inputs: (1) search preferences, and (2) entities from a keyword-based entity search with a keyword query, to formalize the search results on-the-fly based on the index of the precomputed Individual Personalized PageRank Vectors (IPPVs). Our experiments were performed over ten linked life datasets for two query sets, one with keyword-preference topic correspondence (intra-topic search), and the other without (inter-topic search). The experiments showed that the proposed method achieved better search results, for example a 14% increase in precision for the inter-topic search over the baseline keyword-based search engine. The proposed method improved the keyword-based biomedical entity search by supporting the inter-topic search without affecting the intra-topic search, based on the relations between different entities. Copyright © 2017 Elsevier Ltd. All rights reserved.
Sokhey, Taegh; Gaebler-Spira, Deborah; Kording, Konrad P.
2017-01-01
Background It is important to understand the motor deficits of children with Cerebral Palsy (CP). Our understanding of this motor disorder can be enriched by computational models of motor control. One crucial stage in generating movement involves combining uncertain information from different sources, and deficits in this process could contribute to reduced motor function in children with CP. Healthy adults can integrate previously-learned information (prior) with incoming sensory information (likelihood) in a close-to-optimal way when estimating object location, consistent with the use of Bayesian statistics. However, there are few studies investigating how children with CP perform sensorimotor integration. We compare sensorimotor estimation in children with CP and age-matched controls using a model-based analysis to understand the process. Methods and findings We examined Bayesian sensorimotor integration in children with CP, aged between 5 and 12 years old, with Gross Motor Function Classification System (GMFCS) levels 1–3 and compared their estimation behavior with age-matched typically-developing (TD) children. We used a simple sensorimotor estimation task which requires participants to combine probabilistic information from different sources: a likelihood distribution (current sensory information) with a prior distribution (learned target information). In order to examine sensorimotor integration, we quantified how participants weighed statistical information from the two sources (prior and likelihood) and compared this to the statistical optimal weighting. We found that the weighing of statistical information in children with CP was as statistically efficient as that of TD children. Conclusions We conclude that Bayesian sensorimotor integration is not impaired in children with CP and therefore, does not contribute to their motor deficits. 
Future research has the potential to enrich our understanding of motor disorders by investigating the stages of motor processing set out by computational models. Therapeutic interventions should exploit the ability of children with CP to use statistical information. PMID:29186196
Statistics based sampling for controller and estimator design
NASA Astrophysics Data System (ADS)
Tenne, Dirk
The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses the aforementioned topics of nonlinear estimation, target tracking and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. In addition, the unscented transformation has been utilized to develop a three-dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three-dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controllers which include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance measure of the plant over the domain of uncertainty, consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. 
The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistically robust controller design is applied to a concurrent feed-forward/feedback controller structure for a high-speed, low-tension tape drive.
Improved estimates of fixed reproducible tangible wealth, 1929-95
DOT National Transportation Integrated Search
1997-05-01
This article presents revised estimates of the value of fixed reproducible tangible wealth in the United States for 1929-95; these estimates incorporate the definitional and statistical improvements introduced in last year's comprehensive revis...
An analytic technique for statistically modeling random atomic clock errors in estimation
NASA Technical Reports Server (NTRS)
Fell, P. J.
1981-01-01
Minimum variance estimation requires that the statistics of random observation errors be modeled properly. If measurements are derived through the use of atomic frequency standards, then one source of error affecting the observable is random fluctuation in frequency. This is the case, for example, with range and integrated Doppler measurements from satellites of the Global Positioning System and with baseline determination for geodynamic applications. An analytic method is presented which approximates the statistics of this random process. The procedure starts with a model of the Allan variance for a particular oscillator and develops the statistics of range and integrated Doppler measurements. A series of five first-order Markov processes is used to approximate the power spectral density obtained from the Allan variance.
Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten
2018-01-01
Structural equation modeling using partial least squares (PLS-SEM) has become a mainstream modeling approach in various disciplines. Nevertheless, prior literature still lacks practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis only allow assessing differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that also allows assessing whether two parameter estimates that are derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we particularly refer to a reduced version of the well-established technology acceptance model.
Bootstrap Methods: A Very Leisurely Look.
ERIC Educational Resources Information Center
Hinkle, Dennis E.; Winstead, Wayland H.
The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…
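The bootstrap procedure the abstract describes (resample with replacement, recompute the statistic, take the spread of the replicates as a standard error) is a few lines in any language; a Python sketch rather than the SAS routine cited, with illustrative data:

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=42):
    """Bootstrap standard error of a statistic: resample the data with
    replacement many times and take the SD of the resampled statistics."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        reps.append(stat(resample))
    return statistics.stdev(reps)

data = [2.1, 3.4, 2.9, 4.0, 3.6, 2.5, 3.1, 3.8, 2.7, 3.3]
se = bootstrap_se(data)
```

For the sample mean the result can be checked against the analytic standard error s/sqrt(n) (about 0.19 for this data), which the bootstrap reproduces closely; the same code with a different `stat` handles statistics that lack a closed-form standard error, which is where the method earns its keep.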
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kane, V.E.
1979-10-01
The standard maximum likelihood and moment estimation procedures are shown to have some undesirable characteristics for estimating the parameters in a three-parameter lognormal distribution. A class of goodness-of-fit estimators is found which provides a useful alternative to the standard methods. The class of goodness-of-fit tests considered includes the Shapiro-Wilk and Shapiro-Francia tests, which reduce to a weighted linear combination of the order statistics that can be maximized in estimation problems. The weighted-order statistic estimators are compared to the standard procedures in Monte Carlo simulations. Bias and robustness of the procedures are examined and example data sets analyzed, including geochemical data from the National Uranium Resource Evaluation Program.
Evaluation of a segment-based LANDSAT full-frame approach to crop area estimation
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Hixson, M. M.; Davis, S. M.
1981-01-01
As the registration of LANDSAT full frames enters the realm of current technology, sampling methods should be examined which utilize data other than the segment data used for LACIE. The effect of separating the functions of sampling for training and sampling for area estimation was examined. The frame selected for analysis was acquired over north central Iowa on August 9, 1978. A stratification of the full frame was defined. Training data came from segments within the frame. Two classification and estimation procedures were compared: statistics developed on one segment were used to classify that segment, and pooled statistics from the segments were used to classify a systematic sample of pixels. Comparisons to USDA/ESCS estimates illustrate that the full-frame sampling approach can provide accurate and precise area estimates.
Inference for lidar-assisted estimation of forest growing stock volume
Ronald E. McRoberts; Erik Næsset; Terje Gobakken
2013-01-01
Estimates of growing stock volume are reported by the national forest inventories (NFI) of most countries and may serve as the basis for aboveground biomass and carbon estimates as required by an increasing number of international agreements. The probability-based (design-based) statistical estimators traditionally used by NFIs to calculate estimates are generally...
Automatic generation of stop word lists for information retrieval and analysis
Rose, Stuart J
2013-01-08
Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
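The procedure above can be sketched roughly as follows; the scoring details and names here are illustrative guesses at one reasonable reading (a term is a stop-word candidate when it occurs adjacent to known keywords at least as often as inside them), not the actual claimed method:

```python
from collections import Counter

def generate_stoplist(documents, keywords, min_ratio=1.0, max_size=50):
    """Sketch of automatic stop-list generation from a corpus and known
    keywords. `documents` are token lists; `keywords` are token tuples."""
    kw_tokens = Counter()    # keyword frequency: term occurrences inside keyword phrases
    adj_tokens = Counter()   # keyword adjacency frequency: occurrences next to a keyword
    kw_set = {tuple(k) for k in keywords}
    for doc in documents:
        n = len(doc)
        for k in kw_set:
            L = len(k)
            for i in range(n - L + 1):
                if tuple(doc[i:i + L]) == k:
                    kw_tokens.update(k)
                    if i > 0:
                        adj_tokens[doc[i - 1]] += 1
                    if i + L < n:
                        adj_tokens[doc[i + L]] += 1
    # Exclude terms whose adjacency/keyword-frequency ratio falls below threshold
    candidates = [t for t in adj_tokens
                  if adj_tokens[t] / max(kw_tokens.get(t, 0), 1) >= min_ratio]
    # Truncate the resulting list by a predetermined criterion (here: frequency)
    candidates.sort(key=lambda t: adj_tokens[t], reverse=True)
    return candidates[:max_size]

docs = [["the", "stop", "word", "list", "is", "built", "from", "the", "corpus"]]
stops = generate_stoplist(docs, keywords=[("stop", "word", "list")])
```

On this toy corpus the function words flanking the keyword phrase ("the", "is") surface as stop-word candidates, which matches the intuition that stop words border content phrases rather than appear within them.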
Variational dynamic background model for keyword spotting in handwritten documents
NASA Astrophysics Data System (ADS)
Kumar, Gaurav; Wshah, Safwan; Govindaraju, Venu
2013-12-01
We propose a Bayesian framework for keyword spotting in handwritten documents. This work extends our previous work, in which we proposed the dynamic background model (DBM) for keyword spotting, which takes into account local character-level scores and global word-level scores to learn a logistic regression classifier that separates keywords from non-keywords. In this work, we add a Bayesian layer on top of the DBM, called the variational dynamic background model (VDBM). The logistic regression classifier uses the sigmoid function to separate keywords from non-keywords. Because the sigmoid function is neither convex nor concave, exact inference in the VDBM becomes intractable; an expectation-maximization step is proposed for approximate inference. The advantage of the VDBM over the DBM is two-fold. First, being Bayesian, it prevents over-fitting of the data. Second, it provides better modeling of the data and improved prediction of unseen data. The VDBM is evaluated on the IAM dataset and the results show that it outperforms our prior work and other state-of-the-art line-based word spotting systems.
Estimation of Global Network Statistics from Incomplete Data
Bliss, Catherine A.; Danforth, Christopher M.; Dodds, Peter Sheridan
2014-01-01
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week. PMID:25338183
Maximum Likelihood Time-of-Arrival Estimation of Optical Pulses via Photon-Counting Photodetectors
NASA Technical Reports Server (NTRS)
Erkmen, Baris I.; Moision, Bruce E.
2010-01-01
Many optical imaging, ranging, and communications systems rely on the estimation of the arrival time of an optical pulse. Recently, such systems have been increasingly employing photon-counting photodetector technology, which changes the statistics of the observed photocurrent. This requires time-of-arrival estimators to be developed and their performances characterized. The statistics of the output of an ideal photodetector, which are well modeled as a Poisson point process, were considered. An analytical model was developed for the mean-square error of the maximum likelihood (ML) estimator, demonstrating two phenomena that cause deviations from the minimum achievable error at low signal power. An approximation was derived to the threshold at which the ML estimator essentially fails to provide better than a random guess of the pulse arrival time. Comparing the analytic model performance predictions to those obtained via simulations, it was verified that the model accurately predicts the ML performance over all regimes considered. There is little prior art that attempts to understand the fundamental limitations to time-of-arrival estimation from Poisson statistics. This work establishes both a simple mathematical description of the error behavior, and the associated physical processes that yield this behavior. Previous work on mean-square error characterization for ML estimators has predominantly focused on additive Gaussian noise. This work demonstrates that the discrete nature of the Poisson noise process leads to a distinctly different error behavior.
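For an ideal photon-counting detector the arrival times form an inhomogeneous Poisson process, so the ML estimate of the pulse delay maximizes the sum of log-intensities at the observed photon times. A toy grid-search sketch (the pulse shape, rates, and thinning-based photon generator are illustrative assumptions, not the article's system model):

```python
import numpy as np

def ml_toa(photon_times, intensity, taus):
    """Grid-search ML estimate of pulse arrival time from photon timestamps.
    For a Poisson point process with intensity lam(t - tau), the log-likelihood
    is sum_i log lam(t_i - tau) up to a term that is (approximately)
    tau-independent when the pulse stays inside the observation window."""
    ll = [np.log(intensity(photon_times - tau)).sum() for tau in taus]
    return taus[int(np.argmax(ll))]

# Toy example: Gaussian pulse on a constant background, true delay tau = 0.3
rng = np.random.default_rng(1)
true_tau, sigma = 0.3, 0.05
lam = lambda t: 0.5 + 200.0 * np.exp(-0.5 * (t / sigma) ** 2)  # photons/s
# Draw photons on [0, 1] by thinning a homogeneous process with rate bound 201
cand = rng.uniform(0, 1, 5000)
photons = cand[rng.uniform(0, 1, 5000) < lam(cand - true_tau) / 201.0]
tau_hat = ml_toa(photons, lam, np.linspace(0, 1, 501))
```

The estimate lands close to the true delay at this signal level; shrinking the pulse amplitude toward the background rate reproduces the low-signal threshold behavior discussed in the abstract, where the ML estimate degrades toward a random guess.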
A powerful test for Balaam's design.
Mori, Joji; Kano, Yutaka
2015-01-01
The crossover trial design (AB/BA design) is often used to compare the effects of two treatments in medical science because it performs within-subject comparisons, which increase the precision of a treatment effect (i.e., a between-treatment difference). However, the AB/BA design cannot be applied in the presence of carryover effects and/or treatments-by-period interaction. In such cases, Balaam's design is a more suitable choice. Unlike the AB/BA design, Balaam's design inflates the variance of an estimate of the treatment effect, thereby reducing the statistical power of tests. This is a serious drawback of the design. Although the variance of parameter estimators in Balaam's design has been extensively studied, the estimators of the treatment effect to improve the inference have received little attention. If the estimate of the treatment effect is obtained by solving the mixed model equations, the AA and BB sequences are excluded from the estimation process. In this study, we develop a new estimator of the treatment effect and a new test statistic using the estimator. The aim is to improve the statistical inference in Balaam's design. Simulation studies indicate that the type I error of the proposed test is well controlled, and that the test is more powerful and has more suitable characteristics than other existing tests when interactions are substantial. The proposed test is also applied to analyze a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.
2016-01-01
We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
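The copula-plus-Gaussian-entropy idea can be sketched in a few lines: rank-transform each variable to standard normal margins (the empirical copula mapped through the Gaussian inverse CDF), then use the closed-form Gaussian expression for mutual information. This is a simplified bivariate illustration of the general approach, not the authors' released Matlab/Python code:

```python
import numpy as np
from statistics import NormalDist

def copnorm(x):
    """Rank-transform a 1-D sample to standard-normal margins."""
    ranks = np.argsort(np.argsort(x)) + 1       # ranks 1..n
    u = ranks / (len(x) + 1)                    # uniform scores in (0, 1)
    inv = NormalDist().inv_cdf
    return np.array([inv(v) for v in u])

def gaussian_copula_mi(x, y):
    """Mutual information (nats) between x and y via the Gaussian copula:
    after copnorm, MI has the closed form -0.5 * log(1 - rho^2)."""
    cx, cy = copnorm(x), copnorm(y)
    rho = np.corrcoef(cx, cy)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(size=2000)   # dependent pair: MI well above zero
z = rng.normal(size=2000)       # independent pair: MI near zero
mi_xy, mi_xz = gaussian_copula_mi(x, y), gaussian_copula_mi(x, z)
```

Because only the ranks of the data enter the estimate, it is robust to monotonic transformations and outliers, which is part of what makes the framework attractive for noisy neuroimaging signals.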
Bjerregaard, Peter; Becker, Ulrik
2013-01-01
Questionnaires are widely used to obtain information on health-related behaviour, and they are more often than not the only method that can be used to assess the distribution of behaviour in subgroups of the population. No validation studies of reported consumption of tobacco or alcohol have been published from circumpolar indigenous communities. The purpose of the study is to compare information on the consumption of tobacco and alcohol obtained from 3 population surveys in Greenland with import statistics. Estimates of consumption of cigarettes and alcohol using several different survey instruments in cross-sectional population studies from 1993-1994, 1999-2001 and 2005-2010 were compared with import statistics from the same years. For cigarettes, survey results accounted for virtually the total import. Alcohol consumption was significantly under-reported with reporting completeness ranging from 40% to 51% for different estimates of habitual weekly consumption in the 3 study periods. Including an estimate of binge drinking increased the estimated total consumption to 78% of the import. Compared with import statistics, questionnaire-based population surveys capture the consumption of cigarettes well in Greenland. Consumption of alcohol is under-reported, but asking about binge episodes in addition to the usual intake considerably increased the reported intake in this population and made it more in agreement with import statistics. It is unknown to what extent these findings at the population level can be inferred to population subgroups.
Redmond, Shelagh M.; Alexander-Kisslig, Karin; Woodhall, Sarah C.; van den Broek, Ingrid V. F.; van Bergen, Jan; Ward, Helen; Uusküla, Anneli; Herrmann, Björn; Andersen, Berit; Götz, Hannelore M.; Sfetcu, Otilia; Low, Nicola
2015-01-01
Background Accurate information about the prevalence of Chlamydia trachomatis is needed to assess national prevention and control measures. Methods We systematically reviewed population-based cross-sectional studies that estimated chlamydia prevalence in European Union/European Economic Area (EU/EEA) Member States and non-European high income countries from January 1990 to August 2012. We examined results in forest plots, explored heterogeneity using the I2 statistic, and conducted random effects meta-analysis if appropriate. Meta-regression was used to examine the relationship between study characteristics and chlamydia prevalence estimates. Results We included 25 population-based studies from 11 EU/EEA countries and 14 studies from five other high income countries. Four EU/EEA Member States reported on nationally representative surveys of sexually experienced adults aged 18–26 years (response rates 52–71%). In women, chlamydia point prevalence estimates ranged from 3.0–5.3%; the pooled average of these estimates was 3.6% (95% CI 2.4, 4.8, I2 0%). In men, estimates ranged from 2.4–7.3% (pooled average 3.5%; 95% CI 1.9, 5.2, I2 27%). Estimates in EU/EEA Member States were statistically consistent with those in other high income countries (I2 0% for women, 6% for men). There was statistical evidence of an association between survey response rate and estimated chlamydia prevalence; estimates were higher in surveys with lower response rates, (p = 0.003 in women, 0.018 in men). Conclusions Population-based surveys that estimate chlamydia prevalence are at risk of participation bias owing to low response rates. Estimates obtained in nationally representative samples of the general population of EU/EEA Member States are similar to estimates from other high income countries. PMID:25615574
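The pooled averages, between-study variance, and I2 values quoted above come from standard random-effects machinery; a sketch of the usual DerSimonian-Laird moment estimator (not the authors' own code):

```python
import numpy as np

def dersimonian_laird(effects, variances):
    e = np.asarray(effects, float)
    v = np.asarray(variances, float)
    w = 1.0 / v
    # Fixed-effect pooled estimate and Cochran's Q
    fixed = np.sum(w * e) / np.sum(w)
    Q = np.sum(w * (e - fixed) ** 2)
    k = len(e)
    # I2: percentage of total variation across studies due to heterogeneity
    I2 = 100.0 * max(0.0, (Q - (k - 1)) / Q) if Q > 0 else 0.0
    # Method-of-moments between-study variance
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    # Random-effects pooled estimate
    w_re = 1.0 / (v + tau2)
    pooled = np.sum(w_re * e) / np.sum(w_re)
    return pooled, tau2, I2
```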
Makeyev, Oleksandr; Joe, Cody; Lee, Colin; Besio, Walter G
2017-07-01
Concentric ring electrodes have shown promise in non-invasive electrophysiological measurement demonstrating their superiority to conventional disc electrodes, in particular, in accuracy of Laplacian estimation. Recently, we have proposed novel variable inter-ring distances concentric ring electrodes. Analytic and finite element method modeling results for linearly increasing distances electrode configurations suggested they may decrease the truncation error resulting in more accurate Laplacian estimates compared to currently used constant inter-ring distances configurations. This study assesses statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes. Full factorial design of analysis of variance was used with one categorical and two numerical factors: the inter-ring distances, the electrode diameter, and the number of concentric rings in the electrode. The response variables were the Relative Error and the Maximum Error of Laplacian estimation computed using a finite element method model for each of the combinations of levels of three factors. Effects of the main factors and their interactions on Relative Error and Maximum Error were assessed and the obtained results suggest that all three factors have statistically significant effects in the model confirming the potential of using inter-ring distances as a means of improving accuracy of Laplacian estimation.
Estimating Local Food Capacity in Publicly Funded Institutions
ERIC Educational Resources Information Center
Knight, Andrew J.; Chopra, Hema M.
2013-01-01
This article presents three approaches to estimate the size of the publicly funded institutional marketplace to determine what opportunities exist for local farmers and fishers. First, we found that estimates from national foodservice sales statistics over-estimate local capacity opportunities. Second, analyzing budgets of publicly funded…
These model-based estimates use two surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health Interview Survey (NHIS). The two surveys are combined using novel statistical methodology.
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number
Fragkos, Konstantinos C.; Tsagris, Michail; Frangos, Christos C.
2014-01-01
The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator. PMID:27437470
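Rosenthal's point estimate itself, the quantity whose confidence intervals the paper develops, follows from Stouffer's combined z: the fail-safe number is how many zero-effect studies must be added before the combined one-tailed p-value rises above alpha. A sketch of that estimate (not the authors' interval code):

```python
from scipy.stats import norm

def rosenthal_failsafe_n(z_values, alpha=0.05):
    # Combined z of k studies plus N null (z = 0) studies is
    # sum(z) / sqrt(k + N); solve for the N at which it equals z_alpha.
    z_alpha = norm.ppf(1 - alpha)   # ~1.645 for alpha = 0.05, one-tailed
    s = sum(z_values)
    k = len(z_values)
    return max(0.0, (s / z_alpha) ** 2 - k)
```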
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-14
... the Council's Bluefish Monitoring Committee (Monitoring Committee) and Scientific and Statistical... to, commercial and recreational catch/landing statistics, current estimates of fishing mortality... Marine Recreational Fisheries Statistics Survey (MRFSS) data through Wave 2 were available for 2009, and...
A Statistical Test for Comparing Nonnested Covariance Structure Models.
ERIC Educational Resources Information Center
Levy, Roy; Hancock, Gregory R.
While statistical procedures are well known for comparing hierarchically related (nested) covariance structure models, statistical tests for comparing nonhierarchically related (nonnested) models have proven more elusive. While isolated attempts have been made, none exists within the commonly used maximum likelihood estimation framework, thereby…
A new test of multivariate nonlinear causality
Bai, Zhidong; Hui, Yongchang; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong
2018-01-01
The multivariate nonlinear Granger causality test developed by Bai et al. (2010) (Mathematics and Computers in Simulation. 2010; 81: 5-17) plays an important role in detecting dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempted to establish a central limit theorem (CLT) for their test statistic by applying the asymptotic properties of multivariate U-statistics. However, Bai et al. (2016) (arXiv: 1701.03992) revisited the HJ test and found that the HJ test statistic is not a function of U-statistics, which implies that neither the CLT proposed by Hiemstra and Jones (1994) nor its extension by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and re-establish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and that our new test has decent size and power. PMID:29304085
Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript
Amancio, Diego R.; Altmann, Eduardo G.; Rybski, Diego; Oliveira, Osvaldo N.; Costa, Luciano da F.
2013-01-01
While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications. PMID:23844002
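One of the intermittency measures alluded to above treats each word's occurrence positions as a point process and asks how bursty the inter-occurrence gaps are; shuffling a text destroys this clustering. A toy version (the paper's full battery also includes network metrics such as assortativity and selectivity):

```python
from statistics import mean, stdev

def burstiness(tokens, word):
    # Coefficient of variation of gaps between successive occurrences of
    # `word`: 0 for perfectly periodic use, larger for clustered (bursty) use.
    pos = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return stdev(gaps) / mean(gaps)
```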
[Efficacy of the keyword mnemonic method in adults].
Campos, Alfredo; Pérez-Fabello, María José; Camino, Estefanía
2010-11-01
Two experiments assessed the efficacy of the keyword mnemonic method in adults. In Experiment 1, immediate and delayed recall (at a one-day interval) were assessed by comparing the results of a group of adults using the keyword mnemonic method with those of a group using the repetition method. The mean age of the sample was 59.35 years. Subjects were required to learn a list of 16 words translated from Latin into Spanish. Participants who used keyword mnemonics devised by other experimental participants with the same characteristics obtained significantly higher immediate and delayed recall scores than participants using the repetition method. In Experiment 2, other participants had to learn a list of 24 Latin words translated into Spanish using the keyword mnemonic method reinforced with pictures. Immediate and delayed recall were significantly greater in the keyword mnemonic group than in the repetition method group.
A keyword spotting model using perceptually significant energy features
NASA Astrophysics Data System (ADS)
Umakanthan, Padmalochini
The task of a keyword recognition system is to detect the presence of certain words in a conversation based on the linguistic information present in human speech. Such keyword spotting systems have applications in homeland security, telephone surveillance and human-computer interfacing. The general procedure of a keyword spotting system involves feature generation and matching. In this work, a new set of features based on the psycho-acoustic masking nature of human speech is proposed. After developing these features, a time-aligned pattern matching process was implemented to locate the target words in unknown speech. A word boundary detection technique based on frame classification using the nonlinear characteristics of speech is also addressed in this work. Validation of this keyword spotting model was done using widely used cepstral features. The experimental results indicate the viability of using these perceptually significant features as an augmented feature set in keyword spotting.
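The time-aligned matching step in systems like this is commonly a dynamic-time-warping (DTW) alignment between a keyword template and a stretch of the test utterance. A generic scalar-feature sketch (the thesis's actual features are multidimensional):

```python
import math

def dtw_distance(a, b):
    # Classic dynamic-time-warping distance between two 1D feature sequences:
    # D[i][j] is the cost of the best alignment of a[:i] with b[:j].
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because the warping path may repeat frames, a template matches a temporally stretched rendition of the same word at zero cost.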
Limitations of the mnemonic-keyword method.
Campos, Alfredo; González, María Angeles; Amor, Angeles
2003-10-01
The effectiveness of the mnemonic-keyword method was investigated in 4 experiments in which participants were required to learn the 1st-language (L1, Spanish) equivalents of a list of 30 2nd-language words (L2, Latin). Experiments 1 (adolescents) and 2 (adults) were designed to assess whether the keyword method was more effective than the rote method; the researcher supplied the keyword, and the participants were allowed to pace themselves through the list. Experiments 3 (adolescents) and 4 (adults) were similar to Experiments 1 and 2 except that the participants were also supplied with a drawing that illustrated the relationship between the keyword and the L1 target word. All the experiments were performed with groups of participants in their classrooms (i.e., not in a laboratory context). In all experiments, the rote method was significantly more effective than was the keyword method.
CoPub: a literature-based keyword enrichment tool for microarray data analysis.
Frijters, Raoul; Heupers, Bart; van Beek, Pieter; Bouwhuis, Maurice; van Schaik, René; de Vlieg, Jacob; Polman, Jan; Alkema, Wynand
2008-07-01
Medline is a rich information source, from which links between genes and keywords describing biological processes, pathways, drugs, pathologies and diseases can be extracted. We developed a publicly available tool called CoPub that uses the information in the Medline database for the biological interpretation of microarray data. CoPub allows batch input of multiple human, mouse or rat genes and produces lists of keywords from several biomedical thesauri that are significantly correlated with the set of input genes. These lists link to Medline abstracts in which the co-occurring input genes and correlated keywords are highlighted. Furthermore, CoPub can graphically visualize differentially expressed genes and over-represented keywords in a network, providing detailed insight in the relationships between genes and keywords, and revealing the most influential genes as highly connected hubs. CoPub is freely accessible at http://services.nbic.nl/cgi-bin/copub/CoPub.pl.
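Keyword over-representation of the kind CoPub reports is conventionally scored with a hypergeometric tail probability; a sketch of that scoring (parameter names are ours, and CoPub's own scoring may differ in detail):

```python
from scipy.stats import hypergeom

def keyword_enrichment_p(k, n, K, N):
    # P(at least k of the n input genes are linked to the keyword),
    # given K of all N corpus genes are linked: upper hypergeometric tail.
    return hypergeom.sf(k - 1, N, K, n)
```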
ERIC Educational Resources Information Center
National Center for Educational Statistics (DHEW/OE), Washington, DC.
In response to needs expressed by the community of higher education institutions, the National Center for Educational Statistics has produced early estimates of a selected group of mean salaries of instructional faculty in institutions of higher education in 1972-73. The number and salaries of male and female instructional staff by rank are of…
Thomas, Elaine
2005-01-01
This article is the second in a series of three that will give health care professionals (HCPs) a sound introduction to medical statistics (Thomas, 2004). The objective of research is to find out about the population at large. However, it is generally not possible to study the whole of the population and research questions are addressed in an appropriate study sample. The next crucial step is then to use the information from the sample of individuals to make statements about the wider population of like individuals. This procedure of drawing conclusions about the population, based on study data, is known as inferential statistics. The findings from the study give us the best estimate of what is true for the relevant population, given the sample is representative of the population. It is important to consider how accurate this best estimate is, based on a single sample, when compared to the unknown population figure. Any difference between the observed sample result and the population characteristic is termed the sampling error. This article will cover the two main forms of statistical inference (hypothesis tests and estimation) along with issues that need to be addressed when considering the implications of the study results. Copyright (c) 2005 Whurr Publishers Ltd.
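The estimation form of inference described above reduces, in the simplest case, to a point estimate of a population mean with a t-based confidence interval quantifying the sampling error; a sketch:

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

def mean_ci(sample, level=0.95):
    # Best estimate of the population mean, plus a confidence interval
    # reflecting the uncertainty from observing only this sample.
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    half = t.ppf((1 + level) / 2, n - 1) * s / sqrt(n)
    return m, (m - half, m + half)
```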
Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos
2014-01-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564
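The per-subspace scoring described here amounts to fitting a PCA model of normative data and measuring how far a test sample falls outside it. A much-simplified, single-subspace sketch using reconstruction error (the paper's method iterates this over many sampled subspaces with target-specific feature selection):

```python
import numpy as np

def pca_abnormality(train, test_vec, n_components=2):
    # Distance from a test sample to the PCA subspace of normative data:
    # large residuals flag deviations from normality.
    mu = train.mean(axis=0)
    _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
    V = Vt[:n_components].T          # top principal directions
    t = test_vec - mu
    recon = V @ (V.T @ t)            # projection onto the normal subspace
    return np.linalg.norm(t - recon)
```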
Statistical Machine Learning for Structured and High Dimensional Data
2014-09-17
AFRL-OSR-VA-TR-2014-0234: Statistical Machine Learning for Structured and High Dimensional Data. Larry Wasserman, Carnegie Mellon University. Final report, Dec 2009 - Aug 2014. Keywords: machine learning; high-dimensional statistics; resource-constrained statistical estimation. Responsible person: John Lafferty, 773-702-3813.
A critique of the usefulness of inferential statistics in applied behavior analysis
Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.
1998-01-01
Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304
Chou, C P; Bentler, P M; Satorra, A
1991-11-01
Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses
Jackson, Dan; White, Ian R; Riley, Richard D
2012-01-01
Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. This raises the question of how to quantify heterogeneity in the multivariate setting. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2. We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic. Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
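In the univariate case, the R2 statistic that the authors generalise is the ratio of the pooled-estimate variances under the random- and fixed-effects models, and an I2-type proportion follows from it. A sketch of our reading of that univariate construction (not the paper's multivariate matrix version):

```python
import numpy as np

def r2_and_i2(within_variances, tau2):
    # R^2: variance of the pooled estimate under random effects divided by
    # its variance under fixed effects (>= 1). The quantity 1 - 1/R^2 is the
    # fraction of that variance attributable to between-study heterogeneity.
    v = np.asarray(within_variances, float)
    var_fixed = 1.0 / np.sum(1.0 / v)
    var_random = 1.0 / np.sum(1.0 / (v + tau2))
    R2 = var_random / var_fixed
    return R2, 1.0 - 1.0 / R2
```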
Analysis of worldwide research in the field of cybernetics during 1997-2011.
Singh, Virender; Perdigones, Alicia; García, José Luis; Cañas-Guerrero, Ignacio; Mazarrón, Fernando R
2014-12-01
The study provides an overview of the research activity carried out in the field of cybernetics. To do so, all research papers from 1997 to 2011 (16,445 research papers) under the category of "Computer Science, Cybernetics" of Web of Science have been processed using our in-house software which is developed specifically for this purpose. Among its multiple capabilities, this software analyses individual and compound keywords, quantifies productivity taking into account the work distribution, estimates the impact of each article and determines the collaborations established at different scales. Keywords analysis identifies the evolution of the most important research topics in the field of cybernetics and their specificity in biological aspects, as well as the research topics with lesser interest. The analysis of productivity, impact and collaborations provides a framework to assess research activity in a specific and realistic context. The geographical and institutional distribution of publications reveals the leading countries and research centres, analysing their relation to main research journals. Moreover, collaborations analysis reveals great differences in terms of internationalization and complexity of research networks. The results of this study may be very useful for the characterization and the decisions made by research in the field of cybernetics.
Marginal Structural Models with Counterfactual Effect Modifiers.
Zheng, Wenjing; Luo, Zhehui; van der Laan, Mark J
2018-06-08
In health and social sciences, research questions often involve systematic assessment of the modification of treatment causal effect by patient characteristics. In longitudinal settings, time-varying or post-intervention effect modifiers are also of interest. In this work, we investigate the robust and efficient estimation of the Counterfactual-History-Adjusted Marginal Structural Model (van der Laan MJ, Petersen M. Statistical learning of origin-specific statically optimal individualized treatment rules. Int J Biostat. 2007;3), which models the conditional intervention-specific mean outcome given a counterfactual modifier history in an ideal experiment. We establish the semiparametric efficiency theory for these models, and present a substitution-based, semiparametric efficient and doubly robust estimator using the targeted maximum likelihood estimation methodology (TMLE, e.g. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat. 2006;2, van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data, 1st ed. Springer Series in Statistics. Springer, 2011). To facilitate implementation in applications where the effect modifier is high dimensional, our third contribution is a projected influence function (and the corresponding projected TMLE estimator), which retains most of the robustness of its efficient peer and can be easily implemented in applications where the use of the efficient influence function becomes taxing. We compare the projected TMLE estimator with an Inverse Probability of Treatment Weighted estimator (e.g. Robins JM. Marginal structural models. In: Proceedings of the American Statistical Association. Section on Bayesian Statistical Science, 1-10. 1997a, Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. 2000;11:561-570), and a non-targeted G-computation estimator (Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Math Modell. 1986;7:1393-1512.). The comparative performance of these estimators is assessed in a simulation study. The use of the projected TMLE estimator is illustrated in a secondary data analysis for the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial where effect modifiers are subject to missing at random.
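The IPTW comparator mentioned above weights each subject observed at the treatment level of interest by the inverse of its treatment probability. A minimal (Hajek-normalised) point-treatment sketch, far simpler than the longitudinal estimators in the paper:

```python
import numpy as np

def iptw_mean(y, a, propensity, treatment=1):
    # Estimate the counterfactual mean E[Y(treatment)] by weighting subjects
    # by 1 / P(A = observed a | covariates), then taking a weighted mean.
    y, a, p = (np.asarray(z, float) for z in (y, a, propensity))
    obs = a == treatment
    w = obs / np.where(a == 1, p, 1 - p)
    return np.sum(w * y) / np.sum(w)
```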
NASA Astrophysics Data System (ADS)
Mazidi, Hesam; Nehorai, Arye; Lew, Matthew D.
2018-02-01
In single-molecule (SM) super-resolution microscopy, the complexity of a biological structure, high molecular density, and a low signal-to-background ratio (SBR) may lead to imaging artifacts without a robust localization algorithm. Moreover, engineered point spread functions (PSFs) for 3D imaging pose difficulties due to their intricate features. We develop a Robust Statistical Estimation algorithm, called RoSE, that enables joint estimation of the 3D location and photon counts of SMs accurately and precisely using various PSFs under conditions of high molecular density and low SBR.
Estimation of diagnostic test accuracy without full verification: a review of latent class methods
Collins, John; Huynh, Minh
2014-01-01
The performance of a diagnostic test is best evaluated against a reference test that is without error. For many diseases, this is not possible, and an imperfect reference test must be used. However, diagnostic accuracy estimates may be biased if inaccurately verified status is used as the truth. Statistical models have been developed to handle this situation by treating disease as a latent variable. In this paper, we conduct a systematized review of statistical methods using latent class models for estimating test accuracy and disease prevalence in the absence of complete verification. PMID:24910172
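The bias this review warns about, treating an imperfect reference as truth, can be computed exactly from the test and reference operating characteristics, assuming the two tests err independently given true disease status. A sketch with hypothetical numbers:

```python
def apparent_accuracy(se_t, sp_t, se_r, sp_r, prev):
    # Apparent sensitivity/specificity of a test judged against an imperfect
    # reference, assuming conditional independence given true disease status.
    d, nd = prev, 1 - prev
    tp = d * se_t * se_r + nd * (1 - sp_t) * (1 - sp_r)   # test+, ref+
    fn = d * (1 - se_t) * se_r + nd * sp_t * (1 - sp_r)   # test-, ref+
    tn = nd * sp_t * sp_r + d * (1 - se_t) * (1 - se_r)   # test-, ref-
    fp = nd * (1 - sp_t) * sp_r + d * se_t * (1 - se_r)   # test+, ref-
    return tp / (tp + fn), tn / (tn + fp)
```

With a perfect reference the apparent values equal the true ones; with a mediocre reference at low prevalence, apparent sensitivity can collapse, which is exactly the bias that motivates the latent class models reviewed here.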
An estimation of Canadian population exposure to cosmic rays from air travel.
Chen, Jing; Newton, Dustin
2013-03-01
Based on air travel statistics in 1984, it was estimated that less than 4% of the population dose from cosmic ray exposure would result from air travel. In the present study, cosmic ray doses were calculated for more than 3,000 flights departing from more than 200 Canadian airports using actual flight profiles. Based on currently available air travel statistics, the annual per capita effective dose from air transportation is estimated to be 32 μSv for Canadians, about 10% of the average cosmic ray dose received at ground level (310 μSv per year).
Low statistical power in biomedical science: a review of three human research domains.
Dumas-Mallet, Estelle; Button, Katherine S; Boraud, Thomas; Gonon, Francois; Munafò, Marcus R
2017-02-01
Studies with low statistical power increase the likelihood that a statistically significant finding represents a false positive result. We conducted a review of meta-analyses of studies investigating the association of biological, environmental or cognitive parameters with neurological, psychiatric and somatic diseases, excluding treatment studies, in order to estimate the average statistical power across these domains. Taking the effect size indicated by a meta-analysis as the best estimate of the likely true effect size, and assuming a threshold for declaring statistical significance of 5%, we found that approximately 50% of studies have statistical power in the 0-10% or 11-20% range, well below the minimum of 80% that is often considered conventional. Studies with low statistical power appear to be common in the biomedical sciences, at least in the specific subject areas captured by our search strategy. However, we also observe evidence that this depends in part on research methodology, with candidate gene studies showing very low average power and studies using cognitive/behavioural measures showing high average power. This warrants further investigation.
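The power figures quoted above follow from the standard normal-approximation power formula for a two-sided two-sample comparison. A stand-alone sketch (not tied to the paper's data; the sample sizes below are illustrative):

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z test for
    standardized effect size d with n_per_group subjects per arm."""
    z_alpha = 1.959963984540054            # Phi^{-1}(1 - alpha/2)
    ncp = d * sqrt(n_per_group / 2)        # noncentrality
    return norm_cdf(ncp - z_alpha) + norm_cdf(-ncp - z_alpha)

power_small = two_sample_power(0.2, 30)    # small effect, 30/arm: low power
power_large = two_sample_power(0.8, 30)    # large effect, 30/arm
```

With a small standardized effect (d = 0.2) and 30 subjects per arm, power is only about 12%, squarely in the low-power range the review describes; the same n with d = 0.8 gives roughly 87%.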
Low statistical power in biomedical science: a review of three human research domains
Dumas-Mallet, Estelle; Button, Katherine S.; Boraud, Thomas; Gonon, Francois
2017-01-01
Studies with low statistical power increase the likelihood that a statistically significant finding represents a false positive result. We conducted a review of meta-analyses of studies investigating the association of biological, environmental or cognitive parameters with neurological, psychiatric and somatic diseases, excluding treatment studies, in order to estimate the average statistical power across these domains. Taking the effect size indicated by a meta-analysis as the best estimate of the likely true effect size, and assuming a threshold for declaring statistical significance of 5%, we found that approximately 50% of studies have statistical power in the 0–10% or 11–20% range, well below the minimum of 80% that is often considered conventional. Studies with low statistical power appear to be common in the biomedical sciences, at least in the specific subject areas captured by our search strategy. However, we also observe evidence that this depends in part on research methodology, with candidate gene studies showing very low average power and studies using cognitive/behavioural measures showing high average power. This warrants further investigation. PMID:28386409
NASA Technical Reports Server (NTRS)
Pierson, Willard J., Jr.
1989-01-01
The values of the Normalized Radar Backscattering Cross Section (NRCS), sigma (o), obtained by a scatterometer are random variables whose variance is a known function of the expected value. The probability density function can be obtained from the normal distribution. Models for the expected value obtain it as a function of the properties of the waves on the ocean and the winds that generated the waves. Point estimates of the expected value were found from various statistics given the parameters that define the probability density function for each value. Random intervals were derived with a preassigned probability of containing that value. A statistical test to determine whether or not successive values of sigma (o) are truly independent was derived. The maximum likelihood estimates for wind speed and direction were found, given a model for backscatter as a function of the properties of the waves on the ocean. These estimates are biased as a result of the terms in the equation that involve natural logarithms, and calculations of the point estimates of the maximum likelihood values are used to show that the contributions of the logarithmic terms are negligible and that the terms can be omitted.
Kim, Kiyeon; Omori, Ryosuke; Ito, Kimihito
2017-12-01
The estimation of the basic reproduction number is essential to understand epidemic dynamics, and time series data of infected individuals are usually used for the estimation. However, such data are not always available. Methods to estimate the basic reproduction number using genealogies constructed from nucleotide sequences of pathogens have been proposed so far. Here, we propose a new method to estimate epidemiological parameters of outbreaks using the time series change of Tajima's D statistic on the nucleotide sequences of pathogens. To relate the time evolution of Tajima's D to the number of infected individuals, we constructed a parsimonious mathematical model describing both the transmission process of pathogens among hosts and the evolutionary process of the pathogens. As a case study we applied this method to field data of nucleotide sequences of pandemic influenza A (H1N1) 2009 viruses collected in Argentina. The Tajima's D-based method estimated the basic reproduction number to be 1.55 with 95% highest posterior density (HPD) between 1.31 and 2.05, and the date of the epidemic peak to be 10th July with 95% HPD between 22nd June and 9th August. The estimated basic reproduction number was consistent with estimates from a birth-death skyline plot and from the time series of the number of infected individuals. These results suggest that Tajima's D statistic on nucleotide sequences of pathogens could be useful for estimating epidemiological parameters of outbreaks. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
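Tajima's D itself is a closed-form statistic on an alignment. A compact sketch of the standard formula (toy sequences, not the Argentine H1N1 data):

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned sequences:
    contrasts mean pairwise differences (pi) with Watterson's
    estimate S/a1, scaled by the usual variance constants."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

d_stat = tajimas_d(["AAAAAA", "AAAAAT", "AAAATT", "AAATTT"])
```

Tracking this statistic over successive sampling windows gives the time series the authors couple to their transmission model.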
Smooth extrapolation of unknown anatomy via statistical shape models
NASA Astrophysics Data System (ADS)
Grupp, R. B.; Chiang, H.; Otake, Y.; Murphy, R. J.; Gordon, C. R.; Armand, M.; Taylor, R. H.
2015-03-01
Several methods to perform extrapolation of unknown anatomy were evaluated. The primary application is to enhance surgical procedures that may use partial medical images or medical images of incomplete anatomy. Le Fort-based face-jaw-teeth transplant is one such procedure. From CT data of 36 skulls and 21 mandibles, separate Statistical Shape Models of the anatomical surfaces were created. Using the Statistical Shape Models, incomplete surfaces were projected to obtain complete surface estimates. The surface estimates exhibit non-zero error in regions where the true surface is known; it is desirable to keep the true surface and seamlessly merge the estimated unknown surface. Existing extrapolation techniques produce non-smooth transitions from the true surface to the estimated surface, resulting in additional error and a less aesthetically pleasing result. The three extrapolation techniques evaluated were: copying and pasting of the surface estimate (non-smooth baseline), a feathering between the patient surface and surface estimate, and an estimate generated via a Thin Plate Spline trained from displacements between the surface estimate and corresponding vertices of the known patient surface. Feathering and Thin Plate Spline approaches both yielded smooth transitions. However, feathering corrupted known vertex values. Leave-one-out analyses were conducted, with 5% to 50% of known anatomy removed from the left-out patient and estimated via the proposed approaches. The Thin Plate Spline approach yielded smaller errors than the other two approaches, with an average vertex error improvement of 1.46 mm and 1.38 mm for the skull and mandible, respectively, over the baseline approach.
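The thin plate spline step can be sketched in 2D: fit a TPS to displacements at vertices where the true surface is known, then evaluate the smooth warp elsewhere. This is an illustrative stand-in for the 3D surface-mesh setting; the points and the displacement field below are invented.

```python
import numpy as np

def tps_fit(pts, vals):
    """Fit a 2D thin plate spline f with f(pts[i]) = vals[i].
    Kernel U(r) = r^2 log r plus an affine part [1, x, y]."""
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        K = np.where(d > 0, d**2 * np.log(d), 0.0)
    P = np.column_stack([np.ones(n), pts])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    coef = np.linalg.solve(A, np.concatenate([vals, np.zeros(3)]))
    return coef[:n], coef[n:]           # kernel weights, affine coefficients

def tps_eval(pts, w, a, q):
    d = np.linalg.norm(q[:, None, :] - pts[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        U = np.where(d > 0, d**2 * np.log(d), 0.0)
    return U @ w + a[0] + q @ a[1:]

rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, size=(40, 2))          # vertices with known displacement
disp = np.sin(pts[:, 0]) * np.cos(pts[:, 1])    # stand-in displacement component

w, a = tps_fit(pts, disp)
recovered = tps_eval(pts, w, a, pts)            # interpolates the training data
extrap = tps_eval(pts, w, a, np.array([[1.5, 0.0]]))  # smooth beyond known region
```

Because the TPS interpolates the known displacements exactly, the known vertices are preserved, which is exactly the property that feathering lacks.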
Ham, D Cal; Lin, Carol; Newman, Lori; Wijesooriya, N Saman; Kamb, Mary
2015-06-01
"Probable active syphilis" is defined as seroreactivity in both non-treponemal and treponemal tests. A correction factor of 65%, namely the proportion of pregnant women reactive in one syphilis test type who were likely reactive in the second, was applied to syphilis seropositivity data reported to WHO for global estimates of syphilis during pregnancy. The aim of this study was to identify more accurate correction factors based on the test type reported. Medline was searched using: "Syphilis [Mesh] and Pregnancy [Mesh]," "Syphilis [Mesh] and Prenatal Diagnosis [Mesh]," and "Syphilis [Mesh] and Antenatal [Keyword]." Eligible studies must have reported results for pregnant or puerperal women for both non-treponemal and treponemal serology. We manually calculated the crude percent estimates of subjects with both reactive treponemal and reactive non-treponemal tests among subjects with reactive treponemal tests and among subjects with reactive non-treponemal tests. We summarized the percent estimates using random effects models. Countries reporting both reactive non-treponemal and reactive treponemal testing required no correction factor. Countries reporting non-treponemal testing or treponemal testing alone required correction factors of 52.2% and 53.6%, respectively. Countries not reporting test type required a correction factor of 68.6%. Future estimates should adjust reported maternal syphilis seropositivity by test type to ensure accuracy. Published by Elsevier Ireland Ltd.
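Applying the test-type-specific correction factors reported above is simple arithmetic. A hedged sketch (the function name and the example 4% seropositivity figure are illustrative, not from the paper):

```python
def adjust_seropositivity(reported, test_type):
    """Scale a reported seropositivity proportion to 'probable active
    syphilis' using the test-type-specific correction factors from the
    abstract above."""
    factors = {
        "both": 1.0,                # both non-treponemal and treponemal reported
        "nontreponemal_only": 0.522,
        "treponemal_only": 0.536,
        "unspecified": 0.686,       # test type not reported
    }
    return reported * factors[test_type]

adjusted = adjust_seropositivity(0.04, "treponemal_only")  # 4% reported
```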
Modelling past land use using archaeological and pollen data
NASA Astrophysics Data System (ADS)
Pirzamanbein, Behnaz; Lindström, Johan; Poska, Anneli; Gaillard-Lemdahl, Marie-José
2016-04-01
Accurate maps of past land use are necessary for studying the impact of anthropogenic land-cover changes on climate and biodiversity. We develop a Bayesian hierarchical model to reconstruct past land use using Gaussian Markov random fields. The model uses two observation sets: 1) archaeological data, representing human settlements, urbanization and agricultural findings; and 2) pollen-based estimates of the three land-cover types Coniferous forest, Broadleaved forest and Unforested/Open land. The pollen-based estimates are obtained from the REVEALS model, based on pollen counts from lakes and bogs. The model uses the sparse pollen-based estimates to reconstruct spatially continuous cover of the three land-cover types. Using the open-land component and the archaeological data, the extent of land use is reconstructed. The model is applied to three time periods, centred around 1900 CE, 1000 and 4000 BCE, over Sweden, for which both pollen-based estimates and archaeological data are available. To estimate the model parameters and land use, a block-updated Markov chain Monte Carlo (MCMC) algorithm is applied. Using the MCMC posterior samples, uncertainties in land-use predictions are computed. Due to the lack of reliable historical land-use data, model results are evaluated by cross-validation. Keywords. Spatial reconstruction, Gaussian Markov random field, Fossil pollen records, Archaeological data, Human land-use, Prediction uncertainty
A meta-analysis of the worldwide prevalence of pica during pregnancy and the postpartum period.
Fawcett, Emily J; Fawcett, Jonathan M; Mazmanian, Dwight
2016-06-01
Although pica has long been associated with pregnancy, the exact prevalence in this population remains unknown. To estimate the prevalence of pica during pregnancy and the postpartum period, and to explain variations in prevalence estimates by examining potential moderating variables. PsycARTICLES, PsycINFO, PubMed, and Google Scholar were searched from inception to February 2014 using the keywords pica, prevalence, and epidemiology. Articles estimating pica prevalence during pregnancy and/or the postpartum period using a self-report questionnaire or interview were included. Study characteristics, pica prevalence, and eight potential moderating variables were recorded (parity, anemia, duration of pregnancy, mean maternal age, education, sampling method employed, region, and publication date). Random-effects models were employed. In total, 70 studies were included, producing an aggregate prevalence estimate of 27.8% (95% confidence interval 22.8-33.3). In light of substantial heterogeneity within the study model, the primary focus was identifying moderator variables. Pica prevalence was higher in Africa compared with elsewhere in the world, increased as the prevalence of anemia increased, and decreased as educational attainment increased. Geographical region, anemia, and education were found to moderate pica prevalence, partially explaining the heterogeneity in prevalence estimates across the literature. Copyright © 2016 International Federation of Gynecology and Obstetrics. Published by Elsevier Ireland Ltd. All rights reserved.
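Random-effects pooling of the kind used in such meta-analyses is commonly the DerSimonian-Laird estimator. A generic sketch (the paper's actual study data and any transformation of the prevalences are not reproduced; the inputs below are invented):

```python
from math import sqrt

def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooled estimate and its
    standard error, from per-study estimates and their variances."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q heterogeneity statistic and the method-of-moments tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    # Re-weight with between-study variance added
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, sqrt(1 / sum(w_star))

pooled, se = dersimonian_laird([0.30, 0.22, 0.35, 0.18],
                               [0.002, 0.003, 0.004, 0.002])
```

When heterogeneity is substantial, as the review reports, tau-squared widens the pooled confidence interval relative to a fixed-effect analysis, and moderator analysis (region, anemia, education) becomes the more informative output.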
Development of a technique for estimating noise covariances using multiple observers
NASA Technical Reports Server (NTRS)
Bundick, W. Thomas
1988-01-01
Friedland's technique for estimating the unknown noise variances of a linear system using multiple observers has been extended by developing a general solution for the estimates of the variances, developing the statistics (mean and standard deviation) of these estimates, and demonstrating the solution on two examples.
Misclassification bias in areal estimates
Raymond L. Czaplewski
1992-01-01
In addition to thematic maps, remote sensing provides estimates of area in different thematic categories. Areal estimates are frequently used for resource inventories, management planning, and assessment analyses. Misclassification causes bias in these statistical areal estimates. For example, if a small percentage of a common cover type is misclassified as a rare...
Sample Size Estimation: The Easy Way
ERIC Educational Resources Information Center
Weller, Susan C.
2015-01-01
This article presents a simple approach to making quick sample size estimates for basic hypothesis tests. Although there are many sources available for estimating sample sizes, methods are not often integrated across statistical tests, levels of measurement of variables, or effect sizes. A few parameters are required to estimate sample sizes and…
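A quick sample-size estimate of the kind described can be computed from the standard normal-approximation formula n = 2((z_alpha + z_beta)/d)^2 per group; this sketch is not the article's own worked method:

```python
from math import ceil

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample comparison of means with
    standardized effect size d (normal approximation, default
    alpha = 0.05 and power = 0.80)."""
    z_alpha = 1.96     # Phi^{-1}(0.975)
    z_beta = 0.8416    # Phi^{-1}(0.80)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

n_medium = n_per_group(0.5)   # medium effect size -> about 63 per group
```

The same two quantiles and the effect size are the "few parameters" needed; changing the level of measurement mostly changes how d is defined, not the arithmetic.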
Howard Stauffer; Nadav Nur
2005-01-01
The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners In Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examines a hierarchical model...
Explorations in Statistics: Correlation
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2010-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This sixth installment of "Explorations in Statistics" explores correlation, a familiar technique that estimates the magnitude of a straight-line relationship between two variables. Correlation is meaningful only when the…
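The correlation coefficient the installment explores can be computed from first principles. A short sketch with invented data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
```

For a perfectly linear relationship r is exactly 1 (or -1); values between reflect scatter about the straight line, which is why the statistic says nothing about curved relationships.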
U.S. Population Data 1969-2016 - SEER Population Data
Download county population estimates used in SEER*Stat to calculate cancer incidence and mortality rates. The estimates are a modification of the U.S. Census Bureau's Population Estimates Program, in collaboration with National Center for Health Statistics.
Yao, Xiaohui; Yan, Jingwen; Ginda, Michael; Börner, Katy; Saykin, Andrew J; Shen, Li
2017-01-01
Alzheimer's disease neuroimaging initiative (ADNI) is a landmark imaging and omics study in AD. ADNI research literature has increased substantially over the past decade, which poses challenges for effectively communicating information about the results and impact of ADNI-related studies. In this work, we employed advanced information visualization techniques to perform a comprehensive and systematic mapping of the ADNI scientific growth and impact over a period of 12 years. Citation information of ADNI-related publications from 01/01/2003 to 05/12/2015 was downloaded from the Scopus database. Five fields, including authors, years, affiliations, sources (journals), and keywords, were extracted and preprocessed. Statistical analyses were performed on basic publication data as well as journal and citations information. Science mapping workflows were conducted using the Science of Science (Sci2) Tool to generate geospatial, topical, and collaboration visualizations at the micro (individual) to macro (global) levels such as geospatial layouts of institutional collaboration networks, keyword co-occurrence networks, and author collaboration networks evolving over time. During the studied period, 996 ADNI manuscripts were published across 233 journals and conference proceedings. The number of publications grew linearly from 2008 to 2015, as did the number of involved institutions. ADNI publications received many more citations than typical papers from the same set of journals. Collaborations were visualized at multiple levels, including authors, institutions, and research areas. The evolution of key ADNI research topics was also plotted over the studied period. Both statistical and visualization results demonstrate the increasing attention to ADNI research, strong citation impact of ADNI publications, the expanding collaboration networks among researchers, institutions and ADNI core areas, and the dynamic evolution of ADNI research topics.
The visualizations presented here can help improve daily decision making based on a deep understanding of existing patterns and trends using proven and replicable data analysis and visualization methods. They have great potential to provide new insights and actionable knowledge for helping translational research in AD.
Yao, Xiaohui; Yan, Jingwen; Ginda, Michael; Börner, Katy; Saykin, Andrew J.
2017-01-01
Background Alzheimer’s disease neuroimaging initiative (ADNI) is a landmark imaging and omics study in AD. ADNI research literature has increased substantially over the past decade, which poses challenges for effectively communicating information about the results and impact of ADNI-related studies. In this work, we employed advanced information visualization techniques to perform a comprehensive and systematic mapping of the ADNI scientific growth and impact over a period of 12 years. Methods Citation information of ADNI-related publications from 01/01/2003 to 05/12/2015 were downloaded from the Scopus database. Five fields, including authors, years, affiliations, sources (journals), and keywords, were extracted and preprocessed. Statistical analyses were performed on basic publication data as well as journal and citations information. Science mapping workflows were conducted using the Science of Science (Sci2) Tool to generate geospatial, topical, and collaboration visualizations at the micro (individual) to macro (global) levels such as geospatial layouts of institutional collaboration networks, keyword co-occurrence networks, and author collaboration networks evolving over time. Results During the studied period, 996 ADNI manuscripts were published across 233 journals and conference proceedings. The number of publications grew linearly from 2008 to 2015, so did the number of involved institutions. ADNI publications received much more citations than typical papers from the same set of journals. Collaborations were visualized at multiple levels, including authors, institutions, and research areas. The evolution of key ADNI research topics was also plotted over the studied period. 
Conclusions Both statistical and visualization results demonstrate the increasing attention of ADNI research, strong citation impact of ADNI publications, the expanding collaboration networks among researchers, institutions and ADNI core areas, and the dynamic evolution of ADNI research topics. The visualizations presented here can help improve daily decision making based on a deep understanding of existing patterns and trends using proven and replicable data analysis and visualization methods. They have great potential to provide new insights and actionable knowledge for helping translational research in AD. PMID:29095836
Lu, Xinxing; Han, Hu; Xing, Nianzeng; Tian, Long
2015-09-22
To systematically assess the efficacy and safety of oral sildenafil citrate for post bilateral nerve-sparing radical prostatectomy (post-BNSRP) erectile dysfunction (ED). The keywords "sildenafil" and "radical prostatectomy" were used to search Medline, PubMed, Web of Science, the Cochrane Library, CNKI and the WanFang Database. The title, abstract and keywords of each article were independently screened by two reviewers. Randomized controlled trials (RCTs) published between 1990 and 2014 were retrieved according to our inclusion and exclusion criteria. The efficacy and safety of oral sildenafil citrate for post-BNSRP ED were systematically assessed by meta-analysis. Four RCTs with 320 cases were included after literature retrieval and filtering. The potency rates were 32.1% (35/109) and 11.3% (7/62) in the sildenafil and placebo groups, respectively, a statistically significant difference (OR = 4.66, 95% CI: 1.79-12.11). The IIEF-5 score in the sildenafil group was significantly higher than that in the placebo group (WMD: 4.73, 95% CI: 3.26-6.19). In subgroup meta-analysis, the potency rates in the high-dose sildenafil, low-dose sildenafil and placebo groups were 30.4% (14/46), 25.0% (10/40) and 4.5% (2/44), respectively. There were statistically significant differences between the high-dose subgroup and the placebo group (OR = 9.32, 95% CI: 1.96-44.23), and between the low-dose subgroup and the placebo group (OR = 6.99, 95% CI: 1.43-34.22), but no significant difference between the high-dose and low-dose subgroups (OR = 1.31, 95% CI: 0.51-3.40). Compared with placebo, sildenafil has considerable efficacy for erectile function rehabilitation as a primary treatment for post-BNSRP ED. There is no significant difference in efficacy between high-dose and low-dose schedules. However, further studies are required to optimize treatment.
Estimation of the geochemical threshold and its statistical significance
Miesch, A.T.
1981-01-01
A statistic is proposed for estimating the geochemical threshold and its statistical significance, or it may be used to identify a group of extreme values that can be tested for significance by other means. The statistic is the maximum gap between adjacent values in an ordered array after each gap has been adjusted for the expected frequency. The values in the ordered array are geochemical values transformed by either ln(?? - ??) or ln(?? - ??) and then standardized so that the mean is zero and the variance is unity. The expected frequency is taken from a fitted normal curve with unit area. The midpoint of an adjusted gap that exceeds the corresponding critical value may be taken as an estimate of the geochemical threshold, and the associated probability indicates the likelihood that the threshold separates two geochemical populations. The adjusted gap test may fail to identify threshold values if the variation tends to be continuous from background values to the higher values that reflect mineralized ground. However, the test will serve to identify other anomalies that may be too subtle to have been noted by other means. © 1981.
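The adjusted-gap idea can be sketched as follows. This is an illustrative reading of the abstract, not Miesch's exact procedure: the critical-value lookup is omitted, and weighting each gap by the unit-normal density at its midpoint stands in for the "expected frequency" adjustment, so that gaps in the sparse tails are not automatically the largest.

```python
from math import exp, pi, sqrt

def max_adjusted_gap(values):
    """Standardize the (already transformed) values, then weight the gap
    between each pair of adjacent ordered values by the unit-normal
    density at the gap midpoint; return the largest adjusted gap and
    its midpoint (the candidate threshold, in standardized units)."""
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    z = sorted((v - m) / s for v in values)
    best_gap, best_mid = 0.0, None
    for lo, hi in zip(z, z[1:]):
        mid = (lo + hi) / 2
        adj = (hi - lo) * exp(-mid * mid / 2) / sqrt(2 * pi)
        if adj > best_gap:
            best_gap, best_mid = adj, mid
    return best_gap, best_mid

# Background population plus a small high-value group
gap, threshold_z = max_adjusted_gap(
    [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 4.0, 4.1, 4.2])
```

On this toy sample the largest adjusted gap falls between the background values and the three high values, so the returned midpoint separates the two groups.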
Pei, Yanbo; Tian, Guo-Liang; Tang, Man-Lai
2014-11-10
Stratified data analysis is an important research topic in many biomedical studies and clinical trials. In this article, we develop five test statistics for testing the homogeneity of proportion ratios for stratified correlated bilateral binary data based on an equal correlation model assumption. Bootstrap procedures based on these test statistics are also considered. To evaluate the performance of these statistics and procedures, we conduct Monte Carlo simulations to study their empirical sizes and powers under various scenarios. Our results suggest that the procedure based on the score statistic performs well generally and is highly recommended. When the sample size is large, procedures based on the commonly used weighted least squares estimate and the logarithmic transformation with the Mantel-Haenszel estimate are recommended, as they do not involve computation of maximum likelihood estimates requiring iterative algorithms. We also derive approximate sample size formulas based on the recommended test procedures. Finally, we apply the proposed methods to analyze a multi-center randomized clinical trial for scleroderma patients. Copyright © 2014 John Wiley & Sons, Ltd.
Some insight on censored cost estimators.
Zhao, H; Cheng, Y; Bang, H
2011-08-30
Censored survival data analysis has been studied for many years. Yet, the analysis of censored mark variables, such as medical cost, quality-adjusted lifetime, and repeated events, faces a unique challenge that makes standard survival analysis techniques invalid. Because of the 'informative' censorship embedded in censored mark variables, the use of the Kaplan-Meier (Journal of the American Statistical Association 1958; 53:457-481) estimator, as an example, will produce biased estimates. Innovative estimators have been developed in the past decade in order to handle this issue. Even though consistent estimators have been proposed, the formulations and interpretations of some estimators are less intuitive to practitioners. On the other hand, more intuitive estimators have been proposed, but their mathematical properties have not been established. In this paper, we prove the analytic identity between some estimators (a statistically motivated estimator and an intuitive estimator) for censored cost data. Efron (1967) made a similar investigation for censored survival data (between the Kaplan-Meier estimator and the redistribute-to-the-right algorithm). Therefore, we view our study as an extension of Efron's work to informatively censored data, so that our findings could be applied to other mark variables. Copyright © 2011 John Wiley & Sons, Ltd.
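For reference, the Kaplan-Meier estimator discussed above in a minimal form (toy data; ties between a death and a censoring at the same time are handled with the usual deaths-first convention). It is valid for ordinary censored survival times; the point of the paper is that applying it directly to a mark variable such as cost is biased, because large costs and late censoring are dependent.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve; events[i] is 1 for an observed
    death and 0 for a censored observation. Returns (t, S(t)) pairs
    at the distinct event times."""
    data = sorted(zip(times, events))
    surv, curve = 1.0, []
    for t in sorted({tt for tt, e in data if e == 1}):
        d = sum(e for tt, e in data if tt == t)   # deaths at t
        n = sum(1 for tt, _ in data if tt >= t)   # at risk just before t
        surv *= 1 - d / n
        curve.append((t, surv))
    return curve

km = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

The redistribute-to-the-right algorithm that Efron showed to be equivalent moves each censored observation's probability mass equally onto the observations to its right; the identity proved in the paper is of the same flavor, but for cost estimators.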
Granato, Gregory E.
2014-01-01
The U.S. Geological Survey (USGS) developed the Stochastic Empirical Loading and Dilution Model (SELDM) in cooperation with the Federal Highway Administration (FHWA) to indicate the risk for stormwater concentrations, flows, and loads to be above user-selected water-quality goals and the potential effectiveness of mitigation measures to reduce such risks. SELDM models the potential effect of mitigation measures by using Monte Carlo methods with statistics that approximate the net effects of structural and nonstructural best management practices (BMPs). In this report, structural BMPs are defined as the components of the drainage pathway between the source of runoff and a stormwater discharge location that affect the volume, timing, or quality of runoff. SELDM uses a simple stochastic statistical model of BMP performance to develop planning-level estimates of runoff-event characteristics. This statistical approach can be used to represent a single BMP or an assemblage of BMPs. The SELDM BMP-treatment module has provisions for stochastic modeling of three stormwater treatments: volume reduction, hydrograph extension, and water-quality treatment. In SELDM, these three treatment variables are modeled by using the trapezoidal distribution and the rank correlation with the associated highway-runoff variables. This report describes methods for calculating the trapezoidal-distribution statistics and rank correlation coefficients for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater BMPs and provides the calculated values for these variables. This report also provides robust methods for estimating the minimum irreducible concentration (MIC), which is the lowest expected effluent concentration from a particular BMP site or a class of BMPs. These statistics are different from the statistics commonly used to characterize or compare BMPs. 
They are designed to provide a stochastic transfer function to approximate the quantity, duration, and quality of BMP effluent given the associated inflow values for a population of storm events. A database application and several spreadsheet tools are included in the digital media accompanying this report for further documentation of methods and for future use. In this study, analyses were done with data extracted from a modified copy of the January 2012 version of International Stormwater Best Management Practices Database, designated herein as the January 2012a version. Statistics for volume reduction, hydrograph extension, and water-quality treatment were developed with selected data. Sufficient data were available to estimate statistics for 5 to 10 BMP categories by using data from 40 to more than 165 monitoring sites. Water-quality treatment statistics were developed for 13 runoff-quality constituents commonly measured in highway and urban runoff studies including turbidity, sediment and solids; nutrients; total metals; organic carbon; and fecal coliforms. The medians of the best-fit statistics for each category were selected to construct generalized cumulative distribution functions for the three treatment variables. For volume reduction and hydrograph extension, interpretation of available data indicates that selection of a Spearman’s rho value that is the average of the median and maximum values for the BMP category may help generate realistic simulation results in SELDM. The median rho value may be selected to help generate realistic simulation results for water-quality treatment variables. MIC statistics were developed for 12 runoff-quality constituents commonly measured in highway and urban runoff studies by using data from 11 BMP categories and more than 167 monitoring sites. Four statistical techniques were applied for estimating MIC values with monitoring data from each site. These techniques produce a range of lower-bound estimates for each site. 
Four MIC estimators are proposed as alternatives for selecting a value from among the estimates from multiple sites. Correlation analysis indicates that the MIC estimates from multiple sites were weakly correlated with the geometric mean of inflow values, which indicates that there may be a qualitative or semiquantitative link between the inflow quality and the MIC. Correlations probably are weak because the MIC is influenced by the inflow water quality and the capability of each individual BMP site to reduce inflow concentrations.
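Stochastic draws from the trapezoidal distribution SELDM uses can be generated by inverse-CDF sampling. A sketch with invented parameters, where a, b, c, d are the minimum, lower mode, upper mode, and maximum (this illustrates the distribution family only, not SELDM's fitted BMP statistics or its rank-correlation machinery):

```python
from math import sqrt
import random

def trapezoidal_sample(a, b, c, d, u):
    """Inverse-CDF draw from a trapezoidal density that rises on [a, b],
    is flat on [b, c], and falls on [c, d]; u is a uniform(0,1) variate."""
    h = 2 / (d + c - b - a)        # height of the flat plateau
    Fb = h * (b - a) / 2           # CDF at b
    Fc = Fb + h * (c - b)          # CDF at c
    if u < Fb:                     # rising limb
        return a + sqrt(2 * u * (b - a) / h)
    if u < Fc:                     # flat plateau
        return b + (u - Fb) / h
    return d - sqrt(2 * (1 - u) * (d - c) / h)   # falling limb

random.seed(0)
draws = [trapezoidal_sample(0.0, 0.2, 0.6, 1.0, random.random())
         for _ in range(10000)]
```

Feeding correlated uniforms into `u` (rather than independent ones) is how a rank correlation with the associated runoff variable would be imposed in a Monte Carlo scheme of this kind.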
Galili, Tal; Meilijson, Isaac
2016-01-02
The Rao-Blackwell theorem offers a procedure for converting a crude unbiased estimator of a parameter θ into a "better" one, in fact unique and optimal if the improvement is based on a minimal sufficient statistic that is complete. In contrast, behind every minimal sufficient statistic that is not complete, there is an improvable Rao-Blackwell improvement. This is illustrated via a simple example based on the uniform distribution, in which a rather natural Rao-Blackwell improvement is uniformly improvable. Furthermore, in this example the maximum likelihood estimator is inefficient, and an unbiased generalized Bayes estimator performs exceptionally well. Counterexamples of this sort can be useful didactic tools for explaining the true nature of a methodology and possible consequences when some of the assumptions are violated. [Received December 2014. Revised September 2015.].
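The uniform-distribution setting can be reproduced in a few lines: with X_1, ..., X_n ~ U(0, theta), the crude unbiased estimator 2*X_1, conditioned on the sufficient statistic max(X), becomes ((n+1)/n)*max(X). The simulation below illustrates the variance reduction only; the paper's further point, that such improvements can themselves be improvable when the minimal sufficient statistic is not complete, is not shown here.

```python
import random

random.seed(42)
theta, n, reps = 3.0, 10, 20000
crude, rb = [], []
for _ in range(reps):
    x = [random.uniform(0, theta) for _ in range(n)]
    crude.append(2 * x[0])                  # unbiased but crude
    rb.append((n + 1) / n * max(x))         # E[2*X_1 | max(X)] = ((n+1)/n) max(X)

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

# Both estimators are unbiased for theta; the Rao-Blackwellized one
# has far smaller variance (theta^2/(n(n+2)) versus theta^2/3).
```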
Donato, David I.
2012-01-01
This report presents the mathematical expressions and the computational techniques required to compute maximum-likelihood estimates for the parameters of the National Descriptive Model of Mercury in Fish (NDMMF), a statistical model used to predict the concentration of methylmercury in fish tissue. The expressions and techniques reported here were prepared to support the development of custom software capable of computing NDMMF parameter estimates more quickly and using less computer memory than is currently possible with available general-purpose statistical software. Computation of maximum-likelihood estimates for the NDMMF by numerical solution of a system of simultaneous equations through repeated Newton-Raphson iterations is described. This report explains the derivation of the mathematical expressions required for computational parameter estimation in sufficient detail to facilitate future derivations for any revised versions of the NDMMF that may be developed.
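The NDMMF expressions themselves are not reproduced in this record; as a stand-in, the same Newton-Raphson idea applied to a simpler likelihood with no closed-form MLE, the zero-truncated Poisson rate, where the score equation reduces to xbar = lam / (1 - exp(-lam)):

```python
from math import exp

def zt_poisson_mle(xbar, tol=1e-12):
    """Newton-Raphson for the zero-truncated Poisson rate: solves
    xbar * (1 - exp(-lam)) - lam = 0, where xbar (> 1) is the sample
    mean of the positive counts."""
    lam = xbar                     # starting value
    for _ in range(100):
        f = xbar * (1 - exp(-lam)) - lam
        fp = xbar * exp(-lam) - 1  # derivative of f
        step = f / fp
        lam -= step
        if abs(step) < tol:
            break
    return lam

lam_hat = zt_poisson_mle(2.0)
```

For the NDMMF the same pattern applies with a vector of parameters: the score vector and Hessian replace `f` and `fp`, and each iteration solves a linear system, which is where the memory and speed concerns mentioned above arise.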
Weak Value Amplification is Suboptimal for Estimation and Detection
NASA Astrophysics Data System (ADS)
Ferrie, Christopher; Combes, Joshua
2014-01-01
We show by using statistically rigorous arguments that the technique of weak value amplification does not perform better than standard statistical techniques for the tasks of single parameter estimation and signal detection. Specifically, we prove that postselection, a necessary ingredient for weak value amplification, decreases estimation accuracy and, moreover, arranging for anomalously large weak values is a suboptimal strategy. In doing so, we explicitly provide the optimal estimator, which in turn allows us to identify the optimal experimental arrangement to be the one in which all outcomes have equal weak values (all as small as possible) and the initial state of the meter is the maximal eigenvalue of the square of the system observable. Finally, we give precise quantitative conditions for when weak measurement (measurements without postselection or anomalously large weak values) can mitigate the effect of uncharacterized technical noise in estimation.
Evaluation of methods to estimate lake herring spawner abundance in Lake Superior
Yule, D.L.; Stockwell, J.D.; Cholwek, G.A.; Evrard, L.M.; Schram, S.; Seider, M.; Symbal, M.
2006-01-01
Historically, commercial fishers harvested Lake Superior lake herring Coregonus artedi for their flesh, but recently operators have targeted lake herring for roe. Because no surveys have estimated spawning female abundance, direct estimates of fishing mortality are lacking. The primary objective of this study was to determine the feasibility of using acoustic techniques in combination with midwater trawling to estimate spawning female lake herring densities in a Lake Superior statistical grid (i.e., a 10′ latitude × 10′ longitude area over which annual commercial harvest statistics are compiled). Midwater trawling showed that mature female lake herring were largely pelagic during the night in late November, accounting for 94.5% of all fish caught exceeding 250 mm total length. When calculating acoustic estimates of mature female lake herring, we excluded backscattering from smaller pelagic fishes like immature lake herring and rainbow smelt Osmerus mordax by applying an empirically derived threshold of −35.6 dB. We estimated the average density of mature females in statistical grid 1409 at 13.3 fish/ha and the total number of spawning females at 227,600 (95% confidence interval = 172,500–282,700). Using information on mature female densities, size structure, and fecundity, we estimate that females deposited 3.027 billion (10⁹) eggs in grid 1409 (95% confidence interval = 2.356–3.778 billion). The relative estimation error of the mature female density estimate derived using a geostatistical model-based approach was low (12.3%), suggesting that the employed method was robust. Fishing mortality rates of all mature females and their eggs were estimated at 2.3% and 3.8%, respectively. The techniques described for enumerating spawning female lake herring could be used to develop a more accurate stock–recruitment model for Lake Superior lake herring.
Robust estimation approach for blind denoising.
Rabie, Tamer
2005-11-01
This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.
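A minimal sketch of the general idea, robust M-estimation within a local window, assuming a Huber-type weight function and a fixed window size (the paper uses an optimal-size adaptive window and a spatially coherent image model, both omitted here):

```python
import numpy as np

def huber_window_mean(patch, k=1.345, iters=5):
    """Robust location estimate of a window via Huber M-estimation (IRLS):
    residuals beyond k * scale are downweighted as outliers."""
    mu = np.median(patch)
    for _ in range(iters):
        r = patch - mu
        s = np.median(np.abs(r)) + 1e-12            # robust scale (MAD)
        w = np.minimum(1.0, k * s / np.maximum(np.abs(r), 1e-12))
        mu = np.sum(w * patch) / np.sum(w)
    return mu

def robust_denoise(img, half=2):
    """Replace each pixel by the robust mean of its (2*half+1)^2 window,
    so noise outliers are rejected rather than averaged in."""
    pad = np.pad(img, half, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = huber_window_mean(
                pad[i:i + 2 * half + 1, j:j + 2 * half + 1])
    return out
```

Because the window statistic is an M-estimator rather than a plain mean, isolated impulsive noise pixels receive near-zero weight and the local structure is preserved.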
TRAN-STAT: statistics for environmental transuranic studies, July 1978, Number 5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This issue is concerned with nonparametric procedures for (1) estimating the central tendency of a population, (2) describing data sets through estimating percentiles, (3) estimating confidence limits for the median and other percentiles, (4) estimating tolerance limits and associated numbers of samples, and (5) tests of significance and associated procedures for a variety of testing situations (counterparts to t-tests and analysis of variance). Some characteristics of several nonparametric tests are illustrated using the NAEG ²⁴¹Am aliquot data presented and discussed in the April issue of TRAN-STAT. Some of the statistical terms used here are defined in a glossary. The reference list also includes short descriptions of nonparametric books. 31 references, 3 figures, 1 table.
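Item (3), distribution-free confidence limits for the median, follows directly from order statistics; a minimal sketch:

```python
import math

def median_ci(x, conf=0.95):
    """Distribution-free confidence interval for the median from order
    statistics: the interval (X_(d+1), X_(n-d)) has coverage
    1 - 2 * P(B <= d), where B ~ Binomial(n, 1/2)."""
    xs = sorted(x)
    n = len(xs)
    alpha = (1 - conf) / 2
    d, cdf = -1, 0.0
    for m in range(n):                   # largest d with P(B <= d) <= alpha
        cdf += math.comb(n, m) / 2.0**n
        if cdf > alpha:
            break
        d = m
    if d < 0:
        raise ValueError("sample too small for the requested confidence")
    return xs[d], xs[n - d - 1]          # X_(d+1) and X_(n-d), 0-based
```

For n = 25 this reproduces the standard tabled result that the 8th and 18th order statistics bracket the median with at least 95% confidence.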
Simulated performance of an order statistic threshold strategy for detection of narrowband signals
NASA Technical Reports Server (NTRS)
Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.
1988-01-01
The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
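A minimal sketch of an order-statistic threshold detector (the parameters below are illustrative, not those of the paper): the noise level is estimated from a rank of the spectrum, so a few strong narrowband bins cannot bias it the way they would a mean-level estimate.

```python
import numpy as np

def os_threshold_detect(spectrum, rank_frac=0.5, scale=14.0):
    """Order-statistic threshold detector: estimate the noise floor from
    the rank_frac quantile of the spectrum (robust to narrowband
    outliers), then flag bins above scale * noise.  For unit-mean
    exponential noise, scale=14 puts the threshold near 14*ln(2) ~ 9.7,
    i.e. a per-bin false-alarm probability of roughly exp(-9.7)."""
    noise = np.quantile(spectrum, rank_frac)
    return np.flatnonzero(spectrum > scale * noise)

rng = np.random.default_rng(2)
spec = rng.exponential(1.0, 1024)   # broadband-noise periodogram
spec[[100, 400]] += 50.0            # two strong narrowband interferers
```

A mean-level threshold computed from the same spectrum would be dragged upward by the two strong bins; the median-based estimate is essentially unaffected by them.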
Mager, P P; Rothe, H
1990-10-01
Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics of the regression coefficients in the ordinary least-squares (OLS) model usually applied to QSARs. Besides diagnosing simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. The effects of multicollinearity can be circumvented only if the absolute values of the PCRA estimators are order statistics that decrease monotonically. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive power of the QSAR model.
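The diagnosis described above can be sketched numerically; the collinear two-descriptor setup below is an illustrative assumption, not data from the paper. The PCRA estimator on the near-null component is prone to wild inflation under collinearity (violating the monotone-decrease condition), so that component is dropped before prediction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)    # nearly collinear descriptor pair
X = np.column_stack([x1, x2])
y = x1 + 0.1 * rng.normal(size=n)      # activity driven by the shared signal

Xc = X - X.mean(axis=0)
yc = y - y.mean()
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                     # orthogonal PC scores, norms = s
g = scores.T @ yc / s**2               # PCRA estimators, one per component

# Diagnosis: keep only components whose variance is non-negligible;
# the near-null second component carries the collinearity, not signal.
keep = s**2 / s[0]**2 > 1e-3
y_hat = scores[:, keep] @ g[keep] + y.mean()
print(np.round(g, 2))
```

Prediction from the retained component alone recovers the activity well, while the coefficient on the discarded component reflects only the collinear noise direction.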
Detection of pneumonia using free-text radiology reports in the BioSense system.
Asatryan, Armenak; Benoit, Stephen; Ma, Haobo; English, Roseanne; Elkin, Peter; Tokars, Jerome
2011-01-01
Near real-time disease detection using electronic data sources is a public health priority. Detecting pneumonia is particularly important because it is the manifesting disease of several bioterrorism agents as well as a complication of influenza, including avian and novel H1N1 strains. Text radiology reports are available earlier than physician diagnoses and so could be integral to rapid detection of pneumonia. We performed a pilot study to determine which keywords present in text radiology reports are most highly associated with pneumonia diagnosis. Electronic radiology text reports from 11 hospitals from February 1, 2006 through December 31, 2007 were used. We created a computerized algorithm that searched for selected keywords ("airspace disease", "consolidation", "density", "infiltrate", "opacity", and "pneumonia"), differentiated between clinical history and radiographic findings, and accounted for negations and double negations; this algorithm was tested on a sample of 350 radiology reports. We used the algorithm to study 189,246 chest radiographs, searching for the keywords and determining their association with a final International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis of pneumonia. The outcome measures were the performance of the search algorithm in finding keywords and the association of the keywords with a pneumonia diagnosis. In the sample of 350 radiographs, the search algorithm was highly successful in identifying the selected keywords (sensitivity 98.5%, specificity 100%). Analysis of the 189,246 radiographs showed that the keyword "pneumonia" was the strongest predictor of an ICD-9-CM diagnosis of pneumonia (adjusted odds ratio 11.8) while "density" was the weakest (adjusted odds ratio 1.5). In general, the most highly associated keyword present in the report, regardless of whether a less highly associated keyword was also present, was the best predictor of a diagnosis of pneumonia.
Empirical methods may assist in finding radiology report keywords that are most highly predictive of a pneumonia diagnosis. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
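A highly simplified sketch of keyword search with negation handling. The keyword list is the study's; the negation list, the 30-character look-back window, and the omission of clinical-history separation and double-negation logic are all illustrative simplifications of the BioSense algorithm.

```python
import re

KEYWORDS = ["airspace disease", "consolidation", "density",
            "infiltrate", "opacity", "pneumonia"]
NEGATIONS = ["no", "without", "negative for", "free of"]  # illustrative list

def find_keywords(report):
    """Return the set of keywords asserted (not negated) in a report."""
    text = report.lower()
    found = set()
    for kw in KEYWORDS:
        for m in re.finditer(re.escape(kw), text):
            # look back a short window for a negation cue
            window = text[max(0, m.start() - 30):m.start()]
            if not any(re.search(r"\b" + re.escape(neg) + r"\b", window)
                       for neg in NEGATIONS):
                found.add(kw)
    return found
```

The study's finding that the most highly associated keyword present is the best predictor suggests scoring each report by its highest-ranked asserted keyword.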
Literature-based compound profiling: application to toxicogenomics.
Frijters, Raoul; Verhoeven, Stefan; Alkema, Wynand; van Schaik, René; Polman, Jan
2007-11-01
To reduce continuously increasing costs in drug development, adverse effects of drugs need to be detected as early as possible in the process. In recent years, compound-induced gene expression profiling methodologies have been developed to assess compound toxicity, including Gene Ontology term and pathway over-representation analyses. The objective of this study was to introduce an additional approach, in which literature information is used for compound profiling to evaluate compound toxicity and mode of toxicity. Gene annotations were built by text mining in Medline abstracts for retrieval of co-publications between genes, pathology terms, biological processes and pathways. This literature information was used to generate compound-specific keyword fingerprints, representing over-represented keywords calculated in a set of regulated genes after compound administration. To see whether keyword fingerprints can be used for assessment of compound toxicity, we analyzed microarray data sets of rat liver treated with 11 hepatotoxicants. Analysis of keyword fingerprints of two genotoxic carcinogens, two nongenotoxic carcinogens, two peroxisome proliferators and two randomly generated gene sets, showed that each compound produced a specific keyword fingerprint that correlated with the experimentally observed histopathological events induced by the individual compounds. By contrast, the random sets produced a flat aspecific keyword profile, indicating that the fingerprints induced by the compounds reflect biological events rather than random noise. A more detailed analysis of the keyword profiles of diethylhexylphthalate, dimethylnitrosamine and methapyrilene (MPy) showed that the differences in the keyword fingerprints of these three compounds are based upon known distinct modes of action. 
Visualization of MPy-linked keywords and MPy-induced genes in a literature network enabled us to construct a mode of toxicity proposal for MPy, which is in agreement with known effects of MPy in literature. Compound keyword fingerprinting based on information retrieved from literature is a powerful approach for compound profiling, allowing evaluation of compound toxicity and analysis of the mode of action.
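Keyword over-representation in a set of regulated genes is conventionally scored with a hypergeometric tail probability; a minimal sketch (the paper's exact scoring procedure is not specified here):

```python
from math import comb

def overrep_pvalue(k, n, K, N):
    """P(X >= k) for hypergeometric X: of N annotated genes, K carry the
    keyword; n regulated genes were drawn and k of them carry it."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)
```

A compound's keyword fingerprint would then be the set of keywords with small (multiple-testing-corrected) p-values over its regulated gene set; a randomly drawn gene set yields the flat, aspecific profile described above.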
Bayesian estimation of the transmissivity spatial structure from pumping test data
NASA Astrophysics Data System (ADS)
Demir, Mehmet Taner; Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier
2017-06-01
Estimating the statistical parameters (mean, variance, and integral scale) that define the spatial structure of the transmissivity or hydraulic conductivity fields is a fundamental step for the accurate prediction of subsurface flow and contaminant transport. In practice, the determination of the spatial structure is a challenge because of spatial heterogeneity and data scarcity. In this paper, we describe a novel approach that uses time-drawdown data from multiple pumping tests to determine the transmissivity statistical spatial structure. The method builds on the pumping test interpretation procedure of Copty et al. (2011) (Continuous Derivation method, CD), which uses the time-drawdown data and its time derivative to estimate apparent transmissivity values as a function of radial distance from the pumping well. A Bayesian approach is then used to infer the statistical parameters of the transmissivity field by combining prior information about the parameters and the likelihood function expressed in terms of radially-dependent apparent transmissivities determined from pumping tests. A major advantage of the proposed Bayesian approach is that the likelihood function is readily determined from randomly generated multiple realizations of the transmissivity field, without the need to solve the groundwater flow equation. Applying the method to synthetically-generated pumping test data, we demonstrate that, through a relatively simple procedure, information on the spatial structure of the transmissivity may be inferred from pumping test data. It is also shown that, given the non-uniqueness of the estimation problem, the prior parameter distribution has a significant influence on the results. Results also indicate that the reliability of the estimated transmissivity statistical parameters increases with the number of available pumping tests.
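The core idea, building a likelihood from randomly generated realizations of the field rather than from a flow solver, can be sketched in a deliberately simplified one-dimensional form. The summary statistic, observed value, parameter grid, and flat prior below are all illustrative assumptions, not elements of the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
obs = 1.4                                   # observed summary statistic (assumed)
sigmas = np.linspace(0.5, 2.5, 21)          # candidate field std deviations
prior = np.ones_like(sigmas) / len(sigmas)  # flat prior over the grid

post = np.empty_like(sigmas)
for i, sig in enumerate(sigmas):
    # sample the summary statistic under many random realizations of a
    # field with this candidate parameter -- no flow equation is solved
    sims = rng.normal(0.0, sig, size=(5000, 25)).std(axis=1)
    # Monte Carlo likelihood: fraction of realizations near the observation
    like = np.mean(np.abs(sims - obs) < 0.05)
    post[i] = prior[i] * like
post /= post.sum()                          # normalized posterior
print(sigmas[np.argmax(post)])
```

The posterior concentrates around the parameter values whose simulated summary statistics most often reproduce the observation, and a more informative prior would shift and sharpen it, mirroring the prior sensitivity reported above.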
Adaptation of a Fast Optimal Interpolation Algorithm to the Mapping of Oceangraphic Data
NASA Technical Reports Server (NTRS)
Menemenlis, Dimitris; Fieguth, Paul; Wunsch, Carl; Willsky, Alan
1997-01-01
A fast, recently developed, multiscale optimal interpolation algorithm has been adapted to the mapping of hydrographic and other oceanographic data. This algorithm produces solution and error estimates which are consistent with those obtained from exact least squares methods, but at a small fraction of the computational cost. Problems whose solution would be completely impractical using exact least squares, that is, problems with tens or hundreds of thousands of measurements and estimation grid points, can easily be solved on a small workstation using the multiscale algorithm. In contrast to methods previously proposed for solving large least squares problems, our approach provides estimation error statistics while permitting long-range correlations, using all measurements, and permitting arbitrary measurement locations. The multiscale algorithm itself, published elsewhere, is not the focus of this paper. However, the algorithm requires statistical models having a very particular multiscale structure; it is the development of a class of multiscale statistical models, appropriate for oceanographic mapping problems, with which we concern ourselves in this paper. The approach is illustrated by mapping temperature in the northeastern Pacific. The number of hydrographic stations is kept deliberately small to show that multiscale and exact least squares results are comparable. A portion of the data were not used in the analysis; these data serve to test the multiscale estimates. A major advantage of the present approach is the ability to repeat the estimation procedure a large number of times for sensitivity studies, parameter estimation, and model testing. We have made available by anonymous FTP a set of MATLAB-callable routines which implement the multiscale algorithm and the statistical models developed in this paper.
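For contrast with the multiscale approximation, the exact Gauss-Markov (optimal interpolation) estimate and its error variance can be written in a few lines; the Gaussian covariance model and noise level below are illustrative assumptions, and the cubic-cost matrix inverse is precisely what becomes impractical at the problem sizes the multiscale algorithm targets.

```python
import numpy as np

def optimal_interp(xo, yo, xg, L=1.0, noise=0.05):
    """Exact Gauss-Markov optimal interpolation in 1-D: map observations
    yo at locations xo onto grid xg, assuming unit prior variance, a
    Gaussian covariance of length scale L, and white observation noise.
    Returns the estimate and its posterior error variance."""
    C = (np.exp(-(xo[:, None] - xo[None, :])**2 / (2 * L**2))
         + noise * np.eye(len(xo)))          # obs-obs covariance
    c = np.exp(-(xg[:, None] - xo[None, :])**2 / (2 * L**2))  # grid-obs
    w = c @ np.linalg.inv(C)                 # O(m^3) cost in the obs count
    est = w @ yo
    err = 1.0 - np.sum(w * c, axis=1)        # posterior error variance
    return est, err
```

The error variance falls toward the noise floor near observations and rises back to the prior variance far from them, which is the error-statistics behavior the multiscale solution is required to reproduce.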