Secondary analysis of national survey datasets.
Boo, Sunjoo; Froelicher, Erika Sivarajan
2013-06-01
This paper describes the methodological issues associated with secondary analysis of large national survey datasets. Issues about survey sampling, data collection, and non-response and missing data in terms of methodological validity and reliability are discussed. Although reanalyzing large national survey datasets is an expedient and cost-efficient way of producing nursing knowledge, successful investigations require a methodological consideration of the intrinsic limitations of secondary survey analysis. Nursing researchers using existing national survey datasets should understand potential sources of error associated with survey sampling, data collection, and non-response and missing data. Although it is impossible to eliminate all potential errors, researchers using existing national survey datasets must be aware of the possible influence of errors on the results of the analyses. © 2012 The Authors. Japan Journal of Nursing Science © 2012 Japan Academy of Nursing Science.
Studying Child Care Subsidies with Secondary Data Sources. Methodological Brief OPRE 2012-54
ERIC Educational Resources Information Center
Ha, Yoonsook; Johnson, Anna D.
2012-01-01
This brief describes four national surveys with data relevant to subsidy-related research and provides a useful set of considerations for subsidy researchers considering use of secondary data. Specifically, this brief describes each of the four datasets reviewed, highlighting unique features of each dataset and providing information on the survey…
USDA-ARS's Scientific Manuscript database
Due to economic and environmental consequences of nitrogen (N) lost from fertilizer applications in corn (Zea mays L.), considerable public and industry attention has been devoted to development of N decision tools. Now a wide variety of tools are available to farmers for managing N inputs. However,...
Chadeau-Hyam, Marc; Campanella, Gianluca; Jombart, Thibaut; Bottolo, Leonardo; Portengen, Lutzen; Vineis, Paolo; Liquet, Benoit; Vermeulen, Roel C H
2013-08-01
Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis imposes serious methodological challenges mainly relating to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and advantages and limitations of each of the models. This descriptive summary constitutes a useful tool for driving methodological choices while analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods to analyze marginally and jointly complex exposure and OMICS datasets. Copyright © 2013 Wiley Periodicals, Inc.
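As a concrete illustration of the three families named above, the sketch below applies univariate testing with Benjamini-Hochberg correction, principal-component regression, and lasso-based variable selection to a simulated omics matrix using scikit-learn and statsmodels; the data, feature counts and thresholds are illustrative assumptions rather than a reproduction of the review's examples.

```python
# Sketch of the three regression-based families discussed above, applied to a
# simulated omics matrix X (samples x features) and a continuous outcome y.
# Feature counts, thresholds and the simulated data are illustrative assumptions.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))                            # 200 samples, 1000 features
y = X[:, :5] @ rng.normal(size=5) + rng.normal(size=200)    # 5 truly associated features

# 1) Univariate models with multiple-testing correction (Benjamini-Hochberg FDR)
pvals = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(X.shape[1])])
rejected, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("univariate hits:", np.flatnonzero(rejected))

# 2) Dimension reduction: regress the outcome on a few principal components
scores = PCA(n_components=10).fit_transform(X)
pc_coefs, *_ = np.linalg.lstsq(np.c_[np.ones(len(y)), scores], y, rcond=None)

# 3) Variable selection: L1-penalised regression with cross-validated penalty
lasso = LassoCV(cv=5).fit(X, y)
print("lasso-selected features:", np.flatnonzero(lasso.coef_))
```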
ERIC Educational Resources Information Center
Theodosiou-Zipiti, Galatia; Lamprianou, Iasonas
2016-01-01
Established literature suggests that language problems lead to lower attainment levels in those subjects that are more language dependent. Also, language has been suggested as a main driver of ethnic minority attainment. We use an original dataset of 2,020 secondary school students to show that ethnic minority students in Cyprus underperform…
NASA Astrophysics Data System (ADS)
Alkasem, Ameen; Liu, Hongwei; Zuo, Decheng; Algarash, Basheer
2018-01-01
The volume of data being collected, analyzed, and stored has exploded in recent years, particularly in relation to activity on cloud computing platforms, and large-scale data processing, analysis, and storage platforms such as cloud computing are increasingly widespread. The major challenge today is how to monitor and control these massive amounts of data and perform analysis in real time at scale; traditional methods and model systems are unable to cope with such quantities of data in real time. Here we present a new methodology for constructing a model that optimizes the performance of real-time monitoring of big datasets, combining machine learning algorithms with Apache Spark Streaming to accomplish fine-grained fault diagnosis and repair. As a case study, we use the failure of Virtual Machines (VMs) to start up. The methodology ensures that the most sensible action is carried out during fine-grained monitoring and yields the most effective and cost-saving fault repair through three control steps: (I) data collection; (II) an analysis engine; and (III) a decision engine. We found that running this methodology can save a considerable amount of time compared to the Hadoop model, without sacrificing classification accuracy or performance. The accuracy of the proposed method (92.13%) is an improvement on traditional approaches.
Data splitting for artificial neural networks using SOM-based stratified sampling.
May, R J; Maier, H R; Dandy, G C
2010-03-01
Data splitting is an important consideration during artificial neural network (ANN) development where hold-out cross-validation is commonly employed to ensure generalization. Even for a moderate sample size, the sampling methodology used for data splitting can have a significant effect on the quality of the subsets used for training, testing and validating an ANN. Poor data splitting can result in inaccurate and highly variable model performance; however, the choice of sampling methodology is rarely given due consideration by ANN modellers. Increased confidence in the sampling is of paramount importance, since the hold-out sampling is generally performed only once during ANN development. This paper considers the variability in the quality of subsets that are obtained using different data splitting approaches. A novel approach to stratified sampling, based on Neyman sampling of the self-organizing map (SOM), is developed, with several guidelines identified for setting the SOM size and sample allocation in order to minimize the bias and variance in the datasets. Using an example ANN function approximation task, the SOM-based approach is evaluated in comparison to random sampling, DUPLEX, systematic stratified sampling, and trial-and-error sampling to minimize the statistical differences between data sets. Of these approaches, DUPLEX is found to provide benchmark performance with good model performance, with no variability. The results show that the SOM-based approach also reliably generates high-quality samples and can therefore be used with greater confidence than other approaches, especially in the case of non-uniform datasets, with the benefit of scalability to perform data splitting on large datasets. Copyright 2009 Elsevier Ltd. All rights reserved.
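A minimal sketch of the idea follows, assuming KMeans clustering as a stand-in for the self-organizing map and applying Neyman allocation (per-stratum sample size proportional to stratum size times stratum spread); the cluster count, sample size and synthetic data are assumptions, not the authors' settings.

```python
# Cluster-based stratified data splitting with Neyman allocation, in the spirit
# of the SOM approach above but using KMeans as a stand-in for the SOM.
import numpy as np
from sklearn.cluster import KMeans

def neyman_split(X, n_train, n_clusters=25, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # Neyman allocation: per-stratum sample size proportional to N_h * S_h
    sizes = np.array([np.sum(labels == h) for h in range(n_clusters)])
    spreads = np.array([X[labels == h].std() if (labels == h).sum() > 1 else 0.0
                        for h in range(n_clusters)])
    weights = sizes * spreads
    alloc = np.round(n_train * weights / weights.sum()).astype(int)  # rounding may shift totals slightly
    train_idx = []
    for h in range(n_clusters):
        members = np.flatnonzero(labels == h)
        take = min(alloc[h], len(members))
        train_idx.extend(rng.choice(members, size=take, replace=False))
    train_idx = np.array(train_idx)
    test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    return train_idx, test_idx

X = np.random.default_rng(1).normal(size=(1000, 4))
train_idx, test_idx = neyman_split(X, n_train=700)
```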
Yue, Lilly Q
2012-01-01
In the evaluation of medical products, including drugs, biological products, and medical devices, comparative observational studies could play an important role when properly conducted randomized, well-controlled clinical trials are infeasible for ethical or practical reasons. However, various biases could be introduced at every stage and into every aspect of the observational study, and consequently the interpretation of the resulting statistical inference would be of concern. While there do exist statistical techniques for addressing some of the challenging issues, often based on propensity score methodology, these statistical tools probably have not been as widely employed in prospectively designing observational studies as they should be. There are also times when they are implemented in an unscientific manner, such as performing propensity score model selection on a dataset while using outcome data from that same dataset, which can compromise the integrity of the observational study design and the interpretability of the outcome analysis results. In this paper, regulatory considerations on prospective study design using propensity scores are shared and illustrated with hypothetical examples.
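The sketch below illustrates the prospective-design point made above: the propensity score model is fitted to baseline covariates and treatment assignment only, and overlap is inspected before any outcome data enter the analysis. The variable names, simulated data and logistic model are hypothetical.

```python
# Illustrative sketch of a prospectively specified propensity-score step: the
# score is estimated from baseline covariates only (no outcome variable), and
# overlap/balance is checked before any outcome analysis is planned.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
covariates = pd.DataFrame({
    "age": rng.normal(65, 10, 500),
    "severity": rng.normal(0, 1, 500),
})
treated = (rng.random(500) < 1 / (1 + np.exp(-covariates["severity"]))).astype(int)

ps_model = LogisticRegression().fit(covariates, treated)
pscore = ps_model.predict_proba(covariates)[:, 1]

# Simple overlap diagnostics, performed without touching any outcome data
print("treated score range:", pscore[treated == 1].min(), pscore[treated == 1].max())
print("control score range:", pscore[treated == 0].min(), pscore[treated == 0].max())
```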
Perez-Calatayud, Jose; Ballester, Facundo; Das, Rupak K; Dewerd, Larry A; Ibbott, Geoffrey S; Meigooni, Ali S; Ouhib, Zoubir; Rivard, Mark J; Sloboda, Ron S; Williamson, Jeffrey F
2012-05-01
Recommendations of the American Association of Physicists in Medicine (AAPM) and the European Society for Radiotherapy and Oncology (ESTRO) on dose calculations for high-energy (average energy higher than 50 keV) photon-emitting brachytherapy sources are presented, including the physical characteristics of specific (192)Ir, (137)Cs, and (60)Co source models. This report has been prepared by the High Energy Brachytherapy Source Dosimetry (HEBD) Working Group. This report includes considerations in the application of the TG-43U1 formalism to high-energy photon-emitting sources with particular attention to phantom size effects, interpolation accuracy dependence on dose calculation grid size, and dosimetry parameter dependence on source active length. Consensus datasets for commercially available high-energy photon sources are provided, along with recommended methods for evaluating these datasets. Recommendations on dosimetry characterization methods, mainly using experimental procedures and Monte Carlo, are established and discussed. Also included are methodological recommendations on detector choice, detector energy response characterization and phantom materials, and measurement specification methodology. Uncertainty analyses are discussed and recommendations for high-energy sources without consensus datasets are given. Recommended consensus datasets for high-energy sources have been derived for sources that were commercially available as of January 2010. Data are presented according to the AAPM TG-43U1 formalism, with modified interpolation and extrapolation techniques of the AAPM TG-43U1S1 report for the 2D anisotropy function and radial dose function.
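For orientation, the dose-rate equation at the core of the TG-43 formalism referenced above can be written in its standard 2D line-source form, where S_K is the air-kerma strength, Λ the dose-rate constant, G_L the geometry function, g_L(r) the radial dose function and F(r,θ) the 2D anisotropy function; consult the TG-43U1 and HEBD reports for the full definitions, reference conditions and consensus data.

```latex
% Standard 2D form of the AAPM TG-43 dose-rate equation (orientation only).
\dot{D}(r,\theta) = S_K \, \Lambda \,
  \frac{G_L(r,\theta)}{G_L(r_0,\theta_0)} \, g_L(r) \, F(r,\theta),
\qquad r_0 = 1\ \mathrm{cm}, \quad \theta_0 = 90^{\circ}
```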
Combining users' activity survey and simulators to evaluate human activity recognition systems.
Azkune, Gorka; Almeida, Aitor; López-de-Ipiña, Diego; Chen, Liming
2015-04-08
Evaluating human activity recognition systems usually implies following expensive and time-consuming methodologies, where experiments with humans are run with the consequent ethical and legal issues. We propose a novel evaluation methodology to overcome the enumerated problems, which is based on surveys for users and a synthetic dataset generator tool. Surveys allow capturing how different users perform activities of daily living, while the synthetic dataset generator is used to create properly labelled activity datasets modelled with the information extracted from surveys. Important aspects, such as sensor noise, varying time lapses and user erratic behaviour, can also be simulated using the tool. The proposed methodology is shown to have very important advantages that allow researchers to carry out their work more efficiently. To evaluate the approach, a synthetic dataset generated following the proposed methodology is compared to a real dataset computing the similarity between sensor occurrence frequencies. It is concluded that the similarity between both datasets is more than significant.
NASA Astrophysics Data System (ADS)
Poobalasubramanian, Mangalraj; Agrawal, Anupam
2016-10-01
The presented work proposes fusion of panchromatic and multispectral images in the shearlet domain. The proposed fusion rules rely on regional considerations, which make the system efficient in terms of spatial enhancement. A luminance-hue-saturation-based color conversion is utilized to avoid spectral distortions. The proposed fusion method is tested on Worldview2 and Ikonos datasets and compared against other methodologies, performing well against the compared methods in terms of both subjective and objective evaluations.
Decoys Selection in Benchmarking Datasets: Overview and Perspectives
Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu
2018-01-01
Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
An Effective Methodology for Processing and Analyzing Large, Complex Spacecraft Data Streams
ERIC Educational Resources Information Center
Teymourlouei, Haydar
2013-01-01
The emerging large datasets have made efficient data processing a much more difficult task for the traditional methodologies. Invariably, datasets continue to increase rapidly in size with time. The purpose of this research is to give an overview of some of the tools and techniques that can be utilized to manage and analyze large datasets. We…
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases based on three medical datasets: the Arrhythmia, Breast Cancer, and Hepatitis datasets. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses k-means clustering together with the ANOVA statistic to preprocess the data and select the significant features, and Support Vector Machines for the classification step. To compare and evaluate performance, we chose three classification algorithms, decision tree, Naïve Bayes, and Support Vector Machines, and applied the medical datasets directly to these algorithms. Our methodology gave much better classification accuracy, 98% on the Arrhythmia dataset, 92% on the Breast Cancer dataset, and 88% on the Hepatitis dataset, compared to applying the medical data directly to decision tree, Naïve Bayes, and Support Vector Machines. The ROC curve and precision obtained with K-ANOVA-SVM also achieved better results than the other algorithms.
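A hedged scikit-learn sketch of this kind of pipeline is shown below: a k-means cluster label is appended as a feature, an ANOVA F-test selects the most discriminative features, and an SVM performs the classification, with Naïve Bayes and a decision tree as baselines. The stand-in dataset, cluster count and number of selected features are assumptions, and the figures will not reproduce the paper's accuracies.

```python
# K-means + ANOVA feature selection + SVM pipeline, compared against Naive
# Bayes and a decision tree on a stand-in medical dataset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# K-means step: append each sample's cluster label as an extra feature
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))
X_aug = np.c_[X, clusters]

# ANOVA F-test selects the most discriminative features before the SVM
k_anova_svm = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), SVC())

for name, model in [("K-ANOVA-SVM", k_anova_svm),
                    ("Naive Bayes", GaussianNB()),
                    ("Decision tree", DecisionTreeClassifier(random_state=0))]:
    acc = cross_val_score(model, X_aug, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```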
De Brouwere, Katleen; Cornelis, Christa; Arvanitis, Athanasios; Brown, Terry; Crump, Derrick; Harrison, Paul; Jantunen, Matti; Price, Paul; Torfs, Rudi
2014-05-01
The maximum cumulative ratio (MCR) method allows the categorisation of mixtures according to whether the mixture is of concern for toxicity and if so whether this is driven by one substance or multiple substances. The aim of the present study was to explore, by application of the MCR approach, whether health risks due to indoor air pollution are dominated by one substance or are due to concurrent exposure to various substances. Analysis was undertaken on monitoring data of four European indoor studies (giving five datasets), involving 1800 records of indoor air or personal exposure. Application of the MCR methodology requires knowledge of the concentrations of chemicals in a mixture together with health-based reference values for those chemicals. For this evaluation, single substance health-based reference values (RVs) were selected through a structured review process. The MCR analysis found high variability in the proportion of samples of concern for mixture toxicity. The fraction of samples in these groups of concern varied from 2% (Flemish schools) to 77% (EXPOLIS, Basel, indoor), the variation being due not only to the variation in indoor air contaminant levels across the studies but also to other factors such as differences in number and type of substances monitored, analytical performance, and choice of RVs. However, in 4 out of the 5 datasets, a considerable proportion of cases were found where a chemical-by-chemical approach failed to identify the need for the investigation of combined risk assessment. Although the MCR methodology applied in the current study provides no consideration of commonality of endpoints, it provides a tool for discrimination between those mixtures requiring further combined risk assessment and those for which a single-substance assessment is sufficient. Copyright © 2014 Elsevier B.V. All rights reserved.
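The MCR calculation itself is simple enough to state in a few lines; the sketch below uses made-up concentrations and reference values purely to show the arithmetic (hazard quotients, hazard index, and MCR as their ratio).

```python
# Maximum cumulative ratio (MCR) for one indoor-air sample: hazard quotients
# from measured concentrations and health-based reference values (RVs), then
# MCR = hazard index / max HQ. Substances and numbers are purely illustrative.
concentrations = {"formaldehyde": 30.0, "benzene": 2.0, "toluene": 80.0}        # ug/m3
reference_values = {"formaldehyde": 100.0, "benzene": 5.0, "toluene": 5000.0}   # ug/m3

hq = {s: concentrations[s] / reference_values[s] for s in concentrations}
hi = sum(hq.values())            # hazard index (sum of hazard quotients)
mcr = hi / max(hq.values())      # maximum cumulative ratio

print(f"HI = {hi:.2f}, MCR = {mcr:.2f}")
# HI > 1 flags a mixture of potential concern; an MCR near 1 means one
# substance dominates, while larger MCR values point to combined exposure.
```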
Considerations for Observational Research using Large Datasets in Radiation Oncology
Jagsi, Reshma; Bekelman, Justin E.; Chen, Aileen; Chen, Ronald C.; Hoffman, Karen; Shih, Ya-Chen Tina; Smith, Benjamin D.; Yu, James B.
2014-01-01
The radiation oncology community has witnessed growing interest in observational research conducted using large-scale data sources such as registries and claims-based datasets. With the growing emphasis on observational analyses in health care, the radiation oncology community must possess a sophisticated understanding of the methodological considerations of such studies in order to evaluate evidence appropriately to guide practice and policy. Because observational research has unique features that distinguish it from clinical trials and other forms of traditional radiation oncology research, the Red Journal assembled a panel of experts in health services research to provide a concise and well-referenced review, intended to be informative for the lay reader, as well as for scholars who wish to embark on such research without prior experience. This review begins by discussing the types of research questions relevant to radiation oncology that large-scale databases may help illuminate. It then describes major potential data sources for such endeavors, including information regarding access and insights regarding the strengths and limitations of each. Finally, it provides guidance regarding the analytic challenges that observational studies must confront, along with discussion of the techniques that have been developed to help minimize the impact of certain common analytical issues in observational analysis. Features characterizing a well-designed observational study include clearly defined research questions, careful selection of an appropriate data source, consultation with investigators with relevant methodological expertise, inclusion of sensitivity analyses, caution not to overinterpret small but significant differences, and recognition of limitations when trying to evaluate causality. This review concludes that carefully designed and executed studies using observational data that possess these qualities hold substantial promise for advancing our understanding of many unanswered questions of importance to the field of radiation oncology. PMID:25195986
Kalwij, Jesse M; Robertson, Mark P; Ronk, Argo; Zobel, Martin; Pärtel, Meelis
2014-01-01
Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimations. Geographical representation of atlas data can be much more heterogeneous than often assumed. Level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. Species distribution dataset mergers, such as the one exemplified here, can serve as a baseline towards comprehensive species distribution datasets.
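The two agreement measures used above are straightforward to compute on gridded presence/absence data; the sketch below shows a per-cell Jaccard index and a reduced major axis regression of species richness on simulated arrays whose shapes and error rates are assumptions.

```python
# Per-cell Jaccard similarity between two presence/absence atlases and a
# reduced major axis (RMA) regression of species richness between them.
import numpy as np

rng = np.random.default_rng(0)
# presence/absence arrays: (species, grid cells) for the two atlases
atlas_a = rng.random((500, 2000)) < 0.1
atlas_b = atlas_a ^ (rng.random((500, 2000)) < 0.02)   # atlas B differs slightly

# Jaccard similarity per grid cell: |A & B| / |A | B|
inter = (atlas_a & atlas_b).sum(axis=0)
union = (atlas_a | atlas_b).sum(axis=0)
jaccard = np.where(union > 0, inter / np.maximum(union, 1), np.nan)

# RMA regression of richness in B on richness in A
rich_a, rich_b = atlas_a.sum(axis=0), atlas_b.sum(axis=0)
r = np.corrcoef(rich_a, rich_b)[0, 1]
slope = np.sign(r) * rich_b.std() / rich_a.std()
intercept = rich_b.mean() - slope * rich_a.mean()
print(f"mean Jaccard = {np.nanmean(jaccard):.2f}, RMA slope = {slope:.2f}")
```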
Dimitriadis, Stavros I; Salis, Christos; Linden, David
2018-04-01
Limitations of the manual scoring of polysomnograms, which include data from electroencephalogram (EEG), electro-oculogram (EOG), electrocardiogram (ECG) and electromyogram (EMG) channels have long been recognized. Manual staging is resource intensive and time consuming, and thus considerable effort must be spent to ensure inter-rater reliability. As a result, there is a great interest in techniques based on signal processing and machine learning for a completely Automatic Sleep Stage Classification (ASSC). In this paper, we present a single-EEG-sensor ASSC technique based on the dynamic reconfiguration of different aspects of cross-frequency coupling (CFC) estimated between predefined frequency pairs over 5 s epoch lengths. The proposed analytic scheme is demonstrated using the PhysioNet Sleep European Data Format (EDF) Database with repeat recordings from 20 healthy young adults. We validate our methodology in a second sleep dataset. We achieved very high classification sensitivity, specificity and accuracy of 96.2 ± 2.2%, 94.2 ± 2.3%, and 94.4 ± 2.2% across 20 folds, respectively, and also a high mean F1 score (92%, range 90-94%) when a multi-class Naive Bayes classifier was applied. High classification performance has been achieved also in the second sleep dataset. Our method outperformed the accuracy of previous studies not only on different datasets but also on the same database. Single-sensor ASSC makes the entire methodology appropriate for longitudinal monitoring using wearable EEG in real-world and laboratory-oriented environments. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
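The final classification step can be sketched as below: a multi-class Gaussian Naive Bayes classifier scored with cross-validated macro F1 over 20 folds, with random placeholder vectors standing in for the per-epoch cross-frequency coupling features (the feature construction itself is not reproduced here).

```python
# Multi-class Naive Bayes sleep-stage classification with cross-validated
# macro F1; random placeholders stand in for the per-epoch CFC features.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_epochs, n_features, n_stages = 2000, 36, 5       # e.g., W, N1, N2, N3, REM
X = rng.normal(size=(n_epochs, n_features))        # placeholder CFC features per 5-s epoch
y = rng.integers(0, n_stages, size=n_epochs)       # hypnogram labels

f1 = cross_val_score(GaussianNB(), X, y, cv=20, scoring="f1_macro")
print(f"macro F1 across 20 folds: {f1.mean():.2f} +/- {f1.std():.2f}")
```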
Four aspects to make science open "by design" and not as an after-thought.
Halchenko, Yaroslav O; Hanke, Michael
2015-01-01
Unrestricted dissemination of methodological developments in neuroimaging became the propelling force in advancing our understanding of brain function. However, despite such a rich legacy, it remains not uncommon to encounter software and datasets that are distributed under unnecessarily restricted terms, or that violate terms of third-party products (software or data). With this brief correspondence we would like to recapitulate four important aspects of scientific research practice, which should be taken into consideration as early as possible in the course of any project. Keeping these in check will help neuroimaging to stay at the forefront of the open science movement.
A Novel Performance Evaluation Methodology for Single-Target Trackers.
Kristan, Matej; Matas, Jiri; Leonardis, Ales; Vojir, Tomas; Pflugfelder, Roman; Fernandez, Gustavo; Nebehay, Georg; Porikli, Fatih; Cehovin, Luka
2016-11-01
This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully-annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most carefully constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly-challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.
Munksgaard, Rasmus; Demant, Jakob; Branwen, Gwern
2016-09-01
The development of cryptomarkets has gained increasing attention from academics, including growing scientific literature on the distribution of illegal goods using cryptomarkets. Dolliver's 2015 article "Evaluating drug trafficking on the Tor Network: Silk Road 2, the Sequel" addresses this theme by evaluating drug trafficking on one of the most well-known cryptomarkets, Silk Road 2.0. The research on cryptomarkets in general, and Dolliver's article in particular, poses a number of new methodological questions. This commentary is structured around a replication of Dolliver's original study. The replication study is not based on Dolliver's original dataset, but on a second dataset collected applying the same methodology. We have found that the results produced by Dolliver differ greatly from our replicated study. While a margin of error is to be expected, the inconsistencies we found are too great to attribute to anything other than methodological issues. The analysis and conclusions drawn from studies using these methods are promising and insightful. However, based on the replication of Dolliver's study, we suggest that researchers using these methodologies take such issues into consideration, that datasets be made available to other researchers, and that methodology and dataset metrics (e.g. number of downloaded pages, error logs) be described thoroughly in the context of webometrics and web crawling. Copyright © 2016 Elsevier B.V. All rights reserved.
The Fungal Frontier: A Comparative Analysis of Methods Used in the Study of the Human Gut Mycobiome.
Huseyin, Chloe E; Rubio, Raul Cabrera; O'Sullivan, Orla; Cotter, Paul D; Scanlan, Pauline D
2017-01-01
The human gut is host to a diverse range of fungal species, collectively referred to as the gut "mycobiome". The gut mycobiome is emerging as an area of considerable research interest due to the potential roles of these fungi in human health and disease. However, there is no consensus as to which of the available methodologies are best or most suitable for characterizing the human gut mycobiome. The aim of this study is to provide a comparative analysis of several previously published mycobiome-specific culture-dependent and -independent methodologies, including choice of culture media, incubation conditions (aerobic versus anaerobic), DNA extraction method, primer set and freezing of fecal samples, to assess their relative merits and suitability for gut mycobiome analysis. There was no significant effect of media type or aeration on culture-dependent results. However, freezing was found to have a significant effect on fungal viability, with significantly lower fungal numbers recovered from frozen samples. DNA extraction method had a significant effect on DNA yield and quality. However, freezing and extraction method did not have any impact on either α or β diversity. There was also considerable variation in the ability of different fungal-specific primer sets to generate PCR products for subsequent sequence analysis. Through this investigation, two DNA extraction methods and one primer set were identified which facilitated the analysis of the mycobiome for all samples in this study. Ultimately, a diverse range of fungal species were recovered using both approaches, with Candida and Saccharomyces identified as the most common fungal species recovered using culture-dependent and culture-independent methods, respectively. As has been apparent from ecological surveys of the bacterial fraction of the gut microbiota, the use of different methodologies can also impact on our understanding of gut mycobiome composition and therefore requires careful consideration. Future research into the gut mycobiome needs to adopt a common strategy to minimize potentially confounding effects of methodological choice and to facilitate comparative analysis of datasets.
Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice
2015-01-01
The aim of this study is to identify areas of potential improvement in the European Reference Life Cycle Database (ELCD) fuel datasets. The revision is based on the data quality indicators described by the ILCD Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD fuel datasets are of very good quality in general terms; nevertheless, some findings and recommendations to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results confirm the quality of the fuel-related datasets for any LCA practitioner and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether use of the ELCD fuel datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall DQR of databases.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-03-14
... and related methodology. Emphasis will be placed on dataset accuracy and time-dependent biases. Pathways to overcome accuracy and bias issues will be an important focus. Participants will consider...] Guidance for improving these methods. - Recommendations for rectifying any known time-dependent biases...
Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice
2015-01-01
The aim of this paper is to identify areas of potential improvement in the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results confirm the quality of the electricity-related datasets for any LCA practitioner and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.
Diffusion Weighted Image Denoising Using Overcomplete Local PCA
Manjón, José V.; Coupé, Pierrick; Concha, Luis; Buades, Antonio; Collins, D. Louis; Robles, Montserrat
2013-01-01
Diffusion Weighted Images (DWI) normally show a low Signal to Noise Ratio (SNR) due to the presence of noise from the measurement process that complicates and biases the estimation of quantitative diffusion parameters. In this paper, a new denoising methodology is proposed that takes into consideration the multicomponent nature of multi-directional DWI datasets such as those employed in diffusion imaging. This new filter reduces random noise in multicomponent DWI by locally shrinking less significant Principal Components using an overcomplete approach. The proposed method is compared with state-of-the-art methods using synthetic and real clinical MR images, showing improved performance in terms of denoising quality and estimation of diffusion parameters. PMID:24019889
Supervised Machine Learning for Population Genetics: A New Paradigm
Schrider, Daniel R.; Kern, Andrew D.
2018-01-01
As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics. PMID:29331490
MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.
Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S
2014-01-01
A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA. Copyright © 2014 Elsevier Inc. All rights reserved.
A robust dataset-agnostic heart disease classifier from Phonocardiogram.
Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M
2017-07-01
Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in time and frequency domain along with morphological and statistical features to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database, made available in Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smart phone based digital stethoscope from an Indian hospital was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches, when applied on the same dataset.
Cooper, P David; Smart, David R
2017-03-01
In an era of ever-increasing medical costs, the identification and prohibition of ineffective medical therapies is of considerable economic interest to healthcare funding bodies. Likewise, the avoidance of interventions with an unduly elevated clinical risk/benefit ratio would be similarly advantageous for patients. Regrettably, the identification of such therapies has proven problematic. A recent paper from the Grattan Institute in Australia (identifying five hospital procedures as having the potential for disinvestment on these grounds) serves as a timely illustration of the difficulties inherent in non-clinicians attempting to accurately recognize such interventions using non-clinical, indirect or poorly validated datasets. To evaluate the Grattan Institute report and associated publications, and determine the validity of their assertions regarding hyperbaric oxygen treatment (HBOT) utilisation in Australia. Critical analysis of the HBOT metadata included in the Grattan Institute study was undertaken and compared against other publicly available Australian Government and independent data sources. The consistency, accuracy and reproducibility of data definitions and terminology across the various publications were appraised and the authors' methodology was reviewed. Reference sources were examined for relevance and temporal eligibility. Review of the Grattan publications demonstrated multiple problems, including (but not limited to): confusing patient-treatments with total patient numbers; incorrect identification of 'appropriate' vs. 'inappropriate' indications for HBOT; reliance upon a compromised primary dataset; lack of appropriate clinical input, muddled methodology and use of inapplicable references. These errors resulted in a more than seventy-fold over-estimation of the number of patients potentially treated inappropriately with HBOT in Australia that year. Numerous methodological flaws and factual errors have been identified in this Grattan Institute study. Its conclusions are not valid and a formal retraction is required.
Meta-Analysis in Genome-Wide Association Datasets: Strategies and Application in Parkinson Disease
Evangelou, Evangelos; Maraganore, Demetrius M.; Ioannidis, John P.A.
2007-01-01
Background: Genome-wide association studies hold substantial promise for identifying common genetic variants that regulate susceptibility to complex diseases. However, for the detection of small genetic effects, single studies may be underpowered. Power may be improved by combining genome-wide datasets with meta-analytic techniques. Methodology/Principal Findings: Both single and two-stage genome-wide data may be combined and there are several possible strategies. In the two-stage framework, we considered the options of (1) enhancement of replication data and (2) enhancement of first-stage data, and then, we also considered (3) joint meta-analyses including all first-stage and second-stage data. These strategies were examined empirically using data from two genome-wide association studies (three datasets) on Parkinson disease. In the three strategies, we derived 12, 5, and 49 single nucleotide polymorphisms that show significant associations at conventional levels of statistical significance. None of these remained significant after conservative adjustment for the number of performed analyses in each strategy. However, some may warrant further consideration: 6 SNPs were identified with at least 2 of the 3 strategies and 3 SNPs [rs1000291 on chromosome 3, rs2241743 on chromosome 4 and rs3018626 on chromosome 11] were identified with all 3 strategies and had no or minimal between-dataset heterogeneity (I2 = 0, 0 and 15%, respectively). Analyses were primarily limited by the suboptimal overlap of tested polymorphisms across different datasets (e.g., only 31,192 shared polymorphisms between the two tier 1 datasets). Conclusions/Significance: Meta-analysis may be used to improve the power and examine the between-dataset heterogeneity of genome-wide association studies. Prospective designs may be most efficient, if they try to maximize the overlap of genotyping platforms and anticipate the combination of data across many genome-wide association studies. PMID:17332845
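The per-SNP combination step underlying such strategies can be sketched as an inverse-variance fixed-effect meta-analysis with Cochran's Q and the I2 heterogeneity measure quoted above; the effect sizes and standard errors below are illustrative only.

```python
# Inverse-variance fixed-effect pooling of per-SNP effect estimates from
# several datasets, with Cochran's Q and I^2 as the heterogeneity measure.
import numpy as np

def fixed_effect_meta(betas, ses):
    betas, ses = np.asarray(betas, float), np.asarray(ses, float)
    w = 1.0 / ses**2                              # inverse-variance weights
    beta_fe = np.sum(w * betas) / np.sum(w)       # pooled effect
    se_fe = np.sqrt(1.0 / np.sum(w))
    q = np.sum(w * (betas - beta_fe) ** 2)        # Cochran's Q
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return beta_fe, se_fe, i2

# log odds ratios and standard errors for one SNP in three datasets (made up)
beta, se, i2 = fixed_effect_meta([0.12, 0.08, 0.15], [0.05, 0.06, 0.07])
print(f"pooled beta = {beta:.3f} (SE {se:.3f}), I^2 = {i2:.0f}%")
```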
Deep learning-based fine-grained car make/model classification for visual surveillance
NASA Astrophysics Data System (ADS)
Gundogdu, Erhan; Parıldı, Enes Sinan; Solmaz, Berkan; Yücesoy, Veysel; Koç, Aykut
2017-10-01
Fine-grained object recognition is a challenging computer vision problem that has recently been addressed by utilizing deep Convolutional Neural Networks (CNNs). Nevertheless, the main disadvantage of classification methods relying on deep CNN models is the need for a considerably large amount of data. In addition, relatively little annotated data exists for real-world applications such as the recognition of car models in a traffic surveillance system. To this end, we concentrate on the classification of fine-grained car makes and/or models for visual surveillance with the help of two different domains. First, a large-scale dataset including approximately 900K images is constructed from a website which includes fine-grained car models. A state-of-the-art CNN model is trained on the constructed dataset according to its labels. The second domain is the set of images collected from a camera integrated into a traffic surveillance system. These images, numbering over 260K, are gathered by a special license plate detection method on top of a motion detection algorithm. An appropriately sized image is cropped from the region of interest provided by the detected license plate location. These sets of images and their labels for more than 30 classes are employed to fine-tune the CNN model already trained on the large-scale dataset described above. To fine-tune the network, the last two fully-connected layers are randomly initialized and the remaining layers are fine-tuned on the second dataset. In this work, the transfer of a model learned on a large dataset to a smaller one has been successfully performed by utilizing both the limited annotated data from the traffic field and a large-scale dataset with available annotations. Our experimental results on both the validation dataset and the real field show that the proposed methodology performs favorably against training the CNN model from scratch.
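A hedged PyTorch sketch of this transfer-learning recipe follows: a backbone pre-trained on a large dataset has its last two fully-connected layers re-initialized for roughly 30 surveillance classes and is then fine-tuned on the camera imagery. The choice of ResNet-50, the head sizes and the optimizer settings are assumptions, not the authors' configuration.

```python
# Fine-tuning a pre-trained CNN: re-initialize the last two fully-connected
# layers for the surveillance classes, then fine-tune the whole network.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 30

backbone = models.resnet50(weights="IMAGENET1K_V1")   # downloads ImageNet weights; stand-in for the web-trained model
backbone.fc = nn.Sequential(                           # two freshly initialized FC layers
    nn.Linear(backbone.fc.in_features, 512),
    nn.ReLU(),
    nn.Linear(512, num_classes),
)

optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def fine_tune_step(images, labels):
    """One fine-tuning step on a batch from the surveillance-camera dataset."""
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```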
NASA Astrophysics Data System (ADS)
Keefer, J.; Bourassa, M. A.
2014-12-01
A recent study (Young et al. 2011) investigated recent global trends in mean and extreme (90th- and 99th-percentile) wind speed and wave height. Wentz and Ricciardulli (2011) have criticized the study, citing the methodology solely employing data collected from a series of altimetry missions and lack of adequate verification of the results. An earlier study (Wentz et al. 2007) had differing results using data from microwave radiometers and scatterometers. This study serves as a response to these studies, employing a similar methodology but with a different set of data. Data collected from the QuikSCAT and ADEOS-2 SeaWinds scatterometers, SSMI(S), and TOPEX/POSEIDON and JASON-1 altimetry missions are used to calculate trends in the mean, 90th-, and 99th-percentile wind speed and wave height over the period 1999—2009. Linear regression analyses from the satellite missions are verified against regression analyses of data from the ERA-Interim reanalysis dataset. Temporal sampling presents the most critical consideration in the study. The scatterometers have a much greater independent temporal sampling (about 1.5 observations per day per satellite) than the altimeters (about 1 observation per 10 days). With this consideration, the satellite data are also used to sample the wind speeds in the ERA-Interim dataset. That portion of the study indicates the sampling requirements needed to accurately estimate the trends in the ERA-Interim reanalysis. Wentz, F.J., L. Ricciardulli, K. Hilburn, and C. Mears, 2007: How much more rain will global warming bring? Science, 317, 233-235. Wentz, F.J. and L. Ricciardulli, 2011: Comment on "Global trends in wind speed and wave height." Science, 334, 905. Young, I.R., S. Zieger, and A.V. Babanin, 2011a: Global trends in wind speed and wave height. Science, 332, 451-455.
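The trend estimation at the centre of this comparison reduces to regressing annual statistics against time; the sketch below computes annual mean, 90th- and 99th-percentile wind speed from simulated samples and fits a linear trend to each, with the sampling density per year left as the key free parameter, as discussed above.

```python
# Linear trends in annual mean, 90th- and 99th-percentile wind speed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
years = np.arange(1999, 2010)
# wind speed samples per year at one grid cell (sampling density matters here)
samples = {yr: rng.weibull(2.0, size=500) * 8.0 for yr in years}

for label, func in [("mean", np.mean),
                    ("90th pct", lambda x: np.percentile(x, 90)),
                    ("99th pct", lambda x: np.percentile(x, 99))]:
    annual = np.array([func(samples[yr]) for yr in years])
    fit = stats.linregress(years, annual)
    print(f"{label}: trend = {fit.slope:.3f} m/s per year (p = {fit.pvalue:.2f})")
```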
NASA Astrophysics Data System (ADS)
Heathfield, D.; Walker, I. J.; Grilliot, M. J.
2016-12-01
The recent emergence of terrestrial laser scanning (TLS) and unmanned aerial systems (UAS) as mapping platforms in geomorphology research has allowed for expedited acquisition of high spatial and temporal resolution, three-dimensional topographic datasets. TLS provides dense 3D `point cloud' datasets that require careful acquisition strategies and appreciable post-processing to produce accurate digital elevation models (DEMs). UAS provide overlapping nadir and oblique imagery that can be analysed using Structure from Motion (SfM) photogrammetry software to provide accurate, high-resolution orthophoto mosaics and accurate digital surface models (DSMs). Both methods yield centimeter to decimeter scale accuracy, depending on various hardware and field acquisition considerations (e.g., camera resolution, flight height, on-site GNSS control, etc.). Combined, the UAS-SfM workflow provides a comparable and more affordable solution to the more expensive TLS or aerial LiDAR methods. This paper compares and contrasts SfM and TLS survey methodologies and related workflow costs and benefits as used to quantify and examine seasonal beach-dune erosion and recovery processes at a site (Calvert Island) on British Columbia's central coast in western Canada. Seasonal SfM- and TLS-derived DEMs were used to quantify spatial patterns of surface elevation change, geomorphic responses, and related significant sediment volume changes. Cluster maps of positive (depositional) and negative (erosional) change are analysed to detect and interpret the geomorphic and sediment budget responses following an erosive water level event during winter 2016 season (Oct. 2015 - Apr. 2016). Vantage cameras also provided qualitative data on the frequency and magnitude of environmental drivers (e.g., tide, wave, wind forcing) of erosion and deposition events during the observation period. In addition, we evaluate the costs, time expenditures, and accuracy considerations for both SfM and TLS methodologies.
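A minimal sketch of the DEM-of-difference step used to quantify such seasonal change is given below: two co-registered surface models are differenced, changes below a minimum level of detection are masked, and the remaining cells are summed into erosion and deposition volumes; the grid resolution and level of detection are assumptions.

```python
# DEM of difference (DoD) with a minimum level of detection (LoD) threshold,
# summing detectable change into erosion and deposition volumes.
import numpy as np

cell_size = 0.25        # m, grid resolution (assumed)
lod = 0.10              # m, combined uncertainty of the two surveys (assumed)

rng = np.random.default_rng(0)
dem_oct2015 = rng.normal(5.0, 1.0, size=(400, 400))
dem_apr2016 = dem_oct2015 + rng.normal(0.0, 0.15, size=(400, 400))

dod = dem_apr2016 - dem_oct2015
dod_sig = np.where(np.abs(dod) >= lod, dod, 0.0)         # keep detectable change only

deposition = dod_sig[dod_sig > 0].sum() * cell_size**2    # m^3 gained
erosion = -dod_sig[dod_sig < 0].sum() * cell_size**2      # m^3 lost
print(f"deposition: {deposition:.1f} m^3, erosion: {erosion:.1f} m^3")
```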
Sowan, Azizeh Khaled; Reed, Charles Calhoun; Staggers, Nancy
2016-09-30
Large datasets of the audit log of modern physiologic monitoring devices have rarely been used for predictive modeling, capturing unsafe practices, or guiding initiatives on alarm systems safety. This paper (1) describes a large clinical dataset using the audit log of the physiologic monitors, (2) discusses benefits and challenges of using the audit log in identifying the most important alarm signals and improving the safety of clinical alarm systems, and (3) provides suggestions for presenting alarm data and improving the audit log of the physiologic monitors. At a 20-bed transplant cardiac intensive care unit, alarm data recorded via the audit log of bedside monitors were retrieved from the server of the central station monitor. Benefits of the audit log are many. They include easily retrievable data at no cost, complete alarm records, easy capture of inconsistent and unsafe practices, and easy identification of bedside monitors missed from a unit change of alarm settings adjustments. Challenges in analyzing the audit log are related to the time-consuming processes of data cleaning and analysis, and limited storage and retrieval capabilities of the monitors. The audit log is a function of current capabilities of the physiologic monitoring systems, monitor's configuration, and alarm management practices by clinicians. Despite current challenges in data retrieval and analysis, large digitalized clinical datasets hold great promise in performance, safety, and quality improvement. Vendors, clinicians, researchers, and professional organizations should work closely to identify the most useful format and type of clinical data to expand medical devices' log capacity.
U.S. Heat Demand by Sector for Potential Application of Direct Use Geothermal
Katherine Young
2016-06-23
This dataset includes heat demand for potential application of direct use geothermal broken down into 4 sectors: agricultural, commercial, manufacturing and residential. The data for each sector are organized by county, were disaggregated specifically to assess the market demand for geothermal direct use, and were derived using methodologies customized for each sector based on the availability of data and other sector-specific factors. This dataset also includes a paper containing a full explanation of the methodologies used.
Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach
NASA Astrophysics Data System (ADS)
Landgrebe, T. C. W.; Merdith, A.; Dutkiewicz, A.; Müller, R. D.
2013-07-01
Age-coded multi-layered geological datasets are becoming increasingly prevalent with the surge in open-access geodata, yet there are few methodologies for extracting geological information and knowledge from these data. We present a novel methodology, based on the open-source GPlates software in which age-coded digital palaeogeographic maps are used to “data-mine” spatio-temporal patterns related to the occurrence of Australian opal. Our aim is to test the concept that only a particular sequence of depositional/erosional environments may lead to conditions suitable for the formation of gem quality sedimentary opal. Time-varying geographic environment properties are extracted from a digital palaeogeographic dataset of the eastern Australian Great Artesian Basin (GAB) at 1036 opal localities. We obtain a total of 52 independent ordinal sequences sampling 19 time slices from the Early Cretaceous to the present-day. We find that 95% of the known opal deposits are tied to only 27 sequences all comprising fluvial and shallow marine depositional sequences followed by a prolonged phase of erosion. We then map the total area of the GAB that matches these 27 opal-specific sequences, resulting in an opal-prospective region of only about 10% of the total area of the basin. The key patterns underlying this association involve only a small number of key environmental transitions. We demonstrate that these key associations are generally absent at arbitrary locations in the basin. This new methodology allows for the simplification of a complex time-varying geological dataset into a single map view, enabling straightforward application for opal exploration and for future co-assessment with other datasets/geological criteria. This approach may help unravel the poorly understood opal formation process using an empirical spatio-temporal data-mining methodology and readily available datasets to aid hypothesis testing.
NASA Astrophysics Data System (ADS)
Whitehall, K. D.; Jenkins, G. S.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Goodale, C. E.; Hart, A. F.; Ramirez, P.; Whittell, J.; Zimdars, P. A.
2012-12-01
Mesoscale convective complexes (MCCs) are large (2-3 x 10^5 km^2) nocturnal convectively-driven weather systems that are generally associated with high-precipitation events of short duration (less than 12 hrs) at various locations throughout the tropics and midlatitudes (Maddox 1980). These systems are particularly important for climate in the West Sahel region, where the precipitation associated with them is a principal component of the rainfall season (Laing and Fritsch 1993). They occur on weather timescales and have historically been identified from weather data analysis via manual and, more recently, automated processes (Miller and Fritsch 1991, Nesbett 2006, Balmey and Reason 2012). The Regional Climate Model Evaluation System (RCMES) is an open-source tool designed for easy evaluation of climate and Earth system data through access to standardized datasets and intrinsic tools that perform common analysis and visualization tasks (Hart et al. 2011). The RCMES toolkit also provides the flexibility of user-defined subroutines for further metrics, visualization and even dataset manipulation. The purpose of this study is to present a methodology for identifying MCCs in observation datasets using the RCMES framework. TRMM 3-hourly datasets will be used to demonstrate the methodology for the 2005 boreal summer. This method promotes the use of open-source software for scientific data systems to address a concern shared by multiple stakeholders in the Earth sciences. A historical MCC dataset provides a platform for further studies of the variability of MCC frequency on various timescales, which is important to many users including climate scientists, meteorologists, water resource managers, and agriculturalists. The methodology of using RCMES for searching and clipping datasets will open a new realm of studies, as users of the system will no longer be restricted to the datasets residing in their own local systems; instead, they will be afforded rapid, effective, and transparent access to, and processing and visualization of, the wealth of remote sensing datasets and climate model outputs available.
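The core detection step can be sketched as thresholding and labelling one gridded precipitation snapshot and retaining regions above an area criterion; the full Maddox (1980) criteria additionally involve shape (eccentricity) and duration checks across consecutive time steps, which are omitted here, and the threshold and grid spacing below are assumptions.

```python
# Label contiguous regions exceeding a convective threshold in one gridded
# field and keep those above an MCC-scale area criterion (single time step).
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
precip = rng.gamma(0.5, 2.0, size=(400, 400))     # stand-in for one TRMM 3-hourly field
cell_area_km2 = 25.0 * 25.0                       # ~0.25 degree grid cells (assumed)

mask = precip > 5.0                               # convective threshold (assumed)
labels, n_features = ndimage.label(mask)
areas = ndimage.sum(mask, labels, index=np.arange(1, n_features + 1)) * cell_area_km2

candidate_ids = np.arange(1, n_features + 1)[areas >= 2.0e5]   # >= 2 x 10^5 km^2
print(f"{len(candidate_ids)} candidate system(s) at this time step")
```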
Kunz, Meik; Dandekar, Thomas; Naseem, Muhammad
2017-01-01
Cytokinins (CKs) play an important role in plant growth and development. Several studies also highlight the modulatory implications of CKs for plant-pathogen interactions. However, the underlying mechanisms by which CKs mediate immune networks in plants are still not fully understood. A detailed analysis of high-throughput transcriptome datasets (RNA-Seq and microarrays) obtained under modulated plant CK conditions, and its integration with the cellular interactome (large-scale protein-protein interaction data), has the potential to unlock the contribution of CKs to plant defense. Here, we specifically describe a detailed systems biology methodology pertinent to the acquisition and analysis of various omics datasets that delineate the role of plant CKs in shaping immune pathways in Arabidopsis.
Vanegas, Fernando; Bratanov, Dmitry; Powell, Kevin; Weiss, John; Gonzalez, Felipe
2018-01-17
Recent advances in remote sensed imagery and geospatial image processing using unmanned aerial vehicles (UAVs) have enabled the rapid and ongoing development of monitoring tools for crop management and the detection/surveillance of insect pests. This paper describes a (UAV) remote sensing-based methodology to increase the efficiency of existing surveillance practices (human inspectors and insect traps) for detecting pest infestations (e.g., grape phylloxera in vineyards). The methodology uses a UAV integrated with advanced digital hyperspectral, multispectral, and RGB sensors. We implemented the methodology for the development of a predictive model for phylloxera detection. In this method, we explore the combination of airborne RGB, multispectral, and hyperspectral imagery with ground-based data at two separate time periods and under different levels of phylloxera infestation. We describe the technology used (the sensors, the UAV, and the flight operations), the processing workflow of the datasets from each imagery type, and the methods for combining multiple airborne with ground-based datasets. Finally, we present relevant results of correlation between the different processed datasets. The objective of this research is to develop a novel methodology for collecting, processing, analysing and integrating multispectral, hyperspectral, ground and spatial data to remote sense different variables in different applications, such as, in this case, plant pest surveillance. The development of such a methodology would provide researchers, agronomists, and UAV practitioners with reliable data collection protocols and methods to achieve faster processing techniques and integrate multiple sources of data in diverse remote sensing applications.
Fazio, Simone; Garraín, Daniel; Mathieux, Fabrice; De la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda
2015-01-01
Under the framework of the European Platform on Life Cycle Assessment, the European Reference Life-Cycle Database (ELCD - developed by the Joint Research Centre of the European Commission), provides core Life Cycle Inventory (LCI) data from front-running EU-level business associations and other sources. The ELCD contains energy-related data on power and fuels. This study describes the methods to be used for the quality analysis of energy data for European markets (available in third-party LC databases and from authoritative sources) that are, or could be, used in the context of the ELCD. The methodology was developed and tested on the energy datasets most relevant for the EU context, derived from GaBi (the reference database used to derive datasets for the ELCD), Ecoinvent, E3 and Gemis. The criteria for the database selection were based on the availability of EU-related data, the inclusion of comprehensive datasets on energy products and services, and the general approval of the LCA community. The proposed approach was based on the quality indicators developed within the International Reference Life Cycle Data System (ILCD) Handbook, further refined to facilitate their use in the analysis of energy systems. The overall Data Quality Rating (DQR) of the energy datasets can be calculated by summing up the quality rating (ranging from 1 to 5, where 1 represents very good, and 5 very poor quality) of each of the quality criteria indicators, divided by the total number of indicators considered. The quality of each dataset can be estimated for each indicator, and then compared with the different databases/sources. The results can be used to highlight the weaknesses of each dataset and can be used to guide further improvements to enhance the data quality with regard to the established criteria. This paper describes the application of the methodology to two exemplary datasets, in order to show the potential of the methodological approach. The analysis helps LCA practitioners to evaluate the usefulness of the ELCD datasets for their purposes, and dataset developers and reviewers to derive information that will help improve the overall DQR of databases.
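The DQR arithmetic described above (the mean of indicator ratings on a 1-5 scale) can be sketched in a few lines; the indicator names and scores below are illustrative assumptions rather than the exact ILCD indicator set.

```python
# Minimal sketch of the Data Quality Rating (DQR) arithmetic described above:
# each quality indicator is scored from 1 (very good) to 5 (very poor), and
# the overall DQR is the mean of the indicator scores. The indicator names
# below are illustrative, not the exact ILCD set.

def data_quality_rating(indicator_scores):
    """Overall DQR = sum of indicator ratings / number of indicators."""
    return sum(indicator_scores.values()) / len(indicator_scores)

electricity_mix_dataset = {
    "technological_representativeness": 2,
    "geographical_representativeness": 1,
    "time_representativeness": 3,
    "completeness": 2,
    "methodological_appropriateness": 2,
}

print(round(data_quality_rating(electricity_mix_dataset), 2))  # 2.0
```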
Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina
2015-03-01
Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data, and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies such as their inability to handle weighted PPI networks, their constraint of assigning every protein to exactly one cluster, and the difficulties they face concerning parameter tuning. This was experimentally validated and, moreover, new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques. Copyright © 2015 Elsevier B.V. All rights reserved.
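For readers unfamiliar with the clustering core that EE-MC builds on, the sketch below implements plain Markov clustering (expansion, inflation, renormalisation) on a small weighted adjacency matrix; it is not the authors' evolutionary-enhanced variant, whose inflation parameter would instead be tuned by the evolutionary layer.

```python
import numpy as np

# Compact sketch of plain Markov clustering (MCL) on a weighted PPI adjacency
# matrix -- the building block that EE-MC enhances. This is NOT the authors'
# EE-MC implementation, just the underlying idea.

def markov_cluster(adjacency, inflation=2.0, iterations=50):
    m = adjacency.astype(float) + np.eye(len(adjacency))  # add self-loops
    m /= m.sum(axis=0)                                     # column-stochastic
    for _ in range(iterations):
        m = np.linalg.matrix_power(m, 2)                   # expansion
        m = np.power(m, inflation)                          # inflation
        m /= m.sum(axis=0)                                  # renormalise columns
    # attractor rows of the converged matrix list the members of each cluster
    clusters = [frozenset(int(i) for i in np.nonzero(row > 1e-6)[0])
                for row in m if row.max() > 1e-6]
    return set(clusters)

# Toy weighted PPI graph: two triangles joined by a weak edge.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0.1, 0, 0],
              [0, 0, 0.1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

print(markov_cluster(A))  # expected: two clusters, {0, 1, 2} and {3, 4, 5}
```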
Birth-death prior on phylogeny and speed dating
2008-01-01
Background In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. Results We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rate and time parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when, in addition to the parameters, we also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing-like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Conclusion Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models. PMID:18318893
Birth-death prior on phylogeny and speed dating.
Akerborg, Orjan; Sennblad, Bengt; Lagergren, Jens
2008-03-04
In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rate and time parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when, in addition to the parameters, we also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing-like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.
Duggleby, Wendy; Williams, Allison
2016-01-01
The purpose of this article is to discuss methodological and epistemological considerations involved in using qualitative inquiry to develop interventions. These considerations included (a) using diverse methodological approaches and (b) epistemological considerations such as generalization, de-contextualization, and subjective reality. Diverse methodological approaches have the potential to inform different stages of intervention development. Using the development of a psychosocial hope intervention for advanced cancer patients as an example, the authors utilized a thematic study to assess current theories/frameworks and interventions. However, to understand the processes that the intervention needed to target to effect change, grounded theory was used. Epistemological considerations provided a framework to understand and, further, critique the intervention. Using diverse qualitative methodological approaches and examining epistemological considerations were useful in developing an intervention that appears to foster hope in patients with advanced cancer. © The Author(s) 2015.
Operational Implementation of a Pc Uncertainty Construct for Conjunction Assessment Risk Analysis
NASA Technical Reports Server (NTRS)
Newman, Lauri K.; Hejduk, Matthew D.; Johnson, Lauren C.
2016-01-01
Earlier this year the NASA Conjunction Assessment and Risk Analysis (CARA) project presented the theoretical and algorithmic aspects of a method to include the uncertainties in the calculation inputs when computing the probability of collision (Pc) between two space objects, principally uncertainties in the covariances and the hard-body radius. Rather than a single Pc value, this calculation approach produces an entire probability density function representing the range of possible Pc values given the uncertainties in the inputs, bringing CA risk analysis methodologies more in line with modern risk management theory. The present study provides results from the exercise of this method against an extended dataset of satellite conjunctions in order to determine the effect of its use on the evaluation of conjunction assessment (CA) event risk posture. The effects are found to be considerable: a good number of events are downgraded from, or upgraded to, a serious risk designation on the basis of the Pc uncertainty. The findings support the integration of the developed methods into NASA CA operations.
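A minimal Monte Carlo sketch of the general idea (not the CARA algorithm itself): sample the uncertain inputs, here a covariance scale factor and the hard-body radius, compute a 2D encounter-plane Pc for each draw, and summarise the resulting Pc distribution. All numerical values are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative sketch: instead of a single Pc, sample the uncertain inputs
# (covariance scale, hard-body radius) and compute a 2D encounter-plane Pc
# per draw, yielding a distribution of possible Pc values.

rng = np.random.default_rng(0)

def pc_2d(miss, cov, hbr, n=200):
    """Probability that the relative position falls within the HBR circle,
    by integrating the 2D Gaussian over a grid covering the circle."""
    xs = np.linspace(-hbr, hbr, n)
    dx = xs[1] - xs[0]
    gx, gy = np.meshgrid(xs, xs)
    inside = gx**2 + gy**2 <= hbr**2
    pdf = multivariate_normal(mean=miss, cov=cov).pdf(np.dstack((gx, gy)))
    return float(np.sum(pdf[inside]) * dx * dx)

miss_vector = np.array([120.0, 40.0])           # nominal miss distance [m]
nominal_cov = np.array([[8000.0, 1500.0],
                        [1500.0, 3000.0]])       # combined covariance [m^2]

pc_samples = []
for _ in range(500):
    scale = rng.lognormal(mean=0.0, sigma=0.3)   # covariance uncertainty
    hbr = rng.uniform(15.0, 25.0)                # hard-body radius uncertainty [m]
    pc_samples.append(pc_2d(miss_vector, scale * nominal_cov, hbr))

pc_samples = np.array(pc_samples)
print("median Pc:", np.median(pc_samples), "95th percentile:", np.percentile(pc_samples, 95))
```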
A Metadata-Based Approach for Analyzing UAV Datasets for Photogrammetric Applications
NASA Astrophysics Data System (ADS)
Dhanda, A.; Remondino, F.; Santana Quintero, M.
2018-05-01
This paper proposes a methodology for pre-processing and analysing Unmanned Aerial Vehicle (UAV) datasets before photogrammetric processing. In cases where images are gathered without a detailed flight plan and at regular acquisition intervals, the datasets can be quite large and time-consuming to process. We propose a method to calculate the image overlap and filter out images in order to reduce large block sizes and speed up photogrammetric processing. The Python-based algorithm that implements this methodology leverages the metadata in each image to determine the end and side overlap of grid-based UAV flights. Utilizing user input, the algorithm filters out images that are unneeded for photogrammetric processing. The result is an algorithm that can speed up photogrammetric processing and provide valuable information to the user about the flight path.
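A hypothetical sketch of the overlap calculation: the ground footprint is derived from altitude, focal length and sensor size, and end overlap follows from the spacing between consecutive image centres. The helper names and parameter values are illustrative assumptions, not the paper's Python implementation.

```python
import math

# Hypothetical sketch: derive each image's ground footprint from EXIF-style
# metadata (relative altitude, focal length, sensor size) and compute end
# overlap from the distance between consecutive image centres.

def ground_footprint(altitude_m, focal_mm, sensor_mm):
    """Footprint length on the ground along one sensor axis (nadir view)."""
    return altitude_m * sensor_mm / focal_mm

def end_overlap(centre_spacing_m, footprint_m):
    """Fractional overlap between two consecutive nadir images."""
    return max(0.0, 1.0 - centre_spacing_m / footprint_m)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS positions, in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two consecutive images from a hypothetical grid flight.
footprint = ground_footprint(altitude_m=80.0, focal_mm=8.8, sensor_mm=13.2)  # ~120 m
spacing = haversine_m(45.00000, 7.00000, 45.00027, 7.00000)                   # ~30 m
print(round(end_overlap(spacing, footprint), 2))  # roughly 0.75 (75 % end overlap)
```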
Learning discriminative functional network features of schizophrenia
NASA Astrophysics Data System (ADS)
Gheiratmand, Mina; Rish, Irina; Cecchi, Guillermo; Brown, Matthew; Greiner, Russell; Bashivan, Pouya; Polosecki, Pablo; Dursun, Serdar
2017-03-01
Associating schizophrenia with disrupted functional connectivity is a central idea in schizophrenia research. However, identifying neuroimaging-based features that can serve as reliable "statistical biomarkers" of the disease remains a challenging open problem. We argue that generalization accuracy and stability of candidate features ("biomarkers") must be used as additional criteria on top of standard significance tests in order to discover more robust biomarkers. Generalization accuracy refers to the utility of biomarkers for making predictions about individuals, for example discriminating between patients and controls, in novel datasets. Feature stability refers to the reproducibility of the candidate features across different datasets. Here, we extracted functional connectivity network features from fMRI data at both high resolution (voxel level) and at a spatially down-sampled lower resolution ("supervoxel" level). At the supervoxel level, we used whole-brain network links, while at the voxel level, due to the intractably large number of features, we sampled a subset of them. We compared statistical significance, stability and discriminative utility of both feature types in a multi-site fMRI dataset composed of schizophrenia patients and healthy controls. For both feature types, a considerable fraction of features showed significant differences between the two groups. Also, both feature types were similarly stable across multiple data subsets. However, the whole-brain supervoxel functional connectivity features showed a higher cross-validation classification accuracy of 78.7% vs. 72.4% for the voxel-level features. Cross-site variability and heterogeneity in the patient samples in the multi-site FBIRN dataset made the task more challenging compared to single-site studies. The use of the above methodology in combination with the fully data-driven approach using whole-brain information has the potential to shed light on "biomarker discovery" in schizophrenia.
CoINcIDE: A framework for discovery of patient subtypes across multiple datasets.
Planey, Catherine R; Gevaert, Olivier
2016-03-09
Patient disease subtypes have the potential to transform personalized medicine. However, many patient subtypes derived from unsupervised clustering analyses on high-dimensional datasets are not replicable across multiple datasets, limiting their clinical utility. We present CoINcIDE, a novel methodological framework for the discovery of patient subtypes across multiple datasets that requires no between-dataset transformations. We also present a high-quality database collection, curatedBreastData, with over 2,500 breast cancer gene expression samples. We use CoINcIDE to discover novel breast and ovarian cancer subtypes with prognostic significance and novel hypothesized ovarian therapeutic targets across multiple datasets. CoINcIDE and curatedBreastData are available as R packages.
Boyd, Philip W.; Rynearson, Tatiana A.; Armstrong, Evelyn A.; Fu, Feixue; Hayashi, Kendra; Hu, Zhangxi; Hutchins, David A.; Kudela, Raphael M.; Litchman, Elena; Mulholland, Margaret R.; Passow, Uta; Strzepek, Robert F.; Whittaker, Kerry A.; Yu, Elizabeth; Thomas, Mridul K.
2013-01-01
“It takes a village to finish (marine) science these days” Paraphrased from Curtis Huttenhower (the Human Microbiome project) The rapidity and complexity of climate change and its potential effects on ocean biota are challenging how ocean scientists conduct research. One way in which we can begin to better tackle these challenges is to conduct community-wide scientific studies. This study provides physiological datasets fundamental to understanding functional responses of phytoplankton growth rates to temperature. While physiological experiments are not new, our experiments were conducted in many laboratories using agreed upon protocols and 25 strains of eukaryotic and prokaryotic phytoplankton isolated across a wide range of marine environments from polar to tropical, and from nearshore waters to the open ocean. This community-wide approach provides both comprehensive and internally consistent datasets produced over considerably shorter time scales than conventional individual and often uncoordinated lab efforts. Such datasets can be used to parameterise global ocean model projections of environmental change and to provide initial insights into the magnitude of regional biogeographic change in ocean biota in the coming decades. Here, we compare our datasets with a compilation of literature data on phytoplankton growth responses to temperature. A comparison with prior published data suggests that the optimal temperatures of individual species and, to a lesser degree, thermal niches were similar across studies. However, a comparison of the maximum growth rate across studies revealed significant departures between this and previously collected datasets, which may be due to differences in the cultured isolates, temporal changes in the clonal isolates in cultures, and/or differences in culture conditions. Such methodological differences mean that using particular trait measurements from the prior literature might introduce unknown errors and bias into modelling projections. Using our community-wide approach we can reduce such protocol-driven variability in culture studies, and can begin to address more complex issues such as the effect of multiple environmental drivers on ocean biota. PMID:23704890
Vanegas, Fernando; Weiss, John; Gonzalez, Felipe
2018-01-01
Recent advances in remote sensed imagery and geospatial image processing using unmanned aerial vehicles (UAVs) have enabled the rapid and ongoing development of monitoring tools for crop management and the detection/surveillance of insect pests. This paper describes a (UAV) remote sensing-based methodology to increase the efficiency of existing surveillance practices (human inspectors and insect traps) for detecting pest infestations (e.g., grape phylloxera in vineyards). The methodology uses a UAV integrated with advanced digital hyperspectral, multispectral, and RGB sensors. We implemented the methodology for the development of a predictive model for phylloxera detection. In this method, we explore the combination of airborne RGB, multispectral, and hyperspectral imagery with ground-based data at two separate time periods and under different levels of phylloxera infestation. We describe the technology used—the sensors, the UAV, and the flight operations—the processing workflow of the datasets from each imagery type, and the methods for combining multiple airborne with ground-based datasets. Finally, we present relevant results of correlation between the different processed datasets. The objective of this research is to develop a novel methodology for collecting, processing, analysing and integrating multispectral, hyperspectral, ground and spatial data to remote sense different variables in different applications, such as, in this case, plant pest surveillance. The development of such methodology would provide researchers, agronomists, and UAV practitioners reliable data collection protocols and methods to achieve faster processing techniques and integrate multiple sources of data in diverse remote sensing applications. PMID:29342101
Methodology for evaluation of railroad technology research projects
DOT National Transportation Integrated Search
1981-04-01
This Project memorandum presents a methodology for evaluating railroad research projects. The methodology includes consideration of industry and societal benefits, with special attention given to technical risks, implementation considerations, and po...
Rear-end vision-based collision detection system for motorcyclists
NASA Astrophysics Data System (ADS)
Muzammel, Muhammad; Yusoff, Mohd Zuki; Meriaudeau, Fabrice
2017-05-01
In many countries, the motorcyclist fatality rate is much higher than that of other vehicle drivers. Among many other factors, motorcycle rear-end collisions also contribute to these fatalities. To increase the safety of motorcyclists and minimize their road fatalities, this paper introduces a vision-based rear-end collision detection system. The binary road detection scheme contributes significantly to reducing false detections and helps to achieve reliable results even when shadows and different lane markers are present on the road. The methodology is based on Harris corner detection and the Hough transform. To validate this methodology, two types of dataset are used: (1) self-recorded datasets (obtained by placing a camera at the rear end of a motorcycle) and (2) online datasets (recorded by placing a camera at the front of a car). The method achieved 95.1% accuracy for the self-recorded dataset and gives reliable results for rear-end vehicle detection under different road scenarios. The technique also performs well on the online car datasets. The proposed technique's high detection accuracy with a monocular vision camera, coupled with its low computational complexity, makes it a suitable candidate for a motorbike rear-end collision detection system.
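The two primitives named above are available in OpenCV; the sketch below shows Harris corner detection and the probabilistic Hough transform on a single frame. It is not the full collision-detection pipeline, and "frame.jpg" is a placeholder for a rear-camera image.

```python
import cv2
import numpy as np

# Minimal OpenCV sketch of the two primitives named above: Harris corner
# detection and the probabilistic Hough transform for road/lane lines.
# "frame.jpg" is a placeholder for a rear-camera frame.

frame = cv2.imread("frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Harris corners: candidate feature points on an approaching vehicle.
corners = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
corner_mask = corners > 0.01 * corners.max()

# Hough transform on an edge map: straight lane markers / road boundaries.
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=10)

print("corner pixels:", int(corner_mask.sum()))
print("line segments:", 0 if lines is None else len(lines))
```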
Check your biosignals here: a new dataset for off-the-person ECG biometrics.
da Silva, Hugo Plácido; Lourenço, André; Fred, Ana; Raposo, Nuno; Aires-de-Sousa, Marta
2014-02-01
The Check Your Biosignals Here initiative (CYBHi) was developed as a way of creating a dataset and consistently repeatable acquisition framework, to further extend research in electrocardiographic (ECG) biometrics. In particular, our work targets the novel trend towards off-the-person data acquisition, which opens a broad new set of challenges and opportunities both for research and industry. While datasets with ECG signals collected using medical grade equipment at the chest can be easily found, for off-the-person ECG data the solution is generally for each team to collect their own corpus at considerable expense of resources. In this paper we describe the context, experimental considerations, methods, and preliminary findings of two public datasets created by our team, one for short-term and another for long-term assessment, with ECG data collected at the hand palms and fingers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Multi-decadal Hydrological Retrospective: Case study of Amazon floods and droughts
NASA Astrophysics Data System (ADS)
Wongchuig Correa, Sly; Paiva, Rodrigo Cauduro Dias de; Espinoza, Jhan Carlo; Collischonn, Walter
2017-06-01
Recently developed methodologies such as climate reanalysis make it possible to create a historical record of climate systems. This paper proposes a methodology called the Hydrological Retrospective (HR), which essentially uses large rainfall datasets as input to hydrological models to develop a record of past hydrology, making it possible to analyze past floods and droughts. We developed the methodology for the Amazon basin, where studies have shown an increase in the intensity and frequency of hydrological extreme events in recent decades. We used eight large precipitation datasets (more than 30 years) as input for a large-scale hydrological and hydrodynamic model (MGB-IPH). HR products were then validated against several in-situ discharge gauges controlling the main Amazon sub-basins, focusing on maximum and minimum events. For the most accurate HR, based on performance metrics, we assessed its skill in detecting floods and droughts, comparing the results with in-situ observations. A statistical trend analysis of the time series of seasonal flood and drought intensity was performed for the entire Amazon basin. Results indicate that HR could represent most past extreme events well, compared with in-situ observed data, and was consistent with many events reported in the literature. Because of their flow duration, some minor regional events were not reported in the literature but were captured by HR. To represent past regional hydrology and seasonal hydrological extreme events, we believe it is feasible to use large precipitation datasets such as (i) climate reanalyses, which are mainly based on a land surface component, and (ii) datasets based on merged products. A significant upward trend in intensity was seen in maximum annual discharge (related to floods) in western and northwestern regions and in minimum annual discharge (related to droughts) in south and central-south regions of the Amazon basin. Because of the global coverage of rainfall datasets, this methodology can be transferred to other regions for better estimation of future hydrological behavior and its impact on society.
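The abstract does not state which trend test was applied; as one plausible example, a non-parametric Mann-Kendall test of an annual maximum discharge series could be screened as sketched below (synthetic data, no-ties approximation).

```python
import numpy as np
from scipy.stats import norm

# Illustrative sketch of a Mann-Kendall trend test of the kind commonly
# applied to annual maximum/minimum discharge series. The data are synthetic
# and this is only an example of how such a trend could be screened.

def mann_kendall(series):
    x = np.asarray(series, dtype=float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0           # no-ties approximation
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p = 2 * (1 - norm.cdf(abs(z)))                      # two-sided p-value
    return z, p

rng = np.random.default_rng(1)
annual_max_discharge = 50000 + 150 * np.arange(35) + rng.normal(0, 2000, 35)
z, p = mann_kendall(annual_max_discharge)
print(f"z = {z:.2f}, p = {p:.4f}")  # positive z -> upward trend
```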
NASA Astrophysics Data System (ADS)
2018-01-01
The test dataset was also useful for comparing visual range estimates obtained with the Koschmieder equation against the visibility measured at the Milano-Linate airport. It is worth noting that in this work the test dataset was used primarily for checking the proposed methodology; it was not meant to give an assessment of bext and VR in Milan for a wintertime period, as done by Vecchi et al. [in press], who applied the tailored equation to a larger aerosol dataset.
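For reference, the Koschmieder relation mentioned above links visual range to the extinction coefficient as VR = ln(1/eps)/bext, with the common contrast threshold eps = 0.02 giving the familiar factor of 3.912; a small numerical example (with an illustrative bext value, not data from this work) is shown below.

```python
import math

# Koschmieder relation: VR = ln(1/eps) / b_ext, eps being the contrast
# threshold (commonly 0.02). The b_ext value below is illustrative only.

def visual_range_km(b_ext_per_Mm, contrast_threshold=0.02):
    """Visual range in km from an extinction coefficient given in Mm^-1."""
    b_ext_per_km = b_ext_per_Mm * 1e-3   # 1 Mm^-1 = 1e-3 km^-1
    return math.log(1.0 / contrast_threshold) / b_ext_per_km

print(round(visual_range_km(400.0), 1))  # ~9.8 km for b_ext = 400 Mm^-1
```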
Data-driven probability concentration and sampling on manifold
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu
2016-09-15
A new methodology is proposed for generating realizations of a random vector with values in a finite-dimensional Euclidean space that are statistically consistent with a dataset of observations of this vector. The probability distribution of this random vector, while a priori not known, is presumed to be concentrated on an unknown subset of the Euclidean space. A random matrix is introduced whose columns are independent copies of the random vector and for which the number of columns is the number of data points in the dataset. The approach is based on the use of (i) the multidimensional kernel-density estimation method for estimating the probability distribution of the random matrix, (ii) a MCMC method for generating realizations for the random matrix, (iii) the diffusion-maps approach for discovering and characterizing the geometry and the structure of the dataset, and (iv) a reduced-order representation of the random matrix, which is constructed using the diffusion-maps vectors associated with the first eigenvalues of the transition matrix relative to the given dataset. The convergence aspects of the proposed methodology are analyzed and a numerical validation is explored through three applications of increasing complexity. The proposed method is found to be robust to noise levels and data complexity as well as to the intrinsic dimension of data and the size of experimental datasets. Both the methodology and the underlying mathematical framework presented in this paper contribute new capabilities and perspectives at the interface of uncertainty quantification, statistical data analysis, stochastic modeling and associated statistical inverse problems.
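Two of the listed ingredients, the multidimensional kernel-density estimate and a basic diffusion-maps embedding, can be sketched as follows; the MCMC sampler and the reduced-order representation of the full methodology are not reproduced, and the data are synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Sketch of two ingredients of the methodology: (i) a multidimensional
# Gaussian kernel-density estimate of the dataset and (ii) a basic
# diffusion-maps embedding built from a Gaussian affinity matrix.
# Data are synthetic points scattered around a circle.

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
data = np.column_stack([np.cos(theta), np.sin(theta)]) + rng.normal(0, 0.05, (200, 2))

# (i) kernel-density estimate (expects one column per sample)
kde = gaussian_kde(data.T)
print("pdf at (1, 0):", kde([[1.0], [0.0]])[0])

# (ii) diffusion maps: affinity -> row-normalised transition matrix -> eigenvectors
eps = 0.1
dists2 = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
kernel = np.exp(-dists2 / eps)
transition = kernel / kernel.sum(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eig(transition)
order = np.argsort(-eigvals.real)
# the first non-trivial eigenvectors give low-dimensional diffusion coordinates
diffusion_coords = eigvecs.real[:, order[1:3]]
print("embedding shape:", diffusion_coords.shape)  # (200, 2)
```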
We tested two methods for dataset generation and model construction, and three tree-classifier variants to identify the most parsimonious and thematically accurate mapping methodology for the SW ReGAP project. Competing methodologies were tested in the East Great Basin mapping un...
Choosing the Most Effective Pattern Classification Model under Learning-Time Constraint.
Saito, Priscila T M; Nakamura, Rodrigo Y M; Amorim, Willian P; Papa, João P; de Rezende, Pedro J; Falcão, Alexandre X
2015-01-01
Nowadays, large datasets are common and demand faster and more effective pattern analysis techniques. However, methodologies to compare classifiers usually do not take into account the learning-time constraints required by applications. This work presents a methodology to compare classifiers with respect to their ability to learn from classification errors on a large learning set, within a given time limit. Faster techniques may acquire more training samples, but only when they are more effective will they achieve higher performance on unseen testing sets. We demonstrate this result using several techniques, multiple datasets, and typical learning-time limits required by applications.
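One way to read the comparison protocol (an interpretation, not the authors' exact procedure) is sketched below: each classifier receives the same wall-clock learning budget, consumes as many training samples as it can within that budget, and is then scored on the same unseen test set.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sketch of a time-constrained comparison: grow the training set within a
# fixed wall-clock budget, then evaluate on a common unseen test set.

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

def accuracy_within_budget(model, budget_s=1.0, batch=500):
    n = batch
    fitted = None
    start = time.perf_counter()
    while time.perf_counter() - start < budget_s and n <= len(X_pool):
        fitted = model.fit(X_pool[:n], y_pool[:n])  # retrain on a larger sample
        n += batch
    return accuracy_score(y_test, fitted.predict(X_test)), n - batch

for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("5-NN", KNeighborsClassifier(n_neighbors=5))]:
    acc, used = accuracy_within_budget(model)
    print(f"{name}: accuracy={acc:.3f} using {used} training samples")
```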
NASA Astrophysics Data System (ADS)
Klees, R.; Slobbe, D. C.; Farahani, H. H.
2018-04-01
The paper is about a methodology to combine a noisy satellite-only global gravity field model (GGM) with other noisy datasets to estimate a local quasi-geoid model using weighted least-squares techniques. In this way, we attempt to improve the quality of the estimated quasi-geoid model and to complement it with a full noise covariance matrix for quality control and further data processing. The methodology goes beyond the classical remove-compute-restore approach, which does not account for the noise in the satellite-only GGM. We suggest and analyse three different approaches of data combination. Two of them are based on a local single-scale spherical radial basis function (SRBF) model of the disturbing potential, and one is based on a two-scale SRBF model. Using numerical experiments, we show that a single-scale SRBF model does not fully exploit the information in the satellite-only GGM. We explain this by a lack of flexibility of a single-scale SRBF model to deal with datasets of significantly different bandwidths. The two-scale SRBF model performs well in this respect, provided that the model coefficients representing the two scales are estimated separately. The corresponding methodology is developed in this paper. Using the statistics of the least-squares residuals and the statistics of the errors in the estimated two-scale quasi-geoid model, we demonstrate that the developed methodology provides a two-scale quasi-geoid model, which exploits the information in all datasets.
A methodological investigation of hominoid craniodental morphology and phylogenetics.
Bjarnason, Alexander; Chamberlain, Andrew T; Lockwood, Charles A
2011-01-01
The evolutionary relationships of extant great apes and humans have been largely resolved by molecular studies, yet morphology-based phylogenetic analyses continue to provide conflicting results. In order to further investigate this discrepancy we present bootstrap clade support of morphological data based on two quantitative datasets, one dataset consisting of linear measurements of the whole skull from 5 hominoid genera and the second dataset consisting of 3D landmark data from the temporal bone of 5 hominoid genera, including 11 sub-species. Using similar protocols for both datasets, we were able to 1) compare distance-based phylogenetic methods to cladistic parsimony of quantitative data converted into discrete character states, 2) vary outgroup choice to observe its effect on phylogenetic inference, and 3) analyse male and female data separately to observe the effect of sexual dimorphism on phylogenies. Phylogenetic analysis was sensitive to methodological decisions, particularly outgroup selection, where designation of Pongo as an outgroup and removal of Hylobates resulted in greater congruence with the proposed molecular phylogeny. The performance of distance-based methods also justifies their use in phylogenetic analysis of morphological data. It is clear from our analyses that hominoid phylogenetics ought not to be used as an example of conflict between the morphological and molecular, but as an example of how outgroup and methodological choices can affect the outcome of phylogenetic analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
Realistic computer network simulation for network intrusion detection dataset generation
NASA Astrophysics Data System (ADS)
Payer, Garrett
2015-05-01
The KDD-99 Cup dataset is dead. While it can continue to be used as a toy example, the age of this dataset makes it all but useless for intrusion detection research and data mining. Many of the attacks used within the dataset are obsolete and do not reflect the features important for intrusion detection in today's networks. Creating a new dataset encompassing a large cross section of the attacks found on the Internet today could be useful, but would eventually fall to the same problem as the KDD-99 Cup; its usefulness would diminish after a period of time. To continue research into intrusion detection, the generation of new datasets needs to be as dynamic and as quick as the attacker. Simply examining existing network traffic and using domain experts such as intrusion analysts to label traffic is inefficient, expensive, and not scalable. The only viable methodology is simulation using technologies including virtualization, attack-toolsets such as Metasploit and Armitage, and sophisticated emulation of threat and user behavior. Simulating actual user behavior and network intrusion events dynamically not only allows researchers to vary scenarios quickly, but enables online testing of intrusion detection mechanisms by interacting with data as it is generated. As new threat behaviors are identified, they can be added to the simulation to make quicker determinations as to the effectiveness of existing and ongoing network intrusion technology, methodology and models.
Craig, Hugh; Berretta, Regina; Moscato, Pablo
2016-01-01
In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in a graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
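A compact sketch of the pipeline's main steps follows: Jensen-Shannon distances between word-frequency profiles, a thresholded proximity graph, and modularity-based community detection. A greedy modularity algorithm stands in for the authors' memetic iMA-Net, and the frequency profiles are synthetic.

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import jensenshannon
from networkx.algorithms.community import greedy_modularity_communities

# Sketch: compare word-frequency profiles with the Jensen-Shannon distance,
# link sufficiently similar texts in a proximity graph, and read clusters off
# as modularity-based communities. greedy_modularity_communities stands in
# for the authors' memetic iMA-Net algorithm; profiles are synthetic.

rng = np.random.default_rng(0)
base_a, base_b = rng.dirichlet(np.ones(50)), rng.dirichlet(np.ones(50))
profiles = {}
for i in range(8):                      # two synthetic stylistic groups
    base = base_a if i < 4 else base_b
    profiles[f"play_{i}"] = 0.85 * base + 0.15 * rng.dirichlet(np.ones(50))

G = nx.Graph()
G.add_nodes_from(profiles)
names = list(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        d = jensenshannon(profiles[a], profiles[b])  # JS distance (base e)
        if d < 0.45:                                  # proximity-graph threshold
            G.add_edge(a, b, weight=1.0 - d)

for k, community in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(k, sorted(community))
```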
Post-MBA Industry Shifts: An Investigation of Career, Educational and Demographic Factors
ERIC Educational Resources Information Center
Hwang, Alvin; Bento, Regina; Arbaugh, J. B.
2011-01-01
Purpose: The purpose of this study is to examine factors that predict industry-level career change among MBA graduates. Design/methodology/approach: The study analyzed longitudinal data from the Management Education Research Institute (MERI)'s Global MBA Graduate Survey Dataset and MBA Alumni Perspectives Survey Datasets, using principal component…
Size Distributions of Solar Proton Events: Methodological and Physical Restrictions
NASA Astrophysics Data System (ADS)
Miroshnichenko, L. I.; Yanke, V. G.
2016-12-01
Based on the new catalogue of solar proton events (SPEs) for the period of 1997 - 2009 (Solar Cycle 23) we revisit the long-studied problem of the event-size distributions in the context of those constructed for other solar-flare parameters. Recent results on the problem of size distributions of solar flares and proton events are briefly reviewed. Even a cursory acquaintance with this research field reveals a rather mixed and controversial picture. We concentrate on three main issues: i) SPE size distribution for {>} 10 MeV protons in Solar Cycle 23; ii) size distribution of {>} 1 GV proton events in 1942 - 2014; iii) variations of annual numbers for {>} 10 MeV proton events on long time scales (1955 - 2015). Different results are critically compared; most of the studies in this field are shown to suffer from vastly different input datasets as well as from insufficient knowledge of underlying physical processes in the SPEs under consideration. New studies in this field should be made on more distinct physical and methodological bases. It is important to note the evident similarity in size distributions of solar flares and superflares in Sun-like stars.
Towards a monitoring system of temperature extremes in Europe
NASA Astrophysics Data System (ADS)
Lavaysse, Christophe; Cammalleri, Carmelo; Dosio, Alessandro; van der Schrier, Gerard; Toreti, Andrea; Vogt, Jürgen
2018-01-01
Extreme-temperature anomalies such as heat and cold waves may have strong impacts on human activities and health. The heat waves in western Europe in 2003 and in Russia in 2010, or the cold wave in southeastern Europe in 2012, generated a considerable amount of economic loss and resulted in the death of several thousand people. Providing an operational system to monitor extreme-temperature anomalies in Europe is thus of prime importance to help decision makers and emergency services respond to an unfolding extreme event. In this study, the development and the validation of a monitoring system for extreme-temperature anomalies are presented. The first part of the study describes the methodology, which is based on the persistence of events exceeding a percentile threshold. The method is applied to three different observational datasets in order to assess the robustness and highlight uncertainties in the observations. The climatology of extreme events from the last 21 years is then analysed to highlight the spatial and temporal variability of the hazard, and discrepancies amongst the observational datasets are discussed. In the last part of the study, the products derived from this study are presented and discussed with respect to previous studies. The results highlight the accuracy of the developed index and the statistical robustness of the distribution used to calculate the return periods.
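The detection principle (percentile threshold plus persistence) can be sketched as below; the 90th percentile and three-day persistence are illustrative choices rather than the study's exact settings, and the temperature series is synthetic.

```python
import numpy as np

# Sketch of the detection principle: flag days whose temperature exceeds a
# climatological percentile threshold and keep only exceedances that persist
# for a minimum number of consecutive days. Parameters and data are illustrative.

def detect_heat_waves(temps, percentile=90, min_days=3):
    threshold = np.percentile(temps, percentile)
    hot = temps > threshold
    events, start = [], None
    for day, is_hot in enumerate(hot):
        if is_hot and start is None:
            start = day
        elif not is_hot and start is not None:
            if day - start >= min_days:
                events.append((start, day - 1))
            start = None
    if start is not None and len(hot) - start >= min_days:
        events.append((start, len(hot) - 1))
    return threshold, events

rng = np.random.default_rng(2)
summer = 24 + 4 * rng.standard_normal(92)      # synthetic daily means [deg C]
summer[40:46] += 9                              # embedded 6-day hot spell
thr, events = detect_heat_waves(summer)
print(f"threshold = {thr:.1f} C, events = {events}")  # expect one event near days 40-45
```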
Consolidating drug data on a global scale using Linked Data.
Jovanovik, Milos; Trajanov, Dimitar
2017-01-21
Drug product data is available on the Web in a distributed fashion. The reasons lie within the regulatory domains, which exist on a national level. As a consequence, the drug data available on the Web are independently curated by national institutions from each country, leaving the data in varying languages, with a varying structure, granularity level and format, on different locations on the Web. Therefore, one of the main challenges in the realm of drug data is the consolidation and integration of large amounts of heterogeneous data into a comprehensive dataspace, for the purpose of developing data-driven applications. In recent years, the adoption of the Linked Data principles has enabled data publishers to provide structured data on the Web and contextually interlink them with other public datasets, effectively de-siloing them. Defining methodological guidelines and specialized tools for generating Linked Data in the drug domain, applicable on a global scale, is a crucial step to achieving the necessary levels of data consolidation and alignment needed for the development of a global dataset of drug product data. This dataset would then enable a myriad of new usage scenarios, which can, for instance, provide insight into the global availability of different drug categories in different parts of the world. We developed a methodology and a set of tools which support the process of generating Linked Data in the drug domain. Using them, we generated the LinkedDrugs dataset by seamlessly transforming, consolidating and publishing high-quality, 5-star Linked Drug Data from twenty-three countries, containing over 248,000 drug products, over 99,000,000 RDF triples and over 278,000 links to generic drugs from the LOD Cloud. Using the linked nature of the dataset, we demonstrate its ability to support advanced usage scenarios in the drug domain. The process of generating the LinkedDrugs dataset demonstrates the applicability of the methodological guidelines and the supporting tools in transforming drug product data from various, independent and distributed sources, into a comprehensive Linked Drug Data dataset. The presented user-centric and analytical usage scenarios over the dataset show the advantages of having a de-siloed, consolidated and comprehensive dataspace of drug data available via the existing infrastructure of the Web.
NASA Astrophysics Data System (ADS)
Cleves, Ann E.; Jain, Ajay N.
2008-03-01
Inductive bias is the set of assumptions that a person or procedure makes in making a prediction based on data. Different methods for ligand-based predictive modeling have different inductive biases, with a particularly sharp contrast between 2D and 3D similarity methods. A unique aspect of ligand design is that the data that exist to test methodology have been largely man-made, and that this process of design involves prediction. By analyzing the molecular similarities of known drugs, we show that the inductive bias of the historic drug discovery process has a very strong 2D bias. In studying the performance of ligand-based modeling methods, it is critical to account for this issue in dataset preparation, use of computational controls, and in the interpretation of results. We propose specific strategies to explicitly address the problems posed by inductive bias considerations.
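As an example of the kind of 2D similarity measure at issue, the sketch below computes Tanimoto similarity between Morgan (circular) fingerprints with RDKit; the molecules and fingerprint parameters are illustrative choices, not those used in the study.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Example of a 2D similarity measure: Tanimoto similarity between Morgan
# fingerprints, i.e. shared substructure rather than 3D surface or shape.
# Molecules and parameters are illustrative choices.

smiles = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "salicylic_acid": "OC(=O)c1ccccc1O",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
}
fps = {name: AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
       for name, s in smiles.items()}

print("aspirin vs salicylic acid:",
      round(DataStructs.TanimotoSimilarity(fps["aspirin"], fps["salicylic_acid"]), 2))
print("aspirin vs ibuprofen:",
      round(DataStructs.TanimotoSimilarity(fps["aspirin"], fps["ibuprofen"]), 2))
```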
School Climate Reports from Norwegian Teachers: A Methodological and Substantive Study.
ERIC Educational Resources Information Center
Kallestad, Jan Helge; Olweus, Dan; Alsaker, Francoise
1998-01-01
Explores methodological and substantive issues relating to school climate, using a dataset derived from 42 Norwegian schools at two points of time and a standard definition of organizational climate. Identifies and analyzes four school-climate dimensions. Three dimensions (collegial communication, orientation to change, and teacher influence over…
Industrial Ecology Approach to MSW Methodology Data Set
U.S. municipal solid waste data for the year 2012. This dataset is associated with the following publication: Smith, R., D. Sengupta, S. Takkellapati, and C. Lee. An industrial ecology approach to municipal solid waste management: I. Methodology. Resources, Conservation and Recycling. Elsevier Science BV, Amsterdam, NETHERLANDS, 104: 311-316, (2015).
Hydrological Retrospective of floods and droughts: Case study in the Amazon
NASA Astrophysics Data System (ADS)
Wongchuig Correa, Sly; Cauduro Dias de Paiva, Rodrigo; Carlo Espinoza Villar, Jhan; Collischonn, Walter
2017-04-01
Recent studies have reported an increase in the intensity and frequency of hydrological extreme events in many regions of the Amazon basin over recent decades; these events, such as seasonal floods and droughts, have had a significant impact on human and natural systems. Recently, methodologies such as climate reanalysis have been developed to create a coherent record of climate systems. Building on this notion, this research aims to produce a methodology called the Hydrological Retrospective (HR), which essentially runs large rainfall datasets through hydrological models in order to develop a record of past hydrology, enabling the analysis of past floods and droughts. We developed our methodology for the Amazon basin, using eight large precipitation datasets (spanning more than 30 years) as input to a large-scale hydrological and hydrodynamic model (MGB-IPH). The HR products were then validated against several in-situ discharge gauges dispersed throughout the Amazon basin, with a focus on maximum and minimum events. For the HR products that performed best according to performance metrics, we assessed their skill in detecting floods and droughts against in-situ observations. Furthermore, a statistical trend analysis of the time series of seasonal flood and drought intensity was performed for the whole Amazon basin. Results indicate that the best-performing HR represented most past extreme events registered by in-situ observations well and was consistent with many events reported in the literature; we therefore consider it viable to use large precipitation datasets, such as climate reanalyses mainly based on a land surface component and datasets based on merged products, to represent past regional hydrology and seasonal hydrological extreme events. In addition, an increasing trend in intensity was found for maximum annual discharges (related to floods) in north-western regions and for minimum annual discharges (related to droughts) in central-south regions of the Amazon basin; these features had previously been detected by other researchers. For the basin as a whole, we estimated an upward trend in maximum annual discharges of the Amazon River. Given the global coverage of rainfall datasets, HR could be used as a methodology to understand the occurrence of past extreme events in many places and to better estimate future hydrological behavior and its impacts on society.
Handwritten mathematical symbols dataset.
Chajri, Yassine; Bouikhalene, Belaid
2016-06-01
Due to technological advances in recent years, paper-based scientific documents are used less and less, and the trend in the scientific community towards using digital documents has increased considerably. Among these documents are scientific documents and, more specifically, mathematics documents. In this context, we present our own dataset of handwritten mathematical symbols, composed of 10,379 images. This dataset gathers Arabic characters, Latin characters, Arabic numerals, Latin numerals, arithmetic operators, set symbols, comparison symbols, delimiters, etc.
How to choose methods for lake greenhouse gas flux measurements?
NASA Astrophysics Data System (ADS)
Bastviken, David
2017-04-01
Lake greenhouse gas (GHG) fluxes are increasingly recognized as important for lake ecosystems as well as for large-scale carbon and GHG budgets. However, many of our flux estimates are uncertain, and it is debatable whether the presently available data are representative of the systems studied. Data are also very limited for some important flux pathways. Hence, many ongoing efforts try to better constrain fluxes and understand flux regulation. A fundamental challenge towards improved knowledge, and when starting new studies, is which methods to choose. A variety of approaches to measuring aquatic GHG exchange is used, and data from different methods and methodological approaches have often been treated as equally valid to create large datasets for extrapolations and syntheses. However, data from different approaches may cover different flux pathways or spatio-temporal domains and are thus not always comparable. Method inter-comparisons and critical method evaluations addressing these issues are rare. Emerging efforts to organize systematic multi-lake monitoring networks for GHG fluxes lead to method choices that may set the foundation for decades of data generation and therefore require fundamental evaluation of different approaches. The method choices concern not only the equipment but also, for example, the overall measurement design and field approach, the spatial and temporal resolution relevant for different flux components, and the accessory variables to measure. In addition, consideration is needed of how to design monitoring approaches that are affordable, suitable for widespread (global) use, and comparable across regions. Inspired by discussions with Prof. Dr. Cristian Blodau during the EGU General Assembly 2016, this presentation aims to (1) illustrate fundamental pros and cons of a number of common methods, (2) show how common methodological approaches originally adapted for other environments can be improved for lake flux measurements, (3) suggest how consideration of the spatio-temporal dimensions of flux variability can lead to more optimized approaches, and (4) highlight possibilities of efficient ways forward, including low-cost technologies with potential for worldwide use.
Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo
2015-01-01
Background The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease. PMID:26106884
Keuleers, Emmanuel; Balota, David A
2015-01-01
This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.
NASA Astrophysics Data System (ADS)
Karlsson, K.
2010-12-01
The EUMETSAT CMSAF project (www.cmsaf.eu) compiles climatological datasets from various satellite sources with emphasis on the use of EUMETSAT-operated satellites. However, since climate monitoring primarily has a global scope, also datasets merging data from various satellites and satellite operators are prepared. One such dataset is the CMSAF historic GAC (Global Area Coverage) dataset which is based on AVHRR data from the full historic series of NOAA-satellites and the European METOP satellite in mid-morning orbit launched in October 2006. The CMSAF GAC dataset consists of three groups of products: Macroscopical cloud products (cloud amount, cloud type and cloud top), cloud physical products (cloud phase, cloud optical thickness and cloud liquid water path) and surface radiation products (including surface albedo). Results will be presented and discussed for all product groups, including some preliminary inter-comparisons with other datasets (e.g., PATMOS-X, MODIS and CloudSat/CALIPSO datasets). A background will also be given describing the basic methodology behind the derivation of all products. This will include a short historical review of AVHRR cloud processing and resulting AVHRR applications at SMHI. Historic GAC processing is one of five pilot projects selected by the SCOPE-CM (Sustained Co-Ordinated Processing of Environmental Satellite data for Climate Monitoring) project organised by the WMO Space programme. The pilot project is carried out jointly between CMSAF and NOAA with the purpose of finding an optimal GAC processing approach. The initial activity is to inter-compare results of the CMSAF GAC dataset and the NOAA PATMOS-X dataset for the case when both datasets have been derived using the same inter-calibrated AVHRR radiance dataset. The aim is to get further knowledge of e.g. most useful multispectral methods and the impact of ancillary datasets (for example from meteorological reanalysis datasets from NCEP and ECMWF). The CMSAF project is currently defining plans for another five years (2012-2017) of operations and development. New GAC reprocessing efforts are planned and new methodologies will be tested. Central questions here will be how to increase the quantitative use of the products through improving error and uncertainty estimates and how to compile the information in a way to allow meaningful and efficient ways of using the data for e.g. validation of climate model information.
Quantifying Interannual Variability for Photovoltaic Systems in PVWatts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryberg, David Severin; Freeman, Janine; Blair, Nate
2015-10-01
The National Renewable Energy Laboratory's (NREL's) PVWatts is a relatively simple tool used by industry and individuals alike to easily estimate the amount of energy a photovoltaic (PV) system will produce throughout the course of a typical year. PVWatts Version 5 has previously been shown to reasonably represent an operating system's output when provided with concurrent weather data; however, this type of data is not available when estimating system output during future time frames. For this purpose PVWatts uses weather data from typical meteorological year (TMY) datasets which are available on the NREL website. The TMY files represent a statistically 'typical' year which by definition excludes anomalous weather patterns and as a result may not provide sufficient quantification of project risk to the financial community. It was therefore desired to quantify the interannual variability associated with TMY files in order to improve the understanding of risk associated with these projects. To begin to understand the interannual variability of a PV project, we simulated two archetypal PV system designs, which are common in the PV industry, in PVWatts using the NSRDB's 1961-1990 historical dataset. This dataset contains measured hourly weather data spanning the thirty years from 1961-1990 for 239 locations in the United States. To note, this historical dataset was used to compose the TMY2 dataset. Using the results of these simulations we computed several statistical metrics which may be of interest to the financial community and normalized the results with respect to the TMY energy prediction at each location, so that these results could be easily translated to similar systems. This report briefly describes the simulation process used and the statistical methodology employed for this project, but otherwise focuses mainly on a sample of our results. A short discussion of these results is also provided. It is our hope that this quantification of the interannual variability of PV systems will provide a starting point for variability considerations in future PV system designs and investigations.
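As a rough illustration of the normalization step described above, the sketch below computes variability statistics for a set of simulated annual energy totals relative to a TMY-based estimate. All values are synthetic placeholders, not PVWatts or NSRDB output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data: 30 simulated annual energy totals (kWh) for one site and system,
# standing in for PVWatts runs driven by the 1961-1990 NSRDB years.
annual_energy = rng.normal(loc=1450.0, scale=60.0, size=30)
tmy_energy = 1450.0  # hypothetical TMY-based annual estimate for the same site

normalized = annual_energy / tmy_energy   # unitless ratio to the TMY prediction

stats = {
    "mean": float(normalized.mean()),
    "std": float(normalized.std(ddof=1)),
    "p10": float(np.percentile(normalized, 10)),   # one possible low-production exceedance level
    "min": float(normalized.min()),
    "max": float(normalized.max()),
}
print(stats)
```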
Application of Alignment Methodologies to Spatial Ontologies in the Hydro Domain
NASA Astrophysics Data System (ADS)
Lieberman, J. E.; Cheatham, M.; Varanka, D.
2015-12-01
Ontologies are playing an increasing role in facilitating mediation and translation between datasets representing diverse schemas, vocabularies, or knowledge communities. This role is relatively straightforward when there is one ontology comprising all relevant common concepts that can be mapped to entities in each dataset. Frequently, one common ontology has not been agreed to. Either each dataset is represented by a distinct ontology, or there are multiple candidates for commonality. Either the one most appropriate (expressive, relevant, correct) ontology must be chosen, or else concepts and relationships must be matched across multiple ontologies through an alignment process so that they may be used in concert to carry out mediation or other semantic operations. A resulting alignment can be effective to the extent that entities in the ontologies represent differing terminology for comparable conceptual knowledge. In cases such as spatial ontologies, though, ontological entities may also represent disparate conceptualizations of space according to the discernment methods and application domains on which they are based. One ontology's wetland concept may overlap in space with another ontology's recharge zone or wildlife range or water feature. In order to evaluate alignment with respect to spatial ontologies, alignment has been applied to a series of ontologies pertaining to surface water that are used variously in hydrography (characterization of water features), hydrology (study of water cycling), and water quality (nutrient and contaminant transport) application domains. There is frequently a need to mediate between datasets in each domain in order to develop broader understanding of surface water systems, so there is practical as well as theoretical value in the alignment. From a domain expertise standpoint, the ontologies under consideration clearly contain some concepts that are spatially as well as conceptually identical and then others with less clear similarities in either sense. Our study serves both to determine the limits of standard methods for aligning spatial ontologies and to suggest new methods of calculating similarity axioms that take into account semantic, spatial, and cognitive criteria relevant to fitness for the intended usage scenarios.
Sabree, Zakee L; Hansen, Allison K; Moran, Nancy A
2012-01-01
Starting in 2003, numerous studies using culture-independent methodologies to characterize the gut microbiota of honey bees have retrieved a consistent and distinctive set of eight bacterial species, based on near identity of the 16S rRNA gene sequences. A recent study [Mattila HR, Rios D, Walker-Sperling VE, Roeselers G, Newton ILG (2012) Characterization of the active microbiotas associated with honey bees reveals healthier and broader communities when colonies are genetically diverse. PLoS ONE 7(3): e32962], using pyrosequencing of the V1-V2 hypervariable region of the 16S rRNA gene, reported finding entirely novel bacterial species in honey bee guts, and used taxonomic assignments from these reads to predict metabolic activities based on known metabolisms of cultivable species. To better understand this discrepancy, we analyzed the Mattila et al. pyrotag dataset. In contrast to the conclusions of Mattila et al., we found that the large majority of pyrotag sequences belonged to clusters for which representative sequences were identical to sequences from previously identified core species of the bee microbiota. On average, they represent 95% of the bacteria in each worker bee in the Mattila et al. dataset, a slightly lower value than that found in other studies. Some colonies contain small proportions of other bacteria, mostly species of Enterobacteriaceae. Reanalysis of the Mattila et al. dataset also did not support a relationship between abundances of Bifidobacterium and of putative pathogens or a significant difference in gut communities between colonies from queens that were singly or multiply mated. Additionally, consistent with previous studies, the dataset supports the occurrence of considerable strain variation within core species, even within single colonies. The roles of these bacteria within bees, or the implications of the strain variation, are not yet clear.
Handwritten mathematical symbols dataset
Chajri, Yassine; Bouikhalene, Belaid
2016-01-01
Due to the technological advances in recent years, paper scientific documents are used less and less. Thus, the trend in the scientific community to use digital documents has increased considerably. Among these documents, there are scientific documents and more specifically mathematics documents. In this context, we present our own dataset of handwritten mathematical symbols composed of 10,379 images. This dataset gathers Arabic characters, Latin characters, Arabic numerals, Latin numerals, arithmetic operators, set-symbols, comparison symbols, delimiters, etc. PMID:27006975
EPA’s AP-42 development methodology: Converting or rerating current AP-42 datasets
USDA-ARS?s Scientific Manuscript database
In August 2013, the U.S. Environmental Protection Agency (EPA) published its new methodology for updating the Compilation of Air Pollution Emission Factors (AP-42). The “Recommended Procedures for Development of Emissions Factors and Use of the WebFIRE Database” instructs that the ratings of the...
Evaluating EPA’s AP-42 development methodology using a cotton gin total PM dataset
USDA-ARS?s Scientific Manuscript database
In August 2013, the U.S. Environmental Protection Agency (EPA) published its new methodology for updating the Compilation of Air Pollution Emission Factors (AP-42). The “Recommended Procedures for Development of Emissions Factors and Use of the WebFIRE Database” has yet to be widely used. These ...
Theory of impossible worlds: Toward a physics of information.
Buscema, Paolo Massimo; Sacco, Pier Luigi; Della Torre, Francesca; Massini, Giulia; Breda, Marco; Ferilli, Guido
2018-05-01
In this paper, we introduce an innovative approach to the fusion between datasets in terms of attributes and observations, even when they are not related at all. With our technique, starting from datasets representing independent worlds, it is possible to analyze a single global dataset, and transferring each dataset onto the others is always possible. This procedure allows a deeper perspective in the study of a problem, by offering the chance of looking into it from other, independent points of view. Even unrelated datasets create a metaphoric representation of the problem, useful in terms of speed of convergence and predictive results, preserving the fundamental relationships in the data. In order to extract such knowledge, we propose a new learning rule named double backpropagation, by which an auto-encoder concurrently codifies all the different worlds. We test our methodology on different datasets and different issues, to underline the power and flexibility of the Theory of Impossible Worlds.
Usefulness of DARPA dataset for intrusion detection system evaluation
NASA Astrophysics Data System (ADS)
Thomas, Ciza; Sharma, Vishwas; Balakrishnan, N.
2008-03-01
The MIT Lincoln Laboratory IDS evaluation methodology is a practical solution in terms of evaluating the performance of Intrusion Detection Systems, which has contributed tremendously to the research progress in that field. The DARPA IDS evaluation dataset has been criticized and considered by many as a very outdated dataset, unable to accommodate the latest trend in attacks. The question then naturally arises as to whether detection systems have improved beyond detecting these older levels of attacks. If not, is it worth thinking of this dataset as obsolete? The paper presented here tries to provide supporting facts for the use of the DARPA IDS evaluation dataset. Two commonly used signature-based IDSs, Snort and Cisco IDS, and two anomaly detectors, PHAD and ALAD, are used for this evaluation, and the results support the usefulness of the DARPA dataset for IDS evaluation.
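As a minimal sketch of the kind of evaluation described above, the snippet below scores a detector against a labeled attack set by detection rate and false-positive count. The event identifiers and alert sets are hypothetical; they do not come from the DARPA data or from Snort, Cisco IDS, PHAD, or ALAD output.

```python
# Minimal scoring sketch: detection rate and false positives for one detector
# against a labeled attack set. Event identifiers are hypothetical, not DARPA data.
def score_ids(ground_truth, alerts):
    true_positives = len(ground_truth & alerts)
    false_positives = len(alerts - ground_truth)
    detection_rate = true_positives / len(ground_truth) if ground_truth else 0.0
    return detection_rate, false_positives

attacks = {"a1", "a2", "a3", "a4"}          # labeled attack instances
detector_alerts = {"a1", "a3", "x9"}        # alerts raised by one detector
print(score_ids(attacks, detector_alerts))  # -> (0.5, 1)
```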
Maswadeh, Waleed M; Snyder, A Peter
2015-05-30
Variable responses are fundamental for all experiments, and they can consist of information-rich, redundant, and low-signal-intensity responses. A dataset can consist of a collection of variable responses over multiple classes or groups. Usually, variables that contain very little information are removed from a dataset. Sometimes all the variables are used in the data analysis phase. It is common practice to discriminate between two distributions of data; however, there is no formal algorithm to arrive at a degree of separation (DS) between two distributions of data. The DS is defined herein as the average of the sum of the areas from the probability density functions (PDFs) of A and B that contain at least a given percentage of A and/or B. Thus, DS90 is the average of the sum of the PDF areas of A and B that contain ≥90% of A and/or B. To arrive at a DS value, two synthesized PDFs or very large experimental datasets are required. Experimentally it is common practice to generate relatively small datasets. Therefore, the challenge was to find a statistical parameter that can be used on small datasets to estimate and highly correlate with the DS90 parameter. Established statistical methods include the overlap area of the two data distribution profiles, Welch's t-test, the Kolmogorov-Smirnov (K-S) test, the Mann-Whitney-Wilcoxon test, and the area under the receiver operating characteristics (ROC) curve (AUC). The area between the ROC curve and the diagonal (ACD) and the length of the ROC curve (LROC) are introduced. The established, ACD, and LROC methods were correlated to the DS90 when applied to many pairs of synthesized PDFs. The LROC method provided the best linear correlation with, and estimation of, the DS90. The estimated DS90 from the LROC (DS90-LROC) is applied, as an example, to a database of three Italian wines consisting of thirteen variable responses for variable-ranking consideration. An important highlight of the DS90-LROC method is utilizing the LROC curve methodology to test all variables one-at-a-time with all pairs of classes in a dataset. Copyright © 2015 Elsevier B.V. All rights reserved.
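The ROC-based quantities named above are straightforward to compute for a single variable. The sketch below, using synthetic two-class data, computes the AUC, the area between the ROC curve and the diagonal (taken here as |AUC - 0.5|), and the length of the ROC curve, which runs from about √2 for inseparable classes to 2 for complete separation; the DS90 itself requires the underlying PDFs and is not reproduced here.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
# Synthetic two-class responses for a single variable (placeholder data)
a = rng.normal(0.0, 1.0, 200)   # class A
b = rng.normal(1.5, 1.0, 200)   # class B
scores = np.r_[a, b]
labels = np.r_[np.zeros(a.size), np.ones(b.size)]

fpr, tpr, _ = roc_curve(labels, scores)
auc = roc_auc_score(labels, scores)
acd = abs(auc - 0.5)                                        # area between ROC curve and diagonal
lroc = float(np.sum(np.hypot(np.diff(fpr), np.diff(tpr))))  # length of the ROC curve

print(f"AUC = {auc:.3f}  ACD = {acd:.3f}  LROC = {lroc:.3f}")
```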
Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation
Liu, Qian; Pineda-García, Garibaldi; Stromatias, Evangelos; Serrano-Gotarredona, Teresa; Furber, Steve B.
2016-01-01
Today, increasing attention is being paid to research into spike-based neural computation both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organization have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing are now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarksand that uses digits from the MNIST database. This dataset is compatible with the state of current research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance. Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware implementations. With this dataset we hope to (1) promote meaningful comparison between algorithms in the field of neural computation, (2) allow comparison with conventional image recognition methods, (3) provide an assessment of the state of the art in spike-based visual recognition, and (4) help researchers identify future directions and advance the field. PMID:27853419
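One of the encoding techniques named above, rate-based Poisson spike generation, can be sketched as follows: pixel intensity sets a Poisson firing rate per time step. This is a generic illustration, not the actual generator used to build the NE dataset, and the duration, maximum rate, and time step are assumed values.

```python
import numpy as np

def poisson_spike_trains(image, duration_ms=1000, max_rate_hz=100.0, dt_ms=1.0, seed=0):
    """Rate-based Poisson encoding: pixel intensity in [0, 1] sets the firing rate.

    Returns a boolean array of shape (n_pixels, n_steps); True marks a spike.
    Generic sketch of the encoding style, not the NE dataset's actual generator.
    """
    rng = np.random.default_rng(seed)
    rates = image.ravel() * max_rate_hz          # firing rate per pixel (Hz)
    p_spike = rates * (dt_ms / 1000.0)           # spike probability per time step
    n_steps = int(duration_ms / dt_ms)
    return rng.random((rates.size, n_steps)) < p_spike[:, None]

# Placeholder 28x28 "digit" with random intensities in [0, 1]
fake_digit = np.random.default_rng(2).random((28, 28))
spikes = poisson_spike_trains(fake_digit)
print(spikes.shape, int(spikes.sum()), "spikes")
```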
Conducting high-value secondary dataset analysis: an introductory guide and resources.
Smith, Alexander K; Ayanian, John Z; Covinsky, Kenneth E; Landon, Bruce E; McCarthy, Ellen P; Wee, Christina C; Steinman, Michael A
2011-08-01
Secondary analyses of large datasets provide a mechanism for researchers to address high impact questions that would otherwise be prohibitively expensive and time-consuming to study. This paper presents a guide to assist investigators interested in conducting secondary data analysis, including advice on the process of successful secondary data analysis as well as a brief summary of high-value datasets and online resources for researchers, including the SGIM dataset compendium ( www.sgim.org/go/datasets ). The same basic research principles that apply to primary data analysis apply to secondary data analysis, including the development of a clear and clinically relevant research question, study sample, appropriate measures, and a thoughtful analytic approach. A real-world case description illustrates key steps: (1) define your research topic and question; (2) select a dataset; (3) get to know your dataset; and (4) structure your analysis and presentation of findings in a way that is clinically meaningful. Secondary dataset analysis is a well-established methodology. Secondary analysis is particularly valuable for junior investigators, who have limited time and resources to demonstrate expertise and productivity.
Advanced Methodologies for NASA Science Missions
NASA Astrophysics Data System (ADS)
Hurlburt, N. E.; Feigelson, E.; Mentzel, C.
2017-12-01
Most of NASA's commitment to computational space science involves the organization and processing of Big Data from space-based satellites, and the calculations of advanced physical models based on these datasets. But considerable thought is also needed on what computations are needed. The science questions addressed by space data are so diverse and complex that traditional analysis procedures are often inadequate. The knowledge and skills of the statistician, applied mathematician, and algorithmic computer scientist must be incorporated into programs that currently emphasize engineering and physical science. NASA's culture and administrative mechanisms take full cognizance that major advances in space science are driven by improvements in instrumentation. But it is less well recognized that new instruments and science questions give rise to new challenges in the treatment of satellite data after it is telemetered to the ground. These issues might be divided into two stages: data reduction through software pipelines developed within NASA mission centers; and science analysis that is performed by hundreds of space scientists dispersed through NASA, U.S. universities, and abroad. Both stages benefit from the latest statistical and computational methods; in some cases, the science result is completely inaccessible using traditional procedures. This paper will review the current state of NASA and present example applications using modern methodologies.
Modelling gene expression profiles related to prostate tumor progression using binary states
2013-01-01
Background Cancer is a complex disease commonly characterized by the disrupted activity of several cancer-related genes such as oncogenes and tumor-suppressor genes. Previous studies suggest that the process of tumor progression to malignancy is dynamic and can be traced by changes in gene expression. Despite the enormous efforts made for differential expression detection and biomarker discovery, few methods have been designed to model gene expression levels as a function of tumor stage during malignancy progression. Such models could help us understand the dynamics and simplify or reveal the complexity of tumor progression. Methods We have modeled an on-off state of gene activation per sample and then per stage to select gene expression profiles associated with tumor progression. The selection is guided by the statistical significance of profiles based on randomly permuted datasets. Results We show that our method identifies expected profiles corresponding to oncogenes and tumor suppressor genes in a prostate tumor progression dataset. Comparisons with other methods support our findings and indicate that a considerable proportion of significant profiles is not found by other statistical tests commonly used to detect differential expression between tumor stages, nor by other tailored methods. Ontology and pathway analysis concurred with these findings. Conclusions Results suggest that our methodology may be a valuable tool to study tumor malignancy progression, which might reveal novel cancer therapies. PMID:23721350
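A toy version of the on-off modeling and permutation-based significance idea is sketched below: expression is binarized per sample, summarized per stage, and the observed stage profile is compared against profiles obtained after shuffling stage labels. The binarization threshold, test statistic (range of per-stage 'on' fractions), and data are assumptions for illustration, not the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

def stage_profile(on_states, stages):
    """Fraction of 'on' samples in each tumor stage for one gene (0/1 vector)."""
    return np.array([on_states[stages == s].mean() for s in np.unique(stages)])

def permutation_pvalue(expression, stages, threshold=1.0, n_perm=1000):
    """Label-permutation test for one gene's on-off stage profile (illustrative).

    Test statistic: range of per-stage 'on' fractions (an assumption; the
    paper's statistic may differ).
    """
    on = (expression > threshold).astype(int)
    observed = np.ptp(stage_profile(on, stages))
    null = np.array([np.ptp(stage_profile(on, rng.permutation(stages)))
                     for _ in range(n_perm)])
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Synthetic expression for one gene over three progression stages (20 samples each)
stages = np.repeat(["normal", "primary", "metastatic"], 20)
expr = np.r_[rng.normal(0, 1, 20), rng.normal(0.5, 1, 20), rng.normal(2, 1, 20)]
print(permutation_pvalue(expr, stages))
```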
Uncovering Urban Temporal Patterns from Geo-Tagged Photography.
Paldino, Silvia; Kondor, Dániel; Bojic, Iva; Sobolevsky, Stanislav; González, Marta C; Ratti, Carlo
2016-01-01
We live in a world where digital trails of different forms of human activities compose big urban data, allowing us to detect many aspects of how people experience the city in which they live or come to visit. In this study we propose to enhance urban planning by taking into consideration individual preferences, using information from an unconventional big data source: a dataset of geo-tagged photographs that people take in cities, which we then use as a measure of urban attractiveness. We discover and compare the temporal behavior of residents and visitors in the ten most photographed cities in the world. Looking at the periodicity in urban attractiveness, the results show that the strongest periodic patterns for visitors are usually weekly or monthly. Moreover, by dividing cities into two groups based on which continent they belong to (i.e., North America or Europe), it can be concluded that unlike European cities, the behavior of visitors in the US cities is in general similar to the behavior of their residents. Finally, we apply two indices, called "dilatation attractiveness index" and "dilatation index", to our dataset, which describe the spatial and temporal attractiveness pulsations in the city. The proposed methodology is not only important for urban planning, but also supports various business and public stakeholder decision processes, concentrated for example around the question of how to attract more visitors to the city or how to estimate the impact of special events organized there.
Reliability in content analysis: The case of semantic feature norms classification.
Bolognesi, Marianna; Pilgram, Roosmaryn; van den Heerik, Romy
2017-12-01
Semantic feature norms (e.g., STIMULUS: car → RESPONSE:
Research ethics in the post-genomic era.
Vähäkangas, Kirsi
2013-08-01
New high-throughput 'omics techniques are providing exciting opportunities in clinical medicine and toxicology, especially in the development of biomarkers. In health science research there are traditional ethical considerations that are reasonably obvious, like balancing health benefits and health risks, autonomy mainly pursued by informed consent, and protecting privacy. Epidemiological studies applying new large-scale approaches (e.g., high-throughput or high-content methods and global studies that utilize biobanking of samples and produce large-scale datasets) present new challenges that call for re-evaluation of standard ethical considerations. In this context, assessment of the ethics underlying study designs, bioinformatics, and statistics applied in the generation and clinical translation of research results should also be considered. Indeed, there are ethical considerations in the research process itself, in research objectives and how research is pursued (e.g., which methodologies are selected and how they are carried out). Maintaining research integrity is critical, as demonstrated by the relatively frequent retraction of scientific papers following violations of good scientific practice. Abiding by the laws is necessary but not sufficient for good research ethics, which is and remains in the hands of the scientific community at the level of both individual scientists and organizations. Senior scientists are responsible for the transfer of research tradition to the next generation of scientists through education, mentorship, and setting an example by their own behavior, as well as by creating systems in institutions that support good research ethics. Copyright © 2013 Wiley Periodicals, Inc.
Influence of spatial and temporal scales in identifying temperature extremes
NASA Astrophysics Data System (ADS)
van Eck, Christel M.; Friedlingstein, Pierre; Mulder, Vera L.; Regnier, Pierre A. G.
2016-04-01
Extreme heat events are becoming more frequent. Notable are severe heatwaves such as the European heatwave of 2003, the Russian heat wave of 2010 and the Australian heatwave of 2013. Surface temperature is attaining new maxima not only during the summer but also during the winter. The year 2015 is reported to be a record-breaking year for temperature in both summer and winter. These extreme temperatures are taking their human and environmental toll, emphasizing the need for an accurate method to define a heat extreme in order to fully understand the spatial and temporal spread of an extreme and its impact. This research aims to explore how the use of different spatial and temporal scales influences the identification of a heat extreme. For this purpose, two near-surface temperature datasets of different temporal and spatial scale are used. First, the daily ERA-Interim dataset of 0.25 degree and a time span of 32 years (1979-2010). Second, the daily Princeton Meteorological Forcing Dataset of 0.5 degree and a time span of 63 years (1948-2010). A temperature is considered extremely anomalous when it surpasses the 90th, 95th, or 99th percentile threshold based on the aforementioned pre-processed datasets. The analysis is conducted on a global scale, dividing the world into IPCC's so-called SREX regions developed for the analysis of extreme climate events. Pre-processing is done by detrending and/or subtracting the monthly climatology based on 32 years of data for both datasets and on 63 years of data for only the Princeton Meteorological Forcing Dataset. This results in 6 datasets of temperature anomalies from which the locations in time and space of the anomalously warm days are identified. Comparison of the differences between these 6 datasets, in terms of absolute threshold temperatures for extremes and the temporal and spatial spread of the anomalously warm days, shows a dependence of the results on the datasets and methodology used. This stresses the need for a careful selection of data and methodology when identifying heat extremes.
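A simplified single-grid-cell version of this pre-processing and thresholding chain might look like the sketch below: remove a day-of-year climatology, detrend, and flag days above the 90th/95th/99th percentile of the resulting anomalies. The synthetic temperature series and the use of a daily (rather than monthly) climatology are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic daily near-surface temperatures for one grid cell: 32 years x 365 days
years, days = 32, 365
t2m = 15 + 10 * np.sin(2 * np.pi * np.arange(days) / 365) + rng.normal(0, 3, (years, days))

# Subtract a day-of-year climatology (stand-in for the monthly climatology above),
# then remove a linear trend fitted to the annual-mean anomalies.
climatology = t2m.mean(axis=0)
anomaly = t2m - climatology
trend = np.polyval(np.polyfit(np.arange(years), anomaly.mean(axis=1), 1), np.arange(years))
anomaly = anomaly - trend[:, None]

# Count days exceeding each percentile threshold of the anomaly distribution
for q in (90, 95, 99):
    threshold = np.percentile(anomaly, q)
    print(q, round(float(threshold), 2), int((anomaly > threshold).sum()), "anomalously warm days")
```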
Blade Displacement Measurement Technique Applied to a Full-Scale Rotor Test
NASA Technical Reports Server (NTRS)
Abrego, Anita I.; Olson, Lawrence E.; Romander, Ethan A.; Barrows, Danny A.; Burner, Alpheus W.
2012-01-01
Blade displacement measurements using multi-camera photogrammetry were acquired during the full-scale wind tunnel test of the UH-60A Airloads rotor, conducted in the National Full-Scale Aerodynamics Complex 40- by 80-Foot Wind Tunnel. The objectives were to measure the blade displacement and deformation of the four rotor blades as they rotated through the entire rotor azimuth. These measurements are expected to provide a unique dataset to aid in the development and validation of rotorcraft prediction techniques. They are used to resolve the blade shape and position, including pitch, flap, lag and elastic deformation. Photogrammetric data encompass advance ratios from 0.15 to slowed rotor simulations of 1.0, thrust coefficient to rotor solidity ratios from 0.01 to 0.13, and rotor shaft angles from -10.0 to 8.0 degrees. An overview of the blade displacement measurement methodology and system development, descriptions of image processing, uncertainty considerations, preliminary results covering static and moderate advance ratio test conditions and future considerations are presented. Comparisons of experimental and computational results for a moderate advance ratio forward flight condition show good trend agreements, but also indicate significant mean discrepancies in lag and elastic twist. Blade displacement pitch measurements agree well with both the wind tunnel commanded and measured values.
NASA Astrophysics Data System (ADS)
Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan
2017-09-01
Automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for the different features. Probabilistic Principal Component Analysis (PPCA) has a reputation for dealing with the problem of missing attribute values. This research presents a methodology which uses the results of medical tests as input, extracts a reduced-dimensional feature subset and provides diagnosis of heart disease. The proposed methodology extracts high-impact features in a new projection by using Probabilistic Principal Component Analysis (PPCA). PPCA extracts projection vectors which contribute the highest covariance, and these projection vectors are used to reduce the feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The feature subset with the reduced dimension is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM). The RBF-based SVM serves the purpose of classification into two categories, i.e., Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over the three datasets of UCI, i.e., Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison to the existing research, showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for the Cleveland, Hungarian and Switzerland datasets respectively.
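The overall pipeline (impute missing values, reduce dimension, classify with an RBF-kernel SVM) can be approximated in scikit-learn as below. Note that scikit-learn provides neither PPCA nor Parallel Analysis, so mean imputation, ordinary PCA, and a fixed component count stand in for those steps, and a bundled dataset replaces the UCI heart datasets; this is an illustrative approximation, not the paper's method.

```python
from sklearn.datasets import load_breast_cancer   # bundled stand-in, not the UCI heart datasets
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Approximation of the described pipeline: impute missing values, reduce dimension,
# classify with an RBF-kernel SVM. Mean imputation and plain PCA with a fixed
# component count stand in for PPCA and Parallel Analysis, which scikit-learn lacks.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    PCA(n_components=5),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
print(f"5-fold CV accuracy: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```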
On the Multi-Modal Object Tracking and Image Fusion Using Unsupervised Deep Learning Methodologies
NASA Astrophysics Data System (ADS)
LaHaye, N.; Ott, J.; Garay, M. J.; El-Askary, H. M.; Linstead, E.
2017-12-01
The number of different modalities of remote sensors has been on the rise, resulting in large datasets with different complexity levels. Such complex datasets can provide valuable information separately, yet there is a bigger value in having a comprehensive view of them combined. As such, hidden information can be deduced through applying data mining techniques on the fused data. The curse of dimensionality of such fused data, due to the potentially vast dimension space, hinders our ability to have a deep understanding of them. This is because each dataset requires a user to have instrument-specific and dataset-specific knowledge for optimum and meaningful usage. Once a user decides to use multiple datasets together, a deeper understanding of translating and combining these datasets in a correct and effective manner is needed. Although data-centric techniques exist, generic automated methodologies that can potentially solve this problem completely do not. Here we are developing a system that aims to gain a detailed understanding of different data modalities. Such a system will provide an analysis environment that gives the user useful feedback and can aid in research tasks. In our current work, we show the initial outputs of our system implementation, which leverages unsupervised deep learning techniques so as not to burden the user with the task of labeling input data, while still allowing for a detailed machine understanding of the data. Our goal is to be able to track objects, like cloud systems or aerosols, across different image-like data modalities. The proposed system is flexible, scalable and robust enough to understand complex likenesses within multi-modal data in a similar spatio-temporal range, and to co-register and fuse these images when needed.
Exploring Relationships in Big Data
NASA Astrophysics Data System (ADS)
Mahabal, A.; Djorgovski, S. G.; Crichton, D. J.; Cinquini, L.; Kelly, S.; Colbert, M. A.; Kincaid, H.
2015-12-01
Big Data are characterized by several different 'V's: Volume, Veracity, Volatility, Value and so on. For many datasets, Volumes inflated by redundant features often make the data noisier and more difficult to extract Value from. This is especially true if one is comparing/combining different datasets and the metadata are diverse. We have been exploring ways to exploit such datasets through a variety of statistical machinery and visualization. We show how we have applied it to time-series from large astronomical sky-surveys. This was done in the Virtual Observatory framework. More recently we have been doing similar work for a completely different domain, viz. biology/cancer. The methodology reuse involves application to diverse datasets gathered through the various centers associated with the Early Detection Research Network (EDRN) for cancer, an initiative of the National Cancer Institute (NCI). Application to Geo datasets is a natural extension.
NASA Astrophysics Data System (ADS)
Lary, D. J.
2013-12-01
A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning in an automated workflow on a distributed cluster. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.
Gardiner, James; Gunarathne, Nuwan; Howard, David; Kenney, Laurence
2016-01-01
Collecting large datasets of amputee gait data is notoriously difficult. Additionally, collecting data on less prevalent amputations or on gait activities other than level walking and running on hard surfaces is rarely attempted. However, with the wealth of user-generated content on the Internet, the scope for collecting amputee gait data from alternative sources other than traditional gait labs is intriguing. Here we investigate the potential of YouTube videos to provide gait data on amputee walking. We use an example dataset of trans-femoral amputees level walking at self-selected speeds to collect temporal gait parameters and calculate gait asymmetry. We compare our YouTube data with typical literature values, and show that our methodology produces results that are highly comparable to data collected in a traditional manner. The similarity between the results of our novel methodology and literature values lends confidence to our technique. Nevertheless, clear challenges with the collection and interpretation of crowd-sourced gait data remain, including long term access to datasets, and a lack of validity and reliability studies in this area.
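For context, temporal gait asymmetry of the kind extracted in this study is often summarized with a symmetry index comparing the prosthetic and intact sides; one common formulation is sketched below with hypothetical step times. The paper's exact asymmetry measure may differ.

```python
def symmetry_index(prosthetic_side, intact_side):
    """One common temporal gait symmetry index, in percent (0 = perfectly symmetric).

    Generic formulation for illustration; the paper's exact measure may differ.
    """
    return 100.0 * abs(prosthetic_side - intact_side) / (0.5 * (prosthetic_side + intact_side))

# Hypothetical step times (seconds) extracted frame by frame from a video
step_time_prosthetic = 0.62
step_time_intact = 0.55
print(f"Step-time asymmetry: {symmetry_index(step_time_prosthetic, step_time_intact):.1f}%")
```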
ERIC Educational Resources Information Center
Davis, Jamie D., Ed.; Erickson, Jill Shepard, Ed.; Johnson, Sharon R., Ed.; Marshall, Catherine A., Ed.; Running Wolf, Paulette, Ed.; Santiago, Rolando L., Ed.
This first symposium of the Work Group on American Indian Research and Program Evaluation Methodology (AIRPEM) explored American Indian and Alaska Native cultural considerations in relation to "best practices" in research and program evaluation. These cultural considerations include the importance of tribal consultation on research…
A multi-source dataset of urban life in the city of Milan and the Province of Trentino.
Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno
2015-01-01
The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others.
Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander
2017-09-09
The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies (a method of assessment of statistical methods using real-world datasets) might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.
NASA Astrophysics Data System (ADS)
Chegwidden, O.; Nijssen, B.; Rupp, D. E.; Kao, S. C.; Clark, M. P.
2017-12-01
We describe results from a large hydrologic climate change dataset developed across the Pacific Northwestern United States and discuss how the analysis of those results can be seen as a framework for other large hydrologic ensemble investigations. This investigation will better inform future modeling efforts and large ensemble analyses across domains within and beyond the Pacific Northwest. Using outputs from the Coupled Model Intercomparison Project Phase 5 (CMIP5), we provide projections of hydrologic change for the domain through the end of the 21st century. The dataset is based upon combinations of four methodological choices: (1) ten global climate models, (2) two representative concentration pathways, (3) three meteorological downscaling methods, and (4) four unique hydrologic model set-ups (three of which entail the same hydrologic model using independently calibrated parameter sets). All simulations were conducted across the Columbia River Basin and Pacific coastal drainages at a 1/16th degree (~6 km) resolution and at a daily timestep. In total, the 172 distinct simulations offer an updated, comprehensive view of climate change projections through the end of the 21st century. The results consist of routed streamflow at 400 sites throughout the domain as well as distributed spatial fields of relevant hydrologic variables like snow water equivalent and soil moisture. In this presentation, we discuss the level of agreement with previous hydrologic projections for the study area and how these projections differ with specific methodological choices. By controlling for some methodological choices we can show how each choice affects key climatic change metrics. We discuss how the spread in results varies across hydroclimatic regimes. We will use this large dataset as a case study for distilling a wide range of hydroclimatological projections into useful climate change assessments.
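Enumerating such an ensemble is a simple cross-product over the four methodological axes, as sketched below. The model, pathway, downscaling, and set-up labels are placeholders (the abstract does not name them), and the full factorial gives 240 combinations, of which 172 were evidently run.

```python
from itertools import product

# Placeholder labels; the abstract does not name the actual ensemble members.
gcms = [f"GCM{i:02d}" for i in range(1, 11)]         # 10 global climate models
rcps = ["RCP-A", "RCP-B"]                            # 2 representative concentration pathways
downscaling = ["DS-A", "DS-B", "DS-C"]               # 3 meteorological downscaling methods
hydro_setups = ["HYD-1", "HYD-2", "HYD-3", "HYD-4"]  # 4 hydrologic model set-ups

members = list(product(gcms, rcps, downscaling, hydro_setups))
print(len(members), "possible combinations")         # 240; the dataset reports 172 distinct runs
```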
NASA Astrophysics Data System (ADS)
Schepaschenko, D.; McCallum, I.; Shvidenko, A.; Kraxner, F.; Fritz, S.
2009-04-01
There is a critical need for accurate land cover information for resource assessment, biophysical modeling, greenhouse gas studies, and for estimating possible terrestrial responses and feedbacks to climate change. However, practically all existing land cover datasets have quite a high level of uncertainty and suffer from a lack of important details that does not allow for relevant parameterization, e.g., data derived from different forest inventories. The objective of this study is to develop a methodology in order to create a hybrid land cover dataset at the level which would satisfy requirements of the verified terrestrial biota full greenhouse gas account (Shvidenko et al., 2008) for large regions i.e. Russia. Such requirements necessitate a detailed quantification of land classes (e.g., for forests - dominant species, age, growing stock, net primary production, etc.) with additional information on uncertainties of the major biometric and ecological parameters in the range of 10-20% and a confidence interval of around 0.9. The approach taken here allows the integration of different datasets to explore synergies and in particular the merging and harmonization of land and forest inventories, ecological monitoring, remote sensing data and in-situ information. The following datasets have been integrated: Remote sensing: Global Land Cover 2000 (Fritz et al., 2003), Vegetation Continuous Fields (Hansen et al., 2002), Vegetation Fire (Sukhinin, 2007), Regional land cover (Schmullius et al., 2005); GIS: Soil 1:2.5 Mio (Dokuchaev Soil Science Institute, 1996), Administrative Regions 1:2.5 Mio, Vegetation 1:4 Mio, Bioclimatic Zones 1:4 Mio (Stolbovoi & McCallum, 2002), Forest Enterprises 1:2.5 Mio, Rivers/Lakes and Roads/Railways 1:1 Mio (IIASA's data base); Inventories and statistics: State Land Account (FARSC RF, 2006), State Forest Account - SFA (FFS RF, 2003), Disturbances in forests (FFS RF, 2006). The resulting hybrid land cover dataset at 1-km resolution comprises the following classes: Forest (each grid links to the SFA database, which contains 86,613 records); Agriculture (5 classes, parameterized by 89 administrative units); Wetlands (8 classes, parameterized by 83 zone/region units); Open Woodland, Burnt area; Shrub/grassland (50 classes, parameterized by 300 zone/region units); Water; Unproductive area. This study has demonstrated the ability to produce a highly detailed (both spatially and thematically) land cover dataset over Russia. Future efforts include further validation of the hybrid land cover dataset for Russia, and its use for assessment of the terrestrial biota full greenhouse gas budget across Russia. The methodology proposed in this study could be applied at the global level. Results of such an undertaking would however be highly dependent upon the quality of the available ground data. The implementation of the hybrid land cover dataset was undertaken in a way that it can be regularly updated based on new ground data and remote sensing products (ie. MODIS).
ERIC Educational Resources Information Center
Trimble, Joseph E.; And Others
A review of pertinent research on the adaptation of ethnic minority elderly to life-threatening events (personal, man-made, or natural) exposes voids in the research, presents methodological considerations, and indicates that ethnic minority elderly are disproportionately victimized by life-threatening events. Unusually high numbers of…
ERIC Educational Resources Information Center
Abes, Elisa S.
2009-01-01
This article is an exploration of possibilities and methodological considerations for using multiple theoretical perspectives in research that challenges inequitable power structures in student development theory. Specifically, I explore methodological considerations when partnering queer theory and constructivism in research on lesbian identity…
NASA Astrophysics Data System (ADS)
Pedretti, Daniele; Beckie, Roger Daniel
2014-05-01
Missing data in hydrological time-series databases are ubiquitous in practical applications, yet it is of fundamental importance to make educated decisions in problems involving exhaustive time-series knowledge. This includes precipitation datasets, since recording or human failures can produce gaps in these time series. For some applications, directly involving the ratio between precipitation and some other quantity, lack of complete information can result in poor understanding of basic physical and chemical dynamics involving precipitated water. For instance, the ratio between precipitation (recharge) and outflow rates at a discharge point of an aquifer (e.g. rivers, pumping wells, lysimeters) can be used to obtain aquifer parameters and thus to constrain model-based predictions. We tested a suite of methodologies to reconstruct missing information in rainfall datasets. The goal was to obtain a suitable and versatile method to reduce the errors caused by the lack of data in specific time windows. Our analyses included both a classical chronological pairing approach between rainfall stations and a probability-based approach, which accounted for the probability of exceedance of rain depths measured at two or multiple stations. Our analyses showed that it is not clear a priori which method performs best; rather, this selection should be based on the specific statistical properties of the rainfall dataset. In this presentation, our emphasis is to discuss the effects of a few typical parametric distributions used to model the behavior of rainfall. Specifically, we analyzed the role of distributional "tails", which have an important control on the occurrence of extreme rainfall events. The latter strongly affect several hydrological applications, including recharge-discharge relationships. The heavy-tailed distributions we considered were the parametric Log-Normal, Generalized Pareto, Generalized Extreme Value and Gamma distributions. The methods were first tested on synthetic examples, to have complete control over the impact of several variables, such as the minimum amount of data required to obtain reliable statistical distributions from the selected parametric functions. Then, we applied the methodology to precipitation datasets collected in the Vancouver area and at a mining site in Peru.
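The probability-based pairing idea can be illustrated with a simple empirical quantile-mapping gap fill between a target gauge and a nearby donor gauge, as sketched below. The synthetic records and the use of empirical (rather than fitted Log-Normal, Generalized Pareto, GEV, or Gamma) quantiles are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def fill_by_quantile_mapping(target, donor):
    """Fill NaN gaps in `target` using empirical quantile mapping from a `donor` gauge.

    Parametric fits (e.g. Log-Normal, Generalized Pareto, GEV, Gamma) could be
    substituted for the empirical quantiles used here.
    """
    both = ~np.isnan(target) & ~np.isnan(donor)
    donor_sorted = np.sort(donor[both])
    target_sorted = np.sort(target[both])
    filled = target.copy()
    gaps = np.isnan(target) & ~np.isnan(donor)
    # Non-exceedance probability of each donor value, mapped to the target's quantiles
    probs = np.searchsorted(donor_sorted, donor[gaps]) / len(donor_sorted)
    filled[gaps] = np.quantile(target_sorted, np.clip(probs, 0.0, 1.0))
    return filled

# Synthetic daily rainfall at a donor station and a correlated target station
donor = rng.gamma(0.6, 8.0, 1000)
target = donor * rng.lognormal(0.0, 0.3, 1000)
target[rng.random(1000) < 0.1] = np.nan        # knock out ~10% of the target record
print(int(np.isnan(fill_by_quantile_mapping(target, donor)).sum()), "gaps remaining")
```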
NASA Astrophysics Data System (ADS)
Yu, H.; Gu, H.
2017-12-01
A novel multivariate seismic formation pressure prediction methodology is presented, which incorporates high-resolution seismic velocity data from prestack AVO inversion, and petrophysical data (porosity and shale volume) derived from poststack seismic motion inversion. In contrast to traditional seismic formation pressure prediction methods, the proposed methodology is based on a multivariate pressure prediction model and utilizes a trace-by-trace multivariate regression analysis on seismic-derived petrophysical properties to calibrate model parameters in order to make accurate predictions with higher resolution in both vertical and lateral directions. With prestack time migration velocity as the initial velocity model, an AVO inversion was first applied to the prestack dataset to obtain high-resolution, higher-frequency seismic velocity to be used as the velocity input for seismic pressure prediction, and the density dataset to calculate accurate Overburden Pressure (OBP). Seismic Motion Inversion (SMI) is an inversion technique based on Markov Chain Monte Carlo simulation. Both structural variability and similarity of seismic waveform are used to incorporate well log data to characterize the variability of the property to be obtained. In this research, porosity and shale volume are first interpreted on well logs and then combined with poststack seismic data using SMI to build porosity and shale volume datasets for seismic pressure prediction. A multivariate effective stress model is used to convert the velocity, porosity and shale volume datasets to effective stress. After a thorough study of the regional stratigraphic and sedimentary characteristics, a regional normally compacted interval model is built, and then the coefficients in the multivariate prediction model are determined in a trace-by-trace multivariate regression analysis on the petrophysical data. The coefficients are used to convert velocity, porosity and shale volume datasets to effective stress and then to calculate formation pressure with the OBP. Application of the proposed methodology to a research area in the East China Sea has proved that the method can bridge the gap between seismic and well log pressure prediction and give predicted pressure values close to pressure measurements from well testing.
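The core arithmetic (fit a multivariate effective-stress relation at calibration points, apply it trace by trace, then subtract effective stress from overburden pressure to get formation pressure) is sketched below with synthetic data. The linear functional form, coefficients, and OBP value are assumptions; the paper's actual effective-stress model is not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic calibration samples: velocity (m/s), porosity (frac), shale volume (frac)
n = 500
velocity = rng.normal(3000.0, 400.0, n)
porosity = rng.uniform(0.05, 0.35, n)
vshale = rng.uniform(0.0, 0.8, n)
effective_stress = 0.01 * velocity - 40.0 * porosity - 10.0 * vshale + rng.normal(0.0, 1.0, n)  # MPa

# Generic linear multivariate effective-stress model fitted by least squares
X = np.column_stack([velocity, porosity, vshale, np.ones(n)])
coeffs, *_ = np.linalg.lstsq(X, effective_stress, rcond=None)

overburden = 55.0                              # MPa, hypothetical OBP at the target depth
predicted_sigma = X[:5] @ coeffs               # effective stress for a few "traces"
pore_pressure = overburden - predicted_sigma   # formation pressure = OBP - effective stress
print(np.round(pore_pressure, 1))
```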
NASA Astrophysics Data System (ADS)
Styron, R. H.; Garcia, J.; Pagani, M.
2017-12-01
A global catalog of active faults is a resource of value to a wide swath of the geoscience, earthquake engineering, and hazards risk communities. Though construction of such a dataset has been attempted now and again through the past few decades, success has been elusive. The Global Earthquake Model (GEM) Foundation has been working on this problem, as a fundamental step in its goal of making a global seismic hazard model. Progress on the assembly of the database is rapid, with the concatenation of many national-, orogen-, and continental-scale datasets produced by different research groups throughout the years. However, substantial data gaps exist throughout much of the deforming world, requiring new mapping based on existing publications as well as consideration of seismicity, geodesy and remote sensing data. Thus far, new fault datasets have been created for the Caribbean and Central America, North Africa, and northeastern Asia, with Madagascar, Canada and a few other regions in the queue. The second major task, as formidable as the initial data concatenation, is the 'harmonization' of data. This entails the removal or recombination of duplicated structures, reconciliation of contrasting interpretations in areas of overlap, and the synthesis of many different types of attributes or metadata into a consistent whole. In a project of this scale, the methods used in the database construction are as critical to project success as the data themselves. After some experimentation, we have settled on an iterative methodology that involves rapid accumulation of data followed by successive episodes of data revision, and a computer-scripted data assembly using GIS file formats that is flexible, reproducible, and as able as possible to cope with updates to the constituent datasets. We find that this approach of initially maximizing coverage and then increasing resolution is the most robust to regional data problems and the most amenable to continued updates and refinement. Combined with the public, open-source nature of this project, GEM is producing a resource that can continue to evolve with the changing knowledge and needs of the community.
Tungsten fiber reinforced superalloy composite high temperature component design considerations
NASA Technical Reports Server (NTRS)
Winsa, E. A.
1982-01-01
Tungsten fiber reinforced superalloy composites (TFRS) are intended for use in high temperature turbine components. Current turbine component design methodology is based on applying the experience, sometimes semiempirical, gained from over 30 years of superalloy component design. Current composite component design capability is generally limited to the methodology for low temperature resin matrix composites. Often the tendency is to treat TFRS as just another superalloy or low temperature composite. However, TFRS behavior is significantly different from that of superalloys, and the high temperature environment adds considerations not common in low temperature composite component design. The methodology used for preliminary design of TFRS components is described. Considerations unique to TFRS are emphasized.
Uncovering Urban Temporal Patterns from Geo-Tagged Photography
Paldino, Silvia; Kondor, Dániel; Sobolevsky, Stanislav; González, Marta C.; Ratti, Carlo
2016-01-01
We live in a world where digital trails of different forms of human activities compose big urban data, allowing us to detect many aspects of how people experience the city in which they live or come to visit. In this study we propose to enhance urban planning by taking into consideration individual preferences using information from an unconventional big data source: a dataset of geo-tagged photographs that people take in cities, which we then use as a measure of urban attractiveness. We discover and compare the temporal behavior of residents and visitors in the ten most photographed cities in the world. Looking at the periodicity in urban attractiveness, the results show that the strongest periodic patterns for visitors are usually weekly or monthly. Moreover, by dividing cities into two groups based on which continent they belong to (i.e., North America or Europe), it can be concluded that, unlike in European cities, the behavior of visitors in US cities is in general similar to the behavior of their residents. Finally, we apply two indices, called "dilatation attractiveness index" and "dilatation index", to our dataset, which capture the spatial and temporal pulsations of attractiveness in the city. The proposed methodology is not only important for urban planning, but also supports various business and public stakeholder decision processes, for example around the questions of how to attract more visitors to the city or how to estimate the impact of special events organized there. PMID:27935979
Prediction of brain tissue temperature using near-infrared spectroscopy.
Holper, Lisa; Mitra, Subhabrata; Bale, Gemma; Robertson, Nicola; Tachtsidis, Ilias
2017-04-01
Broadband near-infrared spectroscopy (NIRS) can provide an endogenous indicator of tissue temperature based on the temperature dependence of the water absorption spectrum. We describe a first evaluation of the calibration and prediction of brain tissue temperature obtained during hypothermia in newborn piglets (animal dataset) and rewarming in newborn infants (human dataset) based on measured body (rectal) temperature. The calibration using partial least squares regression proved to be a reliable method to predict brain tissue temperature with respect to core body temperature in the wavelength interval of 720 to 880 nm with a strong mean predictive power of R2=0.713±0.157 (animal dataset) and R2=0.798±0.087 (human dataset). In addition, we applied regression receiver operating characteristic curves for the first time to evaluate the temperature prediction, which provided an overall mean error bias between NIRS predicted brain temperature and body temperature of 0.436±0.283°C (animal dataset) and 0.162±0.149°C (human dataset). We discuss main methodological aspects, particularly the well-known aspect of over- versus underestimation between brain and body temperature, which is relevant for potential clinical applications.
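A minimal Python sketch of the calibration idea, using scikit-learn's partial least squares regression on synthetic spectra in place of the study's NIRS data:

```python
# Illustrative temperature calibration from broadband attenuation spectra with
# partial least squares regression; the spectra, wavelength count and number of
# components are placeholders, not the study's data or settings.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
spectra = rng.normal(size=(200, 160))   # attenuation sampled over 720-880 nm (synthetic)
body_temp = 33 + 4 * rng.random(200)    # reference (rectal) temperature, deg C (synthetic)

X_train, X_test, y_train, y_test = train_test_split(spectra, body_temp, random_state=0)
pls = PLSRegression(n_components=5).fit(X_train, y_train)
print("R^2 on held-out spectra:", pls.score(X_test, y_test))
```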
Al-Qaeda in Iraq (AQI): An Al-Qaeda Affiliate Case Study
2017-10-01
a comparative methodology that included eight case studies on groups affiliated or associated with Al-Qaeda. These case studies were then used as a dataset for cross... Case Study. Zack Gold, with contributions from Pamela G. Faber, October 2017. This work was performed under Federal Government
EJ IWG Promising Practices for EJ Methodologies in NEPA Reviews
Report of methodologies gleaned from current agency practices identified by the NEPA Committee. These methodologies concern the incorporation of environmental justice considerations into NEPA processes.
Molina, Iñigo; Martinez, Estibaliz; Arquero, Agueda; Pajares, Gonzalo; Sanchez, Javier
2012-01-01
Landcover is subject to continuous changes on a wide variety of temporal and spatial scales. Those changes produce significant effects in human and natural activities. Maintaining an updated spatial database with the occurred changes allows a better monitoring of the Earth’s resources and management of the environment. Change detection (CD) techniques using images from different sensors, such as satellite imagery, aerial photographs, etc., have proven to be suitable and secure data sources from which updated information can be extracted efficiently, so that changes can also be inventoried and monitored. In this paper, a multisource CD methodology for multiresolution datasets is applied. First, different change indices are processed, then different thresholding algorithms for change/no_change are applied to these indices in order to better estimate the statistical parameters of these categories, finally the indices are integrated into a change detection multisource fusion process, which allows generating a single CD result from several combination of indices. This methodology has been applied to datasets with different spectral and spatial resolution properties. Then, the obtained results are evaluated by means of a quality control analysis, as well as with complementary graphical representations. The suggested methodology has also been proved efficiently for identifying the change detection index with the higher contribution. PMID:22737023
Temperature, Geochemistry, and Gravity Data of the Tularosa Basin
Nash, Greg
2017-06-16
This submission contains multiple Excel spreadsheets and associated written reports. The datasets are representative of shallow temperature, geochemistry, and other well logging observations made across WSMR (White Sands Missile Range), located to the west of the Tularosa Basin but still within the study area. Written reports accompany some of the datasets, and they provide ample description of the methodology and results obtained from these studies. Gravity data is also included, as point data in a shapefile, along with a written report describing that particular study.
Knowledge mining from clinical datasets using rough sets and backpropagation neural network.
Nahato, Kindie Biredagn; Harichandran, Khanna Nehemiah; Arputharaj, Kannan
2015-01-01
The availability of clinical datasets and knowledge mining methodologies encourages researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes that has been extracted from the clinical dataset. In this work, the rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is handling of missing values to obtain a smooth data set and selection of appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
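A hedged sketch of the two-stage workflow using scikit-learn; a univariate attribute selector stands in for the rough-set reduct computation, which is not reproduced here.

```python
# Sketch of the two-stage idea: missing-value handling plus attribute
# reduction, then a backpropagation network on the reduced attributes.
# SelectKBest is a stand-in for the rough-set reduct, not the paper's method.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # UCI Wisconsin breast cancer data
clf = make_pipeline(
    SimpleImputer(strategy="most_frequent"),   # stage 1: handle missing values
    SelectKBest(f_classif, k=10),              # stage 1: reduced attribute set
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),  # stage 2
)
print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```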
ProDaMa: an open source Python library to generate protein structure datasets.
Armano, Giuliano; Manconi, Andrea
2009-10-02
The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not been yet devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.
Statistical and Spatial Analysis of Bathymetric Data for the St. Clair River, 1971-2007
Bennion, David
2009-01-01
To address questions concerning ongoing geomorphic processes in the St. Clair River, selected bathymetric datasets spanning 36 years were analyzed. Comparisons of recent high-resolution datasets covering the upper river indicate a highly variable, active environment. Although statistical and spatial comparisons of the datasets show that some changes to the channel size and shape have taken place during the study period, uncertainty associated with various survey methods and interpolation processes limits the statistical certainty of the results. The methods used to spatially compare the datasets are sensitive to small variations in position and depth that are within the range of uncertainty associated with the datasets. Characteristics of the data, such as the density of measured points and the range of values surveyed, can also influence the results of spatial comparison. With due consideration of these limitations, apparently active and ongoing areas of elevation change in the river are mapped and discussed.
Jeon, Soyoung; Paciorek, Christopher J.; Wehner, Michael F.
2016-02-16
Extreme event attribution characterizes how anthropogenic climate change may have influenced the probability and magnitude of selected individual extreme weather and climate events. Attribution statements often involve quantification of the fraction of attributable risk (FAR) or the risk ratio (RR) and associated confidence intervals. Many such analyses use climate model output to characterize extreme event behavior with and without anthropogenic influence. However, such climate models may have biases in their representation of extreme events. To account for discrepancies in the probabilities of extreme events between observational datasets and model datasets, we demonstrate an appropriate rescaling of the model output based on the quantiles of the datasets to estimate an adjusted risk ratio. Our methodology accounts for various components of uncertainty in estimation of the risk ratio. In particular, we present an approach to construct a one-sided confidence interval on the lower bound of the risk ratio when the estimated risk ratio is infinity. We demonstrate the methodology using the summer 2011 central US heatwave and output from the Community Earth System Model. In this example, we find that the lower bound of the risk ratio is relatively insensitive to the magnitude and probability of the actual event.
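The following toy sketch illustrates a risk ratio for threshold exceedance and a bootstrap one-sided lower confidence bound; it uses synthetic data and does not reproduce the paper's quantile-based rescaling of model output.

```python
# Toy risk ratio RR = P(event | forcing) / P(event | no forcing) with a
# one-sided bootstrap lower bound; data and threshold are synthetic.
import numpy as np

rng = np.random.default_rng(1)
factual = rng.normal(1.0, 1.0, 400)         # e.g. model index with anthropogenic forcing
counterfactual = rng.normal(0.0, 1.0, 400)  # e.g. model index without it
threshold = 2.0                             # magnitude of the observed event

def risk_ratio(a, b):
    p1, p0 = np.mean(a > threshold), np.mean(b > threshold)
    return np.inf if p0 == 0 else p1 / p0   # RR can be infinite, as discussed above

boot = [risk_ratio(rng.choice(factual, factual.size),
                   rng.choice(counterfactual, counterfactual.size))
        for _ in range(2000)]
print("RR estimate:", risk_ratio(factual, counterfactual))
print("one-sided 95% lower bound:", np.quantile(boot, 0.05))
```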
Meneghetti, Natascia; Facco, Pierantonio; Bezzo, Fabrizio; Himawan, Chrismono; Zomer, Simeone; Barolo, Massimiliano
2016-05-30
In this proof-of-concept study, a methodology is proposed to systematically analyze large data historians of secondary pharmaceutical manufacturing systems using data mining techniques. The objective is to develop an approach that automatically retrieves operation-relevant information to assist management in the periodic review of a manufacturing system. The proposed methodology allows one to automatically perform three tasks: the identification of single batches within the entire data-sequence of the historical dataset, the identification of distinct operating phases within each batch, and the characterization of a batch with respect to an assigned multivariate set of operating characteristics. The approach is tested on a six-month dataset of a commercial-scale granulation/drying system, where several million data entries are recorded. The quality of results and the generality of the approach indicate that there is a strong potential for extending the method to even larger historical datasets and to different operations, thus making it an advanced PAT tool that can assist the implementation of continual improvement paradigms within a quality-by-design framework. Copyright © 2016 Elsevier B.V. All rights reserved.
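A minimal sketch of the first task, identifying individual batches in a continuous historian record; the tag name and on/off criterion are assumptions rather than the plant's actual signals.

```python
# Split a continuous historian record (a pandas DataFrame indexed by time)
# into batches, assuming a hypothetical boolean "running" tag marks when the
# granulation/drying unit is active.
def split_batches(df, running_col="dryer_running", min_len=30):
    """Return a list of DataFrames, one per contiguous running interval."""
    running = df[running_col].astype(bool)
    # increment an id every time the running flag toggles
    batch_id = (running != running.shift(fill_value=False)).cumsum()
    return [g for _, g in df[running].groupby(batch_id[running]) if len(g) >= min_len]

# batches = split_batches(historian_df)   # historian_df is the raw data historian export
```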
Yager, Douglas B.; Hofstra, Albert H.; Granitto, Matthew
2012-01-01
This report emphasizes geographic information system analysis and the display of data stored in the legacy U.S. Geological Survey National Geochemical Database for use in mineral resource investigations. Geochemical analyses of soils, stream sediments, and rocks that are archived in the National Geochemical Database provide an extensive data source for investigating geochemical anomalies. A study area in the Egan Range of east-central Nevada was used to develop a geographic information system analysis methodology for two different geochemical datasets involving detailed (Bureau of Land Management Wilderness) and reconnaissance-scale (National Uranium Resource Evaluation) investigations. ArcGIS was used to analyze and thematically map geochemical information at point locations. Watershed-boundary datasets served as a geographic reference to relate potentially anomalous sample sites with hydrologic unit codes at varying scales. The National Hydrography Dataset was analyzed with Hydrography Event Management and ArcGIS Utility Network Analyst tools to delineate potential sediment-sample provenance along a stream network. These tools can be used to track potential upstream-sediment-contributing areas to a sample site. This methodology identifies geochemically anomalous sample sites, watersheds, and streams that could help focus mineral resource investigations in the field.
Moon, Myungjin; Nakai, Kenta
2018-04-01
Currently, cancer biomarker discovery is one of the important research topics worldwide. In particular, detecting significant genes related to cancer is an important task for early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extractions to identify candidate biomarkers of cancer using renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box-Cox transformation and integrated into a one-dimensional dataset that retains the major characteristics of the original datasets by unsupervised feature extraction methods, and differentially expressed genes are selected from the integrated dataset. Use of the integrated dataset demonstrated improved performance as compared with conventional approaches that utilize gene expression or DNA methylation datasets alone. Validation based on the literature showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be acquired from the proposed analysis method. Furthermore, we expect that the proposed method can be expanded for applications involving various types of multi-omics datasets.
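A hedged sketch of the integration step on synthetic matrices, with Box-Cox normalization and PCA standing in for the paper's unsupervised feature-extraction method.

```python
# Box-Cox normalise each omics matrix, stack the per-gene features, and
# project to a single integrated value per gene. PCA is used here only as an
# illustrative unsupervised extractor; data are synthetic.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
expr = rng.lognormal(size=(500, 20))    # genes x samples, expression-like (synthetic)
meth = rng.beta(2, 5, size=(500, 20))   # genes x samples, methylation-like (synthetic)

def boxcox_cols(m):
    """Apply Box-Cox to each sample column (values must be positive)."""
    return np.column_stack([stats.boxcox(col + 1e-6)[0] for col in m.T])

combined = np.hstack([boxcox_cols(expr), boxcox_cols(meth)])
integrated = PCA(n_components=1).fit_transform(combined).ravel()  # one value per gene
ranked_genes = np.argsort(np.abs(integrated))[::-1]               # candidate ordering
```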
Semantic similarity measures in the biomedical domain by leveraging a web search engine.
Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching
2013-07-01
Various studies of web-based semantic similarity measures have been carried out. However, measuring semantic similarity between two terms remains a challenging task. Traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s); unfortunately, in practice, this assumption does not always hold. On the other hand, if the corpus is sufficiently large, corpus-based methodologies can overcome the limitation, and the web is an enormous, continuously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and the web search hit counts of defined lexico-syntactic patterns. These similarity scores of different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of the semantic similarity measures. Experimental results validated against two datasets (dataset 1 provided by A. Hliaoutakis; dataset 2 provided by T. Pedersen) are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficient (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with measures of other methods. However, the correlation coefficients with coder scores (SNOMED-CT: 0.496; MeSH: 0.539) showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.
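Illustrative co-occurrence scores computed from hypothetical page counts; in the original work such pattern scores are fed to an SVM rather than used directly.

```python
# Two classic page-count similarity scores for terms P and Q; the hit counts
# below are made-up numbers standing in for search-engine page counts.
import math

def jaccard(hits_p, hits_q, hits_pq):
    """Web Jaccard score from the hit counts of P, Q and 'P AND Q'."""
    return hits_pq / (hits_p + hits_q - hits_pq)

def pmi(hits_p, hits_q, hits_pq, n_pages=1e10):
    """Pointwise mutual information, with an assumed total web page count."""
    return math.log2((hits_pq / n_pages) / ((hits_p / n_pages) * (hits_q / n_pages)))

# hypothetical counts for two biomedical terms P and Q
print(jaccard(120_000, 95_000, 18_000), pmi(120_000, 95_000, 18_000))
```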
Chowdhury, Nilotpal; Sapru, Shantanu
2015-01-01
Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research.
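A loose sketch of the per-gene modelling step with the lifelines package, stratifying by series/batch as a simple surrogate for the paper's random-effects adjustment; all column names are hypothetical.

```python
# Cox model of distant-metastasis-free survival for one gene, adjusted for a
# cell-cycle score and stratified by series/batch. This is an approximation of
# the described analysis, not the authors' exact model.
from lifelines import CoxPHFitter

def gene_coefficient(df, gene_col):
    """df needs columns: time, event, batch, cell_cycle_score and the gene's expression."""
    model_df = df[["time", "event", "batch", "cell_cycle_score", gene_col]].copy()
    cph = CoxPHFitter()
    cph.fit(model_df, duration_col="time", event_col="event", strata=["batch"])
    return cph.params_[gene_col]   # coefficients are later ranked and fed to GSEA

# coefficients = {g: gene_coefficient(expr_df, g) for g in gene_columns}
```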
Chowdhury, Nilotpal; Sapru, Shantanu
2015-01-01
Introduction Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. Aim The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Methods Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate – adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Results Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. Conclusion To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research. PMID:26080057
A methodology for cloud masking uncalibrated lidar signals
NASA Astrophysics Data System (ADS)
Binietoglou, Ioannis; D'Amico, Giuseppe; Baars, Holger; Belegante, Livio; Marinou, Eleni
2018-04-01
Most lidar processing algorithms, such as those included in EARLINET's Single Calculus Chain, can be applied only to cloud-free atmospheric scenes. In this paper, we present a methodology for masking clouds in uncalibrated lidar signals. First, we construct a reference dataset based on manual inspection and then train a classifier to separate clouds and cloud-free regions. Here we present details of this approach together with example cloud masks from an EARLINET station.
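A minimal sketch of the supervised masking step: per-bin features from the uncalibrated, range-corrected signal and a generic classifier trained against the manually built reference mask; the feature choices are illustrative, not the paper's.

```python
# Per-range-bin features from an uncalibrated lidar profile, to be paired with
# a manually labelled cloud mask and any off-the-shelf classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def bin_features(signal, ranges, window=5):
    rcs = signal * ranges**2                      # range-corrected signal
    grad = np.gradient(rcs)                       # sharp gradients often mark cloud edges
    local_std = np.array([rcs[max(0, i - window):i + window].std()
                          for i in range(rcs.size)])
    return np.column_stack([rcs, grad, local_std])

# X = np.vstack([bin_features(s, ranges) for s in profiles])  # profiles: raw lidar signals
# y = reference_mask.ravel()                                  # 1 = cloud, 0 = cloud-free
# print(cross_val_score(RandomForestClassifier(), X, y, cv=5).mean())
```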
2017-10-01
to patient safety by addressing key methodological and conceptual gaps in healthcare simulation-based team training. The investigators are developing...primary outcome of Aim 1a is a conceptually and methodologically sound training design architecture that supports the development and integration of team...should be delivered. This subtask was delayed by approximately 1 month and is now completed. Completed Evaluation of existing experimental dataset to
Petousis, Ioannis; Mrdjenovich, David; Ballouz, Eric; ...
2017-01-31
Dielectrics are an important class of materials that are ubiquitous in modern electronic applications. Even though their properties are important for the performance of devices, the number of compounds with known dielectric constant is on the order of a few hundred. Here, we use Density Functional Perturbation Theory as a way to screen for the dielectric constant and refractive index of materials in a fast and computationally efficient way. Our results constitute the largest dielectric tensors database to date, containing 1,056 compounds. Details regarding the computational methodology and technical validation are presented along with the format of our publicly available data. In addition, we integrate our dataset with the Materials Project allowing users easy access to material properties. Finally, we explain how our dataset and calculation methodology can be used in the search for novel dielectric compounds.
Petousis, Ioannis; Mrdjenovich, David; Ballouz, Eric; Liu, Miao; Winston, Donald; Chen, Wei; Graf, Tanja; Schladt, Thomas D.; Persson, Kristin A.; Prinz, Fritz B.
2017-01-01
Dielectrics are an important class of materials that are ubiquitous in modern electronic applications. Even though their properties are important for the performance of devices, the number of compounds with known dielectric constant is on the order of a few hundred. Here, we use Density Functional Perturbation Theory as a way to screen for the dielectric constant and refractive index of materials in a fast and computationally efficient way. Our results constitute the largest dielectric tensors database to date, containing 1,056 compounds. Details regarding the computational methodology and technical validation are presented along with the format of our publicly available data. In addition, we integrate our dataset with the Materials Project allowing users easy access to material properties. Finally, we explain how our dataset and calculation methodology can be used in the search for novel dielectric compounds. PMID:28140408
Detection of tuberculosis using hybrid features from chest radiographs
NASA Astrophysics Data System (ADS)
Fatima, Ayesha; Akram, M. Usman; Akhtar, Mahmood; Shafique, Irrum
2017-02-01
Tuberculosis is an infectious disease that has become a major threat all over the world, yet its diagnosis remains a challenging task. In the literature, chest radiographs are considered the most commonly used medical images for the diagnosis of TB in developing countries. Different methods have been proposed, but they are not helpful for radiologists due to cost and accuracy issues. Our paper presents a methodology in which different combinations of features are extracted based on the intensities, shape and texture of the chest radiograph and given to a classifier for the detection of TB. The performance of our methodology is evaluated using the publicly available standard Montgomery County (MC) dataset, which contains 138 CXRs, among which 80 are normal and 58 are abnormal, including effusion and miliary patterns. An accuracy of 81.16% was achieved, and the results show that the proposed method outperforms existing state-of-the-art methods on the MC dataset.
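A hedged sketch of the hybrid-feature idea, with simple intensity and gradient descriptors standing in for the paper's exact intensity, shape and texture features.

```python
# Compute crude per-image descriptors for a chest radiograph and feed them to a
# generic classifier; this illustrates the feature-combination idea only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def cxr_features(img):
    """img: 2-D grayscale array scaled to [0, 1]."""
    hist, _ = np.histogram(img, bins=32, range=(0, 1), density=True)  # intensity features
    gy, gx = np.gradient(img.astype(float))
    edges = np.hypot(gx, gy)                                          # rough shape/texture proxy
    return np.concatenate([hist, [img.mean(), img.std(), edges.mean(), edges.std()]])

# X = np.array([cxr_features(im) for im in montgomery_images])  # the 138 MC CXRs
# y = labels                                                    # 0 = normal, 1 = abnormal
# print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```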
In-flight photogrammetric camera calibration and validation via complementary lidar
NASA Astrophysics Data System (ADS)
Gneeniss, A. S.; Mills, J. P.; Miller, P. E.
2015-02-01
This research treats lidar as a reference dataset against which in-flight camera system calibration and validation can be performed. The methodology utilises a robust least squares surface matching algorithm to align a dense network of photogrammetric points to the lidar reference surface, allowing for the automatic extraction of so-called lidar control points (LCPs). Adjustment of the photogrammetric data is then repeated using the extracted LCPs in a self-calibrating bundle adjustment with additional parameters. This methodology was tested using two different photogrammetric datasets, acquired with a Microsoft UltraCamX large format camera and an Applanix DSS322 medium format camera. Systematic sensitivity testing explored the influence of the number and weighting of LCPs. For both camera blocks it was found that as the number of control points increases, the accuracy improves regardless of point weighting. The calibration results were compared with those obtained using ground control points, with good agreement found between the two.
Decision tree methods: applications for classification and prediction.
Song, Yan-Yan; Lu, Ying
2015-04-25
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
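A short worked example of the train/validation idea in Python (the paper itself describes SPSS and SAS programs): grow a tree on the training split and pick the size, via cost-complexity pruning, that maximises validation accuracy.

```python
# Select the decision tree size on a validation split; the dataset and the
# 70/30 split are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# candidate pruning strengths, each corresponding to a different tree size
alphas = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas
trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr) for a in alphas]
best = max(trees, key=lambda t: t.score(X_val, y_val))
print("leaves:", best.get_n_leaves(), "validation accuracy:", best.score(X_val, y_val))
```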
Maglione, Anton G.; Brizi, Ambra; Vecchiato, Giovanni; Rossi, Dario; Trettel, Arianna; Modica, Enrica; Babiloni, Fabio
2017-01-01
In this study, the cortical activity correlated with the perception and appreciation of different sets of pictures was estimated by using neuroelectric brain activity and graph theory methodologies in a group of artistically educated persons. The pictures shown to the subjects consisted of original pictures of Titian's and a contemporary artist's paintings (Orig dataset) plus two sets of additional pictures. These additional datasets were obtained from the previous paintings by removing all but the colors or the shapes employed (Color and Style datasets, respectively). Results suggest that the verbal appreciation of the Orig dataset, when compared to the Color and Style ones, was mainly correlated with the neuroelectric indexes estimated during the first 10 s of observation of the pictures. Also in the first 10 s of observation: (1) the Orig dataset induced more emotion and was perceived with more appreciation than the Color and Style datasets; (2) the Style dataset was perceived with more attentional effort than the other investigated datasets. During the whole 30 s observation period: (1) the emotion induced by the Color and Style datasets increased across time while that induced by the Orig dataset remained stable; (2) the Color and Style datasets were perceived with more attentional effort than the Orig dataset. During the entire experience, there is evidence of a cortical flow of activity from the parietal and central areas toward the prefrontal and frontal areas during the observation of the images of all the datasets. This is consistent with the notion that active perception of the images, with sustained cognitive attention in parietal and central areas, led to the generation of the judgment about their aesthetic appreciation in frontal areas. PMID:28790907
Soil Bulk Density by Soil Type, Land Use and Data Source: Putting the Error in SOC Estimates
NASA Astrophysics Data System (ADS)
Wills, S. A.; Rossi, A.; Loecke, T.; Ramcharan, A. M.; Roecker, S.; Mishra, U.; Waltman, S.; Nave, L. E.; Williams, C. O.; Beaudette, D.; Libohova, Z.; Vasilas, L.
2017-12-01
An important part of SOC stock and pool assessment is the estimation and application of bulk density values. Although the concept of bulk density is relatively simple (the mass of soil in a given volume), bulk density can be difficult to measure in soils due to logistical and methodological constraints. While many estimates of SOC pools use legacy data, few concerted efforts have been made to assess the process used to convert laboratory carbon concentration measurements and bulk density collection into volumetrically based SOC estimates. The methodologies used are particularly sensitive in wetlands and organic soils with high amounts of carbon and very low bulk densities. We will present an analysis across four databases: NCSS - the National Cooperative Soil Survey Characterization dataset, RaCA - the Rapid Carbon Assessment sample dataset, NWCA - the National Wetland Condition Assessment, and ISCN - the International Soil Carbon Network. The relationship between bulk density and soil organic carbon will be evaluated by dataset and land use/land cover information. Prediction methods (both regression and machine learning) will be compared and contrasted across datasets and available input information. The assessment and application of bulk density, including modeling, aggregation, and error propagation, will be evaluated. Finally, recommendations will be made about both the use of new data in soil survey products (such as SSURGO) and the use of that information as legacy data in SOC pool estimates.
Raja, Kalpana; Natarajan, Jeyakumar
2018-07-01
Extraction of protein phosphorylation information from biomedical literature has gained much attention because of its importance in numerous biological processes. In this study, we propose a text mining methodology consisting of two phases, NLP parsing and SVM classification, to extract phosphorylation information from the literature. First, using NLP parsing, we divide the data into three base-forms depending on the biomedical entities related to phosphorylation, and further classify them into ten sub-forms based on their distribution with the phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify them using a set of features applicable to each sub-form. The performance of our methodology was evaluated on three corpora, namely the PLC, iProLink and hPP corpora. We obtained promising results of >85% F-score on the ten sub-forms of the training datasets in cross-validation tests. Our system achieved an overall F-score of 93.0% on the iProLink and 96.3% on the hPP corpus test datasets. Furthermore, the proposed system achieved the best performance in cross-corpus evaluation and outperformed the existing system with a recall of 90.1%. The performance analysis of our system on the three corpora reveals that it extracts protein phosphorylation information efficiently from both non-organism-specific general datasets, such as PLC and iProLink, and a human-specific dataset, such as the hPP corpus. Copyright © 2018 Elsevier B.V. All rights reserved.
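A highly simplified stand-in for the SVM phase, classifying candidate sentences with bag-of-words features instead of the paper's sub-form-specific features and parsed entity triplets.

```python
# Toy phosphorylation-relation classifier: TF-IDF features plus a linear SVM.
# The training sentences and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["MAPK phosphorylates Elk-1 at Ser383",
               "The kinase domain binds ATP",
               "Akt-mediated phosphorylation of GSK-3beta was observed",
               "The protein localises to the nucleus"]
train_labels = [1, 0, 1, 0]   # 1 = describes a phosphorylation relation

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)
print(clf.predict(["CDK1 phosphorylates lamin A on Ser22"]))
```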
Downscaling global precipitation for local applications - a case for the Rhine basin
NASA Astrophysics Data System (ADS)
Sperna Weiland, Frederiek; van Verseveld, Willem; Schellekens, Jaap
2017-04-01
Within the EU FP7 project eartH2Observe a global Water Resources Re-analysis (WRR) is being developed. This re-analysis consists of meteorological and hydrological water balance variables with global coverage, spanning the period 1979-2014 at 0.25 degrees resolution (Schellekens et al., 2016). The dataset can be of special interest in regions with limited in-situ data availability, yet for local scale analysis, particularly in mountainous regions, a resolution of 0.25 degrees may be too coarse and downscaling the data to a higher resolution may be required. A downscaling toolbox has been made that includes spatial downscaling of precipitation based on the global WorldClim dataset that is available at 1 km resolution as a monthly climatology (Hijmans et al., 2005). The inputs of the down-scaling tool are either the global eartH2Observe WRR1 and WRR2 datasets based on the WFDEI correction methodology (Weedon et al., 2014) or the global Multi-Source Weighted-Ensemble Precipitation (MSWEP) dataset (Beck et al., 2016). Here we present a validation of the datasets over the Rhine catchment by means of a distributed hydrological model (wflow, Schellekens et al., 2014) using a number of precipitation scenarios. (1) We start by running the model using the local reference dataset derived by spatial interpolation of gauge observations. Furthermore we use (2) the MSWEP dataset at the native 0.25-degree resolution, followed by (3) MSWEP downscaled with the WorldClim dataset and finally (4) MSWEP downscaled with the local reference dataset. The validation will be based on comparison of the modeled river discharges as well as rainfall statistics. We expect that down-scaling the MSWEP dataset with the WorldClim data to higher resolution will increase its performance. To test the performance of the down-scaling routine we have added a run with MSWEP data down-scaled with the local dataset and compare this with the run based on the local dataset itself. - Beck, H. E. et al., 2016. MSWEP: 3-hourly 0.25° global gridded precipitation (1979-2015) by merging gauge, satellite, and reanalysis data, Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-236, accepted for final publication. - Hijmans, R.J. et al., 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978. - Schellekens, J. et al., 2016. A global water resources ensemble of hydrological models: the eartH2Observe Tier-1 dataset, Earth Syst. Sci. Data Discuss., doi:10.5194/essd-2016-55, under review. - Schellekens, J. et al., 2014. Rapid setup of hydrological and hydraulic models using OpenStreetMap and the SRTM derived digital elevation model. Environmental Modelling & Software. - Weedon, G.P. et al., 2014. The WFDEI meteorological forcing data set: WATCH Forcing Data methodology applied to ERA-Interim reanalysis data. Water Resources Research, 50, doi:10.1002/2014WR015638.
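A minimal sketch of climatology-ratio downscaling of the kind described above: the coarse precipitation field is multiplied by the ratio of the high-resolution monthly climatology to that climatology aggregated back to the coarse grid. The array shapes and the resolution factor are illustrative, and this is not necessarily the toolbox's exact scheme.

```python
# Downscale a coarse precipitation grid using a fine-resolution monthly
# climatology (e.g. WorldClim-like); shapes and the factor are assumptions.
import numpy as np

def downscale(coarse_p, fine_clim, factor=25):
    """coarse_p: (ny, nx) precipitation; fine_clim: (ny*factor, nx*factor) climatology."""
    ny, nx = coarse_p.shape
    # aggregate the fine climatology back to the coarse grid
    coarse_clim = fine_clim.reshape(ny, factor, nx, factor).mean(axis=(1, 3))
    clim_up = np.kron(coarse_clim, np.ones((factor, factor)))
    with np.errstate(divide="ignore", invalid="ignore"):
        scaling = np.where(clim_up > 0, fine_clim / clim_up, 1.0)
    return np.kron(coarse_p, np.ones((factor, factor))) * scaling
```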
Transportation networks : data, analysis, methodology development and visualization.
DOT National Transportation Integrated Search
2007-12-29
This project provides data compilation, analysis methodology and visualization methodology for the current network data assets of the Alabama Department of Transportation (ALDOT). This study finds that ALDOT is faced with a considerable number of...
Sibbett, Ruth A; Russ, Tom C; Deary, Ian J; Starr, John M
2017-07-03
Studies investigating the risk factors for or causation of dementia must consider subjects prior to disease onset. To overcome the limitations of prospective studies and self-reported recall of information, the use of existing data is key. This review provides a narrative account of dementia ascertainment methods using sources of existing data. The literature search was performed using: MEDLINE, EMBASE, PsychInfo and Web of Science. Included articles reported a UK-based study of dementia in which cases were ascertained using existing data. Existing data included that which was routinely collected and that which was collected for previous research. After removing duplicates, abstracts were screened and the remaining articles were included for full-text review. A quality tool was used to evaluate the description of the ascertainment methodology. Of the 3545 abstracts screened, 360 articles were selected for full-text review. 47 articles were included for final consideration. Data sources for ascertainment included: death records, national datasets, research databases and hospital records among others. 36 articles used existing data alone for ascertainment, of which 27 used only a single data source. The most frequently used source was a research database. Quality scores ranged from 7/16 to 16/16. Quality scores were better for articles with dementia ascertainment as an outcome. Some papers performed validation studies of dementia ascertainment and most indicated that observed rates of dementia were lower than expected. We identified a lack of consistency in dementia ascertainment methodology using existing data. With no data source identified as a "gold-standard", we suggest the use of multiple sources. Where possible, studies should access records with evidence to confirm the diagnosis. Studies should also calculate the dementia ascertainment rate for the population being studied to enable a comparison with an expected rate.
NASA Astrophysics Data System (ADS)
Tugores, M. Pilar; Iglesias, Magdalena; Oñate, Dolores; Miquel, Joan
2016-02-01
In the Mediterranean Sea, the European anchovy (Engraulis encrasicolus) plays a key role in ecological and economic terms. Ensuring stock sustainability requires the provision of crucial information, such as species spatial distribution or unbiased abundance and precision estimates, so that management strategies can be defined (e.g. fishing quotas, temporal closure areas, or marine protected areas, MPAs). Furthermore, the estimation of the precision of global abundance at different sampling intensities can be used for survey design optimisation. Geostatistics provide a priori unbiased estimations of the spatial structure, global abundance and precision for autocorrelated data. However, their application to non-Gaussian data introduces difficulties into the analysis, together with low robustness or unbiasedness. The present study applied intrinsic geostatistics in two dimensions in order to (i) analyse the spatial distribution of anchovy in Spanish Western Mediterranean waters during the species' recruitment season, (ii) produce distribution maps, (iii) estimate global abundance and its precision, (iv) analyse the effect of changing the sampling intensity on the precision of global abundance estimates and (v) evaluate the effects of several methodological options on the robustness of all the analysed parameters. The results suggested that while the spatial structure was usually non-robust to the tested methodological options when working with the original dataset, it became more robust for the transformed datasets (especially for the log-backtransformed dataset). The global abundance was always highly robust and the global precision was highly or moderately robust to most of the methodological options, except for data transformation.
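A minimal sketch of the structural-analysis building block, an omnidirectional empirical variogram; the coordinates and values are placeholders for the acoustic sample data.

```python
# Empirical (omnidirectional) semivariogram over distance bins; the fitted
# variogram model and kriging steps of a full geostatistical analysis are
# not shown here.
import numpy as np

def empirical_variogram(coords, values, lags):
    """coords: (n, 2) positions; values: (n,) e.g. log-backtransformed densities."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    upper = np.triu(np.ones_like(d, dtype=bool), 1)   # use each pair once
    gamma = []
    for lo, hi in zip(lags[:-1], lags[1:]):
        mask = (d >= lo) & (d < hi) & upper
        gamma.append(sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)
```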
Dataset on predictive compressive strength model for self-compacting concrete.
Ofuyatan, O M; Edeki, S O
2018-04-01
The determination of compressive strength is affected by many variables such as the water-cement (WC) ratio, the superplasticizer (SP), the aggregate combination, and the binder combination. In this dataset article, 7-, 28-, and 90-day compressive strength models are derived using statistical analysis. The response surface methodology is used to investigate the effect of the parameters (varying percentages of ash, cement, WC, and SP) on the hardened property of compressive strength at 7, 28 and 90 days. The levels of the independent parameters are determined based on preliminary experiments. The experimental values for compressive strength at 7, 28 and 90 days and modulus of elasticity under different treatment conditions are also discussed and presented. These datasets can effectively be used for modelling and prediction in concrete production settings.
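A hedged sketch of a response-surface fit: a second-order polynomial model of 28-day compressive strength in the mix factors named above; the data frame and its column names are assumptions.

```python
# Second-order response surface model of compressive strength; df is assumed
# to hold the mix factors and the measured 28-day strength.
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def fit_rsm(df):
    X = df[["ash_pct", "cement", "wc_ratio", "sp_dose"]]   # hypothetical column names
    y = df["fc_28d"]                                       # 28-day compressive strength, MPa
    model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                          LinearRegression())
    return model.fit(X, y)        # model.predict(new_mixes) then estimates strength
```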
Prediction of Solvent Physical Properties using the Hierarchical Clustering Method
Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to the estimate solvent physical properties including sur...
Stoker, Jason M.; Tyler, Dean J.; Turnipseed, D. Phil; Van Wilson, K.; Oimoen, Michael J.
2009-01-01
Hurricane Katrina was one of the largest natural disasters in U.S. history. Due to the sheer size of the affected areas, an unprecedented regional analysis at very high resolution and accuracy was needed to properly quantify and understand the effects of the hurricane and the storm tide. Many disparate sources of lidar data were acquired and processed for varying environmental reasons by pre- and post-Katrina projects. The datasets were in several formats and projections and were processed to varying phases of completion, and as a result the task of producing a seamless digital elevation dataset required a high level of coordination, research, and revision. To create a seamless digital elevation dataset, many technical issues had to be resolved before producing the desired 1/9-arc-second (3-meter) grid needed as the map base for projecting the Katrina peak storm tide throughout the affected coastal region. This report presents the methodology that was developed to construct seamless digital elevation datasets from multipurpose, multi-use, and disparate lidar datasets, and describes an easily accessible Web application for viewing the maximum storm tide caused by Hurricane Katrina in southeastern Louisiana, Mississippi, and Alabama.
Prediction of brain tissue temperature using near-infrared spectroscopy
Holper, Lisa; Mitra, Subhabrata; Bale, Gemma; Robertson, Nicola; Tachtsidis, Ilias
2017-01-01
Abstract. Broadband near-infrared spectroscopy (NIRS) can provide an endogenous indicator of tissue temperature based on the temperature dependence of the water absorption spectrum. We describe a first evaluation of the calibration and prediction of brain tissue temperature obtained during hypothermia in newborn piglets (animal dataset) and rewarming in newborn infants (human dataset) based on measured body (rectal) temperature. The calibration using partial least squares regression proved to be a reliable method to predict brain tissue temperature with respect to core body temperature in the wavelength interval of 720 to 880 nm with a strong mean predictive power of R2=0.713±0.157 (animal dataset) and R2=0.798±0.087 (human dataset). In addition, we applied regression receiver operating characteristic curves for the first time to evaluate the temperature prediction, which provided an overall mean error bias between NIRS predicted brain temperature and body temperature of 0.436±0.283°C (animal dataset) and 0.162±0.149°C (human dataset). We discuss main methodological aspects, particularly the well-known aspect of over- versus underestimation between brain and body temperature, which is relevant for potential clinical applications. PMID:28630878
Catanuto, Giuseppe; Taher, Wafa; Rocco, Nicola; Catalano, Francesca; Allegra, Dario; Milotta, Filippo Luigi Maria; Stanco, Filippo; Gallo, Giovanni; Nava, Maurizio Bruno
2018-03-20
Breast shape is usually defined using qualitative assessment (full, flat, ptotic) or estimates, such as volume or distances between reference points, that cannot describe it reliably. We quantitatively describe breast shape with two parameters derived from a statistical methodology known as principal component analysis (PCA). We created a heterogeneous dataset of breast shapes acquired with a commercial infrared 3-dimensional scanner on which PCA was performed. We plotted on a Cartesian plane the two highest values of PCA for each breast (principal components 1 and 2). The methodology was tested by two operators on a preoperative and postoperative surgical case and in a test-retest assessment. The first two principal components derived from PCA are able to characterize the shape of the breasts included in the dataset. The test-retest demonstrated that different operators are able to obtain very similar values of PCA. The system is also able to identify major changes in the preoperative and postoperative stages of a two-stage reconstruction; even minor changes were correctly detected by the system. This methodology can reliably describe the shape of a breast, and an expert operator and a newly trained operator can reach similar results in a test-retest validation. Once developed and after further validation, this methodology could be employed as a good tool for outcome evaluation, auditing, and benchmarking.
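A minimal sketch of the shape parameterisation, assuming each scanned surface has been resampled to corresponding vertices; PCA then yields the two plotted scores per breast.

```python
# PCA of flattened 3-D surface coordinates; the requirement of vertex-wise
# correspondence across scans is an assumption of this sketch.
from sklearn.decomposition import PCA

def shape_scores(meshes):
    """meshes: array of shape (n_breasts, n_vertices, 3) with corresponding vertices."""
    X = meshes.reshape(len(meshes), -1)   # one row per breast surface
    pca = PCA(n_components=2).fit(X)
    return pca.transform(X)               # (n_breasts, 2): the (PC1, PC2) coordinates
```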
Antibiotic Resistome: Improving Detection and Quantification Accuracy for Comparative Metagenomics.
Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania
2016-04-01
The unprecedented rise of life-threatening antibiotic resistance (AR), combined with the unparalleled advances in DNA sequencing of genomes and metagenomes, has pushed the need for in silico detection of the resistance potential of clinical and environmental metagenomic samples through the quantification of AR genes (i.e., genes conferring antibiotic resistance). Therefore, determining an optimal methodology to quantitatively and accurately assess AR genes in a given environment is pivotal. Here, we optimized and improved existing AR detection methodologies from metagenomic datasets to properly consider AR-generating mutations in antibiotic target genes. Through comparative metagenomic analysis of previously published AR gene abundance in three publicly available metagenomes, we illustrate how mutation-generated resistance genes are either falsely assigned or neglected, which alters the detection and quantitation of the antibiotic resistome. In addition, we inspected factors influencing the outcome of AR gene quantification using metagenome simulation experiments, and identified that genome size, AR gene length, total number of metagenomics reads and selected sequencing platforms had pronounced effects on the level of detected AR. In conclusion, our proposed improvements in the current methodologies for accurate AR detection and resistome assessment show reliable results when tested on real and simulated metagenomic datasets.
Terrestrial Ecosystems - Land Surface Forms of the Conterminous United States
Cress, Jill J.; Sayre, Roger G.; Comer, Patrick; Warner, Harumi
2009-01-01
As part of an effort to map terrestrial ecosystems, the U.S. Geological Survey has generated land surface form classes to be used in creating maps depicting standardized, terrestrial ecosystem models for the conterminous United States, using an ecosystems classification developed by NatureServe. A biophysical stratification approach, developed for South America and now being implemented globally, was used to model the ecosystem distributions. Since land surface forms strongly influence the differentiation and distribution of terrestrial ecosystems, they are one of the key input layers in this biophysical stratification. After extensive investigation into various land surface form mapping methodologies, the decision was made to use the methodology developed by the Missouri Resource Assessment Partnership (MoRAP). MoRAP made modifications to Hammond's land surface form classification, which allowed the use of 30-meter source data and a 1-km2 window for analyzing the data cell and its surrounding cells (neighborhood analysis). While Hammond's methodology was based on three topographic variables, slope, local relief, and profile type, MoRAP's methodology uses only slope and local relief. Using the MoRAP method, slope is classified as gently sloping when more than 50 percent of the area in a 1-km2 neighborhood has slope less than 8 percent; otherwise the area is considered moderately sloping. Local relief, which is the difference between the maximum and minimum elevation in a neighborhood, is classified into five groups: 0-15 m, 16-30 m, 31-90 m, 91-150 m, and >150 m. The land surface form classes are derived by combining slope and local relief to create eight landform classes, including flat plains (gently sloping with low local relief) and low hills (not gently sloping with moderate local relief). However, in the USGS application of the MoRAP methodology, an additional local relief group was used (>400 m) to capture additional local topographic variation. As a result, low mountains were redefined as not gently sloping with local relief of 151-400 m. The final application of the MoRAP methodology was implemented using the USGS 30-meter National Elevation Dataset and an existing USGS slope dataset that had been derived by calculating the slope from the NED in Universal Transverse Mercator (UTM) coordinates in each UTM zone, and then combining all of the zones into a national dataset. This map shows a smoothed image of the nine land surface form classes based on MoRAP's methodology. Additional information about this map and any data developed for the ecosystems modeling of the conterminous United States is available online at http://rmgsc.cr.usgs.gov/ecosystems/.
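A rough sketch of the slope/local-relief classification on gridded elevation data; the 8 percent slope rule, the 50 percent neighborhood rule and the relief breaks follow the description above, while the neighborhood size in cells and the final class labelling are simplified assumptions.

```python
# Neighborhood analysis on a 30 m DEM: a ~33x33-cell window approximates the
# 1-km2 neighborhood. The returned slope flag and relief class still need to be
# combined into named landform classes per the MoRAP class table.
import numpy as np
from scipy import ndimage

def land_surface_form(dem, slope_pct, cell_neighborhood=33):
    """dem, slope_pct: 2-D arrays on a 30 m grid."""
    footprint = np.ones((cell_neighborhood, cell_neighborhood))
    # fraction of the neighborhood with slope < 8 percent
    frac_low_slope = ndimage.uniform_filter((slope_pct < 8).astype(float), cell_neighborhood)
    gently_sloping = frac_low_slope > 0.5
    local_relief = (ndimage.maximum_filter(dem, footprint=footprint)
                    - ndimage.minimum_filter(dem, footprint=footprint))
    relief_class = np.digitize(local_relief, [15, 30, 90, 150, 400])  # 0..5
    return gently_sloping, relief_class
```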
Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul
2011-01-01
Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR dataset (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.
Rios, Anthony; Kavuluru, Ramakanth
2013-09-01
Extracting diagnosis codes from medical records is a complex task carried out by trained coders by reading all the documents associated with a patient's visit. With the popularity of electronic medical records (EMRs), computational approaches to code extraction have been proposed in recent years. Machine learning approaches to multi-label text classification provide an important methodology in this task, given that each EMR can be associated with multiple codes. In this paper, we study the role of feature selection, training data selection, and probabilistic threshold optimization in improving different multi-label classification approaches. We conduct experiments based on two different datasets: a recent gold standard dataset used for this task and a second larger and more complex EMR dataset we curated from the University of Kentucky Medical Center. While conventional approaches achieve results comparable to the state of the art on the gold standard dataset, on our complex in-house dataset we show that feature selection, training data selection, and probabilistic thresholding provide significant gains in performance.
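As a rough illustration of the three ingredients studied (feature selection, one-vs-rest multi-label classification, and per-label probabilistic thresholding), here is a scikit-learn sketch on invented EMR snippets; the notes, diagnosis codes, feature count, and 0.3 threshold are placeholders rather than the authors' data or settings, and thresholds would normally be tuned on held-out data.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical EMR notes and their (multi-label) diagnosis codes
notes = ["chest pain and shortness of breath", "type 2 diabetes follow-up visit",
         "diabetes with hypertension", "acute chest pain, rule out MI"]
labels = [{"786.5"}, {"250.00"}, {"250.00", "401.9"}, {"786.5"}]
codes = sorted({c for ls in labels for c in ls})
Y = np.array([[c in ls for c in codes] for ls in labels], dtype=int)

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2, k=10)),              # feature selection
    ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
clf.fit(notes, Y)

# Per-label probabilistic thresholding instead of a fixed 0.5 cut-off
probs = clf.predict_proba(notes)
thresholds = np.full(len(codes), 0.3)                 # placeholder thresholds
predicted = (probs >= thresholds).astype(int)
```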
Blending geological observations and convection models to reconstruct mantle dynamics
NASA Astrophysics Data System (ADS)
Coltice, Nicolas; Bocher, Marie; Fournier, Alexandre; Tackley, Paul
2015-04-01
Knowledge of the state of the Earth's mantle and its temporal evolution is fundamental to a variety of disciplines in Earth Sciences, from internal dynamics to its many expressions in the geological record (postglacial rebound, sea level change, ore deposits, tectonics or geomagnetic reversals). Mantle convection theory is the centerpiece for unraveling the present and past state of the mantle. For the past 40 years, considerable effort has been made to improve the quality of numerical models of mantle convection. However, they are still sparsely used to estimate the convective history of the solid Earth, in comparison to ocean or atmospheric models for weather and climate prediction. The main shortcoming has been their inability to produce Earth-like seafloor spreading and continental drift self-consistently. Recent convection models have begun to successfully predict these processes. Such a breakthrough opens the opportunity to retrieve the recent dynamics of the Earth's mantle by blending convection models with advanced geological datasets. A proof of concept will be presented, consisting of a synthetic test based on a sequential data assimilation methodology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wood, E.; Burton, E.; Duran, A.
Accurate and reliable global positioning system (GPS)-based vehicle use data are highly valuable for many transportation analysis and automotive applications. Model-based design, real-world fuel economy analysis, and the growing field of autonomous and connected technologies (including predictive powertrain control and self-driving cars) all have a vested interest in high-fidelity estimation of powertrain loads and vehicle usage profiles. Unfortunately, road grade can be a difficult property to extract from GPS data with consistency. In this report, we present a methodology for appending high-resolution elevation data to GPS speed traces via a static digital elevation model. Anomalous data points in the digital elevation model are addressed during a filtration/smoothing routine, resulting in an elevation profile that can be used to calculate road grade. This process is evaluated against a large, commercially available height/slope dataset from the Navteq/Nokia/HERE Advanced Driver Assistance Systems product. Results show good agreement with the Advanced Driver Assistance Systems data in the ability to estimate road grade between any two consecutive points in the contiguous United States.
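A simplified sketch of the grade computation once DEM elevations have been appended to the GPS trace; the rolling-median window below is an arbitrary stand-in for the report's filtration/smoothing routine, not the routine itself.

```python
import numpy as np
import pandas as pd

def road_grade(distance_m, elevation_m, window=11):
    """Percent road grade between consecutive points: smooth the appended
    elevation profile, then take rise over run along the route."""
    elev = pd.Series(elevation_m).rolling(window, center=True, min_periods=1).median()
    rise = np.gradient(elev.to_numpy())
    run = np.gradient(np.asarray(distance_m, dtype=float))
    return 100.0 * rise / np.maximum(run, 1e-6)

# Toy trace: 10 m point spacing, ~4% climb with DEM noise
dist = np.arange(0.0, 200.0, 10.0)
elev = 100.0 + 0.04 * dist + np.random.default_rng(2).normal(0, 0.5, dist.size)
print(np.round(road_grade(dist, elev), 2))
```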
NASA Astrophysics Data System (ADS)
Khouri, R.; Beaulieu, C.; Henson, S.; Martin, A. P.; Edwards, M.
2016-02-01
Changes in the phytoplankton community are believed to have occurred in the North Sea and NE Atlantic over the past decades. Since phytoplankton form the base of the marine food web, it is essential to understand the causes of such changes, given their potential to induce change in the wider ecosystem. Whilst the impact of environmental controls, such as climate, has received considerable attention, phytoplankton can also be affected by zooplankton grazing. We investigate how changes in zooplankton impact phytoplankton populations and community composition, and vice versa. We use data from the Continuous Plankton Recorder survey, a unique dataset that has used the same sampling methodology since 1958 and thus provides long and comparable plankton time series. We apply statistical modelling to describe the interaction between phytoplankton and zooplankton. The analysis is inspired by techniques available in the econometrics literature, which do not require assumptions of normality, independence or stationarity of the time series. In particular, we discuss whether climatic factors or zooplankton grazing are more relevant to the variability in phytoplankton abundance and community composition.
An Analysis of Citizen Science Based Research: Usage and Publication Patterns
Follett, Ria; Strezov, Vladimir
2015-01-01
The use of citizen science for scientific discovery relies on the acceptance of this method by the scientific community. Using the Web of Science and Scopus as the source of peer reviewed articles, an analysis of all published articles on “citizen science” confirmed its growth, and found that significant research on methodology and validation techniques preceded the rapid rise of the publications on research outcomes based on citizen science methods. Of considerable interest is the growing number of studies relying on the re-use of collected datasets from past citizen science research projects, which used data from either individual or multiple citizen science projects for new discoveries, such as for climate change research. The extent to which citizen science has been used in scientific discovery demonstrates its importance as a research approach. This broad analysis of peer reviewed papers on citizen science, that included not only citizen science projects, but the theory and methods developed to underpin the research, highlights the breadth and depth of the citizen science approach and encourages cross-fertilization between the different disciplines. PMID:26600041
Retrieving Leaf Area Index (LAI) Using Remote Sensing: Theories, Methods and Sensors
Zheng, Guang; Moskal, L. Monika
2009-01-01
The ability to accurately and rapidly acquire leaf area index (LAI) is an indispensable component of process-based ecological research, facilitating the understanding of gas-vegetation exchange phenomena at an array of spatial scales from the leaf to the landscape. However, LAI is difficult to acquire directly for large spatial extents due to the time-consuming and work-intensive nature of such measurements. Such efforts have been significantly improved by the emergence of optical and active remote sensing techniques. This paper reviews the definitions and theories of LAI measurement with respect to direct and indirect methods. Then, the methodologies for LAI retrieval with regard to the characteristics of a range of remotely sensed datasets are discussed. Remote sensing indirect methods are subdivided into the two categories of passive and active remote sensing, which are further categorized as terrestrial, aerial and satellite-borne platforms. Due to the wide variety in spatial resolution of remotely sensed data and the requirements of ecological modeling, the scaling issue of LAI is discussed and special consideration is given to the extrapolation of measurements to landscape and regional levels. PMID:22574042
An Analysis of Citizen Science Based Research: Usage and Publication Patterns.
Follett, Ria; Strezov, Vladimir
2015-01-01
The use of citizen science for scientific discovery relies on the acceptance of this method by the scientific community. Using the Web of Science and Scopus as the source of peer reviewed articles, an analysis of all published articles on "citizen science" confirmed its growth, and found that significant research on methodology and validation techniques preceded the rapid rise of the publications on research outcomes based on citizen science methods. Of considerable interest is the growing number of studies relying on the re-use of collected datasets from past citizen science research projects, which used data from either individual or multiple citizen science projects for new discoveries, such as for climate change research. The extent to which citizen science has been used in scientific discovery demonstrates its importance as a research approach. This broad analysis of peer reviewed papers on citizen science, that included not only citizen science projects, but the theory and methods developed to underpin the research, highlights the breadth and depth of the citizen science approach and encourages cross-fertilization between the different disciplines.
Retrieving Leaf Area Index (LAI) Using Remote Sensing: Theories, Methods and Sensors.
Zheng, Guang; Moskal, L Monika
2009-01-01
The ability to accurately and rapidly acquire leaf area index (LAI) is an indispensable component of process-based ecological research, facilitating the understanding of gas-vegetation exchange phenomena at an array of spatial scales from the leaf to the landscape. However, LAI is difficult to acquire directly for large spatial extents due to the time-consuming and work-intensive nature of such measurements. Such efforts have been significantly improved by the emergence of optical and active remote sensing techniques. This paper reviews the definitions and theories of LAI measurement with respect to direct and indirect methods. Then, the methodologies for LAI retrieval with regard to the characteristics of a range of remotely sensed datasets are discussed. Remote sensing indirect methods are subdivided into the two categories of passive and active remote sensing, which are further categorized as terrestrial, aerial and satellite-borne platforms. Due to the wide variety in spatial resolution of remotely sensed data and the requirements of ecological modeling, the scaling issue of LAI is discussed and special consideration is given to the extrapolation of measurements to landscape and regional levels.
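As one concrete example of the indirect optical methods reviewed, LAI can be inverted from canopy gap fraction via the Beer-Lambert law; the extinction coefficient k = 0.5 used below is a common default and would in practice depend on leaf angle distribution and view zenith angle.

```python
import numpy as np

def lai_from_gap_fraction(gap_fraction, k=0.5):
    """Indirect optical LAI estimate, LAI = -ln(P) / k, where P is the
    measured canopy gap fraction and k the extinction coefficient."""
    P = np.clip(np.asarray(gap_fraction, dtype=float), 1e-6, 1.0)
    return -np.log(P) / k

print(np.round(lai_from_gap_fraction([0.60, 0.25, 0.05]), 2))  # ~[1.02, 2.77, 5.99]
```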
A Regression Model for Predicting Shape Deformation after Breast Conserving Surgery
Zolfagharnasab, Hooshiar; Bessa, Sílvia; Oliveira, Sara P.; Faria, Pedro; Teixeira, João F.; Cardoso, Jaime S.
2018-01-01
Breast cancer treatments can have a negative impact on breast aesthetics when surgery is required to remove the tumor. For many years mastectomy was the only surgical option, but more recently breast conserving surgery (BCS) has been promoted as a viable alternative to treat cancer while preserving most of the breast. However, a significant number of patients who undergo BCS remain displeased with the result of the treatment, which leads to self-image issues and emotional distress. Surgeons recognize the value of a tool to predict breast shape after BCS to facilitate surgeon/patient communication and allow more informed decisions; however, no such tool suited for clinical usage is available. Such a tool could serve as a way of visually assessing the aesthetic consequences of the treatment. In this research, we propose a methodology for predicting the deformation after BCS using machine learning techniques. However, no appropriate dataset containing breast data before and after surgery exists to train a learning model. Therefore, an in-house semi-synthetic dataset is proposed to fulfill the requirements of this research. Using the proposed dataset, several learning methodologies were investigated, and promising outcomes were obtained. PMID:29315279
Tošić, Tamara; Sellers, Kristin K; Fröhlich, Flavio; Fedotenkova, Mariia; Beim Graben, Peter; Hutt, Axel
2015-01-01
For decades, research in neuroscience has supported the hypothesis that brain dynamics exhibits recurrent metastable states connected by transients, which together encode fundamental neural information processing. To understand the system's dynamics it is important to detect such recurrence domains, but it is challenging to extract them from experimental neuroscience datasets due to the large trial-to-trial variability. The proposed methodology extracts recurrent metastable states in univariate time series by transforming datasets into their time-frequency representations and computing recurrence plots based on instantaneous spectral power values in various frequency bands. Additionally, a new statistical inference analysis compares different trial recurrence plots with corresponding surrogates to obtain statistically significant recurrent structures. This combination of methods is validated by applying it to two artificial datasets. In a final study of visually-evoked Local Field Potentials in partially anesthetized ferrets, the methodology is able to reveal recurrence structures of neural responses with trial-to-trial variability. Focusing on different frequency bands, the δ-band activity is much less recurrent than α-band activity. Moreover, α-activity is susceptible to pre-stimuli, while δ-activity is much less sensitive to pre-stimuli. This difference in recurrence structures in different frequency bands indicates diverse underlying information processing steps in the brain.
Tošić, Tamara; Sellers, Kristin K.; Fröhlich, Flavio; Fedotenkova, Mariia; beim Graben, Peter; Hutt, Axel
2016-01-01
For decades, research in neuroscience has supported the hypothesis that brain dynamics exhibits recurrent metastable states connected by transients, which together encode fundamental neural information processing. To understand the system's dynamics it is important to detect such recurrence domains, but it is challenging to extract them from experimental neuroscience datasets due to the large trial-to-trial variability. The proposed methodology extracts recurrent metastable states in univariate time series by transforming datasets into their time-frequency representations and computing recurrence plots based on instantaneous spectral power values in various frequency bands. Additionally, a new statistical inference analysis compares different trial recurrence plots with corresponding surrogates to obtain statistically significant recurrent structures. This combination of methods is validated by applying it to two artificial datasets. In a final study of visually-evoked Local Field Potentials in partially anesthetized ferrets, the methodology is able to reveal recurrence structures of neural responses with trial-to-trial variability. Focusing on different frequency bands, the δ-band activity is much less recurrent than α-band activity. Moreover, α-activity is susceptible to pre-stimuli, while δ-activity is much less sensitive to pre-stimuli. This difference in recurrence structures in different frequency bands indicates diverse underlying information processing steps in the brain. PMID:26834580
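A minimal sketch of the band-power recurrence plot construction described above, using SciPy's spectrogram; the window lengths, alpha-band limits, and threshold quantile are illustrative choices, and the surrogate-based significance test is omitted.

```python
import numpy as np
from scipy.signal import spectrogram

def power_recurrence_plot(x, fs, band=(8.0, 12.0), eps_quantile=0.1):
    """Recurrence plot of instantaneous spectral power in one frequency band."""
    f, t, Sxx = spectrogram(x, fs=fs, nperseg=int(fs // 2), noverlap=int(fs // 4))
    in_band = (f >= band[0]) & (f <= band[1])
    power = Sxx[in_band].mean(axis=0)                  # band power per time bin
    dist = np.abs(power[:, None] - power[None, :])     # pairwise distances
    eps = np.quantile(dist, eps_quantile)              # recurrence threshold
    return (dist <= eps).astype(int), t

rp, t = power_recurrence_plot(np.random.default_rng(0).standard_normal(5000), fs=1000.0)
```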
NASA Technical Reports Server (NTRS)
Roberts, J. Brent; Robertson, Franklin R.; Clayson, Carol Anne
2012-01-01
Improved estimates of near-surface air temperature and air humidity are critical to the development of more accurate turbulent surface heat fluxes over the ocean. Recent progress in retrieving these parameters has been made through the application of artificial neural networks (ANN) and the use of multi-sensor passive microwave observations. Details are provided on the development of an improved retrieval algorithm that applies the nonlinear statistical ANN methodology to a set of observations from the Advanced Microwave Scanning Radiometer (AMSR-E) and the Advanced Microwave Sounding Unit (AMSU-A) that are currently available from the NASA AQUA satellite platform. Statistical inversion techniques require an adequate training dataset to properly capture embedded physical relationships. The development of multiple training datasets containing only in-situ observations, only synthetic observations produced using the Community Radiative Transfer Model (CRTM), or a mixture of each is discussed. An intercomparison of results using each training dataset is provided to highlight the relative advantages and disadvantages of each methodology. Particular emphasis will be placed on the development of retrievals in cloudy versus clear-sky conditions. Near-surface air temperature and humidity retrievals using the multi-sensor ANN algorithms are compared to previous linear and non-linear retrieval schemes.
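A hedged sketch of the statistical-inversion idea: train a small neural network to map multi-sensor brightness temperatures to near-surface air temperature and specific humidity. The synthetic arrays below merely stand in for the in-situ or CRTM-simulated training collocations described above, and scikit-learn's MLPRegressor stands in for whatever ANN implementation was actually used.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: 10 brightness-temperature channels (K) per collocation;
# targets are near-surface air temperature (K) and specific humidity (g/kg).
rng = np.random.default_rng(0)
X = rng.normal(250.0, 15.0, size=(2000, 10))
y = np.column_stack([
    0.30 * X[:, :5].mean(axis=1) + rng.normal(0.0, 1.0, 2000),
    0.05 * X[:, 5:].mean(axis=1) + rng.normal(0.0, 0.3, 2000),
])

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                                   random_state=0))
model.fit(X, y)                      # MLPRegressor handles multi-output targets
print(model.predict(X[:3]).round(2))
```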
2013-01-01
Background: The honey bee is an economically important species. With a rapid decline of the honey bee population, it is necessary to implement an improved genetic evaluation methodology. In this study, we investigated the applicability of the unified approach and its impact on the accuracy of estimation of breeding values for maternally influenced traits on a simulated dataset for the honey bee. Due to the limitation on the number of individuals that can be genotyped in a honey bee population, the unified approach can be an efficient strategy to increase the genetic gain and to provide a more accurate estimation of breeding values. We calculated the accuracy of estimated breeding values for two evaluation approaches, the unified approach and the traditional pedigree-based approach. We analyzed the effects of different heritabilities as well as genetic correlation between direct and maternal effects on the accuracy of estimation of direct, maternal and overall breeding values (sum of maternal and direct breeding values). The genetic and reproductive biology of the honey bee was accounted for by taking into consideration characteristics such as colony structure, uncertain paternity, overlapping generations and polyandry. In addition, we used a modified numerator relationship matrix and a realistic genome for the honey bee. Results: For all values of heritability and correlation, the accuracy of overall estimated breeding values increased significantly with the unified approach. The increase in accuracy was always higher when there was no correlation than when a negative correlation existed between maternal and direct effects. Conclusions: Our study shows that the unified approach is a useful methodology for genetic evaluation in honey bees, and can contribute immensely to the improvement of traits of apicultural interest such as resistance to Varroa or production and behavioural traits. In particular, the study is of great interest for cases where negative correlation between maternal and direct effects and uncertain paternity exist, and is thus of relevance for other species as well. The study also provides an important framework for simulating genomic and pedigree datasets that will prove helpful for future studies. PMID:23647776
Gupta, Pooja; Reinsch, Norbert; Spötter, Andreas; Conrad, Tim; Bienefeld, Kaspar
2013-05-06
The honey bee is an economically important species. With a rapid decline of the honey bee population, it is necessary to implement an improved genetic evaluation methodology. In this study, we investigated the applicability of the unified approach and its impact on the accuracy of estimation of breeding values for maternally influenced traits on a simulated dataset for the honey bee. Due to the limitation on the number of individuals that can be genotyped in a honey bee population, the unified approach can be an efficient strategy to increase the genetic gain and to provide a more accurate estimation of breeding values. We calculated the accuracy of estimated breeding values for two evaluation approaches, the unified approach and the traditional pedigree-based approach. We analyzed the effects of different heritabilities as well as genetic correlation between direct and maternal effects on the accuracy of estimation of direct, maternal and overall breeding values (sum of maternal and direct breeding values). The genetic and reproductive biology of the honey bee was accounted for by taking into consideration characteristics such as colony structure, uncertain paternity, overlapping generations and polyandry. In addition, we used a modified numerator relationship matrix and a realistic genome for the honey bee. For all values of heritability and correlation, the accuracy of overall estimated breeding values increased significantly with the unified approach. The increase in accuracy was always higher when there was no correlation than when a negative correlation existed between maternal and direct effects. Our study shows that the unified approach is a useful methodology for genetic evaluation in honey bees, and can contribute immensely to the improvement of traits of apicultural interest such as resistance to Varroa or production and behavioural traits. In particular, the study is of great interest for cases where negative correlation between maternal and direct effects and uncertain paternity exist, and is thus of relevance for other species as well. The study also provides an important framework for simulating genomic and pedigree datasets that will prove helpful for future studies.
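In simulation studies of this kind, the accuracy of estimated breeding values is commonly summarised as the correlation between true (simulated) and estimated values; a toy sketch follows, with noise levels chosen only to illustrate a unified approach outperforming a pedigree-only one, not taken from the study.

```python
import numpy as np

def ebv_accuracy(true_bv, estimated_bv):
    """Accuracy as the Pearson correlation between true and estimated values."""
    return np.corrcoef(true_bv, estimated_bv)[0, 1]

rng = np.random.default_rng(42)
true_bv = rng.normal(0.0, 1.0, 500)                   # simulated queens
ebv_pedigree = true_bv + rng.normal(0.0, 1.2, 500)    # pedigree-based estimates
ebv_unified = true_bv + rng.normal(0.0, 0.8, 500)     # unified (marker + pedigree)
print(round(ebv_accuracy(true_bv, ebv_pedigree), 2),
      round(ebv_accuracy(true_bv, ebv_unified), 2))
```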
The direct and indirect costs of both overweight and obesity: a systematic review
2014-01-01
Background: The rising prevalence of overweight and obesity places a financial burden on health services and on the wider economy. Health service and societal costs of overweight and obesity are typically estimated by top-down approaches, which derive population attributable fractions for a range of conditions associated with increased body fat, or by bottom-up methods based on analyses of cross-sectional or longitudinal datasets. The evidence base of cost-of-obesity studies is continually expanding; however, the scope of these studies varies widely and a lack of standardised methods limits comparisons nationally and internationally. The objective of this review is to contribute to this knowledge pool by examining direct costs and indirect (lost productivity) costs of both overweight and obesity to provide comparable estimates. This review was undertaken as part of the introductory work for the Irish cost of overweight and obesity study and examines inconsistencies in the methodologies of cost of overweight and obesity studies. Studies which evaluated the direct costs and indirect costs of both overweight and obesity were included. Methods: A computerised search of English-language studies addressing direct and indirect costs of overweight and obesity in adults between 2001 and 2011 was conducted. Reference lists of reports, articles and earlier reviews were scanned to identify additional studies. Results: Five published articles were deemed eligible for inclusion. Despite the limited scope of this review, there was considerable heterogeneity in methodological approaches and findings. In the four studies which presented separate estimates for direct and indirect costs of overweight and obesity, the indirect costs were higher, accounting for between 54% and 59% of the estimated total costs. Conclusion: A gradient exists between increasing BMI and both direct healthcare costs and indirect costs due to reduced productivity and premature mortality. Determining precise estimates for these increases is hampered by the considerable heterogeneity in the available cost estimation literature. To improve the availability of quality evidence, an international consensus on standardised methods for cost-of-obesity studies is warranted. Analyses of nationally representative cross-sectional datasets augmented by data from primary care are likely to provide the best data for international comparisons. PMID:24739239
The direct and indirect costs of both overweight and obesity: a systematic review.
Dee, Anne; Kearns, Karen; O'Neill, Ciaran; Sharp, Linda; Staines, Anthony; O'Dwyer, Victoria; Fitzgerald, Sarah; Perry, Ivan J
2014-04-16
The rising prevalence of overweight and obesity places a financial burden on health services and on the wider economy. Health service and societal costs of overweight and obesity are typically estimated by top-down approaches, which derive population attributable fractions for a range of conditions associated with increased body fat, or by bottom-up methods based on analyses of cross-sectional or longitudinal datasets. The evidence base of cost-of-obesity studies is continually expanding; however, the scope of these studies varies widely and a lack of standardised methods limits comparisons nationally and internationally. The objective of this review is to contribute to this knowledge pool by examining direct costs and indirect (lost productivity) costs of both overweight and obesity to provide comparable estimates. This review was undertaken as part of the introductory work for the Irish cost of overweight and obesity study and examines inconsistencies in the methodologies of cost of overweight and obesity studies. Studies which evaluated the direct costs and indirect costs of both overweight and obesity were included. A computerised search of English-language studies addressing direct and indirect costs of overweight and obesity in adults between 2001 and 2011 was conducted. Reference lists of reports, articles and earlier reviews were scanned to identify additional studies. Five published articles were deemed eligible for inclusion. Despite the limited scope of this review, there was considerable heterogeneity in methodological approaches and findings. In the four studies which presented separate estimates for direct and indirect costs of overweight and obesity, the indirect costs were higher, accounting for between 54% and 59% of the estimated total costs. A gradient exists between increasing BMI and both direct healthcare costs and indirect costs due to reduced productivity and premature mortality. Determining precise estimates for these increases is hampered by the considerable heterogeneity in the available cost estimation literature. To improve the availability of quality evidence, an international consensus on standardised methods for cost-of-obesity studies is warranted. Analyses of nationally representative cross-sectional datasets augmented by data from primary care are likely to provide the best data for international comparisons.
A Review of Citation Analysis Methodologies for Collection Management
ERIC Educational Resources Information Center
Hoffmann, Kristin; Doucette, Lise
2012-01-01
While there is a considerable body of literature that presents the results of citation analysis studies, most researchers do not provide enough detail in their methodology to reproduce the study, nor do they provide rationale for methodological decisions. In this paper, we review the methodologies used in 34 recent articles that present a…
Optimal prediction of the number of unseen species
Orlitsky, Alon; Suresh, Ananda Theertha; Wu, Yihong
2016-01-01
Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher et al. [Fisher RA, Corbet AS, Williams CB (1943) J Animal Ecol 12(1):42−58], uses n samples to predict the number U of hitherto unseen species that would be observed if t⋅n new samples were collected. Of considerable interest is the largest ratio t between the number of new and existing samples for which U can be accurately predicted. In seminal works, Good and Toulmin [Good I, Toulmin G (1956) Biometrika 43(102):45−63] constructed an intriguing estimator that predicts U for all t≤1. Subsequently, Efron and Thisted [Efron B, Thisted R (1976) Biometrika 63(3):435−447] proposed a modification that empirically predicts U even for some t>1, but without provable guarantees. We derive a class of estimators that provably predict U all of the way up to t∝logn. We also show that this range is the best possible and that the estimator’s mean-square error is near optimal for any t. Our approach yields a provable guarantee for the Efron−Thisted estimator and, in addition, a variant with stronger theoretical and experimental performance than existing methodologies on a variety of synthetic and real datasets. The estimators are simple, linear, computationally efficient, and scalable to massive datasets. Their performance guarantees hold uniformly for all distributions, and apply to all four standard sampling models commonly used across various scientific disciplines: multinomial, Poisson, hypergeometric, and Bernoulli product. PMID:27830649
Optimal prediction of the number of unseen species.
Orlitsky, Alon; Suresh, Ananda Theertha; Wu, Yihong
2016-11-22
Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher et al. [Fisher RA, Corbet AS, Williams CB (1943) J Animal Ecol 12(1):42-58], uses n samples to predict the number U of hitherto unseen species that would be observed if t·n new samples were collected. Of considerable interest is the largest ratio t between the number of new and existing samples for which U can be accurately predicted. In seminal works, Good and Toulmin [Good I, Toulmin G (1956) Biometrika 43(102):45-63] constructed an intriguing estimator that predicts U for all t ≤ 1. Subsequently, Efron and Thisted [Efron B, Thisted R (1976) Biometrika 63(3):435-447] proposed a modification that empirically predicts U even for some t > 1, but without provable guarantees. We derive a class of estimators that provably predict U all of the way up to t ∝ log n. We also show that this range is the best possible and that the estimator's mean-square error is near optimal for any t. Our approach yields a provable guarantee for the Efron-Thisted estimator and, in addition, a variant with stronger theoretical and experimental performance than existing methodologies on a variety of synthetic and real datasets. The estimators are simple, linear, computationally efficient, and scalable to massive datasets. Their performance guarantees hold uniformly for all distributions, and apply to all four standard sampling models commonly used across various scientific disciplines: multinomial, Poisson, hypergeometric, and Bernoulli product.
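For reference, the plain Good-Toulmin estimator discussed above can be written in a few lines from the prevalences Phi_i (the number of species seen exactly i times); the smoothed estimators of this paper modify it to remain stable up to t proportional to log n, which this unsmoothed version does not attempt.

```python
from collections import Counter
import numpy as np

def good_toulmin(species_counts, t):
    """Good-Toulmin prediction of new species seen in t*n further samples,
    U = sum_i (-1)^(i+1) * t^i * Phi_i; reliable only for t <= 1."""
    prevalences = Counter(species_counts)         # Phi_i
    return sum(-(-t) ** i * phi for i, phi in prevalences.items())

# Toy sample of species labels drawn from a skewed (Zipf) distribution
sample = np.random.default_rng(1).zipf(2.0, size=500)
species_counts = list(Counter(sample).values())
print(round(good_toulmin(species_counts, t=1.0), 1))
```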
Long-term personality data collection in support of spaceflight and analogue research.
Musson, David M; Helmreich, Robert L
2005-06-01
This is a review of past and present research into personality and performance at the University of Texas (UT) Human Factors Research Project. Specifically, personality trait data collected from astronauts, pilots, Antarctic personnel, and other groups over a 15-yr period is discussed with particular emphasis on research in space and space analogue environments. The UT Human Factors Research Project conducts studies in personality and group dynamics in aviation, space, and medicine. Current studies include personality determinants of professional cultures, team effectiveness in both medicine and aviation, and personality predictors of long-term astronaut performance. The Project also studies the design and effectiveness of behavioral strategies used to minimize error and maximize team performance in safety-critical work settings. A multi-year personality and performance dataset presents many opportunities for research, including long-term and follow-up studies of human performance, analyses of trends in recruiting and attrition, and the ability to adapt research design to operational changes and methodological advances. Special problems posed by such long-duration projects include issues of confidentiality and security, as well as practical limitations imposed by current peer-review and short-term funding practices. Practical considerations for ongoing dataset management include consistency of assessment instruments over time, variations in data acquisition from one year to the next, and dealing with changes in theory and practice that occur over the life of the project. A fundamental change in how research into human performance is funded would be required to ensure the ongoing development of such long-duration research databases.
A Computational Approach to Qualitative Analysis in Large Textual Datasets
Evans, Michael S.
2014-01-01
In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern. PMID:24498398
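A small sketch of the probabilistic topic modeling step on invented snippets rather than the 14,952-document newspaper sample; scikit-learn's LDA implementation stands in for whatever topic-modeling software was actually used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical newspaper snippets standing in for the full corpus
docs = ["school board debates science curriculum standards",
        "new museum exhibit explores evolution and religion",
        "city council approves science education funding",
        "religious leaders respond to evolution exhibit"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}:", ", ".join(top_terms))
```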
Kim, Changjae; Habib, Ayman; Pyeon, Muwook; Kwon, Goo-rak; Jung, Jaehoon; Heo, Joon
2016-01-22
Diverse approaches to laser point segmentation have been proposed since the emergence of the laser scanning system. Most of these segmentation techniques, however, suffer from limitations such as sensitivity to the choice of seed points, lack of consideration of the spatial relationships among points, and inefficient performance. In an effort to overcome these drawbacks, this paper proposes a segmentation methodology that: (1) reduces the dimensions of the attribute space; (2) considers the attribute similarity and the proximity of the laser point simultaneously; and (3) works well with both airborne and terrestrial laser scanning data. A neighborhood definition based on the shape of the surface increases the homogeneity of the laser point attributes. The magnitude of the normal position vector is used as an attribute for reducing the dimension of the accumulator array. The experimental results demonstrate, through both qualitative and quantitative evaluations, the outcomes' high level of reliability. The proposed segmentation algorithm provided 96.89% overall correctness, 95.84% completeness, a 0.25 m overall mean value of centroid difference, and less than 1° of angle difference. The performance of the proposed approach was also verified with a large dataset and compared with other approaches. Additionally, the evaluation of the sensitivity of the thresholds was carried out. In summary, this paper proposes a robust and efficient segmentation methodology for abstraction of an enormous number of laser points into plane information.
Kim, Changjae; Habib, Ayman; Pyeon, Muwook; Kwon, Goo-rak; Jung, Jaehoon; Heo, Joon
2016-01-01
Diverse approaches to laser point segmentation have been proposed since the emergence of the laser scanning system. Most of these segmentation techniques, however, suffer from limitations such as sensitivity to the choice of seed points, lack of consideration of the spatial relationships among points, and inefficient performance. In an effort to overcome these drawbacks, this paper proposes a segmentation methodology that: (1) reduces the dimensions of the attribute space; (2) considers the attribute similarity and the proximity of the laser point simultaneously; and (3) works well with both airborne and terrestrial laser scanning data. A neighborhood definition based on the shape of the surface increases the homogeneity of the laser point attributes. The magnitude of the normal position vector is used as an attribute for reducing the dimension of the accumulator array. The experimental results demonstrate, through both qualitative and quantitative evaluations, the outcomes’ high level of reliability. The proposed segmentation algorithm provided 96.89% overall correctness, 95.84% completeness, a 0.25 m overall mean value of centroid difference, and less than 1° of angle difference. The performance of the proposed approach was also verified with a large dataset and compared with other approaches. Additionally, the evaluation of the sensitivity of the thresholds was carried out. In summary, this paper proposes a robust and efficient segmentation methodology for abstraction of an enormous number of laser points into plane information. PMID:26805849
APPLICATION OF BENCHMARK DOSE METHODOLOGY TO DATA FROM PRENATAL DEVELOPMENTAL TOXICITY STUDIES
The benchmark dose (BMD) concept was applied to 246 conventional developmental toxicity datasets from government, industry and commercial laboratories. Five modeling approaches were used, two generic and three specific to developmental toxicity (DT models). BMDs for both quantal ...
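To make the BMD concept concrete: fit a dose-response model to quantal data and solve for the dose giving a benchmark response (here, 10% extra risk over background). The data, the logistic form, and the least-squares fit below are purely illustrative; BMDS-style analyses of developmental toxicity data use maximum likelihood, litter-based models, and also report the BMDL confidence bound.

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

def logistic_model(dose, a, b):
    """Probability of an adverse developmental outcome at a given dose."""
    return 1.0 / (1.0 + np.exp(-(a + b * dose)))

# Hypothetical quantal data: dose (mg/kg/day), affected fetuses, number examined
dose = np.array([0.0, 10.0, 30.0, 100.0, 300.0])
affected = np.array([2, 4, 9, 22, 38])
examined = np.array([50, 50, 50, 50, 50])
p_obs = affected / examined

(a, b), _ = curve_fit(logistic_model, dose, p_obs, p0=[-3.0, 0.01])

# BMD: dose yielding 10% extra risk over the fitted background response
p_bg = logistic_model(0.0, a, b)
target = p_bg + 0.10 * (1.0 - p_bg)
bmd = brentq(lambda d: logistic_model(d, a, b) - target, 0.0, dose.max())
print(round(bmd, 1))
```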
A Vulnerability Index and Analysis for the Road Network of Rural Chile
NASA Astrophysics Data System (ADS)
Braun, Andreas; Stötzer, Johanna; Kubisch, Susanne; Dittrich, Andre; Keller, Sina
2017-04-01
Natural hazards impose considerable threats to the physical and socio-economic wellbeing of people, a fact that is well understood and investigated for many regions. However, not only people are vulnerable. During the last decades, a considerable amount of literature has focused on the particular vulnerability of critical infrastructure, for example road networks. For critical infrastructure, far less reliable information exists for many regions worldwide, particularly regions outside the so-called developed world. Critical infrastructure is destroyed in many disasters, causing cascading and follow-up effects, for instance impediments during evacuation, rescue and the resilience phase. These circumstances, which are general enough to apply to most regions, are aggravated in regions characterized by high disparities between the urban and the rural sphere. Peripheral rural areas are especially prone to becoming isolated due to failures of the few roads which connect them to larger urban centres (where, frequently, disaster and emergency actors are situated). The rural area of Central Chile is an appropriate example of these circumstances. It is prone to several geo-hazards and, furthermore, characterized by the aforementioned disparities. Past disasters, e.g. the 1991 Cerro Hudson eruption and the 2010 Maule earthquake, have led to follow-up effects (e.g. farmers being unable to evacuate their animals due to road failures in the first case, and difficulties in evacuating people from places such as Caleta Tumbes or Dichato, which are connected by just a single road, in the second). This contribution develops a methodology to investigate the critical infrastructure of such places. It develops a remoteness index for Chile, which identifies remote, peripheral rural areas prone to isolation due to road network failures during disasters. The approach is graph-based. It offers particular advantages for regions like rural Chile since (1) it does not require traffic flow data, which do not exist; (2) it identifies peripheral areas particularly well; (3) it identifies both nodes (places) prone to isolation and edges (roads) critical for the connectivity of rural areas; and (4) being based on a mathematical structure, it implies several possible planning solutions to reduce the vulnerability of the critical infrastructure and the people dependent on it. The methodology is presented and elaborated theoretically, and then demonstrated on an actual dataset from central Chile, showing how it can be applied to derive planning solutions for peripheral rural areas.
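A toy sketch of the graph-based idea, with the road network as a weighted graph whose edge weights are travel times; the place names, weights, and the simple nearest-centre remoteness measure are illustrative only and not the index actually derived in the contribution.

```python
import networkx as nx

# Toy road graph: nodes are settlements, edge weights are travel times (minutes)
G = nx.Graph()
G.add_weighted_edges_from([("Concepcion", "Tome", 30), ("Tome", "Dichato", 15),
                           ("Concepcion", "Dichato", 40),
                           ("Concepcion", "Caleta Tumbes", 50)])
centres = {"Concepcion"}            # urban centre hosting emergency actors

def remoteness(node):
    """Remoteness as shortest travel time to the nearest urban centre."""
    return min(nx.shortest_path_length(G, node, c, weight="weight") for c in centres)

# Edges whose removal disconnects part of the network: candidate critical roads
critical_edges = list(nx.bridges(G))
print({n: remoteness(n) for n in G if n not in centres}, critical_edges)
```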
High resolution global gridded data for use in population studies
NASA Astrophysics Data System (ADS)
Lloyd, Christopher T.; Sorichetta, Alessandro; Tatem, Andrew J.
2017-01-01
Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; and (iv) slope. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. Datasets and production methodology are described here. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website.
High resolution global gridded data for use in population studies.
Lloyd, Christopher T; Sorichetta, Alessandro; Tatem, Andrew J
2017-01-31
Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; and (iv) slope. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. Datasets and production methodology are described here. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website.
Exudate-based diabetic macular edema detection in fundus images using publicly available datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul
2011-01-01
Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR dataset (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis in an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.
76 FR 55804 - Dicamba; Pesticide Tolerances
Federal Register 2010, 2011, 2012, 2013, 2014
2011-09-09
... Considerations A. Analytical Enforcement Methodology Adequate enforcement methodologies, Methods I and II--gas chromatography with electron capture detection (GC/ECD), are available to enforce the tolerance expression. The...
Soranno, Patricia A; Bissell, Edward G; Cheruvelil, Kendra S; Christel, Samuel T; Collins, Sarah M; Fergus, C Emi; Filstrup, Christopher T; Lapierre, Jean-Francois; Lottig, Noah R; Oliver, Samantha K; Scott, Caren E; Smith, Nicole J; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A; Gries, Corinna; Henry, Emily N; Skaff, Nick K; Stanley, Emily H; Stow, Craig A; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E
2015-01-01
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km(2)). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Soranno, Patricia A.; Bissell, E.G.; Cheruvelil, Kendra S.; Christel, Samuel T.; Collins, Sarah M.; Fergus, C. Emi; Filstrup, Christopher T.; Lapierre, Jean-Francois; Lotting, Noah R.; Oliver, Samantha K.; Scott, Caren E.; Smith, Nicole J.; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A.; Gries, Corinna; Henry, Emily N.; Skaff, Nick K.; Stanley, Emily H.; Stow, Craig A.; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E.
2015-01-01
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km2). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Understanding Participatory Action Research: A Qualitative Research Methodology Option
ERIC Educational Resources Information Center
MacDonald, Cathy
2012-01-01
Participatory Action Research (PAR) is a qualitative research methodology option that requires further understanding and consideration. PAR is considered democratic, equitable, liberating, and life-enhancing qualitative inquiry that remains distinct from other qualitative methodologies (Kach & Kralik, 2006). Using PAR, qualitative features of an…
Anstey, Kaarin J; Bielak, Allison AM; Birrell, Carole L; Browning, Colette J; Burns, Richard A; Byles, Julie; Kiley, Kim M; Nepal, Binod; Ross, Lesley A; Steel, David; Windsor, Timothy D
2014-01-01
Aim: To describe the Dynamic Analyses to Optimise Ageing (DYNOPTA) project and illustrate its contributions to understanding ageing through innovative methodology, and investigations on outcomes based on the project themes. DYNOPTA provides a platform and technical expertise that may be used to combine other national and international datasets. Method: The DYNOPTA project has pooled and harmonized data from nine Australian longitudinal studies to create the largest available longitudinal dataset (N=50652) on ageing in Australia. Results: A range of findings have resulted from the study to date, including methodological advances, prevalence rates of disease and disability, and mapping trajectories of ageing with and without increasing morbidity. DYNOPTA also forms the basis of a microsimulation model that will provide projections of future costs of disease and disability for the baby boomer cohort. Conclusion: DYNOPTA contributes significantly to the Australian evidence base on ageing to inform key social and health policy domains. PMID:22032767
Rheem, Sungsue; Rheem, Insoo; Oh, Sejong
2017-01-01
Response surface methodology (RSM) is a useful set of statistical techniques for modeling and optimizing responses in food science research. In the analysis of response surface data, a second-order polynomial regression model is usually used. However, we sometimes encounter situations where the fit of the second-order model is poor. If the model fitted to the data has a poor fit, including a lack of fit, the modeling and optimization results might not be accurate. In such a case, using a fullest balanced model, which has no lack of fit, can fix this problem, enhancing the accuracy of the response surface modeling and optimization. This article presents how to develop and use such a model for better modeling and optimization of the response, through an illustrative re-analysis of a dataset in Park et al. (2014) published in the Korean Journal for Food Science of Animal Resources.
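For context, the standard second-order response surface fit that the article takes as its starting point can be sketched as follows, on an invented two-factor central composite design; the fullest balanced model proposed in the article would add further terms to remove the lack of fit, which this sketch does not include.

```python
import numpy as np

# Hypothetical two-factor central composite design (coded units) and response
X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
              [-1.41, 0], [1.41, 0], [0, -1.41], [0, 1.41],
              [0, 0], [0, 0], [0, 0]], dtype=float)
y = np.array([61, 68, 65, 74, 60, 72, 63, 70, 78, 77, 79], dtype=float)

x1, x2 = X[:, 0], X[:, 1]
# Second-order design matrix: intercept, linear, interaction, quadratic terms
D = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)

# Stationary point of the fitted surface (set the gradient to zero)
b_lin = beta[1:3]
B = np.array([[2 * beta[4], beta[3]], [beta[3], 2 * beta[5]]])
x_stationary = np.linalg.solve(B, -b_lin)
print(np.round(beta, 2), np.round(x_stationary, 2))
```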
Internal Consistency of the NVAP Water Vapor Dataset
NASA Technical Reports Server (NTRS)
Suggs, Ronnie J.; Jedlovec, Gary J.; Arnold, James E. (Technical Monitor)
2001-01-01
The NVAP (NASA Water Vapor Project) dataset is a global dataset at 1 x 1 degree spatial resolution consisting of daily, pentad, and monthly atmospheric precipitable water (PW) products. The analysis blends measurements from the Television and Infrared Operational Satellite (TIROS) Operational Vertical Sounder (TOVS), the Special Sensor Microwave/Imager (SSM/I), and radiosonde observations into a daily collage of PW. The original dataset consisted of five years of data from 1988 to 1992. Recent updates have added three additional years (1993-1995) and incorporated procedural and algorithm changes relative to the original methodology. Since none of the PW sources (TOVS, SSM/I, and radiosondes) provides global coverage on its own, the sources complement one another by providing spatial coverage over regions and during times where the others are not available. For this type of spatial and temporal blending to be successful, each of the source components should have similar or compatible accuracies. If this is not the case, regional and time-varying biases may be manifested in the NVAP dataset. This study examines the consistency of the NVAP source data by comparing daily collocated TOVS and SSM/I PW retrievals with collocated radiosonde PW observations. The daily PW intercomparisons are performed over the time period of the dataset and for various regions.
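Consistency checks of this kind typically reduce to the bias and RMSE of each PW source against collocated radiosonde observations; a minimal sketch with invented collocations (values in mm).

```python
import numpy as np

def consistency_stats(source_pw, sonde_pw):
    """Bias and RMSE of a PW source (e.g., TOVS or SSM/I) versus radiosondes."""
    diff = np.asarray(source_pw, dtype=float) - np.asarray(sonde_pw, dtype=float)
    return diff.mean(), np.sqrt((diff ** 2).mean())

sonde = np.array([12.1, 25.4, 33.0, 41.2, 18.7])   # hypothetical collocations (mm)
tovs = np.array([13.0, 24.1, 34.5, 43.0, 19.5])
print(consistency_stats(tovs, sonde))
```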
Wu, Jing; Philip, Ana-Maria; Podkowinski, Dominika; Gerendas, Bianca S; Langs, Georg; Simader, Christian; Waldstein, Sebastian M; Schmidt-Erfurth, Ursula M
2016-01-01
Development of image analysis and machine learning methods for segmentation of clinically significant pathology in retinal spectral-domain optical coherence tomography (SD-OCT), used in disease detection and prediction, is limited by the availability of expertly annotated reference data. Retinal segmentation methods use datasets that either are not publicly available, come from only one device, or use different evaluation methodologies, making them difficult to compare. Thus, we present and evaluate a multiple-expert annotated reference dataset for the problem of intraretinal cystoid fluid (IRF) segmentation, a key indicator in exudative macular disease. In addition, a standardized framework for segmentation accuracy evaluation, applicable to other pathological structures, is presented. Integral to this work is the dataset used, which must be fit for purpose for IRF segmentation algorithm training and testing. We describe here a multivendor dataset comprising 30 scans. Each OCT scan for system training has been annotated by multiple graders using a proprietary system. Evaluation of the intergrader annotations shows a good correlation, thus making the reproducibly annotated scans suitable for the training and validation of image processing and machine learning based segmentation methods. The dataset will be made publicly available in the form of a segmentation Grand Challenge.
NASA Astrophysics Data System (ADS)
Waldhoff, Guido; Lussem, Ulrike; Bareth, Georg
2017-09-01
Spatial land use information is one of the key input parameters for regional agro-ecosystem modeling. Furthermore, to accurately assess crop-specific management in a spatio-temporal context, parcel-related crop rotation information is additionally needed. Such data are scarcely available at the regional scale, so only modeled crop rotations can be incorporated instead; however, the spectrum of the actually occurring multiannual land use patterns on arable land then remains unknown. Thus, this contribution focuses on the mapping of the actually practiced crop rotations in the Rur catchment, located in the western part of Germany. We addressed this by combining multitemporal multispectral remote sensing data, ancillary information, and expert knowledge on crop phenology in a GIS-based Multi-Data Approach (MDA). First, a methodology for the enhanced differentiation of the major crop types on an annual basis was developed. Key aspects are (i) the use of physical block data to separate arable land from other land use types, (ii) the classification of remote sensing scenes from specific time periods that are most favorable for the differentiation of certain crop types, and (iii) the combination of the multitemporal classification results in a sequential analysis strategy. Annual crop maps of eight consecutive years (2008-2015) were combined into a crop sequence dataset to provide a sound data basis for the mapping of crop rotations. In most years, the remote sensing data basis was highly fragmented; nevertheless, our method produced satisfactory crop mapping results. As an example of the annual crop mapping workflow, the procedure and the result for 2015 are illustrated. For the generation of the crop sequence dataset, the eight annual crop maps were geometrically smoothed and integrated into a single vector data layer. The resulting dataset documents the crop sequence of individual areas of arable land, so that crop rotation schemes can be derived. It reveals that the spectrum of the practiced crop rotations is extremely heterogeneous and contains a large number of crop sequences that strongly diverge from model crop rotations. Consequently, the integration of remote sensing-based crop rotation data can considerably reduce uncertainties regarding management in regional agro-ecosystem modeling. Finally, the developed methods and the results are discussed in detail.
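The step of combining co-registered annual crop maps into per-pixel crop sequences can be sketched as below; the class codes and tiny rasters are hypothetical, and the actual MDA workflow is GIS-based rather than a pure array operation.

```python
import numpy as np
from collections import Counter

# Hypothetical co-registered annual crop rasters (1=maize, 2=wheat, 3=sugar beet, 0=non-arable)
maps = {2013: np.array([[1, 2], [3, 0]]),
        2014: np.array([[2, 3], [1, 0]]),
        2015: np.array([[3, 1], [2, 0]])}

stack = np.stack([maps[y] for y in sorted(maps)], axis=0)   # shape: (years, rows, cols)
arable = np.all(stack > 0, axis=0)                          # mask out non-arable pixels

# Per-pixel crop sequence strings, e.g. "1-2-3"
sequences = ["-".join(map(str, stack[:, r, c]))
             for r, c in zip(*np.nonzero(arable))]
print(Counter(sequences))   # frequency of the observed crop sequences
```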
Spinelli, Lionel; Carpentier, Sabrina; Montañana Sanchis, Frédéric; Dalod, Marc; Vu Manh, Thien-Phong
2015-10-19
Recent advances in the analysis of high-throughput expression data have led to the development of tools that have scaled up their focus from the single-gene to the gene-set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves the extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, the ability to generate home-made gene sets relevant to one's biological question is crucial but remains a substantial challenge for most biologists lacking statistical or bioinformatics expertise. This is all the more the case when attempting to define a gene set specific to one condition compared with many others. Thus, there is a crucial need for easy-to-use software for the generation of relevant home-made gene sets from complex datasets, their use in GSEA, and the correction of the results when applied to multiple comparisons of many experimental conditions. We developed BubbleGUM (GSEA Unlimited Map), a tool that automatically extracts molecular signatures from transcriptomic data and performs exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM notably resides in its capacity to integrate and compare numerous GSEA results into an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichment in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes. BubbleGUM is an open-source software tool that automatically generates molecular signatures from complex expression datasets and directly assesses their enrichment by GSEA on independent datasets. Enrichments are displayed in a graphical output that helps in interpreting the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarity between microarray datasets from different laboratories or with different experimental models or clinical cohorts. BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html.
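For orientation, the core GSEA statistic (the weighted running-sum enrichment score) can be computed as below. This is a generic sketch of the published algorithm, not BubbleGUM's implementation, and the gene names and ranking scores are invented.

```python
import numpy as np

def enrichment_score(ranked_genes, ranked_scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like enrichment score (GSEA-style)."""
    in_set = np.isin(ranked_genes, list(gene_set))
    weights = np.abs(ranked_scores) ** p
    hit = np.where(in_set, weights / weights[in_set].sum(), 0.0)    # reward hits
    miss = np.where(~in_set, 1.0 / (~in_set).sum(), 0.0)            # penalise misses
    running = np.cumsum(hit - miss)
    return running[np.argmax(np.abs(running))]                      # signed max deviation

# Hypothetical ranked list (most up-regulated first) and a home-made gene set
genes = np.array(["G%d" % i for i in range(1, 11)])
scores = np.linspace(2.5, -2.5, 10)          # e.g. a signal-to-noise ranking metric
print("ES =", round(enrichment_score(genes, scores, {"G1", "G2", "G4"}), 3))
```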
Mahmood, Khalid; Jung, Chol-Hee; Philip, Gayle; Georgeson, Peter; Chung, Jessica; Pope, Bernard J; Park, Daniel J
2017-05-16
Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as previously reported. The use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.
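Benchmarking a predictor against a functionally determined truth set essentially reduces to a ROC analysis; a minimal sketch follows, with invented labels and prediction scores standing in for an assay-determined dataset and a tool's output.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical benchmark labels: 1 = functionally damaging, 0 = neutral (assay-determined)
truth = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
# Hypothetical tool scores (higher = predicted more damaging)
scores = np.array([0.9, 0.4, 0.6, 0.8, 0.5, 0.2, 0.3, 0.1, 0.7, 0.95])

print("area under the ROC curve =", round(roc_auc_score(truth, scores), 3))
```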
Perception of Virtual Audiences.
Chollet, Mathieu; Scherer, Stefan
2017-01-01
A growing body of evidence shows that virtual audiences are a valuable tool in the treatment of social anxiety, and recent work shows that they are also useful in public-speaking training programs. However, little research has focused on how such audiences are perceived and on how the behavior of virtual audiences can be manipulated to create various types of stimuli. The authors used a crowdsourcing methodology to create a virtual audience nonverbal behavior model and, with it, created a dataset of videos of virtual audiences exhibiting varying behaviors. Using this dataset, they investigated how virtual audiences are perceived and which factors affect this perception.
Systematic evaluation of deep learning based detection frameworks for aerial imagery
NASA Astrophysics Data System (ADS)
Sommer, Lars; Steinmann, Lucas; Schumann, Arne; Beyerer, Jürgen
2018-04-01
Object detection in aerial imagery is crucial for many applications in the civil and military domains. In recent years, deep learning based object detection frameworks have significantly outperformed conventional approaches based on hand-crafted features on several datasets. However, these detection frameworks are generally designed and optimized for common benchmark datasets, which differ considerably from aerial imagery, especially in object sizes. As already demonstrated for Faster R-CNN, several adaptations are necessary to account for these differences. In this work, we adapt several state-of-the-art detection frameworks, including Faster R-CNN, R-FCN, and Single Shot MultiBox Detector (SSD), to aerial imagery. We discuss in detail the adaptations that mainly improve the detection accuracy of all frameworks. As the output of deeper convolutional layers comprises more semantic information, these layers are generally used in detection frameworks as feature maps to locate and classify objects. However, the resolution of these feature maps is insufficient for handling small object instances, which results in inaccurate localization or incorrect classification of small objects. Furthermore, state-of-the-art detection frameworks perform bounding box regression to predict the exact object location, using so-called anchor or default boxes as references. We demonstrate how an appropriate choice of anchor box sizes can considerably improve detection performance. Furthermore, we evaluate the impact of the performed adaptations on two publicly available datasets to account for various ground sampling distances and differing backgrounds. The presented adaptations can be used as a guideline for further datasets or detection frameworks.
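One widely used way to choose dataset-specific anchor box sizes (popularised by YOLO-style detectors, and not necessarily the exact procedure used in this work) is to cluster the annotated box dimensions; the box sizes below are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical ground-truth box sizes in pixels: columns = (width, height)
boxes = np.array([[12, 14], [15, 18], [22, 25], [10, 11],
                  [30, 34], [14, 13], [26, 28], [11, 15]])

k = 3  # number of anchor scales to generate
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(boxes)
anchors = km.cluster_centers_[np.argsort(km.cluster_centers_.prod(axis=1))]  # sort by area

print("suggested anchor (width, height) pairs:\n", np.round(anchors, 1))
```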
An approach for reduction of false predictions in reverse engineering of gene regulatory networks.
Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar
2018-05-14
A gene regulatory network discloses the regulatory interactions amongst genes at a particular condition of the human body. The accurate reconstruction of such networks from time-series genetic expression data using computational tools offers a stiff challenge for contemporary computer scientists, yet it is crucial for understanding the proper functioning of a living organism. Unfortunately, computational methods produce many false predictions along with the correct predictions, which is unwanted. Investigations in the domain focus on identifying as many correct regulations as possible in the reverse engineering of gene regulatory networks, to make it more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we propose a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have also implemented the scheme using a dataset ensemble approach (i.e. combining multiple datasets). We employed the proposed methodology on real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IMRA network of Saccharomyces cerevisiae. Subsequently, we experimented upon somewhat larger, in silico networks, namely the DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we used four datasets in each experiment. The results are encouraging, as the proposed methodology can reduce the number of false predictions significantly, without using any supplementary prior biological information, for larger gene regulatory networks. It is also observed that if a small amount of prior biological information is incorporated, the results improve further with respect to the prediction of true positives. Copyright © 2018 Elsevier Ltd. All rights reserved.
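The general idea of keeping only regulations supported by several independent reconstructions (different metaheuristics and/or different datasets) can be sketched as a simple vote; the edge lists below are invented and the real method combines techniques in a more elaborate way.

```python
from collections import Counter

# Hypothetical directed regulations (regulator, target) predicted by four independent runs
predictions = [
    {("lexA", "recA"), ("lexA", "umuD"), ("recA", "lexA"), ("uvrA", "recA")},
    {("lexA", "recA"), ("lexA", "umuD"), ("recA", "lexA")},
    {("lexA", "recA"), ("recA", "lexA"), ("polB", "uvrA")},
    {("lexA", "recA"), ("lexA", "umuD"), ("recA", "lexA"), ("uvrY", "uvrA")},
]

votes = Counter(edge for run in predictions for edge in run)
min_support = 3  # keep only edges predicted by at least 3 of the 4 runs
consensus = {edge for edge, n in votes.items() if n >= min_support}

print(sorted(consensus))   # edges retained after the consensus filter
```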
Giambartolomei, Claudia; Vukcevic, Damjan; Schadt, Eric E; Franke, Lude; Hingorani, Aroon D; Wallace, Chris; Plagnol, Vincent
2014-05-01
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
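A simplified sketch of the colocalisation logic described (per-SNP approximate Bayes factors from single-SNP summary statistics, combined into posteriors for five hypotheses, with H4 meaning a shared causal variant). The prior probabilities and the prior effect variance W below are illustrative defaults, and the authoritative implementation is the R software linked above.

```python
import numpy as np

def log_abf(beta, se, W=0.15 ** 2):
    """Wakefield-style log approximate Bayes factor for one SNP from summary stats."""
    V = se ** 2
    z2 = (beta / se) ** 2
    r = W / (V + W)
    return 0.5 * (np.log(1 - r) + r * z2)

def coloc_posteriors(beta1, se1, beta2, se2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses H0..H4 (H4 = shared causal variant)."""
    l1, l2 = log_abf(beta1, se1), log_abf(beta2, se2)
    s1, s2 = np.exp(l1).sum(), np.exp(l2).sum()
    s12 = np.exp(l1 + l2).sum()
    h = np.array([1.0,                          # H0: no association
                  p1 * s1,                      # H1: trait 1 only
                  p2 * s2,                      # H2: trait 2 only
                  p1 * p2 * (s1 * s2 - s12),    # H3: two distinct causal variants
                  p12 * s12])                   # H4: one shared causal variant
    return h / h.sum()

# Hypothetical summary statistics for the same 5 SNPs in two association datasets
beta1 = np.array([0.02, 0.35, 0.05, 0.01, 0.03]); se1 = np.full(5, 0.05)
beta2 = np.array([0.01, 0.28, 0.04, 0.02, 0.02]); se2 = np.full(5, 0.05)
print("P(H0..H4) =", np.round(coloc_posteriors(beta1, se1, beta2, se2), 3))
```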
GLEAM version 3: Global Land Evaporation Datasets and Model
NASA Astrophysics Data System (ADS)
Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.
2015-12-01
Terrestrial evaporation links the energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical processes to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have been developed to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modelling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmosphere feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to make these data publicly available is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ observations. It is also shown that the performance of the revised model is higher than that of the original one.
Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro
2016-01-01
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results differ for three out of five state-of-the-art simple datasets, and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is important to use a statistical approach to indicate whether the differences are statistically significant when using this kind of algorithm. Furthermore, our results with three real complex datasets report different best models than the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
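A minimal illustration of the general idea (comparing regression models by cross-validation and then testing whether the score differences are statistically significant); this is not the RRegrs R code, and the data are synthetic.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=150, n_features=10, noise=10.0, random_state=0)
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)   # identical folds for both models

scores_lm = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
scores_rf = cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=cv, scoring="r2")

stat, p = wilcoxon(scores_lm, scores_rf)   # paired, non-parametric test over the CV folds
print(f"mean R2: LM={scores_lm.mean():.3f}, RF={scores_rf.mean():.3f}, Wilcoxon p={p:.3g}")
```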
Paliwal, Nikhil; Damiano, Robert J; Varble, Nicole A; Tutino, Vincent M; Dou, Zhongwang; Siddiqui, Adnan H; Meng, Hui
2017-12-01
Computational fluid dynamics (CFD) is a promising tool to aid in clinical diagnoses of cardiovascular diseases. However, it uses assumptions that simplify the complexities of real cardiovascular flow. Due to the high stakes in the clinical setting, it is critical to quantify the effect of these assumptions on the CFD simulation results. However, existing CFD validation approaches do not quantify error in the simulation results due to the CFD solver's modeling assumptions. Instead, they directly compare CFD simulation results against validation data. Thus, to quantify the accuracy of a CFD solver, we developed a validation methodology that calculates the CFD model error (arising from modeling assumptions). Our methodology identifies independent error sources in CFD and validation experiments, and calculates the model error by parsing out other sources of error inherent in simulation and experiments. To demonstrate the method, we simulated the flow field of a patient-specific intracranial aneurysm (IA) in the commercial CFD software STAR-CCM+. Particle image velocimetry (PIV) provided validation datasets for the flow field on two orthogonal planes. The average model error in the STAR-CCM+ solver was 5.63 ± 5.49% along the intersecting validation line of the orthogonal planes. Furthermore, we demonstrated that our validation method is superior to existing validation approaches by applying three representative existing validation techniques to our CFD and experimental dataset and comparing the validation results. Our validation methodology offers a streamlined workflow to extract the "true" accuracy of a CFD solver.
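In the spirit of standard solution-validation practice (the comparison-error formulation used in ASME V&V 20-type analyses, not necessarily the authors' exact equations), the sketch below contrasts simulated and measured velocities along a validation line and combines the independent uncertainty sources; all numbers are invented.

```python
import numpy as np

# Hypothetical velocity magnitudes (m/s) at matched points on the validation line
cfd = np.array([0.42, 0.55, 0.61, 0.48, 0.37])   # simulation result S
piv = np.array([0.40, 0.58, 0.64, 0.45, 0.39])   # experimental data D

E = cfd - piv                                    # comparison error E = S - D
rel_error = 100 * np.abs(E) / np.abs(piv)

# Independent standard uncertainties (numerical, input, experimental), hypothetical values
u_num, u_input, u_D = 0.01, 0.015, 0.02
u_val = np.sqrt(u_num**2 + u_input**2 + u_D**2)  # combined validation uncertainty

print("mean |E| = %.3f m/s (%.1f%%), u_val = %.3f m/s"
      % (np.abs(E).mean(), rel_error.mean(), u_val))
# If |E| is much larger than u_val, the remaining discrepancy is attributed to model error.
```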
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamada, Yuki; Grippo, Mark A.
2015-01-01
A monitoring plan that incorporates regional datasets and integrates cost-effective data collection methods is necessary to sustain the long-term environmental monitoring of utility-scale solar energy development in expansive, environmentally sensitive desert environments. Using very high spatial resolution (VHSR; 15 cm) multispectral imagery collected in November 2012 and January 2014, an image processing routine was developed to characterize ephemeral streams, vegetation, and land surface in the southwestern United States where increased utility-scale solar development is anticipated. In addition to knowledge about desert landscapes, the methodology integrates existing spectral indices and transformation (e.g., visible atmospherically resistant index and principal components); a newly developed index, the erosion resistance index (ERI); and digital terrain and surface models, all of which were derived from a common VHSR image. The methodology identified fine-scale ephemeral streams with greater detail than the National Hydrography Dataset and accurately estimated vegetation distribution and fractional cover of various surface types. The ERI classified surface types that have a range of erosive potentials. The remote-sensing methodology could ultimately reduce uncertainty and monitoring costs for all stakeholders by providing a cost-effective monitoring approach that accurately characterizes the land resources at potential development sites.
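One of the spectral ingredients named above, the visible atmospherically resistant index (VARI), is a simple ratio of the visible bands; a minimal sketch follows, with placeholder reflectance arrays (the ERI itself is not reproduced here).

```python
import numpy as np

def vari(green: np.ndarray, red: np.ndarray, blue: np.ndarray) -> np.ndarray:
    """Visible Atmospherically Resistant Index: (G - R) / (G + R - B)."""
    denom = green + red - blue
    return np.divide(green - red, denom, out=np.zeros_like(denom), where=denom != 0)

# Hypothetical 2 x 2 reflectance tiles for each visible band
g = np.array([[0.30, 0.22], [0.18, 0.35]])
r = np.array([[0.20, 0.25], [0.21, 0.15]])
b = np.array([[0.10, 0.12], [0.11, 0.09]])

print(np.round(vari(g, r, b), 3))   # higher values indicate greener vegetation
```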
USDA-ARS?s Scientific Manuscript database
We describe new methods for characterizing gene tree discordance in phylogenomic datasets, which screen for deviations from neutral expectations, summarize variation in statistical support among gene trees, and allow comparison of the patterns of discordance induced by various analysis choices. Usin...
This dataset provides the basic building blocks for the USEEIO v1.1 model and life cycle results per $1 (2013 USD) demand for all goods and services in the model in the producer's price (see BEA 2015). The methodology underlying USEEIO is described in Yang, Ingwersen et al., 2017...
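For readers unfamiliar with environmentally extended input-output models such as USEEIO, the generic "impact per $1 of final demand" calculation is the Leontief formulation sketched below; the matrices are tiny invented examples rather than USEEIO data, and the real model covers hundreds of sectors and many indicators.

```python
import numpy as np

# Hypothetical 3-sector direct requirements matrix A ($ of input per $ of output)
A = np.array([[0.10, 0.05, 0.02],
              [0.20, 0.10, 0.07],
              [0.05, 0.15, 0.08]])
# Hypothetical direct environmental intensities B (e.g., kg CO2e per $ of output)
B = np.array([[0.30, 0.80, 0.15]])

L = np.linalg.inv(np.eye(3) - A)    # Leontief inverse (total requirements)
M = B @ L                           # impacts per $1 of final demand, by sector
y = np.array([1.0, 0.0, 0.0])       # $1 of final demand for sector 1

print("impact multipliers per $:", np.round(M, 3))
print("total impact for demand y:", round((M @ y).item(), 3))
```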
The Development of the Global Citizenship Inventory for Adolescents
ERIC Educational Resources Information Center
Van Gent, Marije; Carabain, Christine; De Goede, Irene; Boonstoppel, Evelien; Hogeling, Lette
2013-01-01
In this paper we report on the development of an inventory that measures global citizenship among adolescents. The methodology used consists of cognitive interviews for questionnaire design and explorative and confirmatory factor analyses among several datasets. The resulting Global Citizenship Inventory (GCI) includes a global citizenship…
Online interviewing with interpreters in humanitarian contexts
Chiumento, Anna; Rahman, Atif; Frith, Lucy
2018-01-01
Purpose: Recognising that one way to address the logistical and safety considerations of research conducted in humanitarian emergencies is to use internet communication technologies to facilitate interviews online, this article explores some practical and methodological considerations inherent to qualitative online interviewing. Method: Reflections from a case study of a multi-site research project conducted in post-conflict countries are presented. Synchronous online cross-language qualitative interviews were conducted in one country. Although only a small proportion of interviews were conducted online (six out of 35), it remains important to critically consider the impact upon data produced in this way. Results: A range of practical and methodological considerations are discussed, illustrated with examples. Results suggest that whilst online interviewing has methodological and ethical potential and versatility, there are inherent practical challenges in settings with poor internet and electricity infrastructure. Notable methodological limitations include barriers to building rapport due to partial visual and non-visual cues, and difficulties interpreting pauses or silences. Conclusions: Drawing upon experiences in this case study, strategies for managing the practical and methodological limitations of online interviewing are suggested, alongside recommendations for supporting future research practice. These are intended to act as a springboard for further reflection, and operate alongside other conceptual frameworks for online interviewing. PMID:29532739
Zhang, Tingting; Wei, Wensong; Zhao, Bin; Wang, Ranran; Li, Mingliu; Yang, Liming; Wang, Jianhua; Sun, Qun
2018-03-08
This study investigated the possibility of using visible and near-infrared (VIS/NIR) hyperspectral imaging techniques to discriminate viable and non-viable wheat seeds. Both sides of individual seeds were subjected to hyperspectral imaging (400-1000 nm) to acquire reflectance spectral data. Four spectral datasets, including the ventral groove side, reverse side, mean (the mean of the two sides' spectra of every seed), and mixture datasets (both sides' spectra of every seed), were used to construct the models. Classification models, partial least squares discriminant analysis (PLS-DA) and support vector machines (SVM), coupled with several pre-processing methods and the successive projections algorithm (SPA), were built for the identification of viable and non-viable seeds. Our results showed that the standard normal variate (SNV)-SPA-PLS-DA model, built on the mixed spectral dataset with only 16 wavebands, achieved high classification accuracy in the prediction set for whole seeds (>85.2%) and for viable seeds (>89.5%). After screening with this model, the final germination of the seed lot could be higher than 89.5%. Here, we develop a reliable methodology for predicting the viability of wheat seeds, showing that VIS/NIR hyperspectral imaging is an accurate technique for the non-destructive classification of viable and non-viable wheat seeds.
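A compact sketch of the chemometric pipeline named above (SNV pre-processing followed by PLS-DA on a reduced set of wavebands). It uses scikit-learn's PLS regression with a 0/1 class coding and synthetic spectra, so it is illustrative rather than a reproduction of the authors' models.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum individually."""
    return (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)

# Synthetic reflectance spectra (rows = seeds, columns = 16 selected wavebands)
rng = np.random.default_rng(0)
viable = rng.normal(0.60, 0.05, (60, 16)) + np.linspace(0, 0.1, 16)
non_viable = rng.normal(0.55, 0.05, (60, 16))
X = snv(np.vstack([viable, non_viable]))
y = np.r_[np.ones(60), np.zeros(60)]             # 1 = viable, 0 = non-viable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
plsda = PLSRegression(n_components=5).fit(X_tr, y_tr)
pred = (plsda.predict(X_te).ravel() > 0.5).astype(int)   # threshold the PLS prediction
print("classification accuracy:", round((pred == y_te).mean(), 3))
```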
A Metastatistical Approach to Satellite Estimates of Extreme Rainfall Events
NASA Astrophysics Data System (ADS)
Zorzetto, E.; Marani, M.
2017-12-01
The estimation of the average recurrence interval of intense rainfall events is a central issue for both hydrologic modeling and engineering design. These estimates require inference of the properties of the right tail of the statistical distribution of precipitation, a task often performed using the Generalized Extreme Value (GEV) distribution, estimated either from a sample of annual maxima (AM) or with a peaks-over-threshold (POT) approach. However, these approaches require long and homogeneous rainfall records, which often are not available, especially in the case of remotely sensed rainfall datasets. Here we use, and tailor to remotely sensed rainfall estimates, an alternative approach based on the metastatistical extreme value distribution (MEVD), which produces estimates of rainfall extreme values based on the probability distribution function (pdf) of all measured 'ordinary' rainfall events. This methodology also accounts for the interannual variations observed in the pdf of daily rainfall by integrating over the sample space of its random parameters. We illustrate the application of this framework to the TRMM Multi-satellite Precipitation Analysis rainfall dataset, where the MEVD optimally exploits the relatively short datasets of satellite-sensed rainfall while taking full advantage of their high spatial resolution and quasi-global coverage. The accuracy of TRMM precipitation estimates and scale issues are investigated for a case study located in the Little Washita watershed, Oklahoma, using a dense network of rain gauges for independent ground validation. The methodology contributes to our understanding of the risk of extreme rainfall events, as it allows (i) an optimal use of the TRMM datasets in estimating the tail of the probability distribution of daily rainfall, and (ii) a global mapping of daily rainfall extremes and distributional tail properties, bridging the existing gaps in rain gauge networks.
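A simplified sketch of the metastatistical extreme value idea: fit the distribution of ordinary daily rainfall separately for each year, then average the yearly cdfs raised to the number of wet days. The Weibull parent, synthetic data, and thresholds below are illustrative choices and not the exact TMPA analysis.

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.optimize import brentq

rng = np.random.default_rng(1)
# Hypothetical daily rainfall (mm) for ten years; keep only "ordinary" wet days (> 1 mm)
years = [rng.gamma(shape=0.8, scale=12.0, size=365) for _ in range(10)]
wet = [yr[yr > 1.0] for yr in years]

# Fit a Weibull distribution to each year's ordinary events (location fixed at 0)
fits = [weibull_min.fit(w, floc=0) for w in wet]
n_wet = [len(w) for w in wet]

def mev_cdf(x):
    """MEVD: average over years of F_j(x) ** n_j."""
    return np.mean([weibull_min.cdf(x, c, loc, s) ** n
                    for (c, loc, s), n in zip(fits, n_wet)])

# Daily rainfall quantile for a 20-year return period via root finding
x20 = brentq(lambda x: mev_cdf(x) - (1 - 1 / 20), 1.0, 1000.0)
print("estimated 20-year daily rainfall: %.1f mm" % x20)
```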
Using discrete choice experiments within a cost-benefit analysis framework: some considerations.
McIntosh, Emma
2006-01-01
A great advantage of the stated preference discrete choice experiment (SPDCE) approach to economic evaluation methodology is its immense flexibility within applied cost-benefit analyses (CBAs). However, while the use of SPDCEs in healthcare has increased markedly in recent years, there has been a distinct lack of equivalent CBAs in healthcare using such SPDCE-derived valuations. This article outlines specific issues and some practical suggestions relevant to the development of CBAs using SPDCE-derived benefits. The article shows that SPDCE-derived CBA can adopt recent developments in cost-effectiveness methodology, including the cost-effectiveness plane, appropriate consideration of uncertainty, the net-benefit framework, and probabilistic sensitivity analysis methods, while maintaining the theoretical advantage of the SPDCE approach. The concept of a cost-benefit plane is no different in principle from the cost-effectiveness plane and can be a useful tool for reporting and presenting the results of CBAs. However, there are many challenging issues to address for the advancement of CBA methodology using SPDCEs within healthcare. Particular areas for development include the importance of accounting for uncertainty in SPDCE-derived willingness-to-pay values, the methodology of SPDCEs in clinical trial settings and economic models, measurement issues pertinent to using SPDCEs specifically in healthcare, and the importance of issues such as consideration of the dynamic nature of healthcare and the resulting impact on the validity of attribute definitions and context.
Upscaling river biomass using dimensional analysis and hydrogeomorphic scaling
NASA Astrophysics Data System (ADS)
Barnes, Elizabeth A.; Power, Mary E.; Foufoula-Georgiou, Efi; Hondzo, Miki; Dietrich, William E.
2007-12-01
We propose a methodology for upscaling biomass in a river using a combination of dimensional analysis and hydro-geomorphologic scaling laws. We first demonstrate the use of dimensional analysis for determining local scaling relationships between Nostoc biomass and hydrologic and geomorphic variables. We then combine these relationships with hydraulic geometry and streamflow scaling in order to upscale biomass from point to reach-averaged quantities. The methodology is demonstrated through an illustrative example using an 18 year dataset of seasonal monitoring of biomass of a stream cyanobacterium (Nostoc parmeloides) in a northern California river.
Using Random Forest Models to Predict Organizational Violence
NASA Technical Reports Server (NTRS)
Levine, Burton; Bobashev, Georgly
2012-01-01
We present a methodology to assess the proclivity of an organization to commit violence against nongovernment personnel. We fitted a Random Forest model using the Minority at Risk Organizational Behavior (MAROS) dataset. The MAROS data are longitudinal, so individual observations are not independent. We propose a modification to the standard Random Forest methodology to account for the violation of the independence assumption. We present the results of the model fit and an example of predicting violence for an organization; finally, we present a summary of the forest in a "meta-tree."
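The specific modification used by the authors is not detailed here, but one common way to respect grouped longitudinal data is to bootstrap at the organization level rather than the row level when building each tree; the sketch below illustrates that idea with a small synthetic panel, and the data, features, and tree settings are all invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical longitudinal data: organization id, two features, binary "violence" outcome
orgs = np.repeat(np.arange(30), 5)                       # 30 organizations x 5 years
X = rng.normal(size=(150, 2)) + (orgs[:, None] % 3) * 0.3
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=150) > 0.8).astype(int)

def group_bootstrap_forest(X, y, groups, n_trees=200):
    """Random-forest-like ensemble whose bootstrap resamples whole organizations."""
    trees, uniq = [], np.unique(groups)
    for _ in range(n_trees):
        sampled = rng.choice(uniq, size=len(uniq), replace=True)     # bootstrap organizations
        idx = np.concatenate([np.flatnonzero(groups == g) for g in sampled])
        trees.append(DecisionTreeClassifier(max_features="sqrt").fit(X[idx], y[idx]))
    return trees

forest = group_bootstrap_forest(X, y, orgs)
proba = np.mean([t.predict(X) for t in forest], axis=0)   # vote share per observation
print("in-sample predicted violence probability (first 5 rows):", np.round(proba[:5], 2))
```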
An Integrated Science-based methodology
The data are secondary in nature, meaning that no data were generated as part of this review effort; rather, data available in the peer-reviewed literature were used. This dataset is associated with the following publication: Tolaymat, T., A. El Badawy, R. Sequeira, and A. Genaidy. An integrated science-based methodology to assess potential risks and implications of engineered nanomaterials. Diana Aga, Wonyong Choi, Andrew Daugulis, Gianluca Li Puma, Gerasimos Lyberatos, and Joo Hwa Tay. JOURNAL OF HAZARDOUS MATERIALS. Elsevier Science Ltd, New York, NY, USA, 298: 270-281, (2015).
McCann, Liza J; Pilkington, Clarissa A; Huber, Adam M; Ravelli, Angelo; Appelbe, Duncan; Kirkham, Jamie J; Williamson, Paula R; Aggarwal, Amita; Christopher-Stine, Lisa; Constantin, Tamas; Feldman, Brian M; Lundberg, Ingrid; Maillard, Sue; Mathiesen, Pernille; Murphy, Ruth; Pachman, Lauren M; Reed, Ann M; Rider, Lisa G; van Royen-Kerkof, Annet; Russo, Ricardo; Spinty, Stefan; Wedderburn, Lucy R
2018-01-01
Objectives: This study aimed to develop consensus on an internationally agreed dataset for juvenile dermatomyositis (JDM), designed for clinical use, to enhance collaborative research and allow integration of data between centres. Methods: A prototype dataset was developed through a formal process that included analysing items within existing databases of patients with idiopathic inflammatory myopathies. This template was used to aid a structured multistage consensus process. Exploiting Delphi methodology, two web-based questionnaires were distributed to healthcare professionals caring for patients with JDM identified through email distribution lists of international paediatric rheumatology and myositis research groups. A separate questionnaire was sent to parents of children with JDM and patients with JDM, identified through established research networks and patient support groups. The results of these parallel processes informed a face-to-face nominal group consensus meeting of international myositis experts, tasked with defining the content of the dataset. This developed dataset was tested in routine clinical practice before review and finalisation. Results: A dataset containing 123 items was formulated with an accompanying glossary. Demographic and diagnostic data are contained within form A, collected at the baseline visit only; disease activity measures are included within form B, collected at every visit; and disease damage items within form C, collected at baseline and annual visits thereafter. Conclusions: Through a robust international process, a consensus dataset for JDM has been formulated that can capture disease activity and damage over time. This dataset can be incorporated into national and international collaborative efforts, including existing clinical research databases. PMID:29084729
Participant Observation and the Political Scientist: Possibilities, Priorities, and Practicalities
ERIC Educational Resources Information Center
Gillespie, Andra; Michelson, Melissa R.
2011-01-01
Surveys, experiments, large-"N" datasets and formal models are common instruments in the political scientist's toolkit. In-depth interviews and focus groups play a critical role in helping scholars answer important political questions. In contrast, participant observation techniques are an underused methodological approach. In this article, we…
Initial Development and Validation of the Global Citizenship Scale
ERIC Educational Resources Information Center
Morais, Duarte B.; Ogden, Anthony C.
2011-01-01
The purpose of this article is to report on the initial development of a theoretically grounded and empirically validated scale to measure global citizenship. The methodology employed is multi-faceted, including two expert face validity trials, extensive exploratory and confirmatory factor analyses with multiple datasets, and a series of three…
Diurnal Soil Temperature Effects within the Globe[R] Program Dataset
ERIC Educational Resources Information Center
Witter, Jason D.; Spongberg, Alison L.; Czajkowski, Kevin P.
2007-01-01
Long-term collection of soil temperature with depth is important when studying climate change. The international program GLOBE[R] provides an excellent opportunity to collect such data, although currently endorsed temperature collection protocols need to be refined. To enhance data quality, protocol-based methodology and automated data logging,…
ECOALIM: A Dataset of Environmental Impacts of Feed Ingredients Used in French Animal Production.
Wilfart, Aurélie; Espagnol, Sandrine; Dauguet, Sylvie; Tailleur, Aurélie; Gac, Armelle; Garcia-Launay, Florence
2016-01-01
Feeds contribute highly to environmental impacts of livestock products. Therefore, formulating low-impact feeds requires data on environmental impacts of feed ingredients with consistent perimeters and methodology for life cycle assessment (LCA). We created the ECOALIM dataset of life cycle inventories (LCIs) and associated impacts of feed ingredients used in animal production in France. It provides several perimeters for LCIs (field gate, storage agency gate, plant gate and harbour gate) with homogeneously collected data from French R&D institutes covering the 2005-2012 period. The dataset of environmental impacts is available as a Microsoft® Excel spreadsheet on the ECOALIM website and provides climate change, acidification, eutrophication, non-renewable and total cumulative energy demand, phosphorus demand, and land occupation. LCIs in the ECOALIM dataset are available in the AGRIBALYSE® database in SimaPro® software. The typology performed on the dataset classified the 149 average feed ingredients into categories of low impact (co-products of plant origin and minerals), high impact (feed-use amino acids, fats and vitamins) and intermediate impact (cereals, oilseeds, oil meals and protein crops). Therefore, the ECOALIM dataset can be used by feed manufacturers and LCA practitioners to investigate formulation of low-impact feeds. It also provides data for environmental evaluation of feeds and animal production systems. Included in AGRIBALYSE® database and SimaPro®, the ECOALIM dataset will benefit from their procedures for maintenance and regular updating. Future use can also include environmental labelling of commercial products from livestock production.
Documentation of indigenous Pacific agroforestry systems: a review of methodologies
Bill Raynor
1993-01-01
Recent interest in indigenous agroforestry has led to a need for documentation of these systems. However, previous work is very limited, and few methodologies are well-known or widely accepted. This paper outlines various methodologies (including sampling methods, data to be collected, and considerations in analysis) for documenting structure and productivity of...
Shoulder-to-Shoulder Research "with" Children: Methodological and Ethical Considerations
ERIC Educational Resources Information Center
Griffin, Krista M.; Lahman, Maria K. E.; Opitz, Michael F.
2016-01-01
This paper presents a methodological study with children where two different interview methods were utilized: the "walk-around" (a form of mobile interview) and the "shoulder-to-shoulder." The paper reviews the methodological aspects of the study then provides a brief review of the history of methods employed in research with…
Active learning for clinical text classification: is it better than random sampling?
Figueroa, Rosa L; Zeng-Treitler, Qing; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P
2012-01-01
This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
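A minimal sketch of a distance-based selection step in the spirit of DIST (query the unlabeled examples closest to the current decision boundary); it uses a linear SVM on synthetic vectors rather than clinical text, so the data, classifier, and batch size are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

# Synthetic "documents": 500 feature vectors with binary labels
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
labeled = np.concatenate([np.flatnonzero(y == 0)[:10], np.flatnonzero(y == 1)[:10]])
unlabeled = np.setdiff1d(np.arange(len(y)), labeled)

for round_ in range(5):                                   # five active-learning rounds
    clf = LinearSVC(dual=False).fit(X[labeled], y[labeled])
    dist = np.abs(clf.decision_function(X[unlabeled]))    # distance to the decision boundary
    query = unlabeled[np.argsort(dist)[:10]]              # 10 most uncertain examples
    labeled = np.concatenate([labeled, query])            # the "oracle" labels them
    unlabeled = np.setdiff1d(unlabeled, query)
    print(f"round {round_}: accuracy on all data = {clf.score(X, y):.3f}")
```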
Heyer, K; Herberger, K; Protz, K; Mayer, A; Dissemond, J; Debus, S; Augustin, M
2017-09-01
Standards for basic documentation and the course of treatment increase quality assurance and efficiency in health care. To date, no such standards for patients with leg ulcers are available in Germany. The aim of the study was to develop standards, under routine conditions, for the documentation of patients with leg ulcers. This article presents the recommended variables of a "standard dataset" and a "minimum dataset". Consensus building took place among experts from 38 scientific societies, professional associations, insurance and supply networks (n = 68 experts). After a systematic international literature search, available standards were reviewed and supplemented with the expert group's own considerations. From 2012 to 2015, standards for documentation were defined in multistage online rounds and personal meetings. Consensus was achieved for 18 variables for the minimum dataset and 48 variables for the standard dataset in a total of seven meetings and nine online Delphi rounds. The datasets cover patient baseline data, general health status, wound characteristics, diagnostic and therapeutic interventions, patient-reported outcomes, nutrition, and education status. Based on a multistage, continuous decision-making process, a standard for documenting events in the routine care of patients with a leg ulcer was developed.
Designing Solar Data Archives: Practical Considerations
NASA Astrophysics Data System (ADS)
Messerotti, M.
The variety of new solar observatories in space and on the ground poses the pressing problem of efficiently storing and archiving huge datasets. We briefly address some typical architectures and consider the key issue of data access and distribution through networking.
NASA Astrophysics Data System (ADS)
Carniel, Roberto; Di Cecca, Mauro; Jaquet, Olivier
2006-05-01
In the framework of the EU-funded project "Multi-disciplinary monitoring, modelling and forecasting of volcanic hazard" (MULTIMO), multiparametric data have been recorded at the MULTIMO station in Montserrat. Moreover, several other long time series, recorded at Montserrat and at other volcanoes, have been acquired in order to test stochastic and deterministic methodologies under development. Creating a general framework to handle data efficiently is a considerable task even for homogeneous data; for heterogeneous data, it becomes a major issue. A need therefore arose for a consistent, user-friendly way of browsing such a heterogeneous dataset. In addition, a framework for applying the calculation of the developed dynamical parameters to the data series was needed in order to easily keep these parameters under control, e.g. for monitoring, research or forecasting purposes. The solution that we present is completely based on Open Source software, including the Linux operating system, the MySQL database management system, the Apache web server, the Zope application server, the Scilab math engine, the Plone content management framework, and the Unified Modelling Language. From the user's point of view, the main advantage is the possibility of browsing through datasets recorded on different volcanoes, with different instruments, at different sampling frequencies, and stored in different formats, all via a consistent, user-friendly interface that transparently runs queries to the database, retrieves the data from the main storage units, generates the graphs, and produces dynamically generated web pages to interact with the user. The involvement of third parties to continue the development in the Open Source philosophy and/or extend the application fields is now sought.
A risk assessment methodology for critical transportation infrastructure.
DOT National Transportation Integrated Search
2002-01-01
Infrastructure protection typifies a problem of risk assessment and management in a large-scale system. This study offers a methodological framework to identify, prioritize, assess, and manage risks. It includes the following major considerations: (1...
Using Risk Assessment Methodologies to Meet Management Objectives
NASA Technical Reports Server (NTRS)
DeMott, D. L.
2015-01-01
Current decision making involves numerous possible combinations of technology elements, safety and health issues, operational aspects, and process considerations to satisfy program goals. Identifying potential risk considerations as part of the management decision-making process provides additional tools for making more informed management decisions. Adapting and using risk assessment methodologies can generate new perspectives on various risk and safety concerns that are not immediately apparent. Safety and operational risks can be identified, and final decisions can balance these considerations with cost and schedule risks. Additional assessments can also show the likelihood of event occurrence and event consequence to provide a more informed basis for decision making, as well as cost-effective mitigation strategies. Methodologies available to perform risk assessments range from qualitative identification of risk potential to detailed assessments where quantitative probabilities are calculated. The methodology used should be based on factors that include: 1) type of industry and industry standards, 2) tasks, tools, and environment, 3) type and availability of data, and 4) industry views and requirements regarding risk and reliability. Risk assessments are a tool for decision makers to understand potential consequences and be in a position to reduce, mitigate, or eliminate costly mistakes or catastrophic failures.
Computer assisted screening, correction, and analysis of historical weather measurements
NASA Astrophysics Data System (ADS)
Burnette, Dorian J.; Stahle, David W.
2013-04-01
A computer program, Historical Observation Tools (HOB Tools), has been developed to facilitate many of the calculations used by historical climatologists to develop instrumental and documentary temperature and precipitation datasets and makes them readily accessible to other researchers. The primitive methodology used by the early weather observers makes the application of standard techniques difficult. HOB Tools provides a step-by-step framework to visually and statistically assess, adjust, and reconstruct historical temperature and precipitation datasets. These routines include the ability to check for undocumented discontinuities, adjust temperature data for poor thermometer exposures and diurnal averaging, and assess and adjust daily precipitation data for undercount. This paper provides an overview of the Visual Basic.NET program and a demonstration of how it can assist in the development of extended temperature and precipitation datasets using modern and early instrumental measurements from the United States.
Automatic detection of blood vessels in retinal images for diabetic retinopathy diagnosis.
Raja, D Siva Sundhara; Vasuki, S
2015-01-01
Diabetic retinopathy (DR) is a leading cause of vision loss in diabetic patients. DR is mainly caused by damage to the retinal blood vessels. Detecting and segmenting the retinal blood vessels is essential for DR detection and diagnosis, which helps prevent early vision loss in diabetic patients. This paper proposes computer-aided automatic detection and segmentation of blood vessels, with elimination of the optic disc (OD) region of the retina. The OD region is segmented using an anisotropic diffusion filter, and the retinal blood vessels are subsequently detected using binary mathematical morphological operations. The proposed methodology was tested on two publicly available datasets and achieved 93.99% sensitivity, 98.37% specificity, and 98.08% accuracy on the DRIVE dataset, and 93.6% sensitivity, 98.96% specificity, and 95.94% accuracy on the STARE dataset.
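For reference, the sensitivity, specificity and accuracy figures quoted above come from a pixel-wise comparison of the predicted vessel mask with a ground-truth mask; a minimal sketch of that evaluation (toy masks, not DRIVE or STARE data) is:

```python
# Pixel-wise evaluation behind the reported figures: sensitivity, specificity
# and accuracy of a binary vessel mask against a ground-truth mask.
import numpy as np

def vessel_metrics(pred, truth):
    """pred, truth: boolean arrays of the same shape (True = vessel pixel)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / pred.size
    return sensitivity, specificity, accuracy

truth = np.zeros((64, 64), bool); truth[:, 30:34] = True      # toy "vessel"
pred = truth.copy(); pred[:5, 30:34] = False; pred[:, 34] = True
print([round(m, 3) for m in vessel_metrics(pred, truth)])
```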
High resolution global gridded data for use in population studies
Lloyd, Christopher T.; Sorichetta, Alessandro; Tatem, Andrew J.
2017-01-01
Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end, the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; and (iv) a slope layer. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. The datasets and production methodology are described here. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website. PMID:28140386
GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome.
Simovski, Boris; Vodák, Daniel; Gundersen, Sveinung; Domanska, Diana; Azab, Abdulrahman; Holden, Lars; Holden, Marit; Grytten, Ivar; Rand, Knut; Drabløs, Finn; Johansen, Morten; Mora, Antonio; Lund-Andersen, Christin; Fromm, Bastian; Eskeland, Ragnhild; Gabrielsen, Odd Stokke; Ferkingstad, Egil; Nakken, Sigve; Bengtsen, Mads; Nederbragt, Alexander Johan; Thorarensen, Hildur Sif; Akse, Johannes Andreas; Glad, Ingrid; Hovig, Eivind; Sandve, Geir Kjetil
2017-07-01
Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no. © The Author 2017. Published by Oxford University Press.
Factors affecting reproducibility between genome-scale siRNA-based screens
Barrows, Nicholas J.; Le Sommer, Caroline; Garcia-Blanco, Mariano A.; Pearson, James L.
2011-01-01
RNA interference-based screening is a powerful new genomic technology which addresses gene function en masse. To evaluate factors influencing hit list composition and reproducibility, we performed two identically designed small interfering RNA (siRNA)-based, whole genome screens for host factors supporting yellow fever virus infection. These screens represent two separate experiments completed five months apart and allow the direct assessment of the reproducibility of a given siRNA technology when performed in the same environment. Candidate hit lists generated by sum rank, median absolute deviation, z-score, and strictly standardized mean difference were compared within and between whole genome screens. Application of these analysis methodologies within a single screening dataset using a fixed threshold equivalent to a p-value ≤ 0.001 resulted in hit lists ranging from 82 to 1,140 members and highlighted the tremendous impact analysis methodology has on hit list composition. Intra- and inter-screen reproducibility was significantly influenced by the analysis methodology and ranged from 32% to 99%. This study also highlighted the power of testing at least two independent siRNAs for each gene product in primary screens. To facilitate validation we conclude by suggesting methods to reduce false discovery at the primary screening stage. In this study we present the first comprehensive comparison of multiple analysis strategies, and demonstrate the impact of the analysis methodology on the composition of the “hit list”. Therefore, we propose that the entire dataset derived from functional genome-scale screens, especially if publicly funded, should be made available as is done with data derived from gene expression and genome-wide association studies. PMID:20625183
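A minimal sketch of three of the per-gene hit statistics named above (plate z-score, MAD-based robust z-score, and SSMD against negative controls); the data and any thresholds are illustrative, not taken from the screens.

```python
# Sketch of per-gene hit statistics named above: plate z-score, robust
# (MAD-based) z-score, and SSMD of gene replicates versus negative controls.
# All numbers below are synthetic illustrations, not screen data.
import numpy as np

def zscore(x, plate):
    return (np.mean(x) - np.mean(plate)) / np.std(plate, ddof=1)

def robust_z(x, plate):
    med = np.median(plate)
    mad = 1.4826 * np.median(np.abs(plate - med))
    return (np.median(x) - med) / mad

def ssmd(x, neg):
    return (np.mean(x) - np.mean(neg)) / np.sqrt(np.var(x, ddof=1) + np.var(neg, ddof=1))

rng = np.random.default_rng(1)
plate = rng.normal(1.0, 0.15, 300)          # all wells on a plate
neg = rng.normal(1.0, 0.15, 24)             # negative-control wells
gene = rng.normal(0.45, 0.10, 3)            # replicate wells for one siRNA
print(round(zscore(gene, plate), 2), round(robust_z(gene, plate), 2), round(ssmd(gene, neg), 2))
```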
Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.
Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe
2012-01-01
Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an on-going failing regimen. Our aim was also to investigate the impact of using different cross-validation strategies and different loss functions. Four different splits between training and validation sets were tested with two loss functions. Six statistical methods were compared. We assessed performance by evaluating R² values and accuracy by calculating the rate of patients correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. Slight discrepancies arise between the two loss functions investigated, and also between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and the linear model correctly classified around 80% of patients. The difference between the lower and higher rates is around 10 percent. The number of mutations retained in the different learners also varies from 1 to 41. Conclusions. The more recent Super Learner methodology combining the predictions of many learners provided good performance on our small dataset.
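A minimal sketch of the discrete Super Learner idea described above: estimate each candidate learner's cross-validated risk under a squared-error loss and select the minimiser. The two candidate learners here (a grand-mean predictor and ordinary least squares) are deliberately simple stand-ins, not the six methods compared in the trial.

```python
# Discrete Super Learner sketch under squared-error loss: pick the candidate
# learner with the lowest K-fold cross-validated risk.
import numpy as np

def cv_risk(fit, predict, X, y, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    losses = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        model = fit(X[train], y[train])
        losses.append(np.mean((y[f] - predict(model, X[f])) ** 2))
    return float(np.mean(losses))

fit_mean = lambda X, y: y.mean()
pred_mean = lambda m, X: np.full(len(X), m)
fit_ols = lambda X, y: np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]
pred_ols = lambda b, X: np.c_[np.ones(len(X)), X] @ b

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))
y = X @ np.array([0.5, 0.0, -0.3, 0.1]) + rng.normal(0, 0.5, 120)
risks = {"mean": cv_risk(fit_mean, pred_mean, X, y), "ols": cv_risk(fit_ols, pred_ols, X, y)}
print(risks, "-> discrete Super Learner picks", min(risks, key=risks.get))
```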
How will alcohol sales in the UK be affected if drinkers follow government guidelines?
Baumberg, Ben
2009-01-01
The proportion of alcohol consumption that is above government guidelines ('risky drinking') has been estimated in several countries, suggesting that reductions in risky drinking would lead to significant declines in total alcohol consumption. However, this has not previously been conducted transparently in the UK. Furthermore, existing studies have under-explored the importance of several methodological decisions, as well as not closely examining the meaning of these figures for debates on 'corporate social responsibility' (CSR). Secondary analysis of the amount of alcohol consumption above various government guidelines in four British datasets for 2000-2002: the National Diet and Nutrition Survey; the General Household Survey; Smoking, Drinking and Drug Use among Young People; and the March 2002 ONS Omnibus Survey. Risky drinking accounts for 55-82% of the total consumption by 18- to 64-year olds, depending on the definition of risky drinking used. If only alcohol above the government guidelines is counted, this falls to 22-47%. Consumption by underage drinkers accounts for 4.5% of the total consumption, while consumption by drink-drivers accounts for 0.5-8.0% depending on the assumptions made. Methodologically, the study shows that at least two decisions have considerable importance: the definition of risky drinking used and whether we count all drinking (as in most previous studies) or only drinking above guidelines. Substantively, these studies do not directly show that drink companies' profitability would be affected by declines in risky drinking. Nevertheless, they are valuable for present debate in themselves and form the basis of a more complex analysis of alcohol CSR.
Brenner, Everton A; Zein, Imad; Chen, Yongsheng; Andersen, Jeppe R; Wenzel, Gerhard; Ouzunova, Milena; Eder, Joachim; Darnhofer, Birte; Frei, Uschi; Barrière, Yves; Lübberstedt, Thomas
2010-02-12
OMT (O-methyltransferase) genes are involved in lignin biosynthesis, which relates to stover cell wall digestibility. Reduced lignin content is an important determinant of both forage quality and ethanol conversion efficiency of maize stover. Variation in genomic sequences coding for COMT, CCoAOMT1, and CCoAOMT2 was analyzed in relation to stover cell wall digestibility for a panel of 40 European forage maize inbred lines, and re-analyzed for a panel of 34 lines from a published French study. Different methodologies for association analysis were performed and compared. Across association methodologies, a total of 25, 12, 1, and 6 COMT polymorphic sites were significantly associated with DNDF, OMD, NDF, and WSC, respectively. Association analysis for CCoAOMT1 and CCoAOMT2 identified substantially fewer polymorphic sites (3 and 2, respectively) associated with the investigated traits. Our re-analysis of the 34 lines from the published French dataset identified 14 polymorphic sites significantly associated with cell wall digestibility, two of which were consistent with our study. Promising polymorphisms putatively causally associated with variability of cell wall digestibility were inferred from the total number of significantly associated SNPs/Indels. Several polymorphic sites for three O-methyltransferase loci were associated with stover cell wall digestibility. All three tested genes seem to be involved in controlling DNDF, in particular COMT. Thus, considerable variation among Bm3 wildtype alleles can be exploited for improving cell-wall digestibility. Target sites for functional markers were identified enabling development of efficient marker-based selection strategies.
2010-01-01
Background OMT (O-methyltransferase) genes are involved in lignin biosynthesis, which relates to stover cell wall digestibility. Reduced lignin content is an important determinant of both forage quality and ethanol conversion efficiency of maize stover. Results Variation in genomic sequences coding for COMT, CCoAOMT1, and CCoAOMT2 was analyzed in relation to stover cell wall digestibility for a panel of 40 European forage maize inbred lines, and re-analyzed for a panel of 34 lines from a published French study. Different methodologies for association analysis were performed and compared. Across association methodologies, a total of 25, 12, 1, and 6 COMT polymorphic sites were significantly associated with DNDF, OMD, NDF, and WSC, respectively. Association analysis for CCoAOMT1 and CCoAOMT2 identified substantially fewer polymorphic sites (3 and 2, respectively) associated with the investigated traits. Our re-analysis of the 34 lines from the published French dataset identified 14 polymorphic sites significantly associated with cell wall digestibility, two of which were consistent with our study. Promising polymorphisms putatively causally associated with variability of cell wall digestibility were inferred from the total number of significantly associated SNPs/Indels. Conclusions Several polymorphic sites for three O-methyltransferase loci were associated with stover cell wall digestibility. All three tested genes seem to be involved in controlling DNDF, in particular COMT. Thus, considerable variation among Bm3 wildtype alleles can be exploited for improving cell-wall digestibility. Target sites for functional markers were identified enabling development of efficient marker-based selection strategies. PMID:20152036
Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution
2013-01-01
Background Glycoproteins are involved in a diverse range of biochemical and biological processes. Changes in protein glycosylation are believed to occur in many diseases, particularly during cancer initiation and progression. The identification of biomarkers for human disease states is becoming increasingly important, as early detection is key to improving survival and recovery rates. To this end, the serum glycome has been proposed as a potential source of biomarkers for different types of cancers. High-throughput hydrophilic interaction liquid chromatography (HILIC) technology for glycan analysis allows for the detailed quantification of the glycan content in human serum. However, the experimental data from this analysis is compositional by nature. Compositional data are subject to a constant-sum constraint, which restricts the sample space to a simplex. Statistical analysis of glycan chromatography datasets should account for their unusual mathematical properties. As the volume of glycan HILIC data being produced increases, there is a considerable need for a framework to support appropriate statistical analysis. Proposed here is a methodology for feature selection in compositional data. The principal objective is to provide a template for the analysis of glycan chromatography data that may be used to identify potential glycan biomarkers. Results A greedy search algorithm, based on the generalized Dirichlet distribution, is carried out over the feature space to search for the set of “grouping variables” that best discriminate between known group structures in the data, modelling the compositional variables using beta distributions. The algorithm is applied to two glycan chromatography datasets. Statistical classification methods are used to test the ability of the selected features to differentiate between known groups in the data. Two well-known methods are used for comparison: correlation-based feature selection (CFS) and recursive partitioning (rpart). CFS is a feature selection method, while recursive partitioning is a learning tree algorithm that has been used for feature selection in the past. Conclusions The proposed feature selection method performs well for both glycan chromatography datasets. It is computationally slower, but results in a lower misclassification rate and a higher sensitivity rate than both correlation-based feature selection and the classification tree method. PMID:23651459
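A hedged sketch of greedy forward feature selection in the spirit described above; for self-containment the scoring criterion is a simple between-group to within-group variance ratio rather than the paper's generalized Dirichlet likelihood, and the compositional data are synthetic.

```python
# Greedy forward feature selection over compositional-style data. The scoring
# criterion here (between/within group variance ratio) is a simple stand-in for
# the paper's generalized Dirichlet model; the data are synthetic.
import numpy as np

def separation_score(X, y, features):
    Xs = X[:, features]
    grand = Xs.mean(axis=0)
    between = within = 0.0
    for g in np.unique(y):
        Xg = Xs[y == g]
        between += len(Xg) * np.sum((Xg.mean(axis=0) - grand) ** 2)
        within += np.sum((Xg - Xg.mean(axis=0)) ** 2)
    return between / within

def greedy_select(X, y, n_features):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda j: separation_score(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 30)
X = rng.dirichlet(np.ones(8), size=60)                     # compositional rows summing to 1
X[y == 1, 2] += 0.15                                       # group shift in one "glycan peak"
X /= X.sum(axis=1, keepdims=True)                          # re-close the composition
print(greedy_select(X, y, 3))
```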
Cost-of-illness studies of atrial fibrillation: methodological considerations.
Becker, Christian
2014-10-01
Atrial fibrillation (AF) is the most common cardiac arrhythmia and has considerable economic consequences. This study aims to identify current cost-of-illness estimates of AF, with a focus on describing the studies' methodology. A literature review was conducted and twenty-eight cost-of-illness studies were identified. Cost-of-illness estimates exist for health insurance members, hospital, and primary care populations. In addition, the cost of stroke in AF patients and the costs of post-operative AF were calculated. The methods used were heterogeneous; most studies calculated excess costs. The identified annual excess costs varied, even among studies from the USA (∼US$1900 to ∼US$19,000). While the studies point toward considerable costs, their relevance could be improved by focusing on subpopulations and treatment mixes. The methodology of cost-of-illness studies should be taken into account as a starting point for subsequent economic studies, allowing stakeholders to find suitable studies and validate estimates.
Wang, Zhi-wei; Wu, Xiao-dong; Yue, Guang-yang; Zhao, Lin; Wang, Qian; Nan, Zhuo-tong; Qin, Yu; Wu, Tong-hua; Shi, Jian-zong; Zou, De-fu
2016-02-01
Recently, considerable research has focused on monitoring vegetation changes because of their important role in regulating the terrestrial carbon cycle and the climate system. The Qinghai-Tibet Plateau (QTP), often referred to as the third pole of the world, contains the largest high-altitude area on Earth, and vegetation in this region is highly sensitive to global warming. The NDVI, a normalized transform of the near-infrared (NIR) to red reflectance ratio, is one of the most useful tools for monitoring vegetation activity at high spatial and temporal resolution. Therefore, an extended GIMMS NDVI dataset for the QTP, covering 1982-2014 rather than 1982-2006, was produced using a unary linear regression against the MODIS NDVI dataset for 2000-2014. Compared with previous work, the accuracy of the extended NDVI dataset was further improved by taking into account the residuals derived from scale transformation, so the extension model could serve as a new method for integrating different NDVI products. With the extended NDVI dataset, we found a statistically significant increase in growing-season NDVI in the QTP from 1982 to 2014 (0.0004 yr⁻¹, r² = 0.5859, p < 0.001). During the study period, the NDVI trends significantly increased in spring (0.0005 yr⁻¹, r² = 0.2954, p = 0.001), summer (0.0003 yr⁻¹, r² = 0.1053, p = 0.065) and autumn (0.0006 yr⁻¹, r² = 0.4367, p < 0.001). Due to the increased vegetation activity in the Qinghai-Tibet Plateau from 1982 to 2014, the carbon sink accumulated in this region also grew over the same period. Temperature and precipitation data were then used to explore the drivers of vegetation change. Although both show increasing trends, the correlation of NDVI with temperature is higher than with precipitation in the growing season, spring, summer and autumn. Furthermore, the changing trends of NDVI, temperature and precipitation show significant spatial heterogeneity across the Qinghai-Tibet Plateau.
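A minimal sketch of the unary (simple) linear regression step described above: calibrate GIMMS NDVI against MODIS NDVI over their overlap period and predict GIMMS-scale values from MODIS for the later years. The data are synthetic placeholders, and the residual correction mentioned in the abstract is omitted.

```python
# Sketch of the unary linear regression used to extend the GIMMS NDVI record:
# calibrate GIMMS against MODIS over their overlap, then predict GIMMS-scale
# values from MODIS for later years. Data here are synthetic placeholders.
import numpy as np

def extend_gimms(gimms_overlap, modis_overlap, modis_later):
    slope, intercept = np.polyfit(modis_overlap, gimms_overlap, deg=1)
    return slope * np.asarray(modis_later) + intercept

rng = np.random.default_rng(4)
modis_overlap = rng.uniform(0.2, 0.6, 7 * 12)               # e.g. 2000-2006 monthly means
gimms_overlap = 0.9 * modis_overlap + 0.03 + rng.normal(0, 0.01, modis_overlap.size)
modis_later = rng.uniform(0.2, 0.6, 8 * 12)                 # e.g. 2007-2014
print(extend_gimms(gimms_overlap, modis_overlap, modis_later)[:3])
```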
Naska, A; Trichopoulou, A
2001-08-01
The EU-supported project entitled "Compatibility of household budget and individual nutrition surveys and disparities in food habits" aimed at comparing individualised household budget survey (HBS) data with food consumption values derived from individual nutrition surveys (INS). The present paper provides a brief description of the methodology applied for rendering the datasets at a comparable level. Results of the preliminary evaluation of their compatibility are also presented. A non-parametric modelling approach was used for the age- and gender-specific individualisation of the food data collected at household level in the context of the national HBSs, and the bootstrap technique was used to derive 95% confidence intervals. For each food group, INS and HBS-derived mean values were calculated for twenty-four research units, jointly defined by country (four countries involved), gender (male, female) and age (younger, middle-aged and older). Pearson correlation coefficients were calculated. The results of this preliminary analysis show that there is considerable scope in the nutritional information derived from HBSs. Additional and more sophisticated work is, however, required, putting particular emphasis on addressing limitations present in both surveys and on deriving reliable individual consumption point and interval estimates on the basis of HBS data.
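A minimal sketch of the bootstrap step mentioned above, attaching a 95% percentile confidence interval to a mean intake estimate; the data are illustrative.

```python
# Bootstrap 95% percentile confidence interval for a mean intake estimate, as
# one simple way to attach intervals to the individualised HBS values.
import numpy as np

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    means = [rng.choice(values, size=values.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

intake = np.random.default_rng(5).gamma(shape=2.0, scale=60.0, size=250)  # g/day, toy data
print(np.round(bootstrap_ci(intake), 1))
```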
Rat sperm motility analysis: methodologic considerations
The objective of these studies was to optimize conditions for computer-assisted sperm analysis (CASA) of rat epididymal spermatozoa. Methodologic issues addressed include sample collection technique, sampling region within the epididymis, type of diluent medium used, and sample c...
DESIGNING PROCESSES FOR ENVIRONMENTAL PROBLEMS
Designing for the environment requires consideration of environmental impacts. The Generalized WAR Algorithm is the methodology that allows the user to evaluate the potential environmental impact of the design of a chemical process. In this methodology, chemicals are assigned val...
Data Publication in the Meteorological Sciences: the OJIMS project
NASA Astrophysics Data System (ADS)
Callaghan, Sarah; Hewer, Fiona; Pepler, Sam; Hardaker, Paul; Gadian, Alan
2010-05-01
Historically speaking, scientific publication has mainly focussed on the analysis, interpretation and conclusions drawn from a given dataset, as this is the information that can most easily be published in hard-copy text format with the aid of diagrams. Examining the raw data that forms the dataset is often difficult to do, as datasets are usually stored in digital media, in a variety of (often proprietary or non-standard) formats. This means that the peer-review process is generally only applied to the methodology and final conclusions of a piece of work, and not the underlying data itself. Yet for the conclusions to stand, the data must be of good quality, and the peer-review process must be used to judge the data quality. Data publication, involving the peer review of datasets, would be of benefit to many sectors of the academic community. For the data scientists, who often spend considerable time and effort ensuring that their data and metadata are complete, valid and stored in an accredited data repository, this would provide academic credit in the form of extra publications and citations. Data publication would benefit the wider community, allowing discovery and reuse of useful datasets, ensuring their curation and providing the best possible value for money. Overlay journals are a technology which is already being used to facilitate peer review and publication on-line. The Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) Project aimed to develop the mechanisms that could support both a new (overlay) Journal of Meteorological Data and an Open-Access Repository for documents related to the meteorological sciences. The OJIMS project was conducted by a partnership between the UK's Royal Meteorological Society (RMetS) and two members of the National Centre for Atmospheric Science (NCAS), the British Atmospheric Data Centre (BADC) and the University of Leeds. Conference delegates at the NCAS Conference in Bristol of 8-10 December 2008 were invited to complete a survey to assess the potential implications for the meteorological sciences should a data journal and an open-access subject repository be created and operated. Supervised run-throughs of a demonstrator Journal of Meteorological Data were also carried out by seven volunteers at the conference. The feedback from the surveys and demonstrations became part of the reports and recommendations produced by the project. This included discussion of the benefits to data creators, the review process, branding, version control and citations. The project concluded that standard online journal technologies are suitable for the development and operation of a data journal as they allow the use of all the functions of journals without the need to engineer new solutions. The user surveys and interviews also showed that there is a significant desire in the meteorological sciences community for a data journal.
ERIC Educational Resources Information Center
Diamond, Michael Jay; Shapiro, Jerrold Lee
This paper proposes a model for the long-term scientific study of encounter, T-, and sensitivity groups. The authors see the need for overcoming major methodological and design inadequacies of such research. They discuss major methodological flaws in group outcome research as including: (1) lack of adequate base rate or pretraining measures; (2)…
ERIC Educational Resources Information Center
Awsumb, Jessica M.
2017-01-01
This study examines post-school outcomes of youth with disabilities that were served by the Illinois vocational rehabilitation (VR) agency while in Chicago Public Schools (CPS) through a mixed methodology research design. In order to understand how outcomes differ among the study population, a large-scale dataset of the employment outcomes of…
BIOFRAG - a new database for analyzing BIOdiversity responses to forest FRAGmentation
M. Pfeifer; Tamara Heartsill Scalley
2014-01-01
Habitat fragmentation studies have produced complex results that are challenging to synthesize. Inconsistencies among studies may result from variation in the choice of landscape metrics and response variables, which is often compounded by a lack of key statistical or methodological information. Collating primary datasets on biodiversity responses to fragmentation in a...
ERIC Educational Resources Information Center
Welton, Anjale D.; Mansfield, Katherine Cumings; Lee, Pei-Ling; Young, Michelle D.
2015-01-01
An essential component to learning and teaching in educational leadership is mentoring graduate students for successful transition to K-12 and higher education positions. This study integrates quantitative and qualitative datasets to examine doctoral students' experiences with mentoring from macro and micro perspectives. Findings show that…
Analysing the Preferences of Prospective Students for Higher Education Institution Attributes
ERIC Educational Resources Information Center
Walsh, Sharon; Flannery, Darragh; Cullinan, John
2018-01-01
We utilise a dataset of students in their final year of upper secondary education in Ireland to provide a detailed examination of the preferences of prospective students for higher education institutions (HEIs). Our analysis is based upon a discrete choice experiment methodology with willingness to pay estimates derived for specific HEI attributes…
Service Delivery Experiences and Intervention Needs of Military Families with Children with ASD
ERIC Educational Resources Information Center
Davis, Jennifer M.; Finke, Erinn; Hickerson, Benjamin
2016-01-01
The purpose of this study was to describe the experiences of military families with children with autism spectrum disorder (ASD) specifically as it relates to relocation. Online survey methodology was used to gather information from military spouses with children with ASD. The finalized dataset included 189 cases. Descriptive statistics and…
Expanding Downward: Innovation, Diffusion, and State Policy Adoptions of Universal Preschool
ERIC Educational Resources Information Center
Curran, F. Chris
2015-01-01
Framed within the theoretical framework of policy innovation and diffusion, this study explores both interstate (diffusion) and intrastate predictors of adoption of state universal preschool policies. Event history analysis methodology is applied to a state level dataset drawn from the Census, the NCES Common Core, the Book of the States, and…
The Concepts of Informational Approach to the Management of Higher Education's Development
ERIC Educational Resources Information Center
Levina, Elena Y.; Voronina, Marianna V.; Rybolovleva, Alla A.; Sharafutdinova, Mariya M.; Zhandarova, Larisa F.; Avilova, Vilora V.
2016-01-01
The research urgency is caused by necessity to develop the informational support for management of development of higher education in conditions of high turbulence of external and internal environment. The purpose of the paper is the development of methodology for structuring and analyzing datasets of educational activities in order to reduce…
ERIC Educational Resources Information Center
Conelea, Christine A.; Woods, Douglas W.; Zinner, Samuel H.; Budman, Cathy; Murphy, Tanya; Scahill, Lawrence D.; Compton, Scott N.; Walkup, John
2011-01-01
Prior research has demonstrated that chronic tic disorders (CTD) are associated with functional impairment across several domains. However, methodological limitations, such as data acquired by parental report, datasets aggregated across child and adult samples, and small treatment-seeking samples, curtail interpretation. The current study explored…
Gulliford, Martin C; van Staa, Tjeerd P; McDermott, Lisa; McCann, Gerard; Charlton, Judith; Dregan, Alex
2014-06-11
There is growing interest in conducting clinical and cluster randomized trials through electronic health records. This paper reports on the methodological issues identified during the implementation of two cluster randomized trials using the electronic health records of the Clinical Practice Research Datalink (CPRD). Two trials were completed in primary care: one aimed to reduce inappropriate antibiotic prescribing for acute respiratory infection; the other aimed to increase physician adherence with secondary prevention interventions after first stroke. The paper draws on documentary records and trial datasets to report on the methodological experience with respect to research ethics and research governance approval, general practice recruitment and allocation, sample size calculation and power, intervention implementation, and trial analysis. We obtained research governance approvals from more than 150 primary care organizations in England, Wales, and Scotland. There were 104 CPRD general practices recruited to the antibiotic trial and 106 to the stroke trial, with the target number of practices being recruited within six months. Interventions were installed into practice information systems remotely over the internet. The mean number of participants per practice was 5,588 in the antibiotic trial and 110 in the stroke trial, with the coefficient of variation of practice sizes being 0.53 and 0.56 respectively. Outcome measures showed substantial correlations between the 12 months before and after intervention, with coefficients ranging from 0.42 for diastolic blood pressure to 0.91 for the proportion of consultations with antibiotics prescribed. Defining practice and participant eligibility for analysis requires careful consideration. Cluster randomized trials may be performed efficiently in large samples from UK general practices using the electronic health records of a primary care database. The geographical dispersal of trial sites presents a difficulty for research governance approval and intervention implementation. Pretrial data analyses should inform trial design and analysis plans. Current Controlled Trials ISRCTN 47558792 and ISRCTN 35701810 (both registered on 17 March 2010).
2014-01-01
Background There is growing interest in conducting clinical and cluster randomized trials through electronic health records. This paper reports on the methodological issues identified during the implementation of two cluster randomized trials using the electronic health records of the Clinical Practice Research Datalink (CPRD). Methods Two trials were completed in primary care: one aimed to reduce inappropriate antibiotic prescribing for acute respiratory infection; the other aimed to increase physician adherence with secondary prevention interventions after first stroke. The paper draws on documentary records and trial datasets to report on the methodological experience with respect to research ethics and research governance approval, general practice recruitment and allocation, sample size calculation and power, intervention implementation, and trial analysis. Results We obtained research governance approvals from more than 150 primary care organizations in England, Wales, and Scotland. There were 104 CPRD general practices recruited to the antibiotic trial and 106 to the stroke trial, with the target number of practices being recruited within six months. Interventions were installed into practice information systems remotely over the internet. The mean number of participants per practice was 5,588 in the antibiotic trial and 110 in the stroke trial, with the coefficient of variation of practice sizes being 0.53 and 0.56 respectively. Outcome measures showed substantial correlations between the 12 months before and after intervention, with coefficients ranging from 0.42 for diastolic blood pressure to 0.91 for the proportion of consultations with antibiotics prescribed. Defining practice and participant eligibility for analysis requires careful consideration. Conclusions Cluster randomized trials may be performed efficiently in large samples from UK general practices using the electronic health records of a primary care database. The geographical dispersal of trial sites presents a difficulty for research governance approval and intervention implementation. Pretrial data analyses should inform trial design and analysis plans. Trial registration Current Controlled Trials ISRCTN 47558792 and ISRCTN 35701810 (both registered on 17 March 2010). PMID:24919485
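One standard way (not stated in the paper) in which the reported coefficient of variation of cluster sizes enters a sample-size calculation is the design effect for unequal cluster sizes, DEFF = 1 + ((cv^2 + 1) * m - 1) * ICC, where m is the mean cluster size and ICC the intracluster correlation. A minimal sketch with an assumed ICC:

```python
# Illustrative design-effect calculation for a cluster trial with unequal
# cluster sizes: DEFF = 1 + ((cv^2 + 1) * mean_size - 1) * ICC.
# The ICC below is a placeholder assumption; only the mean size and cv echo
# the kind of figures reported in the abstract above.
def design_effect(mean_cluster_size, cv_cluster_size, icc):
    return 1 + ((cv_cluster_size ** 2 + 1) * mean_cluster_size - 1) * icc

# e.g. stroke-trial scale: ~110 participants per practice, cv ~0.56, assumed ICC 0.05
deff = design_effect(110, 0.56, 0.05)
print(round(deff, 1), "x inflation of the individually randomised sample size")
```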
The cost of post-abortion care in developing countries: a comparative analysis of four studies.
Vlassoff, Michael; Singh, Susheela; Onda, Tsuyoshi
2016-10-01
Over the last five years, comprehensive national surveys of the cost of post-abortion care (PAC) to national health systems have been undertaken in Ethiopia, Uganda, Rwanda and Colombia using a specially developed costing methodology, the Post-abortion Care Costing Methodology (PACCM). The objective of this study is to expand the research findings of these four studies, making use of their extensive datasets. These studies offer the most complete and consistent estimates of the cost of PAC to date, and comparing their findings not only provides generalizable implications for health policies and programs, but also allows an assessment of the PACCM methodology. We find that the labor cost component varies widely: in Ethiopia and Colombia doctors spend about 30-60% more time with PAC patients than do nurses; in Uganda and Rwanda an opposite pattern is found. Labor costs range from I$42.80 in Uganda to I$301.30 in Colombia. The cost of drugs and supplies does not vary greatly, ranging from I$79 in Colombia to I$115 in Rwanda. Capital and overhead costs are substantial, amounting to 52-68% of total PAC costs. Total costs per PAC case vary from I$334 in Rwanda to I$972 in Colombia. The financial burden of PAC is considerable: the expense of treating each PAC case is equivalent to around 35% of annual per capita income in Uganda, 29% in Rwanda and 11% in Colombia. Providing modern methods of contraception to women with an unmet need would cost just a fraction of the average expenditure on PAC: one year of modern contraceptive services and supplies costs only 3-12% of the average cost of treating a PAC patient. © The Author 2016. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.
The inland water macro-invertebrate occurrences in Flanders, Belgium.
Vannevel, Rudy; Brosens, Dimitri; Cooman, Ward De; Gabriels, Wim; Lavens, Frank; Mertens, Joost; Vervaeke, Bart
2018-01-01
The Flanders Environment Agency (VMM) has been performing biological water quality assessments on inland waters in Flanders (Belgium) since 1989 and sediment quality assessments since 2000. The water quality monitoring network is a combined physico-chemical and biological network, the biological component focusing on macro-invertebrates. The sediment monitoring programme produces biological data to assess the sediment quality. Both monitoring programmes aim to provide index values, applying a similar conceptual methodology based on the presence of macro-invertebrates. The biological data obtained from both monitoring networks are consolidated in the VMM macro-invertebrates database and include identifications at family and genus level of the freshwater phyla Coelenterata, Platyhelminthes, Annelida, Mollusca, and Arthropoda. This paper discusses the content of this database, and the dataset published thereof: 282,309 records of 210 observed taxa from 4,140 monitoring sites located on 657 different water bodies, collected during 22,663 events. This paper provides some background information on the methodology, temporal and spatial coverage, and taxonomy, and describes the content of the dataset. The data are distributed as open data under the Creative Commons CC-BY license.
Poussin, Carine; Mathis, Carole; Alexopoulos, Leonidas G; Messinis, Dimitris E; Dulize, Rémi H J; Belcastro, Vincenzo; Melas, Ioannis N; Sakellaropoulos, Theodore; Rhrissorrakrai, Kahn; Bilal, Erhan; Meyer, Pablo; Talikka, Marja; Boué, Stéphanie; Norel, Raquel; Rice, John J; Stolovitzky, Gustavo; Ivanov, Nikolai V; Peitsch, Manuel C; Hoeng, Julia
2014-01-01
The biological response to external cues such as drugs, chemicals, viruses and hormones is an essential question in biomedicine and in the field of toxicology, and cannot be easily studied in humans. Thus, biomedical research has continuously relied on animal models for studying the impact of these compounds and attempted to 'translate' the results to humans. In this context, the SBV IMPROVER (Systems Biology Verification for Industrial Methodology for PROcess VErification in Research) collaborative initiative, which uses crowd-sourcing techniques to address fundamental questions in systems biology, invited scientists to deploy their own computational methodologies to make predictions on species translatability. A multi-layer systems biology dataset was generated, comprising phosphoproteomics, transcriptomics and cytokine data derived from normal human (NHBE) and rat (NRBE) bronchial epithelial cells exposed in parallel to more than 50 different stimuli under identical conditions. The present manuscript describes in detail the experimental settings, generation, processing and quality control analysis of the multi-layer omics dataset, which is accessible in public repositories for further intra- and inter-species translation studies. PMID:25977767
McCann, Liza J; Pilkington, Clarissa A; Huber, Adam M; Ravelli, Angelo; Appelbe, Duncan; Kirkham, Jamie J; Williamson, Paula R; Aggarwal, Amita; Christopher-Stine, Lisa; Constantin, Tamas; Feldman, Brian M; Lundberg, Ingrid; Maillard, Sue; Mathiesen, Pernille; Murphy, Ruth; Pachman, Lauren M; Reed, Ann M; Rider, Lisa G; van Royen-Kerkof, Annet; Russo, Ricardo; Spinty, Stefan; Wedderburn, Lucy R; Beresford, Michael W
2018-02-01
This study aimed to develop consensus on an internationally agreed dataset for juvenile dermatomyositis (JDM), designed for clinical use, to enhance collaborative research and allow integration of data between centres. A prototype dataset was developed through a formal process that included analysing items within existing databases of patients with idiopathic inflammatory myopathies. This template was used to aid a structured multistage consensus process. Exploiting Delphi methodology, two web-based questionnaires were distributed to healthcare professionals caring for patients with JDM identified through email distribution lists of international paediatric rheumatology and myositis research groups. A separate questionnaire was sent to parents of children with JDM and patients with JDM, identified through established research networks and patient support groups. The results of these parallel processes informed a face-to-face nominal group consensus meeting of international myositis experts, tasked with defining the content of the dataset. This developed dataset was tested in routine clinical practice before review and finalisation. A dataset containing 123 items was formulated with an accompanying glossary. Demographic and diagnostic data are contained within form A collected at baseline visit only, disease activity measures are included within form B collected at every visit and disease damage items within form C collected at baseline and annual visits thereafter. Through a robust international process, a consensus dataset for JDM has been formulated that can capture disease activity and damage over time. This dataset can be incorporated into national and international collaborative efforts, including existing clinical research databases. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Hydrodynamic modelling and global datasets: Flow connectivity and SRTM data, a Bangkok case study.
NASA Astrophysics Data System (ADS)
Trigg, M. A.; Bates, P. B.; Michaelides, K.
2012-04-01
The rise of globally interconnected manufacturing supply chains requires an understanding and consistent quantification of flood risk at a global scale. Flood risk is often better quantified (or at least more precisely defined) in regions where there has been an investment in comprehensive topographical data collection such as LiDAR coupled with detailed hydrodynamic modelling. Yet in regions where these data and modelling are unavailable, the implications of flooding and the knock-on effects for global industries can be dramatic, as evidenced by the recent floods in Bangkok, Thailand. There is growing momentum in global modelling initiatives to address this lack of a consistent understanding of flood risk, and they will rely heavily on the application of available global datasets relevant to hydrodynamic modelling, such as Shuttle Radar Topography Mission (SRTM) data and its derivatives. These global datasets bring opportunities to apply consistent methodologies on an automated basis in all regions, while the use of coarser scale datasets also brings many challenges such as sub-grid process representation and downscaled hydrology data from global climate models. There are significant opportunities for hydrological science in helping define new, realistic and physically based methodologies that can be applied globally, as well as the possibility of gaining new insights into flood risk through analysis of the many large datasets that will be derived from this work. We use Bangkok as a case study to explore some of the issues related to using these available global datasets for hydrodynamic modelling, with particular focus on using SRTM data to represent topography. Research has shown that flow connectivity on the floodplain is an important component in the dynamics of flood flows on to and off the floodplain, and indeed within different areas of the floodplain. A lack of representation of flow connectivity, often due to data resolution limitations, means that important subgrid processes are missing from hydrodynamic models, leading to poor model predictive capabilities. Specifically here, the issue of flow connectivity during flood events is explored using geostatistical techniques to quantify the change of flow connectivity on floodplains due to grid rescaling methods. We also test whether this method of assessing connectivity can be used as a new tool in the quantification of flood risk that moves beyond the simple flood extent approach, encapsulating threshold changes and data limitations.
De-identification Methods for Open Health Data: The Case of the Heritage Health Prize Claims Dataset
Arbuckle, Luk; Koru, Gunes; Eze, Benjamin; Gaudette, Lisa; Neri, Emilio; Rose, Sean; Howard, Jeremy; Gluck, Jonathan
2012-01-01
Background There are many benefits to open datasets. However, privacy concerns have hampered the widespread creation of open health data. There is a dearth of documented methods and case studies for the creation of public-use health data. We describe a new methodology for creating a longitudinal public health dataset in the context of the Heritage Health Prize (HHP). The HHP is a global data mining competition to predict, by using claims data, the number of days patients will be hospitalized in a subsequent year. The winner will be the team or individual with the most accurate model past a threshold accuracy, and will receive a US $3 million cash prize. HHP began on April 4, 2011, and ends on April 3, 2013. Objective To de-identify the claims data used in the HHP competition and ensure that it meets the requirements in the US Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Methods We defined a threshold risk consistent with the HIPAA Privacy Rule Safe Harbor standard for disclosing the competition dataset. Three plausible re-identification attacks that can be executed on these data were identified. For each attack the re-identification probability was evaluated. If it was deemed too high then a new de-identification algorithm was applied to reduce the risk to an acceptable level. We performed an actual evaluation of re-identification risk using simulated attacks and matching experiments to confirm the results of the de-identification and to test sensitivity to assumptions. The main metric used to evaluate re-identification risk was the probability that a record in the HHP data can be re-identified given an attempted attack. Results An evaluation of the de-identified dataset estimated that the probability of re-identifying an individual was .0084, below the .05 probability threshold specified for the competition. The risk was robust to violations of our initial assumptions. Conclusions It was possible to ensure that the probability of re-identification for a large longitudinal dataset was acceptably low when it was released for a global user community in support of an analytics competition. This is an example of, and methodology for, achieving open data principles for longitudinal health data. PMID:22370452
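A minimal sketch of one common way to estimate an average re-identification probability from quasi-identifiers, grouping records into equivalence classes and averaging 1/(class size); this illustrates the general idea rather than the exact metric used for the HHP dataset.

```python
# Sketch of an average re-identification risk estimate over quasi-identifiers:
# group records by their quasi-identifier values and average 1/(class size).
# Illustrative only; not the metric actually applied to the HHP claims data.
from collections import Counter

def avg_reid_risk(records, quasi_ids):
    classes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    sizes = [classes[tuple(r[q] for q in quasi_ids)] for r in records]
    return sum(1.0 / s for s in sizes) / len(records)

demo = [
    {"age_band": "40-49", "sex": "F", "county": "A"},
    {"age_band": "40-49", "sex": "F", "county": "A"},
    {"age_band": "70-79", "sex": "M", "county": "B"},
]
print(round(avg_reid_risk(demo, ["age_band", "sex", "county"]), 3))  # ~0.667 for this toy example
```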
Variable Star Signature Classification using Slotted Symbolic Markov Modeling
NASA Astrophysics Data System (ADS)
Johnston, K. B.; Peter, A. M.
2017-01-01
With the advent of digital astronomy, new benefits and new challenges have been presented to the modern-day astronomer. No longer can the astronomer rely on manual processing; instead, the profession as a whole has begun to adopt more advanced computational means. This paper focuses on the construction and application of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern classification algorithm for the identification of variable stars. A methodology for the reduction of stellar variable observations (time-domain data) into a novel feature space representation is introduced. The methodology presented will be referred to as Slotted Symbolic Markov Modeling (SSMM) and has a number of advantages which will be demonstrated to be beneficial, specifically to the supervised classification of stellar variables. It will be shown that the methodology outperformed a baseline standard methodology on a standardized set of stellar light curve data. The performance on a set of data derived from the LINEAR dataset will also be shown.
Variable Star Signature Classification using Slotted Symbolic Markov Modeling
NASA Astrophysics Data System (ADS)
Johnston, Kyle B.; Peter, Adrian M.
2016-01-01
With the advent of digital astronomy, new benefits and new challenges have been presented to the modern-day astronomer. No longer can the astronomer rely on manual processing; instead, the profession as a whole has begun to adopt more advanced computational means. Our research focuses on the construction and application of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern classification algorithm for the identification of variable stars. A methodology for the reduction of stellar variable observations (time-domain data) into a novel feature space representation is introduced. The methodology presented will be referred to as Slotted Symbolic Markov Modeling (SSMM) and has a number of advantages which will be demonstrated to be beneficial, specifically to the supervised classification of stellar variables. It will be shown that the methodology outperformed a baseline standard methodology on a standardized set of stellar light curve data. The performance on a set of data derived from the LINEAR dataset will also be shown.
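A hedged sketch of the core SSMM feature construction described in these two records: slot an irregularly sampled light curve onto a fixed time grid, map the slotted magnitudes to a small symbol alphabet, and use the normalised symbol-transition matrix as the feature vector. The slot width and alphabet size are illustrative choices, and the light curve is synthetic.

```python
# Slotted symbolic Markov feature sketch (illustrative parameters, synthetic data):
# 1) slotting, 2) symbolisation, 3) transition-count matrix as feature vector.
import numpy as np

def ssmm_features(times, mags, slot_width=1.0, n_symbols=4):
    # 1) slotting: average observations falling in each fixed-width time slot
    slots = np.floor((times - times.min()) / slot_width).astype(int)
    slotted = np.array([mags[slots == s].mean() for s in np.unique(slots)])
    # 2) symbolisation: quantile-bin the slotted magnitudes into discrete states
    edges = np.quantile(slotted, np.linspace(0, 1, n_symbols + 1)[1:-1])
    states = np.digitize(slotted, edges)
    # 3) first-order symbol-transition counts, normalised, flattened as features
    T = np.zeros((n_symbols, n_symbols))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1
    T /= max(T.sum(), 1)
    return T.ravel()

rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0, 50, 300))                       # irregular sampling times
m = 12 + 0.4 * np.sin(2 * np.pi * t / 3.7) + rng.normal(0, 0.05, t.size)
print(ssmm_features(t, m).round(3))
```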
Panzer, Katrin; Yilmaz, Pelin; Weiß, Michael; Reich, Lothar; Richter, Michael; Wiese, Jutta; Schmaljohann, Rolf; Labes, Antje; Imhoff, Johannes F.; Glöckner, Frank Oliver; Reich, Marlis
2015-01-01
Molecular diversity surveys have demonstrated that aquatic fungi are highly diverse, and that they play fundamental ecological roles in aquatic systems. Unfortunately, comparative studies of aquatic fungal communities are few and far between, due to the scarcity of adequate datasets. We combined all publicly available fungal 18S ribosomal RNA (rRNA) gene sequences with new sequence data from a marine fungi culture collection. We further enriched this dataset by adding validated contextual data. Specifically, we included data on the habitat type of the samples assigning fungal taxa to ten different habitat categories. This dataset has been created with the intention to serve as a valuable reference dataset for aquatic fungi including a phylogenetic reference tree. The combined data enabled us to infer fungal community patterns in aquatic systems. Pairwise habitat comparisons showed significant phylogenetic differences, indicating that habitat strongly affects fungal community structure. Fungal taxonomic composition differed considerably even on phylum and class level. Freshwater fungal assemblage was most different from all other habitat types and was dominated by basal fungal lineages. For most communities, phylogenetic signals indicated clustering of sequences suggesting that environmental factors were the main drivers of fungal community structure, rather than species competition. Thus, the diversification process of aquatic fungi must be highly clade specific in some cases. PMID:26226014
Dupree, Jean A.; Crowfoot, Richard M.
2012-01-01
The drainage basin is a fundamental hydrologic entity used for studies of surface-water resources and during planning of water-related projects. Numeric drainage areas published by the U.S. Geological Survey water science centers in Annual Water Data Reports and on the National Water Information Systems (NWIS) Web site are still primarily derived from hard-copy sources and by manual delineation of polygonal basin areas on paper topographic map sheets. To expedite numeric drainage area determinations, the Colorado Water Science Center developed a digital database structure and a delineation methodology based on the hydrologic unit boundaries in the National Watershed Boundary Dataset. This report describes the digital database architecture and delineation methodology and also presents the results of a comparison of the numeric drainage areas derived using this digital methodology with those derived using traditional, non-digital methods. (Please see report for full Abstract)
UK surveillance: provision of quality assured information from combined datasets.
Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R
2007-09-14
Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.
NASA Astrophysics Data System (ADS)
Lal, Mohan; Mishra, S. K.; Pandey, Ashish; Pandey, R. P.; Meena, P. K.; Chaudhary, Anubhav; Jha, Ranjit Kumar; Shreevastava, Ajit Kumar; Kumar, Yogendra
2017-01-01
The Soil Conservation Service curve number (SCS-CN) method, also known as the Natural Resources Conservation Service curve number (NRCS-CN) method, is popular for computing the volume of direct surface runoff for a given rainfall event. The performance of the SCS-CN method, based on large rainfall (P) and runoff (Q) datasets of United States watersheds, is evaluated using a large dataset of natural storm events from 27 agricultural plots in India. On the whole, the CN estimates from the National Engineering Handbook (chapter 4) tables do not match those derived from the observed P and Q datasets. As a result, runoff prediction using the former CNs was poor for the data of 22 (out of 24) plots. However, the match was a little better for higher CN values, consistent with the general notion that the existing SCS-CN method performs better for high rainfall-runoff (high CN) events. Infiltration capacity (fc) was the main explanatory variable for runoff (or CN) production in the study plots, as it exhibited the expected inverse relationship with CN. The plot-data optimization yielded initial abstraction coefficient (λ) values from 0 to 0.659 for the ordered dataset and 0 to 0.208 for the natural dataset (with 0 as the most frequent value). Mean and median λ values were, respectively, 0.030 and 0 for the natural rainfall-runoff dataset and 0.108 and 0 for the ordered rainfall-runoff dataset. Runoff estimation was very sensitive to λ, and it improved consistently as λ changed from 0.2 to 0.03.
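For reference, the runoff relation behind the CN method, written with a general initial abstraction coefficient λ in SI units (mm), is S = 25400/CN - 254, Ia = λ·S, and Q = (P - Ia)^2 / (P - Ia + S) for P > Ia (otherwise Q = 0). A minimal sketch showing how lowering λ from the conventional 0.2 to the 0.03 found for the natural dataset increases the computed runoff for the same storm:

```python
# SCS-CN runoff relation with a general initial abstraction coefficient λ (SI, mm):
# S = 25400/CN - 254, Ia = λ·S, Q = (P - Ia)^2 / (P - Ia + S) when P > Ia, else 0.
def scs_cn_runoff(p_mm, cn, lam=0.2):
    s = 25400.0 / cn - 254.0          # potential maximum retention (mm)
    ia = lam * s                      # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

# Lower λ (e.g. the 0.03 mean reported for the natural dataset) yields more
# runoff from the same 50 mm storm than the conventional λ = 0.2:
print(round(scs_cn_runoff(50, cn=75, lam=0.2), 1), round(scs_cn_runoff(50, cn=75, lam=0.03), 1))
```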
An ontological system for interoperable spatial generalisation in biodiversity monitoring
NASA Astrophysics Data System (ADS)
Nieland, Simon; Moran, Niklas; Kleinschmit, Birgit; Förster, Michael
2015-11-01
Semantic heterogeneity remains a barrier to data comparability and standardisation of results in different fields of spatial research. Because of its thematic complexity, differing acquisition methods and national nomenclatures, interoperability of biodiversity monitoring information is especially difficult. Since data collection methods and interpretation manuals vary broadly, there is a need for automatised, objective methodologies for the generation of comparable data-sets. Ontology-based applications offer vast opportunities in data management and standardisation. This study examines two data-sets of protected heathlands in Germany and Belgium which are based on remote sensing image classification and semantically formalised in an OWL2 ontology. The proposed methodology uses semantic relations of the two data-sets, which are (semi-)automatically derived from remote sensing imagery, to generate objective and comparable information about the status of protected areas by utilising kernel-based spatial reclassification. This automatised method suggests a generalisation approach which is able to generate delineations of Special Areas of Conservation (SAC) of the European biodiversity Natura 2000 network. Furthermore, it is able to transfer generalisation rules between areas surveyed with varying acquisition methods in different countries by taking into account automated inference of the underlying semantics. The generalisation results were compared with the manual delineation of terrestrial monitoring. For the different habitats in the two sites an accuracy of above 70% was detected. However, it has to be highlighted that the delineation of the ground-truth data inherits a high degree of uncertainty, which is discussed in this study.
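The kernel-based spatial reclassification step is not specified in detail here; as a rough illustration of the idea, the following sketch applies a simple majority filter to a classified raster (hypothetical class codes), which is one common way to generalise pixel-level classifications into larger delineations.

```python
import numpy as np

def majority_filter(classified, radius=1):
    """Kernel-based generalisation: each cell takes the most frequent class within
    a (2*radius+1)^2 window; a simplified stand-in for the kernel reclassification
    step described in the abstract."""
    rows, cols = classified.shape
    out = classified.copy()
    for r in range(rows):
        for c in range(cols):
            window = classified[max(0, r - radius):r + radius + 1,
                                max(0, c - radius):c + radius + 1]
            values, counts = np.unique(window, return_counts=True)
            out[r, c] = values[np.argmax(counts)]
    return out

patch = np.array([[1, 1, 2], [1, 2, 2], [3, 2, 2]])   # hypothetical habitat classes
print(majority_filter(patch))
```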
Coalescence computations for large samples drawn from populations of time-varying sizes
Polanski, Andrzej; Szczesna, Agnieszka; Garbulowski, Mateusz; Kimmel, Marek
2017-01-01
We present new results concerning probability distributions of times in the coalescence tree and expected allele frequencies for the coalescent with large sample sizes. The obtained results are based on computational methodologies, which involve combining coalescence time scale changes with techniques of integral transformations and using analytical formulae for infinite products. We show applications of the proposed methodologies for computing probability distributions of times in the coalescence tree and their limits, for evaluating the accuracy of approximate expressions for times in the coalescence tree and expected allele frequencies, and for the analysis of a large human mitochondrial DNA dataset. PMID:28170404
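For readers unfamiliar with the quantities involved, the minimal sketch below gives the expected times in the coalescence tree for the baseline constant-population-size Kingman coalescent (the paper treats the harder time-varying case); it uses the standard formulas, in units of 2N generations.

```python
def expected_coalescent_times(n):
    """Expected time spent with k = n..2 ancestral lineages under the
    constant-size Kingman coalescent, in units of 2N generations:
    E[T_k] = 2 / (k (k - 1)); the expected TMRCA is their sum, 2 (1 - 1/n).
    """
    times = {k: 2.0 / (k * (k - 1)) for k in range(2, n + 1)}
    return times, sum(times.values())

times, tmrca = expected_coalescent_times(20)
print(round(tmrca, 4))   # 2 * (1 - 1/20) = 1.9
```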
From Sky to Earth: Data Science Methodology Transfer
NASA Astrophysics Data System (ADS)
Mahabal, Ashish A.; Crichton, Daniel; Djorgovski, S. G.; Law, Emily; Hughes, John S.
2017-06-01
We describe here the parallels in astronomy and earth science datasets, their analyses, and the opportunities for methodology transfer from astroinformatics to geoinformatics. Using the example of hydrology, we emphasize how meta-data and ontologies are crucial in such an undertaking. Using the infrastructure being designed for EarthCube - the Virtual Observatory for the earth sciences - we discuss essential steps for better transfer of tools and techniques in the future, e.g. domain adaptation. Finally, we point out that it is never a one-way process and there is enough for astroinformatics to learn from geoinformatics as well.
Site-conditions map for Portugal based on VS measurements: methodology and final model
NASA Astrophysics Data System (ADS)
Vilanova, Susana; Narciso, João; Carvalho, João; Lopes, Isabel; Quinta Ferreira, Mario; Moura, Rui; Borges, José; Nemser, Eliza; Pinto, Carlos
2017-04-01
In this paper we present a statistically significant site-condition model for Portugal based on shear-wave velocity (VS) data and surface geology. We also evaluate the performance of commonly used Vs30 proxies based on exogenous data and analyze the implications of using those proxies for calculating site amplification in seismic hazard assessment. The dataset contains 161 Vs profiles acquired in Portugal in the context of research projects, technical reports, academic theses and academic papers. The methodologies involved in characterizing the Vs structure at the sites in the database include seismic refraction, multichannel analysis of seismic waves and refraction microtremor. Invasive measurements were performed in selected locations in order to compare the Vs profiles obtained from both invasive and non-invasive techniques. In general, there was good agreement in the subsurface Vs structure obtained from the different methodologies. The database flat-file includes information on Vs30, surface geology at 1:50.000 and 1:500.000 scales, and elevation and topographic slope based on the SRTM30 topographic dataset. The procedure used to develop the site-conditions map is based on a three-step process that includes defining a preliminary set of geological units based on the literature, performing statistical tests to assess whether or not the differences in the distributions of Vs30 are statistically significant, and merging the geological units accordingly. The dataset was, to some extent, affected by clustering and/or preferential sampling and therefore a declustering algorithm was applied. The final model includes three geological units: 1) Igneous, metamorphic and old (Paleogene and Mesozoic) sedimentary rocks; 2) Neogene and Pleistocene formations, and 3) Holocene formations. The evaluation of proxies indicates that although geological analogues and topographic slope are in general unbiased, the latter shows significant bias for particular geological units and consequently for some geographical regions.
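As background for the Vs30 values discussed above, the sketch below computes Vs30 from a layered shear-wave velocity profile using the standard travel-time average over the top 30 m; the example profile values are hypothetical.

```python
def vs30(thicknesses_m, velocities_ms):
    """Time-averaged shear-wave velocity over the top 30 m:
    Vs30 = 30 / sum(h_i / Vs_i), with the profile truncated at 30 m depth."""
    depth, travel_time = 0.0, 0.0
    for h, v in zip(thicknesses_m, velocities_ms):
        use = min(h, 30.0 - depth)        # only the part of the layer above 30 m counts
        if use <= 0:
            break
        travel_time += use / v
        depth += use
    if depth < 30.0:                      # extend the last layer if the profile is shallower than 30 m
        travel_time += (30.0 - depth) / velocities_ms[-1]
    return 30.0 / travel_time

# Hypothetical three-layer profile: 5 m at 180 m/s, 10 m at 350 m/s, half-space at 760 m/s
print(round(vs30([5, 10, 100], [180, 350, 760]), 1))
```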
Developing a new global network of river reaches from merged satellite-derived datasets
NASA Astrophysics Data System (ADS)
Lion, C.; Allen, G. H.; Beighley, E.; Pavelsky, T.
2015-12-01
In 2020, the Surface Water and Ocean Topography (SWOT) satellite, a joint mission of NASA/CNES/CSA/UK, will be launched. One of its major products will be measurements of continental water extent, including the width, height, and slope of rivers and the surface area and elevations of lakes. The mission will improve the monitoring of continental water and also our understanding of the interactions between different hydrologic reservoirs. For rivers, SWOT measurements of slope must be carried out over predefined river reaches. As such, an a priori dataset for rivers is needed in order to facilitate analysis of the raw SWOT data. The information required to produce this dataset includes measurements of river width, elevation, slope, planform, river network topology, and flow accumulation. To produce this product, we have linked two existing global datasets: the Global River Widths from Landsat (GRWL) database, which contains river centerline locations, widths, and a braiding index derived from Landsat imagery, and a modified version of the HydroSHEDS hydrologically corrected digital elevation product, which contains heights and flow accumulation measurements for streams at 3 arcsecond spatial resolution. Merging these two datasets requires considerable care. The difficulties lie, among others, in the difference in resolution (30 m versus 3 arcseconds) and in the age of the datasets (2000 versus ~2010; some rivers have moved, and the braided sections are different). As such, we have developed custom software to merge the two datasets, taking into account the spatial proximity of river channels in the two datasets and ensuring that flow accumulation in the final dataset always increases downstream. Here, we present our preliminary results for a portion of South America and demonstrate the strengths and weaknesses of the method.
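One of the merging constraints mentioned above is that flow accumulation must always increase downstream; the toy sketch below illustrates such a monotonicity check and repair on a single reach (a simplified stand-in for the custom merging software, with hypothetical values).

```python
def enforce_downstream_monotonicity(flow_acc):
    """Given flow-accumulation values ordered from upstream to downstream along a
    single reach, return a copy adjusted so values never decrease downstream
    (a running maximum), plus the indices where the raw data violated monotonicity."""
    fixed, violations = [], []
    running_max = float("-inf")
    for i, value in enumerate(flow_acc):
        if value < running_max:
            violations.append(i)
        running_max = max(running_max, value)
        fixed.append(running_max)
    return fixed, violations

acc = [10, 12, 11, 15, 14, 20]           # hypothetical accumulation values (km^2) along a reach
print(enforce_downstream_monotonicity(acc))
```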
Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae
Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike
2006-01-01
Background The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID and SGD databases. Conclusion Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047
Jayaraman, Jayakumar; Wong, Hai Ming; King, Nigel M; Roberts, Graham J
2013-07-01
Estimation of the age of an individual can be performed by evaluating the pattern of dental development. A dataset for age estimation based on the dental maturity of a French-Canadian population was published over 35 years ago and has become the most widely accepted dataset. The applicability of this dataset has been tested on different population groups. To estimate the observed differences between chronological age (CA) and dental age (DA) when the French-Canadian dataset was used to estimate the age of different population groups, a systematic search of the literature for papers utilizing the French-Canadian dataset for age estimation was performed. All-language articles from the PubMed, Embase and Cochrane databases were electronically searched for the terms 'Demirjian' and 'Dental age' published between January 1973 and December 2011. A hand search of articles was also conducted. A total of 274 studies were identified, from which 34 studies were included for qualitative analysis and 12 studies were included for quantitative assessment and meta-analysis. When synthesizing the estimation results from different population groups, on average, the Demirjian dataset overestimated the age of females by 0.65 years (-0.10 years to +2.82 years) and males by 0.60 years (-0.23 years to +3.04 years). The French-Canadian dataset overestimates the age of subjects by more than six months and hence should be used only with considerable caution when estimating the age of a group of subjects of any global population. Copyright © 2013 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
A global wind resource atlas including high-resolution terrain effects
NASA Astrophysics Data System (ADS)
Hahmann, Andrea; Badger, Jake; Olsen, Bjarke; Davis, Neil; Larsen, Xiaoli; Badger, Merete
2015-04-01
Currently no accurate global wind resource dataset is available to fill the needs of policy makers and strategic energy planners. Evaluating wind resources directly from coarse-resolution reanalysis datasets underestimates the true wind energy resource, as the small-scale spatial variability of winds is missing. This missing variability can account for a large part of the local wind resource. Crucially, it is the windiest sites that suffer the largest wind resource errors: in simple terrain the windiest sites may be underestimated by 25%, in complex terrain the underestimate can be as large as 100%. The small-scale spatial variability of winds can be modelled using novel statistical methods and by application of established microscale models within WAsP developed at DTU Wind Energy. We present the framework for a single global methodology, which is relatively fast and economical to complete. The method employs reanalysis datasets, which are downscaled to high-resolution wind resource datasets via a so-called generalization step, and microscale modelling using WAsP. This method will create the first global wind atlas (GWA) that covers all land areas (except Antarctica) and a 30 km coastal zone over water. Verification of the GWA estimates will be done in carefully selected test regions, against verified estimates from mesoscale modelling and satellite synthetic aperture radar (SAR). This verification exercise will also help in the estimation of the uncertainty of the new wind climate dataset. Uncertainty will be assessed as a function of spatial aggregation. It is expected that the uncertainty at verification sites will be larger than that of dedicated assessments, but the uncertainty will be reduced at levels of aggregation appropriate for energy planning, and importantly much improved relative to what is used today. In this presentation we discuss the methodology used, which includes the generalization of wind climatologies, and the differences in local and spatially aggregated wind resources that result from using different reanalyses in the various verification regions. A prototype web interface for public access to the data will also be showcased.
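As a reminder of what the "wind resource" being mapped amounts to numerically, the sketch below estimates mean wind power density from a Weibull wind-speed distribution; this is a generic illustration, not the GWA generalization methodology, and the A and k values are hypothetical.

```python
import math

def weibull_mean_power_density(a_scale, k_shape, rho=1.225):
    """Mean wind power density (W/m^2) for Weibull-distributed wind speeds:
    0.5 * rho * E[v^3], with E[v^3] = A^3 * Gamma(1 + 3/k)."""
    return 0.5 * rho * a_scale ** 3 * math.gamma(1.0 + 3.0 / k_shape)

# Hypothetical generalized wind climate: A = 7 m/s, k = 2
print(round(weibull_mean_power_density(7.0, 2.0), 1))
```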
Measurement properties of comorbidity indices in maternal health research: a systematic review.
Aoyama, Kazuyoshi; D'Souza, Rohan; Inada, Eiichi; Lapinsky, Stephen E; Fowler, Robert A
2017-11-13
Maternal critical illness occurs in 1.2 to 4.7 of every 1000 live births in the United States and approximately 1 in 100 women who become critically ill will die. Patient characteristics and comorbid conditions are commonly summarized as an index or score for the purpose of predicting the likelihood of dying; however, most such indices have arisen from non-pregnant patient populations. We sought to systematically review comorbidity indices used in health administrative datasets of pregnant women, in order to critically appraise their measurement properties and recommend optimal tools for clinicians and maternal health researchers. We conducted a systematic search of MEDLINE and EMBASE to identify studies published from 1946 and 1947, respectively, to May 2017 that describe the predictive validity of comorbidity indices using health administrative datasets in the field of maternal health research. We applied a methodological PubMed search filter to identify all studies of measurement properties for each index. Our initial search retrieved 8944 citations. The full texts of 61 articles were identified and assessed for final eligibility. Finally, two eligible articles, describing three comorbidity indices appropriate for health administrative data, remained: the Maternal Comorbidity Index, the Charlson Comorbidity Index and the Elixhauser Comorbidity Index. These studies of the identified indices had a low risk of bias. The lack of an established consensus-building methodology in generating each index resulted in marginal sensibility for all indices. Only the Maternal Comorbidity Index was derived and validated specifically from a cohort of pregnant and postpartum women, using an administrative dataset, and had an associated c-statistic of 0.675 (95% Confidence Interval 0.647-0.666) in predicting mortality. Only the Maternal Comorbidity Index directly evaluated measurement properties relevant to pregnant women in health administrative datasets; however, it has only modest predictive ability for mortality among development and validation studies. Further research to investigate the feasibility of applying this index in clinical research, and its reliability across a variety of health administrative datasets, would be incrementally helpful. Evolution of this and other tools for risk prediction and risk adjustment in pregnant and post-partum patients is an important area for ongoing study.
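The c-statistic quoted for the Maternal Comorbidity Index can be made concrete with a small sketch: the concordance between predicted risks and observed binary outcomes, computed here on hypothetical data.

```python
def c_statistic(risk_scores, outcomes):
    """Concordance (c-statistic): probability that a randomly chosen case (outcome = 1)
    receives a higher predicted risk than a randomly chosen non-case; ties count as 0.5."""
    cases = [r for r, y in zip(risk_scores, outcomes) if y == 1]
    controls = [r for r, y in zip(risk_scores, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks and observed mortality indicators
print(c_statistic([0.02, 0.10, 0.05, 0.30, 0.01], [0, 1, 0, 1, 0]))
```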
Fernández, M D; López, J C; Baeza, E; Céspedes, A; Meca, D E; Bailey, B
2015-08-01
A typical meteorological year (TMY) represents the typical meteorological conditions over many years but still contains the short term fluctuations which are absent from long-term averaged data. Meteorological data were measured at the Experimental Station of Cajamar 'Las Palmerillas' (Cajamar Foundation) in Almeria, Spain, over 19 years at the meteorological station and in a reference greenhouse which is typical of those used in the region. The two sets of measurements were subjected to quality control analysis and then used to create TMY datasets using three different methodologies proposed in the literature. Three TMY datasets were generated for the external conditions and two for the greenhouse. They were assessed by using each as input to seven horticultural models and comparing the model results with those obtained by experiment in practical trials. In addition, the models were used with the meteorological data recorded during the trials. A scoring system was used to identify the best performing TMY in each application and then rank them in overall performance. The best methodology was that of Argiriou for both greenhouse and external conditions. The average relative errors between the seasonal values estimated using the 19-year dataset and those using the Argiriou greenhouse TMY were 2.2 % (reference evapotranspiration), -0.45 % (pepper crop transpiration), 3.4 % (pepper crop nitrogen uptake) and 0.8 % (green bean yield). The values obtained using the Argiriou external TMY were 1.8 % (greenhouse reference evapotranspiration), 0.6 % (external reference evapotranspiration), 4.7 % (greenhouse heat requirement) and 0.9 % (loquat harvest date). Using the models with the 19 individual years in the historical dataset showed that the year to year weather variability gave results which differed from the average values by ± 15 %. By comparison with results from other greenhouses it was shown that the greenhouse TMY is applicable to greenhouses which have a solar radiation transmission of approximately 65 % and rely on manual control of ventilation which constitute the majority in the south-east of Spain and in most Mediterranean greenhouse areas.
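TMY construction methods of the kind compared above typically pick, for each calendar month, the historical month whose variable distributions best match the long-term climate, often via a weighted Finkelstein-Schafer (FS) statistic. The sketch below shows that generic selection step; it is an assumption-laden simplification, not the Argiriou or Sandia implementation used in the study.

```python
import numpy as np

def finkelstein_schafer(candidate, long_term):
    """FS statistic: mean absolute difference between the empirical CDF of a candidate
    month's daily values and the long-term empirical CDF, evaluated at the candidate's points."""
    long_term = np.sort(np.asarray(long_term, dtype=float))
    candidate = np.sort(np.asarray(candidate, dtype=float))
    cdf_candidate = np.arange(1, candidate.size + 1) / candidate.size
    cdf_long_term = np.searchsorted(long_term, candidate, side="right") / long_term.size
    return float(np.mean(np.abs(cdf_candidate - cdf_long_term)))

def pick_typical_month(monthly_data, weights):
    """monthly_data: {year: {variable: array of daily values}} for one calendar month;
    weights: {variable: weight}. Returns the candidate year minimizing the weighted FS sum."""
    pooled = {v: np.concatenate([d[v] for d in monthly_data.values()]) for v in weights}
    scores = {year: sum(w * finkelstein_schafer(d[v], pooled[v]) for v, w in weights.items())
              for year, d in monthly_data.items()}
    return min(scores, key=scores.get)
```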
Methodological considerations for designing a community water fluoridation cessation study.
Singhal, Sonica; Farmer, Julie; McLaren, Lindsay
2017-06-01
High-quality, up-to-date research on community water fluoridation (CWF), and especially on the implications of CWF cessation for dental health, is limited. Although CWF cessation studies have been conducted, they are few in number; one of the major reasons is the methodological complexity of conducting such a study. This article draws on a systematic review of existing cessation studies (n=15) to explore methodological considerations of conducting CWF cessation studies in future. We review nine important methodological aspects (study design, comparison community, target population, time frame, sampling strategy, clinical indicators, assessment criteria, covariates and biomarkers) and provide recommendations for planning future CWF cessation studies that examine effects on dental caries. There is no one ideal study design to answer a research question. However, recommendations proposed regarding methodological aspects to conduct an epidemiological study to observe the effects of CWF cessation on dental caries, coupled with our identification of important methodological gaps, will be useful for researchers who are looking to optimize resources to conduct such a study with standards of rigour. © 2017 Her Majesty the Queen in Right of Canada. Community Dentistry and Oral Epidemiology © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Prioritization Methodology for Chemical Replacement
NASA Technical Reports Server (NTRS)
Cruit, W.; Schutzenhofer, S.; Goldberg, B.; Everhart, K.
1993-01-01
This project serves to define an appropriate methodology for effective prioritization of efforts required to develop replacement technologies mandated by imposed and forecast legislation. The methodology used is a semiquantitative approach derived from quality function deployment techniques (QFD Matrix). This methodology aims to weigh the full environmental, cost, safety, reliability, and programmatic implications of replacement technology development to allow appropriate identification of viable candidates and programmatic alternatives. The results are being implemented as a guideline for consideration for current NASA propulsion systems.
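A QFD-style prioritization matrix reduces, at its simplest, to a weighted sum of criterion scores per candidate. The sketch below is a generic illustration with hypothetical criteria weights and scores, not the NASA matrix itself.

```python
def prioritize(candidates, weights):
    """Rank replacement-technology candidates by a weighted sum of criterion scores.

    candidates: {name: {criterion: score}}, scores on a common scale (e.g. 1-9);
    weights: {criterion: importance weight}. All values below are illustrative.
    """
    totals = {name: sum(weights[c] * scores.get(c, 0) for c in weights)
              for name, scores in candidates.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

criteria_weights = {"environmental": 5, "cost": 3, "safety": 5, "reliability": 4, "programmatic": 2}
options = {
    "solvent A": {"environmental": 9, "cost": 3, "safety": 7, "reliability": 6, "programmatic": 5},
    "solvent B": {"environmental": 5, "cost": 8, "safety": 6, "reliability": 7, "programmatic": 6},
}
print(prioritize(options, criteria_weights))
```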
Saeed, Mohammad
2017-05-01
Systemic lupus erythematosus (SLE) is a complex disorder. Genetic association studies of complex disorders suffer from the following three major issues: phenotypic heterogeneity, false positive (type I error), and false negative (type II error) results. Hence, genes with low to moderate effects are missed in standard analyses, especially after statistical corrections. OASIS is a novel linkage disequilibrium clustering algorithm that can potentially address false positives and negatives in genome-wide association studies (GWAS) of complex disorders such as SLE. OASIS was applied to two SLE dbGAP GWAS datasets (6077 subjects; ∼0.75 million single-nucleotide polymorphisms). OASIS identified three known SLE genes viz. IFIH1, TNIP1, and CD44, not previously reported using these GWAS datasets. In addition, 22 novel loci for SLE were identified and the 5 SLE genes previously reported using these datasets were verified. OASIS methodology was validated using single-variant replication and gene-based analysis with GATES. This led to the verification of 60% of OASIS loci. New SLE genes that OASIS identified and were further verified include TNFAIP6, DNAJB3, TTF1, GRIN2B, MON2, LATS2, SNX6, RBFOX1, NCOA3, and CHAF1B. This study presents the OASIS algorithm, software, and the meta-analyses of two publicly available SLE GWAS datasets along with the novel SLE genes. Hence, OASIS is a novel linkage disequilibrium clustering method that can be universally applied to existing GWAS datasets for the identification of new genes.
Classification of Alzheimer's Patients through Ubiquitous Computing.
Nieto-Reyes, Alicia; Duque, Rafael; Montaña, José Luis; Lage, Carmen
2017-07-21
Functional data analysis and artificial neural networks are the building blocks of the proposed methodology, which distinguishes the movement patterns of Alzheimer's patients at different stages of the disease and classifies new patients to their appropriate stage. The movement patterns are obtained from the accelerometer of Android smartphones that the patients carry while moving freely. The proposed methodology is relevant in that it is flexible with respect to the type of data to which it is applied. To exemplify this, a novel real three-dimensional functional dataset is analyzed in which each datum is observed over a different time domain: not only is each datum observed at a different frequency, but the domains also have different lengths. The obtained classification success rate of 83% indicates the potential of the proposed methodology.
REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations
NASA Astrophysics Data System (ADS)
Moulik, P.; Lekic, V.; Romanowicz, B. A.
2017-12-01
A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic, with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history. By assessing inter-dataset consistency across similar paths, we quantify travel-time measurement errors for both surface and body waves. Finally, we discuss challenges associated with combining high frequency (~1 Hz) and long period (10-20 s) body-wave measurements into the REM-3D reference dataset.
Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems
NASA Astrophysics Data System (ADS)
Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille
2016-04-01
Accurate long-term analysis of meteorological and pyranometric data is the basis of decision-making for banks and investors regarding solar energy conversion systems. This has led to the development of methodologies for the generation of Typical Meteorological Year (TMY) datasets. The most used method for solar energy conversion systems was proposed in 1978 by the Sandia Laboratory (Hall et al., 1978), considering a specific weighted combination of different meteorological variables, notably global, diffuse horizontal and direct normal irradiances, air temperature, wind speed and relative humidity. In 2012, a new approach was proposed in the framework of the European project FP7 ENDORSE. It introduced the concept of "driver", defined by the user as an explicit function of the relevant pyranometric and meteorological variables to improve the representativeness of the TMY datasets with respect to the specific solar energy conversion system of interest. The present study aims at comparing and benchmarking different TMY datasets considering a specific Concentrated-PV (CPV) system as the solar energy conversion system of interest. Using long-term (15+ years) time-series of high quality meteorological and pyranometric ground measurements, three types of TMY datasets were generated by the following methods: the Sandia method, a simplified driver with DNI as the only representative variable, and a more sophisticated driver. The latter takes into account the sensitivities of the CPV system with respect to the spectral distribution of the solar irradiance and wind speed. Different TMY datasets from the three methods have been generated considering different numbers of years in the historical dataset, ranging from 5 to 15 years. The comparisons and benchmarking of these TMY datasets are conducted considering the long-term time series of simulated CPV electric production as a reference. The results of this benchmarking clearly show that the Sandia method is not suitable for CPV systems. For these systems, the TMY datasets obtained using dedicated drivers (DNI only or the more precise one) are more representative and better suited to deriving TMY datasets from a limited long-term meteorological dataset.
Investigating Teacher Stress when Using Technology
ERIC Educational Resources Information Center
Al-Fudail, Mohammed; Mellar, Harvey
2008-01-01
In this study we use a model which we refer to as the "teacher-technology environment interaction model" to explore the issue of the stress experienced by teachers whilst using ICT in the classroom. The methodology we used involved a comparison of three datasets obtained from: direct observation and video-logging of the teachers in the classroom;…
ERIC Educational Resources Information Center
Munn, Sunny L.
2016-01-01
Organizational structures are comprised of an organizational culture created by the beliefs, values, traditions, policies and processes carried out by the organization. The work-life system in which individuals use work-life initiatives to achieve a work-life balance can be influenced by the type of organizational culture within one's workplace,…
Integrative missing value estimation for microarray data.
Hu, Jianjun; Li, Haifeng; Waterman, Michael S; Zhou, Xianghong Jasmine
2006-10-12
Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain less than eight samples. We present the integrative Missing Value Estimation method (iMISS) by incorporating information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art Local Least Square (LLS) imputation algorithm by up to 15% in our benchmark tests. We demonstrated that the order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS and are especially good for imputing microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.
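The following sketch shows a plain neighbour-gene (KNN-style) imputation baseline of the kind iMISS builds on; iMISS itself additionally selects neighbour genes across multiple reference datasets, which is not reproduced here.

```python
import numpy as np

def knn_impute(matrix, k=10):
    """Fill missing entries (NaN) of a genes x samples matrix using, for each gene
    with missing values, the k most correlated fully observed genes (a simple baseline;
    iMISS extends this idea by drawing neighbour genes from reference datasets)."""
    data = np.array(matrix, dtype=float)
    complete = data[~np.isnan(data).any(axis=1)]
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any() or missing.all() or complete.size == 0:
            continue
        obs = ~missing
        # correlation with complete genes, computed on the observed columns only
        sims = np.array([np.corrcoef(row[obs], g[obs])[0, 1] for g in complete])
        sims = np.nan_to_num(sims)
        neighbours = complete[np.argsort(-np.abs(sims))[:k]]
        data[i, missing] = neighbours[:, missing].mean(axis=0)
    return data
```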
Jalbert, Jessica J; Ritchey, Mary Elizabeth; Mi, Xiaojuan; Chen, Chih-Ying; Hammill, Bradley G; Curtis, Lesley H; Setoguchi, Soko
2014-11-01
Medical devices play a vital role in diagnosing, treating, and preventing diseases and are an integral part of the health-care system. Many devices, including implantable medical devices, enter the market through a regulatory pathway that was not designed to assure safety and effectiveness. Several recent studies and high-profile device recalls have demonstrated the need for well-designed, valid postmarketing studies of medical devices. Medical device epidemiology is a relatively new field compared with pharmacoepidemiology, which for decades has been developed to assess the safety and effectiveness of medications. Many methodological considerations in pharmacoepidemiology apply to medical device epidemiology. Fundamental differences in mechanisms of action and use and in how exposure data are captured mean that comparative effectiveness studies of medical devices often necessitate additional and different considerations. In this paper, we discuss some of the most salient issues encountered in conducting comparative effectiveness research on implantable devices. We discuss special methodological considerations regarding the use of data sources, exposure and outcome definitions, timing of exposure, and sources of bias. © The Author 2014. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Larriba, Yolanda; Rueda, Cristina; Fernández, Miguel A.; Peddada, Shyamal D.
2018-01-01
Motivation: Gene-expression data obtained from high throughput technologies are subject to various sources of noise and accordingly the raw data are pre-processed before formally analyzed. Normalization of the data is a key pre-processing step, since it removes systematic variations across arrays. There are numerous normalization methods available in the literature. Based on our experience, in the context of oscillatory systems, such as cell-cycle, circadian clock, etc., the choice of the normalization method may substantially impact the determination of a gene to be rhythmic. Thus rhythmicity of a gene can purely be an artifact of how the data were normalized. Since the determination of rhythmic genes is an important component of modern toxicological and pharmacological studies, it is important to determine truly rhythmic genes that are robust to the choice of a normalization method. Results: In this paper we introduce a rhythmicity measure and a bootstrap methodology to detect rhythmic genes in an oscillatory system. Although the proposed methodology can be used for any high-throughput gene expression data, in this paper we illustrate the proposed methodology using several publicly available circadian clock microarray gene-expression datasets. We demonstrate that the choice of normalization method has very little effect on the proposed methodology. Specifically, for any pair of normalization methods considered in this paper, the resulting values of the rhythmicity measure are highly correlated. Thus it suggests that the proposed measure is robust to the choice of a normalization method. Consequently, the rhythmicity of a gene is potentially not a mere artifact of the normalization method used. Lastly, as demonstrated in the paper, the proposed bootstrap methodology can also be used for simulating data for genes participating in an oscillatory system using a reference dataset. Availability: A user friendly code implemented in R language can be downloaded from http://www.eio.uva.es/~miguel/robustdetectionprocedure.html PMID:29456555
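The paper's rhythmicity measure is not reproduced here; as a generic illustration of detecting rhythmic profiles with a resampling-based null, the sketch below fits a one-harmonic cosinor model and uses a permutation-style p-value (an assumption-laden stand-in for the proposed bootstrap methodology).

```python
import numpy as np

def cosinor_r2(times_h, expr, period_h=24.0):
    """R^2 of a least-squares fit of expr ~ a*cos(wt) + b*sin(wt) + c (one harmonic)."""
    times_h = np.asarray(times_h, dtype=float)
    expr = np.asarray(expr, dtype=float)
    w = 2 * np.pi / period_h
    X = np.column_stack([np.cos(w * times_h), np.sin(w * times_h), np.ones_like(times_h)])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ coef
    return 1.0 - float(resid @ resid) / float(((expr - expr.mean()) ** 2).sum())

def resampling_p(times_h, expr, n_resamples=999, seed=0):
    """How often does shuffling the time labels give a fit at least as good as observed?"""
    rng = np.random.default_rng(seed)
    observed = cosinor_r2(times_h, expr)
    hits = sum(cosinor_r2(rng.permutation(times_h), expr) >= observed for _ in range(n_resamples))
    return (hits + 1) / (n_resamples + 1)

t = np.arange(0, 48, 4.0)                 # hypothetical sampling every 4 h over 2 days
y = 2 + np.cos(2 * np.pi * t / 24) + 0.2 * np.random.default_rng(1).normal(size=t.size)
print(cosinor_r2(t, y), resampling_p(t, y))
```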
Climate Model Diagnostic Analyzer Web Service System
NASA Astrophysics Data System (ADS)
Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Kubar, T. L.; Li, J.; Zhang, J.; Wang, W.
2015-12-01
Both the National Research Council Decadal Survey and the latest Intergovernmental Panel on Climate Change Assessment Report stressed the need for the comprehensive and innovative evaluation of climate models with the synergistic use of global satellite observations in order to improve our weather and climate simulation and prediction capabilities. The abundance of satellite observations for fundamental climate parameters and the availability of coordinated model outputs from CMIP5 for the same parameters offer a great opportunity to understand and diagnose model biases in climate models. In addition, the Obs4MIPs efforts have created several key global observational datasets that are readily usable for model evaluations. However, a model diagnostic evaluation process requires physics-based multi-variable comparisons that typically involve large-volume and heterogeneous datasets, making them both computationally- and data-intensive. In response, we have developed a novel methodology to diagnose model biases in contemporary climate models and implemented the methodology as a web-service based, cloud-enabled, provenance-supported climate-model evaluation system. The evaluation system is named Climate Model Diagnostic Analyzer (CMDA), which is the product of the research and technology development investments of several current and past NASA ROSES programs. The current technologies and infrastructure of CMDA are designed and selected to address several technical challenges that the Earth science modeling and model analysis community faces in evaluating and diagnosing climate models. In particular, we have three key technology components: (1) diagnostic analysis methodology; (2) web-service based, cloud-enabled technology; and (3) provenance-supported technology. The diagnostic analysis methodology includes random forest feature importance ranking, conditional probability distribution functions, conditional sampling, and time-lagged correlation maps. We have implemented the new methodology as web services and incorporated the system into the Cloud. We have also developed a provenance management system for CMDA where CMDA service semantics modeling, service search and recommendation, and service execution history management are designed and implemented.
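One of the diagnostic components listed above is the time-lagged correlation map; the sketch below computes the underlying lagged correlations for a pair of series (a minimal, generic version, not CMDA code).

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = -max_lag..max_lag
    (max_lag must be much shorter than the series). Returns {lag: correlation}."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:lag]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

t = np.arange(200)
x = np.sin(2 * np.pi * t / 50)
y = np.roll(x, 5)                      # y lags x by 5 time steps
corr = lagged_correlation(x, y, 10)
print(max(corr, key=corr.get))         # peaks at lag = 5
```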
1985-11-26
... Major decisions involving reliability studies, based on competing risk methodology, have been made in the past and will continue to be made ... censoring mechanism. In such instances, the methodology for estimating relevant reliability probabilities has received considerable attention (cf. David ...) ... proposal for a discussion of the general methodology.
A Radial Basis Function Approach to Financial Time Series Analysis
1993-12-01
... including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the data ... collection of practical techniques to address these issues for a modeling methodology: Radial Basis Function networks. These techniques include efficient ... methodology often then amounts to a careful consideration of the interplay between model complexity and reliability. These will be recurrent themes ...
O'Brien, Kelly K; Colquhoun, Heather; Levac, Danielle; Baxter, Larry; Tricco, Andrea C; Straus, Sharon; Wickerson, Lisa; Nayar, Ayesha; Moher, David; O'Malley, Lisa
2016-07-26
Scoping studies (or reviews) are a method used to comprehensively map evidence across a range of study designs in an area, with the aim of informing future research practice, programs and policy. However, no universal agreement exists on terminology, definition or methodological steps. Our aim was to understand the experiences of, and considerations for conducting scoping studies from the perspective of academic and community partners. Primary objectives were to 1) describe experiences conducting scoping studies including strengths and challenges; and 2) describe perspectives on terminology, definition, and methodological steps. We conducted a cross-sectional web-based survey with clinicians, educators, researchers, knowledge users, representatives from community-based organizations, graduate students, and policy stakeholders with experience and/or interest in conducting scoping studies to gain an understanding of experiences and perspectives on the conduct and reporting of scoping studies. We administered an electronic self-reported questionnaire comprised of 22 items related to experiences with scoping studies, strengths and challenges, opinions on terminology, and methodological steps. We analyzed questionnaire data using descriptive statistics and content analytical techniques. Survey results were discussed during a multi-stakeholder consultation to identify key considerations in the conduct and reporting of scoping studies. Of the 83 invitations, 54 individuals (65 %) completed the scoping questionnaire, and 48 (58 %) attended the scoping study meeting from Canada, the United Kingdom and United States. Many scoping study strengths were dually identified as challenges including breadth of scope, and iterative process. No consensus on terminology emerged, however key defining features that comprised a working definition of scoping studies included the exploratory mapping of literature in a field; iterative process, inclusion of grey literature; no quality assessment of included studies, and an optional consultation phase. We offer considerations for the conduct and reporting of scoping studies for researchers, clinicians and knowledge users engaging in this methodology. Lack of consensus on scoping terminology, definition and methodological steps persists. Reasons for this may be attributed to diversity of disciplines adopting this methodology for differing purposes. Further work is needed to establish guidelines on the reporting and methodological quality assessment of scoping studies.
NASA Technical Reports Server (NTRS)
Cruit, Wendy; Schutzenhofer, Scott; Goldberg, Ben; Everhart, Kurt
1993-01-01
This project served to define an appropriate methodology for effective prioritization of technology efforts required to develop replacement technologies mandated by imposed and forecast legislation. The methodology used is a semiquantitative approach derived from quality function deployment techniques (QFD Matrix). This methodology aims to weight the full environmental, cost, safety, reliability, and programmatic implications of replacement technology development to allow appropriate identification of viable candidates and programmatic alternatives. The results will be implemented as a guideline for consideration for current NASA propulsion systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Srinivas, Nisha; Rose, Derek C; Bolme, David S
This paper examines the difficulty associated with performing machine-based automatic demographic prediction on a sub-population of Asian faces. We introduce the Wild East Asian Face dataset (WEAFD), a new and unique dataset to the research community. This dataset consists primarily of labeled face images of individuals from East Asian countries, including Vietnam, Burma, Thailand, China, Korea, Japan, Indonesia, and Malaysia. East Asian Turk annotators were uniquely used to judge the age and fine-grained ethnicity attributes to reduce the impact of the other-race effect and improve the quality of annotations. We focus on predicting the age, gender and fine-grained ethnicity of an individual by providing baseline results with a convolutional neural network (CNN). Fine-grained ethnicity prediction refers to predicting the ethnicity of an individual by country or sub-region (Chinese, Japanese, Korean, etc.) of the East Asian continent. Performance for two CNN architectures is presented, highlighting the difficulty of these tasks and showcasing potential design considerations that ease network optimization by promoting region-based feature extraction.
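The two CNN architectures evaluated in the paper are not described here; as a minimal illustration of the baseline task setup, the sketch below defines a small PyTorch CNN that maps face crops to a handful of class labels (input size, depth and the number of fine-grained ethnicity classes are hypothetical).

```python
import torch
import torch.nn as nn

class SmallFaceCNN(nn.Module):
    """Toy CNN for 64x64 RGB face crops; num_classes would be the number of
    fine-grained ethnicity labels (or 2 for gender, or a set of age bins)."""
    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SmallFaceCNN()
logits = model(torch.randn(4, 3, 64, 64))   # batch of 4 hypothetical face crops
print(logits.shape)                          # torch.Size([4, 8])
```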
Acquisition of thin coronal sectional dataset of cadaveric liver.
Lou, Li; Liu, Shu Wei; Zhao, Zhen Mei; Tang, Yu Chun; Lin, Xiang Tao
2014-04-01
To obtain a thin coronal sectional anatomic dataset of the liver by using a digital freezing-milling technique, the upper abdomen of one Chinese adult cadaver was selected as the specimen. After CT and MRI examinations verified the absence of liver lesions, the specimen was embedded in gelatin in an erect standing position and frozen under profound hypothermia, and then serially sectioned from anterior to posterior, layer by layer, with a digital milling machine in the freezing chamber. The sequential images were captured by means of a digital camera and the dataset was imported to an imaging workstation. The thin serial sections of the liver added up to 699 layers, with each layer being 0.2 mm in thickness. The shape, location, structure, intrahepatic vessels and adjacent structures of the liver were displayed clearly on each layer of the coronal sectional slices. CT and MR images through the body were obtained at 1.0 and 3.0 mm intervals, respectively. The methodology reported here is an adaptation of the milling methods previously described, and constitutes a new data acquisition method for sectional anatomy. The thin coronal sectional anatomic dataset of the liver obtained by this technique is of high precision and good quality.
Dell’Acqua, F.; Gamba, P.; Jaiswal, K.
2012-01-01
This paper discusses spatial aspects of the global exposure dataset and mapping needs for earthquake risk assessment. We discuss this in the context of development of a Global Exposure Database for the Global Earthquake Model (GED4GEM), which requires compilation of a multi-scale inventory of assets at risk, for example, buildings, populations, and economic exposure. After defining the relevant spatial and geographic scales of interest, different procedures are proposed to disaggregate coarse-resolution data, to map them, and if necessary to infer missing data by using proxies. We discuss the advantages and limitations of these methodologies and detail the potentials of utilizing remote-sensing data. The latter is used especially to homogenize an existing coarser dataset and, where possible, replace it with detailed information extracted from remote sensing using the built-up indicators for different environments. Present research shows that the spatial aspects of earthquake risk computation are tightly connected with the availability of datasets of the resolution necessary for producing sufficiently detailed exposure. The global exposure database designed by the GED4GEM project is able to manage datasets and queries of multiple spatial scales.
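Disaggregating coarse-resolution exposure using remote-sensing proxies can be illustrated with a simple proportional (dasymetric) split; the sketch below is a generic illustration with hypothetical built-up weights, not the GED4GEM procedure.

```python
def disaggregate(coarse_total, builtup_weights):
    """Split a coarse-resolution exposure value (e.g. building count or population)
    across finer cells proportionally to built-up area weights; cells with no
    built-up signal receive nothing. Returns a list aligned with builtup_weights."""
    total_weight = sum(builtup_weights)
    if total_weight == 0:
        return [0.0] * len(builtup_weights)
    return [coarse_total * w / total_weight for w in builtup_weights]

# Hypothetical: 12,000 buildings in a coarse cell, four finer cells with built-up fractions
print(disaggregate(12000, [0.40, 0.25, 0.0, 0.15]))
```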
A multi-strategy approach to informative gene identification from gene expression data.
Liu, Ziying; Phan, Sieu; Famili, Fazel; Pan, Youlian; Lenferink, Anne E G; Cantin, Christiane; Collins, Catherine; O'Connor-McCourt, Maureen D
2010-02-01
An unsupervised multi-strategy approach has been developed to identify informative genes from high-throughput genomic data. Several statistical methods have been used in the field to identify differentially expressed genes. Since different methods generate different lists of genes, it is very challenging to determine the most reliable gene list and the appropriate method. This paper presents a multi-strategy method in which a combination of several data analysis techniques is applied to a given dataset and a confidence measure is established to select genes from the gene lists generated by these techniques to form the core of our final selection. The remainder of the genes, which form the peripheral region, are subject to exclusion from or inclusion into the final selection. This paper demonstrates this methodology through its application to an in-house cancer genomics dataset and a public dataset. The results indicate that our method provides a more reliable list of genes, which are validated using biological knowledge, biological experiments, and literature searches. We further evaluated our multi-strategy method by consolidating two pairs of independent datasets, each pair for the same disease but generated by different labs using different platforms. The results showed that our method produced far better results.
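A minimal version of the core/peripheral selection idea is sketched below: genes selected by enough of the individual methods form the core, and the rest of the union forms the peripheral region (gene symbols and the threshold are purely illustrative).

```python
from collections import Counter

def consensus_genes(gene_lists, core_threshold):
    """Combine gene lists from several selection methods: genes appearing in at
    least core_threshold lists form the 'core'; the rest of the union is the
    'peripheral' region subject to further inclusion or exclusion."""
    counts = Counter(g for genes in gene_lists for g in set(genes))
    core = {g for g, c in counts.items() if c >= core_threshold}
    peripheral = set(counts) - core
    return core, peripheral

lists = [["BRCA1", "TP53", "MYC"], ["TP53", "EGFR", "MYC"], ["TP53", "MYC", "KRAS"]]
print(consensus_genes(lists, core_threshold=3))
```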
Transient Science from Diverse Surveys
NASA Astrophysics Data System (ADS)
Mahabal, A.; Crichton, D.; Djorgovski, S. G.; Donalek, C.; Drake, A.; Graham, M.; Law, E.
2016-12-01
Over the last several years we have moved closer to being able to make digital movies of the non-static sky with wide-field synoptic telescopes operating at a variety of depths, resolutions, and wavelengths. For optimal combined use of these datasets, it is crucial that they speak and understand the same language and are thus interoperable. Initial steps towards such interoperability (e.g. the footprint service) were taken during the two five-year Virtual Observatory projects viz. National Virtual Observatory (NVO), and later Virtual Astronomical Observatory (VAO). Now with far bigger datasets and in an era of resource excess thanks to the cloud-based workflows, we show how the movement of data and of resources is required - rather than just one or the other - to combine diverse datasets for applications such as real-time astronomical transient characterization. Taking the specific example of ElectroMagnetic (EM) follow-up of Gravitational Wave events and EM transients (such as CRTS but also other optical and non-optical surveys), we discuss the requirements for rapid and flexible response. We show how the same methodology is applicable to Earth Science data with its datasets differing in spatial and temporal resolution as well as differing time-spans.
Inter-fraction variations in respiratory motion models
NASA Astrophysics Data System (ADS)
McClelland, J. R.; Hughes, S.; Modat, M.; Qureshi, A.; Ahmad, S.; Landau, D. B.; Ourselin, S.; Hawkes, D. J.
2011-01-01
Respiratory motion can vary dramatically between the planning stage and the different fractions of radiotherapy treatment. Motion predictions used when constructing the radiotherapy plan may be unsuitable for later fractions of treatment. This paper presents a methodology for constructing patient-specific respiratory motion models and uses these models to evaluate and analyse the inter-fraction variations in the respiratory motion. The internal respiratory motion is determined from the deformable registration of Cine CT data and related to a respiratory surrogate signal derived from 3D skin surface data. Three different models for relating the internal motion to the surrogate signal have been investigated in this work. Data were acquired from six lung cancer patients. Two full datasets were acquired for each patient, one before the course of radiotherapy treatment and one at the end (approximately 6 weeks later). Separate models were built for each dataset. All models could accurately predict the respiratory motion in the same dataset, but had large errors when predicting the motion in the other dataset. Analysis of the inter-fraction variations revealed that most variations were spatially varying base-line shifts, but changes to the anatomy and the motion trajectories were also observed.
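A common family of correspondence models relates internal motion linearly to the surrogate signal; the sketch below fits such a per-component linear model by least squares on synthetic data (one plausible member of the model types investigated, not necessarily the authors' exact formulation).

```python
import numpy as np

def fit_linear_motion_model(surrogate, displacements):
    """Fit d(t) ~ a * s(t) + b per motion component by least squares.
    surrogate: (T,) respiratory surrogate values; displacements: (T, 3) voxel motion.
    Returns (a, b), each of shape (3,)."""
    X = np.column_stack([surrogate, np.ones_like(surrogate)])   # design matrix [s, 1]
    coef, *_ = np.linalg.lstsq(X, displacements, rcond=None)
    return coef[0], coef[1]

s = np.linspace(0, 1, 10)                                  # hypothetical surrogate (normalized)
d = np.outer(s, [5.0, 1.0, 0.5]) + [0.2, 0.0, -0.1]        # synthetic linear motion (mm)
a, b = fit_linear_motion_model(s, d)
print(np.round(a, 2), np.round(b, 2))                      # recovers the slopes and offsets
```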
Semi-Automatic Assembly of Learning Resources
ERIC Educational Resources Information Center
Verbert, K.; Ochoa, X.; Derntl, M.; Wolpers, M.; Pardo, A.; Duval, E.
2012-01-01
Technology Enhanced Learning is a research field that has matured considerably over the last decade. Many technical solutions to support design, authoring and use of learning activities and resources have been developed. The first datasets that reflect the tracking of actual use of these tools in real-life settings are beginning to become…
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.
Fimereli, Danai; Detours, Vincent; Konopka, Tomasz
2013-04-01
High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
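The read-extraction idea can be illustrated with a simple shared-k-mer filter against the reference sequence of a region of interest; this is a rough, assumption-laden sketch of the general approach, not the TriageTools implementation.

```python
def build_kmer_set(reference_seq, k=21):
    """All k-mers of the target region's reference sequence."""
    return {reference_seq[i:i + k] for i in range(len(reference_seq) - k + 1)}

def read_matches(read_seq, kmer_set, k=21, min_hits=2):
    """Keep a read if at least min_hits of its k-mers occur in the target k-mer set."""
    hits = sum(read_seq[i:i + k] in kmer_set for i in range(len(read_seq) - k + 1))
    return hits >= min_hits

region = "ACGT" * 30                                  # hypothetical 120-bp region of interest
kmers = build_kmer_set(region)
print(read_matches("ACGT" * 10, kmers))               # True: read shares k-mers with the region
print(read_matches("T" * 40, kmers))                  # False: no shared k-mers
```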
Methodological standards in single-case experimental design: Raising the bar.
Ganz, Jennifer B; Ayres, Kevin M
2018-04-12
Single-case experimental designs (SCEDs), or small-n experimental research, are frequently implemented to assess approaches to improving outcomes for people with disabilities, particularly those with low-incidence disabilities, such as some developmental disabilities. SCED has become increasingly accepted as a research design. As this literature base is needed to determine which interventions are evidence-based practices, the acceptance of SCED has resulted in increased critiques of its methodological quality. Recent trends include recommendations from a number of expert scholars and institutions. The purpose of this article is to summarize the recent history of methodological quality considerations, synthesize the recommendations found in the SCED literature, and provide recommendations to researchers designing SCEDs with regard to essential and aspirational standards for methodological quality. Conclusions include imploring SCED researchers to increase the quality of their experiments, with particular consideration of the applied nature of SCED research to be published in Research in Developmental Disabilities and beyond. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Agapiou, Athos; Lysandrou, Vasiliki; Themistocleous, Kyriakos; Nisantzi, Argyro; Lasaponara, Rosa; Masini, Nicola; Krauss, Thomas; Cerra, Daniele; Gessner, Ursula; Schreier, Gunter; Hadjimitsis, Diofantos
2016-08-01
The landscape of Cyprus is characterized by transformations that occurred during the 20th century, many of which are still active today. These landscape changes are due to a variety of reasons, including war conflicts, environmental conditions and modern development, and have often caused the alteration or even the total loss of important information that could have assisted archaeologists in comprehending the archaeo-landscape. The present work aims to provide detailed information, from a remote sensing perspective, regarding the different existing datasets that can be used to support archaeologists in understanding the transformations that the landscape of Cyprus has undergone. Such datasets may help archaeologists to visualize a lost landscape and retrieve valuable information, while also supporting researchers in future investigations. As such, they can further highlight in a predictive manner, and consequently assess, the impacts of landscape transformation - whether of natural or anthropogenic cause - on cultural heritage. Three main datasets are presented here: aerial images, satellite datasets including spy satellite imagery acquired during the Cold War, and cadastral maps. The data are presented in chronological order (by year of acquisition), and other important parameters such as cost and accuracy are also given. Individual examples of archaeological sites in Cyprus are provided for each dataset in order to underline both its importance and its performance. Some pre- and post-processing remote sensing methodologies used to enhance the final results are also briefly described. The paper, produced within the framework of the ATHENA project dedicated to remote sensing for archaeology and cultural heritage, aims to fill a significant gap in the recent literature on remote sensing archaeology of the island and to assist current and future archaeologists in their quest for remote sensing information to support their research.
Developing a Global Network of River Reaches in Preparation of SWOT
NASA Astrophysics Data System (ADS)
Lion, C.; Pavelsky, T.; Allen, G. H.; Beighley, E.; Schumann, G.; Durand, M. T.
2016-12-01
In 2020, the Surface Water and Ocean Topography satellite (SWOT), a joint mission of NASA/CNES/CSA/UK, will be launched. One of its major products will be measurements of continental water surfaces, including the width, height, and slope of rivers and the surface area and elevations of lakes. The mission will improve the monitoring of continental water and also our understanding of the interactions between different hydrologic reservoirs. For rivers, SWOT measurements of slope will be carried out over predefined river reaches. As such, an a priori dataset for rivers is needed in order to facilitate analysis of the raw SWOT data. The information required to produce this dataset includes measurements of river width, elevation, slope, planform, river network topology, and flow accumulation. To produce this product, we have linked two existing global datasets: the Global River Widths from Landsat (GRWL) database, which contains river centerline locations, widths, and a braiding index derived from Landsat imagery, and a modified version of the HydroSHEDS hydrologically corrected digital elevation product, which contains heights and flow accumulation measurements for streams at 3 arcseconds spatial resolution. Merging these two datasets requires considerable care. The difficulties lie, among others, in the difference in resolution (30 m versus 3 arcseconds) and in the age of the datasets (2000 versus 2010; some rivers have moved and the braided sections differ). As such, we have developed custom software to merge the two datasets, taking into account the spatial proximity of river channels in the two datasets and ensuring that flow accumulation in the final dataset always increases downstream. Here, we present our results for the globe.
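A minimal sketch of the downstream-monotonicity check mentioned above, assuming a toy reach table (reach id, downstream id, flow accumulation); the real merging software operates on the full GRWL/HydroSHEDS networks.

```python
def check_downstream_monotonic(reaches):
    """reaches: dict reach_id -> (downstream_id or None, flow_accumulation in km^2).
    Return reaches whose downstream neighbour has a smaller flow accumulation."""
    violations = []
    for rid, (down_id, acc) in reaches.items():
        if down_id is not None and reaches[down_id][1] < acc:
            violations.append((rid, down_id, acc, reaches[down_id][1]))
    return violations

if __name__ == "__main__":
    # Toy network: r3 drains into r2, which drains into r1 (the outlet);
    # r2's accumulation is inconsistent with its upstream reach.
    reaches = {
        "r1": (None, 1500.0),
        "r2": ("r1", 900.0),
        "r3": ("r2", 950.0),   # larger than its downstream reach -> violation
    }
    for rid, down, acc, down_acc in check_downstream_monotonic(reaches):
        print(f"{rid} ({acc} km^2) drains into {down} ({down_acc} km^2): needs correction")
```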
NASA Astrophysics Data System (ADS)
Soulard, C. E.; Acevedo, W.; Yang, Z.; Cohen, W. B.; Stehman, S. V.; Taylor, J. L.
2015-12-01
A wide range of spatial forest disturbance data exist for the conterminous United States, yet inconsistencies between map products arise because of differing programmatic objectives and methodologies. Researchers on the Land Change Research Project (LCRP) are working to assess spatial agreement, characterize uncertainties, and resolve discrepancies between these national level datasets, in regard to forest disturbance. Disturbance maps from the Global Forest Change (GFC), Landfire Vegetation Disturbance (LVD), National Land Cover Dataset (NLCD), Vegetation Change Tracker (VCT), Web-enabled Landsat Data (WELD), and Monitoring Trends in Burn Severity (MTBS) were harmonized using a pixel-based data fusion process. The harmonization process reconciled forest harvesting, forest fire, and remaining forest disturbance across four intervals (1986-1992, 1992-2001, 2001-2006, and 2006-2011) by relying on convergence of evidence across all datasets available for each interval. Pixels with high agreement across datasets were retained, while moderate-to-low agreement pixels were visually assessed and either manually edited using reference imagery or discarded from the final disturbance map(s). National results show that annual rates of forest harvest and overall fire have increased over the past 25 years. Overall, this study shows that leveraging the best elements of readily-available data improves forest loss monitoring relative to using a single dataset to monitor forest change, particularly by reducing commission errors.
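A minimal sketch of the convergence-of-evidence step, assuming synthetic binary disturbance maps and illustrative agreement thresholds rather than the project's actual rules.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stack of binary forest-disturbance maps from several products for one interval
# (1 = disturbed). Shapes and thresholds are illustrative assumptions.
n_products, ny, nx = 5, 100, 100
maps = rng.random((n_products, ny, nx)) < 0.1

votes = maps.sum(axis=0)                 # per-pixel agreement count

high_agreement = votes >= 4              # retained automatically
moderate = (votes >= 2) & (votes < 4)    # flagged for visual assessment
low = votes == 1                         # candidates for discarding

print("retained pixels:          ", int(high_agreement.sum()))
print("pixels needing assessment:", int(moderate.sum()))
print("low-agreement pixels:     ", int(low.sum()))
```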
How does spatial extent of fMRI datasets affect independent component analysis decomposition?
Aragri, Adriana; Scarabino, Tommaso; Seifritz, Erich; Comani, Silvia; Cirillo, Sossio; Tedeschi, Gioacchino; Esposito, Fabrizio; Di Salle, Francesco
2006-09-01
Spatial independent component analysis (sICA) of functional magnetic resonance imaging (fMRI) time series can generate meaningful activation maps and associated descriptive signals, which are useful to evaluate datasets of the entire brain or selected portions of it. Besides computational implications, variations in the input dataset combined with the multivariate nature of ICA may lead to different spatial or temporal readouts of brain activation phenomena. By reducing and increasing a volume of interest (VOI), we applied sICA to different datasets from real activation experiments with multislice acquisition and single or multiple sensory-motor task-induced blood oxygenation level-dependent (BOLD) signal sources with different spatial and temporal structure. Using receiver operating characteristics (ROC) methodology for accuracy evaluation and multiple regression analysis as benchmark, we compared sICA decompositions of reduced and increased VOI fMRI time-series containing auditory, motor and hemifield visual activation occurring separately or simultaneously in time. Both approaches yielded valid results; however, the results of the increased VOI approach were spatially more accurate compared to the results of the decreased VOI approach. This is consistent with the capability of sICA to take advantage of extended samples of statistical observations and suggests that sICA is more powerful with extended rather than reduced VOI datasets to delineate brain activity. (c) 2006 Wiley-Liss, Inc.
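An illustrative sketch of spatial ICA on a synthetic dataset, using scikit-learn's FastICA as a stand-in for the sICA implementation used in the study; the data, component count and evaluation are toy assumptions, not the paper's fMRI pipeline or ROC analysis.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)

# Synthetic "fMRI" data: 120 time points x 500 voxels built from two spatial
# sources with distinct time courses, plus noise (toy stand-in for a VOI).
n_time, n_vox = 120, 500
spatial = np.zeros((2, n_vox))
spatial[0, 50:120] = 1.0          # "auditory" region
spatial[1, 300:380] = 1.0         # "motor" region
t = np.arange(n_time)
timecourses = np.stack([np.sin(2 * np.pi * t / 20),
                        (t % 30 < 15).astype(float)])  # block design
data = timecourses.T @ spatial + 0.3 * rng.standard_normal((n_time, n_vox))

# Spatial ICA: treat voxels as samples so the estimated sources are spatial maps
# and the mixing matrix columns are the associated time courses.
ica = FastICA(n_components=2, random_state=0)
spatial_maps = ica.fit_transform(data.T)      # shape (n_vox, 2)
est_timecourses = ica.mixing_                 # shape (n_time, 2)

for c in range(2):
    overlap = np.corrcoef(np.abs(spatial_maps[:, c]), spatial.sum(0))[0, 1]
    print(f"component {c}: |spatial map| vs. true active voxels r = {overlap:.2f}")
```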
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Joslyn, Cliff A.; Chappell, Alan R.
As semantic datasets grow to be very large and divergent, there is a need to identify and exploit their inherent semantic structure for discovery and optimization. Towards that end, we present here a novel methodology to identify the semantic structures inherent in an arbitrary semantic graph dataset. We first present the concept of an extant ontology as a statistical description of the semantic relations present amongst the typed entities modeled in the graph. This serves as a model of the underlying semantic structure to aid in discovery and visualization. We then describe a method of ontological scaling in which the ontology is employed as a hierarchical scaling filter to infer different resolution levels at which the graph structures are to be viewed or analyzed. We illustrate these methods on three large and publicly available semantic datasets containing more than one billion edges each. Keywords: Semantic Web; Visualization; Ontology; Multi-resolution Data Mining.
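A minimal sketch of the extant-ontology idea, assuming a toy set of typed triples: the statistic is simply the frequency of (subject type, predicate, object type) patterns actually observed in the graph.

```python
from collections import Counter

# Toy RDF-style triples (subject, predicate, object) and entity types. Names are
# illustrative; a real extant ontology would be computed over billions of edges.
types = {
    "alice": "Person", "bob": "Person",
    "paper1": "Publication", "paper2": "Publication",
    "acme": "Organization",
}
triples = [
    ("alice", "authorOf", "paper1"),
    ("bob", "authorOf", "paper1"),
    ("bob", "authorOf", "paper2"),
    ("alice", "worksFor", "acme"),
    ("paper1", "cites", "paper2"),
]

# Extant ontology: frequency of (subject type, predicate, object type) patterns
# present in the graph, i.e. a statistical summary of its semantic structure.
extant = Counter(
    (types[s], p, types[o]) for s, p, o in triples if s in types and o in types
)
for (st, p, ot), n in extant.most_common():
    print(f"{st} --{p}--> {ot}: {n}")
```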
Automated colour identification in melanocytic lesions.
Sabbaghi, S; Aldeen, M; Garnavi, R; Varigos, G; Doliantis, C; Nicolopoulos, J
2015-08-01
Colour information plays an important role in classifying skin lesions. However, colour identification by dermatologists can be very subjective, leading to cases of misdiagnosis. Therefore, a computer-assisted system for quantitative colour identification is highly desirable for dermatologists to use. Although numerous colour detection systems have been developed, few studies have focused on imitating the human visual perception of colours in melanoma applications. In this paper we propose a new methodology based on the QuadTree decomposition technique for automatic colour identification in dermoscopy images. Our approach mimics the human perception of lesion colours. The proposed method is trained on a set of 47 images from the NIH dataset and applied to a test set of 190 skin lesions obtained from the PH2 dataset. The results of our proposed method are compared with a recently reported colour identification method using the same dataset. The effectiveness of our method in detecting colours in dermoscopy images is demonstrated by obtaining approximately 93% accuracy when the CIELab colour space is used.
Peek, N; Holmes, J H; Sun, J
2014-08-15
To review technical and methodological challenges for big data research in biomedicine and health. We discuss sources of big datasets, survey infrastructures for big data storage and big data processing, and describe the main challenges that arise when analyzing big data. The life and biomedical sciences are massively contributing to the big data revolution through secondary use of data that were collected during routine care and through new data sources such as social media. Efficient processing of big datasets is typically achieved by distributing computation over a cluster of computers. Data analysts should be aware of pitfalls related to big data such as bias in routine care data and the risk of false-positive findings in high-dimensional datasets. The major challenge for the near future is to transform analytical methods that are used in the biomedical and health domain, to fit the distributed storage and processing model that is required to handle big data, while ensuring confidentiality of the data being analyzed.
Hyperspectral signature analysis of skin parameters
NASA Astrophysics Data System (ADS)
Vyas, Saurabh; Banerjee, Amit; Garza, Luis; Kang, Sewon; Burlina, Philippe
2013-02-01
The temporal analysis of changes in biological skin parameters, including melanosome concentration, collagen concentration and blood oxygenation, may serve as a valuable tool in diagnosing the progression of malignant skin cancers and in understanding the pathophysiology of cancerous tumors. Quantitative knowledge of these parameters can also be useful in applications such as wound assessment, and point-of-care diagnostics, amongst others. We propose an approach to estimate in vivo skin parameters using a forward computational model based on Kubelka-Munk theory and the Fresnel Equations. We use this model to map the skin parameters to their corresponding hyperspectral signature. We then use machine learning based regression to develop an inverse map from hyperspectral signatures to skin parameters. In particular, we employ support vector machine based regression to estimate the in vivo skin parameters given their corresponding hyperspectral signature. We build on our work from SPIE 2012, and validate our methodology on an in vivo dataset. This dataset consists of 241 signatures collected from in vivo hyperspectral imaging of patients of both genders and Caucasian, Asian and African American ethnicities. In addition, we also extend our methodology past the visible region and through the short-wave infrared region of the electromagnetic spectrum. We find promising results when comparing the estimated skin parameters to the ground truth, demonstrating good agreement with well-established physiological precepts. This methodology can have potential use in non-invasive skin anomaly detection and for developing minimally invasive pre-screening tools.
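An illustrative sketch of the inverse map via support vector regression, assuming a synthetic forward model in place of the Kubelka-Munk/Fresnel model and made-up band counts and parameter names.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Synthetic stand-in: 241 "hyperspectral signatures" (100 bands) generated from
# 3 latent "skin parameters" through a smooth nonlinear forward model.
n_samples, n_bands, n_params = 241, 100, 3
params = rng.uniform(0, 1, (n_samples, n_params))   # e.g. melanosome, collagen, oxygenation
bands = np.linspace(0, 1, n_bands)
signatures = (
    params[:, [0]] * np.exp(-((bands - 0.3) ** 2) / 0.02)
    + params[:, [1]] * np.exp(-((bands - 0.6) ** 2) / 0.05)
    + params[:, [2]] * bands
    + 0.01 * rng.standard_normal((n_samples, n_bands))
)

X_tr, X_te, y_tr, y_te = train_test_split(signatures, params, test_size=0.25, random_state=0)

# One RBF support vector regressor per skin parameter (inverse map).
model = MultiOutputRegressor(make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
rmse = np.sqrt(((pred - y_te) ** 2).mean(axis=0))
print("per-parameter RMSE:", np.round(rmse, 3))
```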
NASA Astrophysics Data System (ADS)
Islam, Siraj Ul; Déry, Stephen J.
2017-03-01
This study evaluates predictive uncertainties in the snow hydrology of the Fraser River Basin (FRB) of British Columbia (BC), Canada, using the Variable Infiltration Capacity (VIC) model forced with several high-resolution gridded climate datasets. These datasets include the Canadian Precipitation Analysis and the thin-plate smoothing splines (ANUSPLIN), North American Regional Reanalysis (NARR), University of Washington (UW) and Pacific Climate Impacts Consortium (PCIC) gridded products. Uncertainties are evaluated at different stages of the VIC implementation, starting with the driving datasets, optimization of model parameters, and model calibration during cool and warm phases of the Pacific Decadal Oscillation (PDO). The inter-comparison of the forcing datasets (precipitation and air temperature) and their VIC simulations (snow water equivalent - SWE - and runoff) reveals widespread differences over the FRB, especially in mountainous regions. The ANUSPLIN precipitation shows a considerable dry bias in the Rocky Mountains, whereas the NARR winter air temperature is 2 °C warmer than the other datasets over most of the FRB. In the VIC simulations, the elevation-dependent changes in the maximum SWE (maxSWE) are more prominent at higher elevations of the Rocky Mountains, where the PCIC-VIC simulation accumulates too much SWE and ANUSPLIN-VIC yields an underestimation. Additionally, at each elevation range, the day of maxSWE varies from 10 to 20 days between the VIC simulations. The snow melting season begins early in the NARR-VIC simulation, whereas the PCIC-VIC simulation delays the melting, indicating seasonal uncertainty in SWE simulations. When compared with the observed runoff for the Fraser River main stem at Hope, BC, the ANUSPLIN-VIC simulation shows considerable underestimation of runoff throughout the water year owing to reduced precipitation in the ANUSPLIN forcing dataset. The NARR-VIC simulation yields more winter and spring runoff and earlier decline of flows in summer due to a nearly 15-day earlier onset of the FRB springtime snowmelt. Analysis of the parametric uncertainty in the VIC calibration process shows that the choice of the initial parameter range plays a crucial role in defining the model hydrological response for the FRB. Furthermore, the VIC calibration process is biased toward cool and warm phases of the PDO and the choice of proper calibration and validation time periods is important for the experimental setup. Overall the VIC hydrological response is prominently influenced by the uncertainties involved in the forcing datasets rather than those in its parameter optimization and experimental setups.
NASA Astrophysics Data System (ADS)
Hiebl, Johann; Frei, Christoph
2018-04-01
Spatial precipitation datasets that are long-term consistent, highly resolved and extend over several decades are an increasingly popular basis for modelling and monitoring environmental processes and planning tasks in hydrology, agriculture, energy resources management, etc. Here, we present a grid dataset of daily precipitation for Austria meant to promote such applications. It has a grid spacing of 1 km, extends back till 1961 and is continuously updated. It is constructed with the classical two-tier analysis, involving separate interpolations for mean monthly precipitation and daily relative anomalies. The former was accomplished by kriging with topographic predictors as external drift utilising 1249 stations. The latter is based on angular distance weighting and uses 523 stations. The input station network was kept largely stationary over time to avoid artefacts on long-term consistency. Example cases suggest that the new analysis is at least as plausible as previously existing datasets. Cross-validation and comparison against experimental high-resolution observations (WegenerNet) suggest that the accuracy of the dataset depends on interpretation. Users interpreting grid point values as point estimates must expect systematic overestimates for light and underestimates for heavy precipitation as well as substantial random errors. Grid point estimates are typically within a factor of 1.5 from in situ observations. Interpreting grid point values as area mean values, conditional biases are reduced and the magnitude of random errors is considerably smaller. Together with a similar dataset of temperature, the new dataset (SPARTACUS) is an interesting basis for modelling environmental processes, studying climate change impacts and monitoring the climate of Austria.
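A minimal sketch of the two-tier analysis, assuming synthetic station data and plain inverse-distance weighting as a simplified stand-in for the angular-distance weighting of daily relative anomalies; the monthly background field is taken as given rather than kriged.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy station network: coordinates (km), a monthly "background" precipitation
# climatology on a grid, and one day's station totals. All values are synthetic.
n_sta = 30
sta_xy = rng.uniform(0, 100, (n_sta, 2))
sta_monthly = rng.uniform(60, 180, n_sta)                   # mm / month at stations
sta_daily = sta_monthly / 30 * rng.gamma(2.0, 0.5, n_sta)   # mm on the target day

gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
grid_monthly = 100 + 0.5 * gx                               # pretend kriged monthly field (mm)

def idw(sta_xy, values, gx, gy, power=2.0):
    """Inverse-distance weighting; a simplified stand-in for the angular-distance
    weighting used in the two-tier analysis described above."""
    d = np.sqrt((gx[..., None] - sta_xy[:, 0]) ** 2 + (gy[..., None] - sta_xy[:, 1]) ** 2)
    w = 1.0 / np.maximum(d, 1e-6) ** power
    return (w * values).sum(-1) / w.sum(-1)

# Tier 2: interpolate daily *relative* anomalies, then scale the monthly background.
rel_anom = sta_daily / (sta_monthly / 30)                   # ratio to the mean daily amount
grid_daily = (grid_monthly / 30) * idw(sta_xy, rel_anom, gx, gy)
print(f"grid daily precipitation: min {grid_daily.min():.1f} mm, max {grid_daily.max():.1f} mm")
```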
Ellis, Robert J; Zhu, Bilei; Koenig, Julian; Thayer, Julian F; Wang, Ye
2015-09-01
As the literature on heart rate variability (HRV) continues to burgeon, so too do the challenges faced with comparing results across studies conducted under different recording conditions and analysis options. Two important methodological considerations are (1) what sampling frequency (SF) to use when digitizing the electrocardiogram (ECG), and (2) whether to interpolate an ECG to enhance the accuracy of R-peak detection. Although specific recommendations have been offered on both points, the evidence used to support them can be seen to possess a number of methodological limitations. The present study takes a new and careful look at how SF influences 24 widely used time- and frequency-domain measures of HRV through the use of a Monte Carlo-based analysis of false positive rates (FPRs) associated with two-sample tests on independent sets of healthy subjects. HRV values from the first sample were calculated at 1000 Hz, and HRV values from the second sample were calculated at progressively lower SFs (and either with or without R-peak interpolation). When R-peak interpolation was applied prior to HRV calculation, FPRs for all HRV measures remained very close to 0.05 (i.e. the theoretically expected value), even when the second sample had an SF well below 100 Hz. Without R-peak interpolation, all HRV measures held their expected FPR down to 125 Hz (and far lower, in the case of some measures). These results provide concrete insights into the statistical validity of comparing datasets obtained at (potentially) very different SFs; comparisons which are particularly relevant for the domains of meta-analysis and mobile health.
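A minimal sketch of the Monte Carlo false-positive-rate idea, assuming a toy RR-interval generator in which a low sampling frequency is represented by quantizing R-peak times and no interpolation is applied; RMSSD stands in for the paper's 24 HRV measures.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    return np.sqrt(np.mean(np.diff(rr_ms) ** 2))

def simulate_subject(n_beats, sf_hz):
    """Synthetic RR series (ms) with R-peak times quantized to the ECG sampling grid.
    A toy stand-in for real ECG processing; no R-peak interpolation is applied."""
    rr_true = rng.normal(800, 50, n_beats)            # ~75 bpm with variability
    peak_times = np.cumsum(rr_true)                   # ms
    step = 1000.0 / sf_hz
    peak_q = np.round(peak_times / step) * step       # quantization error at low SF
    return np.diff(np.concatenate([[0.0], peak_q]))

def false_positive_rate(sf_low, n_sim=500, n_subj=20, alpha=0.05):
    """Two independent healthy groups: one 'recorded' at 1000 Hz, one at sf_low."""
    hits = 0
    for _ in range(n_sim):
        group_a = [rmssd(simulate_subject(300, 1000.0)) for _ in range(n_subj)]
        group_b = [rmssd(simulate_subject(300, sf_low)) for _ in range(n_subj)]
        if stats.ttest_ind(group_a, group_b).pvalue < alpha:
            hits += 1
    return hits / n_sim

for sf in (500.0, 125.0, 50.0):
    print(f"SF = {sf:5.0f} Hz: estimated FPR = {false_positive_rate(sf):.3f}")
```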
The cost of post-abortion care in developing countries: a comparative analysis of four studies
Vlassoff, Michael; Singh, Susheela; Onda, Tsuyoshi
2016-01-01
Over the last five years, comprehensive national surveys of the cost of post-abortion care (PAC) to national health systems have been undertaken in Ethiopia, Uganda, Rwanda and Colombia using a specially developed costing methodology—the Post-abortion Care Costing Methodology (PACCM). The objective of this study is to expand the research findings of these four studies, making use of their extensive datasets. These studies offer the most complete and consistent estimates of the cost of PAC to date, and comparing their findings not only provides generalizable implications for health policies and programs, but also allows an assessment of the PACCM methodology. We find that the labor cost component varies widely: in Ethiopia and Colombia doctors spend about 30–60% more time with PAC patients than do nurses; in Uganda and Rwanda an opposite pattern is found. Labor costs range from I$42.80 in Uganda to I$301.30 in Colombia. The cost of drugs and supplies does not vary greatly, ranging from I$79 in Colombia to I$115 in Rwanda. Capital and overhead costs are substantial amounting to 52–68% of total PAC costs. Total costs per PAC case vary from I$334 in Rwanda to I$972 in Colombia. The financial burden of PAC is considerable: the expense of treating each PAC case is equivalent to around 35% of annual per capita income in Uganda, 29% in Rwanda and 11% in Colombia. Providing modern methods of contraception to women with an unmet need would cost just a fraction of the average expenditure on PAC: one year of modern contraceptive services and supplies cost only 3–12% of the average cost of treating a PAC patient. PMID:27045001
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jie; Draxl, Caroline; Hopson, Thomas
Numerical weather prediction (NWP) models have been widely used for wind resource assessment. Model runs with higher spatial resolution are generally more accurate, yet extremely computationally expensive. An alternative approach is to use data generated by a low-resolution NWP model in conjunction with statistical methods. In order to analyze the accuracy and computational efficiency of different types of NWP-based wind resource assessment methods, this paper performs a comparison of three deterministic and probabilistic NWP-based wind resource assessment methodologies: (i) a coarse resolution (0.5 degrees x 0.67 degrees) global reanalysis data set, the Modern-Era Retrospective Analysis for Research and Applications (MERRA); (ii) an analog ensemble methodology based on the MERRA, which provides both deterministic and probabilistic predictions; and (iii) a fine resolution (2-km) NWP data set, the Wind Integration National Dataset (WIND) Toolkit, based on the Weather Research and Forecasting model. Results show that: (i) as expected, the analog ensemble and WIND Toolkit perform significantly better than MERRA, confirming their ability to downscale coarse estimates; (ii) the analog ensemble provides the best estimate of the multi-year wind distribution at seven of the nine sites, while the WIND Toolkit is the best at one site; (iii) the WIND Toolkit is more accurate in estimating the distribution of hourly wind speed differences, which characterizes the wind variability, at five of the available sites, with the analog ensemble being best at the remaining four locations; and (iv) the analog ensemble computational cost is negligible, whereas the WIND Toolkit requires large computational resources. Future efforts could focus on the combination of the analog ensemble with intermediate resolution (e.g., 10-15 km) NWP estimates, to considerably reduce the computational burden, while providing accurate deterministic estimates and reliable probabilistic assessments.
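A minimal sketch of the analog ensemble idea, assuming a single predictor (the coarse forecast itself) and synthetic forecast-observation pairs; operational analog ensembles use several predictors and time windows.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy historical archive: coarse NWP forecasts of wind speed plus co-located
# observations (both synthetic). The analog ensemble predicts the observation
# distribution for a new forecast from the K most similar past forecasts.
n_hist = 5000
hist_fcst = rng.gamma(3.0, 2.5, n_hist)                          # m/s, coarse model
hist_obs = 0.8 * hist_fcst + rng.normal(0, 1.0, n_hist) + 1.0    # "truth" at the site

def analog_ensemble(new_fcst, hist_fcst, hist_obs, k=25):
    """Return the K observations whose historical forecasts best match new_fcst."""
    idx = np.argsort(np.abs(hist_fcst - new_fcst))[:k]
    return hist_obs[idx]

new_fcst = 9.0
ens = analog_ensemble(new_fcst, hist_fcst, hist_obs)
print(f"deterministic estimate (ensemble mean): {ens.mean():.2f} m/s")
print(f"10th-90th percentile band: {np.percentile(ens, 10):.2f} - {np.percentile(ens, 90):.2f} m/s")
```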
Quantifying uncertainties in wind energy assessment
NASA Astrophysics Data System (ADS)
Patlakas, Platon; Galanis, George; Kallos, George
2015-04-01
The constant rise of wind energy production and its subsequent penetration of global energy markets during the last decades has resulted in the selection of new sites with various types of problems. Such problems arise due to the variability and the uncertainty of wind speed. The study of the lower and upper tails of the wind speed distribution may support the quantification of these uncertainties. Such approaches, focused on extreme wind conditions or on periods below the energy production threshold, are necessary for better management of operations. Towards this direction, different methodologies are presented for the credible evaluation of potentially non-frequent/extreme values of these environmental conditions. The approaches used take into consideration the structural design of the wind turbines according to their lifespan, the turbine failures, the time needed for repairs as well as the energy production distribution. In this work, a multi-parametric approach for studying extreme wind speed values is discussed based on tools of Extreme Value Theory. In particular, the study focuses on extreme wind speed return periods and the persistence of no energy production, based on a 10-year hindcast dataset from a weather modeling system. More specifically, two methods (Annual Maxima and Peaks Over Threshold) were used for the estimation of extreme wind speeds and their recurrence intervals. Additionally, two different methodologies (intensity given duration and duration given intensity, both based on the Annual Maxima method) were applied to calculate the duration of extreme events, combined with their intensity as well as the event frequency. The obtained results show that the proposed approaches converge, at least on the main findings, for each case. It is also remarkable that, despite the moderate wind speed climate of the area, several consecutive days of no energy production are observed.
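A minimal sketch of the Annual Maxima method, assuming a synthetic hourly wind record in place of the hindcast: yearly maxima are fitted with a GEV distribution and return levels are read off its inverse survival function.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(7)

# Synthetic 10-year hourly wind speed record (m/s); a stand-in for a model hindcast.
years, hours = 10, 365 * 24
wind = rng.weibull(2.0, (years, hours)) * 8.0

# Annual Maxima method: fit a GEV distribution to the yearly maxima and derive
# return levels for chosen return periods.
annual_max = wind.max(axis=1)
shape, loc, scale = genextreme.fit(annual_max)

for T in (10, 25, 50):
    return_level = genextreme.isf(1.0 / T, shape, loc, scale)
    print(f"{T:3d}-year return level: {return_level:.1f} m/s")
```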
Some Methodological Considerations in Researching the Family Career.
ERIC Educational Resources Information Center
White, James
Methodological issues which confront researchers using the concept of the family career include the selection of appropriate dependent variables; the efficacy of historical versus immediate effects; and scaling the family career (a proposed replacement for the "family life cycle"). The issue of which dependent variables should be…
Some Pre-Methodological Considerations in Foreign-Language Teaching.
ERIC Educational Resources Information Center
Higgs, Theodore V.
1981-01-01
Combines studies in cognitive psychology and language acquisition with observations of pedagogical materials and student performance to analyze foreign-language teaching from the perspective of what students and teachers need to understand about language learning and language before meaningful debate over methodology can be undertaken. (Author/MES)
Structural Equation Modeling of School Violence Data: Methodological Considerations
ERIC Educational Resources Information Center
Mayer, Matthew J.
2004-01-01
Methodological challenges associated with structural equation modeling (SEM) and structured means modeling (SMM) in research on school violence and related topics in the social and behavioral sciences are examined. Problems associated with multiyear implementations of large-scale surveys are discussed. Complex sample designs, part of any…
The retrospective chart review: important methodological considerations.
Vassar, Matt; Holzmann, Matthew
2013-01-01
In this paper, we review and discuss ten common methodological mistakes found in retrospective chart reviews. The retrospective chart review is a widely applicable research methodology that can be used by healthcare disciplines as a means to direct subsequent prospective investigations. In many cases in this review, we have also provided suggestions or accessible resources that researchers can apply as a "best practices" guide when planning, conducting, or reviewing this investigative method.
Classification of boreal forest by satellite and inventory data using neural network approach
NASA Astrophysics Data System (ADS)
Romanov, A. A.
2012-12-01
The main objective of this research was to develop a methodology for high-accuracy boreal (Siberian taiga) land cover classification. The study area covers several parts of Central Siberia along the Yenisei River (60-62 degrees North latitude): the right bank includes mixed forest and dark taiga, the left bank pine forests; these were chosen as highly heterogeneous yet statistically comparable surfaces with respect to spectral characteristics. Two main types of data were used: time series of medium spatial resolution satellite images (Landsat 5, 7 and SPOT4) and inventory datasets from fieldwork (used to prepare training sample sets). Field data collection included a short botanical description of each plot (type/species of vegetation, density, crown closure, representative individual heights and maximum/minimum diameters for each type, and surface altitude); the geometric extent of each training sample unit corresponded to the spatial resolution of the satellite images and was geo-referenced, with datasets prepared both for preliminary processing and for verification. The network of test plots was planned as irregular and determined by a landscape-oriented approach. The main focus of the thematic data processing was the use of neural networks (including fuzzy logic); the field results were converted into input parameters describing the type/species of vegetation cover of each unit and its degree of variability. The proposed approach processes the time series separately for each image, mainly for verification: acquisition parameters (time, albedo) are taken into consideration in order to assess the quality of the mapping. The input variables for the networks were the sensor bands, surface altitude, solar angles and, for a few experiments, land surface temperature; attention was also given to forming the class scheme on the basis of statistical pre-processing of the field results (prevalent type). In addition, several statistical supervised classification methods were used for comparison (minimum distance, maximum likelihood, Mahalanobis distance). The study produced various types of neural classifiers suitable for the mapping, and even for highly heterogeneous areas the neural network approach showed better precision than the statistical classifiers, despite the validity of the Gaussian distribution assumption for the latter (see table: comparison of classification accuracy). The experimentally chosen optimum network structure consisted of three layers of ten neurons each, although such a configuration requires larger computational resources than the statistical methods above, and the number of iterations in network training must be increased to minimize RMS errors. A key issue for accuracy estimation is the incompleteness of the training sets, especially for summer images of mixed forest. Nevertheless, the proposed methodology appears suitable also for measuring local dynamics of the boreal land surface by vegetation type.
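An illustrative sketch of the comparison described above, assuming synthetic, non-Gaussian training samples: a three-layer, ten-neuron-per-layer network (scikit-learn's MLPClassifier) against a Gaussian maximum-likelihood-style baseline (quadratic discriminant analysis).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)

# Synthetic training samples: 6 "sensor bands" + altitude + solar angle per plot,
# three vegetation classes with overlapping, non-Gaussian feature distributions.
n_per_class, n_feat, classes = 200, 8, 3
X = np.vstack([
    rng.gamma(2.0 + c, 1.0, (n_per_class, n_feat)) + c * 0.5 for c in range(classes)
])
y = np.repeat(np.arange(classes), n_per_class)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# Three hidden layers of ten neurons each, as in the configuration reported above.
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=2000, random_state=0))
qda = QuadraticDiscriminantAnalysis()   # Gaussian maximum-likelihood-style baseline

for name, clf in [("neural network", mlp), ("Gaussian (QDA) baseline", qda)]:
    clf.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {clf.score(X_te, y_te):.3f}")
```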
Yohay Carmel; Curtis Flather; Denis Dean
2006-01-01
This paper summarizes our efforts to investigate the nature, behavior, and implications of positional error and attribute error in spatiotemporal datasets. Estimating the combined influence of these errors on map analysis has been hindered by the fact that these two error types are traditionally expressed in different units (distance units, and categorical units,...
NASA Technical Reports Server (NTRS)
Rumsey, Christopher L.; Poirier, Diane M. A.; Bush, Robert H.; Towne, Charles E.
2001-01-01
The CFD General Notation System (CGNS) was developed to be a self-descriptive, machine-independent standard for storing CFD aerodynamic data. This guide aids users in the implementation of CGNS. It is intended as a tutorial on the usage of the CGNS mid-level library routines for reading and writing grid and flow solution datasets for both structured and unstructured methodologies.
ERIC Educational Resources Information Center
Delaney, Jennifer A.; Kearney, Tyler D.
2016-01-01
This study considered the impact of state-level guaranteed tuition programs on alternative student-based revenue streams. It used a quasi-experimental, difference-in-difference methodology with a panel dataset of public four-year institutions from 2000-2012. Illinois' 2004 "Truth-in-Tuition" law was used as the policy of interest and the…
Universal Stochastic Multiscale Image Fusion: An Example Application for Shale Rock.
Gerke, Kirill M; Karsanina, Marina V; Mallants, Dirk
2015-11-02
Spatial data captured with sensors of different resolution would provide a maximum degree of information if the data were to be merged into a single image representing all scales. We develop a general solution for merging multiscale categorical spatial data into a single dataset using stochastic reconstructions with rescaled correlation functions. The versatility of the method is demonstrated by merging three images of shale rock representing macro, micro and nanoscale spatial information on mineral, organic matter and porosity distribution. Merging multiscale images of shale rock is pivotal to quantify more reliably petrophysical properties needed for production optimization and environmental impacts minimization. Images obtained by X-ray microtomography and scanning electron microscopy were fused into a single image with predefined resolution. The methodology is sufficiently generic for implementation of other stochastic reconstruction techniques, any number of scales, any number of material phases, and any number of images for a given scale. The methodology can be further used to assess effective properties of fused porous media images or to compress voluminous spatial datasets for efficient data storage. Practical applications are not limited to petroleum engineering or more broadly geosciences, but will also find their way in material sciences, climatology, and remote sensing.
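A minimal sketch of the two-point correlation function that underlies such correlation-function-based reconstructions, assuming a synthetic binary image; the rescaling across resolutions and the reconstruction step itself are not shown.

```python
import numpy as np

rng = np.random.default_rng(9)

# Binary "pore / solid" image (toy stand-in for one scale of a shale image).
img = (rng.random((128, 128)) < 0.3).astype(float)

def two_point_probability(phase):
    """S2(r) along the x-axis: probability that two points separated by r both
    lie in the phase, computed via FFT autocorrelation with periodic boundaries."""
    f = np.fft.fft2(phase)
    auto = np.fft.ifft2(f * np.conj(f)).real / phase.size
    return auto[0, : phase.shape[1] // 2]           # x-direction profile

s2 = two_point_probability(img)
phi = img.mean()
print(f"porosity phi = {phi:.3f}; S2(0) = {s2[0]:.3f} (should equal phi)")
print(f"S2 at large r tends to phi^2 = {phi**2:.3f}: S2(60) = {s2[60]:.3f}")
```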
Storytelling and story testing in domestication.
Gerbault, Pascale; Allaby, Robin G; Boivin, Nicole; Rudzinski, Anna; Grimaldi, Ilaria M; Pires, J Chris; Climer Vigueira, Cynthia; Dobney, Keith; Gremillion, Kristen J; Barton, Loukas; Arroyo-Kalin, Manuel; Purugganan, Michael D; Rubio de Casas, Rafael; Bollongino, Ruth; Burger, Joachim; Fuller, Dorian Q; Bradley, Daniel G; Balding, David J; Richerson, Peter J; Gilbert, M Thomas P; Larson, Greger; Thomas, Mark G
2014-04-29
The domestication of plants and animals marks one of the most significant transitions in human, and indeed global, history. Traditionally, study of the domestication process was the exclusive domain of archaeologists and agricultural scientists; today it is an increasingly multidisciplinary enterprise that has come to involve the skills of evolutionary biologists and geneticists. Although the application of new information sources and methodologies has dramatically transformed our ability to study and understand domestication, it has also generated increasingly large and complex datasets, the interpretation of which is not straightforward. In particular, challenges of equifinality, evolutionary variance, and emergence of unexpected or counter-intuitive patterns all face researchers attempting to infer past processes directly from patterns in data. We argue that explicit modeling approaches, drawing upon emerging methodologies in statistics and population genetics, provide a powerful means of addressing these limitations. Modeling also offers an approach to analyzing datasets that avoids conclusions steered by implicit biases, and makes possible the formal integration of different data types. Here we outline some of the modeling approaches most relevant to current problems in domestication research, and demonstrate the ways in which simulation modeling is beginning to reshape our understanding of the domestication process.
Modern data science for analytical chemical data - A comprehensive review.
Szymańska, Ewa
2018-10-22
Efficient and reliable analysis of chemical analytical data is a great challenge due to the increase in data size, variety and velocity. New methodologies, approaches and methods are being proposed not only by the chemometrics community but also by other data science communities to extract relevant information from big datasets and provide their value to different applications. Besides the common goal of big data analysis, different perspectives on and terms for big data are being discussed in the scientific literature and public media. The aim of this comprehensive review is to present common trends in the analysis of chemical analytical data across different data science fields together with their data type-specific and generic challenges. Firstly, common data science terms used in different data science fields are summarized and discussed. Secondly, systematic methodologies to plan and run big data analysis projects are presented together with their steps. Moreover, different analysis aspects such as assessing data quality, selecting data pre-processing strategies, data visualization and model validation are considered in more detail. Finally, an overview of standard and new data analysis methods is provided and their suitability for big analytical chemical datasets is briefly discussed. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Gillespie, D.; La Pensée, A.; Cooper, M.
2013-07-01
Three dimensional (3D) laser scanning is an important documentation technique for cultural heritage. This technology has been adopted from the engineering and aeronautical industry and is an invaluable tool for the documentation of objects within museum collections (La Pensée, 2008). The datasets created via close range laser scanning are extremely accurate and the created 3D dataset allows for a more detailed analysis in comparison to other documentation technologies such as photography. The dataset can be used for a range of different applications including: documentation; archiving; surface monitoring; replication; gallery interactives; educational sessions; conservation and visualization. However, the novel nature of a 3D dataset presents a rather unique challenge with respect to its sharing and dissemination. This is in part due to the need for specialised 3D software and a supported graphics card to display high resolution 3D models. This can be detrimental to one of the main goals of cultural institutions, which is to share knowledge and enable activities such as research, education and entertainment. This has limited the presentation of 3D models of cultural heritage objects to mainly either images or videos. Yet with recent developments in computer graphics, increased internet speed and emerging technologies such as Adobe's Stage 3D (Adobe, 2013) and WebGL (Khronos, 2013), it is now possible to share a dataset directly within a webpage. This allows website visitors to interact with the 3D dataset, allowing them to explore every angle of the object and gain an insight into its shape and nature. This can be very important considering that it is difficult to offer the same level of understanding of the object through the use of traditional mediums such as photographs and videos. Yet this presents a range of problems: this is a very novel experience and very few people have engaged with 3D objects outside of 3D software packages or games. This paper presents results of research that aims to provide a methodology for museums and cultural institutions for prototyping a 3D viewer within a webpage, thereby not only allowing institutions to promote their collections via the internet but also providing a tool for users to engage in a meaningful way with cultural heritage datasets. The design process encompasses evaluation as the central part of the design methodology, focusing on how slight changes to navigation, object engagement and aesthetic appearance can influence the user's experience. The prototype used in this paper was created using WebGL with the Three.Js (Three.JS, 2013) library, and datasets were loaded in the OpenCTM (Geelnard, 2010) file format. The overall design is centred on creating an easy-to-learn interface allowing non-skilled users to interact with the datasets, while also providing tools allowing skilled users to discover more about the cultural heritage object. User testing was carried out, allowing users to interact with 3D datasets within the interactive viewer. The results are analysed and the insights learned are discussed in relation to an interface designed to interact with 3D content. The results will lead to the design of interfaces for interacting with 3D objects, which allow both skilled and non-skilled users to engage with 3D cultural heritage objects in a meaningful way.
Assessment of Adolescent Neurotoxicity: Rationale and Methodological Considerations
Spear, Linda Patia
2007-01-01
This introduction to the special issue of Neurotoxicology and Teratology on “Risk of neurobehavioral toxicity in adolescence” begins by broadly considering the ontogeny and phylogeny of adolescence, and the potential value of animal models of adolescence. Major findings from the emerging neuroscience of adolescence are then highlighted to establish the importance of studies of adolescent neurotoxicity. A variety of methodological issues that are of particular relevance to adolescent exposures are then discussed. These include consideration of pharmacokinetic factors, inclusion of other-aged comparison group(s), and issues involving timing, route of administration, and exposure-induced alterations in growth rate. Despite such methodological challenges, research to determine whether adolescence is a time of increased vulnerability (or greater resiliency) to specific drugs and environmental toxicants is progressing rapidly, as exemplified by the work presented in the articles of this special issue. PMID:17222532
Design Considerations for Creating a Chemical Information Workstation.
ERIC Educational Resources Information Center
Mess, John A.
1995-01-01
Discusses what a functional chemical information workstation should provide to support the users in an academic library and examines how it can be implemented. Highlights include basic design considerations; natural language interface, including grammar-based, context-based, and statistical methodologies; expert system interface; and programming…
Guidelines for the Design and Conduct of Clinical Studies in Knee Articular Cartilage Repair
Mithoefer, Kai; Saris, Daniel B.F.; Farr, Jack; Kon, Elizaveta; Zaslav, Kenneth; Cole, Brian J.; Ranstam, Jonas; Yao, Jian; Shive, Matthew; Levine, David; Dalemans, Wilfried; Brittberg, Mats
2011-01-01
Objective: To summarize current clinical research practice and develop methodological standards for objective scientific evaluation of knee cartilage repair procedures and products. Design: A comprehensive literature review was performed of high-level original studies providing information relevant for the design of clinical studies on articular cartilage repair in the knee. Analysis of cartilage repair publications and synopses of ongoing trials were used to identify important criteria for the design, reporting, and interpretation of studies in this field. Results: Current literature reflects the methodological limitations of the scientific evidence available for articular cartilage repair. However, clinical trial databases of ongoing trials document a trend suggesting improved study designs and clinical evaluation methodology. Based on the current scientific information and standards of clinical care, detailed methodological recommendations were developed for the statistical study design, patient recruitment, control group considerations, study endpoint definition, documentation of results, use of validated patient-reported outcome instruments, and inclusion and exclusion criteria for the design and conduct of scientifically sound cartilage repair study protocols. A consensus statement among the International Cartilage Repair Society (ICRS) and contributing authors experienced in clinical trial design and implementation was achieved. Conclusions: High-quality clinical research methodology is critical for the optimal evaluation of current and new cartilage repair technologies. In addition to generally applicable principles for orthopedic study design, specific criteria and considerations apply to cartilage repair studies. Systematic application of these criteria and considerations can facilitate study designs that are scientifically rigorous, ethical, practical, and appropriate for the question(s) being addressed in any given cartilage repair research project. PMID:26069574
Classification of Alzheimer’s Patients through Ubiquitous Computing †
Nieto-Reyes, Alicia; Duque, Rafael; Montaña, José Luis; Lage, Carmen
2017-01-01
Functional data analysis and artificial neural networks are the building blocks of the proposed methodology, which distinguishes the movement patterns of Alzheimer's patients at different stages of the disease and classifies new patients to their appropriate stage of the disease. The movement patterns are obtained by the accelerometer device of android smartphones that the patients carry while moving freely. The proposed methodology is relevant in that it is flexible with respect to the type of data to which it is applied. To exemplify this, a novel real three-dimensional functional dataset is analyzed in which each datum is observed over a different time domain: not only is each datum observed at a different frequency, but the domain of each datum also has a different length. The obtained classification success rate of 83% indicates the potential of the proposed methodology. PMID:28753975
Study of infectious diseases in archaeological bone material - A dataset.
Pucu, Elisa; Cascardo, Paula; Chame, Marcia; Felice, Gisele; Guidon, Niéde; Cleonice Vergne, Maria; Campos, Guadalupe; Roberto Machado-Silva, José; Leles, Daniela
2017-08-01
Bones from human and ground sloth remains were analyzed for the presence of Trypanosoma cruzi by conventional PCR using primers TC, TC1 and TC2. Amplification yielded fragments with the product sizes expected for these primers (300 and 350 bp). The amplified PCR product was sequenced and analyzed against GenBank using BLAST. Although these sequences did not match the parasite, they showed strong matches to bacterial species. This article presents the methodology used and the alignment of the sequences. The display of this dataset will allow further analysis of our results and discussion presented in the manuscript "Finding the unexpected: a critical view on molecular diagnosis of infectious diseases in archaeological samples" (Pucu et al. 2017) [1].
Use of graph theory measures to identify errors in record linkage.
Randall, Sean M; Boyd, James H; Ferrante, Anna M; Bauer, Jacqueline K; Semmens, James B
2014-07-01
Ensuring high linkage quality is important in many record linkage applications. Current methods for ensuring quality are manual and resource intensive. This paper seeks to determine the effectiveness of graph theory techniques in identifying record linkage errors. A range of graph theory techniques was applied to two linked datasets, with known truth sets. The ability of graph theory techniques to identify groups containing errors was compared to a widely used threshold setting technique. This methodology shows promise; however, further investigations into graph theory techniques are required. The development of more efficient and effective methods of improving linkage quality will result in higher quality datasets that can be delivered to researchers in shorter timeframes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
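A minimal sketch of applying graph measures to linkage groups, assuming a toy set of linked record pairs and illustrative flagging rules (low density or the presence of bridges); real linkage graphs are far larger and the paper evaluates several measures against known truth sets.

```python
import networkx as nx

# Toy linked dataset: nodes are records, edges are links produced by a linkage run.
# The second group contains a spurious "bridge" link joining two people's records.
links = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),           # group A: fully connected
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("b3", "c1"),                                       # suspicious bridge
    ("c1", "c2"), ("c2", "c3"), ("c1", "c3"),
]
G = nx.Graph(links)

for comp in nx.connected_components(G):
    sub = G.subgraph(comp)
    density = nx.density(sub)
    bridges = list(nx.bridges(sub))
    flag = "REVIEW" if density < 1.0 or bridges else "ok"
    print(f"group {sorted(comp)}: density={density:.2f}, bridges={bridges} -> {flag}")
```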
A biclustering algorithm for extracting bit-patterns from binary datasets.
Rodriguez-Baena, Domingo S; Perez-Pulido, Antonio J; Aguilar-Ruiz, Jesus S
2011-10-01
Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations. A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results. The source and binary codes, the datasets used in the experiments and the results can be found at: http://www.upo.es/eps/bigs/BiBit.html dsrodbae@upo.es Supplementary data are available at Bioinformatics online.
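A minimal sketch of bit-pattern bicluster seeding in the spirit of BiBit, assuming a toy binary matrix: rows are encoded as integers, pairs of rows are ANDed to form candidate column patterns, and all rows containing a pattern form a bicluster; pruning and the paper's gene-expression preprocessing are omitted.

```python
from itertools import combinations

# Rows of a binary matrix encoded as Python integers (bitsets); bits are columns
# (least significant bit = column 0).
matrix = [
    0b110110,
    0b110010,
    0b111110,
    0b001011,
    0b001001,
]
MIN_COLS, MIN_ROWS = 2, 2

def bit_count(x):
    return bin(x).count("1")

biclusters = set()
for i, j in combinations(range(len(matrix)), 2):
    pattern = matrix[i] & matrix[j]          # candidate column pattern
    if bit_count(pattern) < MIN_COLS:
        continue
    rows = frozenset(r for r, row in enumerate(matrix) if row & pattern == pattern)
    if len(rows) >= MIN_ROWS:
        biclusters.add((pattern, rows))

for pattern, rows in sorted(biclusters):
    cols = [c for c in range(6) if pattern >> c & 1]
    print(f"rows {sorted(rows)} share columns {cols} (pattern {pattern:06b})")
```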
A framework for automatic creation of gold-standard rigid 3D-2D registration datasets.
Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga
2017-02-01
Advanced image-guided medical procedures incorporate 2D intra-interventional information into pre-interventional 3D image and plan of the procedure through 3D/2D image registration (32R). To enter clinical use, and even for publication purposes, novel and existing 32R methods have to be rigorously validated. The performance of a 32R method can be estimated by comparing it to an accurate reference or gold standard method (usually based on fiducial markers) on the same set of images (gold standard dataset). Objective validation and comparison of methods are possible only if evaluation methodology is standardized, and the gold standard dataset is made publicly available. Currently, very few such datasets exist and only one contains images of multiple patients acquired during a procedure. To encourage the creation of gold standard 32R datasets, we propose an automatic framework. The framework is based on rigid registration of fiducial markers. The main novelty is spatial grouping of fiducial markers on the carrier device, which enables automatic marker localization and identification across the 3D and 2D images. The proposed framework was demonstrated on clinical angiograms of 20 patients. Rigid 32R computed by the framework was more accurate than that obtained manually, with the respective target registration error below 0.027 mm compared to 0.040 mm. The framework is applicable for gold standard setup on any rigid anatomy, provided that the acquired images contain spatially grouped fiducial markers. The gold standard datasets and software will be made publicly available.
Essential methodological considerations when using grounded theory.
Achora, Susan; Matua, Gerald Amandu
2016-07-01
To suggest important methodological considerations when using grounded theory. A research method widely used in nursing research is grounded theory, at the centre of which is theory construction. However, researchers still struggle with some of its methodological issues. Although grounded theory is widely used to study and explain issues in nursing practice, many researchers are still failing to adhere to its rigorous standards. Researchers should articulate the focus of their investigations - the substantive area of interest as well as the focal population. This should be followed by a succinct explanation of the strategies used to collect and analyse data, supported by clear coding processes. Finally, the resolution of the core issues, including the core category and related categories, should be explained to advance readers' understanding. Researchers should endeavour to understand the tenets of grounded theory. This enables 'neophytes' in particular to make methodological decisions that will improve their studies' rigour and fit with grounded theory. This paper complements the current dialogue on improving the understanding of grounded theory methodology in nursing research. The paper also suggests important procedural decisions researchers need to make to preserve their studies' scientific merit and fit with grounded theory.
Language barriers and qualitative nursing research: methodological considerations.
Squires, A
2008-09-01
This review of the literature synthesizes methodological recommendations for the use of translators and interpreters in cross-language qualitative research. Cross-language qualitative research involves the use of interpreters and translators to mediate a language barrier between researchers and participants. Qualitative nurse researchers successfully address language barriers between themselves and their participants when they systematically plan for how they will use interpreters and translators throughout the research process. Experienced qualitative researchers recognize that translators can generate qualitative data through translation processes and by participating in data analysis. Failure to address language barriers and the methodological challenges they present threatens the credibility, transferability, dependability and confirmability of cross-language qualitative nursing research. Through a synthesis of the cross-language qualitative methods literature, this article reviews the basics of language competence, translator and interpreter qualifications, and roles for each kind of qualitative research approach. Methodological and ethical considerations are also provided. By systematically addressing the methodological challenges cross-language research presents, nurse researchers can produce better evidence for nursing practice and policy making when working across different language groups. Findings from qualitative studies will also accurately represent the experiences of the participants without concern that the meaning was lost in translation.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-05-03
... determine endpoints; questionnaire design and analyses; and presentation of survey results. To date, FDA has..., the workshop will invest considerable time in identifying best methodological practices for conducting... sample, sample size, question design, process, and endpoints. Panel 2 will focus on alternatives to...
Methodological and Ethical Considerations in a Life History Study of Teacher Thinking.
ERIC Educational Resources Information Center
Muchmore, James A.
This paper discusses some of the methodological and ethical issues that one educational researcher encountered throughout his work, focusing on the importance of understanding teachers' thinking from their perspective (an insider looking out rather than an outsider looking in). It highlights a collaborative research relationship that the…
ERIC Educational Resources Information Center
Moradi, Bonnie; Mohr, Jonathan J.; Worthington, Roger L.; Fassinger, Ruth E.
2009-01-01
This lead article of the special issue discusses conceptual and methodological considerations in studying sexual minority issues, particularly in research conducted by counseling psychologists (including the work represented in this special issue). First, the overarching challenge of conceptualizing and defining sexual minority populations is…
ERIC Educational Resources Information Center
Casado, Banghwa Lee; Negi, Nalini Junko; Hong, Michin
2012-01-01
Despite the growing number of language minorities, foreign-born individuals with limited English proficiency, this population has been largely left out of social work research, often due to methodological challenges involved in conducting research with this population. Whereas the professional standard calls for cultural competence, a discussion…
No Trespassing: U.S. Public Schools and the Border of Institutional Homophobia.
ERIC Educational Resources Information Center
Lugg, Catherine A.
This presentation takes an historical approach to homosexuality and homophobia in public schools. The methodology of "history from below" is applied. Methodological considerations are discussed, and experiences of gay and lesbian teachers and students are explored. The psychological, moral and political meanings various groups attach to…
ERIC Educational Resources Information Center
Wong, Vivian C.; Steiner, Peter M.
2015-01-01
Across the disciplines of economics, political science, public policy, and now, education, the randomized controlled trial (RCT) is the preferred methodology for establishing causal inference about program impacts. But randomized experiments are not always feasible because of ethical, political, and/or practical considerations, so non-experimental…
Multilevel Modeling: A Review of Methodological Issues and Applications
ERIC Educational Resources Information Center
Dedrick, Robert F.; Ferron, John M.; Hess, Melinda R.; Hogarty, Kristine Y.; Kromrey, Jeffrey D.; Lang, Thomas R.; Niles, John D.; Lee, Reginald S.
2009-01-01
This study analyzed the reporting of multilevel modeling applications of a sample of 99 articles from 13 peer-reviewed journals in education and the social sciences. A checklist, derived from the methodological literature on multilevel modeling and focusing on the issues of model development and specification, data considerations, estimation, and…
Kohlberg's Moral Judgment Scale: Some Methodological Considerations
ERIC Educational Resources Information Center
Rubin, Kenneth H.; Trotter, Kristin T.
1977-01-01
Examined 3 methodological issues in the use of Kohlberg's Moral Judgment Scale: (1) test-retest reliability, (2) consistency of moral judgment stages from one dilemma to the next, and (3) influence of subject's verbal facility on the projective test scores. Forty children in grades 3 and 5 participated. (JMB)
ERIC Educational Resources Information Center
Pucci, Bruno
2000-01-01
Considers the differences between quantitative and qualitative research. Cites some essays by Adorno when he was living in New York which led to the conclusion that empirical data has much to say and discusses the theoretical-methodological contributions in a recent master's thesis in education. (BT)
Subsampling for dataset optimisation
NASA Astrophysics Data System (ADS)
Ließ, Mareike
2017-04-01
Soil-landscapes have formed by the interaction of soil-forming factors and pedogenic processes. In modelling these landscapes in their pedodiversity and the underlying processes, a representative unbiased dataset is required. This concerns model input as well as output data. However, very often big datasets are available which are highly heterogeneous and were gathered for various purposes, but not to model a particular process or data space. As a first step, the overall data space and/or landscape section to be modelled needs to be identified including considerations regarding scale and resolution. Then the available dataset needs to be optimised via subsampling to well represent this n-dimensional data space. A couple of well-known sampling designs may be adapted to suit this purpose. The overall approach follows three main strategies: (1) the data space may be condensed and de-correlated by a factor analysis to facilitate the subsampling process. (2) Different methods of pattern recognition serve to structure the n-dimensional data space to be modelled into units which then form the basis for the optimisation of an existing dataset through a sensible selection of samples. Along the way, data units for which there is currently insufficient soil data available may be identified. And (3) random samples from the n-dimensional data space may be replaced by similar samples from the available dataset. While being a presupposition to develop data-driven statistical models, this approach may also help to develop universal process models and identify limitations in existing models.
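As one possible concrete reading of strategies (1) and (2) above (the abstract does not prescribe specific algorithms), the sketch below condenses and de-correlates the data space with PCA as a stand-in for the factor analysis, structures it with k-means, and keeps the sample closest to each cluster centre as the optimised subsample:

```python
# Sketch of the subsampling idea (assumed implementation details): condense the
# predictor space with PCA, structure it with k-means, and keep the sample
# closest to each cluster centre as the representative subsample.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def representative_subsample(X, n_keep, n_components=3, seed=0):
    Z = PCA(n_components=n_components).fit_transform(StandardScaler().fit_transform(X))
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(Z)
    # index of the sample nearest to each cluster centre
    keep = [np.argmin(np.linalg.norm(Z - c, axis=1)) for c in km.cluster_centers_]
    return sorted(set(keep))


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))          # e.g. terrain and climate covariates
print(representative_subsample(X, n_keep=20))
```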
Hannon, Tamara S; Kahn, Steven E; Utzschneider, Kristina M; Buchanan, Thomas A; Nadeau, Kristen J; Zeitler, Philip S; Ehrmann, David A; Arslanian, Silva A; Caprio, Sonia; Edelstein, Sharon L; Savage, Peter J; Mather, Kieren J
2018-01-01
The Restoring Insulin Secretion (RISE) study was initiated to evaluate interventions to slow or reverse the progression of β-cell failure in type 2 diabetes (T2D). To design the RISE study, we undertook an evaluation of methods for measurement of β-cell function and changes in β-cell function in response to interventions. In the present paper, we review approaches for measurement of β-cell function, focusing on methodologic and feasibility considerations. Methodologic considerations included: (1) the utility of each technique for evaluating key aspects of β-cell function (first- and second-phase insulin secretion, maximum insulin secretion, glucose sensitivity, incretin effects) and (2) tactics for incorporating a measurement of insulin sensitivity in order to adjust insulin secretion measures for insulin sensitivity appropriately. Of particular concern were the capacity to measure β-cell function accurately in those with poor function, as is seen in established T2D, and the capacity of each method for demonstrating treatment-induced changes in β-cell function. Feasibility considerations included: staff burden, including time and required methodological expertise; participant burden, including time and number of study visits; and ease of standardizing methods across a multicentre consortium. After this evaluation, we selected a 2-day measurement procedure, combining a 3-hour 75-g oral glucose tolerance test and a 2-stage hyperglycaemic clamp procedure, augmented with arginine. © 2017 John Wiley & Sons Ltd.
Overview of a public-industry partnership for enhancing corn nitrogen research and datasets
USDA-ARS?s Scientific Manuscript database
Due to economic and environmental consequences of nitrogen (N) lost from fertilizer applications in corn (Zea mays L.), considerable public and industry attention has been devoted to development of N decision tools. Now a wide variety of tools are available to farmers for managing N inputs. However,...
Genetic architecture and biological basis for feed efficiency in dairy cattle
USDA-ARS?s Scientific Manuscript database
The genetic architecture of residual feed intake (RFI) and related traits was evaluated using a dataset of 2,894 cows. A Bayesian analysis estimated that markers accounted for 14% of the variance in RFI, and that RFI had considerable genetic variation. Effects of marker windows were small, but QTL p...
Assessing Conformity to Standards for Treatment Foster Care.
ERIC Educational Resources Information Center
Farmer, Elizabeth M. Z.; Burns, Barbara J.; Dubs, Melanie S.; Thompson, Shealy
2002-01-01
This study examined conformity to the Program Standards for Treatment Foster Care among 42 statewide programs. Findings suggest fair to good overall conformity, with considerable variation among programs. A discussion of methodological and substantive considerations for future research and evaluation using this approach is included. (Contains…
Simplifier: a web tool to eliminate redundant NGS contigs.
Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Azevedo, Vasco; Schneider, Maria Paula; Barh, Debmalya; Silva, Artur
2012-01-01
Modern genomic sequencing technologies produce a large amount of data with reduced cost per base; however, this data consists of short reads. This reduction in the size of the reads, compared to those obtained with previous methodologies, presents new challenges, including a need for efficient algorithms for the assembly of genomes from short reads and for resolving repetitions. Additionally, after ab initio assembly, curation of the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, a stand-alone software that selectively eliminates redundant sequences from the collection of contigs generated by ab initio assembly of genomes. Application of Simplifier to data generated by assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the mate-paired library by 17.47% and the fragment library by 23.91%. Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required for finalization of genome assembly in tests with data from prokaryotic organisms. Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml. It requires Sun JDK 6 or higher.
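The abstract does not spell out Simplifier's redundancy criterion, so the following is only a hedged sketch of one common approach to the same problem: dropping any contig that is an exact substring of a longer contig on either strand.

```python
# Hedged sketch of containment-based redundancy removal (not necessarily
# Simplifier's exact criterion): drop any contig that is an exact substring of
# a longer contig on either strand.
COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    return seq.translate(COMP)[::-1]


def drop_redundant(contigs):
    kept = []
    for seq in sorted(contigs, key=len, reverse=True):   # longest first
        if any(seq in k or revcomp(seq) in k for k in kept):
            continue                                     # contained -> redundant
        kept.append(seq)
    return kept


contigs = ["ATCGGATTACA", "GATTACA", "TGTAATC", "CCCGGG"]
print(drop_redundant(contigs))   # 'GATTACA' and its reverse complement are removed
```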
Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo
2015-08-24
Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.
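The procedures compared above share a common template: train a statistical-learning model on functional annotations and score its ability to enrich risk variants by the AUC on held-out data. A generic, hedged sketch of that template (not any of the three published procedures; annotations and labels are simulated):

```python
# Generic sketch of annotation-based variant prioritization: fit a classifier
# on functional annotations and report the held-out AUC, the metric quoted in
# the abstract. All data below are simulated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
# toy annotation matrix: e.g. conservation score, coding flag, regulatory flag
X = np.column_stack([rng.normal(size=n), rng.integers(0, 2, n), rng.integers(0, 2, n)])
logit = 1.5 * X[:, 0] + 0.8 * X[:, 1] - 1.0
y = rng.random(n) < 1 / (1 + np.exp(-logit))        # simulated risk labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test-set AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```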
NASA Astrophysics Data System (ADS)
He, Jingjing; Guan, Xuefei; Peng, Tishun; Liu, Yongming; Saxena, Abhinav; Celaya, Jose; Goebel, Kai
2013-10-01
This paper presents an experimental study of damage detection and quantification in riveted lap joints. Embedded lead zirconate titanate piezoelectric (PZT) ceramic wafer-type sensors are employed to perform in situ non-destructive evaluation (NDE) during fatigue cyclical loading. PZT wafers are used to monitor the wave reflection from the boundaries of the fatigue crack at the edge of bolt joints. The group velocity of the guided wave is calculated to select a proper time window in which the received signal contains the damage information. It is found that the fatigue crack lengths are correlated with three main features of the signal, i.e., correlation coefficient, amplitude change, and phase change. It was also observed that a single feature cannot be used to quantify the damage because considerable variability was observed in the response across specimens. A multi-feature integration method based on a second-order multivariate regression analysis is proposed for the prediction of fatigue crack lengths using sensor measurements. The model parameters are obtained using training datasets from five specimens. The effectiveness of the proposed methodology is demonstrated using several lap joint specimens from different manufacturers and under different loading conditions.
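A minimal sketch of the model form described, a second-order multivariate regression from the three signal features to crack length, fit by least squares; the feature values and coefficients below are invented for illustration:

```python
# Sketch of the second-order multivariate regression described in the abstract:
# crack length modelled on quadratic/interaction terms of the three signal
# features (correlation coefficient, amplitude change, phase change).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
n = 60                                              # training measurements
F = rng.uniform(size=(n, 3))                        # the three signal features
crack_len = (2.0 + 3.0 * F[:, 0] + 1.5 * F[:, 1] ** 2 + 0.5 * F[:, 0] * F[:, 2]
             + rng.normal(scale=0.05, size=n))      # synthetic ground truth (mm)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(F, crack_len)
new_feature_vector = np.array([[0.8, 0.4, 0.2]])
print("predicted crack length (mm):", model.predict(new_feature_vector)[0])
```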
Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves
2010-09-15
In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).
Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods.
Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara
2017-01-01
The sequencing of the transcriptomes of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data, but largely applied to single-cell data. We discuss results obtained on two real and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Globally, the results obtained in our study suggest that it is difficult to identify a best performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and gain better accuracy of results.
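On a synthetic dataset, precision and recall of each tool reduce to simple set bookkeeping against the genes simulated as differentially expressed. A minimal sketch with made-up gene identifiers:

```python
# Minimal sketch of computing precision and recall of a differential-expression
# call set against the genes known to be differentially expressed in a
# synthetic dataset (gene identifiers are made up for illustration).
def precision_recall(called, truth):
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


truth = {"GeneA", "GeneB", "GeneC", "GeneD"}         # simulated DE genes
called_by_tool = {"GeneA", "GeneB", "GeneX"}         # one tool's significant genes
print(precision_recall(called_by_tool, truth))       # (0.67, 0.5)
```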
Numerical study on simultaneous emission and transmission tomography in the MRI framework
NASA Astrophysics Data System (ADS)
Gjesteby, Lars; Cong, Wenxiang; Wang, Ge
2017-09-01
Multi-modality imaging methods are instrumental for advanced diagnosis and therapy. Specifically, a hybrid system that combines computed tomography (CT), nuclear imaging, and magnetic resonance imaging (MRI) will be a Holy Grail of medical imaging, delivering complementary structural/morphological, functional, and molecular information for precision medicine. A novel imaging method was recently demonstrated that takes advantage of radiotracer polarization to combine MRI principles with nuclear imaging. This approach allows the concentration of a polarized γ-ray emitting radioisotope to be imaged with MRI resolution, potentially outperforming the standard nuclear imaging mode, at a sensitivity significantly higher than that of MRI. In our work, we propose to acquire MRI-modulated nuclear data for simultaneous image reconstruction of both emission and transmission parameters, suggesting the potential for simultaneous CT-SPECT-MRI. The synchronized diverse datasets allow excellent spatiotemporal registration and unique insight into physiological and pathological features. Here we describe the methodology involving the system design with emphasis on the formulation for tomographic images, even when significant radiotracer signals are limited to a region of interest (ROI). Initial numerical results demonstrate the feasibility of our approach for reconstructing concentration and attenuation images through a head phantom with various radio-labeled ROIs. Additional considerations regarding the radioisotope characteristics are also discussed.
Olea, R.A.; Luppens, J.A.; Tewalt, S.J.
2011-01-01
A common practice for characterizing uncertainty in coal resource assessments has been the itemization of tonnage at the mining unit level and the classification of such units according to distance to drilling holes. Distance criteria, such as those used in U.S. Geological Survey Circular 891, are still widely used for public disclosure. A major deficiency of distance methods is that they do not provide a quantitative measure of uncertainty. Additionally, relying on distance between data points alone does not take into consideration other factors known to have an influence on uncertainty, such as spatial correlation, type of probability distribution followed by the data, geological discontinuities, and boundary of the deposit. Several geostatistical methods have been combined to formulate a quantitative characterization for appraising uncertainty. Drill hole datasets ranging from widespread exploration drilling to detailed development drilling from a lignite deposit in Texas were used to illustrate the modeling. The results show that distance to the nearest drill hole is almost completely unrelated to uncertainty, which confirms the inadequacy of characterizing uncertainty based solely on a simple classification of resources by distance classes. The more complex statistical methods used in this study quantify uncertainty and show good agreement between confidence intervals in the uncertainty predictions and data from additional drilling. © 2010.
Less is less: a systematic review of graph use in meta-analyses.
Schild, Anne H E; Voracek, Martin
2013-09-01
Graphs are an essential part of scientific communication. Complex datasets, of which meta-analyses are textbook examples, benefit the most from visualization. Although a number of graph options for meta-analyses exist, the extent to which these are used was hitherto unclear. A systematic review on graph use in meta-analyses in three disciplines (medicine, psychology, and business) and nine journals was conducted. Interdisciplinary differences, which are mirrored in the respective journals, were revealed, that is, graph use correlates with external factors rather than methodological considerations. There was only limited variation in graph types (with forest plots as the most important representatives), and diagnostic plots were very rare. Although an increase in graph use over time could be observed, it is unlikely that this phenomenon is specific to meta-analyses. There is a gaping discrepancy between available graphic methods and their application in meta-analyses. This may be rooted in a number of factors, namely, (i) insufficient dissemination of new developments, (ii) unsatisfactory implementation in software packages, and (iii) minor attention on graphics in meta-analysis reporting guidelines. Using visualization methods to their full capacity is a further step in using meta-analysis to its full potential. Copyright © 2013 John Wiley & Sons, Ltd.
Oztekin, Asil; Delen, Dursun; Kong, Zhenyu James
2009-12-01
Predicting the survival of heart-lung transplant patients has the potential to play a critical role in understanding and improving the matching procedure between the recipient and graft. Although voluminous data related to the transplantation procedures is being collected and stored, only a small subset of the predictive factors has been used in modeling heart-lung transplantation outcomes. The previous studies have mainly focused on applying statistical techniques to a small set of factors selected by the domain-experts in order to reveal the simple linear relationships between the factors and survival. The collection of methods known as 'data mining' offers significant advantages over conventional statistical techniques in dealing with the latter's limitations such as normality assumption of observations, independence of observations from each other, and linearity of the relationship between the observations and the output measure(s). There are statistical methods that overcome these limitations. Yet, they are computationally more expensive and do not provide fast and flexible solutions as do data mining techniques in large datasets. The main objective of this study is to improve the prediction of outcomes following combined heart-lung transplantation by proposing an integrated data-mining methodology. A large and feature-rich dataset (16,604 cases with 283 variables) is used to (1) develop machine learning based predictive models and (2) extract the most important predictive factors. Then, using three different variable selection methods, namely, (i) machine learning methods driven variables-using decision trees, neural networks, logistic regression, (ii) the literature review-based expert-defined variables, and (iii) common sense-based interaction variables, a consolidated set of factors is generated and used to develop Cox regression models for heart-lung graft survival. The predictive models' performance in terms of 10-fold cross-validation accuracy rates for two multi-imputed datasets ranged from 79% to 86% for neural networks, from 78% to 86% for logistic regression, and from 71% to 79% for decision trees. The results indicate that the proposed integrated data mining methodology using Cox hazard models better predicted the graft survival with different variables than the conventional approaches commonly used in the literature. This result is validated by the comparison of the corresponding Gains charts for our proposed methodology and the literature review based Cox results, and by the comparison of Akaike information criteria (AIC) values received from each. Data mining-based methodology proposed in this study reveals that there are undiscovered relationships (i.e. interactions of the existing variables) among the survival-related variables, which helps better predict the survival of the heart-lung transplants. It also brings a different set of variables into the scene to be evaluated by the domain-experts and be considered prior to the organ transplantation.
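As a hedged sketch of the final modelling step described above, a Cox proportional-hazards model on a consolidated predictor set, the lifelines package is one common implementation; the column names and data below are hypothetical, not the registry variables used in the study:

```python
# Sketch of the final modelling step described in the abstract: a Cox
# proportional-hazards model fit on a consolidated set of predictors.
# Column names are hypothetical; lifelines is one common implementation.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame({
    "recipient_age": rng.normal(45, 12, n),
    "ischemic_time_hr": rng.normal(4, 1, n),
    "donor_recipient_weight_ratio": rng.normal(1.0, 0.15, n),
    "survival_days": rng.exponential(1500, n).round() + 1,
    "event_observed": rng.integers(0, 2, n),          # 1 = graft failure/death
})

cph = CoxPHFitter()
cph.fit(df, duration_col="survival_days", event_col="event_observed")
cph.print_summary()                                   # hazard ratios per predictor
```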
Team building: conceptual, methodological, and applied considerations.
Beauchamp, Mark R; McEwan, Desmond; Waldhauser, Katrina J
2017-08-01
Team building has been identified as an important method of improving the psychological climate in which teams operate, as well as overall team functioning. Within the context of sports, team building interventions have consistently been found to result in improvements in team effectiveness. In this paper we review the extant literature on team building in sport, and address a range of conceptual, methodological, and applied considerations that have the potential to advance theory, research, and applied intervention initiatives within the field. This involves expanding the scope of team building strategies that have, to date, primarily focused on developing group cohesion. Copyright © 2017 Elsevier Ltd. All rights reserved.
Dworkin, Robert H; Turk, Dennis C; Peirce-Sandner, Sarah; Baron, Ralf; Bellamy, Nicholas; Burke, Laurie B; Chappell, Amy; Chartier, Kevin; Cleeland, Charles S; Costello, Ann; Cowan, Penney; Dimitrova, Rozalina; Ellenberg, Susan; Farrar, John T; French, Jacqueline A; Gilron, Ian; Hertz, Sharon; Jadad, Alejandro R; Jay, Gary W; Kalliomäki, Jarkko; Katz, Nathaniel P; Kerns, Robert D; Manning, Donald C; McDermott, Michael P; McGrath, Patrick J; Narayana, Arvind; Porter, Linda; Quessy, Steve; Rappaport, Bob A; Rauschkolb, Christine; Reeve, Bryce B; Rhodes, Thomas; Sampaio, Cristina; Simpson, David M; Stauffer, Joseph W; Stucki, Gerold; Tobias, Jeffrey; White, Richard E; Witter, James
2010-05-01
There has been an increase in the number of chronic pain clinical trials in which the treatments being evaluated did not differ significantly from placebo in the primary efficacy analyses despite previous research suggesting that efficacy could be expected. These findings could reflect a true lack of efficacy or methodological and other aspects of these trials that compromise the demonstration of efficacy. There is substantial variability among chronic pain clinical trials with respect to important research design considerations, and identifying and addressing any methodological weaknesses would enhance the likelihood of demonstrating the analgesic effects of new interventions. An IMMPACT consensus meeting was therefore convened to identify the critical research design considerations for confirmatory chronic pain trials and to make recommendations for their conduct. We present recommendations for the major components of confirmatory chronic pain clinical trials, including participant selection, trial phases and duration, treatment groups and dosing regimens, and types of trials. Increased attention to and research on the methodological aspects of confirmatory chronic pain clinical trials has the potential to enhance their assay sensitivity and ultimately provide more meaningful evaluations of treatments for chronic pain. Copyright 2010 International Association for the Study of Pain. All rights reserved.
Topology optimization for nonlinear dynamic problems: Considerations for automotive crashworthiness
NASA Astrophysics Data System (ADS)
Kaushik, Anshul; Ramani, Anand
2014-04-01
Crashworthiness of automotive structures is most often engineered after an optimal topology has been arrived at using other design considerations. This study is an attempt to incorporate crashworthiness requirements upfront in the topology synthesis process using a mathematically consistent framework. It proposes the use of equivalent linear systems from the nonlinear dynamic simulation in conjunction with a discrete-material topology optimizer. Velocity and acceleration constraints are consistently incorporated in the optimization set-up. Issues specific to crash problems due to the explicit solution methodology employed, nature of the boundary conditions imposed on the structure, etc. are discussed and possible resolutions are proposed. A demonstration of the methodology on two-dimensional problems that address some of the structural requirements and the types of loading typical of frontal and side impact is provided in order to show that this methodology has the potential for topology synthesis incorporating crashworthiness requirements.
Parks, Nathan A.
2013-01-01
The simultaneous application of transcranial magnetic stimulation (TMS) with non-invasive neuroimaging provides a powerful method for investigating functional connectivity in the human brain and the causal relationships between areas in distributed brain networks. TMS has been combined with numerous neuroimaging techniques including, electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and positron emission tomography (PET). Recent work has also demonstrated the feasibility and utility of combining TMS with non-invasive near-infrared optical imaging techniques, functional near-infrared spectroscopy (fNIRS) and the event-related optical signal (EROS). Simultaneous TMS and optical imaging affords a number of advantages over other neuroimaging methods but also involves a unique set of methodological challenges and considerations. This paper describes the methodology of concurrently performing optical imaging during the administration of TMS, focusing on experimental design, potential artifacts, and approaches to controlling for these artifacts. PMID:24065911
Single-case research design in pediatric psychology: considerations regarding data analysis.
Cohen, Lindsey L; Feinstein, Amanda; Masuda, Akihiko; Vowles, Kevin E
2014-03-01
Single-case research allows for an examination of behavior and can demonstrate the functional relation between intervention and outcome in pediatric psychology. This review highlights key assumptions, methodological and design considerations, and options for data analysis. Single-case methodology and guidelines are reviewed with an in-depth focus on visual and statistical analyses. Guidelines allow for the careful evaluation of design quality and visual analysis. A number of statistical techniques have been introduced to supplement visual analysis, but to date, there is no consensus on their recommended use in single-case research design. Single-case methodology is invaluable for advancing pediatric psychology science and practice, and guidelines have been introduced to enhance the consistency, validity, and reliability of these studies. Experts generally agree that visual inspection is the optimal method of analysis in single-case design; however, statistical approaches are becoming increasingly evaluated and used to augment data interpretation.
Initialization and Setup of the Coastal Model Test Bed: STWAVE
2017-01-01
Laboratory (CHL) Field Research Facility (FRF) in Duck, NC. The improved evaluation methodology will promote rapid enhancement of model capability and focus...Blanton 2008) study. This regional digital elevation model (DEM), with a cell size of 10 m, was generated from numerous datasets collected at different...INFORMATION: For additional information, contact Spicer Bak, Coastal Observation and Analysis Branch, Coastal and Hydraulics Laboratory, 1261 Duck Road
Nikolay Strigul; Jean Lienard
2015-01-01
Forest inventory datasets offer unprecedented opportunities to model forest dynamics under evolving environmental conditions but they are analytically challenging due to irregular sampling time intervals of the same plot, across the years. We propose here a novel method to model dynamic changes in forest biomass and basal area using forest inventory data. Our...
2018-02-01
similar methodology as the author’s example was conducted to prepare this dataset for processing via the SGM algorithm. Since and ′ are...
NASA Astrophysics Data System (ADS)
Grova, C.; Jannin, P.; Biraben, A.; Buvat, I.; Benali, H.; Bernard, A. M.; Scarabin, J. M.; Gibaud, B.
2003-12-01
Quantitative evaluation of brain MRI/SPECT fusion methods for normal and in particular pathological datasets is difficult, due to the frequent lack of relevant ground truth. We propose a methodology to generate MRI and SPECT datasets dedicated to the evaluation of MRI/SPECT fusion methods and illustrate the method when dealing with ictal SPECT. The method consists in generating normal or pathological SPECT data perfectly aligned with a high-resolution 3D T1-weighted MRI using realistic Monte Carlo simulations that closely reproduce the response of a SPECT imaging system. Anatomical input data for the SPECT simulations are obtained from this 3D T1-weighted MRI, while functional input data result from an inter-individual analysis of anatomically standardized SPECT data. The method makes it possible to control the 'brain perfusion' function by proposing a theoretical model of brain perfusion from measurements performed on real SPECT images. Our method provides an absolute gold standard for assessing MRI/SPECT registration method accuracy since, by construction, the SPECT data are perfectly registered with the MRI data. The proposed methodology has been applied to create a theoretical model of normal brain perfusion and ictal brain perfusion characteristic of mesial temporal lobe epilepsy. To approach realistic and unbiased perfusion models, real SPECT data were corrected for uniform attenuation, scatter and partial volume effect. An anatomic standardization was used to account for anatomic variability between subjects. Realistic simulations of normal and ictal SPECT deduced from these perfusion models are presented. The comparison of real and simulated SPECT images showed relative differences in regional activity concentration of less than 20% in most anatomical structures, for both normal and ictal data, suggesting realistic models of perfusion distributions for evaluation purposes. Inter-hemispheric asymmetry coefficients measured on simulated data were found within the range of asymmetry coefficients measured on corresponding real data. The features of the proposed approach are compared with those of other methods previously described to obtain datasets appropriate for the assessment of fusion methods.
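The abstract reports inter-hemispheric asymmetry coefficients without giving their definition; a commonly used form is asymmetry = 200 × (L − R) / (L + R) per region, sketched below as an assumption rather than as the authors' exact index:

```python
# The abstract quotes inter-hemispheric asymmetry coefficients without giving
# the formula; a commonly used definition is sketched here (an assumption, not
# necessarily the exact index used by the authors).
import numpy as np


def asymmetry_index(left_counts, right_counts):
    """Percent asymmetry: 200 * (L - R) / (L + R) per region."""
    L = np.asarray(left_counts, dtype=float)
    R = np.asarray(right_counts, dtype=float)
    return 200.0 * (L - R) / (L + R)


# mean regional activity concentration per hemisphere (arbitrary units)
left = [105.0, 98.0, 120.0]     # e.g. mesial temporal, lateral temporal, frontal
right = [95.0, 101.0, 118.0]
print(asymmetry_index(left, right))
```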
Updated population metadata for United States historical climatology network stations
Owen, T.W.; Gallo, K.P.
2000-01-01
The United States Historical Climatology Network (HCN) serial temperature dataset is comprised of 1221 high-quality, long-term climate observing stations. The HCN dataset is available in several versions, one of which includes population-based temperature modifications to adjust urban temperatures for the "heat-island" effect. Unfortunately, the decennial population metadata file is not complete, as missing values are present for 17.6% of the 12 210 population values associated with the 1221 individual stations during the 1900-90 interval. Retrospective grid-based populations, within a fixed distance of an HCN station, were estimated through the use of a gridded population density dataset and historically available U.S. Census county data. The grid-based populations for the HCN stations provide values derived from a consistent methodology, in contrast to the current HCN populations, which can vary as definitions of the area associated with a city change over time. The use of grid-based populations may minimally be appropriate to augment populations for HCN climate stations that lack any population data, and is recommended when consistent and complete population data are required. The recommended urban temperature adjustments based on the HCN and grid-based methods of estimating station population can be significantly different for individual stations within the HCN dataset.
Parton Distributions based on a Maximally Consistent Dataset
NASA Astrophysics Data System (ADS)
Rojo, Juan
2016-04-01
The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, finding also good consistency with the global fit result. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not affect substantially the global fit PDFs.
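The Bayesian reweighting step mentioned above assigns each Monte Carlo replica a weight from its χ² against the new dataset. The sketch below uses the Giele–Keller form of the weights familiar from the NNPDF reweighting literature, w_k ∝ (χ²_k)^((n−1)/2) exp(−χ²_k/2); treat it as an assumption about the general method rather than the exact prescription of this contribution:

```python
# Sketch of Bayesian reweighting of Monte Carlo PDF replicas (weights in the
# Giele-Keller form used in the NNPDF reweighting literature; the exact
# prescription of this contribution is given in the paper itself).
import numpy as np


def reweight(chi2, n_data):
    """Return normalised replica weights from chi^2 against the new dataset."""
    logw = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2    # work in log space
    logw -= logw.max()                                       # numerical stability
    w = np.exp(logw)
    return len(chi2) * w / w.sum()                           # mean weight = 1


def effective_replicas(w):
    """Shannon-entropy measure of how many replicas remain effective."""
    p = w / w.sum()
    return np.exp(-np.sum(p * np.log(p)))


rng = np.random.default_rng(5)
n_data = 30
chi2 = rng.chisquare(n_data, size=100)      # chi^2 of 100 replicas vs. new data
w = reweight(chi2, n_data)
print("N_eff:", effective_replicas(w))
```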
Learning in data-limited multimodal scenarios: Scandent decision forests and tree-based features.
Hor, Soheil; Moradi, Mehdi
2016-12-01
Incomplete and inconsistent datasets often pose difficulties in multimodal studies. We introduce the concept of scandent decision trees to tackle these difficulties. Scandent trees are decision trees that optimally mimic the partitioning of the data determined by another decision tree, and crucially, use only a subset of the feature set. We show how scandent trees can be used to enhance the performance of decision forests trained on a small number of multimodal samples when we have access to larger datasets with vastly incomplete feature sets. Additionally, we introduce the concept of tree-based feature transforms in the decision forest paradigm. When combined with scandent trees, the tree-based feature transforms enable us to train a classifier on a rich multimodal dataset, and use it to classify samples with only a subset of features of the training data. Using this methodology, we build a model trained on MRI and PET images of the ADNI dataset, and then test it on cases with only MRI data. We show that this is significantly more effective in staging of cognitive impairments compared to a similar decision forest model trained and tested on MRI only, or one that uses other kinds of feature transform applied to the MRI data. Copyright © 2016. Published by Elsevier B.V.
Key Methodological Aspects of Translators' Training in Ukraine and in the USA
ERIC Educational Resources Information Center
Skyba, Kateryna
2015-01-01
The diversity of international relations in the globalized world has influenced the role of a translator that is becoming more and more important. Translators' training institutions today are to work out and to implement the best teaching methodology taking into consideration the new challenges of modern multinational and multicultural society.…
Practical Considerations for Conducting Delphi Studies: The Oracle Enters a New Age.
ERIC Educational Resources Information Center
Eggers, Renee M.; Jones, Charles M.
1998-01-01
In addition to giving an overview of Delphi methodology and describing the methodology used by the researchers in two Delphi studies, the authors provide information about electronic communication in Delphi studies. Also provided are suggestions that can be used in a Delphi study involving any form of communication. (SLD)
Toward an Affinity Space Methodology: Considerations for Literacy Research
ERIC Educational Resources Information Center
Lammers, Jayne C.; Curwood, Jen Scott; Magnifico, Alecia Marie
2012-01-01
As researchers seek to make sense of young people's online literacy practices and participation, questions of methodology are important to consider. In our work to understand the culture of physical, virtual and blended spheres that adolescents inhabit, we find it necessary to expand Gee's (2004) notion of affinity spaces. In this article, we draw…
ERIC Educational Resources Information Center
Matthews, Michael R.
2004-01-01
Galileo's discovery of the properties of pendulum motion depended on his adoption of the novel methodology of idealisation. Galileo's laws of pendulum motion could not be accepted until the empiricist methodological constraints placed on science by Aristotle, and by common sense, were overturned. As long as scientific claims were judged by how the…
ERIC Educational Resources Information Center
Wylie, Ruth C.
This volume of the revised edition describes and evaluates measurement methods, research designs, and procedures which have been or might appropriately be used in self-concept research. Working from the perspective that self-concept or phenomenal personality theories can be scientifically investigated, methodological flaws and questionable…
Researching Education Policy in a Globalized World: Theoretical and Methodological Considerations
ERIC Educational Resources Information Center
Lingard, Bob
2009-01-01
This paper shows how globalization has given rise to a number of new theoretical and methodological issues for doing education policy analysis linked to globalization's impact within critical social science. Critical policy analysis has always required critical "reflexivity" and awareness of the "positionality" of the policy analyst. However, as…
Johnstone, Daniel M.; Riveros, Carlos; Heidari, Moones; Graham, Ross M.; Trinder, Debbie; Berretta, Regina; Olynyk, John K.; Scott, Rodney J.; Moscato, Pablo; Milward, Elizabeth A.
2013-01-01
While Illumina microarrays can be used successfully for detecting small gene expression changes due to their high degree of technical replicability, there is little information on how different normalization and differential expression analysis strategies affect outcomes. To evaluate this, we assessed concordance across gene lists generated by applying different combinations of normalization strategy and analytical approach to two Illumina datasets with modest expression changes. In addition to using traditional statistical approaches, we also tested an approach based on combinatorial optimization. We found that the choice of both normalization strategy and analytical approach considerably affected outcomes, in some cases leading to substantial differences in gene lists and subsequent pathway analysis results. Our findings suggest that important biological phenomena may be overlooked when there is a routine practice of using only one approach to investigate all microarray datasets. Analytical artefacts of this kind are likely to be especially relevant for datasets involving small fold changes, where inherent technical variation—if not adequately minimized by effective normalization—may overshadow true biological variation. This report provides some basic guidelines for optimizing outcomes when working with Illumina datasets involving small expression changes. PMID:27605185
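Concordance across gene lists can be made concrete with a simple pairwise overlap measure; the sketch below uses the Jaccard index with made-up pipeline labels and gene identifiers, as an illustration rather than the measure used in the paper:

```python
# Sketch of quantifying concordance between gene lists produced by different
# normalization/analysis combinations (Jaccard index used for illustration;
# pipeline labels and gene identifiers are made up).
from itertools import combinations


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


gene_lists = {
    "normalizationA+statistical":    {"G1", "G2", "G3", "G5"},
    "normalizationB+statistical":    {"G1", "G2", "G4"},
    "normalizationA+combinatorial":  {"G1", "G3", "G5", "G6"},
}
for (name1, l1), (name2, l2) in combinations(gene_lists.items(), 2):
    print(f"{name1} vs {name2}: Jaccard = {jaccard(l1, l2):.2f}")
```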
Sorting protein decoys by machine-learning-to-rank
Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen
2016-01-01
Much progress has been made in protein structure prediction during the last few decades. As the predicted models can span a broad range of the accuracy spectrum, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods could be roughly divided into three categories: single-model methods, clustering-based methods and quasi single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. The five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms other methods whose outputs are taken as features of the proposed method, and the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in corresponding categories. In particular, the Quasi-MQAPRank method achieves considerable performance on the CASP11 Best150 dataset. PMID:27530967
Capturing Data Connections within the Climate Data Initiative to Support Resiliency
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Bugbee, K.; Weigel, A. M.; Tilmes, C.
2015-12-01
The Climate Data Initiative (CDI) focuses on preparing the United States for the impacts of climate change by leveraging existing federal climate-relevant data to stimulate innovation and private-sector entrepreneurship supporting national climate-change preparedness. To achieve these goals, relevant data was curated around seven thematic areas relevant to climate change resiliency. Data for each theme was selected by subject matter experts from various Federal agencies and collected in Data.gov at http://climate.data.gov. While the curation effort for each theme has been immensely valuable on its own, in the end the themes essentially become a long directory or list, and valuable connections between datasets and their intended uses are lost. The user understands that the datasets in the list have been approved by the CDI subject matter experts but has less certainty when making connections between the various datasets and their possible applications. Additionally, the curated list can be overwhelming and its intended use difficult to interpret. In order to better address the needs of the CDI data end users, the CDI team has been developing a new controlled vocabulary that will assist in capturing connections between datasets. This new vocabulary will be implemented in the Global Change Information System (GCIS), which has the capability to link individual items within the system. This presentation will highlight the methodology used to develop the controlled vocabulary that will aid end users in both understanding and locating relevant datasets for their intended use.
Hamdan, Sadeque; Cheaitou, Ali
2017-08-01
This data article provides detailed optimization input and output datasets and optimization code for the published research work titled "Dynamic green supplier selection and order allocation with quantity discounts and varying supplier availability" (Hamdan and Cheaitou, 2017, In press) [1]. Researchers may use these datasets as a baseline for future comparison and extensive analysis of the green supplier selection and order allocation problem with all-unit quantity discount and varying number of suppliers. More particularly, the datasets presented in this article allow researchers to generate the exact optimization outputs obtained by the authors of Hamdan and Cheaitou (2017, In press) [1] using the provided optimization code and then to use them for comparison with the outputs of other techniques or methodologies such as heuristic approaches. Moreover, this article includes the randomly generated optimization input data and the related outputs that are used as input data for the statistical analysis presented in Hamdan and Cheaitou (2017, In press) [1], in which two different approaches for ranking potential suppliers are compared. This article also provides the time analysis data used in Hamdan and Cheaitou (2017, In press) [1] to study the effect of the problem size on the computation time, as well as an additional time analysis dataset. The input data for the time study are generated randomly, with the problem size varied, and are then used by the optimization problem to obtain the corresponding optimal outputs as well as the corresponding computation time.
Watson, Nathanial E; Parsons, Brendon A; Synovec, Robert E
2016-08-12
Performance of tile-based Fisher Ratio (F-ratio) data analysis, recently developed for discovery-based studies using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC×GC-TOFMS), is evaluated with a metabolomics dataset that had been previously analyzed in great detail, but while taking a brute force approach. The previously analyzed data (referred to herein as the benchmark dataset) were intracellular extracts from Saccharomyces cerevisiae (yeast), either metabolizing glucose (repressed) or ethanol (derepressed), which define the two classes in the discovery-based analysis to find metabolites that are statistically different in concentration between the two classes. Beneficially, this previously analyzed dataset provides a concrete means to validate the tile-based F-ratio software. Herein, we demonstrate and validate the significant benefits of applying tile-based F-ratio analysis. The yeast metabolomics data are analyzed more rapidly in about one week versus one year for the prior studies with this dataset. Furthermore, a null distribution analysis is implemented to statistically determine an adequate F-ratio threshold, whereby the variables with F-ratio values below the threshold can be ignored as not class distinguishing, which provides the analyst with confidence when analyzing the hit table. Forty-six of the fifty-four benchmarked changing metabolites were discovered by the new methodology while consistently excluding all but one of the benchmarked nineteen false positive metabolites previously identified. Copyright © 2016 Elsevier B.V. All rights reserved.
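A hedged sketch of the two statistical ingredients named above, a per-tile Fisher ratio (between-class versus within-class variance of a tile's summed signal) and a class-label-permutation null distribution used to pick the F-ratio threshold; the tile signals here are simulated, whereas the actual software operates on full GC×GC-TOFMS tiles:

```python
# Sketch of the two statistical ingredients named in the abstract: a per-tile
# Fisher ratio (between-class vs. within-class variance of a tile's signal) and
# a class-label-permutation null distribution used to set the threshold.
import numpy as np


def fisher_ratio(values, labels):
    """One-way between/within variance ratio for one tile's summed signal."""
    values, labels = np.asarray(values, float), np.asarray(labels)
    grand = values.mean()
    groups = [values[labels == g] for g in np.unique(labels)]
    between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (len(groups) - 1)
    within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (len(values) - len(groups))
    return between / within


def null_threshold(values, labels, n_perm=1000, quantile=0.95, seed=0):
    """F-ratio threshold from a null distribution built by permuting class labels."""
    rng = np.random.default_rng(seed)
    null = [fisher_ratio(values, rng.permutation(labels)) for _ in range(n_perm)]
    return np.quantile(null, quantile)


rng = np.random.default_rng(2)
labels = np.array(["repressed"] * 6 + ["derepressed"] * 6)
tile_signal = np.r_[rng.normal(10, 1, 6), rng.normal(13, 1, 6)]   # class-distinguishing tile
print("F-ratio:", fisher_ratio(tile_signal, labels))
print("null 95% threshold:", null_threshold(tile_signal, labels))
```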
Knai, Cécile; Brusamento, Serena; Legido-Quigley, Helena; Saliba, Vanessa; Panteli, Dimitra; Turk, Eva; Car, Josip; McKee, Martin; Busse, Reinhard
2012-10-01
The use of evidence-based clinical guidelines is an essential component of chronic disease management. However, there is well-documented concern about variability in the quality of clinical guidelines, with evidence of persisting methodological shortcomings. The most widely accepted approach to assessing the quality of guidelines is the Appraisal of Guidelines for Research and Evaluation (AGREE) instrument. We have conducted a systematic review of the methodological quality (as assessed by AGREE) of clinical guidelines developed in Europe for the management of chronic diseases published since 2000. The systematic review was undertaken in accordance with the Cochrane methodology. The inclusion criteria were that studies should have appraised European clinical guidelines for certain selected chronic disorders using the AGREE instrument. We searched five databases (Cab Abstracts, EMBASE, MEDLINE, Trip and EPPI). Nine studies reported in 10 papers, analysing a total of 28 European guidelines from eight countries as well as pan-European, were included. There was considerable variation in the quality of clinical guidelines across the AGREE domains. The least well addressed domains were 'editorial independence' (with a mean domain score of 41%), 'applicability' (44%), 'stakeholder involvement' (55%), and 'rigour of development' (64%), while 'clarity of presentation' (80%) and 'scope and purpose' (84%) were less problematic. This review indicates that there is considerable scope for improvement in the methods used to develop clinical guidelines for the prevention, management and treatment of chronic diseases in Europe. Given the importance of decision support strategies such as clinical guidelines in chronic disease management, improvement measures should include the explicit and transparent involvement of key stakeholders (especially scientific experts, guideline users and methodological specialists) and consideration of the implications for guideline implementation and applicability early on in the process. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Gratia, Audrey; Merlet, Denis; Ducruet, Violette; Lyathaud, Cédric
2015-01-01
A nuclear magnetic resonance (NMR) methodology was assessed for the identification and quantification of additives in three types of polylactide (PLA) intended as food contact materials. Additives were identified using the LNE/NMR database, which clusters NMR datasets on more than 130 substances authorized by European Regulation No. 10/2011. Of the 12 additives spiked into the three types of PLA pellets, 10 were rapidly identified by the database and confirmed by spectral comparison. The levels of the 12 additives were estimated using quantitative NMR combined with graphical computation. A comparison with chromatographic methods supported the sensitivity of NMR, demonstrating an analytical difference of less than 15%. Our results therefore demonstrated the efficiency of the proposed NMR methodology for rapid assessment of the composition of PLA. Copyright © 2014 Elsevier B.V. All rights reserved.
Large Dataset of Acute Oral Toxicity Data Created for Testing ...
Acute toxicity data is a common requirement for substance registration in the US. Currently only data derived from animal tests are accepted by regulatory agencies, and the standard in vivo tests use lethality as the endpoint. Non-animal alternatives such as in silico models are being developed due to animal welfare and resource considerations. We compiled a large dataset of oral rat LD50 values to assess the predictive performance of currently available in silico models. Our dataset combines LD50 values from five different sources: literature data provided by The Dow Chemical Company, REACH data from eChemportal, HSDB (Hazardous Substances Data Bank), RTECS data from Leadscope, and the training set underpinning TEST (Toxicity Estimation Software Tool). Combined, these data sources yield 33,848 chemical-LD50 pairs (data points), with 23,475 unique data points covering 16,439 compounds. The entire dataset was loaded into a chemical properties database. All of the compounds were registered in DSSTox and 59.5% have publicly available structures. Compounds without a structure in DSSTox are currently having their structures registered. The structural data will be used to evaluate the predictive performance and applicable chemical domains of three QSAR models (TIMES, PROTOX, and TEST). Future work will combine the dataset with information from ToxCast assays and, using random forest modeling, assess whether ToxCast assays are useful in predicting acute oral toxicity.
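A compilation step like the one described, combining several sources and counting unique chemical-value pairs, could look roughly like the sketch below. This is not the actual pipeline; file names and column names (casrn, ld50_mg_per_kg) are hypothetical placeholders.

```python
# Illustrative merge of LD50 records from several source files into one table,
# followed by deduplication of chemical-LD50 pairs and a compound count.
import pandas as pd

sources = {"dow": "dow_ld50.csv", "reach": "echemportal_ld50.csv",
           "hsdb": "hsdb_ld50.csv", "rtecs": "rtecs_ld50.csv",
           "test": "test_training_set.csv"}

frames = []
for name, path in sources.items():
    df = pd.read_csv(path, usecols=["casrn", "ld50_mg_per_kg"])
    df["source"] = name
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)                       # all chemical-LD50 pairs
unique_pairs = combined.drop_duplicates(subset=["casrn", "ld50_mg_per_kg"])
n_compounds = unique_pairs["casrn"].nunique()
```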
Sylvetsky, Allison C.; Blau, Jenny E.; Rother, Kristina I.
2016-01-01
Consumption of foods, beverages, and packets containing low-calorie sweeteners (LCS) has increased markedly across gender, age, race/ethnicity, weight status, and socioeconomic subgroups. However, well-controlled intervention studies rigorously evaluating the health effects of LCS in humans are limited. One of the key questions is whether LCS are indeed a beneficial strategy for weight management and prevention of obesity. The current review discusses several methodological considerations in the design and interpretation of these studies. Specifically, we focus on the selection of study participants, inclusion of an appropriate control, importance of considering habitual LCS exposure, selection of specific LCS, dose and route of LCS administration, choice of study outcomes, and the context and generalizability of the study findings. These critical considerations will guide the design of future studies and thus assist in understanding the health effects of LCS. PMID:26936185
Returns to Education in Rural China
ERIC Educational Resources Information Center
Zhao, Litao
2007-01-01
Based on one of the datasets most widely used by foreign-based sociologists, this paper examines the rate of returns to education in rural China. Compared with previous studies that showed rather low rates in rural areas throughout the 1980s, this study finds a considerably higher rate in 1996. A chief contributor is the rapid non-agricultural…
Schnohr, Christina W; Molcho, Michal; Rasmussen, Mette; Samdal, Oddrun; de Looze, Margreet; Levin, Kate; Roberts, Chris J; Ehlinger, Virginie; Krølner, Rikke; Dalmasso, Paola; Torsheim, Torbjørn
2015-04-01
This article presents the scope and development of the Health Behaviour in School-aged Children (HBSC) study, reviews trend papers published on international HBSC data up to 2012 and discusses the efforts made to produce reliable trend analyses. The major goal of this article is to present the statistical procedures and analytical strategies for upholding high data quality, as well as the authors' reflections on how to produce reliable trends based on an international study of the magnitude of the HBSC study. HBSC is an international cross-sectional study collecting data from adolescents aged 11-15 years on a broad variety of health determinants and health behaviours. A number of methodological challenges have stemmed from the growth of the HBSC study, in particular given that the study has a focus on monitoring trends. Some of those challenges are considered here. When analysing trends, researchers must be able to assess whether a change in prevalence is an expression of an actual change in the observed outcome, whether it is a result of methodological artefacts, or whether it is due to changes in the conceptualization of the outcome by the respondents. The article presents recommendations for taking a number of these considerations into account. These considerations imply methodological challenges, which are core issues in undertaking trend analyses. © The Author 2015. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.
Scardigno, Domenico; Fanelli, Emanuele; Viggiano, Annarita; Braccio, Giacobbe; Magi, Vinicio
2016-06-01
This article provides the dataset of operating conditions of a hybrid organic Rankine plant generated by the optimization procedure employed in the research article "A genetic optimization of a hybrid organic Rankine plant for solar and low-grade energy sources" (Scardigno et al., 2015) [1]. The methodology used to obtain the data is described. The operating conditions are subdivided into two separate groups: feasible and unfeasible solutions. In both groups, the values of the design variables are given. In addition, the subset of feasible solutions is described in detail, providing the thermodynamic and economic performances, the temperatures at some characteristic sections of the thermodynamic cycle, the net power, the absorbed powers and the area of the heat exchange surfaces.
Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.
2013-09-01
This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.
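The general pattern of GA-driven feature selection scored by a classification tree can be sketched as follows. This is an illustrative implementation under assumptions of my own (cross-validated accuracy as the fitness, one-point crossover, bit-flip mutation), not the authors' conditional-probability objective; X and y are hypothetical NHANES-style predictor and response arrays.

```python
# Genetic-algorithm feature selection with a binary classification tree as the scorer.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    if mask.sum() == 0:
        return 0.0
    clf = DecisionTreeClassifier(max_depth=4, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

def ga_select(X, y, pop_size=40, n_gen=30, p_mut=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))      # 1 = feature included
    for _ in range(n_gen):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)
            child = np.concatenate([a[:cut], b[cut:]])       # one-point crossover
            child ^= (rng.random(n_feat) < p_mut).astype(child.dtype)  # mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind, X, y) for ind in pop])]
    return np.flatnonzero(best)                              # indices of selected features
```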
Estimating Mixture of Gaussian Processes by Kernel Smoothing
Huang, Mian; Li, Runze; Wang, Hansheng; Yao, Weixin
2014-01-01
When the functional data are not homogeneous, e.g., there exist multiple classes of functional curves in the dataset, traditional estimation methods may fail. In this paper, we propose a new estimation procedure for the Mixture of Gaussian Processes, to incorporate both functional and inhomogeneous properties of the data. Our method can be viewed as a natural extension of high-dimensional normal mixtures. However, the key difference is that smoothed structures are imposed for both the mean and covariance functions. The model is shown to be identifiable, and can be estimated efficiently by a combination of the ideas from EM algorithm, kernel regression, and functional principal component analysis. Our methodology is empirically justified by Monte Carlo simulations and illustrated by an analysis of a supermarket dataset. PMID:24976675
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xiaoma; Zhou, Yuyu; Asrar, Ghassem R.
High spatiotemporal resolution air temperature (Ta) datasets are increasingly needed for assessing the impact of temperature change on people, ecosystems, and energy systems, especially in urban domains. However, such datasets are not widely available because of the large spatiotemporal heterogeneity of Ta caused by complex biophysical and socioeconomic factors such as built infrastructure and human activities. In this study, we developed a 1-km gridded dataset of daily minimum Ta (Tmin) and maximum Ta (Tmax), and the associated uncertainties, in urban and surrounding areas in the conterminous U.S. for the 2003–2016 period. Daily geographically weighted regression (GWR) models were developed and used to interpolate Ta using 1 km daily land surface temperature and elevation as explanatory variables. The leave-one-out cross-validation approach indicates that our method performs reasonably well, with root mean square errors of 2.1 °C and 1.9 °C, mean absolute errors of 1.5 °C and 1.3 °C, and R² of 0.95 and 0.97, for Tmin and Tmax, respectively. The resulting dataset reasonably captures the spatial heterogeneity of Ta in urban areas, and also effectively captures the urban heat island (UHI) phenomenon, in which Ta rises with increasing urban development (i.e., impervious surface area). The new dataset is valuable for studying the environmental impacts of urbanization, such as UHI and other related effects (e.g., on building energy consumption and human health). The proposed methodology also shows potential for building a long-term record of Ta worldwide, to fill the data gap that currently exists for studies of urban systems.
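The leave-one-out evaluation reported above can be reproduced in outline as follows. This is a simplified sketch: a global linear regression on LST and elevation stands in for the paper's daily geographically weighted regression, and the array names are hypothetical.

```python
# Leave-one-out cross-validation of a station-based Ta interpolator,
# reporting RMSE, MAE and R2 as in the study's evaluation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

def loocv_scores(X, y):
    """X: station predictors (e.g. 1-km LST, elevation); y: observed daily Tmin or Tmax."""
    preds = np.empty_like(y, dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    rmse = np.sqrt(np.mean((preds - y) ** 2))
    mae = np.mean(np.abs(preds - y))
    r2 = 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
    return rmse, mae, r2
```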
White blood cells identification system based on convolutional deep neural learning networks.
Shahin, A I; Guo, Yanhui; Amin, K M; Sharawi, Amr A
2017-11-16
White blood cell (WBC) differential counting yields valuable information about human health and disease. Currently available automated cell morphology equipment performs differential counts based on blood smear image analysis. Previous identification systems for WBCs consist of successive dependent stages: pre-processing, segmentation, feature extraction, feature selection, and classification. There is a real need to employ deep learning methodologies so that the performance of previous WBC identification systems can be increased. Classifying small, limited datasets with deep learning systems is a major challenge and should be investigated. In this paper, we propose a novel identification system for WBCs based on deep convolutional neural networks. Two methodologies based on transfer learning are followed: transfer learning based on deep activation features, and fine-tuning of existing deep networks. Deep activation features are extracted from several pre-trained networks and employed in a traditional identification system. Moreover, a novel end-to-end convolutional deep architecture called "WBCsNet" is proposed and built from scratch. Finally, a limited, balanced WBC dataset is classified with WBCsNet as a pre-trained network. During our experiments, three different public WBC datasets (2551 images) containing five healthy WBC types were used. The overall system accuracy achieved by the proposed WBCsNet is 96.1%, which exceeds that of the different transfer learning approaches and of the previous traditional identification system. We also present feature visualizations of the WBCsNet activations, which show a stronger response than those of the pre-trained networks. In summary, a novel WBC identification system based on deep learning is proposed, and the high-performance WBCsNet can be employed as a pre-trained network. Copyright © 2017. Published by Elsevier B.V.
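The "deep activation features" route can be illustrated with a generic pre-trained CNN used as a fixed feature extractor feeding a conventional classifier. This is a hedged sketch, not the WBCsNet architecture; the choice of ResNet-18, the image size, and the image/label variables are assumptions.

```python
# Transfer learning via deep activation features: pooled CNN activations + SVM.
import torch
import torch.nn as nn
from torchvision import models, transforms
from sklearn.svm import SVC

backbone = models.resnet18(pretrained=True)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1]).eval()  # drop final FC layer

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def activation_features(pil_images):
    """Return one pooled activation vector per blood-smear image."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    with torch.no_grad():
        feats = feature_extractor(batch).flatten(1)   # shape: (n_images, 512)
    return feats.numpy()

# X_train = activation_features(train_images)
# clf = SVC().fit(X_train, y_train)   # y_train: WBC type labels
```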
NASA Astrophysics Data System (ADS)
Garcia Galiano, S. G.; Olmos, P.; Giraldo Osorio, J. D.
2015-12-01
In the Mediterranean area, significant changes in temperature and precipitation are expected throughout the century. These trends could exacerbate the existing conditions in regions already vulnerable to climatic variability, reducing water availability. Improving knowledge about plausible impacts of climate change on water cycle processes at the basin scale is an important step toward building adaptive capacity in this region, where severe water shortages are expected in the coming decades. An ensemble of Regional Climate Models (RCMs) in combination with a distributed hydrological model with few parameters constitutes a valid and robust methodology for increasing the reliability of climate and hydrological projections. To reach this objective, a novel methodology for building RCM ensembles of meteorological variables (rainfall and temperatures) was applied. The evaluation of RCM goodness-of-fit used to build the ensemble is based on empirical probability density functions (PDFs) extracted from both the RCM datasets and a high-resolution gridded observational dataset for the period 1961-1990. The applied method considers the seasonal and annual variability of rainfall and temperatures. The RCM ensembles constitute the input to a distributed hydrological model at the basin scale for assessing the runoff projections. The selected hydrological model has few parameters in order to reduce the uncertainties involved. The study basin is a headwater basin of the Segura River Basin, located in southeastern Spain. The impacts on runoff and its trend, from both the observational dataset and the climate projections, were assessed. Relative to the control period 1961-1990, plausible significant decreases in runoff were identified for the period 2021-2050.
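One common way to score an RCM's goodness-of-fit against an observational PDF is the overlapping-area (Perkins-type) skill score; whether this matches the authors' exact measure is an assumption, and the variable names below are placeholders.

```python
# PDF-overlap skill score for ranking/weighting RCMs against gridded observations.
import numpy as np

def pdf_overlap_score(model_vals, obs_vals, bins=50):
    """Fraction of overlapping area between two empirical PDFs (1 = identical)."""
    lo = min(model_vals.min(), obs_vals.min())
    hi = max(model_vals.max(), obs_vals.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_model, _ = np.histogram(model_vals, bins=edges)
    p_obs, _ = np.histogram(obs_vals, bins=edges)
    p_model = p_model / p_model.sum()
    p_obs = p_obs / p_obs.sum()
    return np.minimum(p_model, p_obs).sum()

# scores = {name: pdf_overlap_score(rcm_rainfall[name], obs_rainfall) for name in rcm_rainfall}
# weights = {name: s / sum(scores.values()) for name, s in scores.items()}
```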
Ware, Jessica L; Grimaldi, David A; Engel, Michael S
2010-01-01
Among insects, eusocial behavior occurs in termites, ants, some bees and wasps. Isoptera and Hymenoptera convergently share social behavior, and for both taxa its evolution remains poorly understood. While dating analyses provide researchers with the opportunity to date the origin of eusociality, fossil calibration methodology may mislead subsequent ecological interpretations. Using a comprehensive termite dataset, we explored the effect of fossil placement and calibration methodology. A combined molecular and morphological dataset for 42 extant termite lineages was used, as well as a second dataset including these 42 taxa plus an additional 39 fossil lineages for which we had only morphological data. MrBayes doublet-model analyses recovered similar topologies, with one minor exception (Stolotermitidae is sister to the Hodotermitidae, s.s., in the 42-taxon analysis but is in a polytomy with Hodotermitidae and (Kalotermitidae + Neoisoptera) in the 81-taxon analysis). Analyses using the r8s program on these topologies were run with either minimum/maximum constraints (analysis a = 42-taxon and analysis c = 81-taxon analyses) or with the fossil taxon ages fixed (ages fixed to be the geological age of the deposit from which they came, analysis b = 81-taxon analysis). Confidence intervals were determined for the resulting ultrametric trees, and for most major clades there was significant overlap between dates recovered for analyses A and C (with exceptions, such as the Neoisoptera and Euisoptera nodes). With the exception of the isopteran and euisopteran node ages, however, none of the major clade ages overlapped when analysis B was compared with either analysis A or C. Future studies on Dictyoptera should note that the age of Kalotermitidae was underestimated in the absence of kalotermitid fossils with fixed ages. Copyright (c) 2009 Elsevier Ltd. All rights reserved.
Fast 3D shape screening of large chemical databases through alignment-recycling
Fontaine, Fabien; Bolton, Evan; Borodina, Yulia; Bryant, Stephen H
2007-01-01
Background Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes. Results Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures covering entirely the 3D shape space of the conformers of the dataset. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to drastically (100-fold) reduce the CPU time required for shape overlay. The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit list overlap on average. Conclusion Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time to perform shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape-space; overlay of the database conformers to the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation, required by the first two steps, is a significant cost of the method; however, once performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example, to handle more flexible and larger small-molecules are discussed. PMID:17880744
Dupree, Jean A.; Crowfoot, Richard M.
2012-01-01
This geodatabase and its component datasets are part of U.S. Geological Survey Digital Data Series 650 and were generated to store basin boundaries for U.S. Geological Survey streamgages and other sites in Colorado. The geodatabase and its components were created by the U.S. Geological Survey, Colorado Water Science Center, and are used to derive the numeric drainage areas for Colorado that are input into the U.S. Geological Survey's National Water Information System (NWIS) database and also published in the Annual Water Data Report and on NWISWeb. The foundational dataset used to create the basin boundaries in this geodatabase was the National Watershed Boundary Dataset. This geodatabase accompanies a U.S. Geological Survey Techniques and Methods report (Book 11, Section C, Chapter 6) entitled "Digital Database Architecture and Delineation Methodology for Deriving Drainage Basins, and Comparison of Digitally and Non-Digitally Derived Numeric Drainage Areas." The Techniques and Methods report details the geodatabase architecture, describes the delineation methodology and workflows used to develop these basin boundaries, and compares digitally derived numeric drainage areas in this geodatabase to non-digitally derived areas.
1. COBasins.gdb: This geodatabase contains site locations and basin boundaries for Colorado. It includes a single feature dataset, called BasinsFD, which groups the component feature classes and topology rules.
2. BasinsFD: This feature dataset in the "COBasins.gdb" geodatabase is a digital container that holds the feature classes used to archive site locations and basin boundaries as well as the topology rules that govern spatial relations within and among component feature classes. This feature dataset includes three feature classes: the sites for which basins have been delineated (the "Sites" feature class), basin bounding lines (the "BasinLines" feature class), and polygonal basin areas (the "BasinPolys" feature class). The feature dataset also stores the topology rules (the "BasinsFD_Topology") that constrain the relations within and among component feature classes. The feature dataset also forces any feature classes inside it to have a consistent projection system, which is, in this case, an Albers-Equal-Area projection system.
3. BasinsFD_Topology: This topology contains four persistent topology rules that constrain the spatial relations within the "BasinLines" feature class and between the "BasinLines" feature class and the "BasinPolys" feature classes.
4. Sites: This point feature class contains the digital representations of the site locations for which Colorado Water Science Center basin boundaries have been delineated. This feature class includes point locations for Colorado Water Science Center active (as of September 30, 2009) gages and for other sites.
5. BasinLines: This line feature class contains the perimeters of basins delineated for features in the "Sites" feature class, and it also contains information regarding the sources of lines used for the basin boundaries.
6. BasinPolys: This polygon feature class contains the polygonal basin areas delineated for features in the "Sites" feature class, and it is used to derive the numeric drainage areas published by the Colorado Water Science Center.
Norbury, Agnes; Seymour, Ben
2018-01-01
Response rates to available treatments for psychological and chronic pain disorders are poor, and there is a substantial burden of suffering and disability for patients, who often cycle through several rounds of ineffective treatment. As individuals presenting to the clinic with symptoms of these disorders are likely to be heterogeneous, there is considerable interest in the possibility that different constellations of signs could be used to identify subgroups of patients that might preferentially benefit from particular kinds of treatment. To this end, there has been a recent focus on the application of machine learning methods to attempt to identify sets of predictor variables (demographic, genetic, etc.) that could be used to target individuals towards treatments that are more likely to work for them in the first instance. Importantly, the training of such models generally relies on datasets where groups of individual predictor variables are labelled with a binary outcome category - usually 'responder' or 'non-responder' (to a particular treatment). However, as previously highlighted in other areas of medicine, there is a basic statistical problem in classifying individuals as 'responding' to a particular treatment on the basis of data from conventional randomized controlled trials. Specifically, insufficient information on the partition of variance components in individual symptom changes means that it is inappropriate to consider data from the active treatment arm alone in this way. This may be particularly problematic in the case of psychiatric and chronic pain symptom data, where both within-subject variability and measurement error are likely to be high. Here, we outline some possible solutions to this problem in terms of dataset design and machine learning methodology, and conclude that it is important to carefully consider the kind of inferences that particular training data are able to afford, especially in arenas where the potential clinical benefit is so large.
Mapping Palm Swamp Wetland Ecosystems in the Peruvian Amazon: a Multi-Sensor Remote Sensing Approach
NASA Astrophysics Data System (ADS)
Podest, E.; McDonald, K. C.; Schroeder, R.; Pinto, N.; Zimmerman, R.; Horna, V.
2012-12-01
Wetland ecosystems are prevalent in the Amazon basin, especially in northern Peru. Of specific interest are palm swamp wetlands because they are characterized by constant surface inundation and moderate seasonal water level variation. This combination of constantly saturated soils and warm temperatures year-round can lead to considerable methane release to the atmosphere. Because of the widespread occurrence and expected sensitivity of these ecosystems to climate change, it is critical to develop methods to quantify their spatial extent and inundation state in order to assess their carbon dynamics. Spatio-temporal information on palm swamps is difficult to gather because of their remoteness and difficult accessibility. Spaceborne microwave remote sensing is an effective tool for characterizing these ecosystems since it is sensitive to surface water and vegetation structure and allows monitoring large inaccessible areas on a temporal basis regardless of atmospheric conditions or solar illumination. We developed a remote sensing methodology using multi-sensor remote sensing data from the Advanced Land Observing Satellite (ALOS) Phased Array L-Band Synthetic Aperture Radar (PALSAR), Shuttle Radar Topography Mission (SRTM) DEM, and Landsat to derive maps at 100 meter resolution of palm swamp extent and inundation based on ground data collections; and combined active and passive microwave data from AMSR-E and QuikSCAT to derive inundation extent at 25 kilometer resolution on a weekly basis. We then compared information content and accuracy of the coarse resolution products relative to the high-resolution datasets. The synergistic combination of high and low resolution datasets allowed for characterization of palm swamps and assessment of their flooding status. This work has been undertaken partly within the framework of the JAXA ALOS Kyoto & Carbon Initiative. PALSAR data have been provided by JAXA. Portions of this work were carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.
Yang, Deying; Fu, Yan; Wu, Xuhang; Xie, Yue; Nie, Huaming; Chen, Lin; Nong, Xiang; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yan, Ning; Zhang, Runhui; Zheng, Wanpeng; Yang, Guangyou
2012-01-01
Background Taenia pisiformis is one of the most common intestinal tapeworms and can cause infections in canines. Adult T. pisiformis (canines as definitive hosts) and Cysticercus pisiformis (rabbits as intermediate hosts) cause significant health problems to the host and considerable socio-economic losses as a consequence. No complete genomic data regarding T. pisiformis are currently available in public databases. RNA-seq provides an effective approach to analyze the eukaryotic transcriptome to generate large functional gene datasets that can be used for further studies. Methodology/Principal Findings In this study, 2.67 million clean sequencing reads and 72,957 unigenes were generated using the RNA-seq technique. Based on a sequence similarity search with known proteins, a total of 26,012 unigenes (no redundancy) were identified after quality control procedures via the alignment of four databases. Overall, 15,920 unigenes were mapped to 203 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Through analyzing the glycolysis/gluconeogenesis and axonal guidance pathways, we achieved an in-depth understanding of the biochemistry of T. pisiformis. Here, we selected four unigenes at random and obtained their full-length cDNA clones using RACE PCR. Functional distribution characteristics were obtained by comparing four cestode species (72,957 unigenes of T. pisiformis; 30,700 ESTs of T. solium; 1,058 ESTs of Eg+Em, i.e., conserved ESTs between Echinococcus granulosus and Echinococcus multilocularis) using the cluster of orthologous groups (COG) and gene ontology (GO) functional classification systems. Furthermore, the conserved common genes in these four cestode species were obtained and aligned by the KEGG database. Conclusion This study provides an extensive transcriptome dataset obtained from the deep sequencing of T. pisiformis in a non-model whole genome. The identification of conserved genes may provide novel approaches for potential drug targets and vaccines against cestode infections. Research can now accelerate into the functional genomics, immunity and gene expression profiles of cestode species. PMID:22514598
Mass Spectrometry Imaging of Biological Tissue: An Approach for Multicenter Studies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rompp, Andreas; Both, Jean-Pierre; Brunelle, Alain
2015-03-01
Mass spectrometry imaging has become a popular tool for probing the chemical complexity of biological surfaces. This led to the development of a wide range of instrumentation and preparation protocols. It is thus desirable to evaluate and compare the data output from different methodologies and mass spectrometers. Here, we present an approach for the comparison of mass spectrometry imaging data from different laboratories (often referred to as multicenter studies). This is exemplified by the analysis of mouse brain sections in five laboratories in Europe and the USA. The instrumentation includes matrix-assisted laser desorption/ionization (MALDI)-time-of-flight (TOF), MALDI-QTOF, MALDI-Fourier transform ion cyclotron resonance (FTICR), atmospheric-pressure (AP)-MALDI-Orbitrap, and cluster TOF-secondary ion mass spectrometry (SIMS). Experimental parameters such as measurement speed, imaging bin width, and mass spectrometric parameters are discussed. All datasets were converted to the standard data format imzML and displayed in a common open-source software with identical parameters for visualization, which facilitates direct comparison of MS images. The imzML conversion also allowed exchange of fully functional MS imaging datasets between the different laboratories. The experiments ranged from overview measurements of the full mouse brain to detailed analysis of smaller features (depending on spatial resolution settings), but common histological features such as the corpus callosum were visible in all measurements. High spatial resolution measurements of AP-MALDI-Orbitrap and TOF-SIMS showed comparable structures in the low-micrometer range. We discuss general considerations for planning and performing multicenter studies in mass spectrometry imaging. This includes details on the selection, distribution, and preparation of tissue samples as well as on data handling. Such multicenter studies in combination with ongoing activities for reporting guidelines, a common data format (imzML) and a public data repository can contribute to more reliability and transparency of MS imaging studies.
NASA Astrophysics Data System (ADS)
Zolina, Olga; Simmer, Clemens; Kapala, Alice; Mächel, Hermann; Gulev, Sergey; Groisman, Pavel
2014-05-01
We present new high-resolution daily precipitation grids developed at the Meteorological Institute, University of Bonn and the German Weather Service (DWD) under the STAMMEX project (Spatial and Temporal Scales and Mechanisms of Extreme Precipitation Events over Central Europe). Daily precipitation grids have been developed from the daily-observing precipitation network of DWD, which runs one of the world's densest rain gauge networks, comprising more than 7500 stations. Several quality-controlled daily gridded products with homogenized sampling were developed covering the periods 1931-onwards (with 0.5 degree resolution), 1951-onwards (0.25 degree and 0.5 degree), and 1971-2000 (0.1 degree). Different methods were tested to select the best gridding methodology that minimizes errors of integral grid estimates over hilly terrain. Besides daily precipitation values with uncertainty estimates (which include standard estimates of the kriging uncertainty as well as error estimates derived by a bootstrapping algorithm), the STAMMEX data sets include a variety of statistics that characterize the temporal and spatial dynamics of the precipitation distribution (quantiles, extremes, wet/dry spells, etc.). Comparisons with existing continental-scale daily precipitation grids (e.g., CRU, ECA E-OBS, GCOS), which include considerably fewer observations than those used in STAMMEX, demonstrate the added value of high-resolution grids for extreme rainfall analyses. These data exhibit spatial variability patterns and trends in precipitation extremes which are missed or incorrectly reproduced over Central Europe by coarser-resolution grids based on sparser networks. The STAMMEX dataset can be used for high-quality climate diagnostics of precipitation variability, as a reference for reanalyses and remotely sensed precipitation products (including the upcoming Global Precipitation Mission products), and as input into regional climate and operational weather forecast models. We will present numerous applications of the STAMMEX grids, ranging from case studies of major Central European floods to long-term changes in different precipitation statistics, including those accounting for the alternation of dry and wet periods and precipitation intensities associated with prolonged rainy episodes.
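The bootstrap part of the uncertainty estimation can be illustrated with a toy resampling loop: stations are resampled with replacement, the grid is re-interpolated each time, and the spread across replicates gives a sampling-error map. Inverse-distance weighting stands in here for the kriging actually used for STAMMEX, and all names are hypothetical.

```python
# Bootstrap-based gridding uncertainty: resample stations, re-grid, take the spread.
import numpy as np

def idw(grid_xy, station_xy, station_vals, power=2.0):
    """Inverse-distance-weighted interpolation of station values onto grid points."""
    d = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-6) ** power
    return (w * station_vals[None, :]).sum(axis=1) / w.sum(axis=1)

def bootstrap_uncertainty(grid_xy, station_xy, station_vals, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    fields = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(station_vals), len(station_vals))
        fields.append(idw(grid_xy, station_xy[idx], station_vals[idx]))
    fields = np.asarray(fields)
    return fields.mean(axis=0), fields.std(axis=0)   # gridded estimate and its spread
```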
Basak, Subhash C; Majumdar, Subhabrata
2015-01-01
Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often useful in gathering useful information from the data. In this paper we consider two such recent methods: interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them to two datasets which have more predictors than samples (i.e. the n < p scenario) and several types of molecular descriptors. One of these datasets consists of a congeneric group of amines, while the other contains a much more diverse collection of compounds. The difference in prediction results between these two datasets for both methods supports the hypothesis that, for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the dataset grows more diverse, including a variety of descriptors can improve model quality considerably.
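For the n < p setting described here, a penalized regression with cross-validated shrinkage is the usual baseline. The sketch below shows ridge regression on a descriptor matrix with a q2-style leave-one-out validation; it is a generic illustration, not the authors' coupling with two-way clustering or envelope models, and X and y are hypothetical descriptor and activity arrays.

```python
# Ridge regression on an n < p molecular-descriptor matrix, with LOO q2 validation.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: n compounds x p descriptors (p >> n); y: measured property or activity.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 25)))

def loo_q2(model, X, y):
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)
```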
NASA Astrophysics Data System (ADS)
Easterday, K.; Kelly, M.; McIntyre, P. J.
2015-12-01
Climate change is forecasted to have considerable influence on the distribution, structure, and function of California's forests. However, human interactions with forested landscapes (e.g. fire suppression, resource extraction, etc.) have complicated scientific understanding of the relative contributions of climate change and anthropogenic land management practices as drivers of change. Observed changes in forest structure towards smaller, denser forests across California have been attributed to both climate change (e.g. increased temperatures and declining water availability) and management practices (e.g. fire suppression and logging). Disentangling how these drivers of change act both together and apart is important to developing sustainable policy and land management practices as well as enhancing knowledge of human and natural system interactions. To that end, a comprehensive historical dataset - the Vegetation Type Mapping project (VTM) - and a modern forest inventory dataset (FIA) are used to analyze how spatial variations in vegetation composition and structure over a ~100 year period can be explained by land ownership.
The uncertainties and causes of the recent changes in global evapotranspiration from 1982 to 2010
NASA Astrophysics Data System (ADS)
Dong, Bo; Dai, Aiguo
2017-07-01
Recent studies have shown considerable changes in terrestrial evapotranspiration (ET) since the early 1980s, but the causes of these changes remain unclear. In this study, the relative contributions of external climate forcing and internal climate variability to the recent ET changes are examined. Three datasets of global terrestrial ET and the CMIP5 multi-model ensemble-mean ET are analyzed to quantify the apparent and the externally forced ET changes, respectively, while the unforced ET variations are estimated as the apparent ET minus the forced component. Large discrepancies among the ET estimates, in terms of their trend, variability, and temperature- and precipitation-dependence, are found among the three datasets. Results show that the forced global-mean ET exhibits an upward trend of 0.08 mm day-1 century-1 from 1982 to 2010. The forced ET also contains considerable multi-year to decadal variations during the latter half of the 20th century that are caused by volcanic aerosols. The spatial patterns and interannual variations of the forced ET are more closely linked to precipitation than to temperature. After removing the forced component, the global-mean ET shows a trend ranging from -0.07 to 0.06 mm day-1 century-1 during 1982-2010, with varying spatial patterns among the three datasets. Furthermore, linkages between the unforced ET and internal climate modes are examined. Variations in Pacific sea surface temperatures (SSTs) are found to be consistently correlated with ET over many land areas among the ET datasets. The results suggest that there are large uncertainties in our current estimates of global terrestrial ET for the recent decades, and that the greenhouse gas (GHG) and aerosol external forcings account for a large part of the apparent trend in global-mean terrestrial ET since 1982, but Pacific SST and other internal climate variability dominate recent ET variations and changes over most regions.
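The forced/unforced decomposition described above amounts to simple time-series arithmetic once the ensemble-mean series is in hand. A minimal sketch, with hypothetical array names and the multi-model mean taken as the forced component per the study's approach:

```python
# Decompose an apparent ET time series into forced and unforced parts and
# report their linear trends in mm/day per century.
import numpy as np

def et_decomposition(et_obs, et_forced, years):
    """et_obs: observed global-mean ET; et_forced: CMIP5 multi-model mean ET (same years)."""
    et_unforced = et_obs - et_forced                      # apparent minus forced component
    slope_forced = np.polyfit(years, et_forced, 1)[0]     # mm/day per year
    slope_unforced = np.polyfit(years, et_unforced, 1)[0]
    return slope_forced * 100, slope_unforced * 100       # per-century trends
```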
Federal Register 2010, 2011, 2012, 2013, 2014
2013-08-09
...; methodological considerations that could affect the interpretation of or confidence in study results; and... summarizing key characteristics and findings from critical studies that EPA proposes to consider in..., screening studies for consideration, and selecting studies to include in evidence tables, are responsive to...
ERIC Educational Resources Information Center
Niles, Gloria Y.
2013-01-01
Using basic qualitative research methodology, the purpose for this dissertation study was to explore the language, social and learning considerations and subsequent actions taken by eight, bilingual, Hispanic-American mothers of children with autism between the ages of four and eight-years-old regarding speaking Spanish, English or both languages…
Report of an exploratory study: Safety and liability considerations for photovoltaic modules/panels
NASA Technical Reports Server (NTRS)
Weinstein, A. S.; Meeker, D. G.
1981-01-01
An overview of legal issues as they apply to design, manufacture and use of photovoltaic module/array devices is provided and a methodology is suggested for use of the design stage of these products to minimize or eliminate perceived hazards. Questions are posed to stimulate consideration of this area.
NASA Astrophysics Data System (ADS)
Lawler, D. M.
2008-01-01
In most episodic erosion and deposition systems, knowledge of the timing of geomorphological change, in relation to fluctuations in the driving forces, is crucial to strong erosion process inference, and model building, validation and development. A challenge for geomorphology, however, is that few studies have focused on geomorphological event structure (timing, magnitude, frequency and duration of individual erosion and deposition events), in relation to applied stresses, because of the absence of key monitoring methodologies. This paper therefore (a) presents full details of a new erosion and deposition measurement system — PEEP-3T — developed from the Photo-Electronic Erosion Pin sensor in five key areas, including the addition of nocturnal monitoring through the integration of the Thermal Consonance Timing (TCT) concept, to produce a continuous sensing system; (b) presents novel high-resolution datasets from the redesigned PEEP-3T system for the river bank systems of the Rivers Nidd and Wharfe, northern England, UK; and (c) comments on their potential for wider application throughout geomorphology to address these key measurement challenges. Relative to manual methods of erosion and deposition quantification, continuous PEEP-3T methodologies increase the temporal resolution of erosion/deposition event detection by more than three orders of magnitude (better than 1-second resolution if required), and this facility can significantly enhance process inference. Results show that river banks are highly dynamic thermally and respond quickly to radiation inputs. Data on bank retreat timing, fixed with PEEP-3T TCT evidence, confirmed that retreat events were significantly delayed, occurring up to 55 h after flood peaks. One event occurred 13 h after emergence from the flow. This suggests that mass failure processes rather than fluid entrainment dominated the system. It is also shown how, by integrating turbidity instrumentation with TCT ideas, linkages between sediment supply and sediment flux can be forged at event timescales, and a lack of sediment exhaustion was evident here. Five challenges for wider geomorphological process investigation are discussed. This event-based dynamics approach, based on continuous monitoring methodologies, appears to have considerably wider potential for stronger process inference and model testing and validation in many areas of geomorphology.
A new method for detecting, quantifying and monitoring diffuse contamination
NASA Astrophysics Data System (ADS)
Fabian, Karl; Reimann, Clemens; de Caritat, Patrice
2017-04-01
A new method is presented for detecting and quantifying diffuse contamination at the regional to continental scale. It is based on the analysis of cumulative distribution functions (CDFs) in cumulative probability (CP) plots for spatially representative datasets, preferably containing >1000 samples. Simulations demonstrate how different types of contamination influence elemental CDFs of different sample media. Contrary to common belief, diffuse contamination does not result in exceedingly high element concentrations in regional- to continental-scale datasets. Instead it produces a distinctive shift of concentrations in the background distribution of the studied element, resulting in a steeper data distribution in the CP plot. Via either (1) comparing the distribution of an element in top soil samples to the distribution of the same element in bottom soil samples from the same area, taking soil forming processes into consideration, or (2) comparing the distribution of the contaminating element (e.g., Pb) to that of an element with a geochemically comparable behaviour but no contamination source (e.g., Rb or Ba in the case of Pb), the relative impact of diffuse contamination on the element concentration can be estimated either graphically in the CP plot via a best fit estimate or quantitatively via a Kolmogorov-Smirnov or Cramér-von Mises test. This is demonstrated using continental-scale geochemical soil datasets from Europe, Australia, and the USA, and a regional-scale dataset from Norway. Several different datasets from Europe deliver comparable results at regional to continental scales. The method is also suitable for monitoring diffuse contamination based on the statistical distribution of repeat datasets at the continental scale in a cost-effective manner.
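The quantitative comparison step (option 1 or 2 above) can be run with a standard two-sample test on the empirical CDFs. A minimal sketch, assuming log-normal-like regional geochemical data and hypothetical input arrays:

```python
# Two-sample KS comparison of element CDFs, e.g. top soil vs bottom soil,
# or Pb vs a reference element such as Rb.
import numpy as np
from scipy.stats import ks_2samp

def diffuse_contamination_test(top_soil_conc, bottom_soil_conc):
    """Returns the KS statistic and p-value; a shift of the background part of the
    distribution is the signature of diffuse contamination described in the study."""
    # log-transform, since regional geochemical concentrations are usually right-skewed
    stat, p_value = ks_2samp(np.log10(top_soil_conc), np.log10(bottom_soil_conc))
    return stat, p_value
```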
Canessa, Andrea; Gibaldi, Agostino; Chessa, Manuela; Fato, Marco; Solari, Fabio; Sabatini, Silvio P.
2017-01-01
Binocular stereopsis is the ability of a visual system, belonging to a live being or a machine, to interpret the different visual information deriving from two eyes/cameras for depth perception. From this perspective, the ground-truth information about three-dimensional visual space, which is hardly available, is an ideal tool both for evaluating human performance and for benchmarking machine vision algorithms. In the present work, we implemented a rendering methodology in which the camera pose mimics realistic eye pose for a fixating observer, thus including convergent eye geometry and cyclotorsion. The virtual environment we developed relies on highly accurate 3D virtual models, and its full controllability allows us to obtain the stereoscopic pairs together with the ground-truth depth and camera pose information. We thus created a stereoscopic dataset: GENUA PESTO—GENoa hUman Active fixation database: PEripersonal space STereoscopic images and grOund truth disparity. The dataset aims to provide a unified framework useful for a number of problems relevant to human and computer vision, from scene exploration and eye movement studies to 3D scene reconstruction. PMID:28350382
Real-time individual predictions of prostate cancer recurrence using joint models
Taylor, Jeremy M. G.; Park, Yongseok; Ankerst, Donna P.; Proust-Lima, Cecile; Williams, Scott; Kestin, Larry; Bae, Kyoungwha; Pickles, Tom; Sandler, Howard
2012-01-01
Summary Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient. PMID:23379600
Climatic Analysis of Oceanic Water Vapor Transports Based on Satellite E-P Datasets
NASA Technical Reports Server (NTRS)
Smith, Eric A.; Sohn, Byung-Ju; Mehta, Vikram
2004-01-01
Understanding the climatically varying properties of water vapor transports from a robust observational perspective is an essential step in calibrating climate models. This is tantamount to measuring year-to-year changes of monthly- or seasonally-averaged, divergent water vapor transport distributions. This cannot be done effectively with conventional radiosonde data over ocean regions where sounding data are generally sparse. This talk describes how a methodology designed to derive atmospheric water vapor transports over the world oceans from satellite-retrieved precipitation (P) and evaporation (E) datasets circumvents the problem of inadequate sampling. Ultimately, the method is intended to take advantage of the relatively complete and consistent coverage, as well as continuity in sampling, associated with E and P datasets obtained from satellite measurements. Independent P and E retrievals from Special Sensor Microwave Imager (SSM/I) measurements, along with P retrievals from Tropical Rainfall Measuring Mission (TRMM) measurements, are used to obtain transports by solving a potential function for the divergence of water vapor transport as balanced by large scale E - P conditions.
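The balance underlying the method is that, in a long-term mean, the divergence of the vertically integrated water vapor transport equals E - P, so a transport potential can be recovered by solving a Poisson equation and differentiating. The toy sketch below illustrates that step with a doubly periodic FFT solver; spherical geometry is ignored here, which the real analysis over the world oceans would have to handle, and all names are hypothetical.

```python
# Solve del^2 chi = E - P on a regular grid and take the gradient of chi to obtain
# the divergent component of the water vapor transport.
import numpy as np

def divergent_transport(e_minus_p, dx, dy):
    """e_minus_p: 2-D field of evaporation minus precipitation (e.g. kg m-2 s-1)."""
    ny, nx = e_minus_p.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dy)
    k2 = kx[None, :] ** 2 + ky[:, None] ** 2
    k2[0, 0] = 1.0                                  # avoid division by zero for the mean mode
    chi_hat = -np.fft.fft2(e_minus_p) / k2          # spectral solution of the Poisson equation
    chi_hat[0, 0] = 0.0
    chi = np.real(np.fft.ifft2(chi_hat))
    qu = np.gradient(chi, dx, axis=1)               # zonal divergent transport
    qv = np.gradient(chi, dy, axis=0)               # meridional divergent transport
    return qu, qv
```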
Glycan array data management at Consortium for Functional Glycomics.
Venkataraman, Maha; Sasisekharan, Ram; Raman, Rahul
2015-01-01
Glycomics, or the study of structure-function relationships of complex glycans, has reshaped post-genomics biology. Glycans mediate fundamental biological functions via their specific interactions with a variety of proteins. Recognizing the importance of glycomics, large-scale research initiatives such as the Consortium for Functional Glycomics (CFG) were established to address these challenges. Over the past decade, the CFG has generated novel reagents and technologies for glycomics analyses, which in turn have led to the generation of diverse datasets. These datasets have contributed to understanding glycan diversity and structure-function relationships at molecular (glycan-protein interactions), cellular (gene expression and glycan analysis), and whole organism (mouse phenotyping) levels. Among these analyses and datasets, screening of glycan-protein interactions on glycan array platforms has gained much prominence and has contributed to cross-disciplinary realization of the importance of glycomics in areas such as immunology, infectious diseases, cancer biomarkers, etc. This manuscript outlines methodologies for capturing data from glycan array experiments and online tools to access and visualize glycan array data implemented at the CFG.
Medicine, methodology, and values: trade-offs in clinical science and practice.
Ho, Vincent K Y
2011-01-01
The current guidelines of evidence-based medicine (EBM) presuppose that clinical research and clinical practice should advance from rigorous scientific tests as they generate reliable, value-free knowledge. Under this presupposition, hypotheses postulated by doctors and patients in the process of their decision making are preferably tested in randomized clinical trials (RCTs), and in systematic reviews and meta-analyses summarizing outcomes from multiple RCTs. Since testing under this scheme is predominantly focused on the criteria of generality and precision achieved through methodological rigor, at the cost of the criterion of realism, translating test results to clinical practice is often problematic. Choices concerning which methodological criteria should have priority are inevitable, however, as clinical trials, and scientific research in general, cannot meet all relevant criteria at the same time. Since these choices may be informed by considerations external to science, we must acknowledge that science cannot be value-free in a strict sense, and this invites a more prominent role for value-laden considerations in evaluating clinical research. The urgency for this becomes even more apparent when we consider the important yet implicit role of scientific theories in EBM, which may also be subjected to methodological evaluation and for which selectiveness in methodological focus is likewise inevitable.
Improving average ranking precision in user searches for biomedical research datasets
Gobeill, Julien; Gaudinat, Arnaud; Vachon, Thérèse; Ruch, Patrick
2017-01-01
Availability of research datasets is keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorization method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries, and provided competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP, being +22.3% higher than the median infAP of the participants' best submissions. Overall, it ranks in the top two if an aggregated metric using the best official measures per participant is considered. The query expansion method showed positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. The similarity measure algorithm showed robust performance in different training conditions, with small performance variations compared to the Divergence from Randomness framework. Finally, the result categorization did not have significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. The use of data-driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments need to be performed to draw conclusive results. Database URL: https://biocaddie.org/benchmark-data PMID:29220475
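The word-embedding query expansion idea can be sketched generically: each query term is expanded with its nearest neighbours in a pre-trained embedding space before retrieval. This is not the authors' exact pipeline; `embeddings` below is a hypothetical {term: vector} dictionary loaded from whatever embedding model is available.

```python
# Expand query terms with their nearest neighbours by cosine similarity.
import numpy as np

def expand_query(query_terms, embeddings, top_k=3):
    vocab = list(embeddings)
    matrix = np.vstack([embeddings[w] for w in vocab])
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue
        v = embeddings[term] / np.linalg.norm(embeddings[term])
        sims = matrix @ v
        for idx in np.argsort(sims)[::-1][:top_k + 1]:
            if vocab[idx] != term:
                expanded.append(vocab[idx])      # add nearest-neighbour terms
    return expanded

# expand_query(["glioblastoma", "rna-seq"], embeddings) might add close variants
# such as "gbm" or "transcriptome", depending on the embedding model used.
```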
Emerging technologies for the changing global market
NASA Technical Reports Server (NTRS)
Cruit, Wendy; Schutzenhofer, Scott; Goldberg, Ben; Everhart, Kurt
1993-01-01
This project served to define an appropriate methodology for effective prioritization of technology efforts required to develop replacement technologies mandated by imposed and forecast legislation. The methodology used is a semi-quantitative approach derived from quality function deployment techniques (QFD Matrix). This methodology aims to weight the full environmental, cost, safety, reliability, and programmatic implications of replacement technology development to allow appropriate identification of viable candidates and programmatic alternatives. The results will be implemented as a guideline for consideration in current NASA propulsion systems.
ERIC Educational Resources Information Center
Karabenick, Stuart A.; Zusho, Akane
2015-01-01
We provide a conceptual commentary on the articles in this special issue, first by describing the unique features of each study, focusing on what we consider to be their theoretical and methodological contributions, and then by highlighting significant crosscutting themes and future directions in the study of SRL. Specifically, we define SRL to be…
ERIC Educational Resources Information Center
Selig, Judith A.; And Others
This report, summarizing the activities of the Vision Information Center (VIC) in the field of computer-assisted instruction from December, 1966 to August, 1967, describes the methodology used to load a large body of information--a programed text on basic opthalmology--onto a computer for subsequent information retrieval and computer-assisted…
ERIC Educational Resources Information Center
Pearson, Marion L.; Albon, Simon P.; Hubball, Harry
2015-01-01
Individuals and teams engaging in the scholarship of teaching and learning (SoTL) in multidisciplinary higher education settings must make decisions regarding choice of research methodology and methods. These decisions are guided by the research context and the goals of the inquiry. With reference to our own recent experiences investigating…
Methodological Considerations for an Evolving Model of Institutional Research.
ERIC Educational Resources Information Center
Jones, Timothy B.; Essien-Barrett, Barbara; Gill, Peggy B.
A multi-case study was used in the self-study of three programs within an academic department of a mid-sized Southern university. Multi-case methodology as a form of self-study encourages a process of self-renewal and programmatic change as it defines an active stakeholder role. The participants in the three case studies were university faculty…
The 3D Reference Earth Model: Status and Preliminary Results
NASA Astrophysics Data System (ADS)
Moulik, P.; Lekic, V.; Romanowicz, B. A.
2017-12-01
In the 20th century, seismologists constructed models of how average physical properties (e.g. density, rigidity, compressibility, anisotropy) vary with depth in the Earth's interior. These one-dimensional (1D) reference Earth models (e.g. PREM) have proven indispensable in earthquake location, imaging of interior structure, understanding material properties under extreme conditions, and as a reference in other fields, such as particle physics and astronomy. Over the past three decades, new datasets motivated more sophisticated efforts that yielded models of how properties vary both laterally and with depth in the Earth's interior. Though these three-dimensional (3D) models exhibit compelling similarities at large scales, differences in the methodology, representation of structure, and dataset upon which they are based have prevented the creation of 3D community reference models. As part of the REM-3D project, we are compiling and reconciling reference seismic datasets of body wave travel-time measurements, fundamental mode and overtone surface wave dispersion measurements, and normal mode frequencies and splitting functions. These reference datasets are being inverted for a long-wavelength, 3D reference Earth model that describes the robust long-wavelength features of mantle heterogeneity. As a community reference model with fully quantified uncertainties and tradeoffs and an associated publicly available dataset, REM-3D will facilitate Earth imaging studies, earthquake characterization, inferences on temperature and composition in the deep interior, and be of improved utility to emerging scientific endeavors, such as neutrino geoscience. Here, we summarize progress made in the construction of the reference long-period dataset and present a preliminary version of REM-3D in the upper mantle. In order to determine the level of detail warranted for inclusion in REM-3D, we analyze the spectrum of discrepancies between models inverted with different subsets of the reference dataset. This procedure allows us to evaluate the extent of consistency in imaging heterogeneity at various depths and between spatial scales.
Handling limited datasets with neural networks in medical applications: A small-data approach.
Shaikhina, Torgyn; Khovanova, Natalia A
2017-01-01
Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for application of artificial neural networks (NNs) for regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the methods of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved an accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with an 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
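The "multiple runs" idea described above can be illustrated with a short sketch: several small networks are trained from different random initialisations and their predictions are averaged, which damps the run-to-run fluctuations typical of small datasets. The synthetic data, network size, and number of runs below are arbitrary assumptions, not the bone-strength study's actual setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# synthetic stand-in for a small medical dataset (e.g. a few structural/biological predictors)
X = rng.normal(size=(56, 4))
y = X @ np.array([1.5, -0.8, 0.5, 0.0]) + rng.normal(scale=0.3, size=56)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# "multiple runs": train several small networks from different random initialisations
# and average their predictions to damp run-to-run fluctuations on small datasets
preds = []
for seed in range(20):
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=seed)
    net.fit(X_tr, y_tr)
    preds.append(net.predict(X_te))
ensemble_pred = np.mean(preds, axis=0)

rmse = np.sqrt(np.mean((ensemble_pred - y_te) ** 2))
print(f"RMSE of the averaged predictions: {rmse:.3f}")
```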
VIS – A database on the distribution of fishes in inland and estuarine waters in Flanders, Belgium
Brosens, Dimitri; Breine, Jan; Van Thuyne, Gerlinde; Belpaire, Claude; Desmet, Peter; Verreycken, Hugo
2015-01-01
Abstract The Research Institute for Nature and Forest (INBO) has been performing standardized fish stock assessments in Flanders, Belgium. This Flemish Fish Monitoring Network aims to assess fish populations in public waters at regular time intervals in both inland waters and estuaries. This monitoring was set up in support of the Water Framework Directive, the Habitat Directive, the Eel Regulation, the Red List of fishes, fish stock management, biodiversity research, and to assess the colonization and spreading of non-native fish species. The collected data are consolidated in the Fish Information System or VIS. From VIS, the occurrence data are now published at the INBO IPT as two datasets: ‘VIS - Fishes in inland waters in Flanders, Belgium’ and ‘VIS - Fishes in estuarine waters in Flanders, Belgium’. Together these datasets represent a complete overview of the distribution and abundance of fish species pertaining in Flanders from late 1992 to the end of 2012. This data paper discusses both datasets together, as both have a similar methodology and structure. The inland waters dataset contains over 350,000 fish observations, sampled between 1992 and 2012 from over 2,000 locations in inland rivers, streams, canals, and enclosed waters in Flanders. The dataset includes 64 fish species, as well as a number of non-target species (mainly crustaceans). The estuarine waters dataset contains over 44,000 fish observations, sampled between 1995 and 2012 from almost 50 locations in the estuaries of the rivers Yser and Scheldt (“Zeeschelde”), including two sampling sites in the Netherlands. The dataset includes 69 fish species and a number of non-target crustacean species. To foster broad and collaborative use, the data are dedicated to the public domain under a Creative Commons Zero waiver and reference the INBO norms for data use. PMID:25685001
MSWEP V2 global 3-hourly 0.1° precipitation: methodology and quantitative appraisal
NASA Astrophysics Data System (ADS)
Beck, H.; Yang, L.; Pan, M.; Wood, E. F.; William, L.
2017-12-01
Here, we present Multi-Source Weighted-Ensemble Precipitation (MSWEP) V2, the first fully global gridded precipitation (P) dataset with a 0.1° spatial resolution. The dataset covers the period 1979-2016, has a 3-hourly temporal resolution, and was derived by optimally merging a wide range of data sources based on gauges (WorldClim, GHCN-D, GSOD, and others), satellites (CMORPH, GridSat, GSMaP, and TMPA 3B42RT), and reanalyses (ERA-Interim, JRA-55, and NCEP-CFSR). MSWEP V2 implements some major improvements over V1, such as (i) the correction of distributional P biases using cumulative distribution function matching, (ii) increasing the spatial resolution from 0.25° to 0.1°, (iii) the inclusion of ocean areas, (iv) the addition of NCEP-CFSR P estimates, (v) the addition of thermal infrared-based P estimates for the pre-TRMM era, (vi) the addition of 0.1° daily interpolated gauge data, (vii) the use of a daily gauge correction scheme that accounts for regional differences in the 24-hour accumulation period of gauges, and (viii) extension of the data record to 2016. The gauge-based assessment of the reanalysis and satellite P datasets, necessary for establishing the merging weights, revealed that the reanalysis datasets strongly overestimate the P frequency for the entire globe, and that the satellite (resp. reanalysis) datasets consistently performed better at low (high) latitudes. Compared to other state-of-the-art P datasets, MSWEP V2 exhibits more plausible global patterns in mean annual P, percentiles, and annual number of dry days, and better resolves the small-scale variability over topographically complex terrain. Other P datasets appear to consistently underestimate P amounts over mountainous regions. Long-term mean P estimates for the global, land, and ocean domains based on MSWEP V2 are 959, 796, and 1026 mm/yr, respectively, in close agreement with the best previous published estimates.
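To show the basic shape of a weighted-ensemble merge like the one described above, the sketch below blends two gridded precipitation estimates using per-pixel weights, as could be derived from each source's agreement with gauge observations. This is a generic sketch, not the actual MSWEP V2 weighting scheme, and the toy grids and weights are assumptions.

```python
import numpy as np

def merge_precipitation(estimates, weights):
    """Weighted-ensemble merge of gridded precipitation estimates.

    estimates: list of 2-D arrays (same grid) from different sources
    weights:   list of 2-D arrays of non-negative merging weights,
               e.g. derived from each source's agreement with gauge data.
    Generic sketch only; not the MSWEP V2 implementation.
    """
    num = np.zeros_like(estimates[0], dtype=float)
    den = np.zeros_like(estimates[0], dtype=float)
    for p, w in zip(estimates, weights):
        num += w * p
        den += w
    return num / np.maximum(den, 1e-12)

# toy 2x2 grid with a satellite-based and a reanalysis-based estimate
p_sat = np.array([[1.0, 2.0], [0.0, 5.0]])
p_rea = np.array([[1.5, 1.0], [0.5, 4.0]])
w_sat = np.full((2, 2), 0.7)   # hypothetical weights
w_rea = np.full((2, 2), 0.3)
print(merge_precipitation([p_sat, p_rea], [w_sat, w_rea]))
```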
ERIC Educational Resources Information Center
Lewis, Jonathan S.
2017-01-01
Paid employment is one of the most common extracurricular activities among full-time undergraduates, and an array of studies has attempted to measure its impact. Methodological concerns with the extant literature, however, make it difficult to draw reliable conclusions. Furthermore, the research on working college students has little to say about…
The coordinate-based meta-analysis of neuroimaging data.
Samartsidis, Pantelis; Montagna, Silvia; Nichols, Thomas E; Johnson, Timothy D
2017-01-01
Neuroimaging meta-analysis is an area of growing interest in statistics. The special characteristics of neuroimaging data render classical meta-analysis methods inapplicable and therefore new methods have been developed. We review existing methodologies, explaining the benefits and drawbacks of each. A demonstration on a real dataset of emotion studies is included. We discuss some still-open problems in the field to highlight the need for future research.
2011-10-01
inconsistency in the representation of the dataset. RST provides a mathematical tool for representing and reasoning about vagueness and inconsistency. Its...use of various mathematical, statistical and soft computing methodologies with the objective of identifying meaningful relationships between condition...Evidence-based Medicine and Health Outcomes Research, University of South Florida, Tampa, FL; Department of Mathematics, Indiana University Northwest, Gary
2016-03-02
some closeness constant and dissimilar pairs be more distant than some larger constant. Online and non-linear extensions to the ITML methodology are...is obtained, instead of solving an objective function formed from the entire dataset. Many online learning methods have regret guarantees, that is... function Metric learning seeks to learn a metric that encourages data points marked as similar to be close and data points marked as different to be far
Measures and Indicators of Vgi Quality: AN Overview
NASA Astrophysics Data System (ADS)
Antoniou, V.; Skopeliti, A.
2015-08-01
The evaluation of VGI quality has been a very interesting and popular issue amongst academics and researchers. Various metrics and indicators have been proposed for evaluating VGI quality elements, and various efforts have focused on the use of well-established methodologies for the evaluation of VGI quality elements against authoritative data. In this paper, a number of research papers have been reviewed and summarized in a detailed report on measures for each spatial data quality element. Emphasis is placed on the methodology followed and the data used in order to assess and evaluate the quality of the VGI datasets. However, as the use of authoritative data is not always possible, many researchers have turned their focus to the analysis of new quality indicators that can function as proxies for the understanding of VGI quality. In this paper, the difficulties in using authoritative datasets are briefly presented and newly proposed quality indicators are discussed, as recorded through the literature review. We classify these new indicators into four main categories that relate to: i) data, ii) demographics, iii) socio-economic situation and iv) contributors. This paper presents a dense, yet comprehensive overview of the research in this field and provides the basis for the ongoing academic effort to create a practical quality evaluation method through the use of appropriate quality indicators.
Performance testing of LiDAR exploitation software
NASA Astrophysics Data System (ADS)
Varela-González, M.; González-Jorge, H.; Riveiro, B.; Arias, P.
2013-04-01
Mobile LiDAR systems have been widely used in recent years for many applications in the field of geoscience. One of the most important limitations of this technology is the large computational requirement involved in data processing. Several software solutions for data processing are available on the market, but users often lack methodologies to verify their performance accurately. In this work a methodology for LiDAR software performance testing is presented and six different suites are studied: QT Modeler, AutoCAD Civil 3D, Mars 7, Fledermaus, Carlson and TopoDOT (all of them in x64). Results show that QT Modeler, TopoDOT and AutoCAD Civil 3D allow the loading of large datasets, while Fledermaus, Mars 7 and Carlson do not achieve this level of performance. AutoCAD Civil 3D requires long loading times in comparison with the best-performing packages such as QT Modeler and TopoDOT. The Carlson suite shows the poorest results among all the packages under study: point clouds larger than 5 million points cannot be loaded, and loading times are very long in comparison with the other suites, even for the smaller datasets. AutoCAD Civil 3D, Carlson and TopoDOT use more threads than the other packages, QT Modeler, Mars 7 and Fledermaus.
Maljovec, D.; Liu, S.; Wang, B.; ...
2015-07-14
Here, dynamic probabilistic risk assessment (DPRA) methodologies couple system simulator codes (e.g., RELAP and MELCOR) with simulation controller codes (e.g., RAVEN and ADAPT). Whereas system simulator codes model system dynamics deterministically, simulation controller codes introduce both deterministic (e.g., system control logic and operating procedures) and stochastic (e.g., component failures and parameter uncertainties) elements into the simulation. Typically, a DPRA is performed by sampling values of a set of parameters and simulating the system behavior for that specific set of parameter values. For complex systems, a major challenge in using DPRA methodologies is to analyze the large number of scenarios generated, where clustering techniques are typically employed to better organize and interpret the data. In this paper, we focus on the analysis of two nuclear simulation datasets that are part of the risk-informed safety margin characterization (RISMC) boiling water reactor (BWR) station blackout (SBO) case study. We provide the domain experts a software tool that encodes traditional and topological clustering techniques within an interactive analysis and visualization environment, for understanding the structures of such high-dimensional nuclear simulation datasets. We demonstrate through our case study that both types of clustering techniques complement each other for enhanced structural understanding of the data.
Considerations about our approach to obstetric psychoprophylaxis.
Cerutti, R; Volpe, B; Sichel, M P; Sandri, M; Sbrignadello, C; Fede, T
1983-01-01
Usually the term "obstetric psychoprophylaxis" refers to a specific method or technique. We prefer to consider it as a procedure that involves on one side the woman, the child and its family, and on the other the services entitled to give pre- and post-natal assistance. In order to realize this, a reformation of our methodological parameters and a critical analysis of the results obtained are required. In the courses of obstetric psychoprophylaxis that are held in the Department of Obstetrics and Gynaecology of the University of Padua we take into consideration the following themes: - Methodological approach - Professional training of the staff - Significance of psychosocial culture in the management of the pregnancy by the health services.
Ecological Momentary Assessment is a Neglected Methodology in Suicidology.
Davidson, Collin L; Anestis, Michael D; Gutierrez, Peter M
2017-01-02
Ecological momentary assessment (EMA) is a group of research methods that collect data frequently, in many contexts, and in real-world settings. EMA has been fairly neglected in suicidology. The current article provides an overview of EMA for suicidologists including definitions, data collection considerations, and different sampling strategies. Next, the benefits of EMA in suicidology (i.e., reduced recall bias, accurate tracking of fluctuating variables, testing assumptions of theories, use in interventions), participant safety considerations, and examples of published research that investigate self-directed violence variables using EMA are discussed. The article concludes with a summary and suggested directions for EMA research in suicidology with the particular aim to spur the increased use of this methodology among suicidologists.
Design consideration of resonance inverters with electro-technological application
NASA Astrophysics Data System (ADS)
Hinov, Nikolay
2017-12-01
This study presents design considerations for resonance inverters with electro-technological applications. The presented methodology results from the author's investigations and analyses of different types and operating regimes of resonance inverters. Schemes of resonant inverters without inverse diodes are considered. The first harmonic method is used in the analysis and design; for inverters with electro-technological applications this method gives very good accuracy without requiring a complex and heavy mathematical apparatus. The proposed methodology is easy to use and is suitable for training students in power electronics. The validity of the results is confirmed by simulations and by work on physical prototypes.
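For reference, the first harmonic method replaces the inverter's quasi-square output voltage with its fundamental component and computes the load current from the impedance at the switching frequency. The relations below are a generic textbook sketch for a series RLC load, not formulas reproduced from the paper.

```latex
% Standard first-harmonic (fundamental) approximation for a voltage-fed
% series-resonant RLC load -- a generic textbook sketch, not taken from the paper.
% The quasi-square inverter output of amplitude V_d is replaced by its fundamental:
\[
  v_1(t) = \frac{4V_d}{\pi}\sin(\omega t), \qquad
  Z(\omega) = \sqrt{R^2 + \Bigl(\omega L - \frac{1}{\omega C}\Bigr)^2}, \qquad
  \hat{I}_1 = \frac{4V_d}{\pi\,Z(\omega)},
\]
% so the power delivered to the load resistance is estimated as
\[
  P \approx \tfrac{1}{2}\,\hat{I}_1^{\,2}\,R .
\]
```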
de Dumast, Priscille; Mirabel, Clément; Cevidanes, Lucia; Ruellas, Antonio; Yatabe, Marilia; Ioshida, Marcos; Ribera, Nina Tubau; Michoud, Loic; Gomes, Liliane; Huang, Chao; Zhu, Hongtu; Muniz, Luciana; Shoukri, Brandon; Paniagua, Beatriz; Styner, Martin; Pieper, Steve; Budin, Francois; Vimort, Jean-Baptiste; Pascal, Laura; Prieto, Juan Carlos
2018-07-01
The purpose of this study is to describe the methodological innovations of a web-based system for storage, integration and computation of biomedical data, using a training imaging dataset to remotely compute a deep neural network classifier of temporomandibular joint osteoarthritis (TMJOA). This study imaging dataset consisted of three-dimensional (3D) surface meshes of mandibular condyles constructed from cone beam computed tomography (CBCT) scans. The training dataset consisted of 259 condyles, 105 from control subjects and 154 from patients with diagnosis of TMJ OA. For the image analysis classification, 34 right and left condyles from 17 patients (39.9 ± 11.7 years), who experienced signs and symptoms of the disease for less than 5 years, were included as the testing dataset. For the integrative statistical model of clinical, biological and imaging markers, the sample consisted of the same 17 test OA subjects and 17 age and sex matched control subjects (39.4 ± 15.4 years), who did not show any sign or symptom of OA. For these 34 subjects, a standardized clinical questionnaire, blood and saliva samples were also collected. The technological methodologies in this study include a deep neural network classifier of 3D condylar morphology (ShapeVariationAnalyzer, SVA), and a flexible web-based system for data storage, computation and integration (DSCI) of high dimensional imaging, clinical, and biological data. The DSCI system trained and tested the neural network, indicating 5 stages of structural degenerative changes in condylar morphology in the TMJ with 91% close agreement between the clinician consensus and the SVA classifier. The DSCI remotely ran with a novel application of a statistical analysis, the Multivariate Functional Shape Data Analysis, that computed high dimensional correlations between shape 3D coordinates, clinical pain levels and levels of biological markers, and then graphically displayed the computation results. The findings of this study demonstrate a comprehensive phenotypic characterization of TMJ health and disease at clinical, imaging and biological levels, using novel flexible and versatile open-source tools for a web-based system that provides advanced shape statistical analysis and a neural network based classification of temporomandibular joint osteoarthritis. Published by Elsevier Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lafata, K; Ren, L; Cai, J
2016-06-15
Purpose: To develop a methodology based on digitally-reconstructed-fluoroscopy (DRF) to quantitatively assess target localization accuracy of lung SBRT, and to evaluate it using both a dynamic digital phantom and a patient dataset. Methods: For each treatment field, a 10-phase DRF is generated based on the planning 4DCT. Each frame is pre-processed with a morphological top-hat filter, and corresponding beam apertures are projected to each detector plane. A template-matching algorithm based on cross-correlation is used to detect the tumor location in each frame. Tumor motion relative to the beam aperture is extracted in the superior-inferior direction based on each frame's impulse response to the template, and the mean tumor position (MTP) is calculated as the average tumor displacement. The DRF template coordinates are then transferred to the corresponding MV-cine dataset, which is retrospectively filtered as above. The treatment MTP is calculated within each field's projection space, relative to the DRF-defined template. The field's localization error is defined as the difference between the DRF-derived-MTP (planning) and the MV-cine-derived-MTP (delivery). A dynamic digital phantom was used to assess the algorithm's ability to detect intra-fractional changes in patient alignment, by simulating different spatial variations in the MV-cine and calculating the corresponding change in MTP. Inter- and intra-fractional variation, IGRT accuracy, and filtering effects were investigated on a patient dataset. Results: Phantom results demonstrated a high accuracy in detecting both translational and rotational variation. The lowest localization error of the patient dataset was achieved at each fraction's first field (mean = 0.38 mm), with Fx3 demonstrating a particularly strong correlation between intra-fractional motion-caused localization error and treatment progress. Filtering significantly improved tracking visibility in both the DRF and MV-cine images. Conclusion: We have developed and evaluated a methodology to quantify lung SBRT target localization accuracy based on digitally-reconstructed-fluoroscopy. Our approach may be useful in potentially reducing treatment margins to optimize lung SBRT outcomes. R01-184173.
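For intuition, the short sketch below strings together the two image-processing steps named in the abstract, a morphological top-hat filter and cross-correlation-based template matching, on a synthetic frame. It is a generic SciPy illustration, not the clinical DRF/MV-cine pipeline, and the filter size and toy images are assumptions.

```python
import numpy as np
from scipy import ndimage, signal

def locate_template(frame, template, tophat_size=9):
    """Find the offset of a bright target in a noisy frame.

    Generic sketch: a white top-hat filter suppresses the slowly varying
    background, then zero-mean cross-correlation against the template is
    computed and the argmax gives the estimated position. Not the clinical
    pipeline from the abstract; parameters are illustrative.
    """
    f = ndimage.white_tophat(frame.astype(float), size=tophat_size)
    t = ndimage.white_tophat(template.astype(float), size=tophat_size)
    f -= f.mean()
    t -= t.mean()
    corr = signal.correlate2d(f, t, mode="valid")
    return np.unravel_index(np.argmax(corr), corr.shape)

# toy example: an 8x8 bright blob hidden in noise at rows 30-37, cols 20-27
rng = np.random.default_rng(1)
frame = rng.normal(size=(64, 64))
frame[30:38, 20:28] += 3.0
template = np.zeros((12, 12))
template[2:10, 2:10] = 3.0
print(locate_template(frame, template))   # approximately (28, 18)
```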
Considerations for preparing collaborative international research: a Ugandan experience.
Musil, Carol M; Mutabaazi, Jemimah; Walusimbi, Mariam; Okonsky, Jennifer G; Biribonwa, Yedidah; Eagan, Sabrina; Dimarco, Marguerite A; Mbaballi, Speciosa; Fitzpatrick, Joyce J
2004-08-01
This article describes issues to consider when planning and conducting international research projects. Key considerations include building collaboration, developing a comprehensive and feasible research plan, funding and budgets, addressing human subjects concerns, and analyzing and disseminating project findings. These considerations and related methodological issues are discussed in the context of a replication pilot project conducted outside Kampala, Uganda. Ongoing dialog, flexibility, and collaboration, in addition to good science, are critical to developing successful international research projects.
Speeding up the Consensus Clustering methodology for microarray data analysis
2011-01-01
Background The inference of the number of clusters in a dataset, a fundamental problem in Statistics, Data Analysis and Classification, is usually addressed via internal validation measures. The stated problem is quite difficult, in particular for microarrays, since the inferred prediction must be sensible enough to capture the inherent biological structure in a dataset, e.g., functionally related genes. Despite the rich literature present in that area, the identification of an internal validation measure that is both fast and precise has proved to be elusive. In order to partially fill this gap, we propose a speed-up of Consensus (Consensus Clustering), a methodology whose purpose is the provision of a prediction of the number of clusters in a dataset, together with a dissimilarity matrix (the consensus matrix) that can be used by clustering algorithms. As detailed in the remainder of the paper, Consensus is a natural candidate for a speed-up. Results Since the time-precision performance of Consensus depends on two parameters, our first task is to show that a simple adjustment of the parameters is not enough to obtain a good precision-time trade-off. Our second task is to provide a fast approximation algorithm for Consensus, namely the closely related algorithm FC (Fast Consensus), which has the same precision as Consensus with a substantially better time performance. The performance of FC has been assessed via extensive experiments on twelve benchmark datasets that summarize key features of microarray applications, such as cancer studies, gene expression with up and down patterns, and a full spectrum of dimensionality up to over a thousand. Based on their outcome, compared with previous benchmarking results available in the literature, FC turns out to be among the fastest internal validation methods, while retaining the same outstanding precision as Consensus. Moreover, it also provides a consensus matrix that can be used as a dissimilarity matrix, guaranteeing the same performance as the corresponding matrix produced by Consensus. We have also experimented with the use of Consensus and FC in conjunction with NMF (Nonnegative Matrix Factorization), in order to identify the correct number of clusters in a dataset. Although NMF is an increasingly popular technique for biological data mining, our results are somewhat disappointing and complement quite well the state of the art about NMF, shedding further light on its merits and limitations. Conclusions In summary, FC with a parameter setting that makes it robust with respect to small and medium-sized datasets, i.e., number of items to cluster in the hundreds and number of conditions up to a thousand, seems to be the internal validation measure of choice. Moreover, the technique we have developed here can be used in other contexts, in particular for the speed-up of stability-based validation measures. PMID:21235792
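To show what a consensus (co-association) matrix is, the sketch below repeatedly clusters random subsamples with k-means and records how often pairs of items land in the same cluster. It is a minimal illustration of the Consensus Clustering idea, not the FC speed-up described in the paper; the subsample fraction, number of iterations, and toy data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def consensus_matrix(X, k, n_iter=50, subsample=0.8, seed=0):
    """Consensus matrix by repeated k-means clustering of subsamples.

    Entry (i, j) is the fraction of runs in which items i and j were sampled
    together AND assigned to the same cluster. Minimal illustration only.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        km = KMeans(n_clusters=k, n_init=10, random_state=int(rng.integers(1 << 30)))
        labels = km.fit_predict(X[idx])
        sampled[np.ix_(idx, idx)] += 1
        for c in np.unique(labels):
            members = idx[labels == c]
            together[np.ix_(members, members)] += 1
    return np.where(sampled > 0, together / sampled, 0.0)

# toy usage: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(3, 0.3, (20, 5))])
M = consensus_matrix(X, k=2)
print(M[:3, -3:])   # near 0 across blobs, near 1 within a blob
```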
Bryce, Shayden; Sloan, Elise; Lee, Stuart; Ponsford, Jennie; Rossell, Susan
2016-04-01
Systematic reviews and meta-analyses are a primary source of evidence when evaluating the benefit(s) of cognitive remediation (CR) in schizophrenia. These studies are designed to rigorously synthesize scientific literature; however, they cannot be assumed to be of high methodological quality. The aims of this report were to: 1) review the use of systematic reviews and meta-analyses regarding CR in schizophrenia; 2) conduct a systematic methodological appraisal of published reports examining the benefits of this intervention on core outcome domains; and 3) compare the correspondence between methodological and reporting quality. Electronic databases were searched for relevant articles. Twenty-one reviews met inclusion criteria and were scored according to the AMSTAR checklist, a validated scale of methodological quality. Five meta-analyses were also scored according to the PRISMA statement to compare 'quality of conduct' with 'quality of reporting'. Most systematic reviews and meta-analyses shared strengths and fell within a 'medium' level of methodological quality. Nevertheless, there were consistent areas of potential weakness that were not addressed by most reviews. These included the lack of protocol registration, uncertainty regarding independent data extraction and consensus procedures, and the minimal assessment of publication bias. Moreover, quality of conduct may not necessarily parallel quality of reporting, suggesting that consideration of these methods independently may be important. Reviews concerning CR for schizophrenia are a valuable source of evidence. However, the methodological quality of these reports may require additional consideration. Enhancing quality of conduct is essential for enabling research literature to be interpreted with confidence. Copyright © 2016 Elsevier Ltd. All rights reserved.
School Context and Gender Differences in Mathematical Performance among School Graduates in Russia
ERIC Educational Resources Information Center
Bessudnov, Alexey; Makarov, Alexey
2015-01-01
Gender differences in mathematical performance have received considerable scrutiny in the fields of sociology, economics and psychology. We analyse a large data-set of high school graduates who took a standardised mathematical test in Russia in 2011 (n = 738,456) and find no substantial difference in mean test scores across boys and girls.…
Parks, Sean; Holsinger, Lisa M.; Voss, Morgan; Loehman, Rachel A.; Robinson, Nathaniel P.
2018-01-01
Landsat-based fire severity datasets are an invaluable resource for monitoring and research purposes. These gridded fire severity datasets are generally produced with pre- and post-fire imagery to estimate the degree of fire-induced ecological change. Here, we introduce methods to produce three Landsat-based fire severity metrics using the Google Earth Engine (GEE) platform: the delta normalized burn ratio (dNBR), the relativized delta normalized burn ratio (RdNBR), and the relativized burn ratio (RBR). Our methods do not rely on time-consuming a priori scene selection and instead use a mean compositing approach in which all valid pixels (e.g. cloud-free) over a pre-specified date range (pre- and post-fire) are stacked and the mean value for each pixel over each stack is used to produce the resulting fire severity datasets. This approach demonstrates that fire severity datasets can be produced with relative ease and speed compared to the standard approach in which one pre-fire and one post-fire scene are judiciously identified and used to produce fire severity datasets. We also validate the GEE-derived fire severity metrics using field-based fire severity plots for 18 fires in the western US. These validations are compared to Landsat-based fire severity datasets produced using only one pre- and post-fire scene, which has been the standard approach in producing such datasets since their inception. Results indicate that the GEE-derived fire severity datasets show improved validation statistics compared to parallel versions in which only one pre-fire and one post-fire scene are used. We provide code and a sample geospatial fire history layer to produce dNBR, RdNBR, and RBR for the 18 fires we evaluated. Although our approach requires that a geospatial fire history layer (i.e. fire perimeters) be produced independently and prior to applying our methods, we suggest our GEE methodology can reasonably be implemented on hundreds to thousands of fires, thereby increasing opportunities for fire severity monitoring and research across the globe.
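The three severity metrics named above can be computed directly from pre- and post-fire NBR composites, as in the numpy sketch below. The constants follow commonly published formulations (an assumption here), and the sketch is not the authors' Google Earth Engine code.

```python
import numpy as np

def nbr(nir, swir2):
    # Normalized Burn Ratio from near-infrared and shortwave-infrared-2 reflectance
    return (nir - swir2) / (nir + swir2 + 1e-12)

def severity_metrics(nbr_pre, nbr_post):
    """dNBR, RdNBR and RBR from pre- and post-fire NBR composites.

    Constants follow commonly published formulations (an assumption here,
    not necessarily the exact code released with the paper).
    """
    dnbr = nbr_pre - nbr_post
    rdnbr = dnbr / np.sqrt(np.maximum(np.abs(nbr_pre), 0.001))
    rbr = dnbr / (nbr_pre + 1.001)
    return dnbr, rdnbr, rbr

# toy pixels: a densely vegetated and a sparsely vegetated pixel, both burned
nbr_pre = np.array([0.6, 0.1])
nbr_post = np.array([-0.2, -0.1])
for name, m in zip(("dNBR", "RdNBR", "RBR"), severity_metrics(nbr_pre, nbr_post)):
    print(name, np.round(m, 3))
```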
Xu, Huayong; Yu, Hui; Tu, Kang; Shi, Qianqian; Wei, Chaochun; Li, Yuan-Yuan; Li, Yi-Xue
2013-01-01
We are witnessing rapid progress in the development of methodologies for building combinatorial gene regulatory networks involving both TFs (Transcription Factors) and miRNAs (microRNAs). There are a few tools available for these tasks, but most of them are not easy to use and not accessible online. A web server is especially needed in order to allow users to upload experimental expression datasets and build combinatorial regulatory networks corresponding to their particular contexts. In this work, we compiled putative TF-gene, miRNA-gene and TF-miRNA regulatory relationships from forward-engineering pipelines and curated them as built-in data libraries. We streamlined the R codes of our two separate forward-and-reverse engineering algorithms for combinatorial gene regulatory network construction and formalized them as two major functional modules. As a result, we released the cGRNB (combinatorial Gene Regulatory Networks Builder): a web server for constructing combinatorial gene regulatory networks through integrated engineering of seed-matching sequence information and gene expression datasets. The cGRNB enables two major network-building modules, one for MPGE (miRNA-perturbed gene expression) datasets and the other for parallel miRNA/mRNA expression datasets. A miRNA-centered two-layer combinatorial regulatory cascade is the output of the first module, and a comprehensive genome-wide network involving all three types of combinatorial regulations (TF-gene, TF-miRNA, and miRNA-gene) is the output of the second module. In this article we propose cGRNB, a web server for building combinatorial gene regulatory networks through integrated engineering of seed-matching sequence information and gene expression datasets. Since parallel miRNA/mRNA expression datasets are rapidly accumulating with the advance of next-generation sequencing techniques, cGRNB will be a very useful tool for researchers to build combinatorial gene regulatory networks based on expression datasets. The cGRNB web-server is free and available online at http://www.scbit.org/cgrnb.
NASA Technical Reports Server (NTRS)
Mackall, D. A.; Ishmael, S. D.; Regenie, V. A.
1983-01-01
Qualification considerations for assuring the safety of a life-critical digital flight control system include four major areas: systems interactions, verification, validation, and configuration control. The AFTI/F-16 design, development, and qualification illustrate these considerations. In this paper, qualification concepts, procedures, and methodologies are discussed and illustrated through specific examples.
ERIC Educational Resources Information Center
Anderson, Melissa L.; Wolf Craig, Kelly S.; Ziedonis, Douglas M.
2017-01-01
Deaf individuals experience significant obstacles to participating in behavioral health research when careful consideration is not given to accessibility during the design of study methodology. To inform such considerations, we conducted an exploratory secondary analysis of a mixed-methods study that originally explored 16 Deaf trauma survivors'…
Slavinskaya, N. A.; Abbasi, M.; Starcke, J. H.; ...
2017-01-24
An automated data-centric infrastructure, Process Informatics Model (PrIMe), was applied to validation and optimization of a syngas combustion model. The Bound-to-Bound Data Collaboration (B2BDC) module of PrIMe was employed to discover the limits of parameter modifications based on uncertainty quantification (UQ) and consistency analysis of the model–data system and experimental data, including shock-tube ignition delay times and laminar flame speeds. Existing syngas reaction models are reviewed, and the selected kinetic data are described in detail. Empirical rules were developed and applied to evaluate the uncertainty bounds of the literature experimental data. Here, the initial H2/CO reaction model, assembled from 73 reactions and 17 species, was subjected to a B2BDC analysis. For this purpose, a dataset was constructed that included a total of 167 experimental targets and 55 active model parameters. Consistency analysis of the composed dataset revealed disagreement between models and data. Further analysis suggested that removing 45 experimental targets, 8 of which were self-inconsistent, would lead to a consistent dataset. This dataset was subjected to a correlation analysis, which highlights possible directions for parameter modification and model improvement. Additionally, several methods of parameter optimization were applied, some of them unique to the B2BDC framework. The optimized models demonstrated improved agreement with experiments compared to the initially assembled model, and their predictions for experiments not included in the initial dataset (i.e., a blind prediction) were investigated. The results demonstrate benefits of applying the B2BDC methodology for developing predictive kinetic models.
Varret, C; Beronius, A; Bodin, L; Bokkers, B G H; Boon, P E; Burger, M; De Wit-Bos, L; Fischer, A; Hanberg, A; Litens-Karlsson, S; Slob, W; Wolterink, G; Zilliacus, J; Beausoleil, C; Rousselle, C
2018-01-15
This study aims to evaluate the evidence for the existence of non-monotonic dose-responses (NMDRs) of substances in the area of food safety. This review was performed following the systematic review methodology with the aim to identify in vivo studies published between January 2002 and February 2015 containing evidence for potential NMDRs. Inclusion and reliability criteria were defined and used to select relevant and reliable studies. A set of six checkpoints was developed to establish the likelihood that the data retrieved contained evidence for NMDR. In this review, 49 in vivo studies were identified as relevant and reliable, of which 42 were used for dose-response analysis. These studies contained 179 in vivo dose-response datasets with at least five dose groups (and a control group) as fewer doses cannot provide evidence for NMDR. These datasets were extracted and analyzed using the PROAST software package. The resulting dose-response relationships were evaluated for possible evidence of NMDRs by applying the six checkpoints. In total, 10 out of the 179 in vivo datasets fulfilled all six checkpoints. While these datasets could be considered as providing evidence for NMDR, replicated studies would still be needed to check if the results can be reproduced to rule out that the non-monotonicity was caused by incidental anomalies in that specific study. This approach, combining a systematic review with a set of checkpoints, is new and appears useful for future evaluations of the dose response datasets regarding evidence of non-monotonicity. Published by Elsevier Inc.
Cooper, P David; Smart, David R
2017-06-01
Recent Australian attempts to facilitate disinvestment in healthcare, by identifying instances of 'inappropriate' care from large Government datasets, are subject to significant methodological flaws. Amongst other criticisms has been the fact that the Government datasets utilized for this purpose correlate poorly with datasets collected by relevant professional bodies. Government data derive from official hospital coding, collected retrospectively by clerical personnel, whilst professional body data derive from unit-specific databases, collected contemporaneously with care by clinical personnel. Assessment of accuracy of official hospital coding data for hyperbaric services in a tertiary referral hospital. All official hyperbaric-relevant coding data submitted to the relevant Australian Government agencies by the Royal Hobart Hospital, Tasmania, Australia for financial year 2010-2011 were reviewed and compared against actual hyperbaric unit activity as determined by reference to original source documents. Hospital coding data contained one or more errors in diagnoses and/or procedures in 70% of patients treated with hyperbaric oxygen that year. Multiple discrete error types were identified, including (but not limited to): missing patients; missing treatments; 'additional' treatments; 'additional' patients; incorrect procedure codes and incorrect diagnostic codes. Incidental observations of errors in surgical, anaesthetic and intensive care coding within this cohort suggest that the problems are not restricted to the specialty of hyperbaric medicine alone. Publications from other centres indicate that these problems are not unique to this institution or State. Current Government datasets are irretrievably compromised and not fit for purpose. Attempting to inform the healthcare policy debate by reference to these datasets is inappropriate. Urgent clinical engagement with hospital coding departments is warranted.
Improving stability of prediction models based on correlated omics data by using network approaches.
Tissier, Renaud; Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar
2018-01-01
Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model, and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell line pharmacogenomics dataset.
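The three-step recipe above (network, modules, module-aware prediction) can be sketched compactly: features are clustered on a correlation-based distance, each module is summarised by its mean, and a penalised regression is fitted on the module summaries. This is a simplified stand-in for the group-based selection and group-specific penalties compared in the paper; the simulated data, clustering threshold, and use of ridge regression are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n, p = 120, 40
X = rng.normal(size=(n, p))
X[:, 10:20] += X[:, [10]]          # create a correlated block of features
y = X[:, 10:20].mean(axis=1) + rng.normal(scale=0.5, size=n)

# Steps 1-2: correlation-based network distance -> hierarchical clustering into modules
corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
modules = fcluster(Z, t=0.7, criterion="distance")   # threshold is a tunable assumption

# Step 3: summarise each module by its mean and fit a penalised regression
module_features = np.column_stack(
    [X[:, modules == m].mean(axis=1) for m in np.unique(modules)]
)
model = RidgeCV().fit(module_features, y)
print("modules:", module_features.shape[1],
      "R^2:", round(model.score(module_features, y), 3))
```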
Using the Spatial Distribution of Installers to Define Solar Photovoltaic Markets
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Shaughnessy, Eric; Nemet, Gregory F.; Darghouth, Naim
2016-09-01
Solar PV market research to date has largely relied on arbitrary jurisdictional boundaries, such as counties, to study solar PV market dynamics. This paper seeks to improve solar PV market research by developing a methodology to define solar PV markets. The methodology is based on the spatial distribution of solar PV installers. An algorithm is developed and applied to a rich dataset of solar PV installations to study the outcomes of the installer-based market definitions. The installer-based approach exhibits several desirable properties. Specifically, the higher market granularity of the installer-based approach will allow future PV market research to study the relationship between market dynamics and pricing with more precision.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ward, Lee H.; Laros, James H., III
This paper describes a methodology for implementing disk-less cluster systems using the Network File System (NFS) that scales to thousands of nodes. This method has been successfully deployed and is currently in use on several production systems at Sandia National Labs. This paper will outline our methodology and implementation, discuss hardware and software considerations in detail and present cluster configurations with performance numbers for various management operations like booting.
The Contribution of Human Factors in Military System Development: Methodological Considerations
1980-07-01
Risk/Uncertainty Analysis - Project Scoring - Utility Scales - Relevance Tree Techniques (Reverse Factor Analysis) 2. Computer Simulation Simulation...effectiveness of mathematical models for R&D project selection. Management Science, April 1973, 18, 6-43. Souder, W.E. A scoring methodology for...per some interval PROFICIENCY test scores (written) RADIATION radiation effects aircrew performance on radiation environments REACTION TIME 1) (time
Adolphus, Katie; Bellissimo, Nick; Lawton, Clare L; Ford, Nikki A; Rains, Tia M; Totosy de Zepetnek, Julia; Dye, Louise
2017-01-01
Breakfast is purported to confer a number of benefits on diet quality, health, appetite regulation, and cognitive performance. However, new evidence has challenged the long-held belief that breakfast is the most important meal of the day. This review aims to provide a comprehensive discussion of the key methodological challenges and considerations in studies assessing the effect of breakfast on cognitive performance and appetite control, along with recommendations for future research. This review focuses on the myriad challenges involved in studying children and adolescents specifically. Key methodological challenges and considerations include study design and location, sampling and sample selection, choice of objective cognitive tests, choice of objective and subjective appetite measures, merits of providing a fixed breakfast compared with ad libitum, assessment and definition of habitual breakfast consumption, transparency of treatment condition, difficulty of isolating the direct effects of breakfast consumption, untangling acute and chronic effects, and influence of confounding variables. These methodological challenges have hampered a clear substantiation of the potential positive effects of breakfast on cognition and appetite control and contributed to the debate questioning the notion that breakfast is the most important meal of the day. © 2017 American Society for Nutrition.
Methodology, Methods, and Metrics for Testing and Evaluating Augmented Cognition Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greitzer, Frank L.
The augmented cognition research community seeks cognitive neuroscience-based solutions to improve warfighter performance by applying and managing mitigation strategies to reduce workload and improve the throughput and quality of decisions. The focus of augmented cognition mitigation research is to define, demonstrate, and exploit neuroscience and behavioral measures that support inferences about the warfighter's cognitive state that prescribe the nature and timing of mitigation. A research challenge is to develop valid evaluation methodologies, metrics and measures to assess the impact of augmented cognition mitigations. Two considerations are external validity, which is the extent to which the results apply to operational contexts; and internal validity, which reflects the reliability of performance measures and the conclusions based on analysis of results. The scientific rigor of the research methodology employed in conducting empirical investigations largely affects the validity of the findings. External validity requirements also compel us to demonstrate operational significance of mitigations. Thus it is important to demonstrate effectiveness of mitigations under specific conditions. This chapter reviews some cognitive science and methodological considerations in designing augmented cognition research studies and associated human performance metrics and analysis methods to assess the impact of augmented cognition mitigations.
Assessment of health risks of policies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ádám, Balázs; Molnár, Ágnes
The assessment of health risks of policies is an inevitable, although challenging prerequisite for the inclusion of health considerations in political decision making. The aim of our project was to develop a so far missing methodological guide for the assessment of the complex impact structure of policies. The guide was developed in a consensual way based on experiences gathered during the assessment of specific national policies selected by the partners of an EU project. Methodological considerations were discussed and summarized in workshops and pilot tested on the EU Health Strategy for finalization. The combined tool, which includes a textual guidance and a checklist, follows the top-down approach, that is, it guides the analysis of causal chains from the policy through related health determinants and risk factors to health outcomes. The tool discusses the most important practical issues of assessment by impact level. It emphasises the transparent identification and prioritisation of factors, the consideration of the feasibility of exposure and outcome assessment with special focus on quantification. The developed guide provides useful methodological instructions for the comprehensive assessment of health risks of policies that can be effectively used in the health impact assessment of policy proposals. - Highlights: • Methodological guide for the assessment of health risks of policies is introduced. • The tool is developed based on the experiences from several case studies. • The combined tool consists of a textual guidance and a checklist. • The top-down approach is followed through the levels of the full impact chain. • The guide provides assistance for the health impact assessment of policy proposals.
Fast and Accurate Support Vector Machines on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary --- also known as hyperplane --- which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminate the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively --- potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm --- de facto sequential SVM software --- which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as UCI HIGGS Boson dataset and Offending URL dataset.
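The core intuition above, that samples far from the boundary can be discarded between refit rounds, can be illustrated with a single-node sketch: fit a linear SVM, drop samples whose margin is comfortably large, and refit on the rest. This is only an illustrative stand-in, not the authors' distributed MPI algorithm or its synchronisation safeguards; the margin threshold and toy data are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def shrink_and_fit(X, y, rounds=3, keep_margin=1.5, seed=0):
    """Iteratively drop samples that lie far outside the current margin.

    Illustrative single-node sketch of margin-based sample elimination;
    the paper's distributed implementation is not reproduced here.
    """
    idx = np.arange(len(y))
    clf = None
    for _ in range(rounds):
        clf = LinearSVC(C=1.0, max_iter=10000, random_state=seed).fit(X[idx], y[idx])
        # y * f(x): large positive values are well inside the correct side of the margin
        margins = y[idx] * clf.decision_function(X[idx])
        keep = margins < keep_margin   # drop samples that are unlikely to define the boundary
        if keep.all():
            break
        idx = idx[keep]
    return clf, idx

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([-1] * 200 + [1] * 200)
clf, kept = shrink_and_fit(X, y)
print("samples retained:", len(kept), "of", len(y), "accuracy:", clf.score(X, y))
```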
Lindgren, Annie R; Anderson, Frank E
2018-01-01
Historically, deep-level relationships within the molluscan class Cephalopoda (squids, cuttlefishes, octopods and their relatives) have remained elusive due in part to the considerable morphological diversity of extant taxa, a limited fossil record for species that lack a calcareous shell and difficulties in sampling open ocean taxa. Many conflicts identified by morphologists in the early 1900s remain unresolved today in spite of advances in morphological, molecular and analytical methods. In this study we assess the utility of transcriptome data for resolving cephalopod phylogeny, with special focus on the orders of Decapodiformes (open-eye squids, bobtail squids, cuttlefishes and relatives). To do so, we took new and previously published transcriptome data and used a unique cephalopod core ortholog set to generate a dataset that was subjected to an array of filtering and analytical methods to assess the impacts of: taxon sampling, ortholog number, compositional and rate heterogeneity and incongruence across loci. Analyses indicated that datasets that maximized taxonomic coverage but included fewer orthologs were less stable than datasets that sacrificed taxon sampling to increase the number of orthologs. Clades recovered irrespective of dataset, filtering or analytical method included Octopodiformes (Vampyroteuthis infernalis + octopods), Decapodiformes (squids, cuttlefishes and their relatives), and orders Oegopsida (open-eyed squids) and Myopsida (e.g., loliginid squids). Ordinal-level relationships within Decapodiformes were the most susceptible to dataset perturbation, further emphasizing the challenges associated with uncovering relationships at deep nodes in the cephalopod tree of life. Copyright © 2017 Elsevier Inc. All rights reserved.
Soh, Jung; Turinsky, Andrei L; Trinh, Quang M; Chang, Jasmine; Sabhaney, Ajay; Dong, Xiaoli; Gordon, Paul Mk; Janzen, Ryan Pw; Hau, David; Xia, Jianguo; Wishart, David S; Sensen, Christoph W
2009-01-01
We have developed a computational framework for spatiotemporal integration of molecular and anatomical datasets in a virtual reality environment. Using two case studies involving gene expression data and pharmacokinetic data, respectively, we demonstrate how existing knowledge bases for molecular data can be semantically mapped onto a standardized anatomical context of the human body. Our data mapping methodology uses ontological representations of heterogeneous biomedical datasets and an ontology reasoner to create complex semantic descriptions of biomedical processes. This framework provides a means to systematically combine an increasing amount of biomedical imaging and numerical data into spatiotemporally coherent graphical representations. Our work enables medical researchers with different expertise to simulate complex phenomena visually and to develop insights through the use of shared data, thus paving the way for pathological inference, developmental pattern discovery and biomedical hypothesis testing.
A large dataset of protein dynamics in the mammalian heart proteome.
Lau, Edward; Cao, Quan; Ng, Dominic C M; Bleakley, Brian J; Dincer, T Umut; Bot, Brian M; Wang, Ding; Liem, David A; Lam, Maggie P Y; Ge, Junbo; Ping, Peipei
2016-03-15
Protein stability is a major regulatory principle of protein function and cellular homeostasis. Although the underlying mechanisms are not fully understood, disruption of protein turnover is widely implicated in diverse pathologies, from heart failure to neurodegeneration. Information on global protein dynamics therefore has the potential to expand the depth and scope of disease phenotyping and therapeutic strategies. Using an integrated platform of metabolic labeling, high-resolution mass spectrometry and computational analysis, we report here a comprehensive dataset of the in vivo half-life of 3,228 cardiac proteins and the expression of 8,064 cardiac proteins, quantified under healthy and hypertrophic conditions across six mouse genetic strains commonly employed in biomedical research. We anticipate these data will aid in understanding key mitochondrial and metabolic pathways in heart diseases, and further serve as a reference for methodology development in protein dynamics studies across multiple organ systems.
Methodological challenges to bridge the gap between regional climate and hydrology models
NASA Astrophysics Data System (ADS)
Bozhinova, Denica; José Gómez-Navarro, Juan; Raible, Christoph; Felder, Guido
2017-04-01
The frequency and severity of floods worldwide, together with their impacts, are expected to increase under climate change scenarios. It is therefore very important to gain insight into the physical mechanisms responsible for such events in order to constrain the associated uncertainties. Model simulations of climate and hydrological processes are important tools that can provide insight into the underlying physical processes and thus enable an accurate assessment of the risks. Coupled together, they can provide a physically consistent picture that allows the phenomenon to be assessed in a comprehensive way. However, climate and hydrological models work at different temporal and spatial scales, so a number of methodological challenges need to be carefully addressed. An important issue pertains to the presence of biases in the simulation of precipitation. Climate models in general, and Regional Climate Models (RCMs) in particular, are affected by a number of systematic biases that limit their reliability. In many studies, most prominently assessments of changes due to climate change, such biases are minimised by applying the so-called delta approach, which focuses on changes and disregards absolute values that are more affected by biases. However, this approach is not suitable here, as the absolute value of precipitation, rather than the change, is fed into the hydrological model. The bias therefore has to be removed beforehand, a complex matter for which various methodologies have been proposed. In this study, we apply and discuss the advantages and caveats of two methodologies that correct the simulated precipitation to minimise differences with respect to an observational dataset: a linear fit (FIT) of the accumulated distributions and Quantile Mapping (QM). The target region is Switzerland, and the observational dataset is therefore provided by MeteoSwiss. The RCM is the Weather Research and Forecasting model (WRF), driven at the boundaries by the Community Earth System Model (CESM). The raw simulation driven by CESM exhibits prominent biases that stand out in the evolution of the annual cycle and demonstrate that bias correction is mandatory in this type of study, rather than a minor correction that might be neglected. The simulation spans the period 1976-2005, although the correction is applied on a daily basis. Both methods lead to a corrected precipitation field that respects the temporal evolution of the simulated precipitation while mimicking the distribution of precipitation in the observations. Due to the nature of the two methodologies, there are important differences between the products of the two corrections, which lead to datasets with different properties. FIT is generally more accurate in reproducing the tails of the distribution, i.e. extreme events, whereas the nature of QM renders it a general-purpose correction whose skill is distributed evenly across the full distribution of precipitation, including central values.
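A minimal sketch of the quantile mapping (QM) correction on synthetic daily precipitation, assuming numpy; the FIT variant and the WRF/MeteoSwiss processing chain are not shown.

```python
# Sketch: empirical quantile mapping (QM) of simulated daily precipitation
# against an observational reference. Illustrative synthetic data only.
import numpy as np

def quantile_mapping(sim_hist, obs_hist, sim_target, n_quantiles=100):
    """Map each simulated value onto the observed distribution via matching quantiles."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    sim_q = np.quantile(sim_hist, q)   # quantiles of the raw model climatology
    obs_q = np.quantile(obs_hist, q)   # quantiles of the observations
    # For each target value, find its position in the model distribution,
    # then read off the observed value at that same quantile.
    return np.interp(sim_target, sim_q, obs_q)

# Example with synthetic daily precipitation (mm/day)
rng = np.random.default_rng(0)
obs = rng.gamma(shape=0.8, scale=6.0, size=10_000)   # "observations"
sim = rng.gamma(shape=0.8, scale=9.0, size=10_000)   # biased "model"
corrected = quantile_mapping(sim, obs, sim)
print(round(sim.mean(), 2), round(obs.mean(), 2), round(corrected.mean(), 2))
```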
Compressive sensing reconstruction of 3D wet refractivity based on GNSS and InSAR observations
NASA Astrophysics Data System (ADS)
Heublein, Marion; Alshawaf, Fadwa; Erdnüß, Bastian; Zhu, Xiao Xiang; Hinz, Stefan
2018-06-01
In this work, the reconstruction quality of an approach for neutrospheric water vapor tomography based on Slant Wet Delays (SWDs) obtained from Global Navigation Satellite Systems (GNSS) and Interferometric Synthetic Aperture Radar (InSAR) is investigated. The novelties of this approach are (1) the use of both absolute GNSS and absolute InSAR SWDs for tomography and (2) the solution of the tomographic system by means of compressive sensing (CS). The tomographic reconstruction is performed based on (i) a synthetic SWD dataset generated using wet refractivity information from the Weather Research and Forecasting (WRF) model and (ii) a real dataset using GNSS and InSAR SWDs. The validation of the results therefore focuses (i) on a comparison of the refractivity estimates with the input WRF refractivities and (ii) on a comparison with radiosonde profiles. In the case of the synthetic dataset, the results show that the CS approach yields a more accurate and more precise solution than least squares (LSQ). In addition, the benefit of adding synthetic InSAR SWDs to the tomographic system is analyzed. When applying CS, adding synthetic InSAR SWDs to the tomographic system improves the solution both in magnitude and in scattering. When solving the tomographic system by means of LSQ, no clear behavior is observed. In the case of the real dataset, the estimated refractivities of the two methodologies show consistent behavior, although the LSQ and CS solution strategies differ.
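A toy illustration of why a sparsity-promoting (CS-style) solver can outperform plain least squares on an underdetermined tomographic system, assuming numpy and scikit-learn's Lasso as a stand-in L1 solver; the actual SWD ray geometry and the study's CS formulation are not reproduced.

```python
# Sketch: recovering a sparse refractivity-like field from few slant observations,
# comparing least squares (LSQ) with an L1-regularized (compressive sensing style) solver.
# Purely illustrative geometry and noise levels.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n_voxels, n_obs = 200, 60                # underdetermined: fewer slant delays than voxels
x_true = np.zeros(n_voxels)
x_true[rng.choice(n_voxels, 8, replace=False)] = rng.uniform(5, 20, 8)  # sparse "wet refractivity"

A = rng.normal(size=(n_obs, n_voxels))   # stand-in for the ray-geometry design matrix
y = A @ x_true + rng.normal(scale=0.1, size=n_obs)

x_lsq = np.linalg.lstsq(A, y, rcond=None)[0]                 # minimum-norm least squares
x_cs = Lasso(alpha=0.05, max_iter=50_000).fit(A, y).coef_    # L1-regularized recovery

print("LSQ error:", round(np.linalg.norm(x_lsq - x_true), 2))
print("CS  error:", round(np.linalg.norm(x_cs - x_true), 2))
```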
Introduction to SIMRAND: Simulation of research and development project
NASA Technical Reports Server (NTRS)
Miles, R. F., Jr.
1982-01-01
SIMRAND (SIMulation of Research ANd Development Projects) is a methodology developed to aid the engineering and management decision process in selecting the optimal set of systems or tasks to be funded on a research and development project. A project may have a set of systems or tasks under consideration whose total cost exceeds the allocated budget. Other factors, such as personnel and facilities, may also enter as constraints. Thus the project's management must select, from among the complete set of systems or tasks under consideration, a partial set that satisfies all project constraints. The SIMRAND methodology uses the analytical techniques of probability theory and decision analysis from management science, together with computer simulation, to select this optimal partial set. SIMRAND is truly a management tool: it specifies up front the information that the engineers must generate, thereby guiding the management direction of the engineering effort, and it ranks the alternatives according to the preferences of the decision makers.
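A hedged sketch of the selection problem SIMRAND addresses: enumerate the task subsets that satisfy the budget constraint and rank them by a Monte Carlo utility estimate. The task data and uniform payoff model are illustrative assumptions, not the original SIMRAND formulation.

```python
# Sketch: choose, from candidate R&D tasks, a feasible partial set under a budget
# constraint and rank feasible sets by simulated expected payoff. Illustrative only.
import itertools
import random

tasks = {                      # name: (expected cost, (low, high) payoff range)
    "A": (40, (10, 60)),
    "B": (25, (5, 30)),
    "C": (35, (15, 45)),
    "D": (20, (0, 50)),
}
budget = 80
random.seed(0)

def simulated_utility(subset, n_runs=5000):
    """Monte Carlo estimate of the expected payoff of funding a subset of tasks."""
    total = 0.0
    for _ in range(n_runs):
        total += sum(random.uniform(*tasks[t][1]) for t in subset)
    return total / n_runs

feasible = [s for r in range(1, len(tasks) + 1)
            for s in itertools.combinations(tasks, r)
            if sum(tasks[t][0] for t in s) <= budget]
ranked = sorted(feasible, key=simulated_utility, reverse=True)
print("best feasible set:", ranked[0])
```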
Squires, Hazel; Chilcott, James; Akehurst, Ronald; Burr, Jennifer; Kelly, Michael P
2016-04-01
To identify the key methodological challenges for public health economic modelling and set an agenda for future research. An iterative literature search identified papers describing methodological challenges for developing the structure of public health economic models. Additional multidisciplinary literature searches helped expand upon important ideas raised within the review. Fifteen articles were identified within the formal literature search, highlighting three key challenges: inclusion of non-healthcare costs and outcomes; inclusion of equity; and modelling complex systems and multi-component interventions. Based upon these and multidisciplinary searches about dynamic complexity, the social determinants of health, and models of human behaviour, six areas for future research were specified. Future research should focus on: the use of systems approaches within health economic modelling; approaches to assist the systematic consideration of the social determinants of health; methods for incorporating models of behaviour and social interactions; consideration of equity; and methodology to help modellers develop valid, credible and transparent public health economic model structures.
Do childhood vaccines have non-specific effects on mortality?
Cooper, William O.; Boyce, Thomas G.; Wright, Peter F.; Griffin, Marie R.
2003-01-01
A recent article by Kristensen et al. suggested that measles vaccine and bacille Calmette-Guérin (BCG) vaccine might reduce mortality beyond what is expected simply from protection against measles and tuberculosis. Previous reviews of the potential effects of childhood vaccines on mortality have not considered methodological features of the reviewed studies. Methodological considerations play an especially important role in observational assessments, in which selection factors for vaccination may be difficult to ascertain. We reviewed 782 English-language articles on vaccines and childhood mortality and found only a few whose design met the criteria for methodological rigor. The data reviewed suggest that measles vaccine delivers its promised reduction in mortality, but there is insufficient evidence of a mortality benefit beyond that attributable to its effect on measles disease and its sequelae. Our review of the available data in the literature reinforces how difficult these questions have been to answer and how important study design will be in determining the effect of specific vaccines on all-cause mortality. PMID:14758409
Oncogenic gene fusions drive many human cancers, but tools to more quickly unravel their functional contributions are needed. Here we describe methodology permitting fusion gene construction for functional evaluation. Using this strategy, we engineered the known fusion oncogenes, BCR-ABL1, EML4-ALK, and ETV6-NTRK3, as well as 20 previously uncharacterized fusion genes identified in TCGA datasets.
Bayesian Hierarchical Models to Augment the Mediterranean Forecast System
2010-09-30
In part 2 (Bonazzi et al., 2010), the impact of the ensemble forecast methodology based on MFS-Wind-BHM perturbations is documented. ... In the absence of dt data stage inputs, the forecast impact of MFS-Error-BHM is neutral. Experiments are now underway to introduce dt back into the MFS-Error-BHM and quantify forecast impacts at MFS. MFS-SuperEnsemble-BHM: we have assembled all needed datasets and completed algorithmic development.
2012-09-01
Supported by the National Science Foundation (NSF) IGERT 9972762, the Army Research Institute (ARI) W91WAW07C0063, and the Army Research Laboratory (ARL/CTA). ... (Figure 13: Decision tree for prediction model selection in AutoMap.) ... generated for nationally funded initiatives and made available through the Linguistic Data Consortium (LDC). An overview of these datasets is provided in ...
NASA Astrophysics Data System (ADS)
Pinales, J. C.; Graber, H. C.; Hargrove, J. T.; Caruso, M. J.
2016-02-01
Previous studies have demonstrated the ability to detect and classify marine hydrocarbon films with spaceborne synthetic aperture radar (SAR) imagery. The dampening effects of hydrocarbon discharges on small surface capillary-gravity waves renders the ocean surface "radar dark" compared with the standard wind-borne ocean surfaces. Given the scope and impact of events like the Deepwater Horizon oil spill, the need for improved, automated and expedient monitoring of hydrocarbon-related marine anomalies has become a pressing and complex issue for governments and the extraction industry. The research presented here describes the development, training, and utilization of an algorithm that detects marine oil spills in an automated, semi-supervised manner, utilizing X-, C-, or L-band SAR data as the primary input. Ancillary datasets include related radar-borne variables (incidence angle, etc.), environmental data (wind speed, etc.) and textural descriptors. Shapefiles produced by an experienced human-analyst served as targets (validation) during the training portion of the investigation. Training and testing datasets were chosen for development and assessment of algorithm effectiveness as well as optimal conditions for oil detection in SAR data. The algorithm detects oil spills by following a 3-step methodology: object detection, feature extraction, and classification. Previous oil spill detection and classification methodologies such as machine learning algorithms, artificial neural networks (ANN), and multivariate classification methods like partial least squares-discriminant analysis (PLS-DA) are evaluated and compared. Statistical, transform, and model-based image texture techniques, commonly used for object mapping directly or as inputs for more complex methodologies, are explored to determine optimal textures for an oil spill detection system. The influence of the ancillary variables is explored, with a particular focus on the role of strong vs. weak wind forcing.
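A schematic sketch of the three-step structure (object detection, feature extraction, classification) on synthetic data, assuming numpy, scipy, and scikit-learn; the SAR-specific texture descriptors, ancillary variables, and analyst-derived training shapefiles are placeholders.

```python
# Sketch of the three-step structure described above on a synthetic "radar-dark" patch:
# (1) dark-object detection, (2) feature extraction, (3) supervised classification.
# Placeholder data and features, not the trained system from the study.
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

def detect_dark_objects(sigma0, threshold_db=-3.0):
    """Step 1: flag contiguous regions more than |threshold_db| below the scene mean."""
    dark = sigma0 < (sigma0.mean() + threshold_db)
    labels, n = ndimage.label(dark)
    return labels, n

def extract_features(sigma0, labels, n, wind_speed):
    """Step 2: simple geometric/contextual features per candidate object."""
    feats = []
    for i in range(1, n + 1):
        mask = labels == i
        feats.append([mask.sum(),                                    # area (pixels)
                      sigma0[mask].mean(),                           # mean backscatter of object
                      sigma0[~mask].mean() - sigma0[mask].mean(),    # contrast to background
                      wind_speed])                                   # ancillary environmental input
    return np.asarray(feats)

# Step 3: classify candidates as oil / look-alike with a supervised model.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))          # placeholder training features
y_train = rng.integers(0, 2, size=200)       # placeholder analyst labels
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

scene = rng.normal(loc=-10.0, scale=2.0, size=(128, 128))   # synthetic sigma0 (dB)
labels, n = detect_dark_objects(scene)
if n:
    preds = clf.predict(extract_features(scene, labels, n, wind_speed=6.5))
    print("candidate objects:", n, "flagged as oil:", int(preds.sum()))
```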
Survival prediction of trauma patients: a study on US National Trauma Data Bank.
Sefrioui, I; Amadini, R; Mauro, J; El Fallahi, A; Gabbrielli, M
2017-12-01
Exceptional circumstances such as major incidents or natural disasters may produce a huge number of victims who cannot all be saved immediately and simultaneously. In these cases it is important to define priorities so as to avoid wasting time and resources on victims who cannot be saved. The Trauma and Injury Severity Score (TRISS) methodology is the well-known, standard system usually used by practitioners to predict the survival probability of trauma patients. However, practitioners have noted that the accuracy of TRISS predictions is unacceptable, especially for severely injured patients. Thus, alternative methods should be proposed. In this work we evaluate different approaches for predicting whether a patient will survive or not according to simple and easily measurable observations. We conducted a rigorous, comparative study based on the most important prediction techniques using real clinical data from the US National Trauma Data Bank. Empirical results show that well-known Machine Learning classifiers can outperform the TRISS methodology. Based on our findings, the best approach we evaluated is Random Forest: it has the best accuracy, area under the curve, and kappa statistic, as well as the second-best sensitivity and specificity. It also has a good calibration curve. Furthermore, its performance increases monotonically as the dataset size grows, meaning that it can be very effective at exploiting incoming knowledge. Considering the whole dataset, it is always better than TRISS. Finally, we implemented a new tool to compute the survival of victims. This will help medical practitioners obtain better accuracy than the TRISS tools. Random Forests may be a good candidate solution for improving survival predictions over the standard TRISS methodology.
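A sketch contrasting the TRISS logistic form with a Random Forest baseline on synthetic records, assuming numpy and scikit-learn; the coefficients and data are illustrative placeholders, not the published TRISS weights or NTDB records.

```python
# Sketch: the TRISS survival-probability form alongside a Random Forest baseline.
# The coefficients below are placeholders for illustration, not the published
# blunt/penetrating regression weights; the records are synthetic, not NTDB data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def triss_survival_probability(rts, iss, age_over_54, b=(-0.45, 0.81, -0.08, -1.74)):
    """TRISS logistic form: Ps = 1 / (1 + exp(-(b0 + b1*RTS + b2*ISS + b3*AgeIndex)))."""
    b0, b1, b2, b3 = b
    z = b0 + b1 * rts + b2 * iss + b3 * age_over_54
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 7.84, 2000),        # Revised Trauma Score
                     rng.integers(1, 75, 2000),         # Injury Severity Score
                     rng.integers(0, 2, 2000)])         # age > 54 indicator
y = (triss_survival_probability(*X.T) + rng.normal(0, 0.1, 2000) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("Random Forest accuracy:", round(rf.score(X_te, y_te), 3))
```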
Land cover mapping for development planning in Eastern and Southern Africa
NASA Astrophysics Data System (ADS)
Oduor, P.; Flores Cordova, A. I.; Wakhayanga, J. A.; Kiema, J.; Farah, H.; Mugo, R. M.; Wahome, A.; Limaye, A. S.; Irwin, D.
2016-12-01
Africa continues to experience intensification of land use, driven by competition for resources and a growing population. Land cover maps are some of the fundamental datasets required by numerous stakeholders to inform a number of development decisions. For instance, they can be integrated with other datasets to create value-added products such as vulnerability impact assessment maps and natural capital accounting products. In addition, land cover maps are used as inputs into Greenhouse Gas (GHG) inventories to inform the Agriculture, Forestry and Other Land Use (AFOLU) sector. However, the processes and methodologies of creating land cover maps consistent with international and national land cover classification schemes can be challenging, especially in developing countries where skills, hardware and software resources can be limiting. To meet this need, SERVIR Eastern and Southern Africa developed methodologies and stakeholder engagement processes that led to a successful initiative in which land cover maps for nine countries (Malawi, Rwanda, Namibia, Botswana, Lesotho, Ethiopia, Uganda, Zambia and Tanzania) were developed, using two major classification schemes. The first set of maps was based on an internationally acceptable classification system, while the second set was based on a nationally defined classification system. The mapping process benefited from reviews by national experts and technical advisory groups. The maps have found diverse uses, among them the definition of the Forest Reference Levels in Zambia. In Ethiopia, the maps have been endorsed by the national mapping agency as part of national data. The data for Rwanda is being used to inform the Natural Capital Accounting process through the WAVES program, a World Bank initiative. This work illustrates the methodologies and stakeholder engagement processes that brought success to this land cover mapping initiative.
Shah, Tayyab Ikram; Milosavljevic, Stephan; Bath, Brenna
2017-06-01
This research focuses on methodological challenges and considerations associated with estimating the geographical aspects of access to healthcare, with a focus on rural and remote areas. The underlying assumption is that GIS-based accessibility measures for rural healthcare services will vary across geographic units of analysis and estimation techniques, which could influence the interpretation of spatial access to rural healthcare services. Estimations of geographical accessibility depend on variations in the following three parameters: 1) quality of input data; 2) accessibility method; and 3) geographical area. This research investigated the spatial distributions of physiotherapists (PTs) in comparison to family physicians (FPs) across Saskatchewan, Canada. The three-step floating catchment area (3SFCA) method was applied to calculate accessibility scores for both PT and FP services at two different geographical units. Accessibility scores were also compared to simple healthcare provider-to-population ratios. The results vary considerably depending on the accessibility method used and the choice of geographical area unit for measuring geographical accessibility for both FP and PT services. These findings raise intriguing questions regarding the nature and extent of technical issues and methodological considerations that can affect GIS-based measures in health services research and planning. This study demonstrates how the selection of geographical areal units and different methods for measuring geographical accessibility could affect the distribution of healthcare resources in rural areas. These methodological issues have implications for determining where there is reduced access, which will ultimately impact health human resource priorities and policies. Copyright © 2017 Elsevier Ltd. All rights reserved.
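A simplified floating-catchment sketch in the spirit of the method described above, assuming numpy and synthetic distances; it implements a basic two-step (2SFCA-style) calculation, whereas the 3SFCA used in the study additionally weights demand by selection probabilities.

```python
# Sketch of a floating-catchment accessibility score. Simplified two-step (2SFCA-style)
# illustration with synthetic distances; not the full 3SFCA used in the study.
import numpy as np

def fca_accessibility(dist, supply, population, catchment_km=60.0):
    """dist[i, j]: distance from population unit i to provider site j (km)."""
    within = dist <= catchment_km
    # Step 1: provider-to-population ratio within each provider's catchment.
    demand = (within * population[:, None]).sum(axis=0)          # people reachable per provider
    ratio = np.divide(supply, demand, out=np.zeros_like(supply, dtype=float),
                      where=demand > 0)
    # Step 2: sum the ratios of all providers reachable from each population unit.
    return (within * ratio[None, :]).sum(axis=1)

rng = np.random.default_rng(0)
dist = rng.uniform(5, 150, size=(8, 4))      # 8 population units x 4 provider sites
supply = np.array([3.0, 1.0, 2.0, 5.0])      # e.g. FTE providers per site
population = rng.integers(500, 5000, size=8).astype(float)
print(np.round(fca_accessibility(dist, supply, population), 5))
```

Running the same calculation at different geographical units (e.g. coarser population zones) or with different catchment sizes illustrates how strongly the resulting scores depend on those methodological choices.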
Rooting phylogenies using gene duplications: an empirical example from the bees (Apoidea).
Brady, Seán G; Litman, Jessica R; Danforth, Bryan N
2011-09-01
The placement of the root node in a phylogeny is fundamental to characterizing evolutionary relationships. The root node of bee phylogeny remains unclear despite considerable previous attention. In order to test alternative hypotheses for the location of the root node in bees, we used the F1 and F2 paralogs of elongation factor 1-alpha (EF-1α) to compare the tree topologies that result when using outgroup versus paralog rooting. Fifty-two taxa representing each of the seven bee families were sequenced for both copies of EF-1α. Two datasets were analyzed. In the first (the "concatenated" dataset), the F1 and F2 copies for each species were concatenated and the tree was rooted using appropriate outgroups (sphecid and crabronid wasps). In the second dataset (the "duplicated" dataset), the F1 and F2 copies were aligned to one another and each copy for every taxon was treated as a separate terminal. In this dataset, the root was placed between the F1 and F2 copies (i.e., paralog rooting). Bayesian analyses demonstrate that the outgroup rooting approach outperforms paralog rooting, recovering deeper clades and showing stronger support for groups well established by both morphological and other molecular data. Sequence characteristics of the two copies were compared at the amino acid level, but little evidence was found to suggest that one copy is more functionally conserved. Although neither approach yields an unambiguous root to the tree, both approaches strongly indicate that the root of bee phylogeny does not fall near Colletidae, as has been previously proposed. We discuss paralog rooting as a general strategy and why this approach performs relatively poorly with our particular dataset. Copyright © 2011 Elsevier Inc. All rights reserved.
Zhang, Zhe; Erbe, Malena; He, Jinlong; Ober, Ulrike; Gao, Ning; Zhang, Hao; Simianer, Henner; Li, Jiaqi
2015-02-09
Obtaining accurate predictions of unobserved genetic or phenotypic values for complex traits in animal, plant, and human populations is possible through whole-genome prediction (WGP), a combined analysis of genotypic and phenotypic data. Because the underlying genetic architecture of the trait of interest is an important factor affecting model selection, we propose a new strategy, termed BLUP|GA (BLUP-given genetic architecture), which can use genetic architecture information from the dataset at hand rather than from public sources. This is achieved by using a trait-specific covariance matrix (T), which is a weighted sum of a genetic architecture part (the S matrix) and the realized relationship matrix (G). The BLUP|GA algorithm is described and illustrated with real and simulated datasets. The predictive ability of BLUP|GA was validated with three model traits in a dairy cattle dataset and 11 traits in three public datasets with a variety of genetic architectures, and compared with GBLUP and other approaches. Results show that BLUP|GA outperformed GBLUP in 20 of 21 scenarios in the dairy cattle dataset and outperformed GBLUP, BayesA, and BayesB in 12 of 13 traits in the analyzed public datasets. Further analyses showed that the difference in accuracy between BLUP|GA and GBLUP correlates significantly with the distance between the T and G matrices. The new strategy applied in BLUP|GA is a favorable and flexible alternative to the standard GBLUP model, allowing the genetic architecture of the quantitative trait under consideration to be accounted for when necessary. This feature is mainly due to the increased similarity between the trait-specific relationship matrix (T) and the genetic relationship matrix at unobserved causal loci. Applying BLUP|GA in WGP would ease the burden of model selection. Copyright © 2015 Zhang et al.
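A small numerical sketch of the BLUP|GA idea, assuming numpy: build T as a weighted sum of an architecture part S and the realized relationship matrix G, then plug T into a standard GBLUP-style mixed-model solve. The SNP weights, the weight omega, and the variance ratio are illustrative assumptions.

```python
# Sketch: trait-specific covariance T = omega * S + (1 - omega) * G used in place of G
# in a GBLUP-style mixed model. Toy genotypes and weights; not the original implementation.
import numpy as np

def gblup_solve(y, X, K, h2=0.5):
    """Solve y = Xb + u + e with u ~ N(0, K * sigma_u^2) and assumed heritability h2."""
    n = len(y)
    lam = (1.0 - h2) / h2                      # sigma_e^2 / sigma_u^2
    Vinv = np.linalg.inv(K + lam * np.eye(n))
    b = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # fixed effects
    u = K @ Vinv @ (y - X @ b)                             # genomic values
    return b, u

rng = np.random.default_rng(0)
n, p = 100, 500
M = rng.integers(0, 3, size=(n, p)).astype(float)          # genotypes coded 0/1/2
Z = M - M.mean(axis=0)
G = Z @ Z.T / p                                             # realized relationship matrix
w = rng.exponential(size=p)                                 # stand-in SNP weights
w /= w.sum()
S = Z @ np.diag(w) @ Z.T                                    # genetic architecture part
omega = 0.3                                                 # weight on the architecture part
T = omega * S + (1.0 - omega) * G                           # trait-specific covariance

y = rng.normal(size=n)
X = np.ones((n, 1))
b, u = gblup_solve(y, X, T)
print("intercept:", float(b[0]))
```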
NASA Astrophysics Data System (ADS)
Slinskey, E. A.; Loikith, P. C.; Waliser, D. E.; Goodman, A.
2017-12-01
Extreme precipitation events are associated with numerous societal and environmental impacts. Furthermore, anthropogenic climate change is projected to alter precipitation intensity across portions of the Continental United States (CONUS). A spatial understanding and an intuitive means of monitoring extreme precipitation over time are therefore critical. Towards this end, we apply an event-based indicator, developed as part of NASA's support for the ongoing efforts of the US National Climate Assessment, which assigns categories to extreme precipitation events based on 3-day storm totals, as a basis for dataset intercomparison. To assess observational uncertainty across a wide range of historical precipitation measurement approaches, we intercompare in situ station data from the Global Historical Climatology Network (GHCN), satellite-derived precipitation data from NASA's Tropical Rainfall Measuring Mission (TRMM), gridded in situ station data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM), global reanalysis from NASA's Modern Era Retrospective-Analysis version 2 (MERRA-2), and regional reanalysis with gauge data assimilation from NCEP's North American Regional Reanalysis (NARR). Results suggest considerable variability across the five-dataset suite in the frequency, spatial extent, and magnitude of extreme precipitation events. Consistent with expectations, higher-resolution datasets were found to resemble station data best and to capture a greater frequency of high-end extreme events than lower-resolution datasets. The degree of dataset agreement varies regionally; however, all datasets successfully capture the seasonal cycle of precipitation extremes across the CONUS. These intercomparison results provide additional insight into observational uncertainty and the ability of a range of precipitation measurement and analysis products to capture extreme precipitation event climatology. While the event category threshold is fixed in this analysis, preliminary results from the development of a flexible categorization scheme that scales with grid resolution are also presented.
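A minimal sketch of an event categorization based on running 3-day storm totals, assuming numpy; the thresholds are illustrative placeholders rather than the indicator's actual category bins.

```python
# Sketch: assign categories to precipitation events from running 3-day storm totals.
# Threshold values are illustrative assumptions, not the indicator's actual bins.
import numpy as np

def three_day_totals(daily_precip):
    """Running 3-day accumulations from a 1-D daily precipitation series (mm)."""
    return np.convolve(daily_precip, np.ones(3), mode="valid")

def categorize(totals, thresholds=(50, 100, 150, 200)):
    """Category 0 = below the lowest threshold, 1..4 = increasingly extreme 3-day totals."""
    return np.digitize(totals, thresholds)

rng = np.random.default_rng(0)
daily = rng.gamma(shape=0.6, scale=8.0, size=365)          # synthetic daily series (mm/day)
cats = categorize(three_day_totals(daily))
for c in range(1, 5):
    print(f"category {c}: {(cats == c).sum()} events")
```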
Kent, Peter; Jensen, Rikke K; Kongsted, Alice
2014-10-02
There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups. Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data and clinical research questions.
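A sketch of a head-to-head subgroup comparison using open-source stand-ins (a Gaussian mixture model and k-means via scikit-learn) in place of the three programs evaluated in the study; the data are synthetic.

```python
# Sketch: compare two clustering approaches on the same synthetic dataset,
# selecting the number of subgroups by BIC for the model-based method.
# Open-source stand-ins, not the commercial/closed programs named in the study.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, true_labels = make_blobs(n_samples=1000, centers=4, cluster_std=1.5, random_state=0)

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in range(2, 8)}
k_best = min(bics, key=bics.get)
gmm_labels = GaussianMixture(n_components=k_best, random_state=0).fit_predict(X)
km_labels = KMeans(n_clusters=k_best, n_init=10, random_state=0).fit_predict(X)

print("subgroups selected by BIC:", k_best)
print("GMM vs truth (ARI):", round(adjusted_rand_score(true_labels, gmm_labels), 3))
print("k-means vs truth (ARI):", round(adjusted_rand_score(true_labels, km_labels), 3))
```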
NASA Astrophysics Data System (ADS)
Weber, Mark; Coldewey-Egbers, Melanie; Fioletov, Vitali E.; Frith, Stacey M.; Wild, Jeannette D.; Burrows, John P.; Long, Craig S.; Loyola, Diego
2018-02-01
We report on updated trends using different merged datasets from satellite and ground-based observations for the period from 1979 to 2016. Trends were determined by applying a multiple linear regression (MLR) to annual mean zonal mean data. Merged datasets used here include NASA MOD v8.6 and National Oceanic and Atmospheric Administration (NOAA) merge v8.6, both based on data from the series of Solar Backscatter UltraViolet (SBUV) and SBUV-2 satellite instruments (1978-present) as well as the Global Ozone Monitoring Experiment (GOME)-type Total Ozone (GTO) and GOME-SCIAMACHY-GOME-2 (GSG) merged datasets (1995-present), mainly comprising satellite data from GOME, the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and GOME-2A. The fifth dataset consists of the monthly mean zonal mean data from ground-based measurements collected at World Ozone and UV Data Center (WOUDC). The addition of four more years of data since the last World Meteorological Organization (WMO) ozone assessment (2013-2016) shows that for most datasets and regions the trends since the stratospheric halogen reached its maximum (˜ 1996 globally and ˜ 2000 in polar regions) are mostly not significantly different from zero. However, for some latitudes, in particular the Southern Hemisphere extratropics and Northern Hemisphere subtropics, several datasets show small positive trends of slightly below +1 % decade-1 that are barely statistically significant at the 2σ uncertainty level. In the tropics, only two datasets show significant trends of +0.5 to +0.8 % decade-1, while the others show near-zero trends. Positive trends since 2000 have been observed over Antarctica in September, but near-zero trends are found in October as well as in March over the Arctic. Uncertainties due to possible drifts between the datasets, from the merging procedure used to combine satellite datasets and related to the low sampling of ground-based data, are not accounted for in the trend analysis. Consequently, the retrieved trends can be only considered to be at the brink of becoming significant, but there are indications that we are about to emerge into the expected recovery phase. However, the recent trends are still considerably masked by the observed large year-to-year dynamical variability in total ozone.
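A minimal sketch of a piecewise-linear trend fit to annual mean total ozone with a turnaround fixed near the stratospheric halogen maximum, assuming numpy and synthetic data; the study's MLR includes additional explanatory terms and real merged datasets not represented here.

```python
# Sketch: piecewise-linear ("hockey stick") trend fit to synthetic annual mean total ozone,
# with the turnaround fixed at 1996. The operational MLR uses further proxies not shown here.
import numpy as np

years = np.arange(1979, 2017)
rng = np.random.default_rng(0)
ozone = (300 - 0.8 * np.clip(1996 - years, 0, None)          # synthetic decline before 1996
         + 0.3 * np.clip(years - 1996, 0, None)              # synthetic recovery after 1996
         + rng.normal(0, 2.0, years.size))                   # year-to-year variability (DU)

turnaround = 1996
X = np.column_stack([np.ones_like(years, dtype=float),
                     np.clip(turnaround - years, 0, None),   # pre-turnaround term
                     np.clip(years - turnaround, 0, None)])  # post-turnaround term
coef, *_ = np.linalg.lstsq(X, ozone, rcond=None)
resid = ozone - X @ coef
se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * resid.var(ddof=X.shape[1]))
print(f"post-1996 trend: {coef[2]:+.2f} +/- {2*se[2]:.2f} DU per year (2-sigma)")
```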
NASA Technical Reports Server (NTRS)
Miller, James; Leggett, Jay; Kramer-White, Julie
2008-01-01
A team directed by the NASA Engineering and Safety Center (NESC) collected methodologies for how best to develop safe and reliable human-rated systems and how to identify the drivers that provide the basis for assessing safety and reliability. The team also identified techniques, methodologies, and best practices to assure that NASA can develop safe and reliable human-rated systems. The results are drawn from a wide variety of resources, from experts involved with the space program since its inception to the best practices espoused in contemporary engineering doctrine. This report focuses on safety and reliability considerations and does not duplicate or update any existing references. Nor is it intended to replace existing standards and policy.
Garg, Rakesh
2016-09-01
The conduct of research requires a systematic approach involving diligent planning and execution as planned. It comprises various essential predefined components such as aims, population, conduct/technique, outcome and statistical considerations. These need to be objective, reliable and repeatable. Hence, an understanding of the basic aspects of methodology is essential for any researcher. This narrative review focuses on various aspects of the methodology for the conduct of clinical research. Relevant keywords were used to search the literature in various databases and in the bibliographies of retrieved articles.
Griffith, James W; Sumner, Jennifer A; Raes, Filip; Barnhofer, Thorsten; Debeer, Elise; Hermans, Dirk
2012-12-01
Autobiographical memory is a multifaceted construct that is related to psychopathology and other difficulties in functioning. Across many studies, a variety of methods have been used to study autobiographical memory. The relationship between overgeneral autobiographical memory (OGM) and psychopathology has been of particular interest, and many studies of this cognitive phenomenon rely on the Autobiographical Memory Test (AMT) to assess it. In this paper, we examine several methodological approaches to studying autobiographical memory, and focus primarily on methodological and psychometric considerations in OGM research. We pay particular attention to what is known about the reliability, validity, and methodological variations of the AMT. The AMT has adequate psychometric properties, but there is great variability in methodology across studies that use it. Methodological recommendations and suggestions for future studies are presented. Copyright © 2011 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Cosner, Shelby; Jones, Mary F.
2016-01-01
Purpose: The purpose of this paper is to advance a framework that identifies three key domains of work and a set of more nuanced considerations and actions within each domain for school leaders seeking to improve school-wide student learning in low-performing schools facing conditions of accountability. Design/methodology/approach: Review of…
Kaufman, Tanya K; Sheehan, Daniel M; Rundle, Andrew; Neckerman, Kathryn M; Bader, Michael D M; Jack, Darby; Lovasi, Gina S
2015-09-29
The densities of food retailers, alcohol outlets, physical activity facilities, and medical facilities have been associated with diet, physical activity, and management of medical conditions. Most of the research, however, has relied on cross-sectional studies. In this paper, we assess methodological issues raised by a data source that is increasingly used to characterize change in the local business environment: the National Establishment Time Series (NETS) dataset. Longitudinal data, such as NETS, offer opportunities to assess how differential access to resources impacts population health, to consider correlations among multiple environmental influences across the life course, and to gain a better understanding of their interactions and cumulative health effects. Longitudinal data also introduce new data management, geoprocessing, and business categorization challenges. Examining geocoding accuracy and categorization over 21 years of data in 23 counties surrounding New York City (NY, USA), we find that health-related business environments change considerably over time. We note that re-geocoding data may improve spatial precision, particularly in early years. Our intent with this paper is to make future public health applications of NETS data more efficient, since the size and complexity of the data can be difficult to exploit fully within its 2-year data-licensing period. Further, standardized approaches to NETS and other "big data" will facilitate the veracity and comparability of results across studies.